Content Sources

Tatry provides access to a wide range of content sources, from free public resources to premium copyrighted content. This guide explains how these sources are managed and used.

Available Sources

Free Sources

  • Wikipedia
  • Open access academic papers
  • Public domain books
  • Government documents
  • Open documentation

Premium Sources

  • Academic journals
  • Professional publications
  • Industry reports
  • News archives
  • Specialized databases

Source Management

Listing Available Sources

from tatry import TatryRetriever
 
retriever = TatryRetriever(api_key="your-api-key")
 
# List all available sources
sources = retriever.list_sources()
 
# List only free sources
free_sources = retriever.list_sources(type="free")
 
# List only premium sources
premium_sources = retriever.list_sources(type="premium")

Using the REST API with cURL:

# List all sources
curl -X GET https://api.tatry.dev/v1/sources \
  -H "Authorization: Bearer your-api-key"

# List free sources (quote the URL so the shell does not treat "?" as a glob)
curl -X GET "https://api.tatry.dev/v1/sources?type=free" \
  -H "Authorization: Bearer your-api-key"

# List premium sources
curl -X GET "https://api.tatry.dev/v1/sources?type=premium" \
  -H "Authorization: Bearer your-api-key"
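If you call the REST endpoint from Python without the SDK, the standard library is enough. The sketch below builds the same `GET /v1/sources` request as the cURL example. The helper names `build_sources_request` and `fetch_sources` are hypothetical, and because this page does not document the response body, `fetch_sources` simply returns the decoded JSON.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.tatry.dev/v1/sources"


def build_sources_request(api_key, source_type=None):
    """Build (but do not send) GET /v1/sources, optionally filtered by type."""
    url = BASE_URL
    if source_type is not None:
        # urlencode escapes the value, e.g. {"type": "free"} -> "type=free"
        url += "?" + urllib.parse.urlencode({"type": source_type})
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_sources(api_key, source_type=None):
    """Send the request and decode the JSON body (its shape is not documented here)."""
    with urllib.request.urlopen(build_sources_request(api_key, source_type)) as resp:
        return json.load(resp)


req = build_sources_request("your-api-key", source_type="premium")
print(req.get_method(), req.full_url)
```

Separating request construction from sending makes the URL and headers easy to inspect or test without network access.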

Source Selection

from tatry import TatryRetriever

# Use only the listed sources
retriever = TatryRetriever(
    api_key="your-api-key",
    allowed_sources=["wikipedia", "academic_journals"]
)

# Use every source except the listed ones
retriever = TatryRetriever(
    api_key="your-api-key",
    excluded_sources=["news_archives"]
)
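This page does not spell out how `allowed_sources` and `excluded_sources` interact. The following hypothetical sketch assumes the allow-list is applied first and exclusions are removed afterwards; `resolve_sources` is an illustrative helper, not part of the Tatry SDK.

```python
def resolve_sources(available, allowed_sources=None, excluded_sources=None):
    """Return the sources a query may use, in catalog order.

    Assumed semantics: an allow-list (if given) restricts the set first,
    then any excluded names are removed.
    """
    allowed = set(allowed_sources) if allowed_sources is not None else None
    excluded = set(excluded_sources or ())
    return [
        name
        for name in available
        if (allowed is None or name in allowed) and name not in excluded
    ]


# Illustrative catalog; real names come from list_sources()
available = ["wikipedia", "academic_journals", "news_archives", "industry_reports"]
print(resolve_sources(available, allowed_sources=["wikipedia", "academic_journals"]))
print(resolve_sources(available, excluded_sources=["news_archives"]))
```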

Source Priority

Tatry chooses which sources to query based on three groups of criteria:

  1. Relevance-Based

    • Matches query intent to source type
    • Evaluates source quality
    • Considers content freshness
  2. Cost Optimization

    • Prefers free sources when appropriate
    • Balances cost vs. quality
    • Respects budget constraints
  3. User Preferences

    • Honors source restrictions
    • Applies source weights
    • Follows content filters
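To make these criteria concrete, here is a toy ranking function. It is not Tatry's actual algorithm: the scoring formula, the `cost_penalty` factor, and the `Candidate` fields are assumptions chosen for illustration, and freshness and content filters are left out. It multiplies relevance, quality, and the user's source weight, discounts the result as per-query cost rises, and drops sources that exceed a per-query budget.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    relevance: float       # 0..1: how well the source fits the query intent
    quality: float         # 0..1: e.g. the source's quality_score
    cost_per_query: float  # 0.0 for free sources
    weight: float = 1.0    # user-supplied source weight


def score(c: Candidate, cost_penalty: float = 2.0) -> float:
    """Toy score: relevance x quality x weight, discounted as cost rises."""
    return c.relevance * c.quality * c.weight / (1.0 + cost_penalty * c.cost_per_query)


def rank(candidates, budget_per_query=None):
    """Drop sources above the per-query budget, then sort best-first."""
    eligible = [
        c for c in candidates
        if budget_per_query is None or c.cost_per_query <= budget_per_query
    ]
    return sorted(eligible, key=score, reverse=True)


# Illustrative values only
candidates = [
    Candidate("wikipedia", relevance=0.7, quality=0.8, cost_per_query=0.0),
    Candidate("academic_journals", relevance=0.9, quality=0.95, cost_per_query=0.05),
    Candidate("news_archives", relevance=0.4, quality=0.85, cost_per_query=0.02),
]
print([c.name for c in rank(candidates)])
print([c.name for c in rank(candidates, budget_per_query=0.0)])
```

With these numbers the premium journal source still ranks first because its higher relevance outweighs the small cost discount, while a zero per-query budget leaves only the free source.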

Source Metrics

Each source exposes metadata such as its type, per-query cost, update frequency, and topic coverage:

# Get source details
source_info = retriever.get_source_details("academic_journals")
print(source_info)
# Example output:
# {
#     "name": "academic_journals",
#     "type": "premium",
#     "cost_per_query": 0.05,
#     "update_frequency": "daily",
#     "coverage_topics": ["science", "technology", "medicine"],
#     "quality_score": 0.95
# }
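Because the metadata includes `cost_per_query`, you can estimate spend before running a workload. This sketch assumes a flat charge per query in your account's billing currency (the page does not state the unit) and uses `Decimal` to avoid floating-point rounding in money arithmetic; `estimate_cost` and `max_queries` are hypothetical helpers.

```python
from decimal import Decimal

# Subset of the example metadata shown above
source_info = {"name": "academic_journals", "type": "premium", "cost_per_query": 0.05}


def estimate_cost(info, num_queries):
    """Projected spend for num_queries against one source (assumes a flat per-query charge)."""
    return Decimal(str(info["cost_per_query"])) * num_queries


def max_queries(info, budget):
    """How many whole queries fit in a budget; None means no cost ceiling (free source)."""
    cost = Decimal(str(info["cost_per_query"]))
    if cost == 0:
        return None
    return int(Decimal(str(budget)) // cost)


print(estimate_cost(source_info, 200))
print(max_queries(source_info, 10.0))
```

At 0.05 per query, 200 queries cost 10.00, so a budget of 10.0 covers exactly 200 queries.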

Content Quality

Quality Assurance

  1. Source Verification

    • Validated content providers
    • Regular quality audits
    • Content freshness monitoring
  2. Content Standards

    • Accuracy requirements
    • Completeness checks
    • Attribution verification

Quality Metrics

# Get quality metrics for a source
metrics = retriever.get_source_metrics("academic_journals")
print(metrics)
# Example output:
# {
#     "accuracy_score": 0.98,
#     "completeness_score": 0.95,
#     "update_frequency": "daily",
#     "peer_reviewed": True
# }

Source Configuration

Global Settings

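retriever = TatryRetriever(
    api_key="your-api-key",
    source_settings={
        "prefer_free": True,       # favor free sources when appropriate
        "min_quality_score": 0.8,  # skip sources scoring below 0.8
        "max_age_days": 365        # ignore content older than one year
    }
)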
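A rough model of how two of these global settings could affect source selection, assuming `min_quality_score` is compared against each source's `quality_score` and `prefer_free` moves free sources to the front. `order_sources` is a hypothetical helper, and the quality scores for `wikipedia` and `news_archives` are made-up illustration values. This sketch covers `prefer_free` and `min_quality_score` only.

```python
def order_sources(sources, settings):
    """Apply global source settings to a list of source metadata dicts.

    Drops sources below min_quality_score; if prefer_free is set, free sources
    come first. list.sort is stable, so original order is kept within each group.
    """
    min_q = settings.get("min_quality_score", 0.0)
    kept = [s for s in sources if s["quality_score"] >= min_q]
    if settings.get("prefer_free", False):
        kept.sort(key=lambda s: s["type"] != "free")  # False (free) sorts before True
    return [s["name"] for s in kept]


sources = [
    {"name": "academic_journals", "type": "premium", "quality_score": 0.95},
    {"name": "wikipedia", "type": "free", "quality_score": 0.85},       # illustrative
    {"name": "news_archives", "type": "premium", "quality_score": 0.75},  # illustrative
]
settings = {"prefer_free": True, "min_quality_score": 0.8, "max_age_days": 365}
print(order_sources(sources, settings))
```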

Per-Query Settings

docs = retriever.get_relevant_documents(
    "quantum computing advances",
    source_filters={
        "type": ["academic", "research"],
        "peer_reviewed": True,
        "max_age_days": 90
    }
)
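The filter semantics are not documented on this page. One plausible reading, sketched below as an assumption: a list value matches any of its entries, `max_age_days` is an upper bound on a document's age, and any other key must match exactly. The `age_days` metadata field and the `matches` helper are hypothetical.

```python
def matches(doc_meta, filters):
    """Return True if a document's metadata satisfies every filter (assumed semantics)."""
    for key, wanted in filters.items():
        if key == "max_age_days":
            # Documents with no known age are treated as too old.
            if doc_meta.get("age_days", float("inf")) > wanted:
                return False
        elif isinstance(wanted, list):
            # A list means "any of these values".
            if doc_meta.get(key) not in wanted:
                return False
        elif doc_meta.get(key) != wanted:
            return False
    return True


source_filters = {"type": ["academic", "research"], "peer_reviewed": True, "max_age_days": 90}

# Illustrative documents
recent_paper = {"type": "academic", "peer_reviewed": True, "age_days": 30}
news_item = {"type": "news", "peer_reviewed": False, "age_days": 5}
old_study = {"type": "research", "peer_reviewed": True, "age_days": 400}

print([matches(d, source_filters) for d in (recent_paper, news_item, old_study)])
```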

Usage Optimization

Cost Management

# Set source weights for cost optimization
retriever.set_source_weights({
    "wikipedia": 1.0,  # Free
    "academic_journals": 0.5,  # Use less due to cost
    "news_archives": 0.3  # Use sparingly
})
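One way to interpret source weights is as relative shares of query traffic. Under that assumption (the page does not define the semantics), the weights above would split 100 queries 55/28/17. The hypothetical `allocate_queries` helper uses the largest-remainder method so that the integer counts always add up to the requested total.

```python
import math


def allocate_queries(weights, total):
    """Split `total` queries across sources in proportion to their weights."""
    weight_sum = sum(weights.values())
    exact = {name: total * w / weight_sum for name, w in weights.items()}
    counts = {name: math.floor(x) for name, x in exact.items()}
    leftover = total - sum(counts.values())
    # Give the remaining queries to the sources with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


weights = {"wikipedia": 1.0, "academic_journals": 0.5, "news_archives": 0.3}
print(allocate_queries(weights, 100))
```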

Quality vs Cost

# Optimize for quality
retriever.configure_source_priority(
    priority="quality",
    budget_limit=10.0
)
 
# Optimize for cost
retriever.configure_source_priority(
    priority="cost",
    min_quality_score=0.7
)
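The two modes trade off differently: quality priority spends down a budget on the best sources, while cost priority picks the cheapest source that clears a quality floor. A hypothetical sketch of both policies follows. `pick_source`, `plan_quality_first`, and the quality scores for `wikipedia` and `industry_reports` are illustrative assumptions, and `budget_limit` is treated as a total spend cap in the same unit as `cost_per_query`.

```python
from decimal import Decimal

SOURCES = [
    {"name": "wikipedia", "cost_per_query": Decimal("0"), "quality_score": 0.85},
    {"name": "academic_journals", "cost_per_query": Decimal("0.05"), "quality_score": 0.95},
    {"name": "industry_reports", "cost_per_query": Decimal("0.02"), "quality_score": 0.75},
]


def pick_source(sources, priority, budget_remaining=None, min_quality_score=0.0):
    """Choose one source for a query under a hypothetical priority policy."""
    if priority == "quality":
        # Best quality among sources the remaining budget can still pay for.
        affordable = [s for s in sources
                      if budget_remaining is None or s["cost_per_query"] <= budget_remaining]
        return max(affordable, key=lambda s: s["quality_score"], default=None)
    if priority == "cost":
        # Cheapest source that clears the quality floor; higher quality breaks ties.
        good_enough = [s for s in sources if s["quality_score"] >= min_quality_score]
        return min(good_enough, key=lambda s: (s["cost_per_query"], -s["quality_score"]),
                   default=None)
    raise ValueError(f"unknown priority: {priority!r}")


def plan_quality_first(sources, num_queries, budget_limit):
    """Assign a source to each query, spending until budget_limit is exhausted."""
    remaining = Decimal(str(budget_limit))
    plan = []
    for _ in range(num_queries):
        source = pick_source(sources, "quality", budget_remaining=remaining)
        if source is None:
            break
        remaining -= source["cost_per_query"]
        plan.append(source["name"])
    return plan, remaining


print(plan_quality_first(SOURCES, 5, 0.12))
print(pick_source(SOURCES, "cost", min_quality_score=0.9)["name"])
```

With a budget of 0.12, the quality-first plan sends two queries to `academic_journals` (0.10 spent) and falls back to the free `wikipedia` source once the remaining 0.02 no longer covers a premium query.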

Best Practices

  1. Source Selection

    • Start with free sources
    • Add premium sources as needed
    • Monitor source usage
  2. Quality Control

    • Set minimum quality thresholds
    • Evaluate sources regularly
    • Track content freshness
  3. Cost Management

    • Use source weights
    • Set budget limits
    • Monitor usage patterns
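Monitoring usage can start as simply as aggregating a log of queries. This sketch assumes you record one `(source_name, cost)` pair per query yourself; the log format, its values, and the `summarize_usage` helper are hypothetical, not a Tatry feature.

```python
from collections import Counter
from decimal import Decimal


def summarize_usage(log):
    """Aggregate query counts and spend per source from (source, cost) records."""
    counts = Counter()
    spend = {}
    for source, cost in log:
        counts[source] += 1
        spend[source] = spend.get(source, Decimal("0")) + Decimal(str(cost))
    return counts, spend


# Illustrative log entries
log = [
    ("wikipedia", 0),
    ("academic_journals", 0.05),
    ("academic_journals", 0.05),
    ("wikipedia", 0),
    ("news_archives", 0.02),
]
counts, spend = summarize_usage(log)
print(counts.most_common())
print(sum(spend.values()))
```

Comparing per-source counts against per-source spend makes it easy to spot a premium source that is absorbing most of the budget.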

Next Steps