Understanding Retrievers


Tatry is a document retrieval system that plugs into LangChain applications as a retriever.

What is a Retriever?

A retriever is a system that finds and returns relevant documents based on a query. In the context of LLM applications, retrievers are crucial for:

- Finding relevant context for questions
- Supporting knowledge-intensive tasks
- Providing up-to-date information
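To make the definition concrete, here is a toy retriever in Python that ranks documents by how many words they share with the query. It only illustrates the contract described above (query in, ranked documents out). The `Document`, `tokenize`, and `retrieve` names are hypothetical, and Tatry's actual relevance scoring is not described on this page.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Field names mirror LangChain's Document (page_content, metadata).
    page_content: str
    metadata: dict


def tokenize(text: str) -> set[str]:
    # Lowercase and split on whitespace; punctuation handling is deliberately naive.
    return set(text.lower().split())


def retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Return up to k documents sharing the most words with the query."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(doc.page_content)), doc) for doc in corpus]
    # Keep only documents with at least one overlapping term, best first.
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


corpus = [
    Document("retrievers find relevant documents", {"source": "a"}),
    Document("billing is based on usage", {"source": "b"}),
    Document("relevant context helps answer questions", {"source": "c"}),
]
print([d.metadata["source"] for d in retrieve("find relevant documents", corpus)])
```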

Tatry Architecture

graph LR
    Query --> TatryRetriever
    TatryRetriever --> Sources[Content Sources]
    Sources --> Documents[Relevant Documents]
    QueryProcessor[Query Processing] --> TatryRetriever
    SourceSelector[Source Selection] --> Sources

Key Components

Query Processor
- Analyzes the query intent
- Determines required content types
- Sets search parameters
Source Selector
- Evaluates available sources
- Prioritizes based on relevance
- Manages source access
Document Ranker
- Scores document relevance
- Filters irrelevant content
- Ensures quality results
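The three components can be pictured as a small pipeline. The sketch below is a hypothetical illustration, not Tatry's implementation: the keyword rule in `process_query`, the content-type matching in `select_sources`, and the score threshold in `rank_documents` are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    page_content: str
    score: float
    metadata: dict = field(default_factory=dict)


@dataclass
class QueryPlan:
    text: str
    content_types: set[str]
    max_results: int


def process_query(query: str, max_results: int = 3) -> QueryPlan:
    # Hypothetical intent analysis: infer the content type from a keyword.
    content_types = {"research_paper"} if "study" in query.lower() else {"article"}
    return QueryPlan(query, content_types, max_results)


def select_sources(plan: QueryPlan, sources: dict[str, set[str]]) -> list[str]:
    # Keep sources that offer at least one of the required content types.
    return [name for name, types in sources.items() if types & plan.content_types]


def rank_documents(plan: QueryPlan, docs: list[Document], min_score: float = 0.5) -> list[Document]:
    # Filter out low-scoring documents, then return the best ones first.
    kept = [d for d in docs if d.score >= min_score]
    return sorted(kept, key=lambda d: d.score, reverse=True)[: plan.max_results]


plan = process_query("Recent study on retrieval")
sources = {"journals": {"research_paper"}, "news": {"article"}}
print(select_sources(plan, sources))
docs = [Document("a", 0.9), Document("b", 0.3), Document("c", 0.7)]
print([d.page_content for d in rank_documents(plan, docs)])
```

Here the query mentions a study, so only the source offering research papers is selected, and the ranker drops the document scoring below 0.5.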

Integration Options

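from langchain.chains import RetrievalQA
from tatry import TatryRetriever

# Initialize the retriever with your Tatry API key
retriever = TatryRetriever(api_key="your-api-key")

# Create a QA chain; your_llm can be any LangChain LLM or chat model
qa_chain = RetrievalQA.from_chain_type(
    llm=your_llm,
    retriever=retriever
)

# Ask a question; RetrievalQA reads the "query" key and returns the answer under "result"
result = qa_chain.invoke({"query": "Your question here"})
print(result["result"])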

# Alternatively, send a retrieval request directly to the REST API
curl -X POST https://api.tatry.dev/v1/retrieve \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Your question here",
    "max_results": 5
  }'
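If you are calling the REST API from Python without LangChain, the curl example above translates to a standard-library request. This sketch builds the same POST request with `urllib.request`. The endpoint, headers, and the `query` and `max_results` fields come from the curl command; the helper names are hypothetical, and because the response schema is not documented on this page, `send` simply returns the parsed JSON.

```python
import json
import urllib.request

# Endpoint and payload fields are taken from the curl example.
RETRIEVE_URL = "https://api.tatry.dev/v1/retrieve"


def build_retrieve_request(api_key: str, query: str, max_results: int = 5) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl command."""
    payload = json.dumps({"query": query, "max_results": max_results}).encode("utf-8")
    return urllib.request.Request(
        RETRIEVE_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    # Assumes the response body is JSON; its structure is not described here.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_retrieve_request("your-api-key", "Your question here")
# result = send(req)  # performs the actual network call
```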

How Retrieval Works

Query Reception
Your application passes a query to the retriever:

documents = retriever.get_relevant_documents("Your query here")

Source Selection
- The system evaluates available sources
- Prioritizes them by relevance and cost
- Selects the optimal mix of sources
Document Retrieval
- Fetches documents from the selected sources
- Ranks them by relevance
- Returns the best matches
Result Delivery
- Returns the relevant documents
- Includes source metadata
- Tracks usage for billing
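The four steps can be sketched end to end in a few lines. Everything here is hypothetical: the per-source `relevance` and `cost` numbers, the greedy relevance-per-cost heuristic standing in for Tatry's source selection, and the use of summed source cost as the usage figure. It shows how selection, retrieval, ranking, and delivery with source metadata fit together, not how Tatry performs them.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    relevance: float  # hypothetical estimate of how well the source fits the query
    cost: float       # hypothetical cost per request
    documents: list[tuple[str, float]]  # (text, relevance score) pairs


def select_sources(sources: list[Source], budget: float) -> list[Source]:
    # Greedy: take sources with the best relevance per unit cost until the budget runs out.
    chosen, spent = [], 0.0
    for src in sorted(sources, key=lambda s: s.relevance / s.cost, reverse=True):
        if spent + src.cost <= budget:
            chosen.append(src)
            spent += src.cost
    return chosen


def retrieve(sources: list[Source], budget: float, k: int = 3) -> tuple[list[dict], float]:
    selected = select_sources(sources, budget)
    # Fetch from every selected source, attaching the source name as metadata.
    hits = [
        {"text": text, "score": score, "source": src.name}
        for src in selected
        for text, score in src.documents
    ]
    hits.sort(key=lambda h: h["score"], reverse=True)  # rank by relevance
    usage = sum(src.cost for src in selected)  # figure a usage tracker would record
    return hits[:k], usage


sources = [
    Source("wiki", 0.6, 1.0, [("w1", 0.5), ("w2", 0.4)]),
    Source("journal", 0.9, 3.0, [("j1", 0.95)]),
    Source("news", 0.3, 2.0, [("n1", 0.2)]),
]
docs, usage = retrieve(sources, budget=4.0)
print([d["text"] for d in docs], usage)
```

With a budget of 4.0, the wiki and journal sources fit and the news source is skipped; the top result carries `journal` as its source.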

Retrieval Methods

Synchronous Retrieval

# Basic retrieval
query = "Your query here"
docs = retriever.get_relevant_documents(query)

# Limit the number of results and require a minimum relevance score
docs = retriever.get_relevant_documents(
    query,
    max_results=5,
    min_relevance=0.7
)
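This page does not define exactly how `max_results` and `min_relevance` interact. One plausible reading, assumed here, is that the relevance threshold is applied first and the result cap second. The sketch shows that interpretation on plain `(text, score)` pairs; `apply_limits` is a hypothetical helper, not part of Tatry.

```python
def apply_limits(
    scored_docs: list[tuple[str, float]],
    max_results: int = 5,
    min_relevance: float = 0.7,
) -> list[tuple[str, float]]:
    """Keep documents scoring at least min_relevance, best first, capped at max_results."""
    relevant = [(doc, score) for doc, score in scored_docs if score >= min_relevance]
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    return relevant[:max_results]


scored = [("a", 0.95), ("b", 0.65), ("c", 0.7), ("d", 0.8)]
print(apply_limits(scored))
print(apply_limits(scored, max_results=2))
```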

Asynchronous Retrieval

# Async retrieval (call from inside an async function, e.g. one started with asyncio.run)
docs = await retriever.aget_relevant_documents(query)
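Async retrieval pays off when you have several queries at once. This sketch fans out queries concurrently with `asyncio.gather`. `FakeRetriever` is a hypothetical stand-in that only mimics the `aget_relevant_documents` method name used above, so the example runs without an API key.

```python
import asyncio


class FakeRetriever:
    """Hypothetical stand-in for an async retriever; it echoes the query after a delay."""

    async def aget_relevant_documents(self, query: str) -> list[str]:
        await asyncio.sleep(0.01)  # simulate network latency
        return [f"doc for {query}"]


async def retrieve_batch(retriever, queries: list[str]) -> list[list[str]]:
    # Launch all requests concurrently and wait for every one to finish.
    return await asyncio.gather(*(retriever.aget_relevant_documents(q) for q in queries))


results = asyncio.run(retrieve_batch(FakeRetriever(), ["q1", "q2", "q3"]))
print(results)
```

`asyncio.gather` returns results in the order the queries were passed, regardless of which request finishes first.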

Best Practices

Query Formulation
- Be specific in your queries
- Include relevant context
- Use appropriate filters
Result Handling
- Check document relevance scores
- Verify source attribution
- Monitor usage costs
Performance Optimization
- Use async methods for batch processing
- Implement caching when appropriate
- Set appropriate result limits
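For caching, a minimal in-process approach is to memoize retrieval results by query with `functools.lru_cache`. In this sketch `expensive_retrieve` is a hypothetical stand-in for a billed retrieval call, and the counter exists only to show that repeated queries do not trigger new calls.

```python
from functools import lru_cache

calls = {"count": 0}


def expensive_retrieve(query: str) -> list[str]:
    # Hypothetical stand-in for a billed retrieval call.
    calls["count"] += 1
    return [f"doc for {query}"]


@lru_cache(maxsize=256)
def cached_retrieve(query: str, max_results: int = 5) -> tuple[str, ...]:
    # Return a tuple so callers cannot mutate the cached result.
    return tuple(expensive_retrieve(query)[:max_results])
```

Cached results can go stale, so this fits workloads where the same questions repeat and the underlying content changes slowly.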

Common Use Cases

Question Answering

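from langchain.chains import RetrievalQA

# llm: any LangChain LLM or chat model; retriever: a TatryRetriever instance
chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever
)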
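`RetrievalQA.from_chain_type` uses the `stuff` chain type by default, which places all retrieved documents into a single prompt for the LLM. The function below approximates that idea in plain Python; `stuff_prompt` is a hypothetical helper, and its prompt wording is illustrative rather than LangChain's actual template.

```python
def stuff_prompt(question: str, documents: list[str]) -> str:
    """Concatenate retrieved documents into one prompt, as a 'stuff'-style chain does."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


print(stuff_prompt("What is Tatry?", ["Doc one.", "Doc two."]))
```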

Document Analysis

docs = retriever.get_relevant_documents(
    "Topic of interest",
    filters={"type": "research_paper"}
)
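The page does not specify how `filters` are matched. Assuming exact matching on document metadata, with every filter key required to match, the behavior looks like this sketch; `matches` and the sample documents are hypothetical.

```python
def matches(metadata: dict, filters: dict) -> bool:
    # Every filter key must be present in the metadata with an equal value.
    return all(key in metadata and metadata[key] == value for key, value in filters.items())


docs = [
    {"text": "Transformer survey", "metadata": {"type": "research_paper"}},
    {"text": "Release notes", "metadata": {"type": "blog_post"}},
]
papers = [d for d in docs if matches(d["metadata"], {"type": "research_paper"})]
print([d["text"] for d in papers])
```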

Information Gathering

docs = retriever.get_relevant_documents(
    query,
    max_results=10,
    prefer_recent=True
)
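What `prefer_recent=True` does internally is not described here. One common way to favor recent content, assumed purely for illustration, is to discount each document's relevance score by its age with an exponential decay. The `recency_weighted` helper and the 180-day half-life are hypothetical.

```python
from datetime import date, timedelta


def recency_weighted(score: float, published: date, today: date,
                     half_life_days: float = 180.0) -> float:
    """Halve a document's score for every half_life_days of age."""
    age_days = max((today - published).days, 0)  # treat future dates as brand new
    return score * 0.5 ** (age_days / half_life_days)


today = date(2024, 7, 1)
docs = [
    {"text": "older, slightly more relevant", "score": 0.9, "published": today - timedelta(days=180)},
    {"text": "fresh", "score": 0.8, "published": today},
]
ranked = sorted(
    docs,
    key=lambda d: recency_weighted(d["score"], d["published"], today),
    reverse=True,
)
print([d["text"] for d in ranked])
```

With a 180-day half-life, the six-month-old document's score drops from 0.9 to 0.45, so the fresh document (0.8) ranks first.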

Next Steps

- Learn about Content Sources
- Understand our Billing Model
- Try our Guides for detailed tutorials
