Understanding Retrievers
Tatry is a document retrieval system built to be easy to adopt: it plugs into LangChain applications as a standard retriever and can also be called directly through a REST API.
What is a Retriever?
A retriever is a system that finds and returns relevant documents based on a query. In the context of LLM applications, retrievers are crucial for:
- Finding relevant context for questions
- Supporting knowledge-intensive tasks
- Providing up-to-date information
Tatry Architecture
```mermaid
graph LR
    Query --> TatryRetriever
    TatryRetriever --> Sources[Content Sources]
    Sources --> Documents[Relevant Documents]
    QueryProcessor[Query Processing] --> TatryRetriever
    SourceSelector[Source Selection] --> Sources
```
Key Components
- Query Processor
  - Analyzes the query intent
  - Determines the required content types
  - Sets search parameters
- Source Selector
  - Evaluates the available sources
  - Prioritizes sources by relevance
  - Manages access to each source
- Document Ranker
  - Scores document relevance
  - Filters out irrelevant content
  - Keeps only high-quality results
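To make the division of labor between these three components concrete, here is a toy, in-memory sketch of a query passing through them in order. Everything in it is hypothetical: the `Doc` and `Source` classes, the keyword-based content-type guess, and the word-overlap scoring are illustrative stand-ins, not Tatry's internal implementation, which this page does not describe.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for illustration only; not Tatry's internal API.
@dataclass
class Doc:
    text: str
    source: str
    score: float = 0.0


@dataclass
class Source:
    name: str
    content_type: str
    docs: list[str] = field(default_factory=list)


def _words(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {w.lower().strip("?.,!") for w in text.split()}


def process_query(query: str) -> dict:
    """Query Processor: derive search terms, a wanted content type, and a result limit."""
    terms = {w for w in _words(query) if len(w) > 3}  # drop short filler words
    content_type = "code" if terms & {"code", "example"} else "text"
    return {"terms": terms, "content_type": content_type, "max_results": 3}


def select_sources(params: dict, sources: list[Source]) -> list[Source]:
    """Source Selector: keep only sources that serve the wanted content type."""
    return [s for s in sources if s.content_type == params["content_type"]]


def rank_documents(params: dict, sources: list[Source], min_score: float = 0.2) -> list[Doc]:
    """Document Ranker: score by term overlap, drop weak matches, keep the top k."""
    terms = params["terms"]
    ranked = []
    for src in sources:
        for text in src.docs:
            score = len(terms & _words(text)) / max(len(terms), 1)
            if score >= min_score:
                ranked.append(Doc(text, src.name, score))
    ranked.sort(key=lambda d: d.score, reverse=True)
    return ranked[: params["max_results"]]


sources = [
    Source("wiki", "text", ["Retrievers return relevant documents for a query",
                            "Mountains of the Tatra range"]),
    Source("snippets", "code", ["def retrieve(query): return []"]),
]
params = process_query("How do retrievers find relevant documents?")
docs = rank_documents(params, select_sources(params, sources))
```

A real ranker would typically score with embeddings or a learned model rather than word overlap; the sketch only shows the order of the stages and what each one hands to the next.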
Integration Options
Using the LangChain integration:

```python
from langchain.chains import RetrievalQA
from tatry import TatryRetriever

# Initialize the retriever with your Tatry API key
retriever = TatryRetriever(api_key="your-api-key")

# Create a QA chain; your_llm is any LangChain-compatible LLM you have configured
qa_chain = RetrievalQA.from_chain_type(
    llm=your_llm,
    retriever=retriever,
)

# Use the chain in your application
result = qa_chain.run("Your question here")
```
Calling the REST API directly:

```bash
# Send a retrieval request
curl -X POST https://api.tatry.dev/v1/retrieve \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Your question here",
    "max_results": 5
  }'
```
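If you call the REST API from Python without LangChain, the curl request can be reproduced with the standard library alone. This sketch assumes only what the curl example shows (the endpoint, the bearer-token header, and the `query` and `max_results` fields). The function names are hypothetical, and `retrieve` assumes the response body is JSON, whose structure this page does not document.

```python
import json
import urllib.request

API_URL = "https://api.tatry.dev/v1/retrieve"  # endpoint from the curl example


def build_retrieve_request(api_key: str, query: str, max_results: int = 5) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps({"query": query, "max_results": max_results}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def retrieve(api_key: str, query: str, max_results: int = 5) -> dict:
    """Send the request and decode the (assumed JSON) response body."""
    req = build_retrieve_request(api_key, query, max_results)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Inspect the request before sending it
req = build_retrieve_request("your-api-key", "Your question here")
```

Separating request construction from sending makes it easy to inspect or log the exact payload before spending an API call.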
How Retrieval Works
1. Query Reception

   ```python
   documents = retriever.get_relevant_documents("Your query here")
   ```

2. Source Selection
   - The system evaluates the available sources
   - Prioritizes them by relevance and cost
   - Selects the optimal mix of sources
3. Document Retrieval
   - Fetches documents from the selected sources
   - Ranks them by relevance
   - Returns the best matches
4. Result Delivery
   - Returns the relevant documents
   - Includes source metadata
   - Tracks usage for billing
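Source selection weighs relevance against cost. The sketch below models that trade-off with a simple greedy heuristic: rank candidate sources by estimated relevance per unit of cost, take them until a per-query budget is spent, then log what was used. The `SourceOption` fields, the budget, the relevance cutoff, and the usage log are all assumptions for illustration; Tatry's actual selection logic and pricing are not described here.

```python
from dataclasses import dataclass


# Hypothetical model of "prioritize by relevance and cost"; not Tatry's real logic.
@dataclass
class SourceOption:
    name: str
    relevance: float  # estimated relevance of this source to the query, 0..1
    cost: float       # assumed cost per query, in arbitrary units


def choose_sources(options: list[SourceOption], budget: float,
                   min_relevance: float = 0.3) -> list[SourceOption]:
    """Greedily pick sources by relevance per unit cost until the budget runs out."""
    candidates = [o for o in options if o.relevance >= min_relevance]
    candidates.sort(key=lambda o: o.relevance / o.cost, reverse=True)
    chosen, spent = [], 0.0
    for opt in candidates:
        if spent + opt.cost <= budget:
            chosen.append(opt)
            spent += opt.cost
    return chosen


usage_log: list[dict] = []


def record_usage(query: str, chosen: list[SourceOption]) -> None:
    """Keep a per-query record of which sources were used and what they cost."""
    usage_log.append({
        "query": query,
        "sources": [o.name for o in chosen],
        "cost": sum(o.cost for o in chosen),
    })


options = [
    SourceOption("open_web", relevance=0.6, cost=1.0),
    SourceOption("premium_journal", relevance=0.9, cost=4.0),
    SourceOption("internal_wiki", relevance=0.8, cost=0.5),
    SourceOption("news_archive", relevance=0.2, cost=0.1),
]
chosen = choose_sources(options, budget=2.0)
record_usage("Your query here", chosen)
```

Greedy selection by ratio is fast but not guaranteed optimal; choosing the best set exactly under a budget is a knapsack problem.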
Retrieval Methods
Synchronous Retrieval
```python
# Basic retrieval
docs = retriever.get_relevant_documents(query)

# With a result limit and a minimum relevance score
docs = retriever.get_relevant_documents(
    query,
    max_results=5,
    min_relevance=0.7,
)
```
Asynchronous Retrieval
```python
# Async retrieval (call from inside an async function, e.g. one run with asyncio.run)
docs = await retriever.aget_relevant_documents(query)
```
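Async retrieval pays off most when many queries are issued at once. A minimal sketch, assuming a retriever that exposes the `aget_relevant_documents` method used above: `asyncio.gather` runs the queries concurrently and a semaphore caps how many are in flight. `FakeRetriever` and `retrieve_many` are hypothetical names so the example runs without an API key; swap in your `TatryRetriever` instance. Recent LangChain releases also expose the same call on retrievers as `ainvoke`.

```python
import asyncio


class FakeRetriever:
    """Hypothetical stand-in exposing the same async method name as the docs."""

    async def aget_relevant_documents(self, query: str) -> list[str]:
        await asyncio.sleep(0.01)  # simulate network latency
        return [f"doc for {query!r}"]


async def retrieve_many(retriever, queries: list[str], max_concurrency: int = 5) -> list:
    """Run retrievals concurrently, with at most max_concurrency in flight at once."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(query: str):
        async with sem:
            return await retriever.aget_relevant_documents(query)

    # gather returns results in the same order as the input queries
    return await asyncio.gather(*(one(q) for q in queries))


results = asyncio.run(retrieve_many(FakeRetriever(), ["alpha", "beta", "gamma"]))
```

The concurrency cap keeps a large batch from overwhelming the API or your rate limit while still overlapping network waits.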
Best Practices
- Query Formulation
  - Be specific in your queries
  - Include relevant context
  - Use appropriate filters
- Result Handling
  - Check document relevance scores
  - Verify source attribution
  - Monitor usage costs
- Performance Optimization
  - Use async methods for batch processing
  - Cache results for repeated queries where appropriate
  - Set result limits no higher than you need
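The caching recommendation above can be sketched as a small time-based cache in front of the retriever: repeated identical queries skip the network round trip (and, assuming billing is per request, are not paid for twice), while a time-to-live keeps cached results from going stale. `TTLCache` and `fake_retrieve` are hypothetical illustrations; any caching Tatry does on its side is not described here. The cache key should include every parameter that changes the results, such as `max_results`.

```python
import time
from typing import Callable, Hashable


class TTLCache:
    """Tiny time-based cache for retrieval results (illustrative, not thread-safe)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict = {}  # key -> (timestamp, value)

    def get_or_compute(self, key: Hashable, compute: Callable[[], object]) -> object:
        """Return a cached value younger than the TTL, else call compute() and store it."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = compute()
        self._store[key] = (now, value)
        return value


calls = []


def fake_retrieve(query: str, max_results: int) -> list[str]:
    """Hypothetical stand-in for an expensive, billed retrieval call."""
    calls.append(query)
    return [f"{query} #{i}" for i in range(max_results)]


cache = TTLCache(ttl_seconds=60)
key = ("what is a retriever", 3)  # include every parameter that changes the results
first = cache.get_or_compute(key, lambda: fake_retrieve(*key))
second = cache.get_or_compute(key, lambda: fake_retrieve(*key))  # served from cache

# Expiry, demonstrated with a controllable fake clock
t = [0.0]
expiring = TTLCache(ttl_seconds=10, clock=lambda: t[0])
expiring.get_or_compute("q", lambda: fake_retrieve("q", 1))
t[0] = 11.0  # advance past the TTL
expiring.get_or_compute("q", lambda: fake_retrieve("q", 1))  # recomputed
```

If results never need to expire, `functools.lru_cache` on a function with hashable arguments is a simpler alternative.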
Common Use Cases
- Question Answering

  ```python
  chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
  ```

- Document Analysis

  ```python
  docs = retriever.get_relevant_documents(
      "Topic of interest",
      filters={"type": "research_paper"},
  )
  ```

- Information Gathering

  ```python
  docs = retriever.get_relevant_documents(
      query,
      max_results=10,
      prefer_recent=True,
  )
  ```
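The `filters` and `prefer_recent` arguments in these use cases ask the retriever to do the filtering. If you want to post-process results yourself, the client-side sketch below keeps documents whose metadata matches every filter exactly and then orders them newest first. The `type` and `published` metadata keys and both helper functions are assumptions for illustration; this page does not document which metadata Tatry attaches. `Document` is a minimal stand-in with the same `page_content` and `metadata` attributes as LangChain's `Document`.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Document:
    """Minimal stand-in shaped like LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def apply_filters(docs: list[Document], filters: dict) -> list[Document]:
    """Keep documents whose metadata matches every key/value pair in filters."""
    return [d for d in docs if all(d.metadata.get(k) == v for k, v in filters.items())]


def prefer_recent(docs: list[Document], date_key: str = "published") -> list[Document]:
    """Sort newest first; documents without a date (hypothetical key) go last."""
    return sorted(docs, key=lambda d: d.metadata.get(date_key) or date.min, reverse=True)


docs = [
    Document("Survey of retrieval methods", {"type": "research_paper", "published": date(2021, 5, 1)}),
    Document("Blog post on RAG", {"type": "blog", "published": date(2024, 1, 10)}),
    Document("Dense retrieval study", {"type": "research_paper", "published": date(2023, 9, 2)}),
    Document("Undated preprint", {"type": "research_paper"}),
]
papers = prefer_recent(apply_filters(docs, {"type": "research_paper"}))
```

Filtering before sorting keeps the sort small, and treating a missing date as the oldest possible date keeps undated documents instead of silently dropping them.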
Next Steps
- Learn about Content Sources
- Understand our Billing Model
- Try our Guides for detailed tutorials