Understanding Retrievers

Tatry is a document retrieval system designed to plug directly into LangChain applications.

What is a Retriever?

A retriever is a system that finds and returns relevant documents based on a query. In the context of LLM applications, retrievers are crucial for:

  1. Finding relevant context for questions
  2. Supporting knowledge-intensive tasks
  3. Providing up-to-date information
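To make the definition concrete, here is a minimal sketch of a retriever that ranks a small in-memory corpus by word overlap with the query. It is a toy illustration of the concept, not how Tatry scores documents; the `Document` class and `keyword_retrieve` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Stand-in for a retrieved document (hypothetical, not Tatry's type)."""
    text: str
    source: str


def keyword_retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Return up to k documents ranked by how many query words they share."""
    query_words = set(query.lower().split())
    scored = []
    for doc in corpus:
        overlap = len(query_words & set(doc.text.lower().split()))
        if overlap > 0:  # documents with no shared words are not relevant
            scored.append((overlap, doc))
    # Highest overlap first; the sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


corpus = [
    Document("retrievers find relevant documents for a query", "notes"),
    Document("bread recipes with sourdough starter", "blog"),
    Document("a query is matched against documents", "wiki"),
]
hits = keyword_retrieve("relevant documents for my query", corpus)
print([doc.source for doc in hits])
```

A production retriever would replace the overlap count with a stronger relevance signal, but the contract is the same: a query goes in, a ranked list of documents comes out.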

Tatry Architecture

graph LR
    Query --> TatryRetriever
    TatryRetriever --> Sources[Content Sources]
    Sources --> Documents[Relevant Documents]
    QueryProcessor[Query Processing] --> TatryRetriever
    SourceSelector[Source Selection] --> Sources

Key Components

  1. Query Processor

    • Analyzes the query intent
    • Determines required content types
    • Sets search parameters
  2. Source Selector

    • Evaluates available sources
    • Prioritizes based on relevance
    • Manages source access
  3. Document Ranker

    • Scores document relevance
    • Filters irrelevant content
    • Ensures quality results
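The three components above can be sketched as three small functions chained together. This is an illustrative outline only: Tatry's internals are not documented here, and `SearchPlan`, `process_query`, `select_sources`, `rank_documents`, the keyword heuristic, and the 0.5 score threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SearchPlan:
    """Output of the query processor (hypothetical structure)."""
    content_types: list[str]
    max_results: int


def process_query(query: str) -> SearchPlan:
    """Query Processor: infer content types and set search parameters."""
    wants_research = any(w in query.lower() for w in ("study", "paper", "research"))
    return SearchPlan(
        content_types=["research_paper"] if wants_research else ["article"],
        max_results=5,
    )


def select_sources(plan: SearchPlan, sources: dict[str, list[str]]) -> list[str]:
    """Source Selector: keep sources that offer a required content type."""
    return [
        name for name, types in sources.items()
        if set(types) & set(plan.content_types)
    ]


def rank_documents(
    scored: list[tuple[str, float]], plan: SearchPlan, min_score: float = 0.5
) -> list[str]:
    """Document Ranker: drop low scores, sort the rest, apply the result limit."""
    kept = [(doc, score) for doc, score in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in kept[: plan.max_results]]


plan = process_query("Recent research on retrieval")
sources = {
    "journals": ["research_paper"],
    "news": ["article"],
    "archive": ["research_paper", "article"],
}
chosen = select_sources(plan, sources)
ranked = rank_documents([("doc-a", 0.9), ("doc-b", 0.3), ("doc-c", 0.7)], plan)
```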

Integration Options

Python (LangChain)

from langchain.chains import RetrievalQA
from tatry import TatryRetriever

# Initialize retriever
retriever = TatryRetriever(api_key="your-api-key")

# Create QA chain
# your_llm is a placeholder for any LangChain-compatible LLM instance
qa_chain = RetrievalQA.from_chain_type(
    llm=your_llm,
    retriever=retriever
)

# Use in your application
# (newer LangChain releases deprecate run() in favor of invoke())
result = qa_chain.run("Your question here")

REST API (cURL)

# Initialize a retrieval request
curl -X POST https://api.tatry.dev/v1/retrieve \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Your question here",
    "max_results": 5
  }'
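For callers who want the REST endpoint without LangChain, the same request can be assembled with Python's standard library. The sketch below mirrors the cURL call (URL, headers, and JSON body are taken from it); the helper name `build_retrieve_request` is hypothetical, and the request is only built, not sent, because the response format is not described in this section.

```python
import json
import urllib.request

API_URL = "https://api.tatry.dev/v1/retrieve"  # endpoint from the cURL example


def build_retrieve_request(
    api_key: str, query: str, max_results: int = 5
) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the cURL example."""
    payload = json.dumps({"query": query, "max_results": max_results}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_retrieve_request("your-api-key", "Your question here")
# To send it: urllib.request.urlopen(request)
# The response schema is not documented here, so it is not parsed.
```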

How Retrieval Works

  1. Query Reception

    documents = retriever.get_relevant_documents("Your query here")
  2. Source Selection

    • System evaluates available sources
    • Prioritizes based on relevance and cost
    • Selects optimal mix of sources
  3. Document Retrieval

    • Fetches documents from selected sources
    • Ranks by relevance
    • Returns best matches
  4. Result Delivery

    • Returns relevant documents
    • Includes source metadata
    • Tracks usage for billing
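Step 2 mentions balancing relevance against cost. One simple way to picture that trade-off is a greedy pick of sources by relevance per unit cost under a budget, sketched below. This is an assumed strategy for illustration, not Tatry's documented algorithm; the `Source` fields, the source names, the prices, and the budget are all invented.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    relevance: float       # estimated relevance to the query, 0..1 (assumed)
    cost_per_query: float  # hypothetical price in arbitrary units


def select_source_mix(sources: list[Source], budget: float) -> list[Source]:
    """Greedily pick sources with the best relevance per unit cost."""
    ranked = sorted(
        sources, key=lambda s: s.relevance / s.cost_per_query, reverse=True
    )
    chosen, spent = [], 0.0
    for source in ranked:
        # Skip any source that would push total spend over the budget.
        if spent + source.cost_per_query <= budget:
            chosen.append(source)
            spent += source.cost_per_query
    return chosen


sources = [
    Source("premium-journals", relevance=0.9, cost_per_query=4.0),
    Source("news-wire", relevance=0.6, cost_per_query=1.0),
    Source("open-archive", relevance=0.5, cost_per_query=0.5),
]
mix = select_source_mix(sources, budget=2.0)
usage = sum(s.cost_per_query for s in mix)  # what step 4 would record for billing
```

With a budget of 2.0, the most relevant source is skipped because it alone costs 4.0, which shows why relevance and cost have to be weighed together rather than ranked on relevance alone.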

Retrieval Methods

Synchronous Retrieval

# Basic retrieval
docs = retriever.get_relevant_documents(query)
 
# With filters
docs = retriever.get_relevant_documents(
    query,
    max_results=5,
    min_relevance=0.7
)
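LangChain retrievers return a list of document objects that expose `page_content` and `metadata`. The sketch below shows one way to inspect such results after a call. It uses a small stand-in class with the same attribute names so it runs without LangChain installed, and it assumes the metadata keys `source` and `relevance_score`, which this page does not confirm.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in with the same attribute names as LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_results(docs: list[Document], min_relevance: float = 0.7):
    """Return (source, score, passes_threshold) for each document."""
    rows = []
    for doc in docs:
        score = doc.metadata.get("relevance_score")     # assumed key
        source = doc.metadata.get("source", "unknown")  # assumed key
        passes = score is not None and score >= min_relevance
        rows.append((source, score, passes))
    return rows


docs = [
    Document("Retrievers return documents.", {"source": "wiki", "relevance_score": 0.82}),
    Document("Unscored text.", {"source": "blog"}),
]
rows = summarize_results(docs)
```

Treating a missing score as a failure, as done here, is a conservative choice: a document with no score is flagged for review instead of being silently trusted.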

Asynchronous Retrieval

# Async retrieval (await must run inside an async function)
docs = await retriever.aget_relevant_documents(query)
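The async method is most useful when several queries can run at once. The following sketch fans out a batch with `asyncio.gather`. `StubRetriever` is a hypothetical stand-in that only imitates the `aget_relevant_documents` method name so the example runs without network access or an API key.

```python
import asyncio


class StubRetriever:
    """Hypothetical stand-in exposing the async method named above."""

    async def aget_relevant_documents(self, query: str) -> list[str]:
        await asyncio.sleep(0.01)  # simulate network latency
        return [f"result for {query!r}"]


async def retrieve_many(retriever, queries: list[str]) -> dict[str, list[str]]:
    """Run all queries concurrently and map each query to its documents."""
    results = await asyncio.gather(
        *(retriever.aget_relevant_documents(q) for q in queries)
    )
    # gather() returns results in the same order as the awaitables passed in.
    return dict(zip(queries, results))


batch = asyncio.run(retrieve_many(StubRetriever(), ["alpha", "beta"]))
```

Swapping `StubRetriever()` for a real retriever instance should be the only change needed, provided that instance implements the same async method.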

Best Practices

  1. Query Formulation

    • Be specific in your queries
    • Include relevant context
    • Use appropriate filters
  2. Result Handling

    • Check document relevance scores
    • Verify source attribution
    • Monitor usage costs
  3. Performance Optimization

    • Use async methods for batch processing
    • Implement caching when appropriate
    • Set appropriate result limits

Common Use Cases

  1. Question Answering

    chain = RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever
    )
  2. Document Analysis

    docs = retriever.get_relevant_documents(
        "Topic of interest",
        filters={"type": "research_paper"}
    )
  3. Information Gathering

    docs = retriever.get_relevant_documents(
        query,
        max_results=10,
        prefer_recent=True
    )
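For the question-answering case, it can help to see roughly what happens between retrieval and the LLM call: the retrieved passages are placed into a prompt ahead of the question. The sketch below is a simplified illustration of that idea, not LangChain's actual prompt template; `build_prompt`, the instruction wording, and the character limit are assumptions.

```python
def build_prompt(question: str, passages: list[str], max_chars: int = 2000) -> str:
    """Concatenate numbered passages into a context block, then add the question."""
    context_parts, used = [], 0
    for i, passage in enumerate(passages, start=1):
        entry = f"[{i}] {passage}"
        if used + len(entry) > max_chars:
            break  # stop before the context exceeds the size limit
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt(
    "What does a retriever return?",
    ["A retriever returns relevant documents.", "Documents carry source metadata."],
)
```

Numbering the passages makes it easier to trace an answer back to the document it came from, which supports the source-attribution check listed under Best Practices.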

Next Steps