fenn.agents.rag¶
- class fenn.agents.rag.RAG(model_provider=None, model=None, model_api_key=None, base_url=None, faiss=False, embedding_provider='local', embedding_model='all-MiniLM-L6-v2', embedding_api_key=None, chunk_mode='smart', persist_path=None, memory=False, max_history=None, system_prompt=None)[source]¶
Bases:
objectMain RAG (Retrieval-Augmented Generation) class.
Loads documents, indexes them, retrieves relevant chunks for a query, and sends them to an LLM to generate an answer.
- Parameters:
model_provider (str, optional) – LLM provider name. One of: openrouter, openai, anthropic, gemini, mistral, groq, cohere, deepseek, xai, together, perplexity, fireworks, cerebras, nvidia, deepinfra, anyscale, ollama, lmstudio, llamacpp. Auto-detected from model name when possible. Default: “openrouter”.
model (str, optional) – Model identifier for the chosen provider. Default: provider’s default model (e.g. “arcee-ai/trinity-large-preview:free” for openrouter, “gpt-4o-mini” for openai).
model_api_key (str, optional) – API key for the LLM provider. If omitted, reads from the corresponding environment variable (e.g. OPENROUTER_API_KEY). Not required for local providers (ollama, lmstudio, llamacpp).
base_url (str, optional) – Custom base URL for any OpenAI-compatible API endpoint. Overrides the default URL for the detected provider.
faiss (bool, optional) – If True, uses FAISS semantic vector search instead of BM25 keyword search. Requires pip install “cofone[faiss]”. Default: False.
embedding_provider (str, optional) – Provider for text embeddings (used only when faiss=True). One of: local, openai, gemini, cohere, mistral, voyage, jina, nvidia, together, openrouter, ollama. Default: “local” (sentence-transformers, no API key needed).
embedding_model (str, optional) – Embedding model identifier for the chosen embedding provider. Default: “all-MiniLM-L6-v2” (local sentence-transformers).
embedding_api_key (str, optional) – API key for the embedding provider. If omitted, reads from the corresponding environment variable. Not required for local/ollama.
chunk_mode (str, optional) – How to split documents before indexing. One of: “smart” (default), “paragraphs”, “sentences”, “fixed”.
persist_path (str or Path, optional) – Folder path to save/load the FAISS index to/from disk. Avoids recomputing embeddings on subsequent runs. Only used when faiss=True.
system_prompt (str, optional) – Instructions prepended to every prompt as a system message. Tells the LLM how to behave: role, tone, language, format, etc. If None, a sensible default is used. Default: None (uses built-in default prompt).
memory (bool, optional) – If True, keeps conversation history across .run() calls. Default: False.
Examples
- Minimal (reads OPENROUTER_API_KEY from .env):
>>> from dotenv import load_dotenv >>> from cofone import RAG >>> load_dotenv() >>> answer = RAG().add_source("docs/").run("What is this about?")
- Explicit provider:
>>> RAG(model_provider="openai", model="gpt-4o-mini", ... model_api_key="sk-...").add_source("notes.txt").run("Summarize")
- Fully local (no internet, no keys):
>>> RAG(model_provider="ollama", model="llama3", ... faiss=True, ... embedding_provider="ollama", ... embedding_model="nomic-embed-text").add_source("docs/").run("question")
- __init__(model_provider=None, model=None, model_api_key=None, base_url=None, faiss=False, embedding_provider='local', embedding_model='all-MiniLM-L6-v2', embedding_api_key=None, chunk_mode='smart', persist_path=None, memory=False, max_history=None, system_prompt=None)[source]¶
- add_source(source)[source]¶
Load and index a document source.
Accepts: file path (.txt, .md, .pdf), folder path (recursive), web URL, Wikipedia URL, or YouTube URL.
Can be chained: RAG().add_source(“a.txt”).add_source(“b.txt”).run(…)
- Parameters:
source (str or Path) – Path to a file/folder, or a URL.
- Returns:
self – Returns self for fluent chaining.
- Return type:
- add_tool(fn)[source]¶
Attach a custom Python function as a tool.
The function’s name and docstring are passed to the LLM as additional context alongside the retrieved chunks.
- Parameters:
fn (callable) – A Python function. Should have a docstring describing what it does.
- Returns:
self
- Return type:
- chat(query)[source]¶
Run a query with memory enabled.
Equivalent to calling .run(query) with memory=True. Conversation history is preserved across calls on the same instance.
- Parameters:
query (str) – The question or instruction.
- Returns:
The LLM’s answer.
- Return type:
str
- debug()[source]¶
Enable verbose logging.
Prints: provider, model, number of loaded docs, number of retrieved chunks, and an 80-character preview of each retrieved chunk.
- Returns:
self
- Return type:
- reset_memory()[source]¶
Clear the conversation history.
After calling this, the next .chat() or .run() call will have no knowledge of previous exchanges.
- Returns:
self
- Return type:
- run(query, schema=None)[source]¶
Run a single query against the indexed documents.
Retrieves the most relevant chunks, builds a prompt, and calls the LLM. Stateless by default (no memory between calls).
- Parameters:
query (str) – The question or instruction to send to the LLM.
schema (pydantic.BaseModel, optional) – If provided, the LLM is instructed to return a JSON object matching this schema. Returns a validated Pydantic model instance instead of a string.
- Returns:
The LLM’s answer, either as a string or a validated schema instance.
- Return type:
str or pydantic.BaseModel
- stream(query)[source]¶
Run a query and stream the response token by token.
Returns a generator that yields string tokens as they arrive from the LLM. No waiting for the full response.
- Parameters:
query (str) – The question or instruction.
- Yields:
str – Individual tokens from the LLM response.
Example
- for token in rag.stream(“Explain this document”):
print(token, end=””, flush=True)
print()
- class fenn.agents.rag.RAGNode(sources=None, query_key='query', context_key='rag_context', chunks_key='rag_chunks', top_k=5, next_action='default', faiss=False, embedding_provider='local', embedding_model='all-MiniLM-L6-v2', embedding_api_key=None, chunk_mode='smart', persist_path=None)[source]¶
Bases:
NodeFlow node that retrieves relevant context from indexed sources.
Loads and indexes all sources once at construction time, then per run queries the index using
shared[query_key]and writes the results intoshared[chunks_key]andshared[context_key].- Parameters:
sources (str or list of str, optional) – File paths, folder paths, or URLs to load and index on init. Additional sources can be indexed later with
add_source().query_key (str) – Key in
sharedthat holds the user query. Default:"query".context_key (str) – Key written into
sharedwith the concatenated chunk text. Default:"rag_context".chunks_key (str) – Key written into
sharedwith the raw list of chunks. Default:"rag_chunks".top_k (int) – Maximum number of chunks to retrieve. Default: 5.
next_action (str) – Action string returned by
post(), used byFlow.get_next_node(). Default:"default".faiss (bool) – Use FAISS semantic search instead of BM25. Default: False.
embedding_provider (str) – Embedding provider (only used when faiss=True). Default:
"local".embedding_model (str) – Embedding model (only used when faiss=True). Default:
"all-MiniLM-L6-v2".embedding_api_key (str, optional) – API key for the embedding provider.
chunk_mode (str) – Document chunking strategy. One of
"smart","paragraphs","sentences","fixed". Default:"smart".persist_path (str or Path, optional) – Directory to save/load the FAISS index. Only used when faiss=True.