fenn.agents.rag¶

class fenn.agents.rag.RAG(model_provider=None, model=None, model_api_key=None, base_url=None, faiss=False, embedding_provider='local', embedding_model='all-MiniLM-L6-v2', embedding_api_key=None, chunk_mode='smart', persist_path=None, memory=False, max_history=None, system_prompt=None)[source]¶

Bases: object

Main RAG (Retrieval-Augmented Generation) class.

Loads documents, indexes them, retrieves relevant chunks for a query, and sends them to an LLM to generate an answer.

Parameters:

model_provider (str, optional) – LLM provider name. One of: openrouter, openai, anthropic, gemini, mistral, groq, cohere, deepseek, xai, together, perplexity, fireworks, cerebras, nvidia, deepinfra, anyscale, ollama, lmstudio, llamacpp. Auto-detected from model name when possible. Default: “openrouter”.
model (str, optional) – Model identifier for the chosen provider. Default: provider’s default model (e.g. “arcee-ai/trinity-large-preview:free” for openrouter, “gpt-4o-mini” for openai).
model_api_key (str, optional) – API key for the LLM provider. If omitted, reads from the corresponding environment variable (e.g. OPENROUTER_API_KEY). Not required for local providers (ollama, lmstudio, llamacpp).
base_url (str, optional) – Custom base URL for any OpenAI-compatible API endpoint. Overrides the default URL for the detected provider.
faiss (bool, optional) – If True, uses FAISS semantic vector search instead of BM25 keyword search. Requires pip install “cofone[faiss]”. Default: False.
embedding_provider (str, optional) – Provider for text embeddings (used only when faiss=True). One of: local, openai, gemini, cohere, mistral, voyage, jina, nvidia, together, openrouter, ollama. Default: “local” (sentence-transformers, no API key needed).
embedding_model (str, optional) – Embedding model identifier for the chosen embedding provider. Default: “all-MiniLM-L6-v2” (local sentence-transformers).
embedding_api_key (str, optional) – API key for the embedding provider. If omitted, reads from the corresponding environment variable. Not required for local/ollama.
chunk_mode (str, optional) – How to split documents before indexing. One of: “smart” (default), “paragraphs”, “sentences”, “fixed”.
persist_path (str or Path, optional) – Folder path to save/load the FAISS index to/from disk. Avoids recomputing embeddings on subsequent runs. Only used when faiss=True.
system_prompt (str, optional) – Instructions prepended to every prompt as a system message. Tells the LLM how to behave: role, tone, language, format, etc. If None, a sensible default is used. Default: None (uses built-in default prompt).
memory (bool, optional) – If True, keeps conversation history across .run() calls. Default: False.

Examples

Minimal (reads OPENROUTER_API_KEY from .env):

>>> from dotenv import load_dotenv
>>> from cofone import RAG
>>> load_dotenv()
>>> answer = RAG().add_source("docs/").run("What is this about?")

Explicit provider:

>>> RAG(model_provider="openai", model="gpt-4o-mini",
...     model_api_key="sk-...").add_source("notes.txt").run("Summarize")

Fully local (no internet, no keys):

>>> RAG(model_provider="ollama", model="llama3",
...     faiss=True,
...     embedding_provider="ollama",
...     embedding_model="nomic-embed-text").add_source("docs/").run("question")

__init__(model_provider=None, model=None, model_api_key=None, base_url=None, faiss=False, embedding_provider='local', embedding_model='all-MiniLM-L6-v2', embedding_api_key=None, chunk_mode='smart', persist_path=None, memory=False, max_history=None, system_prompt=None)[source]¶

add_source(source)[source]¶

Load and index a document source.

Accepts: file path (.txt, .md, .pdf), folder path (recursive), web URL, Wikipedia URL, or YouTube URL.

Can be chained: RAG().add_source(“a.txt”).add_source(“b.txt”).run(…)

Parameters:: source (str or Path) – Path to a file/folder, or a URL.
Returns:: self – Returns self for fluent chaining.
Return type:: RAG

add_tool(fn)[source]¶

Attach a custom Python function as a tool.

The function’s name and docstring are passed to the LLM as additional context alongside the retrieved chunks.

Parameters:: fn (callable) – A Python function. Should have a docstring describing what it does.
Returns:: self
Return type:: RAG

chat(query)[source]¶

Run a query with memory enabled.

Equivalent to calling .run(query) with memory=True. Conversation history is preserved across calls on the same instance.

Parameters:: query (str) – The question or instruction.
Returns:: The LLM’s answer.
Return type:: str

debug()[source]¶

Enable verbose logging.

Prints: provider, model, number of loaded docs, number of retrieved chunks, and an 80-character preview of each retrieved chunk.

Returns:: self
Return type:: RAG

reset_memory()[source]¶

Clear the conversation history.

After calling this, the next .chat() or .run() call will have no knowledge of previous exchanges.

Returns:: self
Return type:: RAG

run(query, schema=None)[source]¶

Run a single query against the indexed documents.

Retrieves the most relevant chunks, builds a prompt, and calls the LLM. Stateless by default (no memory between calls).

Parameters:

query (str) – The question or instruction to send to the LLM.
schema (pydantic.BaseModel, optional) – If provided, the LLM is instructed to return a JSON object matching this schema. Returns a validated Pydantic model instance instead of a string.

Returns:

The LLM’s answer, either as a string or a validated schema instance.

Return type:

str or pydantic.BaseModel

stream(query)[source]¶

Run a query and stream the response token by token.

Returns a generator that yields string tokens as they arrive from the LLM. No waiting for the full response.

Parameters:: query (str) – The question or instruction.
Yields:: str – Individual tokens from the LLM response.

Example

for token in rag.stream(“Explain this document”):: print(token, end=””, flush=True)

print()

class fenn.agents.rag.RAGNode(sources=None, query_key='query', context_key='rag_context', chunks_key='rag_chunks', top_k=5, next_action='default', faiss=False, embedding_provider='local', embedding_model='all-MiniLM-L6-v2', embedding_api_key=None, chunk_mode='smart', persist_path=None)[source]¶

Bases: Node

Flow node that retrieves relevant context from indexed sources.

Loads and indexes all sources once at construction time, then per run queries the index using shared[query_key] and writes the results into shared[chunks_key] and shared[context_key].

Parameters:

sources (str or list of str, optional) – File paths, folder paths, or URLs to load and index on init. Additional sources can be indexed later with add_source().
query_key (str) – Key in shared that holds the user query. Default: "query".
context_key (str) – Key written into shared with the concatenated chunk text. Default: "rag_context".
chunks_key (str) – Key written into shared with the raw list of chunks. Default: "rag_chunks".
top_k (int) – Maximum number of chunks to retrieve. Default: 5.
next_action (str) – Action string returned by post(), used by Flow.get_next_node(). Default: "default".
faiss (bool) – Use FAISS semantic search instead of BM25. Default: False.
embedding_provider (str) – Embedding provider (only used when faiss=True). Default: "local".
embedding_model (str) – Embedding model (only used when faiss=True). Default: "all-MiniLM-L6-v2".
embedding_api_key (str, optional) – API key for the embedding provider.
chunk_mode (str) – Document chunking strategy. One of "smart", "paragraphs", "sentences", "fixed". Default: "smart".
persist_path (str or Path, optional) – Directory to save/load the FAISS index. Only used when faiss=True.

__init__(sources=None, query_key='query', context_key='rag_context', chunks_key='rag_chunks', top_k=5, next_action='default', faiss=False, embedding_provider='local', embedding_model='all-MiniLM-L6-v2', embedding_api_key=None, chunk_mode='smart', persist_path=None)[source]¶

add_source(source)[source]¶: Index an additional source. Returns self for chaining.

exec(query)[source]¶

post(shared, query, chunks)[source]¶

prep(shared)[source]¶