
💬 SUJBOT2 User Search Pipeline

Phase 7: From user query to AI answer (Claude SDK Agent + 14 Tools)

User Input (CLI)

0ms

Task: The user types a query into the interactive CLI (REPL) or uses one of the special commands.
Example query: "Jaká je sankce za porušení GDPR?" (Czech for "What is the penalty for violating GDPR?")

# Start the agent (with an automatic dependency check)
$ ./run_cli.sh

# Or with a custom vector store:
$ ./run_cli.sh output/my_doc/vector_store

# Debug mode:
$ ./run_cli.sh --debug

# Interactive REPL
╭─────────────────────────────────────────────────────────────╮
│ RAG Agent - Document Assistant                              │
│ Type your question or use /help for commands                │
╰─────────────────────────────────────────────────────────────╯

SUJBOT2> Jaká je sankce za porušení GDPR?
# The agent starts processing...

🔧 CLI Features:

✅ Streaming output (real-time response)
✅ Prompt caching (up to 90% lower cost on cached input tokens)
✅ Cost tracking ($X.XX per message)
✅ Debug mode (--debug flag)
✅ History tracking (up/down arrow keys)

📖 CLI Commands:
• /help, /h - Show help
• /model, /m - List models or switch model (haiku, sonnet, gpt-5-mini, gpt-5-nano)
• /stats, /s - Tool execution and cost tracking statistics
• /config, /c - Show the current configuration
• /clear, /reset - Clear the conversation (reset)
• /exit, /quit, /q - Exit the agent

Implementation: src/agent/cli.py, run_cli.sh
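The command list above maps several aliases to one action. A minimal sketch of how such input could be parsed is shown here; the names `COMMAND_ALIASES`, `ALIAS_TO_COMMAND`, and `parse_input` are hypothetical and are not taken from src/agent/cli.py.

```python
# Hypothetical alias table; only the command strings come from the list above.
COMMAND_ALIASES = {
    "help": ("/help", "/h"),
    "model": ("/model", "/m"),
    "stats": ("/stats", "/s"),
    "config": ("/config", "/c"),
    "clear": ("/clear", "/reset"),
    "exit": ("/exit", "/quit", "/q"),
}

# Invert the table to alias -> canonical command name for direct lookup.
ALIAS_TO_COMMAND = {
    alias: name for name, aliases in COMMAND_ALIASES.items() for alias in aliases
}


def parse_input(line: str):
    """Classify one REPL line as a query, a known command, or an unknown command."""
    text = line.strip()
    if not text.startswith("/"):
        return ("query", text, [])
    head, *args = text.split()
    name = ALIAS_TO_COMMAND.get(head.lower())
    if name is None:
        return ("unknown", head, args)
    return ("command", name, args)


print(parse_input("/m sonnet"))
print(parse_input("Jaká je sankce za porušení GDPR?"))
```

Anything that does not start with a slash is treated as a query for the agent, so ordinary questions never collide with command names.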

# Example CLI commands:
SUJBOT2> /help
📖 Available Commands:
  /help, /h - Show this help
  /model, /m - List available models or switch model
  /stats, /s - Show tool execution and cost statistics
  ...

SUJBOT2> /model
📊 Available Models:
  haiku - Claude 4.5 Haiku (fast & cheap, $0.80/$4.00, ✅ caching)
  sonnet - Claude 4.5 Sonnet (best quality, $3.00/$15.00, ✅ caching)
  gpt-5-mini - GPT-5 Mini (balanced, $0.15/$0.60, ❌ no caching)
  gpt-5-nano - GPT-5 Nano (ultra fast, $0.04/$0.16, ❌ no caching)
💡 Usage: /model <name>
📊 Current model: claude-haiku-4-5 (Anthropic)

SUJBOT2> /model sonnet
✅ Model switched to: claude-sonnet-4-5-20250929

SUJBOT2> /stats
📊 Tool Usage Statistics:
  Total calls: 142
  search: 89 calls (62.7%)
  get_document_list: 28 calls (19.7%)
  ...
💰 Cost: $1.23 total ($0.0087 per message avg)

↓

Agent Core (Claude SDK)

~5-10ms

Task: The agent core processes the query, builds the system prompt, registers the 14 tools, and sends the request to the Claude/OpenAI API.
Model (default): Claude 4.5 Haiku (fast and cheap)
Switchable: Claude 4.5 Sonnet, GPT-5 Mini, GPT-5 Nano (via the /model command)

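import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The agent builds the messages for the Claude API
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Jaká je sankce za porušení GDPR?",
                "cache_control": {"type": "ephemeral"},  # Prompt caching
            }
        ],
    }
]

# Tools registry (14 tools); `registry` is the project's own tool registry
tools = registry.get_claude_sdk_tools()  # → 14 tool definitions

# Call the Claude/OpenAI API with streaming (Claude shown here).
# messages.stream() is a context manager, so it is used in a `with` block.
with client.messages.stream(
    model="claude-haiku-4-5-20251001",  # Default: Haiku 4.5
    messages=messages,
    tools=tools,
    max_tokens=4096,
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)  # real-time output

# Switching models: /model sonnet, /model gpt-5-mini, /model gpt-5-nano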

🧠 The agent decides:

The agent receives the user query and **picks the appropriate tool** from the 14 available:

For the query "Jaká je sankce za porušení GDPR?":
→ The agent will most likely call search (unified hybrid search) with num_expands=1
→ If it needs more context: expand_context
→ If that fails: filtered_search with search_method="bm25_only" (BM25 keyword search)

Implementation: src/agent/agent_core.py
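The decision step above is driven by the model, not by hard-coded rules: the model either answers or asks for a tool, and the agent feeds the tool output back. The sketch below shows one way such a loop could look. The block shapes (`tool_use`, `tool_result`, `stop_reason`) follow the Anthropic Messages API; `run_agent_turn`, `call_model`, and `execute_tool` are hypothetical names, and a fake model stands in for the real API so the example runs offline.

```python
import json
from typing import Callable


def run_agent_turn(
    call_model: Callable[[list], dict],
    execute_tool: Callable[[str, dict], dict],
    user_text: str,
    max_rounds: int = 5,
) -> str:
    """Alternate between model calls and tool executions until the model answers."""
    messages = [{"role": "user", "content": [{"type": "text", "text": user_text}]}]
    for _ in range(max_rounds):
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply["content"]})
        if reply.get("stop_reason") != "tool_use":
            # No tool requested: concatenate the text blocks as the final answer.
            return "".join(b["text"] for b in reply["content"] if b["type"] == "text")
        results = []
        for block in reply["content"]:
            if block["type"] == "tool_use":
                output = execute_tool(block["name"], block["input"])
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block["id"],
                    "content": json.dumps(output, ensure_ascii=False),
                })
        # Tool results go back to the model as a user message.
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_rounds")


def fake_model(messages: list) -> dict:
    """Stand-in for the API: first request the search tool, then answer."""
    if len(messages) == 1:
        return {"stop_reason": "tool_use", "content": [{
            "type": "tool_use", "id": "toolu_1", "name": "search",
            "input": {"query": "sankce porušení GDPR", "k": 5, "num_expands": 1},
        }]}
    return {"stop_reason": "end_turn", "content": [{"type": "text", "text": "done"}]}


calls = []


def fake_execute(name: str, args: dict) -> dict:
    calls.append((name, args))
    return {"success": True, "data": []}


answer = run_agent_turn(fake_model, fake_execute, "Jaká je sankce za porušení GDPR?")
print(answer, calls[0][0])
```

The `max_rounds` cap is a design choice in this sketch: it bounds cost if the model keeps requesting tools without converging on an answer.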

↓

Tool Execution

~200-2000ms (depends on tier)

Task: The agent calls the selected tool (e.g. search) with parameters. The tool performs retrieval using the Phase 5 components (Hybrid Search + Reranker + KG).

# The agent calls the tool
tool_result = registry.execute_tool(
    name="search",
    query="sankce porušení GDPR",
    k=5,
    num_expands=1,  # Query expansion: original + 1 paraphrase
)

# The tool result contains:
{
    "success": True,
    "data": [
        {
            "content": "Článek 83 GDPR stanovuje...",
            "doc_id": "GDPR_cz",
            "section": "8.2",
            "score": 0.92
        },
        ...
    ],
    "citations": ["GDPR_cz (Section 8.2)", ...],
    "metadata": {
        "query": "sankce porušení GDPR",
        "expanded_queries": ["sankce porušení GDPR", "pokuty za nedodržení GDPR"],
        "results_count": 5,
        "search_time_ms": 487
    }
}
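To make the result envelope concrete, here is a minimal sketch of a registry that wraps a tool function and returns the `success` / `data` / `citations` / `metadata` shape shown above. The `ToolRegistry` class, its `register` method, the `error` key, and `fake_search` are assumptions for illustration; the project's actual registry may be structured differently.

```python
import time
from typing import Any, Callable, Dict, List


class ToolRegistry:
    """Hypothetical registry: maps tool names to functions and wraps their output."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., List[Dict[str, Any]]]] = {}

    def register(self, name: str, fn: Callable[..., List[Dict[str, Any]]]) -> None:
        self._tools[name] = fn

    def execute_tool(self, name: str, **kwargs: Any) -> Dict[str, Any]:
        if name not in self._tools:
            return self._failure(f"unknown tool: {name}")
        started = time.perf_counter()
        try:
            data = self._tools[name](**kwargs)
        except Exception as exc:  # report tool errors to the agent instead of crashing
            return self._failure(str(exc))
        elapsed_ms = int((time.perf_counter() - started) * 1000)
        return {
            "success": True,
            "data": data,
            "citations": [f'{c["doc_id"]} (Section {c["section"]})' for c in data],
            "metadata": {**kwargs, "results_count": len(data), "search_time_ms": elapsed_ms},
        }

    @staticmethod
    def _failure(message: str) -> Dict[str, Any]:
        # The "error" key is an assumption; the source only shows the success case.
        return {"success": False, "error": message, "data": [], "citations": [], "metadata": {}}


def fake_search(query: str, k: int = 5, num_expands: int = 0) -> List[Dict[str, Any]]:
    """Stand-in for the real search tool; returns one canned chunk."""
    chunks = [{"content": "Článek 83 GDPR stanovuje...", "doc_id": "GDPR_cz",
               "section": "8.2", "score": 0.92}]
    return chunks[:k]


registry = ToolRegistry()
registry.register("search", fake_search)
result = registry.execute_tool(name="search", query="sankce porušení GDPR", k=5, num_expands=1)
print(result["citations"])
```

Returning a failure envelope rather than raising lets the model see the error text and choose another tool.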

🔍 Search internals (example: search tool):

1. Query Expansion (if num_expands > 0):
   → LLM generates paraphrases: ["sankce porušení GDPR", "pokuty za nedodržení GDPR"]

2. Hybrid Search (for each query):
   → BM25 search (lexical): top-50 candidates
   → FAISS search (semantic): top-50 candidates
   → RRF fusion (k=60): merge results → top-50

3. Multi-query Fusion (if multiple queries):
   → RRF fusion across all query results → top-20

4. Cross-encoder Reranking:
   → Score all (query, chunk) pairs → select top-k (e.g., 5)

5. Context Assembly (Phase 6):
   → Strip SAC summaries from chunks
   → Add citations and metadata
   → Return clean chunks to agent
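Steps 2 and 3 above both rely on Reciprocal Rank Fusion. A minimal sketch follows, assuming each ranking is a list of chunk ids ordered best first; the function name `rrf_fuse` and the toy ids are hypothetical, and the cross-encoder reranking step is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def rrf_fuse(rankings: Iterable[List[str]], k: int = 60, top_n: int = 50) -> List[str]:
    """Reciprocal Rank Fusion: each list adds 1 / (k + rank) to a chunk's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))[:top_n]


# Step 2: fuse the BM25 and dense rankings for a single query.
bm25 = ["c1", "c2", "c3"]
dense = ["c3", "c1", "c4"]
fused = rrf_fuse([bm25, dense])
print(fused)  # ['c1', 'c3', 'c2', 'c4']

# Step 3: fuse the per-query results of the original query and one paraphrase.
paraphrase_fused = ["c3", "c5", "c1"]
multi_query = rrf_fuse([fused, paraphrase_fused], top_n=20)
print(multi_query)
```

RRF uses only rank positions, so BM25 and dense scores never need to be normalized onto a common scale before merging.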

↓

LLM Response Generation

~1-3s (streaming)

Task: Claude receives the tool results (retrieved chunks), processes them, and generates the final answer for the user.

# Claude receives the tool results and generates the answer
Claude thinking: "The 'search' tool returned 5 chunks about GDPR penalties.
                  Chunk 1 (score 0.92) contains Article 83...
                  → Synthesizing an answer with citations."

# Streaming output (real-time):
Assistant: Under GDPR (Article 83), the penalties for violating personal data protection are as follows:

1. **Administrative violations**: up to EUR 10 million or 2% of global turnover
2. **Serious violations**: up to EUR 20 million or 4% of global turnover

The specific amount depends on the severity, the duration, and cooperation with the authority...

# Citations:
Sources: GDPR_cz (Section 8.2), GDPR_cz (Section 8.3)

💰 Cost Tracking (automatic):

After every response:

💰 This message: $0.0042
   Input (new): 1,234 tokens ($0.0012)
   Input (cached): 8,456 tokens ($0.0008, 90% savings)
   Output: 567 tokens ($0.0022)

💵 Session total: $0.0156 (4 messages)

Implementation: src/cost_tracker.py
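The per-message breakdown can be sketched as simple arithmetic over token counts. The code below is an illustration only: the `Pricing`, `message_cost`, and `SessionTracker` names are hypothetical, the rates are assumed to be USD per million tokens, and cached input is assumed to be billed at 10% of the normal input rate. It is not meant to reproduce the exact dollar figures shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per million tokens. All rates used here are illustrative assumptions."""
    input_per_mtok: float
    output_per_mtok: float
    cache_read_multiplier: float = 0.1  # assumed: cached reads cost 10% of input


def message_cost(pricing: Pricing, new_input: int, cached_input: int, output: int) -> dict:
    """Break one message's cost into new-input, cached-input, and output parts."""
    new_cost = new_input * pricing.input_per_mtok / 1_000_000
    cached_cost = (
        cached_input * pricing.input_per_mtok * pricing.cache_read_multiplier / 1_000_000
    )
    output_cost = output * pricing.output_per_mtok / 1_000_000
    return {
        "input_new": new_cost,
        "input_cached": cached_cost,
        "output": output_cost,
        "total": new_cost + cached_cost + output_cost,
        # Fraction saved on cached tokens compared with the full input rate.
        "cache_savings": 1.0 - pricing.cache_read_multiplier,
    }


class SessionTracker:
    """Accumulates message costs for the session total."""

    def __init__(self) -> None:
        self.total = 0.0
        self.messages = 0

    def add(self, cost: dict) -> None:
        self.total += cost["total"]
        self.messages += 1


pricing = Pricing(input_per_mtok=0.80, output_per_mtok=4.00)  # assumed rates
tracker = SessionTracker()
cost = message_cost(pricing, new_input=1_234, cached_input=8_456, output=567)
tracker.add(cost)
print(f"This message: ${cost['total']:.4f}")
print(f"Session total: ${tracker.total:.4f} ({tracker.messages} messages)")
```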

↓

Output to User

0ms

Task: Display the final answer to the user in the CLI, with citations and cost info.

# CLI output:
SUJBOT2> Jaká je sankce za porušení GDPR?

A: Under GDPR (Article 83), the penalties for violating personal data protection are as follows:

1. **Administrative violations**: up to EUR 10 million or 2% of global turnover
2. **Serious violations**: up to EUR 20 million or 4% of global turnover

The specific amount depends on the severity, the duration, and cooperation with the authority...

Sources: GDPR_cz (Section 8.2), GDPR_cz (Section 8.3)

💰 This message: $0.0042
   Input (new): 1,234 tokens
   Input (cached): 8,456 tokens (90% savings)
   Output: 567 tokens

SUJBOT2> _  # Ready for next query

🛠️ 14 Agent Tools (3 Tiers)

TIER 1

Basic Retrieval Tools

⚡ Fast (100-300ms) • 80% of queries

search

Unified hybrid search with optional query expansion. Combines BM25 + Dense + RRF fusion + reranking.

Query Expansion:
• num_expands=0: Fast (~200ms, 1 query)
• num_expands=1: Balanced (~500ms, 2 queries, +10-15% recall)
• num_expands=2: Better (~800ms, 3 queries, +15-25% recall)

200-800ms • k: 1-10 • num_expands: 0-5

get_tool_help

Get detailed documentation for any tool. Shows parameters, use cases, performance characteristics.

~50ms • Meta tool

get_document_list

List all indexed documents with metadata (title, page count, index date, sections count).

~20ms • Metadata

list_available_tools

List all 14 tools grouped by tier, with availability status (some tools require the KG or the reranker).

~30ms • Meta tool

get_document_info

Get detailed info/metadata for specific document: sections, hierarchy, chunk count, embedding stats.

~50ms • doc_id required

TIER 2

Advanced Retrieval Tools

🔧 Quality (500-1000ms) • Complex queries

graph_search

Unified knowledge graph search with 4 modes: entity_mentions (find chunks), entity_details (full info), relationships (query connections), multi_hop (BFS traversal).

Use case: "What topics are covered by standards issued by GSSB?" (multi_hop)
Replaces: multi_hop_search + entity_tool

~300ms-2s • Requires KG • 4 modes
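The multi_hop mode is described as a BFS traversal. A minimal sketch is shown here, assuming the graph is an adjacency list of (relation, target) pairs; the `multi_hop` function, the relation labels, and the toy graph are hypothetical illustrations, not the project's knowledge graph schema.

```python
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, str]]]  # entity -> [(relation, neighbor), ...]


def multi_hop(graph: Graph, start: str, max_hops: int = 2) -> Dict[str, int]:
    """Breadth-first traversal returning each reachable entity's hop distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbor in graph.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Toy graph for the use case "topics covered by standards issued by GSSB".
graph: Graph = {
    "GSSB": [("issued", "GRI 305"), ("issued", "GRI 306")],
    "GRI 305": [("covers", "Emissions")],
    "GRI 306": [("covers", "Waste")],
    "Emissions": [("related_to", "Climate")],
}

reachable = multi_hop(graph, "GSSB", max_hops=2)
topics = sorted(name for name, hops in reachable.items() if hops == 2)
print(topics)  # ['Emissions', 'Waste']
```

Because BFS visits entities in order of distance, the first time an entity is seen gives its shortest hop count, and the hop limit keeps traversal time bounded on dense graphs.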

compare_documents

Compare two documents to find similarities, differences, conflicts. Uses semantic similarity across sections.

~1-2s • 2 doc_ids required

explain_search_results

Explain search result scores: BM25, Dense, RRF, Reranker score breakdown per chunk.

Debug tool for understanding why certain chunks ranked higher.

~300ms • Debug

filtered_search

Unified search with 3 methods (hybrid/bm25_only/dense_only) + 5 filter types (document, section, temporal, metadata, content).

Use case: "Search only in Document X, Section 5" or fast BM25-only keyword search
Replaces: exact_match_search (via search_method parameter)

~50ms-600ms • 3 search methods

similarity_search

Find chunks similar to a given chunk_id or content. Pure semantic search (no BM25).

~200ms • FAISS only

expand_context

Expand chunk context by retrieving surrounding chunks (before/after in section). Useful when chunk is too small.

~150ms • chunk_id required
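A minimal sketch of the neighbor lookup behind expand_context, assuming chunks are stored in document order and each carries a `chunk_id` and a `section` field. The function signature, the `before` / `after` parameters, and the sample chunks are hypothetical.

```python
from typing import Dict, List


def expand_context(
    chunks: List[Dict[str, str]], chunk_id: str, before: int = 1, after: int = 1
) -> List[Dict[str, str]]:
    """Return the target chunk plus its neighbors, limited to the same section."""
    index = next((i for i, c in enumerate(chunks) if c["chunk_id"] == chunk_id), None)
    if index is None:
        raise KeyError(chunk_id)
    section = chunks[index]["section"]
    lo = max(0, index - before)
    hi = min(len(chunks), index + after + 1)
    # Neighbors from adjacent sections are dropped so the context stays on topic.
    return [c for c in chunks[lo:hi] if c["section"] == section]


chunks = [
    {"chunk_id": "a", "section": "8.1", "content": "..."},
    {"chunk_id": "b", "section": "8.2", "content": "..."},
    {"chunk_id": "c", "section": "8.2", "content": "..."},
    {"chunk_id": "d", "section": "8.2", "content": "..."},
    {"chunk_id": "e", "section": "8.3", "content": "..."},
]

window = expand_context(chunks, "b")
print([c["chunk_id"] for c in window])  # ['b', 'c']
```

Chunk "a" precedes "b" but belongs to section 8.1, so it is excluded from the expanded window.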

TIER 3

Analysis & Insights Tools

🔍 Deep (1-3s) • Specialized analysis

timeline_view

Extract temporal timeline from documents. Uses pattern matching + LLM to identify dates and events.

Use case: "Show me timeline of GDPR amendments"

~1-3s • LLM processing • Higher cost

summarize_section

Generate detailed summary of a document section using LLM. More detailed than Phase 2 generic summaries.

~2-4s • LLM processing • Higher cost

get_stats

Get corpus/index statistics: total documents, chunks, embeddings, tool usage stats, cache hit rates.

~100ms • Analytics