# Advanced RAG Features Guide

This guide covers the advanced Retrieval-Augmented Generation features in the Research Development Framework v3.0.

## Core Features
1. **Cross-Encoder Re-ranking** - Improved search relevance
2. **GraphRAG** - Knowledge graph-based retrieval
3. **Semantic Chunking** - Embedding-based boundary detection
4. **Research Agent** - Autonomous iterative research workflow

## Enhanced in v3.0
5. **Smart Context Selection** - Token-aware chunk selection with pinning priority
6. **Semantic Graph Traversal** - Query by relationship type (supports, influences, etc.)
7. **Interactive Sessions** - Pausable research with user approval checkpoints

---

## Table of Contents

1. [Overview](#overview)
2. [Cross-Encoder Re-ranking](#cross-encoder-re-ranking)
3. [GraphRAG Retrieval](#graphrag-retrieval)
4. [Semantic Chunking](#semantic-chunking)
5. [Research Agent](#research-agent)
6. [Installation](#installation)
7. [Configuration](#configuration)
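8. [API Reference](#api-reference)
9. [Troubleshooting](#troubleshooting)
10. [Performance Tips](#performance-tips)
11. [Relation to Book Workflow](#relation-to-book-workflow)
12. [Next Steps](#next-steps)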

---

## Overview

### The Problem with Standard RAG

Standard RAG (Retrieval-Augmented Generation) has limitations:

1. **Fuzzy retrieval**: Semantic search finds "related" documents but often misses the exact answer
2. **Implicit connections**: Standard search can't find information connected through relationships
3. **Arbitrary boundaries**: Fixed-size chunks may split topics unnaturally
4. **Single-pass**: One search may not gather enough context for complex questions

### Framework Solutions

This framework addresses these problems with features introduced across v2.1 and v3.0:

| Feature | Version | Description |
|---------|---------|-------------|
| Cross-Encoder Re-ranking | v2.1 | Precise relevance scoring via query-document pairs |
| GraphRAG (Co-occurrence) | v2.1 | Knowledge graph traversal via concept co-occurrence |
| GraphRAG (Triple-based) | v3.0 | S-P-O relationships with typed edges (supports, influences, etc.) |
| Semantic Chunking | v2.1 | Topic-coherent boundaries using embeddings |
| Research Agent | v2.1 | Autonomous iterative research |
| Smart Context Selection | v3.0 | Token-aware chunk selection with document pinning |
| Interactive Sessions | v3.0 | Pausable research with user approval checkpoints |

> **Note on GraphRAG**: The framework supports both **lightweight co-occurrence graphs** (default, always available) and **full triple-based GraphRAG** (v3.0, requires `--extract-relations` flag). See the [GraphRAG Retrieval](#graphrag-retrieval) section for details.

### v2.1+ Pipeline Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                 ADVANCED RAG PIPELINE (v2.1+/v3.0)                  │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   User Query                                                        │
│       │                                                             │
│       ▼                                                             │
│   ┌─────────────────────────────────────────┐                       │
│   │        RESEARCH AGENT (Optional)        │                       │
│   │  Plan → Search → Analyze → Iterate      │                       │
│   └─────────────────────────────────────────┘                       │
│       │                                                             │
│       ▼                                                             │
│   ┌─────────────────────────────────────────┐                       │
│   │              HYBRID SEARCH              │                       │
│   │  Keyword (FTS) + Semantic (Vector)      │                       │
│   │  Merged via Reciprocal Rank Fusion      │                       │
│   └─────────────────────────────────────────┘                       │
│       │                            │                                │
│       ▼                            ▼                                │
│   ┌──────────────┐          ┌──────────────┐                        │
│   │   GraphRAG   │          │  Standard    │                        │
│   │  (Optional)  │          │   Search     │                        │
│   │              │          │              │                        │
│   │ Graph        │          │              │                        │
│   │ Traversal    │          │              │                        │
│   └──────────────┘          └──────────────┘                        │
│       │                            │                                │
│       └────────────┬───────────────┘                                │
│                    ▼                                                │
│   ┌─────────────────────────────────────────┐                       │
│   │        CROSS-ENCODER RE-RANKING         │                       │
│   │  Query+Doc pairs → Precise relevance    │                       │
│   └─────────────────────────────────────────┘                       │
│                    │                                                │
│                    ▼                                                │
│   ┌─────────────────────────────────────────┐                       │
│   │             SEMANTIC CHUNKS             │                       │
│   │  Topic-coherent boundaries              │                       │
│   │  Parent context retrieval               │                       │
│   └─────────────────────────────────────────┘                       │
│                    │                                                │
│                    ▼                                                │
│              Final Results                                          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

---

## Cross-Encoder Re-ranking

### What It Does

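Standard bi-encoder semantic search embeds queries and documents separately, then compares the resulting vectors. This is fast, because document embeddings can be computed ahead of time, but the relevance scores are approximate ("fuzzy").

Cross-encoders read the query and document together in a single model pass, which produces much more accurate relevance scores. Because every (query, document) pair must be scored at query time, they are applied only to a small set of candidates.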

### How It Works

1. **Initial retrieval**: Fetch top-50 results using hybrid search (keyword + semantic)
2. **Pair creation**: Create (query, document) pairs for each result
3. **Cross-encoder scoring**: Pass pairs through BGE-reranker or MS-MARCO model
4. **Re-sort**: Return results ordered by cross-encoder score
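
The four steps can be sketched in a few lines. This is an illustration under stated assumptions, not the framework's `rerank_results`: `rerank` and `toy_overlap_scorer` are hypothetical names, and the pair scorer is passed in as a function so the logic runs without downloading a model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A pair scorer takes (query, document) pairs and returns one score per pair.
PairScorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: List[Dict], score_pairs: PairScorer,
           top_k: int = 10, text_field: str = "chunk_text") -> List[Dict]:
    """Re-rank retrieved candidates by scoring each (query, document) pair jointly."""
    # Step 2: one (query, document) pair per candidate.
    pairs = [(query, c[text_field]) for c in candidates]
    # Step 3: score all pairs in a single batch.
    scores = score_pairs(pairs)
    # Copy each candidate and attach its score, leaving the inputs unmodified.
    scored = [{**c, "rerank_score": float(s)} for c, s in zip(candidates, scores)]
    # Step 4: highest score first, then keep the top_k.
    scored.sort(key=lambda c: c["rerank_score"], reverse=True)
    return scored[:top_k]


def toy_overlap_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    """Stand-in scorer for testing: share of query words found in the document."""
    scores = []
    for query, doc in pairs:
        q_words = set(query.lower().split())
        d_words = set(doc.lower().split())
        scores.append(len(q_words & d_words) / max(len(q_words), 1))
    return scores


candidates = [
    {"chunk_id": "c1", "chunk_text": "Sleep and dreams"},
    {"chunk_id": "c2", "chunk_text": "The etheric body and the life forces"},
]
print([c["chunk_id"] for c in rerank("etheric body", candidates, toy_overlap_scorer)])
# ['c2', 'c1']
```

With sentence-transformers installed, a real cross-encoder fits the same slot, because `CrossEncoder.predict` accepts a list of (query, document) pairs: `rerank(query, candidates, CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict)`.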

### Usage

#### Python API

```python
from db_utils import rerank_results, hybrid_search_with_rerank, semantic_search

# Option 1: Re-rank existing results
# (semantic_search is assumed to live in db_utils; query_embedding is the query's vector)
results = semantic_search(query_embedding, limit=50)
reranked = rerank_results("What influences the etheric body?", results, top_k=10)

# Option 2: Use the combined function (recommended)
results = hybrid_search_with_rerank(
    query_text="What influences the etheric body?",
    limit=10,           # Final results to return
    initial_fetch=50,   # Candidates to re-rank
    use_rerank=True     # Enable cross-encoder
)

for r in results:
    print(f"Score: {r['rerank_score']:.3f} - {r['title']}")
```

#### Command Line

```bash
# The research agent uses re-ranking by default
python pipeline/research_agent.py "What influences the etheric body?"

# Disable re-ranking if needed
python pipeline/research_agent.py "query" --no-rerank
```

### Performance

| Metric | Without Re-ranking | With Re-ranking |
|--------|-------------------|-----------------|
| MRR@10 | ~0.65 | ~0.82 |
| Latency | ~50ms | ~200ms |
| "Needle in haystack" | Often missed | Usually top 3 |
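
MRR@10 (mean reciprocal rank at 10) averages, across a set of test queries, 1/rank of the first relevant result in the top 10, counting 0 when none appears. A minimal sketch for measuring it on your own queries, assuming you have ranked result IDs and a set of known-relevant IDs per query (`mrr_at_k` is a hypothetical helper, not part of the framework):

```python
from typing import List, Set


def mrr_at_k(rankings: List[List[str]], relevant: List[Set[str]], k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant result within the top k."""
    if not rankings:
        return 0.0
    total = 0.0
    for ranking, rel in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


# Query 1 hits at rank 2, query 2 at rank 1, query 3 never: (0.5 + 1 + 0) / 3
print(mrr_at_k([["a", "b"], ["x", "y"], ["p"]], [{"b"}, {"x"}, {"z"}]))
# 0.5
```

Computing it once with re-ranking and once without, over the same queries, gives a corpus-specific version of the comparison in the table.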

### Requirements

```bash
pip install sentence-transformers
```

Models used:
- `BAAI/bge-reranker-base` (preferred, ~400MB)
- `cross-encoder/ms-marco-MiniLM-L-6-v2` (fallback, ~80MB)

---

## GraphRAG Retrieval

### What It Does

Standard search only finds chunks that match the query directly, by keyword or by embedding similarity. GraphRAG also traverses relationships in your knowledge graph to find conceptually connected content, even when it never mentions the query terms.

> **Implementation Options**: This framework supports two GraphRAG approaches:
>
> 1. **Co-occurrence Graph (Default)**: Concepts are linked when they appear together in the same chunks. Lightweight and always available.
>
> 2. **Triple-based Graph (v3.0)**: When using the `--extract-relations` flag with NER extractors (GLiNER, OpenAI, or hybrid), the system extracts Subject-Predicate-Object triples and stores them in `concept_relationships` with typed relationships (supports, influences, derived_from, contradicts, etc.). This enables semantic graph traversal queries like "find concepts that support X" or "trace the origins of Y".
>
> For full triple-based GraphRAG, use: `python3 pipeline/extract_concepts.py --use-ner --ner-extractor hybrid --extract-relations --relation-backend openai`
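
The two query patterns named above ("find concepts that support X", "trace the origins of Y") reduce to simple operations over Subject-Predicate-Object triples. A minimal in-memory sketch, assuming triples are plain `(subject, predicate, object)` tuples and that `derived_from` points from a concept to its source; it does not query the `concept_relationships` table, whose schema is not shown here, and the function names and placeholder concepts are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def subjects_for(triples: List[Triple], predicate: str, obj: str) -> List[str]:
    """Answer 'find concepts that <predicate> X', e.g. which concepts support X."""
    return sorted({s for s, p, o in triples if p == predicate and o == obj})


def trace_origins(triples: List[Triple], concept: str) -> List[str]:
    """Follow derived_from edges (subject derived_from object) breadth-first."""
    sources: Dict[str, List[str]] = {}
    for s, p, o in triples:
        if p == "derived_from":
            sources.setdefault(s, []).append(o)
    seen: Set[str] = {concept}
    origins: List[str] = []
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for origin in sources.get(current, []):
            if origin not in seen:  # the seen set also stops cycles
                seen.add(origin)
                origins.append(origin)
                queue.append(origin)
    return origins


triples = [
    ("A", "supports", "X"),
    ("B", "supports", "X"),
    ("C", "contradicts", "X"),
    ("Y", "derived_from", "Z"),
    ("Z", "derived_from", "W"),
]
print(subjects_for(triples, "supports", "X"))  # ['A', 'B']
print(trace_origins(triples, "Y"))              # ['Z', 'W']
```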

### Enhanced Entity Extraction (Optional)

For richer concept extraction, enable NER (Named Entity Recognition) using GLiNER or LLaMA:

```bash
# Standard extraction (pattern matching)
python3 pipeline/extract_concepts.py

# Enhanced extraction with GLiNER (CPU-friendly, recommended)
python3 pipeline/extract_concepts.py --use-ner

# Enhanced extraction with LLaMA (GPU, extracts relationships too)
python3 pipeline/extract_concepts.py --use-ner --ner-extractor llama --extract-relations
```

| Method | Speed | Requirements | Capabilities |
|--------|-------|--------------|--------------|
| Pattern matching | Fast | None | Matches known concepts |
| GLiNER | Medium | `pip install gliner` | Zero-shot entity discovery |
| LLaMA | Slow | GPU + GGUF model | Entities + S-P-O relationships |

When `--extract-relations` is used, extracted triples are stored in the `concept_relationships` table with typed edges (e.g., "influences", "part_of", "developed").

See [CLI_USER_GUIDE.md](CLI_USER_GUIDE.md#ner-entity-extraction-enhanced) and [SETUP.md](SETUP.md#ner-entity-extraction-graphrag-enhancement) for details.

### How It Works

1. **Entity extraction**: Identify concepts in the query via fuzzy matching (or NER if enabled)
2. **Graph traversal**: Find related concepts via co-occurrence relationships (weighted edges)
3. **Chunk retrieval**: Get chunks containing any of these concepts
4. **Fusion**: Combine with standard search results using RRF
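
Step 4 uses Reciprocal Rank Fusion, which combines rankings by position rather than by raw score, so graph hits and hybrid-search hits can be merged even though their scores are not comparable. A minimal sketch; `reciprocal_rank_fusion` is a hypothetical name, and `k = 60` is the constant from the original RRF paper, assumed here rather than read from the framework's code.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked ID lists by summing 1 / (k + rank) for each list an ID appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


graph_hits = ["c1", "c2", "c3"]   # chunks found via concept traversal
standard_hits = ["c2", "c4"]      # chunks from hybrid search
print(reciprocal_rank_fusion([graph_hits, standard_hits]))
# ['c2', 'c1', 'c4', 'c3']
```

A chunk found by both routes (here `c2`) rises to the top even though neither list ranked it first.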

### Example

**Query**: "What influences the etheric body?"

**Standard search finds**:
- Documents mentioning "etheric body"

**GraphRAG finds**:
- Documents about "etheric body" (direct)
- Documents about "life forces" (1 hop - related concept)
- Documents about "memory" (1 hop - related concept)
- Documents about "sleep" (2 hops - connected through "life forces")
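
The hop distances in this example follow from a breadth-first traversal that ignores edges below the co-occurrence threshold. A minimal in-memory sketch (not `get_related_concepts`, which reads the database); `related_concepts` is a hypothetical name and the edge counts are invented, including one edge that falls below `min_cooccurrence`.

```python
from collections import deque
from typing import Dict, List, Tuple

# Undirected co-occurrence graph: concept -> {neighbor: co-occurrence count}
Graph = Dict[str, Dict[str, int]]


def related_concepts(graph: Graph, start: str, depth: int = 2,
                     min_cooccurrence: int = 2) -> List[Tuple[str, int]]:
    """Breadth-first traversal returning (concept, hop_distance) pairs."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == depth:
            continue  # reached the hop limit; do not expand further
        for neighbor, count in graph.get(current, {}).items():
            # Ignore weak edges and concepts already reached by a shorter path.
            if count >= min_cooccurrence and neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    return [(concept, h) for concept, h in hops.items() if concept != start]


# Made-up co-occurrence counts mirroring the example above.
graph = {
    "etheric body": {"life forces": 5, "memory": 3, "rare concept": 1},
    "life forces": {"etheric body": 5, "sleep": 4},
    "memory": {"etheric body": 3},
    "sleep": {"life forces": 4},
    "rare concept": {"etheric body": 1},
}
print(related_concepts(graph, "etheric body"))
# [('life forces', 1), ('memory', 1), ('sleep', 2)]
```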

### Usage

#### Python API

```python
from db_utils import graphrag_search

result = graphrag_search(
    query="What influences the etheric body?",
    limit=10,
    hop_depth=2,              # How many relationship hops
    min_cooccurrence=2,       # Minimum co-occurrence for relationship
    include_direct_search=True # Also include standard hybrid search
)

print(f"Found {len(result['concepts'])} concepts in query")
print(f"Traversed {len(result['graph_paths'])} relationship paths")
print(f"Retrieved {len(result['results'])} chunks")

# See the graph paths
for path in result['graph_paths']:
    print(f"  {path['from']} -> {path['to']} (strength: {path['strength']})")
```

#### Related Functions

```python
from db_utils import (
    find_concept_by_name,    # Fuzzy concept lookup
    get_related_concepts,    # Multi-hop traversal
    get_chunks_for_concepts, # Retrieve chunks by concept
    get_chunk_with_context   # Get surrounding context
)

# Find a concept (returns None when nothing matches)
concept = find_concept_by_name("etheric body", fuzzy=True)

if concept is not None:
    # Get concepts up to 2 hops away
    related = get_related_concepts(
        concept['concept_id'],
        depth=2,
        min_cooccurrence=2
    )

    for r in related:
        print(f"{r['name']} ({r['hop_distance']} hops, {r['cooccurrence_count']} co-occurrences)")
```

### When to Use GraphRAG

| Scenario | Standard Search | GraphRAG |
|----------|-----------------|----------|
| "Find documents about X" | Good | Similar |
| "What relates to X?" | Poor | Excellent |
| "How does X affect Y?" | Fair | Excellent |
| Multi-hop reasoning | Poor | Excellent |

---

## Semantic Chunking

### What It Does

Standard chunking splits text at arbitrary token boundaries (e.g., every 750 tokens). Semantic chunking detects natural topic boundaries using embeddings.

### How It Works

1. **Sentence splitting**: Parse text into sentences
2. **Window embedding**: Embed sliding windows of 3 sentences
3. **Similarity calculation**: Measure cosine similarity between consecutive windows
4. **Boundary detection**: Create chunk boundaries where similarity drops
5. **Size constraints**: Enforce min/max chunk sizes
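
Steps 2 to 4 can be sketched as below, with made-up two-dimensional vectors standing in for real window embeddings (in practice these would come from a model such as `all-MiniLM-L6-v2`). The function names are hypothetical, and the sketch assumes a boundary wherever consecutive-window similarity falls below the threshold; under that convention a higher threshold yields more boundaries, so check how the framework defines `similarity_threshold` before tuning it. Mapping window positions back to sentence offsets and enforcing min/max sizes (step 5) are left out.

```python
import math
from typing import List, Sequence


def sliding_windows(sentences: List[str], size: int = 3) -> List[str]:
    """Step 2 input: overlapping windows of `size` consecutive sentences."""
    if len(sentences) <= size:
        return [" ".join(sentences)] if sentences else []
    return [" ".join(sentences[i:i + size]) for i in range(len(sentences) - size + 1)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def boundary_positions(window_vectors: List[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Steps 3-4: positions i where window i is dissimilar to window i - 1."""
    return [
        i for i in range(1, len(window_vectors))
        if cosine(window_vectors[i - 1], window_vectors[i]) < threshold
    ]


# Toy 2-D "embeddings": the topic shifts between the second and third window.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(boundary_positions(vectors))
# [2]
```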

### Visual Comparison

```
STANDARD CHUNKING (arbitrary boundaries):
┌─────────────────────────────────────────────────────────────┐
│ Topic A: Introduction to concept...                         │
│ ...continued discussion of concept...                       │
│ ...more about concept... ████ CHUNK BREAK ████ ...now      │
│ discussing Topic B which is different...                    │
│ ...Topic B continues... ████ CHUNK BREAK ████ ...and       │
│ finally Topic C begins here...                              │
└─────────────────────────────────────────────────────────────┘
  ↑ Chunks split mid-topic, mixing content

SEMANTIC CHUNKING (topic boundaries):
┌─────────────────────────────────────────────────────────────┐
│ Topic A: Introduction to concept...                         │
│ ...continued discussion of concept...                       │
│ ...conclusion of Topic A.                                   │
├─────────────────────────────────────────────────────────────┤
│ Topic B: Now we discuss something different...              │
│ ...Topic B continues and concludes.                         │
├─────────────────────────────────────────────────────────────┤
│ Topic C: Finally this new section begins...                 │
└─────────────────────────────────────────────────────────────┘
  ↑ Each chunk is topically coherent
```

### Usage

#### Command Line

```bash
# Use semantic chunking for new documents
python pipeline/chunk_documents.py --semantic

# Adjust sensitivity (lower = more boundaries)
python pipeline/chunk_documents.py --semantic --similarity-threshold 0.4

# Re-chunk existing documents with semantic mode
python pipeline/chunk_documents.py --semantic --rechunk
```

#### Python API

```python
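from chunk_documents import SemanticChunker

# Example inputs: the full document text and its ID
with open("document.txt", encoding="utf-8") as f:
    text = f.read()
document_id = "DOC_001"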

chunker = SemanticChunker(
    min_tokens=100,
    max_tokens=1000,
    target_tokens=500,
    similarity_threshold=0.5,  # Lower = more sensitive
    window_size=3              # Sentences per window
)

chunks = chunker.chunk_semantic(text, document_id)

for chunk in chunks:
    print(f"Chunk {chunk['chunk_sequence']}: {chunk['chunk_tokens']} tokens")
    print(f"  Method: {chunk.get('chunk_method', 'unknown')}")
```

### Comparison

| Chunking Mode | Best For | Trade-offs |
|--------------|----------|------------|
| **Standard** | General use, fast | May split topics |
| **Hierarchical** | RAG with context | More chunks |
| **Semantic** | Topic coherence | Slower, requires embeddings |

### Requirements

```bash
pip install sentence-transformers
```

Model used: `all-MiniLM-L6-v2` (~80MB, runs locally)

---

## Research Agent

### What It Does

The Research Agent automates the entire research workflow:

1. **Plan**: Break complex questions into searchable sub-queries
2. **Search**: Execute each query with hybrid search + re-ranking
3. **Analyze**: Evaluate if gathered information is sufficient
4. **Iterate**: If gaps found, generate new queries and repeat
5. **Synthesize**: Combine all findings into a coherent report
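
The loop can be sketched with each step passed in as a function, which keeps the control flow visible without an LLM or database. This is not the `ResearchAgent` implementation: `run_research`, `Session`, and the stub lambdas are hypothetical, and `max_iterations` here caps the number of searches, which may not match how the framework counts iterations.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Session:
    question: str
    queries_run: List[str] = field(default_factory=list)
    findings: List[str] = field(default_factory=list)
    report: str = ""


def run_research(
    question: str,
    plan: Callable[[str], List[str]],
    search: Callable[[str], List[str]],
    find_gaps: Callable[[str, List[str]], List[str]],
    synthesize: Callable[[str, List[str]], str],
    max_iterations: int = 5,
) -> Session:
    """Plan, search, analyze, iterate, synthesize; each step is an injected function."""
    session = Session(question)
    pending = plan(question)  # 1. Plan: initial sub-queries
    while len(session.queries_run) < max_iterations:
        if not pending:
            # 3-4. Analyze: ask for follow-up queries; none left means "sufficient".
            pending = [q for q in find_gaps(question, session.findings)
                       if q not in session.queries_run]
            if not pending:
                break
        query = pending.pop(0)
        if query in session.queries_run:
            continue  # skip duplicates within the plan itself
        session.queries_run.append(query)
        session.findings.extend(search(query))  # 2. Search
    session.report = synthesize(question, session.findings)  # 5. Synthesize
    return session


session = run_research(
    "Compare Steiner and Jung on dreams",
    plan=lambda q: ["Steiner views on dreams", "Jung views on dreams"],
    search=lambda q: [f"chunk about {q}"],
    find_gaps=lambda q, found: ["Jung archetypes dreams"] if len(found) < 3 else [],
    synthesize=lambda q, found: f"{len(found)} findings",
)
print(session.queries_run)
# ['Steiner views on dreams', 'Jung views on dreams', 'Jung archetypes dreams']
```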

### Example Workflow

```
User: "Compare Steiner and Jung on dreams"

Agent thinks: "I need information about both authors' views"

Step 1: Search "Steiner views on dreams"
  → Found 5 relevant chunks

Step 2: Search "Jung views on dreams"
  → Found 3 relevant chunks

Agent analyzes: "I have Steiner's views but need more on Jung's archetypes"

Step 3: Search "Jung archetypes dreams"
  → Found 4 more chunks

Agent analyzes: "Sufficient information gathered"

Step 4: Synthesize findings into structured report
```

### Usage

#### Command Line

```bash
# Basic usage
python pipeline/research_agent.py "Compare Steiner and Jung on dreams"

# Save report to file
python pipeline/research_agent.py "What is the etheric body?" --output report.md

# JSON output format
python pipeline/research_agent.py "query" --format json --output report.json

# Adjust iteration depth
python pipeline/research_agent.py "query" --max-iterations 7

# Batch mode (process multiple questions)
python pipeline/research_agent.py --batch questions.txt --output-dir reports/

# Disable GraphRAG (faster, simpler)
python pipeline/research_agent.py "query" --no-graphrag

# Disable re-ranking
python pipeline/research_agent.py "query" --no-rerank
```

#### Python API

```python
from research_agent import ResearchAgent, format_markdown_report

agent = ResearchAgent(
    max_iterations=5,
    min_results_per_query=3,
    use_graphrag=True,
    use_rerank=True
)

# Execute research
session = agent.research("Compare Steiner and Jung on dreams")

# Access results
print(f"Question: {session.original_question}")
print(f"Sub-queries: {session.sub_queries}")
print(f"Iterations: {session.iterations}")
print(f"Total chunks: {session.total_chunks}")
print(f"Unique documents: {session.unique_documents}")

# Get formatted report
report = format_markdown_report(session)
print(report)
```

### Output Format

The agent produces structured reports:

```markdown
# Research Report

**Question:** Compare Steiner and Jung on dreams

**Date:** 2024-01-15

**Sources:** 8 documents, 15 text chunks

---

## Summary

[LLM-generated synthesis of findings with citations]

---

## Research Process

### Sub-queries Explored
1. Steiner views on dreams
2. Jung views on dreams
3. Jung archetypes dreams

### Search Steps
**Iteration 1:** `Steiner views on dreams`
- Results: 5 chunks
- Time: 2024-01-15T10:30:15

...

---

## Sources Referenced

- **Philosophy of Freedom** (DOC_001)
- **Psychology and Alchemy** (DOC_045)
...
```

### Requirements

- LLM access (OpenAI API or local Ollama)
- Optional: sentence-transformers for re-ranking
- Optional: Knowledge graph for GraphRAG

---

## Installation

### Full Installation (All Features)

```bash
cd /var/www/html/research/Research_development  # replace with your project root

# Install sentence-transformers for re-ranking and semantic chunking
pip install sentence-transformers

# Verify installation
python -c "from sentence_transformers import CrossEncoder; print('OK')"
```

### Minimal Installation (Research Agent Only)

The research agent works without sentence-transformers, but with reduced accuracy. It needs only OpenAI API access or a local Ollama instance; re-ranking and semantic chunking are disabled.

### Model Downloads

Models are downloaded automatically on first use:

| Model | Size | Purpose |
|-------|------|---------|
| `BAAI/bge-reranker-base` | ~400MB | Cross-encoder re-ranking |
| `cross-encoder/ms-marco-MiniLM-L-6-v2` | ~80MB | Cross-encoder re-ranking (fallback) |
| `all-MiniLM-L6-v2` | ~80MB | Semantic chunking |

---

## Configuration

### config/project.yaml

```yaml
intelligence:
  mode: "auto"  # or cloud, local, statistical

  cloud:
    classification_model: "gpt-4o-mini"
    chat_model: "gpt-4o"
    embedding_model: "text-embedding-3-small"

  local:
    endpoint: "http://localhost:11434/v1"
    model: "llama3"

# Advanced RAG settings
advanced_rag:
  # Re-ranking
  rerank_enabled: true
  rerank_model: "BAAI/bge-reranker-base"
  rerank_fallback: "cross-encoder/ms-marco-MiniLM-L-6-v2"

  # GraphRAG
  graphrag_enabled: true
  graphrag_hop_depth: 2
  graphrag_min_cooccurrence: 2

  # Semantic chunking
  semantic_chunking_model: "all-MiniLM-L6-v2"
  semantic_similarity_threshold: 0.5
  semantic_window_size: 3

  # Research agent
  agent_max_iterations: 5
  agent_min_results_per_query: 3
```
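
The `advanced_rag` keys and values above can double as defaults when a project sets only some of them. A minimal sketch, assuming the YAML has already been parsed into a dict; `ADVANCED_RAG_DEFAULTS` and `advanced_rag_settings` are hypothetical names, and whether the framework itself fills in missing keys this way is not documented here.

```python
from typing import Any, Dict

# Defaults mirroring the advanced_rag values shown in config/project.yaml above.
ADVANCED_RAG_DEFAULTS: Dict[str, Any] = {
    "rerank_enabled": True,
    "rerank_model": "BAAI/bge-reranker-base",
    "rerank_fallback": "cross-encoder/ms-marco-MiniLM-L-6-v2",
    "graphrag_enabled": True,
    "graphrag_hop_depth": 2,
    "graphrag_min_cooccurrence": 2,
    "semantic_chunking_model": "all-MiniLM-L6-v2",
    "semantic_similarity_threshold": 0.5,
    "semantic_window_size": 3,
    "agent_max_iterations": 5,
    "agent_min_results_per_query": 3,
}


def advanced_rag_settings(config: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay the `advanced_rag` section of a parsed config onto the defaults."""
    section = config.get("advanced_rag") or {}  # tolerate a missing or empty section
    unknown = set(section) - set(ADVANCED_RAG_DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown advanced_rag keys: {sorted(unknown)}")
    return {**ADVANCED_RAG_DEFAULTS, **section}


settings = advanced_rag_settings({"advanced_rag": {"graphrag_hop_depth": 3}})
print(settings["graphrag_hop_depth"], settings["rerank_model"])
# 3 BAAI/bge-reranker-base
```

With PyYAML, the input dict would come from `yaml.safe_load` on `config/project.yaml`. Rejecting unknown keys is a design choice that catches typos such as `rerank_enabeld`, which would otherwise be silently ignored.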

---

## API Reference

### db_utils.py

#### Re-ranking Functions

```python
def rerank_results(
    query: str,
    results: List[Dict],
    top_k: int = 10,
    text_field: str = 'chunk_text'
) -> List[Dict]:
    """Re-rank results using cross-encoder."""

def hybrid_search_with_rerank(
    query_text: str,
    query_embedding: List[float] = None,
    limit: int = 10,
    initial_fetch: int = 50,
    document_id: str = None,
    min_quality: str = 'fair',
    use_rerank: bool = True
) -> List[Dict]:
    """Combined hybrid search with RRF fusion and re-ranking."""
```

#### GraphRAG Functions

```python
from typing import Any, Dict, List, Optional

def find_concept_by_name(
    concept_name: str,
    fuzzy: bool = True
) -> Optional[Dict]:
    """Find concept by name with optional fuzzy matching."""

def get_related_concepts(
    concept_id: int,
    depth: int = 1,
    min_cooccurrence: int = 2
) -> List[Dict]:
    """Get related concepts via graph traversal."""

def get_chunks_for_concepts(
    concept_ids: List[int],
    limit: int = 20
) -> List[Dict]:
    """Get chunks containing specified concepts."""

def graphrag_search(
    query: str,
    limit: int = 10,
    hop_depth: int = 2,
    min_cooccurrence: int = 2,
    include_direct_search: bool = True
) -> Dict[str, Any]:
    """Full GraphRAG search with graph traversal."""

def get_chunk_with_context(
    chunk_id: str,
    context_chunks: int = 1,
    use_parent: bool = True
) -> Dict[str, Any]:
    """Get chunk with surrounding context."""
```

### chunk_documents.py

```python
from typing import Any, Dict, List, Optional

class SemanticChunker:
    def __init__(
        self,
        min_tokens: int = 100,
        max_tokens: int = 1000,
        target_tokens: int = 500,
        similarity_threshold: float = 0.5,
        window_size: int = 3
    ):
        """Initialize semantic chunker."""

    def chunk_semantic(
        self,
        text: str,
        document_id: str,
        page_map: Optional[list] = None
    ) -> List[Dict[str, Any]]:
        """Create chunks using semantic boundary detection."""
```

### research_agent.py

```python
class ResearchAgent:
    def __init__(
        self,
        max_iterations: int = 5,
        min_results_per_query: int = 3,
        use_graphrag: bool = True,
        use_rerank: bool = True
    ):
        """Initialize research agent."""

    def research(self, question: str) -> ResearchSession:
        """Execute complete research session."""

    def plan_research(self, question: str) -> List[str]:
        """Break question into sub-queries."""

    def execute_search(self, query: str) -> List[SearchResult]:
        """Execute single search query."""

    def synthesize_findings(
        self,
        question: str,
        all_results: List[SearchResult]
    ) -> str:
        """Synthesize findings into report."""
```

---

## Troubleshooting

### "sentence-transformers not installed"

```bash
pip install sentence-transformers
```

### "Model download failed"

```bash
# Manually download models
python -c "from sentence_transformers import CrossEncoder; CrossEncoder('BAAI/bge-reranker-base')"
```

### "GraphRAG returning no results"

Check that concepts have been extracted:

```bash
python pipeline/extract_concepts.py
```

Verify concept co-occurrences exist:

```sql
SELECT COUNT(*) FROM chunk_concepts;
```

### "Research Agent hanging"

Check LLM connectivity:

```bash
# For OpenAI
curl https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY"

# For Ollama
curl http://localhost:11434/api/tags
```

### "Semantic chunking falling back to standard"

The semantic chunker requires sentence-transformers. Check installation:

```bash
python -c "from sentence_transformers import SentenceTransformer; print('OK')"
```

---

## Performance Tips

1. **Re-rank a larger pool**: Raise `initial_fetch` (e.g., to 100) for better recall; re-ranking latency grows with the number of candidates
2. **Limit GraphRAG depth**: hop_depth=2 is usually sufficient; 3+ is slow
3. **Cache embeddings**: Semantic chunker reuses embeddings if run multiple times
4. **Use hierarchical chunks**: Parent chunks provide better context than neighbors
5. **Index tuning**: For large datasets, increase the number of IVFFlat `lists` in the vector index, and raise `ivfflat.probes` at query time to recover recall
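
pgvector's documentation suggests starting with `lists` of about rows / 1000 for up to 1M rows and sqrt(rows) beyond that, and `ivfflat.probes` of about sqrt(lists) at query time. The helper below only applies that arithmetic; it assumes the framework stores embeddings in pgvector (implied by the IVFFlat mention), and its name is hypothetical.

```python
import math


def ivfflat_starting_params(row_count: int) -> dict:
    """Starting `lists` / `probes` values from pgvector's IVFFlat guidance."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)     # about rows / 1000 up to 1M rows
    else:
        lists = int(math.sqrt(row_count))     # about sqrt(rows) beyond 1M rows
    probes = max(1, round(math.sqrt(lists)))  # query-time probes: about sqrt(lists)
    return {"lists": lists, "probes": probes}


print(ivfflat_starting_params(500_000))
# {'lists': 500, 'probes': 22}
```

The values are applied with `CREATE INDEX ... USING ivfflat (...) WITH (lists = N)` and `SET ivfflat.probes = M`; the table, column, and operator class depend on your schema, and changing `lists` requires rebuilding the index.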

---

## Relation to Book Workflow

The 5-phase [Book Research Workflow](BOOK_RESEARCH_GUIDE.md) builds directly on the Advanced RAG layer:

| Book Workflow Phase | Advanced RAG Features Used |
|---------------------|----------------------------|
| Phase 1: Initial Research | Research Agent, GraphRAG, Re-ranking |
| Phase 2: Gap Analysis | Smart Context Selection |
| Phase 3: Gap Filling | Web Search Integration, Re-ranking |
| Phase 4: Synthesis | Semantic Chunking, Context Selection |
| Phase 5: Draft Generation | All retrieval features |

For workflow orchestration, see [Book Research Guide](BOOK_RESEARCH_GUIDE.md).

---

## Next Steps

- [Book Research Guide](BOOK_RESEARCH_GUIDE.md) - 5-phase book workflow
- [CLI User Guide](CLI_USER_GUIDE.md) - Full command reference
- [Developer Guide](DEVELOPER_GUIDE.md) - Extending the framework
- [Database Schema](DATABASE_SCHEMA.md) - Understanding the data model
