# Future Enhancements: Architectural Review

> **ARCHIVED: 2025-12-14**
>
> All enhancements in this document have been implemented. This file is preserved for historical reference.
>
> | Enhancement | Status | Implementation Date |
> |-------------|--------|---------------------|
> | Opt-in Generation Model | ✅ Complete | 2025-12-14 |
> | `rdf context` command | ✅ Complete | 2025-12-14 |
> | `rdf capabilities` command | ✅ Complete | 2025-12-14 |
> | Queue feedback enhancement | ✅ Complete | 2025-12-14 |
> | `rdf entity` suite | ✅ Complete | 2025-12-14 |
> | BibTeX export | ✅ Complete | 2025-12-14 |

---

Comprehensive analysis of architectural feedback and proposed improvements for the Research Development Framework.

---

## Table of Contents

1. [Current System Inventory](#1-current-system-inventory)
2. [Issue 1: Split Brain Architecture](#2-issue-1-split-brain-architecture)
3. [Issue 2: State Recovery & Context Loading](#3-issue-2-state-recovery--context-loading)
4. [Issue 3: Governance Layer - Queue Feedback](#4-issue-3-governance-layer---queue-feedback)
5. [Issue 4: Entity Management & Resolution](#5-issue-4-entity-management--resolution)
6. [Issue 5: Documentation & Citations](#6-issue-5-documentation--citations)
7. [Issue 6: Meta-Tool for Agent Bootstrapping](#7-issue-6-meta-tool-for-agent-bootstrapping)
8. [Implementation Matrix](#8-implementation-matrix)
9. [Solution to Issue 1: Opt-in Generation Model](#9-solution-to-issue-1-opt-in-generation-model)

---

## 1. Current System Inventory

### 1.1 Complete Command List

| Command | Category | Current Behavior | Uses LLM? |
|---------|----------|------------------|-----------|
| `rdf ingest <path>` | Data | Extracts text, chunks, embeds, stores | Yes (embeddings only) |
| `rdf search "<query>"` | Data | Semantic + keyword search | Yes (embeddings) |
| `rdf fetch <chunk_id>` | Data | Returns chunk text with metadata | No |
| `rdf research "<question>"` | Action | Multi-iteration autonomous research | Yes (gpt-4o-mini) |
| `rdf quotes "<topic>"` | Data | Extracts evidence passages | No |
| `rdf outline "<topic>"` | Action | Generates book outline | Yes (gpt-4o) |
| `rdf book <outline.yaml>` | Action | Full book workflow (5 phases) | Yes (gpt-4o for drafts) |
| `rdf essay "<topic>"` | Action | Essay generation | No (returns placeholders) |
| `rdf write "<topic>"` | Action | Universal entry point | Depends on target |
| `rdf validate <file>` | Governance | Claim verification against sources | Yes (gpt-4o-mini) |
| `rdf polish <file>` | Action | Style refinement | Yes (gpt-4o direct) |
| `rdf queue <action>` | Governance | Review queue management | No |
| `rdf status` | Utility | Workflow state display | No |
| `rdf health` | Utility | Library health scan | No |
| `rdf edit-meta <DOC_ID>` | Data | Metadata corrections | No |
| `rdf assess <DOC_ID>` | Data | Document quality assessment | No |
| `rdf graph query "<concept>"` | Data | Knowledge graph queries | No |
| `rdf diff <a> <b>` | Utility | File comparison | No |
| `rdf config [action]` | Utility | Configuration management | No |

### 1.2 LLM Usage Summary

**Commands that call LLMs internally:**

| Command | LLM Backend | Model | Cost Tracking | Purpose |
|---------|-------------|-------|---------------|---------|
| `rdf research` | LLMInterface | gpt-4o-mini | Yes | Synthesize findings |
| `rdf outline` | LLMInterface | gpt-4o | Yes | Generate structure |
| `rdf book` (Phase 5) | LLMInterface | gpt-4o | Yes | Draft chapters |
| `rdf validate` | LLMInterface | gpt-4o-mini | Yes | Verify claims |
| `rdf polish` | Direct OpenAI | gpt-4o | No | Style refinement |

---

## 2. Issue 1: Split Brain Architecture

### 2.1 The Problem Explained

The documentation establishes a principle: **"RDF does data operations, Claude Code does thinking."** However, the codebase currently violates this by allowing the CLI to autonomously generate prose using its own internal LLM (OpenAI), creating a "Split Brain" scenario where the Agent (Claude) and the Tool (CLI) compete for authorship.

```
┌─────────────────────────────────────────────────────────────────┐
│                     TEXT GENERATION SOURCES                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Claude Code (Opus 4.5)          CLI Internal (gpt-4o)          │
│  ─────────────────────           ─────────────────────          │
│  • Essay content (expected)      • Book chapter drafts          │
│  • Revision decisions            • Research synthesis           │
│  • Editorial judgment            • Outline generation           │
│  • Gap analysis                  • Polish refinements           │
│                                  • Validation checks            │
│                                                                  │
│           WHO IS THE AUTHOR?                                     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```

**The Inconsistencies:**

1. **Essay vs Book:** `rdf essay` expects Claude Code to write the prose, but `rdf book` generates chapter drafts internally
2. **Two OpenAI Clients:** `polish_draft.py` creates its own client; `research_agent.py` uses LLMInterface
3. **Model Inconsistency:** Some tools use gpt-4o, others gpt-4o-mini, with no clear rationale exposed to the agent

### 2.2 Resolution Strategy

We will adopt an **"Opt-in Generation"** model. The internal LLMs will remain available for utility tasks (GraphRAG, sorting, analysis), but prose generation will default to Claude Code unless explicitly delegated.

**See [Section 9](#9-solution-to-issue-1-opt-in-generation-model) for the detailed implementation plan.**

---

## 3. Issue 2: State Recovery & Context Loading

### 3.1 The Problem

When Claude Code resumes a session, it must reconstruct context by running `rdf status` (which returns only metadata) and then reading several state files (e.g., `project.json`, `checkpoint.json`, `gaps.md`). Reassembling context this way takes 5+ file reads plus manual stitching, which is token-expensive and error-prone.

### 3.2 Option A: New `rdf context` Command

**Description:** Single command that returns a "Context Packet" optimized for agent warm-start.

**Implementation:**

```bash
rdf context BOOK_xxx --format json
```

**Returns:**

```json
{
  "meta": {
    "project_id": "BOOK_xxx",
    "last_activity": "2025-12-12T15:23:10"
  },
  "position": {
    "current_phase": 3,
    "phase_name": "Gap Resolution"
  },
  "active_context": {
    "last_synthesis": "The Kilwinning Lodge, claimed to be...[500 words]...",
    "pending_decisions": [
      { "type": "gap", "query": "Schaw Statutes" }
    ]
  },
  "blocking_items": {
    "queue_items": 3
  }
}
```
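On the agent side, the packet can be turned into a short warm-start briefing in a single pass. A minimal Python sketch, assuming the JSON shape shown above; the `summarize_context` name and the ordering rule (clear blocking queue items before pending gaps) are illustrative assumptions, not part of RDF:

```python
import json


def summarize_context(packet_json: str) -> list:
    """Turn an `rdf context` packet into an ordered list of resume hints.

    Hypothetical helper: field names follow the example packet; the ordering
    rule (blocking queue items before pending decisions) is an assumption.
    """
    packet = json.loads(packet_json)
    position = packet.get("position", {})
    active = packet.get("active_context", {})
    blocking = packet.get("blocking_items", {})

    hints = [
        f"Project {packet['meta']['project_id']} is in phase "
        f"{position.get('current_phase')} ({position.get('phase_name')})."
    ]
    # Blocking items come first: downstream work waits on their review.
    queued = blocking.get("queue_items", 0)
    if queued:
        hints.append(f"Review {queued} queued item(s) before continuing.")
    # Then surface each pending decision recorded in the packet.
    for decision in active.get("pending_decisions", []):
        hints.append(f"Resolve {decision['type']}: {decision['query']}")
    return hints
```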

### 3.3 Recommendation

**Implement Option A (`rdf context`)**. It acts as a "Save Game" loader for the agent: a single call replaces `rdf status` and the follow-up file reads.

---

## 4. Issue 3: Governance Layer - Queue Feedback

### 4.1 The Problem

The binary Approve/Reject model is too simple for nuanced research decisions. If Claude Code identifies a gap but wants to impose constraints (e.g., "Reject this gap, but search for X instead using only academic sources"), the current `rdf queue reject` command only accepts a reason code.

### 4.2 Option A: Structured Feedback Field

**Description:** Add a `feedback` field with optional structured constraints.

**Implementation:**

```bash
rdf queue reject GAP_045 --feedback "Use web search with constraints: academic sources only"
```
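One way to keep the feedback both human-readable and machine-usable is to store the free text verbatim and extract any constraints from it. A minimal Python sketch, assuming a convention where constraints follow the word `constraints:` and are separated by commas or semicolons; `QueueItem`, `parse_constraints`, `reject`, and the `out_of_scope` reason code are hypothetical, not RDF's queue schema:

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueueItem:
    """Hypothetical review-queue record; field names are illustrative."""
    item_id: str
    status: str = "pending"
    reason: Optional[str] = None
    feedback: Optional[str] = None
    constraints: List[str] = field(default_factory=list)


def parse_constraints(feedback: str) -> List[str]:
    """Extract constraints from the text after 'constraints:' (assumed convention)."""
    match = re.search(r"constraints:\s*(.+)", feedback, flags=re.IGNORECASE)
    if not match:
        return []
    # Split on commas or semicolons and drop empty fragments.
    return [part.strip() for part in re.split(r"[;,]", match.group(1)) if part.strip()]


def reject(item: QueueItem, reason: str, feedback: Optional[str] = None) -> QueueItem:
    """Reject an item, keeping the free-text feedback and any parsed constraints."""
    item.status = "rejected"
    item.reason = reason
    if feedback:
        item.feedback = feedback
        item.constraints = parse_constraints(feedback)
    return item
```

Keeping the original string alongside the parsed list means nothing is lost if the parser misses a constraint; the research step can still read the raw feedback.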

### 4.3 Recommendation

**Implement Option A** to allow richer steering of the research process without requiring complex re-queueing logic in the first iteration.

---

## 5. Issue 4: Entity Management & Resolution

### 5.1 The Duplicate Problem

In historical/esoteric research, entity fragmentation is severe (e.g., "Rudolf Steiner" vs "R. Steiner"). The current system extracts entities but offers no CLI commands to manage, merge, or alias them, so variant names persist as separate nodes in the Knowledge Graph.

### 5.2 Option A: Full `rdf entity` Command Suite

**Commands:**

- `rdf entity duplicates`: Find potential duplicates
- `rdf entity merge <primary_id> <duplicate_id>`: Merge entities
- `rdf entity alias`: Create alias mappings
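String similarity alone handles abbreviated names poorly: `difflib`'s ratio for `"Rudolf Steiner"` vs `"R. Steiner"` is only 0.75. A duplicate finder therefore usually combines a similarity score with a name-structure rule. A minimal Python sketch using only the standard library; the function names, the initial-matching rule, and the 0.85 threshold are assumptions for illustration, not necessarily how `rdf entity duplicates` works:

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Tuple


def _tokens(name: str) -> List[str]:
    # Lowercase and turn periods into spaces so "R. Steiner" -> ["r", "steiner"].
    return name.lower().replace(".", " ").split()


def initials_compatible(a: str, b: str) -> bool:
    """True if surnames match and given names agree, treating initials as wildcards."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb or ta[-1] != tb[-1]:
        return False
    # Compare given names pairwise; an initial matches any name sharing its first letter.
    for x, y in zip(ta[:-1], tb[:-1]):
        if len(x) == 1 or len(y) == 1:
            if x[0] != y[0]:
                return False
        elif x != y:
            return False
    return True


def find_duplicates(names: List[str], threshold: float = 0.85) -> List[Tuple[str, str]]:
    """Return candidate duplicate pairs for human review, not for automatic merging."""
    pairs = []
    for a, b in combinations(names, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if initials_compatible(a, b) or ratio >= threshold:
            pairs.append((a, b))
    return pairs
```

With `["Rudolf Steiner", "R. Steiner", "Marie Steiner"]`, only the first pair is flagged: the shared surname is not enough when the given names conflict.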

### 5.3 Recommendation

**Implement Option A** for functional management, but pair it with a **Queue-Based Review** step for safety, so that merges are approved before they are applied.
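A queue-based safeguard can be as simple as having a merge create a proposal and apply it only on approval, recording the duplicate as an alias rather than deleting it so the change stays reversible. A minimal Python sketch; `EntityStore`, the proposal dict shape, and the alias-chain resolution are hypothetical illustrations, not RDF's entity schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EntityStore:
    """Hypothetical in-memory stand-in for the entity tables."""
    names: Dict[str, str]                                  # entity_id -> display name
    aliases: Dict[str, str] = field(default_factory=dict)  # duplicate_id -> primary_id
    review_queue: List[dict] = field(default_factory=list)


def resolve(store: EntityStore, entity_id: str) -> str:
    """Follow alias links until reaching a canonical (non-aliased) id."""
    while entity_id in store.aliases:
        entity_id = store.aliases[entity_id]
    return entity_id


def propose_merge(store: EntityStore, primary_id: str, duplicate_id: str) -> dict:
    """Queue a merge instead of applying it, so a reviewer can approve or reject."""
    proposal = {"type": "entity_merge", "primary": primary_id,
                "duplicate": duplicate_id, "status": "pending"}
    store.review_queue.append(proposal)
    return proposal


def approve_merge(store: EntityStore, proposal: dict) -> None:
    """Apply an approved merge by aliasing the duplicate to the primary."""
    if resolve(store, proposal["primary"]) == proposal["duplicate"]:
        raise ValueError("merge would create an alias cycle")
    proposal["status"] = "approved"
    # The duplicate's record is kept; only an alias link is added.
    store.aliases[proposal["duplicate"]] = proposal["primary"]


# Example: queue a merge of "R. Steiner" into "Rudolf Steiner", then approve it.
store = EntityStore(names={"E1": "Rudolf Steiner", "E2": "R. Steiner"})
proposal = propose_merge(store, "E1", "E2")
approve_merge(store, proposal)
```

Because old ids still resolve through the alias map, existing chunk and graph references to `E2` keep working after the merge.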

---

## 6. Issue 5: Documentation & Citations

### 6.1 The Problem

While the system tracks sources and generates a markdown bibliography, academic users require standard formats like BibTeX or RIS for integration with reference managers (Zotero, Mendeley).

### 6.2 Recommendation

Add bibliography support to `rdf export`:

```bash
rdf export bibliography --project BOOK_xxx --format bibtex --output refs.bib
```
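A BibTeX entry is a typed record (`@book`, `@article`) with a citation key and brace-delimited fields, so the exporter mainly needs to map each tracked source onto those fields. A minimal Python sketch, assuming source records are dicts with `authors` (as "Last, First" strings), `title`, `year`, and optional `journal`/`publisher`; these field names and the surname-plus-year key scheme are hypothetical, not RDF's internal source schema:

```python
import re


def bibtex_key(source: dict) -> str:
    """Citation key from the first author's surname plus year, e.g. 'doe1999' (assumed scheme)."""
    surname = source["authors"][0].split(",")[0]
    return re.sub(r"[^a-z0-9]", "", surname.lower()) + str(source["year"])


def escape(value: str) -> str:
    """Backslash-escape characters that LaTeX treats specially."""
    return re.sub(r"([&%$#_])", r"\\\1", value)


def to_bibtex(source: dict) -> str:
    """Render one source record as a BibTeX @article (if it has a journal) or @book entry."""
    entry_type = "article" if source.get("journal") else "book"
    fields = {
        "author": " and ".join(source["authors"]),
        "title": source["title"],
        "year": str(source["year"]),
        "journal": source.get("journal"),
        "publisher": source.get("publisher"),
    }
    lines = [f"@{entry_type}{{{bibtex_key(source)},"]
    for name, value in fields.items():
        if value:  # Skip optional fields that are missing or empty.
            lines.append(f"  {name} = {{{escape(value)}}},")
    lines.append("}")
    return "\n".join(lines)
```

A production exporter would also need to disambiguate colliding keys (two works by the same author in one year) and handle non-ASCII names, both of which this sketch ignores.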

---

## 7. Issue 6: Meta-Tool for Agent Bootstrapping

### 7.1 The Problem

When Claude Code starts a session, it has no efficient way to discover the RDF toolset's full capabilities without reading multiple token-heavy markdown files.

### 7.2 Recommendation

Implement `rdf capabilities --format json` to return a compressed capability manifest (commands, inputs, outputs, error codes) optimized for agent consumption.
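The manifest's value is token economy: one compact JSON document instead of several markdown files. A minimal Python sketch of the idea; the manifest layout, the per-command fields, and the error codes are hypothetical placeholders (only the command names and their LLM usage come from the inventory in Section 1):

```python
import json

# Hypothetical manifest: command names and LLM usage follow Section 1;
# the args/output fields and the error codes are illustrative placeholders.
CAPABILITIES = {
    "version": 1,
    "commands": {
        "search": {"args": ["query"], "output": "ranked chunks", "llm": "embeddings"},
        "fetch": {"args": ["chunk_id"], "output": "chunk text and metadata", "llm": None},
        "status": {"args": [], "output": "workflow state", "llm": None},
        "validate": {"args": ["file"], "output": "claim report", "llm": "gpt-4o-mini"},
    },
    "error_codes": {"E_NOT_FOUND": "unknown id", "E_BUDGET": "internal LLM budget exceeded"},
}


def render_manifest(manifest: dict) -> str:
    """Serialize compactly: no indentation and no spaces after separators, to save tokens."""
    return json.dumps(manifest, separators=(",", ":"), sort_keys=True)
```

Sorting keys makes the output stable across runs, so an agent (or a cache) can detect when the toolset has actually changed.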

---

## 8. Implementation Matrix

### 8.1 Effort Estimates

| Implementation | Files Changed | New Files | Estimated Lines of Code |
|----------------|---------------|-----------|-----------------|
| Opt-in Generation | 3 | 0 | ~100 |
| `rdf context` | 2 | 1 | ~250 |
| Queue feedback | 2 | 0 | ~150 |
| `rdf entity` suite | 2 | 1 | ~400 |
| BibTeX export | 2 | 1 | ~200 |
| `rdf capabilities` | 2 | 1 | ~200 |

### 8.2 Priority Order

| Priority | Enhancement | Rationale | Status |
|----------|-------------|-----------|--------|
| **1** | Opt-in Generation | Resolves core architectural conflict | **Complete** |
| **2** | `rdf context` | Critical for agent session resumption | **Complete** |
| **3** | `rdf capabilities` | Improves agent bootstrapping | **Complete** |
| **4** | Queue feedback | Enhances research steering | **Complete** |
| **5** | `rdf entity` suite | Knowledge graph maintenance | **Complete** |
| **6** | BibTeX export | Academic user value-add | **Complete** |

---

## 9. Solution to Issue 1: Opt-in Generation Model

This implementation resolves the "Split Brain" architecture by strictly defining the roles of the **Agent (Claude)** and the **Tool (CLI)**.

### 9.1 The Strategy: "Claude-First, Tool-Assisted"

**Default Behavior:** `rdf essay` and `rdf book` return outlines, research data, and context structures. The actual writing is left to Claude Code.

**Opt-in Behavior:** The user (or Agent) can explicitly delegate specific tasks (polishing, drafting) to the CLI's internal LLMs using flags.

**Internal Role:** The CLI's internal LLMs default to "Utility" tasks (GraphRAG, sorting, analysis) rather than creative generation.

```
┌─────────────────────────────────────────────────────────────────┐
│                   OPT-IN GENERATION MODEL                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  DEFAULT MODE (Claude-First)                                     │
│  ───────────────────────────                                     │
│                                                                  │
│  Claude Code                          RDF CLI                    │
│  ───────────                          ───────                    │
│  • Writes all prose                   • Returns research data    │
│  • Makes editorial decisions          • Provides outlines        │
│  • Controls voice/style               • Handles GraphRAG/search  │
│  • Performs revisions                 • Validates claims         │
│                                                                  │
│  OPT-IN MODE (Delegated)                                         │
│  ───────────────────────                                         │
│                                                                  │
│  Claude Code                          RDF CLI                    │
│  ───────────                          ───────                    │
│  • Reviews output                     • Writes draft (gpt-4o)    │
│  • Approves/edits                     • Polishes text (gpt-4o)   │
│  • Maintains oversight                • Heavy lifting tasks      │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```

### 9.2 Configuration Updates (project.yaml)

Add a `generation` section to explicitly control this behavior:

```yaml
generation:
  # Default behavior: Does the CLI write text automatically?
  # "agent" = Claude Code writes (CLI returns data)
  # "cli"   = CLI writes (using internal LLMs)
  default_author: "agent"

  # Available models for opt-in tasks
  models:
    primary: "gpt-4o"           # For heavy lifting (polishing, drafting if delegated)
    utility: "gpt-4o-mini"      # For GraphRAG, sorting, analysis

  # Cost control for internal models
  cost_tracking: true
  max_budget_usd: 10.00
```
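Section 9.6 lists `pipeline/config.py` as the place where this section is loaded. A minimal Python sketch of what that loading might look like once the YAML has been parsed into a dict (for example with PyYAML's `yaml.safe_load`); the class name, the `default_author` validation, and the `check_budget` guard are assumptions for illustration, not the actual contents of `pipeline/config.py`:

```python
from dataclasses import dataclass


@dataclass
class GenerationConfig:
    """Typed view of the `generation:` block; defaults mirror the YAML above."""
    default_author: str = "agent"
    primary_model: str = "gpt-4o"
    utility_model: str = "gpt-4o-mini"
    cost_tracking: bool = True
    max_budget_usd: float = 10.00

    @classmethod
    def from_dict(cls, raw: dict) -> "GenerationConfig":
        """Build from the parsed project.yaml mapping, falling back to defaults."""
        gen = raw.get("generation", {})
        models = gen.get("models", {})
        author = gen.get("default_author", cls.default_author)
        if author not in ("agent", "cli"):
            raise ValueError(f"default_author must be 'agent' or 'cli', got {author!r}")
        return cls(
            default_author=author,
            primary_model=models.get("primary", cls.primary_model),
            utility_model=models.get("utility", cls.utility_model),
            cost_tracking=bool(gen.get("cost_tracking", cls.cost_tracking)),
            max_budget_usd=float(gen.get("max_budget_usd", cls.max_budget_usd)),
        )

    def check_budget(self, spent_usd: float, estimated_usd: float) -> None:
        """Refuse an internal-model call that would push total spend past the budget."""
        if self.cost_tracking and spent_usd + estimated_usd > self.max_budget_usd:
            raise RuntimeError(
                f"internal LLM budget exceeded: "
                f"{spent_usd + estimated_usd:.2f} > {self.max_budget_usd:.2f} USD"
            )
```

Failing fast on an unknown `default_author` keeps a typo in `project.yaml` from silently switching authorship to the CLI.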

### 9.3 CLI Command Behavior Updates

#### A. Polishing (Opt-in)

The user can ask Claude: *"Please polish this chapter using the OpenAI model defined in the config."*

Claude translates this to:

```bash
# Explicitly uses the internal model defined in project.yaml
./rdf polish chapter_01.md --use-internal-model
```

| Mode | Behavior |
|------|----------|
| **Without Flag** | Returns a linting/suggestion report; Claude performs the edits |
| **With Flag** | Returns the rewritten text generated by the configured `primary` model (gpt-4o) |
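The two modes reduce to a single branch on the flag. A minimal Python sketch of that dispatch; `build_parser`, `polish`, the toy `suggestion_report` (which only flags lines longer than 120 characters), and the injected `rewrite` callable standing in for the gpt-4o call are all hypothetical, not the code in `pipeline/polish_draft.py`:

```python
import argparse
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    """CLI surface for the polish command; only the flag name comes from the docs."""
    parser = argparse.ArgumentParser(prog="rdf polish")
    parser.add_argument("file")
    parser.add_argument("--use-internal-model", action="store_true",
                        help="delegate the rewrite to the configured primary model")
    return parser


def suggestion_report(text: str) -> str:
    """Toy stand-in for the default linting report: lists overlong line numbers."""
    long_lines = [i for i, line in enumerate(text.splitlines(), 1) if len(line) > 120]
    return f"{len(long_lines)} overlong line(s): {long_lines}"


def polish(text: str, use_internal_model: bool, rewrite: Callable[[str], str]) -> str:
    if use_internal_model:
        # Opt-in: the CLI authors the rewritten text via the internal model.
        return rewrite(text)
    # Default: return suggestions only; the agent performs the edits.
    return suggestion_report(text)
```

Injecting `rewrite` keeps the default path free of any API client, which is the point of the opt-in model: no internal LLM call happens unless the flag is set.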

#### B. Drafting (Book/Essay)

When running a book workflow:

**Standard Mode (Default):**

```bash
./rdf book outline.yaml --phase 5
```

- **Result:** Returns a JSON packet containing the research, outline, and chapter brief
- **Action:** Claude Code takes this JSON and writes the chapter prose itself

**Delegated Mode (Opt-in):**

```bash
./rdf book outline.yaml --phase 5 --delegate-drafting
```

- **Result:** The CLI uses gpt-4o to write the draft internally based on the research
- **Action:** Claude Code receives the completed markdown file to review
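Both modes can share the same packet-building step and differ only in who consumes the packet. A minimal Python sketch under stated assumptions: the `research`, `outline`, and `chapter_brief` keys follow the workflow diagram in Section 9.4, but the brief's contents, the outline's `chapters` list, the `run_phase5` name, and the injected `draft_fn` (standing in for the internal gpt-4o call) are hypothetical, not the code in `pipeline/book_workflow_phases.py`:

```python
import json
from typing import Callable, Optional


def run_phase5(outline: dict, research: list, chapter: int,
               draft_fn: Optional[Callable[[dict], str]] = None) -> str:
    """Return a JSON brief for the agent, or a markdown draft when drafting is delegated.

    Hypothetical sketch: `draft_fn` is supplied only when --delegate-drafting is set.
    """
    packet = {
        "research": research,
        "outline": outline,
        "chapter_brief": {
            "chapter": chapter,
            "title": outline["chapters"][chapter - 1]["title"],  # 1-based chapter number
            "source_count": len(research),
        },
    }
    if draft_fn is not None:
        # Delegated mode: the CLI writes the prose; the agent reviews the result.
        return draft_fn(packet)
    # Standard mode: hand the structured packet to the agent to write from.
    return json.dumps(packet)
```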

### 9.4 Workflow Comparison

```
┌─────────────────────────────────────────────────────────────────┐
│                    STANDARD MODE (Default)                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  User: "Write chapter 3"                                         │
│           │                                                      │
│           ▼                                                      │
│  Claude Code: "I'll gather the research first"                   │
│           │                                                      │
│           ▼                                                      │
│  ./rdf book outline.yaml --phase 5                               │
│           │                                                      │
│           ▼                                                      │
│  CLI returns: { research, outline, chapter_brief }               │
│           │                                                      │
│           ▼                                                      │
│  Claude Code: [Writes chapter prose using the brief]             │
│           │                                                      │
│           ▼                                                      │
│  Output: Chapter draft in Claude's voice                         │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                    DELEGATED MODE (Opt-in)                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  User: "Write chapter 3, use the internal model"                 │
│           │                                                      │
│           ▼                                                      │
│  Claude Code: "I'll delegate drafting to the CLI"                │
│           │                                                      │
│           ▼                                                      │
│  ./rdf book outline.yaml --phase 5 --delegate-drafting           │
│           │                                                      │
│           ▼                                                      │
│  CLI: [gpt-4o writes chapter from research]                      │
│           │                                                      │
│           ▼                                                      │
│  Claude Code: [Reviews and edits the draft]                      │
│           │                                                      │
│           ▼                                                      │
│  Output: Chapter draft reviewed by Claude                        │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```

### 9.5 Benefits

| Benefit | Description |
|---------|-------------|
| **Architecture Alignment** | Restores the "Agent = Brain, CLI = Tool" principle |
| **Cost Control** | "Hidden" API costs from the CLI are now explicit and opt-in |
| **Voice Consistency** | Claude Code maintains consistent voice across all prose |
| **Flexibility** | Allows use of internal models for bulk tasks without forcing them for creative work |
| **Debugging Clarity** | Clear ownership of output: either Claude wrote it, or the CLI did and the output is flagged |

### 9.6 Implementation Checklist

| Task | File(s) | Status |
|------|---------|--------|
| Add `generation:` section to config | `config/project.yaml` | **Complete** |
| Load generation config | `pipeline/config.py` | **Complete** |
| Add `--use-internal-model` to polish | `pipeline/polish_draft.py` | **Complete** |
| Add `--delegate-drafting` to book | `pipeline/book_workflow_phases.py`, `pipeline/book.py` | **Complete** |
| Modify essay.py to return data structure | `pipeline/essay.py` | **Complete** |
| Update documentation | `docs/CLI_USER_GUIDE.md` | **Complete** |
