# Research Development Framework

A CLI + data substrate for research document management, optimized for Claude Code orchestration. RDF handles data operations (ingest/search/extract/validate/govern) while Claude Code serves as the reasoning engine (planning, judgment, synthesis).

---

## Table of Contents

1. [System Overview](#system-overview)
2. [Core Workflows](#core-workflows)
3. [Data & Artifact Model](#data--artifact-model)
4. [CLI Tool Registry](#cli-tool-registry)
5. [JSON Output Contracts](#json-output-contracts)
6. [Configuration](#configuration)
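7. [Review Queues & Governance](#review-queues--governance)
8. [Complete Tool Reference](#complete-tool-reference)
9. [Appendix: Outline Schema](#appendix-outline-schema)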

---

## System Overview

RDF transforms a directory of documents into searchable memory and auditable writing outputs.

### Division of Responsibility

| Layer | Owner | Responsibilities |
|-------|-------|------------------|
| **Reasoning** | Claude Code | Plan workflows, pick strategies, interpret results, evaluate quality, decide gap resolution |
| **Data Substrate** | RDF Tools | Ingest/extract/OCR, chunk + embed, store + retrieve, validate claims, enforce review gates |

### Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           CLAUDE CODE (Agent)                                │
│     Planning  │  Judgment  │  Synthesis  │  Quality Evaluation              │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              RDF CLI LAYER                                   │
│   rdf write  │  rdf book  │  rdf essay  │  rdf research  │  rdf validate    │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           DATA SUBSTRATE                                     │
│   Ingest/OCR  │  Chunk  │  Embed  │  Search  │  Quote Extract  │  Validate  │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              STORAGE                                         │
│              PostgreSQL + pgvector  │  File System (artifacts)              │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Key Principles

1. **CLI + JSON only** — No TUI, no interactive prompts
2. **Agent-as-brain** — RDF doesn't orchestrate LLM calls for reasoning
3. **Stateless UX, stateful artifacts** — Continuity via `workflow_state.json` + resume tokens
4. **Minimal provenance** — Enough for citations/validation, not enterprise audit

---

## Core Workflows

### Quick Start

**Write anything:**

```bash
# System infers essay vs book from parameters
rdf write "Freemasonry in the 1800s" --pages 20              # → essay
rdf write "Freemasonry in the 1800s" --chapters 5 --pages 100  # → book
```
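The two examples suggest that passing `--chapters` is what selects the book workflow. A minimal sketch of that rule follows; `infer_workflow` is a hypothetical name, and the rule itself is an assumption drawn from the examples, not RDF's documented logic.

```python
from typing import Optional


def infer_workflow(pages: Optional[int] = None, chapters: Optional[int] = None) -> str:
    """Guess the workflow from CLI parameters (assumed rule, not RDF's actual code)."""
    # Assumption: an explicit chapter count means the caller wants a book.
    if chapters is not None and chapters > 0:
        return "book"
    # Otherwise fall back to the single-document essay workflow.
    return "essay"


print(infer_workflow(pages=20))               # essay
print(infer_workflow(pages=100, chapters=5))  # book
```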

### Book Workflow (3 Steps)

```bash
# 1. Generate outline from idea
rdf outline "Freemasonry during the 1800s" --chapters 5 --pages 100

# 2. Research and draft (supervised pauses after research + draft)
rdf book projects/books/BOOK_xxx/outline.yaml --autonomy supervised

# 3. Validate and polish each chapter
rdf validate projects/books/BOOK_xxx/chapter_01.md --format json
rdf polish projects/books/BOOK_xxx/chapter_01.md --preset academic
```

### Essay Workflow (1 Command)

```bash
# Full autonomy - runs to completion
rdf essay "Freemasonry in the 1800s" --pages 20 --evidence-first

# Output: projects/essays/ESSAY_xxx/essay_polished.md
```

### Autonomy Levels

| Level | Flag | Checkpoints | Best For |
|-------|------|-------------|----------|
| **Full** | `--autonomy full` | None (errors only) | "Just write it" |
| **Supervised** | `--autonomy supervised` | After research, after draft | Balanced oversight |
| **Interactive** | `--autonomy interactive` | All checkpoints | Careful, iterative work |

**Defaults:** `full` for essays, `supervised` for books.
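The table and defaults can be read as a small decision rule. The sketch below is illustrative only: `resolve_autonomy` and `should_pause` are hypothetical helpers, `post_research` is the checkpoint name used in the pause contract, and `post_draft` is an assumed name for the after-draft checkpoint.

```python
from typing import Optional

# Defaults stated above: full for essays, supervised for books.
DEFAULT_AUTONOMY = {"essay": "full", "book": "supervised"}

# "post_research" appears in the pause contract; "post_draft" is an assumed name.
SUPERVISED_CHECKPOINTS = {"post_research", "post_draft"}


def resolve_autonomy(workflow: str, flag: Optional[str] = None) -> str:
    """Return the explicit --autonomy value, or the workflow default."""
    level = flag or DEFAULT_AUTONOMY[workflow]
    if level not in ("full", "supervised", "interactive"):
        raise ValueError(f"unknown autonomy level: {level}")
    return level


def should_pause(level: str, checkpoint: str) -> bool:
    """Decide whether a checkpoint pauses the run at this autonomy level."""
    if level == "full":
        return False   # errors only, no review pauses
    if level == "interactive":
        return True    # every checkpoint pauses
    return checkpoint in SUPERVISED_CHECKPOINTS
```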

---

## Data & Artifact Model

### ID Standards

| Artifact | Format | Example |
|----------|--------|---------|
| Document | `DOC_{NNN}` | `DOC_023` |
| Chunk | `chunk_{NNN}_{index}` | `chunk_023_045` |
| Book workflow | `BOOK_{timestamp}_{hash}` | `BOOK_20251212134640_2904d1` |
| Essay workflow | `ESSAY_{timestamp}_{hash}` | `ESSAY_20251212_def456` |
| Web source | `WEB_{hash}` | `WEB_abc123` |
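An agent may want to tell these ID kinds apart before choosing a command. The patterns below are a sketch inferred from the examples in the table: the three-digit widths and lowercase-hex hashes are assumptions, and `classify_id` is a hypothetical helper.

```python
import re
from typing import Optional

# Patterns inferred from the example IDs; widths and hex alphabet are assumptions.
ID_PATTERNS = {
    "document": re.compile(r"DOC_\d{3}"),
    "chunk": re.compile(r"chunk_\d{3}_\d{3}"),
    "book": re.compile(r"BOOK_\d+_[0-9a-f]+"),
    "essay": re.compile(r"ESSAY_\d+_[0-9a-f]+"),
    "web": re.compile(r"WEB_[0-9a-f]+"),
}


def classify_id(value: str) -> Optional[str]:
    """Return the artifact kind for an ID, or None if no pattern matches."""
    for kind, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(value):
            return kind
    return None


for example in ("DOC_023", "chunk_023_045", "BOOK_20251212134640_2904d1"):
    print(example, "->", classify_id(example))
```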

### Minimal Provenance (Required Fields)

```json
{
  "doc_id": "DOC_023",
  "source_type": "library",
  "page_range": [44, 45],
  "chunk_id": "chunk_023_045",
  "extraction": "standard",
  "content_hash": "sha256:a1b2c3..."
}
```

| Field | Type | Description |
|-------|------|-------------|
| `doc_id` | string | Parent document ID |
| `source_type` | enum | `"library"` or `"web"` |
| `page_range` | array | `[start, end]` page numbers |
| `chunk_id` | string | Chunk identifier |
| `extraction` | enum | `"standard"`, `"academic_pdf"`, or `"ocr"` |
| `content_hash` | string | SHA-256 of chunk content |

### Used-In Trace (Optional, Recommended)

Links provenance to output artifacts for audit:

```json
{
  "prov": {
    "doc_id": "DOC_023",
    "page_range": [44, 45],
    "chunk_id": "chunk_023_045"
  },
  "used_in": {
    "file": "chapter_01.md",
    "line_start": 41,
    "line_end": 47
  }
}
```

### Deterministic Artifact Paths

```
projects/books/BOOK_xxx/
├── outline.yaml              # Project configuration (Phase 2)
├── workflow_state.json       # Current phase and resume tokens
├── scratchpad.md             # Cross-chapter notes (Phase 6-7)
│
├── research/                 # Phase 3: Raw research materials
│   ├── ch01_research.md
│   ├── ch02_research.md
│   └── quotes_collection.md
│
├── briefs/                   # Phase 5: Synthesized chapter briefs
│   ├── ch01_brief.md         # Evidence-mapped structure
│   ├── ch02_brief.md
│   └── ...
│
├── drafts/                   # Phase 6: First drafts
│   ├── chapter_01_draft.md
│   └── ...
│
├── revised/                  # Phase 7: After editing
│   ├── chapter_01_revised.md
│   └── ...
│
├── polished/                 # Phase 8: Final chapters
│   ├── chapter_01.md
│   └── ...
│
├── gaps.md                   # Gap analysis output
├── validation_report.json    # Claim verification
└── compiled/                 # Phase 9: Final manuscript
    ├── bibliography.md
    └── manuscript.md

projects/essays/ESSAY_xxx/
├── research_summary.md
├── quote_bank.json
├── essay_draft.md
├── essay_revised.md
├── validation_report.json
└── essay_polished.md
```

### Artifact Naming Rules

| Artifact | Pattern | Example |
|----------|---------|---------|
| Chapter research | `research/ch{NN}_research.md` | `ch01_research.md` |
| Chapter brief | `briefs/ch{NN}_brief.md` | `ch01_brief.md` |
| Chapter draft | `drafts/chapter_{NN}_draft.md` | `chapter_01_draft.md` |
| Chapter revised | `revised/chapter_{NN}_revised.md` | `chapter_01_revised.md` |
| Chapter polished | `polished/chapter_{NN}.md` | `chapter_01.md` |
| Gaps list | `gaps.md` | Always this name |
| Scratchpad | `scratchpad.md` | Always this name |
| Validation report | `validation_report.json` | Always this name |
| Workflow state | `workflow_state.json` | Contains resume tokens |
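Because the paths are deterministic, an agent can compute them instead of listing directories. A small sketch, assuming `{NN}` means a two-digit zero-padded chapter number; `chapter_artifact` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Patterns copied from the naming table; {n:02d} assumes two-digit zero padding.
CHAPTER_PATTERNS = {
    "research": "research/ch{n:02d}_research.md",
    "brief": "briefs/ch{n:02d}_brief.md",
    "draft": "drafts/chapter_{n:02d}_draft.md",
    "revised": "revised/chapter_{n:02d}_revised.md",
    "polished": "polished/chapter_{n:02d}.md",
}


def chapter_artifact(book_dir: str, stage: str, chapter: int) -> PurePosixPath:
    """Build the expected path of a chapter artifact inside a book project."""
    if stage not in CHAPTER_PATTERNS:
        raise ValueError(f"unknown stage: {stage}")
    return PurePosixPath(book_dir) / CHAPTER_PATTERNS[stage].format(n=chapter)


print(chapter_artifact("projects/books/BOOK_xxx", "draft", 1))
```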

---

## CLI Tool Registry

All tools support `--format json` and return the Standard Response Wrapper.

### I. Ingestion & Library

| Command | Function | Notes |
|---------|----------|-------|
| `rdf ingest <path>` | Ingest/extract/OCR into DB | Supports `--ocr-profile` |
| `rdf health` | Library health scan | Optional auto-fix via queue |
| `rdf edit-meta <DOC_###>` | Metadata corrections | Direct edit, no LLM guessing |
| `rdf assess <DOC_###>` | Mechanical metadata assessment | Replaces persona curation |

**rdf assess output** (abbreviated to `status` and `data`; other wrapper fields omitted):

```json
{
  "status": "success",
  "data": {
    "doc_id": "DOC_023",
    "ocr_quality": 0.87,
    "language": "en",
    "page_count": 245,
    "has_toc": true,
    "extraction_warnings": ["footnotes_merged_with_body"],
    "duplicate_likelihood": 0.12
  }
}
```

### II. Retrieval (Memory Layer)

| Command | Function | Notes |
|---------|----------|-------|
| `rdf search "<query>"` | Semantic/keyword/hybrid search | Supports `--summary`, `--limit` |
| `rdf fetch <chunk_id>` | Return bounded text | Supports `--max-chars` |
| `rdf graph query "<concept>"` | JSON graph traversal | No HTML rendering |
| `rdf diff <a> <b>` | Unified or JSON diff | Replaces TUI diff panel |

**rdf fetch example:**

```bash
rdf fetch chunk_023_045 --max-chars 2000 --format json
```

**rdf diff example:**

```bash
rdf diff chapter_01.md chapter_01_polished.md --format unified
rdf diff chapter_01.md chapter_01_polished.md --format json
```

### III. Research & Evidence (Action Layer)

| Command | Function | Notes |
|---------|----------|-------|
| `rdf research "<question>"` | Iterative research + retrieval | Defaults `--strict` (library only) |
| `rdf quotes "<topic>"` | Extract verbatim evidence | Produces `quote_bank.json` |
| `rdf outline "<topic>"` | Generate `outline.yaml` | Supports `--chapters`, `--pages` |
| `rdf draft ...` | Draft from evidence/outline | Prefer `--evidence-first` |
| `rdf book <outline.yaml>` | End-to-end book workflow | Produces chapter files |
| `rdf essay "<topic>"` | End-to-end essay | Single command |
| `rdf write "<topic>"` | Universal entry point | Infers essay vs book |

### IV. Quality & Governance (Review Layer)

| Command | Function | Notes |
|---------|----------|-------|
| `rdf validate <file>` | Claim/source verification | Creates queue items for issues |
| `rdf polish <file>` | Style refinement | Optional if Claude does polishing |
| `rdf queue list [type]` | List pending review items | Types: `gap`, `validation`, `web`, `metadata` |
| `rdf queue approve <id>` | Approve queue item | With optional `--strategy` |
| `rdf queue reject <id>` | Reject queue item | Requires `--reason` |
| `rdf status` | Current workflow state | Supports `--last <id>` |

---

## JSON Output Contracts

### Standard Response Wrapper

All commands return this structure:

```json
{
  "status": "success",
  "code": "SUCCESS",
  "message": "Operation completed",
  "data": {},
  "warnings": [],
  "queue_items_created": 0,
  "next_suggested_commands": []
}
```

| Field | Type | Description |
|-------|------|-------------|
| `status` | enum | `"success"` or `"error"` only |
| `code` | string | Machine-readable code |
| `message` | string | Human-readable summary |
| `data` | object | Command-specific payload |
| `warnings` | array | Non-fatal issues |
| `queue_items_created` | int | Items added to review queues |
| `next_suggested_commands` | array | Recommended follow-up commands |
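On the consuming side, the wrapper maps naturally onto a small typed record. The sketch below is one way to parse it, not part of RDF: `RdfResponse` and `parse_response` are hypothetical names, and every field except `status` is given a default on the assumption that abbreviated responses may omit it.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class RdfResponse:
    status: str
    code: str = ""
    message: str = ""
    data: dict = field(default_factory=dict)
    warnings: list = field(default_factory=list)
    queue_items_created: int = 0
    next_suggested_commands: list = field(default_factory=list)


def parse_response(raw: str) -> RdfResponse:
    """Parse CLI stdout into an RdfResponse, rejecting unknown status values."""
    payload = json.loads(raw)
    if payload.get("status") not in ("success", "error"):
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    known = {f.name for f in fields(RdfResponse)}
    # Keys outside the wrapper table (for example actionable_advice) are dropped here.
    return RdfResponse(**{k: v for k, v in payload.items() if k in known})


resp = parse_response('{"status": "success", "code": "SUCCESS", "data": {}}')
print(resp.code, resp.queue_items_created)
```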

### Pause/Review Contract

Use `status: "success"` with `code: "PAUSED_FOR_REVIEW"`. Do NOT introduce new status values.

```json
{
  "status": "success",
  "code": "PAUSED_FOR_REVIEW",
  "message": "Research complete. Review gaps before drafting.",
  "data": {
    "checkpoint": "post_research",
    "workflow_id": "BOOK_20251212134640_2904d1",
    "resume_token": "RESUME_2904d1_post_research",
    "review_artifacts": [
      {
        "path": "projects/books/BOOK_.../gaps.md",
        "description": "Gaps requiring strategy decision"
      }
    ],
    "decision_packet": {
      "decision_id": "gap_fill_strategy",
      "question": "How should gaps be handled?",
      "options": [
        {"id": "library_only", "label": "Library only", "risk": "May be incomplete"},
        {"id": "allow_web", "label": "Allow web", "risk": "Requires approval of web sources"},
        {"id": "skip", "label": "Skip gaps", "risk": "Draft will note missing coverage"}
      ],
      "default": "library_only"
    }
  },
  "next_suggested_commands": [
    "rdf queue list gap",
    "rdf book --resume BOOK_xxx --phases 5"
  ]
}
```
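Since a pause arrives as `status: "success"`, callers have to branch on `code` rather than `status`. A minimal sketch of pulling out what the agent needs to decide and resume; `pause_summary` is a hypothetical helper, and the field names are taken from the example above.

```python
from typing import Optional


def pause_summary(response: dict) -> Optional[dict]:
    """Return resume details for a paused workflow, or None if it is not paused."""
    if response.get("code") != "PAUSED_FOR_REVIEW":
        return None
    data = response.get("data", {})
    packet = data.get("decision_packet", {})
    return {
        "checkpoint": data.get("checkpoint"),
        "resume_token": data.get("resume_token"),
        "default_option": packet.get("default"),
        "option_ids": [option["id"] for option in packet.get("options", [])],
    }


paused = {
    "status": "success",
    "code": "PAUSED_FOR_REVIEW",
    "data": {
        "checkpoint": "post_research",
        "resume_token": "RESUME_2904d1_post_research",
        "decision_packet": {
            "options": [{"id": "library_only"}, {"id": "allow_web"}, {"id": "skip"}],
            "default": "library_only",
        },
    },
}
print(pause_summary(paused))
```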

### Error Contract

Error responses carry an additional `actionable_advice` string alongside the wrapper fields:

```json
{
  "status": "error",
  "code": "GAP_THRESHOLD_NOT_MET",
  "message": "Coverage 40% below threshold 50%",
  "data": {
    "coverage": 40,
    "threshold": 50,
    "gaps": ["topic_a", "topic_b"]
  },
  "actionable_advice": "Run rdf research on missing topics or lower threshold."
}
```

### Standard Response Codes

| Code | Meaning | Recovery |
|------|---------|----------|
| `SUCCESS` | Operation completed | None needed |
| `PAUSED_FOR_REVIEW` | Human decision required | Review artifacts, choose option |
| `GAP_THRESHOLD_NOT_MET` | Insufficient research coverage | Research gaps or lower threshold |
| `VALIDATION_FAILED` | Claims don't match sources | Fix via queue items |
| `QUEUE_GATE_BLOCKED` | Cannot proceed, constraint violation | Resolve queue items first |
| `FILE_NOT_FOUND` | Artifact missing | Check path |
| `CORRUPT_PDF` | PDF extraction failed | Use `--ocr-profile` or re-ingest |

---

## Configuration

```yaml
# config/project.yaml

library:
  path: "./library_data"

database:
  url: ${DATABASE_URL}

embeddings:
  provider: "openai"              # or "local"
  model: "text-embedding-3-small"
  dimension: 1536
  local_endpoint: "http://localhost:11434/v1"  # if provider=local

agent_safety:
  allow_web_search: false                 # default: library only
  require_approved_web_sources: true      # web sources need queue approval
  validation_required_before_polish: true # enforce ordering

execution:
  parallelism_default: 1                  # allow --parallel N per command

defaults:
  autonomy_essay: "full"
  autonomy_book: "supervised"
  strict_library_only: true
```

### Environment Variables

| Variable | Purpose |
|----------|---------|
| `DATABASE_URL` | PostgreSQL connection string |
| `OPENAI_API_KEY` | For embeddings (if provider=openai) |
| `TAVILY_API_KEY` | For web search (if allowed) |
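The `${DATABASE_URL}` placeholder in `config/project.yaml` implies environment substitution when the file is loaded. How RDF performs it is not specified here; the sketch below shows one standard-library approach, with `expand_env` as a hypothetical helper that fails loudly when a variable is unset.

```python
import os
from string import Template
from typing import Mapping, Optional


def expand_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${NAME} placeholders with environment values.

    Raises KeyError if a referenced variable is not set, so a missing
    DATABASE_URL surfaces at load time rather than at first query.
    """
    source = os.environ if env is None else env
    return Template(value).substitute(source)


# Example with an explicit mapping instead of the real environment.
print(expand_env("${DATABASE_URL}", {"DATABASE_URL": "postgresql://localhost/rdf"}))
```

Passing an explicit mapping keeps the helper easy to test; in normal use it would read `os.environ`.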

---

## Review Queues & Governance

### Queue Types

| Queue | Source | Auto-Approve | Use Case |
|-------|--------|--------------|----------|
| `gap` | Research phases | Never | Approve gap-fill strategy |
| `validation` | `rdf validate` | Never | Fix claim issues |
| `web` | Web search results | Never | Approve web sources |
| `metadata` | `rdf health` | 85% confidence | Title/author corrections |

### Queue CLI

```bash
# List pending items
rdf queue list
rdf queue list gap --format json

# Approve with strategy
rdf queue approve gap_001 --strategy library_only

# Reject with reason
rdf queue reject gap_002 --reason "out_of_scope" --note "Not relevant to chapter"

# Stats
rdf queue stats --format json
```

### Rejection Reasons (Required)

| Reason Code | Description |
|-------------|-------------|
| `hallucination` | Claim/entity doesn't exist in sources |
| `wrong_entity` | Confused similar entities |
| `wrong_date` | Temporal error |
| `wrong_attribution` | Correct fact, wrong source |
| `irrelevant` | Valid but off-topic |
| `duplicate` | Already covered |
| `out_of_scope` | Outside project boundaries |
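Since `--reason` is required and drawn from a fixed list, a caller can validate it before invoking the CLI. A sketch that builds the argument vector for `rdf queue reject`; `reject_command` is a hypothetical helper and uses only the `--reason` and `--note` flags shown above.

```python
import shlex
from typing import List, Optional

# Reason codes copied from the table above.
REJECTION_REASONS = {
    "hallucination", "wrong_entity", "wrong_date", "wrong_attribution",
    "irrelevant", "duplicate", "out_of_scope",
}


def reject_command(item_id: str, reason: str, note: Optional[str] = None) -> List[str]:
    """Build argv for `rdf queue reject`, refusing unknown reason codes."""
    if reason not in REJECTION_REASONS:
        raise ValueError(f"unknown rejection reason: {reason}")
    argv = ["rdf", "queue", "reject", item_id, "--reason", reason]
    if note:
        argv += ["--note", note]
    return argv


# shlex.join quotes the note so the printed line is safe to paste into a shell.
print(shlex.join(reject_command("gap_002", "out_of_scope", "Not relevant to chapter")))
```

Returning a list rather than a string lets the result be passed directly to `subprocess.run` without shell quoting concerns.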

### Queue Gating

Certain operations are blocked until queue constraints are met:

```json
{
  "status": "error",
  "code": "QUEUE_GATE_BLOCKED",
  "message": "Cannot polish: 3 validation issues unresolved",
  "data": {
    "blocking_queue": "validation",
    "pending_count": 3,
    "blocking_items": ["val_001", "val_002", "val_003"]
  },
  "actionable_advice": "Resolve validation queue items before polishing."
}
```
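When a gate blocks, the response already names the queue and the items to resolve. A sketch of turning that into a follow-up plan; `unblock_plan` is a hypothetical helper, and the returned command reuses the `rdf queue list <type> --format json` form shown earlier.

```python
def unblock_plan(response: dict) -> dict:
    """Summarize what must be resolved before retrying a gated operation."""
    if response.get("code") != "QUEUE_GATE_BLOCKED":
        return {}
    data = response.get("data", {})
    queue = data.get("blocking_queue", "")
    return {
        "list_command": f"rdf queue list {queue} --format json".replace("  ", " "),
        "items": list(data.get("blocking_items", [])),
        "pending": data.get("pending_count", 0),
    }


blocked = {
    "status": "error",
    "code": "QUEUE_GATE_BLOCKED",
    "data": {
        "blocking_queue": "validation",
        "pending_count": 3,
        "blocking_items": ["val_001", "val_002", "val_003"],
    },
}
print(unblock_plan(blocked))
```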

---

## Complete Tool Reference

### Core Commands (13)

| Command | Purpose |
|---------|---------|
| `rdf ingest` | Add documents to library |
| `rdf search` | Search the library |
| `rdf fetch` | Retrieve bounded chunk text |
| `rdf research` | Autonomous research |
| `rdf quotes` | Extract evidence |
| `rdf outline` | Generate book outline |
| `rdf book` | Book compilation workflow |
| `rdf essay` | Essay generation |
| `rdf write` | Universal entry point |
| `rdf validate` | Claim verification |
| `rdf polish` | Style refinement |
| `rdf queue` | Review queue management |
| `rdf status` | Workflow state |

### Utility Commands (10)

| Command | Purpose |
|---------|---------|
| `rdf health` | Library health scan |
| `rdf edit-meta` | Metadata corrections |
| `rdf assess` | Mechanical document assessment |
| `rdf graph` | Knowledge graph queries (JSON) |
| `rdf diff` | File comparison |
| `rdf config` | Configuration management (entity extraction setup) |
| `rdf context` | Agent session warm-start (resume context) |
| `rdf capabilities` | Agent capability manifest (bootstrapping) |
| `rdf entity` | Entity/concept management (duplicates, merge, alias) |
| `rdf export` | Export bibliography (BibTeX, RIS, CSL-JSON) |

### Pipeline Scripts (Retained)

These remain available for direct use but are wrapped by `rdf` commands:

| Script | Wrapped By |
|--------|------------|
| `ingest_documents.py` | `rdf ingest` |
| `chunk_documents.py` | (internal) |
| `generate_embeddings.py` | (internal) |
| `search_export.py` | `rdf search` |
| `research_agent.py` | `rdf research` |
| `extract_quotes.py` | `rdf quotes` |
| `validate_draft.py` | `rdf validate` |
| `polish_draft.py` | `rdf polish` |
| `book.py` | `rdf book` |

---

## Appendix: Outline Schema

For `rdf outline` output and `rdf book` input:

```yaml
title: "Book Title"
subtitle: "Optional Subtitle"
target_pages: 100
style: academic  # academic, accessible, narrative, popular

research:
  strict_library_only: true
  min_coverage_threshold: 50

chapters:
  - number: 1
    title: "Chapter Title"
    target_pages: 20
    synopsis: "Brief description"
    key_topics:
      - "Topic A"
      - "Topic B"
    research_queries:  # optional, auto-generated if omitted
      - "Query 1"
      - "Query 2"
```

**Minimal outline (auto-expands):**

```yaml
title: "Book Title"
target_pages: 100
chapters:
  - title: "Chapter One"
  - title: "Chapter Two"
  - title: "Chapter Three"
```
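This document does not spell out how a minimal outline is expanded. Purely as an illustration, the sketch below numbers the chapters in order and splits `target_pages` evenly, keeping any values the author already set; `expand_outline` and the even-split rule are assumptions, not RDF's behavior.

```python
def expand_outline(outline: dict) -> dict:
    """Fill in chapter numbers and page targets for a minimal outline (assumed rule)."""
    chapters = outline.get("chapters", [])
    total = outline.get("target_pages", 0)
    # Split pages evenly; the first `extra` chapters absorb the remainder.
    base, extra = divmod(total, len(chapters)) if chapters else (0, 0)
    expanded = []
    for index, chapter in enumerate(chapters, start=1):
        filled = {"number": index, "target_pages": base + (1 if index <= extra else 0)}
        filled.update(chapter)  # explicit author values win over computed defaults
        expanded.append(filled)
    return {**outline, "chapters": expanded}


minimal = {
    "title": "Book Title",
    "target_pages": 100,
    "chapters": [{"title": "Chapter One"}, {"title": "Chapter Two"}, {"title": "Chapter Three"}],
}
for chapter in expand_outline(minimal)["chapters"]:
    print(chapter["number"], chapter["title"], chapter["target_pages"])
```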

