Stage 1: Reading
The reading stage runs two LLM substages in sequence. 1a (structure) segments the transcript and attributes speakers in a single pass. 1b (summarize) produces claim bullets per segment. The human reviews the combined output as a single reading — one review gate.
Test this stage in isolation with
auto-lorebook seed-ingest --at=structure (Stage 1a + 1b) or --at=summarize (Stage 1b only). See QA seeding.
Design drivers
Two properties of the intended use dominate the design of this stage:
- The human reviews claims, not transcript. A two-hour actual-play VOD at a realistic review budget (10–20 minutes per hour of footage) means the human cannot read the full transcript. The review surface is claim bullets with localized timestamps and context windows; everything else is scaffolding for producing good bullets.
- Missed claims are worse than spurious ones. An omitted claim in a one-shot ingest is a permanent gap nothing downstream will surface. A spurious claim costs the human seconds to reject. The pipeline tilts toward over-inclusion: surface anything plausibly claim-bearing and let the human filter.
These drive three design decisions: 1a covers the whole transcript (no scope filter); segmentation and attribution run as one pass; a mechanical gap check sits between 1a and 1b.
Stage 1a: Structure
Purpose. Segment the full transcript by topic and attribute speakers in a single pass — with sub-segment overrides where speakers change mid-segment — and flag uncertainty. Segmentation and attribution are combined because topic boundaries and speaker changes are heavily correlated in actual-play content and line content is a strong attribution signal. Splitting them across two passes throws away information the joint pass has.
Segments are contiguous and cover the whole transcript — every moment belongs to some segment. If the pass cannot identify a topic for a stretch (long pause, unintelligible audio), it still emits a segment with an appropriate title ("silence", "inaudible"): explicit is better than implicit.
Input. Raw transcript (after the literal-substitution corrections
from .transcription-corrections.yaml have been applied) and the full
preamble (including recurring_speakers and interpretation_defaults).
Output. pending/<ingest_id>/reading/structure.yaml:
schema_version: 1
source_id: yt-abc123
generated_at: 2026-04-20T14:32:00Z
default_speaker: DM
segments:
  - id: seg-001
    start: "0:00:00"
    end: "0:02:15"
    title: "Introduction"
    speaker: DM
  - id: seg-002
    start: "0:02:15"
    end: "0:04:30"
    title: "Rules discussion: grappling"
    speaker: mixed
    notes: "Off-topic rules lookup; unlikely to yield claims."
  - id: seg-003
    start: "0:04:30"
    end: "0:08:00"
    title: "Founding of Aldara"
    speaker: DM
  - id: seg-004
    start: "0:08:00"
    end: "0:12:00"
    title: "The War of the Dusk"
    speaker: DM
    overrides:
      - start: "0:09:45"
        end: "0:10:12"
        speaker: "Innkeeper NPC"
        voiced_by: DM
        note: "DM voicing an NPC in conversation with the party."
uncertainty_flags:
  - locator: "0:05:47"
    span: "a place name starting with V"
    kind: name  # name | attribution | other
    note: "proper noun unclear; sounds like Val- or Vel-"
Mechanical checks. Segment start/end correspond to real transcript timestamps. Segments cover the full transcript duration without gaps. Override ranges fall within their parent segment. Uncertainty flag locators fall within some segment.
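The coverage, override, and flag checks can be sketched as follows. This is a minimal illustration that assumes structure.yaml has already been loaded into a dict shaped like the example above; the function name check_structure and the transcript_end parameter are hypothetical, and the "real transcript timestamps" check is left out because it needs the transcript itself.

```python
from typing import Any


def to_seconds(ts: str) -> int:
    """Convert an "H:MM:SS" timestamp to whole seconds."""
    hours, minutes, seconds = (int(part) for part in ts.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def check_structure(doc: dict[str, Any], transcript_end: str) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: list[str] = []
    segments = doc.get("segments", [])
    cursor = 0  # the next segment must start exactly here (no gaps, no overlaps)
    for seg in segments:
        start, end = to_seconds(seg["start"]), to_seconds(seg["end"])
        if start != cursor:
            problems.append(f"{seg['id']}: starts at {start}s, expected {cursor}s")
        if end <= start:
            problems.append(f"{seg['id']}: empty or inverted range")
        for override in seg.get("overrides", []):
            o_start, o_end = to_seconds(override["start"]), to_seconds(override["end"])
            if not (start <= o_start < o_end <= end):
                problems.append(f"{seg['id']}: override escapes its parent segment")
        cursor = end
    if cursor != to_seconds(transcript_end):
        problems.append(f"segments end at {cursor}s, transcript ends at {transcript_end}")
    for flag in doc.get("uncertainty_flags", []):
        at = to_seconds(flag["locator"])
        inside = any(
            to_seconds(s["start"]) <= at <= to_seconds(s["end"]) for s in segments
        )
        if not inside:
            problems.append(f"flag at {flag['locator']} is outside every segment")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every violation in a single run.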
Uncertainty over-flagging. The prompt instructs the model to err on the side of flagging: dismissing a flag costs seconds, whereas a silently swallowed uncertain name pollutes a downstream fact.
Mechanical gap check
After 1a completes, a deterministic check (no LLM) identifies any
contiguous transcript stretch longer than a configurable threshold
(default: 5 minutes) whose segments all have thin claim-bearing
signals: titles matching patterns like "rules discussion", "break",
"off-topic", "silence", or segments with notes suggesting low yield.
This is a heuristic sanity check — the tool does not act on it, only surfaces it in reading review:
⚠ Possible coverage gap:
0:34:10–0:48:22 covered only by segments titled
"Pizza discussion", "Break", "Rules: initiative".
If this stretch contained worldbuilding, regenerate with a hint.
The human confirms the stretch is genuinely low-yield or regenerates 1a with a hint about what to look for.
Warnings are persisted in reading.yaml (gap_warnings: field, schema v2) at generate / regenerate time and re-rendered below the segment list in the approve-reading outer view, so the human sees them on every iteration without re-running the generate command.
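A rough sketch of the gap heuristic follows. The pattern list, the function names, and the tuple return shape are assumptions made for illustration; only the 5-minute default and the example titles come from the description above.

```python
import re

# Hypothetical low-yield patterns, loosely based on the examples named above.
LOW_YIELD = re.compile(
    r"rules|break|off-topic|silence|inaudible|unlikely to yield", re.IGNORECASE
)


def to_seconds(ts: str) -> int:
    """Convert an "H:MM:SS" timestamp to whole seconds."""
    hours, minutes, seconds = (int(part) for part in ts.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def is_low_yield(segment: dict) -> bool:
    """True when the segment title or notes match a low-yield pattern."""
    haystack = f"{segment['title']} {segment.get('notes', '')}"
    return LOW_YIELD.search(haystack) is not None


def find_gap_stretches(segments: list[dict], threshold_seconds: int = 300) -> list[tuple]:
    """Return (start, end, titles) for each low-yield run longer than the threshold."""
    stretches = []
    run: list[dict] = []
    for segment in [*segments, None]:  # the trailing None flushes the final run
        if segment is not None and is_low_yield(segment):
            run.append(segment)
            continue
        if run:
            start, end = run[0]["start"], run[-1]["end"]
            if to_seconds(end) - to_seconds(start) > threshold_seconds:
                stretches.append((start, end, [s["title"] for s in run]))
            run = []
    return stretches
```

Because the check only reports stretches and never edits segments, a false positive costs the reviewer one glance at the warning.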
Stage 1b: Summarize
Purpose. For each segment from 1a, produce claim bullets — or explicitly none. This is the only substage that can invent content.
Input. Segmented, speaker-attributed transcript from 1a and the
full preamble (including interpretation_defaults).
Output. Per-segment files under
pending/<ingest_id>/reading/segments/seg-NNN.md (one per segment,
frontmatter + rendered bullets), plus a sidecar
pending/<ingest_id>/reading/reading.yaml (default_speaker,
name_corrections, session_date). The wiki-side reading.md is
assembled from these at approval time, not written during generation.
Per-segment extraction. 1b processes each segment independently (trivially parallelizable). Empty bullet lists are allowed and expected — a "Rules discussion: grappling" segment typically yields no bullets, and that's the correct output. The bullet list's emptiness is information at review time.
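Since each segment is processed independently, the fan-out can be sketched with a thread pool. This is only an illustration: summarize_one is a hypothetical stand-in for the per-segment LLM call, and the choice of threads (rather than async or processes) is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def summarize_all(
    segments: list[dict],
    summarize_one: Callable[[dict], list[str]],
    max_workers: int = 4,
) -> dict[str, list[str]]:
    """Run the per-segment summarizer concurrently and key results by segment id."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with segments.
        results = list(pool.map(summarize_one, segments))
    return {seg["id"]: bullets for seg, bullets in zip(segments, results)}
```

An empty list stays in the result for segments that yield nothing, so the absence of bullets remains visible at review time.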
Locator hints for downstream stages
Alongside each bullet, 1b emits a locator_hint range — a small
window around the bullet's anchor timestamp that downstream stages can
use to narrow search. The hint is internal pipeline metadata: it flows
from 1b through the planner into the extractor and is never surfaced
in reading.md.
Shape, per bullet:
bullet_index: 0
text: "King Theron's grandfather founded Aldara in the Second Age"
anchor: "0:04:32" # the point timestamp shown in reading.md
locator_hint: "0:04:25-0:04:50" # search window for Stage 3
The hint is a window, not a precise range: 1b picks an anchor that's approximately where the claim lands and pads it generously (default ±15s). The authoritative locator on the final proposal is produced by Stage 3, not by this hint.
Hand-edits to bullet timestamps in reading.md sync back to the
bullet's anchor; the locator_hint window is recentered on the
edited anchor at save time. This preserves the hint's usefulness after
routine timestamp corrections without requiring the human to think
about windows.
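Recentering can be sketched as below. The symmetric padding is an assumption based on the stated ±15s default; the example bullet above shows an asymmetric window, so the real padding logic may differ. The function names are hypothetical.

```python
DEFAULT_PAD_SECONDS = 15  # the default padding stated above


def to_seconds(ts: str) -> int:
    """Convert an "H:MM:SS" timestamp to whole seconds."""
    hours, minutes, seconds = (int(part) for part in ts.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_timestamp(total: int) -> str:
    """Convert whole seconds back to "H:MM:SS"."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def locator_hint(anchor: str, pad: int = DEFAULT_PAD_SECONDS) -> str:
    """Build a "start-end" window centred on the anchor, never before 0:00:00."""
    center = to_seconds(anchor)
    start = max(0, center - pad)
    return f"{to_timestamp(start)}-{to_timestamp(center + pad)}"


def recenter(bullet: dict, edited_anchor: str) -> dict:
    """Return a copy of the bullet with its anchor and hint moved to the edit."""
    return {
        **bullet,
        "anchor": edited_anchor,
        "locator_hint": locator_hint(edited_anchor),
    }
```

For example, locator_hint("0:04:32") gives "0:04:17-0:04:47" under these assumptions.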
Anchor tolerance. When an LLM returns an anchor a few seconds
outside a segment's bounds (common with plain-text (.txt) sources,
where Stage 1a invents second-based bounds), Stage 1b clamps the
anchor to the nearest boundary rather than failing. Anchors within
DEFAULT_ANCHOR_TOLERANCE_SECONDS (default 2.0s) of a boundary are
clamped without raising, and a warning is logged; anchors further
outside still raise Stage1bError. The anchor_tolerance_seconds kwarg
on run() overrides the default when needed.
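The clamping rule can be sketched like this. DEFAULT_ANCHOR_TOLERANCE_SECONDS and Stage1bError are named in the text above; the function name clamp_anchor, its signature, and the local exception class are stand-ins invented for this illustration.

```python
import logging

logger = logging.getLogger(__name__)

DEFAULT_ANCHOR_TOLERANCE_SECONDS = 2.0


class Stage1bError(Exception):
    """Local stand-in for the pipeline's Stage 1b error type."""


def clamp_anchor(
    anchor: float,
    seg_start: float,
    seg_end: float,
    tolerance: float = DEFAULT_ANCHOR_TOLERANCE_SECONDS,
) -> float:
    """Return the anchor, clamped to the segment if it is only slightly outside."""
    if seg_start <= anchor <= seg_end:
        return anchor
    nearest = seg_start if anchor < seg_start else seg_end
    if abs(anchor - nearest) <= tolerance:
        logger.warning("anchor %.1fs clamped to segment boundary %.1fs", anchor, nearest)
        return nearest
    raise Stage1bError(
        f"anchor {anchor}s is more than {tolerance}s outside [{seg_start}, {seg_end}]"
    )
```

Anchors and bounds are plain seconds here; a caller working with "H:MM:SS" strings would convert first.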
Reading assembly
At approval, the wiki-side reading.md is assembled from all segment
files plus the sidecar. The assembled document interleaves segment
headers (from 1a) with their bullet lists (from 1b):
---
schema_version: 1
source_id: yt-abc123
source_name: "Worldbuilding Session 3: The Founding of Aldara"
source_url: https://youtube.com/watch?v=abc123
source_type: youtube
session_date: null # human fills in during review
ingested_at: 2026-04-20T14:35:12Z
default_speaker: DM
name_corrections:
# empty initially; human adds transcription fixes here
# "Fair-on": "Theron"
---
# Reading: Worldbuilding Session 3
## [[0:00:00-0:02:15]](https://youtube.com/watch?v=abc123&t=0) Introduction
Speaker: DM
The DM introduces the session and previews the topics covered.
## [[0:02:15-0:04:30]](https://youtube.com/watch?v=abc123&t=135) Rules discussion: grappling
Speaker: mixed
_No claims extracted from this segment._
## [[0:04:30-0:08:00]](https://youtube.com/watch?v=abc123&t=270) Founding of Aldara
Speaker: DM
- King Theron's grandfather founded Aldara in the Second Age [[0:04:32]](https://youtube.com/watch?v=abc123&t=272)
- The founding displaced an earlier elven presence [[0:05:14]](https://youtube.com/watch?v=abc123&t=314)
- In-world scholars dispute the exact founding year [[0:06:02]](https://youtube.com/watch?v=abc123&t=362)
Uncertainty flags from 1a are preserved in the assembled reading as inline markers the human can resolve. Segments with no extracted claims are rendered with an explicit "No claims extracted" marker so that empty segments are visible rather than invisible — the marker lets the human notice a segment that should have contributed but didn't.
See timestamps for how timestamps render as clickable links.
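The link shape used in the assembled reading above can be reproduced with a short sketch. The function names are hypothetical, and appending &t= assumes the source URL already carries a query string, as the YouTube example does.

```python
def to_seconds(ts: str) -> int:
    """Convert an "H:MM:SS" timestamp to whole seconds."""
    hours, minutes, seconds = (int(part) for part in ts.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def render_point_link(ts: str, source_url: str) -> str:
    """Render a bullet timestamp as a link that seeks to that second."""
    return f"[[{ts}]]({source_url}&t={to_seconds(ts)})"


def render_range_link(start: str, end: str, source_url: str) -> str:
    """Render a segment header range as a link that seeks to the range start."""
    return f"[[{start}-{end}]]({source_url}&t={to_seconds(start)})"
```

With the example source, render_point_link("0:04:32", "https://youtube.com/watch?v=abc123") yields the same link as the first bullet under "Founding of Aldara".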
Name corrections
When the human notices a mishearing (e.g., "Fair-on" should be
"Theron"), they add it to the name_corrections map in
reading.yaml rather than find-replacing throughout the reading. The
tool applies the substitutions during rendering and passes the map to
downstream stages. Corrections are preserved across regenerations.
Corrections from approved readings can be promoted to the global
.transcription-corrections.yaml so future sources benefit
automatically — see
entity model.
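Applying the map at render time might look like the sketch below. It assumes plain literal matching, which is an assumption about the tool's behaviour; apply_name_corrections is a hypothetical name.

```python
import re


def apply_name_corrections(text: str, corrections: dict[str, str]) -> str:
    """Replace each misheard name with its correction in a single pass."""
    if not corrections:
        return text
    # Longest keys first so a short key cannot shadow a longer one it prefixes.
    keys = sorted(corrections, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: corrections[match.group(0)], text)
```

A single combined pattern means replaced text is never rescanned, so one correction's output cannot trigger another correction.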
Uncertainty flags
1a flags words, names, or attributions it's unsure about. Uncertainty flags appear inline in the assembled reading:
- [0:05:47] A proper noun here was unclear; appears to be a place name starting with V
The human resolves by listening to the audio (or using setting context), then replaces with the correct content.
Reading review
The reading-review engine operates over per-segment files under
pending/<ingest_id>/reading/segments/. Each segment carries one of
four statuses:
| Status | Meaning |
|---|---|
| draft | Fresh: not yet decided. |
| accepted | Reviewer approved; included in the assembled reading. |
| skipped | Reviewer skipped; body replaced with the "no claims" marker in the assembled reading. |
| regenerating | Flagged for re-summarisation (slice #5); blocks the gate. |
Deferred-commit semantics. The engine accumulates pending marks during the walk — nothing is written to disk until the reviewer commits. On commit, changed segment files are written atomically, then the gate predicate is evaluated.
Gate predicate. Every segment is accepted or skipped. When the
gate fires, reading_assembly.assemble renders the wiki-side
reading.md and writes it to
<wiki-repo>/sources/<source_id>/reading.md. The presence of this
file is the approval artefact — there is no reading_status
frontmatter flag.
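Deferred commit and the gate predicate can be sketched together. This models statuses as a plain dict and leaves out the atomic file writes and the assembly step; the function names are hypothetical.

```python
DECIDED = {"accepted", "skipped"}


def gate_fires(statuses: dict[str, str]) -> bool:
    """True only when every segment is accepted or skipped."""
    return all(status in DECIDED for status in statuses.values())


def commit(committed: dict[str, str], pending: dict[str, str]) -> tuple[dict[str, str], bool]:
    """Merge in-memory pending marks over committed statuses, then test the gate."""
    merged = {**committed, **pending}
    return merged, gate_fires(merged)
```

A segment left in draft or marked regenerating keeps the gate closed, which matches the status table above.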
Decision verbs:
- accept
- skip-bullets
- regenerate-again: queues the segment for quit-time re-summarisation; marks the segment regenerating, blocks the gate, and triggers a parallel Stage 1b call on [q]uit
- undo: clears the pending mark for one segment
- commit: the quit path that writes and evaluates the gate
All committed status changes are produced by the reading-review engine; the command layer only translates keystrokes into engine decisions.
auto-lorebook approve-reading <source_id> --yes
--yes drives an AutoAcceptReviewer that marks every still-draft
segment accepted and commits unconditionally. The gate always fires
for fixtures where every segment is decidable.
approve-reading opens a hierarchical interactive session over the draft.
Outer view — numbered list of all segments with their current status and any pending mark for the session:
| Key | Action |
|---|---|
| # | Open the numbered segment in the per-segment prompt. |
| n | Jump to the next undecided draft segment. |
| m | Open reading.yaml (sidecar) in $EDITOR. |
| q | Commit pending marks. If every segment is now decided, write wiki-side reading.md (gate fires). |
Below the segment list, any persisted gap-check warnings are rendered as ⚠ Possible coverage gap: blocks (one per stretch, transcript order).
Per-segment prompt — shows segment body (up to 60 lines) and current / pending status:
| Key | Action |
|---|---|
| a | Accept: queue this segment for accepted status; return to outer. |
| s | Skip-bullets: queue this segment for skipped status; return to outer. |
| g | Regenerate-again: queue this segment for regenerating status; on [q]uit, this segment is re-summarised in parallel against a snapshot of accepted segments and returns to draft for re-decision. |
| e | Edit: open the segment file (seg-NNN.md) in $EDITOR. Stays in per-segment prompt on return. |
| u | Undo: clear this segment's pending mark. Stays in per-segment prompt. |
| b | Back: return to outer without changing any pending mark. |
Pending marks live in memory until [q]. On [q], the engine commits
all marks in one transaction and evaluates the gate. Ctrl-C at any
prompt exits 130 with no committed mutations; pending marks are not
persisted.
The outer segment list shows →regenerating for pending regenerate-again
marks.
Quit-time regeneration batch
When [q] commits and at least one segment has status regenerating, the
engine returns a RegenBatch instead of (or alongside) the gate check.
The gate cannot fire on the same quit that includes regenerating segments —
regenerating is not a decided status.
Snapshot. After the commit-write loop, the pipeline takes a snapshot
of all committed segments with status accepted. This snapshot becomes
the accepted-context for every re-summarised segment; flagged segments do
not see each other's regenerations.
Stage 1b user message for a regen call. The system preamble is unchanged. The user message gains a compact accepted-segments block before the target segment's transcript slice:
Accepted segments (context only — do not re-extract):
## seg-001 [0:00:00–0:02:15] Introduction (DM)
- Intro bullet [0:00:15]
---
Segment seg-002: "Rules discussion"
Range: ...
Transcript for this segment:
<sliced transcript>
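The per-segment entries of the accepted-segments block could be produced along these lines. This is a sketch under assumptions: the in-memory segment shape (status, bullets with text and anchor) and both function names are hypothetical, and the header line and the target-segment portion of the message are deliberately left out.

```python
def snapshot_accepted(segments: list[dict]) -> list[dict]:
    """Keep only committed segments whose status is accepted."""
    return [segment for segment in segments if segment["status"] == "accepted"]


def render_context_entries(snapshot: list[dict]) -> str:
    """Render each accepted segment as a header line followed by its bullets."""
    lines: list[str] = []
    for segment in snapshot:
        lines.append(
            f"## {segment['id']} [{segment['start']}–{segment['end']}] "
            f"{segment['title']} ({segment['speaker']})"
        )
        lines.extend(f"- {b['text']} [{b['anchor']}]" for b in segment["bullets"])
    return "\n".join(lines)
```

Taking the snapshot once and passing the same rendered text to every regen call is what keeps flagged segments from seeing each other's regenerations.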
After regen. Regenerated segments' bullets.yaml entries and
seg-NNN.md files are rewritten; status is reset to draft for
re-decision in the next review session.
Exit message. [q] with a regen batch prints "Still N undecided" —
the gate cannot fire on the same quit that regenerates.
--yes skips the loop and auto-approves; required for non-TTY runs
(scripts, CI).
After approval, the reading is committed to the wiki alongside the
raw transcript. The intermediate structure.yaml is retained in the
pending directory as an audit artifact for the lifetime of the ingest,
then discarded when the ingest is fully completed or rejected. Future
re-runs of extraction operate on the approved reading.
Running Stage 2 and Stage 3 is done via separate commands after
approval — auto-lorebook plan <id> and auto-lorebook extract <id>.
See the CLI reference for details.
Regenerating substages
If reading review reveals the structure (segmentation or attribution) is badly wrong in ways that are tedious to fix by hand, re-run from a given point:
auto-lorebook regenerate-reading <source_id> --from=structure   # reruns 1a, 1b
auto-lorebook regenerate-reading <source_id> --from=summarize   # reruns 1b only
auto-lorebook regenerate-reading <source_id> --from=summarize --segments seg-003,seg-007   # reruns 1b on listed segments only
Per-segment 1b regeneration is cheap because 1b is parallelized per-segment; if one segment's bullets are clearly wrong but the rest are fine, this leaves the rest of the review work untouched.
name_corrections in frontmatter are preserved across all
regenerations. Human edits to the reading body are preserved by
per-segment 1b regeneration but discarded by full-reading
regenerations. If hand edits are worth keeping, approve the reading;
if the machine output is too broken to edit, regenerate from scratch.
Next stage: Stage 2 planner.