Source: /Users/borker/dev/hybrid-blog-writer-26-voice-pipeline/docs/improvements/SAME_AUTHOR_FIX_TOOLBOX.md
# Same-author fix toolbox
Date: 2026-05-18
Goal: reach at least 52/61 Pete articles passing `same_author_llm` while
preserving author voice and avoiding repeated batch tells.
## Current result
The best post-hoc path still reaches only 3/10 on the first-ten sample:
- baseline repeated same-author vote: 2/10
- hybrid full-brief opening + section-brief body: 3/10
- portfolio selection across existing raw candidates: 3/10
That makes the answer fairly sharp: post-hoc repair is not the complete fix.
It is a safety net for individual articles, not the production strategy.
## Tools that help
1. `experiments/same_author_lift/run.py`
   - runs single/multi/any same-author judges
   - generates candidate transforms
   - gates candidates by same-author verdict, word count, slop, drift,
     scaffold hits, and the hostile quality guard
   - useful transforms: `hybrid-brief-section`, `section-brief`,
     `plain-brief`, `brief`, `slop-cleanup`, `candidate-pool`
2. `experiments/same_author_lift/portfolio.py`
   - selects the best existing raw candidate per article
   - now runs cheap deterministic gates and a one-vote same-author probe
     before the expensive three-vote confirmation and the full quality guard
   - useful as a final selector, not as a generator
3. Deterministic fingerprint checks
   - `voice_pipeline.metrics.analyze_text`
   - `voice_pipeline.drift.detect_drift`
   - `voice_pipeline.slop.audit_text`
   - scaffold document frequency from `run.py`
   - batch scan for repeated 5- and 6-grams, excluding Scripture quotations
4. LLM quality guards
   - production `same_author_llm` for the target metric
   - multi-excerpt author judge for a fairer voice signal
   - full hostile-editor guard for meaning, facts, voice delta, and new
     repetitive tells
5. Other-agent worktree tools
   - adopt the optional diagnostic scalpel for local paragraph issues
   - use model routing only where evidence supports it, especially Kimi for
     tech topics
   - do not adopt the moves rewrite, drift forcing, or aggressive rule polish
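The staged ordering described for `portfolio.py` (cheap deterministic gates first, then a one-vote same-author probe, then the three-vote confirmation and the full quality guard) amounts to a short-circuiting cascade: a candidate pays for an expensive check only after passing every cheaper one. The sketch below is illustrative, not the actual `portfolio.py` code. `Candidate`, `select_candidate`, the judge and guard callables, and the 2-of-3 majority rule are hypothetical, and word count alone stands in for the deterministic gates. Trying the source first mirrors the "keep source when source already passes" rule from the complete-fix path.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Candidate:
    article_id: str
    text: str
    is_source: bool = False  # True for the original article text


# Hypothetical judge: returns True when it votes "same author".
Judge = Callable[[str], bool]
# Hypothetical full quality guard (meaning, facts, voice delta, new tells).
QualityGuard = Callable[[Candidate], bool]


def passes_cheap_gates(c: Candidate, min_words: int, max_words: int) -> bool:
    # Stage 1: deterministic and free. Only word count is checked here;
    # slop, drift, and scaffold checks would slot in at this stage too.
    return min_words <= len(c.text.split()) <= max_words


def majority_votes(c: Candidate, judge: Judge, votes: int = 3) -> bool:
    # Stage 3: repeated votes with a strict majority (2 of 3 by default).
    yes = sum(1 for _ in range(votes) if judge(c.text))
    return yes * 2 > votes


def select_candidate(
    candidates: Sequence[Candidate],
    judge: Judge,
    quality_guard: QualityGuard,
    min_words: int,
    max_words: int,
) -> Optional[Candidate]:
    # Sources sort first (False < True), so a passing source is kept.
    for c in sorted(candidates, key=lambda cand: not cand.is_source):
        if not passes_cheap_gates(c, min_words, max_words):
            continue
        if not judge(c.text):  # Stage 2: one-vote probe (one LLM call)
            continue
        if not majority_votes(c, judge):  # Stage 3: three-vote confirmation
            continue
        if quality_guard(c):  # Stage 4: full quality guard, most expensive
            return c
    return None
```

The ordering is the design choice that matters: candidates that fail the probe never reach the three confirmation calls or the full guard.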
## Tools that do not fix it
- Rule-only polish: hits metrics but strips voice-bearing quirks.
- Drift-only surgical edit: tries to manufacture floor metrics and does not
  move the same-author verdict.
- Moves-augmented rewrite: creates visible checklist/listicle prose.
- Opener-only repair: preserves meaning but usually does not move the
  production first-300-word same-author judge.
- Directly imitating the 68-word devotional seed excerpt: creates a new
  repetitive fingerprint and still fails.
## Complete-fix path
The next implementation should be a generation-time variant of
`simple_writer`, not another retrofit pass.
Required changes:
1. Replace "soul amplification" with seed-first prompting.
   - Do not ask for a clever hook.
   - Do not require a scene or invented personal anecdote.
   - Remove the "10% useless residue" instruction for this author.
   - Avoid punchline titles and dramatic first-person premises.
   - Use the seed corpus and measured examples as the source of voice, not
     the derived persona summary alone.
2. Generate candidates from the topic/brief, not from the finished synthetic
   prose.
   - Preserve SEO facts, headings, citations, and claims.
   - Let the model compose fresh article prose from a factual brief.
   - Run per-article model routing only after hostile same-author and
     quality gates.
3. Add a batch-level publication gate.
   - max document frequency for scaffold phrases
   - max document frequency for non-quote repeated 5/6-grams
   - systematic drift offender report
   - same-author majority vote over multiple excerpts
   - production single-excerpt same-author check, retained only because it
     is the current target metric
4. Use portfolio selection only after generation.
   - keep the source when the source already passes
   - accept generated candidates only when they pass same-author without
     meaning/fact loss or new repetitive tells
5. Increase seed volume if possible.
   - 5k words of seed text is enough only for a caricature and a fragile
     judge
   - 20k-50k words would make the voice target materially less noisy
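The deterministic parts of the batch-level gate in step 3 can be sketched in Python, the language of the existing pipeline. This is a minimal sketch under stated assumptions, not the planned implementation: the function names, the dict-of-articles input, the single `max_doc_freq` threshold shared by both frequency checks, and the rule that Scripture quotations are either markdown blockquotes or double-quoted spans are all hypothetical. The systematic drift report is left out, and per-excerpt same-author verdicts are taken as precomputed booleans.

```python
import re
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# Assumption: Scripture quotations are markdown blockquote lines ("> ...") or
# sit inside straight/curly double quotes; both are blanked before scanning.
_BLOCKQUOTE = re.compile(r"^[ \t]*>.*$", re.MULTILINE)
_INLINE_QUOTE = re.compile(r'"[^"]*"|\u201c[^\u201d]*\u201d')
_WORD = re.compile(r"[a-z0-9']+")


def strip_quotes(text: str) -> str:
    return _INLINE_QUOTE.sub(" ", _BLOCKQUOTE.sub(" ", text))


def ngram_set(text: str, sizes: Sequence[int] = (5, 6)) -> Set[Tuple[str, ...]]:
    # One set per article: the gate counts how many articles share a phrase,
    # not how often a single article repeats it.
    words = _WORD.findall(strip_quotes(text).lower())
    return {tuple(words[i:i + n]) for n in sizes for i in range(len(words) - n + 1)}


def ngram_offenders(docs: Dict[str, str], max_doc_freq: float) -> List[Tuple[str, float]]:
    if not docs:
        return []
    df = Counter()
    for text in docs.values():
        df.update(ngram_set(text))
    n = len(docs)
    hits = [(" ".join(g), c / n) for g, c in df.items() if c / n > max_doc_freq]
    return sorted(hits, key=lambda h: (-h[1], h[0]))


def scaffold_offenders(
    docs: Dict[str, str], phrases: Sequence[str], max_doc_freq: float
) -> List[Tuple[str, float]]:
    if not docs:
        return []
    n = len(docs)
    hits = []
    for phrase in phrases:
        c = sum(1 for t in docs.values() if phrase.lower() in t.lower())
        if c / n > max_doc_freq:
            hits.append((phrase, c / n))
    return hits


def majority_same_author(verdicts: Sequence[bool]) -> bool:
    # Strict majority over per-excerpt same-author verdicts.
    return sum(verdicts) * 2 > len(verdicts)


def publication_gate(
    docs: Dict[str, str],
    verdicts: Dict[str, Sequence[bool]],
    scaffold_phrases: Sequence[str],
    max_doc_freq: float,
) -> Tuple[bool, Dict[str, list]]:
    report = {
        "failing_articles": sorted(
            a for a, v in verdicts.items() if not majority_same_author(v)
        ),
        "ngram_offenders": ngram_offenders(docs, max_doc_freq),
        "scaffold_offenders": scaffold_offenders(docs, scaffold_phrases, max_doc_freq),
    }
    # The batch passes only when every report list is empty.
    return not any(report.values()), report
```

Because every repeated 6-gram also contains repeated 5-grams, offenders come in overlapping clusters; a real report would probably merge them.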
## Recommendation
Fold into production only after a first-ten regeneration variant clears at
least 8/10 under repeated same-author vote with no batch-tell regression. At
that point, scale to all 61. If the first ten remain around 3/10, stop: the
prompt is still protecting the synthetic house voice rather than the real seed
voice.
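The rollout rule reduces to a small decision function. The 8/10 bar and the no-batch-tell-regression condition come from the recommendation; the function name, the `"iterate"` outcome for results between the two bars, and reading "around 3/10" as 3 or fewer passes are assumptions.

```python
def rollout_decision(
    passes: int,
    batch_tell_regression: bool,
    sample_size: int = 10,
    promote_at: int = 8,
    stop_at_or_below: int = 3,  # assumption: reading of "around 3/10"
) -> str:
    """Hypothetical go/stop rule for the first-ten regeneration variant."""
    if not 0 <= passes <= sample_size:
        raise ValueError("passes must lie between 0 and sample_size")
    if passes >= promote_at and not batch_tell_regression:
        return "scale to all 61"
    if passes <= stop_at_or_below:
        return "stop"
    # In between, or a strong score with a batch-tell regression: keep iterating.
    return "iterate"
```

Note that 8/10 (80%) is a screening bar on ten articles; the overall goal of 52/61 corresponds to roughly 85%.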