Source: /Users/borker/dev/hybrid-blog-writer-26-voice-pipeline/docs/improvements/SAME_AUTHOR_FIX_TOOLBOX.md
# Same-author fix toolbox

Date: 2026-05-18

Goal: reach at least 52/61 Pete articles passing `same_author_llm` while
preserving author voice and avoiding repeated batch tells.

## Current result

The best post-hoc path still passes only 3/10 articles on the first-ten sample:

- baseline repeated same-author vote: 2/10
- hybrid full-brief opening + section-brief body: 3/10
- portfolio selection across existing raw candidates: 3/10

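That makes the answer fairly clear: post-hoc repair is not the complete fix.
The goal needs roughly 85% of articles to pass (52/61), and no post-hoc path
has exceeded 30% on the sample. Post-hoc repair is a safety net for individual
articles, not the production strategy.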
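
To make the gap concrete, here is a naive linear projection of the sample pass counts onto all 61 articles. It assumes the first ten are representative and ignores sampling noise; `projected_passes` is a hypothetical helper for illustration, not part of the pipeline.

```python
# Goal and sample numbers come from this report; the linear projection is an assumption.
TARGET_PASSES = 52
TOTAL_ARTICLES = 61
SAMPLE_SIZE = 10


def projected_passes(sample_passes: int, sample_size: int = SAMPLE_SIZE,
                     total: int = TOTAL_ARTICLES) -> float:
    """Scale a sample pass count linearly to the full batch (no noise model)."""
    return sample_passes * total / sample_size


required_rate = TARGET_PASSES / TOTAL_ARTICLES  # about 0.85

for label, passes in [("baseline vote", 2), ("hybrid brief", 3), ("portfolio", 3)]:
    print(f"{label}: {passes}/{SAMPLE_SIZE} -> ~{projected_passes(passes):.1f}/{TOTAL_ARTICLES}")
```

Under that assumption the best sample result projects to about 18 passes out of 61, against a target of 52.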

## Tools that help

1. `experiments/same_author_lift/run.py`
   - runs single/multi/any same-author judges
   - generates candidate transforms
   - gates candidates by same-author, word count, slop, drift, scaffold hits,
     and hostile quality guard
   - useful transforms: `hybrid-brief-section`, `section-brief`,
     `plain-brief`, `brief`, `slop-cleanup`, `candidate-pool`

2. `experiments/same_author_lift/portfolio.py`
   - selects the best existing raw candidate per article
   - now runs cheap deterministic gates and a one-vote same-author probe before
     the expensive three-vote confirmation and full quality guard
   - useful as a final selector, not as a generator

3. Deterministic fingerprint checks
   - `voice_pipeline.metrics.analyze_text`
   - `voice_pipeline.drift.detect_drift`
   - `voice_pipeline.slop.audit_text`
   - scaffold document frequency from `run.py`
   - repeated 5/6-gram batch scan excluding Scripture quotations

4. LLM quality guards
   - production `same_author_llm` for the target metric
   - multi-excerpt author judge for a fairer voice signal
   - full hostile-editor guard for meaning, facts, voice delta, and new
     repetitive tells

5. Other-agent worktree tools
   - adopt the optional diagnostic scalpel for local paragraph issues
   - use model routing only where evidence supports it, especially Kimi for
     tech topics
   - do not adopt the moves rewrite, drift forcing, or aggressive rule polish
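
The staged ordering described for `portfolio.py` (cheap deterministic gates, then a one-vote same-author probe, then the expensive three-vote confirmation and full quality guard) can be sketched as a short-circuiting cascade. This is a minimal sketch, not the actual `portfolio.py` code: `Candidate`, `staged_select`, and the check callables are hypothetical stand-ins, and running confirmation before the quality guard is an assumed order.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Candidate:
    name: str
    text: str


# Hypothetical check signature: True means the candidate passes that check.
Check = Callable[[Candidate], bool]


def staged_select(
    candidates: Iterable[Candidate],
    cheap_gates: list[Check],
    one_vote_probe: Check,
    three_vote_confirm: Check,
    quality_guard: Check,
) -> Optional[Candidate]:
    """Return the first candidate (in the given order) that clears every stage."""
    for cand in candidates:
        # Deterministic gates (word count, slop, drift, scaffold hits): no LLM calls.
        if not all(gate(cand) for gate in cheap_gates):
            continue
        # A single same-author vote acts as a cheap LLM filter.
        if not one_vote_probe(cand):
            continue
        # The expensive stages only run on candidates that survived everything above.
        if three_vote_confirm(cand) and quality_guard(cand):
            return cand
    return None
```

Because each stage only sees survivors of the cheaper stages, most LLM calls go to candidates that already look plausible. The sketch returns the first passing candidate, so it assumes the input is already ranked (for example, with the source article first).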

## Tools that do not fix it

- Rule-only polish: hits metrics but strips voice-bearing quirks.
- Drift-only surgical edit: tries to manufacture floor metrics and does not
  move authorship.
- Moves-augmented rewrite: creates visible checklist/listicle prose.
- Opener-only repair: preserves meaning but usually does not move the
  production first-300-word same-author judge.
- Directly imitating the 68-word devotional seed excerpt: creates a new
  repetitive fingerprint and still fails.

## Complete-fix path

The next implementation should be a generation-time variant of
`simple_writer`, not another retrofit pass.

Required changes:

1. Replace "soul amplification" with seed-first prompting.
   - Do not ask for a clever hook.
   - Do not require a scene or invented personal anecdote.
   - Remove the "10% useless residue" instruction for this author.
   - Avoid punchline titles and dramatic first-person premises.
   - Use the seed corpus and measured examples as the source of voice, not the
     derived persona summary alone.

2. Generate candidates from the topic/brief, not from the finished synthetic
   prose.
   - Preserve SEO facts, headings, citations, and claims.
   - Let the model compose fresh article prose from a factual brief.
   - Run per-article model routing only after hostile same-author and quality
     gates.

3. Add a batch-level publication gate.
   - max document frequency for scaffold phrases
   - max document frequency for non-quote repeated 5/6-grams
   - systematic drift offender report
   - same-author majority vote over multiple excerpts
   - the production single-excerpt same-author judge, retained only because it
     is the current target metric

4. Use portfolio selection only after generation.
   - keep source when source already passes
   - accept generated candidates only when they pass same-author without
     meaning/fact loss or new repetitive tells

5. Increase seed volume if possible.
   - 5k words of seed text is only enough to produce a caricature and a fragile judge
   - 20k-50k words would make the voice target materially less noisy
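
The repeated n-gram part of the batch-level gate in step 3 can be sketched as a document-frequency count: each article contributes each 5- or 6-gram at most once, and any n-gram appearing in more than `max_df` articles is flagged. This is an assumption-laden sketch, not the `run.py` implementation: it treats any double-quoted span as a quotation to exclude (a crude proxy for Scripture quotations), and the `max_df` default is a placeholder, not a tuned threshold. `ngrams` and `batch_ngram_offenders` are hypothetical names.

```python
import re
from collections import Counter

# Assumption: quotations (e.g. Scripture) appear inside straight or curly double quotes.
QUOTE_RE = re.compile(r'["\u201c][^"\u201d]*["\u201d]')
WORD_RE = re.compile(r"[a-z']+")


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercased word n-grams of `text` after blanking out quoted spans."""
    words = WORD_RE.findall(QUOTE_RE.sub(" ", text.lower()))
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def batch_ngram_offenders(
    articles: list[str], ns: tuple[int, ...] = (5, 6), max_df: int = 2
) -> dict[tuple[str, ...], int]:
    """Return non-quote n-grams whose document frequency exceeds max_df."""
    df: Counter = Counter()
    for text in articles:
        grams: set[tuple[str, ...]] = set()
        for n in ns:
            grams |= ngrams(text, n)
        df.update(grams)  # a set, so each article counts at most once per n-gram
    return {gram: count for gram, count in df.items() if count > max_df}
```

Counting documents rather than raw occurrences targets phrases that recur across articles, which is how a batch tell shows up, rather than repetition inside a single article.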

## Recommendation

Fold this into production only after a first-ten regeneration variant clears at
least 8/10 under the repeated same-author vote with no batch-tell regression. At
that point, scale to all 61. If the first ten remain around 3/10, stop: the
prompt is still protecting the synthetic house voice rather than the real seed
voice.
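
The go/no-go rule above can be written as a small decision function. It is a sketch under stated assumptions: each article's repeated same-author votes are aggregated by strict majority, the batch-tell regression flag is computed elsewhere, and `majority_pass` and `scale_decision` are hypothetical names.

```python
def majority_pass(votes: list[bool]) -> bool:
    """True when a strict majority of repeated same-author votes say 'same author'."""
    return 2 * sum(votes) > len(votes)


def scale_decision(
    per_article_votes: dict[str, list[bool]],
    batch_tell_regressed: bool,
    min_passes: int = 8,
) -> str:
    """Apply the first-ten go/no-go rule (majority aggregation is an assumption)."""
    passes = sum(majority_pass(v) for v in per_article_votes.values())
    if passes >= min_passes and not batch_tell_regressed:
        return f"scale: {passes}/{len(per_article_votes)} passed"
    return f"hold: {passes}/{len(per_article_votes)} passed"
```

A `hold` result covers both the stop case described above (around 3/10) and anything between that and the 8/10 bar; the report does not say what to do in that middle range, so the sketch does not distinguish it.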