An autonomous experiment system that lives in autoresearch/, inspired by karpathy/autoresearch. Instead of minimising training loss, it maximises contact extraction quality.

Sources: docs/SYSTEM_OVERVIEW.md (§ The autoresearch loop) and autoresearch/program.md.

The loop

read latest results
→ pick one change (from a backlog or freeform)
→ edit allowed files
→ run: USE_CRAWLEE=true bun autoresearch/experiment.ts --tag <name>
→ compare composite score to previous
→ if score increased by ≥ 2 points AND false_positive_rate = 0%: git commit, continue
→ otherwise: revert, try something else
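
The commit/revert gate at the end of the loop can be sketched as a small pure function. This is a minimal illustration, not the code in experiment.ts: the `ScoredRun` shape, the `decide` name, and the `MIN_GAIN` constant are hypothetical, and it assumes "previous" means the run being compared against and that the false positive rate is stored as a fraction.

```typescript
// Hypothetical shape: only the two values the gate needs.
interface ScoredRun {
  score: number;             // composite score, 0..100
  falsePositiveRate: number; // assumed to be a fraction, so 0 means 0%
}

// Minimum composite-score gain required to keep a change (from the loop description).
const MIN_GAIN = 2;

// Returns "commit" only when both conditions of the loop hold; otherwise "revert".
function decide(previous: ScoredRun, current: ScoredRun): "commit" | "revert" {
  const gain = current.score - previous.score;
  const fprIsZero = current.falsePositiveRate === 0;
  return gain >= MIN_GAIN && fprIsZero ? "commit" : "revert";
}
```

Under these assumptions, a run that gains 10 points but has any non-zero false positive rate is still reverted, which matches the FPR rule described further down.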

Composite score formula

score = extraction_rate             × 40
      + min(avg_contacts / 5, 1)    × 20
      + (1 − false_positive_rate)   × 20
      + email_coverage              × 10
      + phone_coverage              × 10
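
The formula translates directly into code. The sketch below assumes every rate and coverage value is a fraction in [0, 1], which is what the (1 − false_positive_rate) term implies; under that assumption the weights sum to 100, so the score ranges from 0 to 100. The `RunMetrics` interface and its camelCase field names are hypothetical and are not taken from metrics.ts or types.ts.

```typescript
// Hypothetical input shape; field names mirror the terms of the formula.
interface RunMetrics {
  extractionRate: number;    // assumed fraction, 0..1
  avgContacts: number;       // average contacts; the formula caps its contribution at 5
  falsePositiveRate: number; // assumed fraction, 0..1
  emailCoverage: number;     // assumed fraction, 0..1
  phoneCoverage: number;     // assumed fraction, 0..1
}

// Term-by-term transcription of the composite score formula.
function compositeScore(m: RunMetrics): number {
  return (
    m.extractionRate * 40 +
    Math.min(m.avgContacts / 5, 1) * 20 +
    (1 - m.falsePositiveRate) * 20 +
    m.emailCoverage * 10 +
    m.phoneCoverage * 10
  );
}

// Illustrative values only: a perfect run reaches the maximum.
console.log(
  compositeScore({
    extractionRate: 1,
    avgContacts: 5,
    falsePositiveRate: 0,
    emailCoverage: 1,
    phoneCoverage: 1,
  }),
); // 100
```

Note that a run extracting nothing, with no false positives, still scores 20 from the (1 − false_positive_rate) term, so scores are only meaningful relative to a baseline.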

Allowed file scope (agent)

Can modify:

  • crawlee.ts, maps.ts, nameUtils.ts, emailUtils.ts, phoneUtils.ts, config.ts, domain.ts, companies.json

Cannot touch:

  • pipeline.ts, experiment.ts, metrics.ts, types.ts
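
One way to enforce this scope before committing is to check the changed paths against the allowlist. The sketch below is an assumption-laden illustration: the source lists bare file names only, so it matches on the base name, and the `outOfScope` helper and the directory prefixes in the usage example are hypothetical.

```typescript
// File names the agent may modify, copied from the list above.
const ALLOWED_FILES = new Set([
  "crawlee.ts", "maps.ts", "nameUtils.ts", "emailUtils.ts",
  "phoneUtils.ts", "config.ts", "domain.ts", "companies.json",
]);

// Last path segment, accepting both "/" and "\" separators.
function baseName(path: string): string {
  const parts = path.split(/[\\/]/);
  return parts[parts.length - 1] ?? path;
}

// Returns every changed path whose file name is not on the allowlist.
// Anything not explicitly allowed is treated as out of scope,
// which covers pipeline.ts, experiment.ts, metrics.ts, and types.ts.
function outOfScope(changedPaths: string[]): string[] {
  return changedPaths.filter((p) => !ALLOWED_FILES.has(baseName(p)));
}

// Directory prefixes here are made up for the example.
console.log(outOfScope(["src/crawlee.ts", "autoresearch/metrics.ts"]));
// [ "autoresearch/metrics.ts" ]
```

Matching on the base name is deliberately loose; if two files in different directories share a name, a real check would need full paths.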

FPR rule

Warning

Decision rule: a false positive rate (FPR) above 0% triggers an immediate revert. Several early experiments, including jsonld-v2 (the current best, FPR 21.2%), ran before the FPR gate was tightened and were committed despite a non-zero FPR. The rule has been strictly enforced since round 15.

Progress

  • 29 rounds completed
  • 145 unique companies tested
  • 78 (54%) produced ≥ 1 contact
  • 450+ total contacts extracted (approximate)
  • 82/82 unit tests passing

Biggest single gain: JSON-LD Extraction in round 1 (+18.7 composite vs baseline).

Three runner variants

autoresearch/ now ships three loop entry points. They share the metrics module (metrics.ts) but select companies and report progress differently.

experiment.ts — original tagged runner

The canonical runner, and the one invoked in the loop description above.

  • Reads companies from autoresearch/companies.json (a fixed test set).
  • Runs once, writes results/<tag>.json and appends one line to history.jsonl.
  • Single-source: Crawlee only (USE_CRAWLEE=true required).
  • This is the runner the FPR rule and the 29-round progression are scored against.

loop-v2.ts — production-style multi-source

Untracked file added 2026-04-06 (~498 lines). Tests the current production pipeline rather than an isolated experiment.

  • Picks active AB companies straight from the database, skipping any cached recently.
  • Performs live domain discovery (no pre-known domains in companies.json).
  • Runs all three sources head-to-head: Crawlee, Firecrawl, Google Places.
  • CLI: bun autoresearch/loop-v2.ts [--companies N] [--source crawlee|firecrawl|maps] [--compare].
  • Output: an in-process metrics summary; it does not write a tagged JSON file.

Use this when you want to know how the live pipeline performs on real DB rows, not when you want a clean before/after diff for the loop.
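
For reference, flags of the shape listed in the CLI bullet can be parsed with `parseArgs` from `node:util`, which Bun also provides. This is a sketch of one possible parser, not the parsing code in loop-v2.ts: the `parseLoopArgs` name, the `LoopOptions` shape, and the validation rules are assumptions, and no default for N is set because the source does not state one.

```typescript
import { parseArgs } from "node:util";

type Source = "crawlee" | "firecrawl" | "maps";
const SOURCES: readonly Source[] = ["crawlee", "firecrawl", "maps"];

// Hypothetical result shape for the three documented flags.
interface LoopOptions {
  companies?: number; // --companies N (left undefined when omitted)
  source?: Source;    // --source crawlee|firecrawl|maps
  compare: boolean;   // --compare
}

function parseLoopArgs(argv: string[]): LoopOptions {
  // strict mode rejects flags that are not declared below.
  const { values } = parseArgs({
    args: argv,
    options: {
      companies: { type: "string" },
      source: { type: "string" },
      compare: { type: "boolean" },
    },
    strict: true,
  });

  let companies: number | undefined;
  if (values.companies !== undefined) {
    companies = Number(values.companies);
    if (!Number.isInteger(companies) || companies <= 0) {
      throw new Error(`--companies expects a positive integer, got "${values.companies}"`);
    }
  }

  let source: Source | undefined;
  if (values.source !== undefined) {
    if (!SOURCES.includes(values.source as Source)) {
      throw new Error(`--source must be one of ${SOURCES.join(", ")}`);
    }
    source = values.source as Source;
  }

  return { companies, source, compare: values.compare === true };
}
```

In a script, this would typically be called as `parseLoopArgs(process.argv.slice(2))`.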

loop-continuous.ts — never-stops monitor

Untracked file added 2026-04-06 (~469 lines). A long-running variant of loop-v2.

  • Runs indefinitely until SIGINT; rotates through DB companies.
  • Live dashboard prints running stats and tracks the best-performing configuration over the session.
  • Per-company results appended to results/continuous-history.jsonl (one line per company; see Autoresearch Result Types for the schema).
  • Convenience wrapper: autoresearch/run-loop.sh (untracked, 510 B).

This runner is operational, not experimental: leave it running to gather drift data, not to score a single change.
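
The one-line-per-company history format is plain JSON Lines: each record is serialised on a single line and appended to the file. A minimal sketch of that append step follows, assuming Node-compatible file APIs (available in Bun). The `toJsonlLine` and `appendJsonl` helpers are hypothetical, and the record fields in the usage note are placeholders; the real schema is the one described in Autoresearch Result Types.

```typescript
import { appendFile, mkdir } from "node:fs/promises";
import { dirname } from "node:path";

// Serialises one record as a single JSONL line.
// JSON.stringify without an indent argument never emits raw newlines,
// so each record stays on exactly one line.
function toJsonlLine(record: unknown): string {
  return JSON.stringify(record) + "\n";
}

// Appends one record to a JSONL file, creating the parent directory if needed.
async function appendJsonl(path: string, record: unknown): Promise<void> {
  await mkdir(dirname(path), { recursive: true });
  await appendFile(path, toJsonlLine(record), "utf8");
}
```

A call might look like `await appendJsonl("results/continuous-history.jsonl", { company: "example", contacts: 3 })`, where both field names are invented for illustration. Appending rather than rewriting keeps earlier lines intact if a long-running session is interrupted.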

See also

Experiment Results, Autoresearch Result Types, JSON-LD Extraction, Name Validation, Crawlee Scraper, Google Places.