Autonomous experiment system in autoresearch/. Inspired by karpathy/autoresearch. Instead of minimising training loss, it maximises contact extraction quality.
Source: docs/SYSTEM_OVERVIEW.md § The autoresearch loop and autoresearch/program.md.
The loop
read latest results
→ pick one change (from a backlog or freeform)
→ edit allowed files
→ run: USE_CRAWLEE=true bun autoresearch/experiment.ts --tag <name>
→ compare composite score to previous
→ if score increased ≥ 2 points AND false_positive_rate = 0%: git commit, continue
→ otherwise: revert, try something else
Composite score formula
score = extraction_rate × 40
+ min(avg_contacts / 5, 1) × 20
+ (1 − false_positive_rate) × 20
+ email_coverage × 10
+ phone_coverage × 10
Allowed file scope (agent)
Can modify:
crawlee.ts, maps.ts, nameUtils.ts, emailUtils.ts, phoneUtils.ts, config.ts, domain.ts, companies.json
Cannot touch:
pipeline.ts, experiment.ts, metrics.ts, types.ts
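A minimal sketch of how the scope above could be checked before a commit, assuming the agent has a list of changed file paths. The helpers classify and outOfScope are hypothetical and not part of the repository. Files are matched by base name because the lists give bare file names, and treating unlisted files as out of scope is an assumption stricter than anything stated here.

```typescript
// File names copied from the "Can modify" and "Cannot touch" lists.
const ALLOWED = new Set([
  "crawlee.ts", "maps.ts", "nameUtils.ts", "emailUtils.ts",
  "phoneUtils.ts", "config.ts", "domain.ts", "companies.json",
]);
const FORBIDDEN = new Set(["pipeline.ts", "experiment.ts", "metrics.ts", "types.ts"]);

type Scope = "allowed" | "forbidden" | "unlisted";

// Hypothetical helper: classify a path by its base name.
function classify(path: string): Scope {
  const parts = path.split("/");
  const name = parts[parts.length - 1];
  if (FORBIDDEN.has(name)) return "forbidden";
  if (ALLOWED.has(name)) return "allowed";
  return "unlisted"; // assumption: anything not listed is treated as off-limits
}

// Hypothetical helper: return every changed path the agent should not have edited.
function outOfScope(changed: string[]): string[] {
  return changed.filter((p) => classify(p) !== "allowed");
}
```

If outOfScope returns a non-empty list, the change would be reverted rather than scored.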
FPR rule
Warning
Decision rule: FPR > 0% triggers immediate revert. Several early experiments (including jsonld-v2, the current best, FPR 21.2%) ran before the FPR gate was tightened and were committed despite non-zero FPR. The rule has been enforced strictly since round 15.
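The composite score formula and the commit rule from the loop can be sketched together as follows. This is an illustration, not the contents of metrics.ts: the Metrics interface and both function names are hypothetical, and rates are assumed to be fractions in [0, 1] (so 21.2% is 0.212). Under that assumption the score tops out at 100.

```typescript
// Hypothetical shape; the real field names live in metrics.ts / types.ts.
interface Metrics {
  extractionRate: number;    // fraction of companies with at least one contact
  avgContacts: number;       // raw average count, capped at 5 by the formula
  falsePositiveRate: number; // fraction in [0, 1]
  emailCoverage: number;     // fraction in [0, 1]
  phoneCoverage: number;     // fraction in [0, 1]
}

// Direct transcription of the composite score formula.
function compositeScore(m: Metrics): number {
  return (
    m.extractionRate * 40 +
    Math.min(m.avgContacts / 5, 1) * 20 +
    (1 - m.falsePositiveRate) * 20 +
    m.emailCoverage * 10 +
    m.phoneCoverage * 10
  );
}

// Commit only if the score rose by at least 2 points AND FPR is exactly 0.
function shouldCommit(previousScore: number, next: Metrics): boolean {
  const gain = compositeScore(next) - previousScore;
  return gain >= 2 && next.falsePositiveRate === 0;
}
```

With illustrative (made-up) metrics of extractionRate 0.5, avgContacts 2.5, falsePositiveRate 0, and both coverages 0.5, the score is 20 + 10 + 20 + 5 + 5 = 60. Any non-zero FPR fails the gate no matter how large the gain, which is why the FPR term in the score and the FPR rule are separate checks.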
Progress
- 29 rounds completed
- 145 unique companies tested
- 78 (54%) produced ≥ 1 contact
- ~450+ total contacts extracted
- 82/82 unit tests passing
Biggest single gain: JSON-LD Extraction in round 1 (+18.7 composite vs baseline).
Three runner variants
autoresearch/ now ships three different loop entry points. They share the metric module (metrics.ts) but pick companies and report progress differently.
experiment.ts — original tagged runner
The canonical one referenced in the loop description above.
- Reads companies from autoresearch/companies.json (a fixed test set).
- Runs once, writes results/<tag>.json and appends one line to history.jsonl.
- Single-source: Crawlee only (USE_CRAWLEE=true required).
- This is the runner the FPR rule and the 29-round progression are scored against.
loop-v2.ts — production-style multi-source
Untracked file added 2026-04-06 (~498 lines). Tests the current production pipeline rather than an isolated experiment.
- Picks active AB companies straight from the database, skipping any cached recently.
- Performs live domain discovery (no pre-known domains in companies.json).
- Runs all three sources head-to-head: Crawlee, Firecrawl, Google Places.
- CLI: bun autoresearch/loop-v2.ts [--companies N] [--source crawlee|firecrawl|maps] [--compare].
- Output: in-process metrics summary; does not write a tagged JSON.
Use this when you want to know how the live pipeline performs on real DB rows, not when you want a clean before/after diff for the loop.
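For readers unfamiliar with the flag syntax in the CLI line above, here is one way such flags could be parsed. It is a sketch under assumptions, not the parser in loop-v2.ts: parseLoopArgs and LoopOptions are hypothetical names, and leaving companies and source undefined when the flag is omitted is a guess, since the note does not state the defaults.

```typescript
type Source = "crawlee" | "firecrawl" | "maps";

// Hypothetical options object mirroring the three documented flags.
interface LoopOptions {
  companies?: number; // --companies N
  source?: Source;    // --source crawlee|firecrawl|maps
  compare: boolean;   // --compare
}

const SOURCES: readonly Source[] = ["crawlee", "firecrawl", "maps"];

function parseLoopArgs(argv: string[]): LoopOptions {
  const opts: LoopOptions = { compare: false };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--compare") {
      opts.compare = true;
    } else if (arg === "--companies") {
      // Consume the next token as the count; a missing value becomes NaN and is rejected.
      const n = Number(argv[++i]);
      if (!Number.isInteger(n) || n <= 0) {
        throw new Error("--companies expects a positive integer");
      }
      opts.companies = n;
    } else if (arg === "--source") {
      const value = argv[++i];
      const match = SOURCES.find((s) => s === value);
      if (match === undefined) {
        throw new Error("--source must be one of " + SOURCES.join(", "));
      }
      opts.source = match;
    } else {
      throw new Error("unknown flag: " + arg);
    }
  }
  return opts;
}
```

Under this sketch, parseLoopArgs(["--companies", "20", "--source", "maps"]) yields companies 20, source "maps", and compare false.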
loop-continuous.ts — never-stops monitor
Untracked file added 2026-04-06 (~469 lines). A long-running variant of loop-v2.
- Runs indefinitely until SIGINT; rotates through DB companies.
- Live dashboard prints running stats and tracks the best-performing configuration over the session.
- Per-company results appended to results/continuous-history.jsonl (one line per company; see Autoresearch Result Types for the schema).
- Convenience wrapper: autoresearch/run-loop.sh (untracked, 510 B).
Operational, not experimental — leave it running to gather drift data, not to score a single change.
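Because the history file holds one JSON object per line, drift can be summarised offline with a few lines of code. The sketch below assumes a record shape with source and contacts fields purely for illustration; the real schema is defined in Autoresearch Result Types and may differ. Both function names are hypothetical.

```typescript
// Hypothetical record shape, NOT the documented schema.
interface ContinuousRecord {
  source: string;   // assumed: which scraper produced the row
  contacts: number; // assumed: number of contacts extracted for the company
}

// Parse JSONL text: one JSON object per non-empty line.
function parseJsonl(text: string): ContinuousRecord[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as ContinuousRecord);
}

// Fraction of companies per source that produced at least one contact.
function hitRateBySource(records: ContinuousRecord[]): Map<string, number> {
  const totals = new Map<string, { hits: number; n: number }>();
  for (const r of records) {
    const t = totals.get(r.source) ?? { hits: 0, n: 0 };
    t.n += 1;
    if (r.contacts >= 1) t.hits += 1;
    totals.set(r.source, t);
  }
  const rates = new Map<string, number>();
  totals.forEach((t, source) => rates.set(source, t.hits / t.n));
  return rates;
}
```

The input is taken as a string so the sketch stays independent of any particular file API; in practice the text would come from reading results/continuous-history.jsonl.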
See also
Experiment Results, Autoresearch Result Types, JSON-LD Extraction, Name Validation, Crawlee Scraper, Google Places.