Experiment History
Complete record of all 29 enrichment optimization rounds
Each round tested on 10 Swedish companies. Composite score formula: extraction_rate×40 + min(avg_contacts/5,1)×20 + (1-FPR)×20 + email_coverage×10 + phone_coverage×10
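The formula above can be sketched as a small function. This is a minimal illustration, not the contents of autoresearch/metrics.ts: the `RunMetrics` interface and its field names are assumptions, and all rates are taken to be fractions between 0 and 1.

```typescript
// Hypothetical input shape; the real metrics.ts types are not shown in this document.
interface RunMetrics {
  extractionRate: number;    // fraction, 0..1
  avgContacts: number;       // average contacts per company
  falsePositiveRate: number; // fraction, 0..1
  emailCoverage: number;     // fraction, 0..1
  phoneCoverage: number;     // fraction, 0..1
}

// Direct transcription of the composite score formula (maximum 100).
function compositeScore(m: RunMetrics): number {
  return (
    m.extractionRate * 40 +
    Math.min(m.avgContacts / 5, 1) * 20 + // contact count saturates at 5 per company
    (1 - m.falsePositiveRate) * 20 +
    m.emailCoverage * 10 +
    m.phoneCoverage * 10
  );
}

// A run that extracts nothing still earns the full (1 - FPR) term.
const failedRun = compositeScore({
  extractionRate: 0,
  avgContacts: 0,
  falsePositiveRate: 0,
  emailCoverage: 0,
  phoneCoverage: 0,
});
console.log(failedRun); // 20
```

This matches the current-test row in the results table: 0% extraction and 0% FPR give a score of 20.0, so a score near 20 most likely signals an empty run rather than a weak one.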
Experiment Results Summary
| Tag | Date | Score | Extraction | FPR | Avg Time | Companies | Notes |
|---|---|---|---|---|---|---|---|
| baseline | — | 46.7 | 60% | 37.5% | — | 10 | Initial config |
| jsonld-extraction | — | 65.4 | 80% | 6.3% | — | 10 | Added JSON-LD strategy |
| google-places-v2 | 2026-04-02 | 76.1 | 80% | 23.4% | 24.7s | 10 | Enhanced Maps integration |
| jsonld-v2 | 2026-04-02 | 81.6 | 90% | 21.2% | 19.4s | 10 | Best overall score |
| extraction-v7 | 2026-04-02 | 63.7 | 50% | 21.4% | 43.8s | 10 | Production config |
| quick-test | 2026-04-02 | 93.8 | 100% | 0% | 26.6s | 1 | Single company (inviatech AB) |
| active-companies | 2026-04-02 | 78.9 | 80% | 0% | 20.5s | 10 | Active company subset |
| final-clean | 2026-04-02 | 76.7 | 70% | 0% | 28.0s | 10 | Clean run |
| db-companies | 2026-04-02 | — | — | — | — | 10 | Real DB companies |
| current-test | 2026-04-06 | 20.0 | 0% | 0% | 1.3s | 5 | Failed run (cache hit?) |
| current-test-3 | 2026-04-06 | 50.7 | 67% | 0% | 23.5s | 3 | Partial test |
| stockholm-ab-v3 | 2026-04-02 | — | — | — | — | — | Stockholm subset |
| uppsala-ab-v2 | 2026-04-02 | — | — | — | — | — | Uppsala subset |
| email-association | 2026-04-02 | — | — | — | — | — | Email matching test |
Key Learnings from Experiments
1. JSON-LD is the Biggest Quality Win
- Round 1 added JSON-LD extraction: +18.7 composite score points vs baseline
- `jsonld-v2` achieved best-ever score: 81.6 (90% extraction, 21.2% FPR)
- JSON-LD structured data (schema.org Person/Organization) is gold for contact extraction
2. Production Config Lags Best Config
| Metric | Best (jsonld-v2) | Production (extraction-v7) | Gap |
|---|---|---|---|
| Score | 81.6 | 63.7 | -17.9 |
| Extraction | 90% | 50% | -40 pp |
| FPR | 21.2% | 21.4% | +0.2 pp (~same) |
| Time | 19.4s | 43.8s | 2.3x slower |
Why the gap?
- Production includes timeouts (`konsultopia.se`, `frimedia.se`)
- One parked domain in test set
- FPR still 21.4%: UI phrases slipping through name validation
3. False Positive Rate is the Hardest Problem
- Best achieved: 0% (single well-structured site, `quick-test`)
- `jsonld-v2`: 21.2% FPR
- `extraction-v7`: 21.4% FPR
- Root cause: UI phrases accepted as names (“Kontakta oss”, “Om oss”, etc.)
- Fix is batchable: add UI phrases to `INVALID_NAME_STANDALONE_WORDS`
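A minimal sketch of what such a blocklist check could look like. The set contents, the `normalizeCandidate` helper, and the `isBlockedName` function are illustrative assumptions; the real shape of `INVALID_NAME_STANDALONE_WORDS` is not shown in this document.

```typescript
// Hypothetical entries: only the two UI phrases quoted above.
const INVALID_NAME_STANDALONE_WORDS = new Set<string>(["kontakta oss", "om oss"]);

// Lowercase, trim, and collapse runs of whitespace so "  Om   Oss " matches "om oss".
function normalizeCandidate(candidate: string): string {
  return candidate.trim().toLowerCase().replace(/\s+/g, " ");
}

// Rejects a candidate only when the whole normalized string is a blocked entry.
function isBlockedName(candidate: string): boolean {
  return INVALID_NAME_STANDALONE_WORDS.has(normalizeCandidate(candidate));
}

console.log(isBlockedName("Kontakta oss"));    // true
console.log(isBlockedName("  Om   Oss "));     // true
console.log(isBlockedName("Julia Andersson")); // false
```

Matching the whole string rather than substrings is a deliberate choice in this sketch: a new entry can only reject candidates that are exactly that phrase, which limits the risk of blocking real names.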
4. Name Validation Evolution
The blocklist has been refined through painful iteration:
| Change | Reason |
|---|---|
| REMOVED ‘juni’, ‘juli’, ‘augusti’ | Too aggressive — blocked valid names like “Julia” |
| REMOVED ‘popular’, ‘cities’, ‘visit’ | Too aggressive — blocked valid names |
| REMOVED ‘hall’, ‘hallen’ | Valid Swedish surnames (e.g., “Rafaela Hall”) |
| REMOVED generic web/commerce words | Blocked “Christoffersson” |
Lesson: Aggressive blocklists hurt more than they help. Better to have 21% FPR than miss valid contacts.
5. Single-Company Best Case
- `quick-test` on inviatech AB: 93.8 score, 100% extraction, 0% FPR, 26.6s
- 32 contacts extracted from one company (best single result)
- Shows the pipeline can reach 100% extraction with 0% FPR on well-structured sites
6. Domain Discovery Accuracy
- When contacts are found, domain accuracy is 100%
- Domain discovery is NOT the bottleneck — contact extraction is
- IIS .se zone registry (1.4M domains) + fuzzy matching works well
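To illustrate the idea of fuzzy matching a company name against zone-file domains, here is a rough sketch using normalized edit distance. It is an assumption about one way this could work, not the project's matcher: `companyKey`, `levenshtein`, `bestDomain`, and the 0.8 threshold are all hypothetical.

```typescript
// "VendFox AB" -> "vendfox": lowercase, fold diacritics, drop the legal suffix and punctuation.
function companyKey(name: string): string {
  return name
    .trim()
    .toLowerCase()
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .replace(/\s+ab$/, "")
    .replace(/[^a-z0-9]/g, "");
}

// Classic two-row Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Returns the most similar .se domain at or above the threshold, or null.
function bestDomain(company: string, domains: string[], minSimilarity = 0.8): string | null {
  const key = companyKey(company);
  let best: string | null = null;
  let bestScore = minSimilarity;
  for (const domain of domains) {
    const label = domain.replace(/\.se$/, "");
    const longest = Math.max(key.length, label.length);
    if (longest === 0) continue;
    const score = 1 - levenshtein(key, label) / longest;
    if (score >= bestScore) {
      best = domain;
      bestScore = score;
    }
  }
  return best;
}

const zone = ["vendfox.se", "vendbox.se", "morepr.se", "aptus.se"];
console.log(bestDomain("VendFox AB", zone));       // "vendfox.se"
console.log(bestDomain("More PR AB", zone));       // "morepr.se"
console.log(bestDomain("Aptus Uppsala AB", zone)); // null
```

The last call shows the limit of edit distance alone: abbreviated or acronym domains such as aptus.se or uic.se fall below the threshold, so a production matcher would need additional signals beyond this sketch.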
7. Time vs Quality Trade-off
- `jsonld-v2`: 19.4s avg, the fastest high-quality config
- `extraction-v7`: 43.8s avg, the slowest (includes Firecrawl fallback)
- `quick-test`: 26.6s, single-company baseline
Experiment System Architecture
| File | Purpose |
|---|---|
| `autoresearch/experiment.ts` | Single-run experiment: domain discovery + Crawlee scraping |
| `autoresearch/loop.ts` | Autonomous loop: run → analyze → suggest → repeat |
| `autoresearch/loop-v2.ts` | Production-style testing: real DB companies, all sources |
| `autoresearch/loop-continuous.ts` | Infinite loop with live dashboard |
| `autoresearch/analyze.ts` | Results analyzer: compare, rank, report, suggest |
| `autoresearch/metrics.ts` | Composite score calculation |
| `autoresearch/regression.test.ts` | Guards against quality degradation |
Test Companies
10 Swedish companies with known domains:
- More PR AB → morepr.se
- VendFox AB → vendfox.se
- Uppsala Innovation Centre AB → uic.se
- Gordons Project AB → gordonsproject.se
- Opusett AB → opusett.se
- Ungdomsbarometern AB → ungdomsbarometern.se
- Samkonsult AB → samkonsult.se
- Navet Uppsala AB → navetuppsala.se
- Aptus Uppsala AB → aptus.se
- inviatech AB → inviatech.se (best performer)
Regression Tests
autoresearch/regression.test.ts guards against:
- Name validation regressions (known valid/invalid names)
- Role mapping regressions (known role → bucket mappings)

These tests must pass before any config change is committed.
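
A table-driven check is one simple way to pin known valid and invalid names. The sketch below is illustrative only: `NameCase`, `NAME_CASES`, `findRegressions`, and the stand-in validator are hypothetical and do not reproduce the actual test file. The example names come from the learnings above.

```typescript
type NameCase = { input: string; expectValid: boolean };

// Names and UI phrases mentioned earlier in this document.
const NAME_CASES: NameCase[] = [
  { input: "Rafaela Hall", expectValid: true },
  { input: "Julia", expectValid: true },
  { input: "Christoffersson", expectValid: true },
  { input: "Kontakta oss", expectValid: false },
  { input: "Om oss", expectValid: false },
];

// Returns every case where the validator disagrees with the expected outcome.
function findRegressions(
  isValidName: (candidate: string) => boolean,
  cases: NameCase[],
): NameCase[] {
  return cases.filter((c) => isValidName(c.input) !== c.expectValid);
}

// Stand-in validator for the demo: deliberately over-aggressive, it blocks the word "hall".
const overAggressive = (candidate: string): boolean =>
  !/\b(hall|kontakta|om)\b/i.test(candidate);

const regressions = findRegressions(overAggressive, NAME_CASES);
console.log(regressions.map((c) => c.input)); // ["Rafaela Hall"]
```

Passing the validator in as a parameter keeps the case table reusable: the same cases can be run against the real name validator, and any non-empty result fails the check.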