Domain Blocklist
More than 140 domains are listed in src/enrichment/config.ts (under INVALID_DOMAINS). Domain Discovery never accepts them, regardless of score.
Source: docs/SYSTEM_OVERVIEW.md § Domain discovery → Domain blocklist.
Categories
- Swedish company directories: allabolag.se, ratsit.se, proff.se, eniro.se, hitta.se
- B2B data vendors: rocketreach.co, apollo.io, zoominfo.com, lusha.com
- Social platforms: facebook.com, linkedin.com, instagram.com, etc.
- Generic hosts: parking pages, CDN landings, registrar placeholders
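As a rough illustration of "never accepted regardless of score", the sketch below filters blocked domains out before any ranking takes place. Only the INVALID_DOMAINS name and the listed domains come from this page. The DomainCandidate shape, the helpers normalizeDomain, isBlockedDomain and pickBestCandidate, and the choice to block subdomains of listed domains are assumptions made for the example, not the actual code in src/enrichment/config.ts.

```typescript
// Illustrative subset only; the real list (140+ entries) lives in src/enrichment/config.ts.
const INVALID_DOMAINS: ReadonlySet<string> = new Set([
  "allabolag.se",
  "ratsit.se",
  "proff.se",
  "rocketreach.co",
  "apollo.io",
  "facebook.com",
  "linkedin.com",
]);

// Hypothetical candidate shape; the real Domain Discovery types may differ.
interface DomainCandidate {
  domain: string;
  score: number;
}

// Lowercase, drop a trailing dot, and drop a leading "www.".
function normalizeDomain(domain: string): string {
  return domain.trim().toLowerCase().replace(/\.$/, "").replace(/^www\./, "");
}

// True if the domain itself or any parent domain is on the blocklist.
// Assumption: subdomains of a blocked domain are blocked as well.
function isBlockedDomain(domain: string): boolean {
  const labels = normalizeDomain(domain).split(".");
  // Stop before the last label so a bare TLD is never looked up.
  for (let i = 0; i < labels.length - 1; i++) {
    if (INVALID_DOMAINS.has(labels.slice(i).join("."))) return true;
  }
  return false;
}

// Blocked candidates are removed before ranking, so a high score cannot rescue them.
function pickBestCandidate(candidates: DomainCandidate[]): DomainCandidate | undefined {
  return candidates
    .filter((c) => !isBlockedDomain(c.domain))
    .sort((a, b) => b.score - a.score)[0];
}
```

With this sketch, a candidate list containing linkedin.com at score 0.99 and acme.se at score 0.4 yields acme.se, and a list containing only blocked domains yields undefined.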
Why directories are blocked
The terms of service of allabolag.se, ratsit.se, and proff.se explicitly prohibit commercial data extraction. Treating a directory as the canonical company website would also degrade extraction quality, because its pages list many companies rather than one.
Maintenance
The list is hand-curated. Additions come from autoresearch false-positive analysis: when a wrong-domain match shows up in an experiment, the domain is added here. Full file: src/enrichment/config.ts (~790 lines, including the name blocklists).
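Because entries are added by hand, a small consistency check can catch duplicates or entries that were pasted with stray casing or whitespace. The helper below is a hypothetical sketch: lintBlocklist is not an existing function in the repository, and the normalization rule (trimmed, lowercase) is an assumption.

```typescript
// Hypothetical helper: report entries that are duplicated or not in
// normalized form (trimmed, lowercase). Returns an empty array when clean.
function lintBlocklist(entries: readonly string[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const entry of entries) {
    const normalized = entry.trim().toLowerCase();
    if (normalized !== entry) problems.push(`not normalized: "${entry}"`);
    if (seen.has(normalized)) problems.push(`duplicate: "${normalized}"`);
    seen.add(normalized);
  }
  return problems;
}
```

A check like this could run in a unit test over the exported list, so that a newly added domain fails fast if it is already present.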
See also
Domain Discovery, Name Validation, Known Issues.