Domain Discovery

File: src/enrichment/sources/domain.ts. Resolves a company's domain through three tiers, tried in order.

Source: docs/SYSTEM_OVERVIEW.md § Domain discovery.

Tiers

  1. Redis cache — 30-day TTL on previously resolved domains.
  2. IIS .se zone registry — fuzzy pg_trgm search across 1.4M .se domains stored in Postgres.
  3. Direct HTTP check — generates candidate domains from company name and trade names, fetches each, scores the response.
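
Candidate generation in tier 3 could look roughly like the sketch below. It is illustrative only: the function names, the list of legal-form suffixes that get stripped, and the å/ä/ö transliteration are assumptions, not the contents of domain.ts.

```typescript
// Hypothetical sketch of tier-3 candidate generation.
// The suffix list and transliteration rules are assumptions for illustration.
const LEGAL_SUFFIXES = /\b(ab|aktiebolag|hb|kb)\b/g;

function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[åä]/g, "a") // transliterate first so \b behaves on ASCII only
    .replace(/ö/g, "o")
    .replace(LEGAL_SUFFIXES, " ") // drop standalone legal-form words
    .replace(/[^a-z0-9]+/g, " ") // collapse everything else to single spaces
    .trim();
}

function candidateDomains(companyName: string, tradeNames: string[] = []): string[] {
  const out = new Set<string>(); // Set removes duplicates, keeps insertion order
  for (const name of [companyName, ...tradeNames]) {
    const words = slugify(name).split(" ").filter(Boolean);
    if (words.length === 0) continue; // name consisted only of stripped words
    out.add(words.join("") + ".se");
    out.add(words.join("-") + ".se");
  }
  return Array.from(out);
}
```

With these assumed rules, candidateDomains("Åkes Bygg AB") yields akesbygg.se and akes-bygg.se, and a single-word name such as "Volvo AB" yields only volvo.se because the joined and hyphenated forms coincide.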

Serper-based domain search was removed (ToS uncertainty + exhausted credits). No search-API fallback exists.
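
The tier order can be sketched as a small resolver. Everything named here (the Tiers interface, resolveDomain, keying the cache on orgNr, writing hits back to the cache) is an assumption made to keep the example self-contained; only the ordering, the 30-day TTL and the absence of a search-API fallback come from the description above.

```typescript
// Hypothetical shape of the three tiers; none of these names come from domain.ts.
interface Tiers {
  cacheGet(orgNr: string): Promise<string | null>;
  cacheSet(orgNr: string, domain: string, ttlSeconds: number): Promise<void>;
  zoneSearch(companyName: string): Promise<string | null>;
  httpCheck(companyName: string, tradeNames: string[]): Promise<string | null>;
}

const CACHE_TTL_SECONDS = 30 * 24 * 60 * 60; // 30 days

async function resolveDomain(
  tiers: Tiers,
  orgNr: string,
  companyName: string,
  tradeNames: string[] = []
): Promise<string | null> {
  // Tier 1: previously resolved domain (cache key is an assumption).
  const cached = await tiers.cacheGet(orgNr);
  if (cached) return cached;

  // Tier 2, then tier 3 only if tier 2 found nothing.
  const found =
    (await tiers.zoneSearch(companyName)) ??
    (await tiers.httpCheck(companyName, tradeNames));

  // Assumed write-back so the next lookup is served from tier 1.
  if (found) await tiers.cacheSet(orgNr, found, CACHE_TTL_SECONDS);

  // null means unresolved: there is no search-API tier to fall back to.
  return found;
}
```

Passing the tiers in as an interface keeps the ordering logic testable without Redis, Postgres or network access.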

Scoring

Threshold: a domain passes if score ≥ 0.40.

let score = 0;
const nameInPage = companyNameInPage(companyName, html);
if (nameInPage)                               score += 0.40;
if (html.includes(orgNr))                     score += 0.30;
if (!nameInPage && domainContainsName(...))   score += 0.25;
if (/kontakt|mailto:|tel:/i.test(html))       score += 0.15;
if (/om oss|about us/i.test(html))            score += 0.15;
if (businessTypeMismatch(companyName, html))  score -= 0.30;
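
To make the arithmetic easy to check, the sketch below takes the six signals as plain booleans and applies the weights and threshold listed above. The Signals field names, scoreDomain and passes are illustrative names, and rounding to two decimals is an addition of this sketch (to avoid floating-point drift at the 0.40 boundary), not something the source states.

```typescript
// Signals are passed in as booleans so the weights can be checked in isolation.
// Field and function names are illustrative; weights and threshold are from the text.
interface Signals {
  nameInPage: boolean;
  orgNrInPage: boolean;
  domainContainsName: boolean;
  hasContactInfo: boolean;
  hasAboutPage: boolean;
  businessTypeMismatch: boolean;
}

const PASS_THRESHOLD = 0.4;

function scoreDomain(s: Signals): number {
  let score = 0;
  if (s.nameInPage) score += 0.4;
  if (s.orgNrInPage) score += 0.3;
  // The domain-name signal only counts when the name was NOT found in the page.
  if (!s.nameInPage && s.domainContainsName) score += 0.25;
  if (s.hasContactInfo) score += 0.15;
  if (s.hasAboutPage) score += 0.15;
  if (s.businessTypeMismatch) score -= 0.3;
  // Assumption of this sketch: round to 2 decimals before comparing.
  return Math.round(score * 100) / 100;
}

function passes(s: Signals): boolean {
  return scoreDomain(s) >= PASS_THRESHOLD;
}
```

Two consequences of the weights: a company name found in the page is enough on its own (0.40), while a name match combined with a business-type mismatch drops to 0.10 and fails unless other signals make up the difference. A domain that merely contains the name needs one more signal, such as contact details (0.25 + 0.15 = 0.40), to pass.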

Domain blocklist

See Domain Blocklist. More than 140 domains are never accepted, regardless of score.
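
A minimal sketch of how a blocklist can override the score. The two entries are placeholders rather than real blocklist contents, and both the www-stripping and the parent-domain matching are assumptions about how the check might behave.

```typescript
// Placeholder entries for illustration only; the real list is in Domain Blocklist.
const BLOCKLIST = new Set<string>(["example-directory.se", "example-social.com"]);

function isBlocked(hostname: string, blocklist: Set<string> = BLOCKLIST): boolean {
  const host = hostname.toLowerCase().replace(/^www\./, "");
  const labels = host.split(".");
  // Check the host and each parent domain (but not the bare TLD),
  // so that sub.blocked.se is rejected along with blocked.se.
  for (let i = 0; i < labels.length - 1; i++) {
    if (blocklist.has(labels.slice(i).join("."))) return true;
  }
  return false;
}

function acceptDomain(hostname: string, score: number): boolean {
  // The blocklist is checked first: a blocked domain fails whatever its score.
  return !isBlocked(hostname) && score >= 0.4;
}
```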

Accuracy

Domain accuracy was 100% among the 78 companies (of 145 tested) that produced contacts: whenever a domain was found and the crawl produced contacts, there were no wrong-domain false matches. The figure says nothing about the remaining 67 companies, which produced no contacts. Source: docs/SYSTEM_OVERVIEW.md § Results → 29-round aggregate.

See also

Domain Blocklist, Crawlee Scraper, EnrichV7.