System Overview

DBPOC is a B2B data enrichment backend for Swedish SMEs. Input: an org number. Output: company website, named contacts (name/role/email/phone), Google Maps data, and a 0–10 lead score, persisted to Postgres.
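
To make the input and output above concrete, here is a minimal sketch of what one enrichment result could look like as a TypeScript type. The field names, the `normalizeOrgNumber` and `isValidLeadScore` helpers, and the example values are all assumptions for illustration; the real schema is whatever ends up in companies.enriched_data.

```typescript
// Hypothetical result shape; field names are illustrative, not the real schema.
interface Contact {
  name: string;
  role?: string;
  email?: string;
  phone?: string;
}

interface EnrichmentResult {
  orgNumber: string; // normalized to 10 digits
  website: string | null;
  contacts: Contact[];
  googleMaps: { phone?: string; address?: string; rating?: number } | null;
  leadScore: number; // 0 to 10 inclusive
}

// Accepts "NNNNNN-NNNN" or "NNNNNNNNNN" and returns the 10 digits,
// or null if the input is malformed. The check digit is not verified.
function normalizeOrgNumber(input: string): string | null {
  const match = /^(\d{6})-?(\d{4})$/.exec(input.trim());
  return match ? `${match[1]}${match[2]}` : null;
}

// True when the score is a finite number in the documented 0 to 10 range.
function isValidLeadScore(score: number): boolean {
  return Number.isFinite(score) && score >= 0 && score <= 10;
}

// Placeholder values only; this is not a real company.
const example: EnrichmentResult = {
  orgNumber: normalizeOrgNumber("000000-0000") ?? "",
  website: "https://example.se",
  contacts: [{ name: "Anna Andersson", role: "CEO" }],
  googleMaps: { rating: 4.5 },
  leadScore: 7,
};
```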

Source of truth: docs/SYSTEM_OVERVIEW.md (last updated 6 April 2026, based on 29 experiment rounds and 145 companies tested end-to-end).

What it does

  • Ingests bulk public registries: Bolagsverket (1,794,801 rows, 651,611 active) and SCB (646,127 rows). See Bolagsverket Import, SCB Import.
  • Looks up the company via fetchBolagsverketOrganisation() (VärdefullaDatamängder API).
  • Discovers the website via Domain Discovery (Redis cache → IIS .se zone → HTTP scoring).
  • Pulls phone/address/rating from Google Places (direct API, not Serper).
  • Scrapes the website with the Crawlee Scraper (6 extraction strategies, up to 12 pages).
  • Validates every contact name with Name Validation.
  • Scores the lead 0–10 (see Lead Scoring).
  • Stores the result in companies.enriched_data (JSONB).
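
The per-company steps in the list above can be sketched as one function that calls each step in order. This is only an outline under stated assumptions: apart from fetchBolagsverketOrganisation(), every name and every signature below (including the one given for fetchBolagsverketOrganisation) is hypothetical, the steps are injected as stubs, and the scoring stub is a placeholder rather than the actual Lead Scoring rules.

```typescript
interface Contact { name: string; role?: string; email?: string; phone?: string }
interface PlacesData { phone?: string; address?: string; rating?: number }

interface EnrichedData {
  website: string | null;
  contacts: Contact[];
  googleMaps: PlacesData | null;
  leadScore: number;
}

// All signatures here are assumptions for illustration.
interface PipelineSteps {
  fetchBolagsverketOrganisation(orgNumber: string): Promise<{ name: string }>;
  discoverDomain(companyName: string): Promise<string | null>;
  fetchGooglePlaces(companyName: string): Promise<PlacesData | null>;
  scrapeContacts(domain: string): Promise<Contact[]>;
  isValidName(name: string): boolean;
  scoreLead(input: { website: string | null; contacts: Contact[]; places: PlacesData | null }): number;
  store(orgNumber: string, enriched: EnrichedData): Promise<void>;
}

async function enrichCompany(orgNumber: string, steps: PipelineSteps): Promise<EnrichedData> {
  const org = await steps.fetchBolagsverketOrganisation(orgNumber);
  const website = await steps.discoverDomain(org.name);
  const places = await steps.fetchGooglePlaces(org.name);
  // Scraping is skipped when no website was found.
  const scraped = website === null ? [] : await steps.scrapeContacts(website);
  // Only contacts whose name passes validation are kept.
  const contacts = scraped.filter((c) => steps.isValidName(c.name));
  const leadScore = steps.scoreLead({ website, contacts, places });
  const enriched: EnrichedData = { website, contacts, googleMaps: places, leadScore };
  await steps.store(orgNumber, enriched);
  return enriched;
}

// Stub steps with placeholder data, standing in for the real integrations.
const stored = new Map<string, EnrichedData>();
const stubSteps: PipelineSteps = {
  fetchBolagsverketOrganisation: async () => ({ name: "Example AB" }),
  discoverDomain: async () => "example.se",
  fetchGooglePlaces: async () => ({ rating: 4.2 }),
  scrapeContacts: async () => [{ name: "Anna Andersson" }, { name: "Kontakta oss" }],
  isValidName: (name) => name !== "Kontakta oss",
  scoreLead: ({ contacts }) => Math.min(10, contacts.length * 5), // placeholder rule
  store: async (orgNumber, enriched) => { stored.set(orgNumber, enriched); },
};
```

Calling `enrichCompany("0000000000", stubSteps)` runs the stubs in the listed order and drops the scraped entry that fails name validation before scoring and storing.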

Scale

  • 810,824 rows in bolagsverket_companies (current DB)
  • 646,127 rows in scb_foundations
  • 16 companies enriched end-to-end so far (POC)
  • 78/145 (54%) test companies produced ≥ 1 contact across all rounds
  • 450+ total contacts extracted across all experiments

Warning

This is a POC. Multiple P0 GDPR and correctness bugs are open; see Known Issues before considering production use.

Stack

See Stack. In short: Bun, TypeScript (strict mode), Postgres 15 + pgvector, BullMQ on Redis 7, Crawlee + Playwright, Keycloak 22.

Layout

See Repository Layout for the src/, migrations/, autoresearch/, docs/ tree.

Pipeline

Three BullMQ workers run in sequence: Scrape_Job → Enrich_Job → Update_Job. See Pipeline and EnrichV7.
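
The stage ordering can be illustrated with a small in-memory stand-in. This sketch deliberately does not use BullMQ or Redis; it only models the hand-off, with each stage's output becoming the next stage's input. The payload fields and what each handler does are assumptions, and only the three stage names come from this document.

```typescript
// In-memory stand-in for the queue hand-off; no BullMQ or Redis involved.
const STAGE_ORDER = ["Scrape_Job", "Enrich_Job", "Update_Job"] as const;
type StageName = (typeof STAGE_ORDER)[number];

type Payload = Record<string, unknown>;
type StageHandler = (payload: Payload) => Promise<Payload>;

// Runs the handlers in the fixed stage order and records which stages ran.
async function runStages(
  handlers: Record<StageName, StageHandler>,
  initial: Payload,
): Promise<{ payload: Payload; trace: StageName[] }> {
  let payload = initial;
  const trace: StageName[] = [];
  for (const stage of STAGE_ORDER) {
    payload = await handlers[stage](payload);
    trace.push(stage);
  }
  return { payload, trace };
}

// Hypothetical handlers: each one marks its own step as done and checks
// that the previous stage's marker is present.
const demoHandlers: Record<StageName, StageHandler> = {
  Scrape_Job: async (p) => ({ ...p, scraped: true }),
  Enrich_Job: async (p) => ({ ...p, enriched: p["scraped"] === true }),
  Update_Job: async (p) => ({ ...p, updated: p["enriched"] === true }),
};
```

With real queues, one common way to get the same ordering is for each worker's processor to enqueue a job for the next stage once its own work succeeds; whether DBPOC does it that way is described in Pipeline, not here.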

See also

Stack, Repository Layout, Pipeline, EnrichV7, Known Issues, Experiment Results, Autoresearch Loop, GDPR Legitimate Interest.