System Overview
DBPOC is a B2B data enrichment backend for Swedish SMEs. Input: an org number. Output: company website, named contacts (name/role/email/phone), Google Maps data, and a 0–10 lead score, persisted to Postgres.
Source of truth: docs/SYSTEM_OVERVIEW.md (last updated 6 April 2026, based on 29 experiment rounds and 145 companies tested end-to-end).
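The input/output contract above can be sketched as TypeScript types. This is a minimal sketch under stated assumptions. The type names (Contact, EnrichedCompany), the field names, and the helpers normalizeOrgNumber and isPlausibleRecord are hypothetical and not taken from the DBPOC codebase. The org-number check relies on a general property of Swedish organisation numbers: they have 10 digits, and the last digit is a Luhn (mod 10) check digit. This document does not say whether DBPOC validates its input this way.

```typescript
// Hypothetical shapes mirroring the outputs listed above; field names are assumptions.
interface Contact {
  name: string;
  role?: string;
  email?: string;
  phone?: string;
}

interface EnrichedCompany {
  orgNumber: string; // normalized to 10 digits
  website?: string;
  contacts: Contact[];
  googleMaps?: { phone?: string; address?: string; rating?: number };
  leadScore: number; // documented range 0–10
}

// Strip separators ("556012-5790" -> "5560125790") and verify the Luhn check digit.
function normalizeOrgNumber(raw: string): string | null {
  const digits = raw.replace(/[\s-]/g, "");
  if (!/^\d{10}$/.test(digits)) return null;
  let sum = 0;
  for (let i = 0; i < 10; i++) {
    let d = Number(digits[i]);
    // With 10 digits (check digit last), Luhn doubles the even indices 0, 2, ..., 8.
    if (i % 2 === 0) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0 ? digits : null;
}

// Basic sanity check before persisting (hypothetical; not the project's Name Validation step).
function isPlausibleRecord(rec: EnrichedCompany): boolean {
  if (normalizeOrgNumber(rec.orgNumber) === null) return false;
  if (!Number.isFinite(rec.leadScore) || rec.leadScore < 0 || rec.leadScore > 10) return false;
  return rec.contacts.every((c) => c.name.trim().length > 0);
}
```

For example, normalizeOrgNumber("556012-5790") returns "5560125790", and changing the last digit to 1 makes it return null.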
What it does
- Ingests bulk public registries: Bolagsverket (1,794,801 rows, 651,611 active) and SCB (646,127 rows). See Bolagsverket Import, SCB Import.
- Looks up the company via fetchBolagsverketOrganisation() (VärdefullaDatamängder API).
- Discovers the website via Domain Discovery (Redis cache → IIS .se zone → HTTP scoring).
- Pulls phone/address/rating from Google Places (direct API, not Serper).
- Scrapes the website with the Crawlee Scraper — 6 extraction strategies, up to 12 pages.
- Validates every contact name with Name Validation.
- Scores the lead 0–10 (see Lead Scoring).
- Stores the result in companies.enriched_data (JSONB).
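The three-tier order used by Domain Discovery (Redis cache → IIS .se zone → HTTP scoring) can be sketched with each tier stubbed out. In this sketch a Map stands in for Redis, a Set stands in for the .se zone file, and a synchronous scoring function replaces live HTTP requests. These stubs are assumptions, as are the candidate-generation rule (lowercase, transliterate å/ä/ö, drop "AB"/"aktiebolag"), the minScore threshold, and every name here (DiscoveryDeps, candidateDomains, discoverDomain). None of them are taken from the project's code.

```typescript
// Hypothetical dependencies; the real tiers are Redis, the IIS .se zone, and live HTTP.
interface DiscoveryDeps {
  cache: Map<string, string>; // orgNumber -> known domain (stands in for Redis)
  seZone: Set<string>; // registered .se domains (stands in for the zone file)
  scoreSite: (domain: string, companyName: string) => number; // stands in for HTTP scoring
  minScore: number; // assumed acceptance threshold
}

// Assumed candidate rule: "Acme Bygg AB" -> ["acmebygg.se", "acme-bygg.se"].
function candidateDomains(companyName: string): string[] {
  const words = companyName
    .toLowerCase()
    .replace(/[åä]/g, "a")
    .replace(/ö/g, "o")
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0 && w !== "ab" && w !== "aktiebolag");
  if (words.length === 0) return [];
  const joined = `${words.join("")}.se`;
  const hyphenated = `${words.join("-")}.se`;
  return joined === hyphenated ? [joined] : [joined, hyphenated];
}

function discoverDomain(orgNumber: string, companyName: string, deps: DiscoveryDeps): string | null {
  // Tier 1: a cache hit short-circuits the expensive tiers.
  const cached = deps.cache.get(orgNumber);
  if (cached !== undefined) return cached;

  // Tier 2: keep only candidates that are actually registered in the .se zone.
  const registered = candidateDomains(companyName).filter((d) => deps.seZone.has(d));

  // Tier 3: score the survivors and keep the best one (first wins on ties).
  let best: string | null = null;
  let bestScore = -Infinity;
  for (const domain of registered) {
    const s = deps.scoreSite(domain, companyName);
    if (s > bestScore) {
      best = domain;
      bestScore = s;
    }
  }
  if (best === null || bestScore < deps.minScore) return null;

  deps.cache.set(orgNumber, best); // populate the cache so the next lookup stops at tier 1
  return best;
}
```

The ordering puts the cheapest checks first. A zone-file membership test is a local set lookup, so only domains that exist ever reach the slower HTTP tier.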
Scale
- 810,824 rows in bolagsverket_companies (current DB)
- 646,127 rows in scb_foundations
- 16 companies enriched end-to-end so far (POC)
- 78/145 (54%) test companies produced ≥ 1 contact across all rounds
- Roughly 450 or more contacts extracted in total across all experiments
Warning
This is a POC. Multiple P0 GDPR/correctness bugs are open — see Known Issues before considering production.
Stack
Bun, TypeScript (strict mode), Postgres 15 + pgvector, BullMQ on Redis 7, Crawlee + Playwright, Keycloak 22. See Stack for details.
Layout
See Repository Layout for the src/, migrations/, autoresearch/, docs/ tree.
Pipeline
Three BullMQ workers in sequence: Scrape_Job → Enrich_Job → Update_Job. See Pipeline and EnrichV7.
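The hand-off between the three workers can be sketched as an in-memory chain. The sketch assumes that each stage's output becomes the next stage's input and that a failed job is not forwarded. It does not use BullMQ. Job, Worker, PipelineResult, runPipeline and the payload fields are hypothetical, and draining one stage completely before starting the next is a simplification, since real BullMQ workers consume their queues concurrently.

```typescript
// Sketch only: an in-memory stand-in for three chained BullMQ queues.
// Stage names come from the docs; payload shapes and error handling are assumptions.
type Stage = "Scrape_Job" | "Enrich_Job" | "Update_Job";
const ORDER: readonly Stage[] = ["Scrape_Job", "Enrich_Job", "Update_Job"];

interface Job {
  orgNumber: string;
  data: Record<string, unknown>;
}
type Worker = (job: Job) => Job;

interface PipelineResult {
  completed: Job[];
  failed: { stage: Stage; orgNumber: string; error: string }[];
}

function runPipeline(jobs: Job[], workers: Record<Stage, Worker>): PipelineResult {
  const queues: Record<Stage, Job[]> = { Scrape_Job: [...jobs], Enrich_Job: [], Update_Job: [] };
  const result: PipelineResult = { completed: [], failed: [] };

  ORDER.forEach((stage, i) => {
    const next: Stage | undefined = ORDER[i + 1];
    // Drain this stage before moving on (real workers would run concurrently).
    for (const job of queues[stage]) {
      try {
        const out = workers[stage](job);
        if (next) queues[next].push(out);
        else result.completed.push(out);
      } catch (err) {
        // A failed job is recorded and not forwarded, so it never reaches Update_Job.
        result.failed.push({ stage, orgNumber: job.orgNumber, error: String(err) });
      }
    }
  });
  return result;
}
```

In this sketch, a company whose scrape fails never reaches Update_Job, so no partial record would be written for it.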
See also
Stack, Repository Layout, Pipeline, EnrichV7, Known Issues, Experiment Results, Autoresearch Loop, GDPR Legitimate Interest.