Pipeline
Three BullMQ workers in src/queues/workers.ts run in sequence for each org_nr.
Flow
Scrape_Job
→ check reklamspärr (SCB advertising_block)
→ check opt-out (OptOut_Hashes)
→ if clear: enqueue Enrich_Job
Enrich_Job
→ run enrichV7(input)
→ enqueue Update_Job
Update_Job
→ upsert into companies (enriched_data JSONB)
→ append to RoPA_Log
→ set Redis cache (6-month TTL)
Source: docs/SYSTEM_OVERVIEW.md § How enrichment works → The queue pipeline.
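The flow above can be sketched as a small in-memory state machine. This is only an illustration of the intended stage hand-offs, not the code in src/queues/workers.ts: it uses no BullMQ and runs synchronously. The names PipelineJob, PipelineDeps, step and runPipeline, and every callback on PipelineDeps, are hypothetical stand-ins for the real checks, enrichV7 and the persistence steps.

```typescript
// Illustrative only: a synchronous, in-memory model of the three-stage hand-off.
// The real workers are BullMQ workers; every name below is a hypothetical stand-in.

type JobName = "Scrape_Job" | "Enrich_Job" | "Update_Job";

interface PipelineJob {
  name: JobName;
  orgNr: string;
  enriched?: Record<string, unknown>;
}

interface PipelineDeps {
  isAdvertisingBlocked(orgNr: string): boolean; // reklamspärr check (stand-in)
  isOptedOut(orgNr: string): boolean; // opt-out check (stand-in)
  enrich(orgNr: string): Record<string, unknown>; // stand-in for enrichV7(input)
  persist(orgNr: string, data: Record<string, unknown>): void; // upsert + RoPA log + cache
}

// Process one job and return the follow-up job, or null when the chain ends.
function step(job: PipelineJob, deps: PipelineDeps): PipelineJob | null {
  switch (job.name) {
    case "Scrape_Job":
      // Gates only: stop here if either compliance check fails.
      if (deps.isAdvertisingBlocked(job.orgNr) || deps.isOptedOut(job.orgNr)) {
        return null;
      }
      return { name: "Enrich_Job", orgNr: job.orgNr };
    case "Enrich_Job":
      return { name: "Update_Job", orgNr: job.orgNr, enriched: deps.enrich(job.orgNr) };
    case "Update_Job":
      deps.persist(job.orgNr, job.enriched ?? {});
      return null;
  }
}

// Drive a single org_nr through the chain and report which stages ran.
function runPipeline(orgNr: string, deps: PipelineDeps): JobName[] {
  const ran: JobName[] = [];
  let job: PipelineJob | null = { name: "Scrape_Job", orgNr };
  while (job !== null) {
    ran.push(job.name);
    job = step(job, deps);
  }
  return ran;
}
```

In this model a company that fails either gate stops after Scrape_Job, so enrich and persist are never called for it.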
Stages
- Scrape_Job: gates only (compliance + dedup). Should call isScbAdvertisingBlocked() and check Opt-Out Hashes.
- Enrich_Job: runs EnrichV7 (BV API + Domain Discovery + Google Places + Crawlee Scraper + Lead Scoring).
- Update_Job: persists to companies, appends to RoPA Log, sets Redis 6-month TTL cache.
P0 — Reklamspärr gap
Warning
src/queues/workers.ts does NOT call isScbAdvertisingBlocked(). The check lives only inside enrichV7(), which runs after Scrape_Job has already enqueued the Enrich_Job. Companies with advertising_block = true get enriched anyway. This is a GDPR compliance gap. See Reklamspärr and Known Issues.
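One way to close the gap is to run the check inside the Scrape_Job handler before anything is enqueued. The following is a minimal sketch, assuming the check can be called with an org_nr and returns a boolean. The real signature of isScbAdvertisingBlocked() is not shown on this page, so it is passed in as a parameter. GateChecks, GateDecision, scrapeGate and handleScrapeJob are hypothetical names.

```typescript
// Hypothetical sketch of a Scrape_Job gate. The checks are injected because
// their real signatures are not documented in this section.

interface GateChecks {
  isScbAdvertisingBlocked(orgNr: string): boolean; // assumed shape
  isOptedOut(orgNr: string): boolean; // assumed shape
}

type GateDecision =
  | { proceed: true }
  | { proceed: false; reason: "advertising_block" | "opt_out" };

// Decide whether an org_nr may move on to enrichment.
function scrapeGate(orgNr: string, checks: GateChecks): GateDecision {
  if (checks.isScbAdvertisingBlocked(orgNr)) {
    return { proceed: false, reason: "advertising_block" };
  }
  if (checks.isOptedOut(orgNr)) {
    return { proceed: false, reason: "opt_out" };
  }
  return { proceed: true };
}

// Enqueue Enrich_Job only when the gate passes; blocked companies never reach it.
function handleScrapeJob(
  orgNr: string,
  checks: GateChecks,
  enqueueEnrich: (orgNr: string) => void,
): GateDecision {
  const decision = scrapeGate(orgNr, checks);
  if (decision.proceed) {
    enqueueEnrich(orgNr);
  }
  return decision;
}
```

Returning the reason lets the caller record why a company was skipped without ever running enrichment on it.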
Concurrency limit
Warning
Playwright runs synchronously inside the Enrich_Job worker thread. At >40 concurrent jobs the worker pool will OOM or deadlock. Decoupling Playwright into its own pool is required before raising concurrency. P1 in Known Issues.
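Decoupling implies that browser work gets its own capacity limit, separate from job concurrency. A minimal sketch of such a limit is a counter that caps how many jobs may hold a browser at once. BrowserSlots is a hypothetical name and not existing code, the capacity used is arbitrary rather than a tuned value, and how a job that finds no free slot is retried is left to the caller.

```typescript
// Hypothetical sketch: a counter that caps how many jobs may hold a browser
// at the same time, independent of how many Enrich_Job workers are running.

class BrowserSlots {
  private inUse = 0;
  private readonly capacity: number;

  constructor(capacity: number) {
    if (capacity < 1 || Math.floor(capacity) !== capacity) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  // Claim a slot if one is free. Never blocks: returns false when the pool is full.
  tryAcquire(): boolean {
    if (this.inUse >= this.capacity) {
      return false;
    }
    this.inUse += 1;
    return true;
  }

  // Give a slot back. Extra calls are ignored so the count cannot go negative.
  release(): void {
    if (this.inUse > 0) {
      this.inUse -= 1;
    }
  }

  // Number of slots that can still be claimed.
  get available(): number {
    return this.capacity - this.inUse;
  }
}
```

A worker would call tryAcquire() before launching Playwright, call release() in a finally block once the page work is done, and put the job back on the queue with a delay when tryAcquire() returns false.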
Cache
- Key: enrich:${org_nr}
- TTL: 6 months
- Bypass with input.bypass_cache = true
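The cache rules above can be sketched as follows. This is an in-memory stand-in rather than Redis, and it makes assumptions this page does not state: 6 months is approximated as 180 days, the cached value is a string, and EnrichInput, EnrichCacheSketch, enrichCacheKey and readEnrichCache are hypothetical names. Only the key format and the bypass_cache flag come from this page.

```typescript
// In-memory stand-in for the Redis cache; names and the 180-day figure are assumptions.

interface EnrichInput {
  org_nr: string;
  bypass_cache?: boolean;
}

// Assumption: "6 months" approximated as 180 days, expressed in seconds.
const CACHE_TTL_SECONDS = 180 * 24 * 60 * 60;

// Build the cache key in the documented enrich:${org_nr} format.
function enrichCacheKey(orgNr: string): string {
  return `enrich:${orgNr}`;
}

class EnrichCacheSketch {
  // Prototype-free object so arbitrary keys cannot collide with built-ins.
  private entries: Record<string, { value: string; expiresAtMs: number }> =
    Object.create(null);

  // Store a value that expires ttlSeconds after nowMs.
  set(key: string, value: string, ttlSeconds: number, nowMs: number): void {
    this.entries[key] = { value, expiresAtMs: nowMs + ttlSeconds * 1000 };
  }

  // Return the value, or undefined when the key is missing or expired.
  get(key: string, nowMs: number): string | undefined {
    const entry = this.entries[key];
    if (entry === undefined || nowMs >= entry.expiresAtMs) {
      return undefined;
    }
    return entry.value;
  }
}

// Return the cached value unless the caller asked to bypass the cache.
function readEnrichCache(
  cache: EnrichCacheSketch,
  input: EnrichInput,
  nowMs: number,
): string | undefined {
  if (input.bypass_cache === true) {
    return undefined;
  }
  return cache.get(enrichCacheKey(input.org_nr), nowMs);
}
```

Passing the current time in as nowMs keeps the expiry logic easy to test; a Redis-backed version would leave expiry to the server instead.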