Tests for the EnrichV7 pipeline, contact extractors, and supporting processors. Mostly fast and deterministic — uses inline HTML fixtures and injected mock clients. Two integration tests hit the live network.
Files
| File | Lines | Network? |
|---|---|---|
| `tests/enrichment/processors.test.ts` | 183 | No |
| `tests/enrichment/crawlee-quality.test.ts` | 308 | No |
| `tests/enrichment/firecrawl.test.ts` | 512 | No (mock client) |
| `tests/enrichment/board-members-integration.test.ts` | 31 | Yes: calls `enrichV7()` end-to-end |
| `src/enrichmentEngine.v7.test.ts` | 339 | No |
| `autoresearch/regression.test.ts` | 283 | No (see Autoresearch Loop) |
processors.test.ts
Unit tests for src/enrichment/processors/:
- `inferRoleType`: VD/CEO → VD; CFO/Ekonomichef → CFO; PR-konsult / Copywriter / Kommunikationskonsult → Marknadschef; Rekryterare / Talent Manager → Personalansvarig; unknown → Övrig
- `normalizeName`: strips Swedish/accent diacritics (`André` → `andre`, `Björn` → `bjorn`, `Åsa` → `asa`)
- `maybeFlipSwedishName`: flips `Karlsson Anna` → `Anna Karlsson`; leaves three-word names alone
- `normalizeCompanyName`: strips trailing `AB`, `HB`, `EF`
- `extractBrandName`: `VendFox Solutions AB` → `VendFox` (strips generic word + suffix); returns `undefined` when there is no generic word
- `isValidPersonName`: see Name Validation; accepts Swedish full names, rejects company suffixes, emails, single words, nav phrases
- `normSe`: Swedish-char + non-alphanumeric strip (`Åsa` → `asa`)
- `extractEmails`: dedupe, lowercase
- `generateEmailGuesses`: produces `firstname.lastname@domain` first, all with `confidence: 'low'` and `source: 'generated_guess'`
- `detectEmailPattern`: pattern detection from observed addresses
- `extractPhones`: Swedish mobile (`+46`/`07x`) tagged `mobile`, landline (e.g. `08-`) tagged `landline`
- `calculateLeadScore`: 0 for empty, capped at 10, awards points for VD contact
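To make two of the behaviours listed above concrete (diacritic stripping and mobile/landline tagging), here is a minimal sketch. The helper names `stripDiacritics` and `classifySwedishPhone` are hypothetical, and the rule "mobile numbers start with 07 or +46 7" is an assumption taken from the summary above; the real `normalizeName` and `extractPhones` may be implemented differently.

```typescript
// Hypothetical helper (not the project's normalizeName): lowercases the
// input and removes combining diacritics after Unicode decomposition.
function stripDiacritics(input: string): string {
  return input
    .normalize("NFD")                // "é" becomes "e" + combining accent
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase();
}

type PhoneType = "mobile" | "landline";

// Hypothetical helper (not the project's extractPhones): assumes Swedish
// mobile numbers start with 07 nationally, or +46 7 internationally.
function classifySwedishPhone(raw: string): PhoneType {
  const compact = raw.replace(/[^\d+]/g, "");     // keep digits and "+"
  const national = compact.replace(/^\+46/, "0"); // +4670... -> 070...
  return /^07/.test(national) ? "mobile" : "landline";
}
```

A test in the style described above would feed `André`, `Björn`, and `Åsa` to the normalizer and a `+46 70 ...` and an `08-...` number to the classifier.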
crawlee-quality.test.ts
Tests extractContactsFromText and extractContactsFromImgAlts from src/enrichment/sources/crawlee.ts using inline HTML/text fixtures.
Key assertions:
- Three-contact vendfox-style fixture: name + role + email + phone correctly attached, no email bleed-over to the next contact
- Phone type: mobile vs landline classified
- Rejects nav phrases (`Om Oss`), single words, company-suffix names (`Bolaget AB`)
- Image-alt extraction: `<img src="/team/anna.jpg" alt="Anna Karlsson VD" />` → contact with `name='Anna Karlsson'`, `role='VD'`
- Strips company-name words from extracted role (`vendfox` removed when company = `VendFox Solutions AB`)
- All-caps role tokens (VD, CEO) treated as role, not part of name
- Path separator handling: `team-anna.jpg`, `team_photo_anna.jpg` both match the team-image heuristic
- Filename false-positive guards: rejects `More_PR_Team`, `Photo1`
- Single-quoted HTML attributes parsed (regression for the bug noted in `feedback_crawlee_patterns.md`)
- Dutch/Swedish name particles (`van`, `de`, `af`) accepted
- `inferRoleType` PR/communications expansion verified
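The image-alt and all-caps-role assertions above can be illustrated with a small sketch. This is not the code in `crawlee.ts`: the function name `contactsFromImgAlts`, the regexes, and the "trailing all-caps tokens are the role" rule are assumptions chosen to reproduce the listed cases, including single-quoted attributes.

```typescript
interface AltContact {
  name: string;
  role: string | null;
}

// Hypothetical sketch: the alt attribute may be double- or single-quoted.
const IMG_ALT = /<img\b[^>]*?\balt\s*=\s*(?:"([^"]*)"|'([^']*)')[^>]*>/gi;

// A token of two or more capital letters, e.g. VD or CEO.
const ALL_CAPS = /^[A-ZÅÄÖ]{2,}$/;

function contactsFromImgAlts(html: string): AltContact[] {
  const out: AltContact[] = [];
  IMG_ALT.lastIndex = 0; // the regex is global, so reset it between calls
  let m: RegExpExecArray | null;
  while ((m = IMG_ALT.exec(html)) !== null) {
    const alt = (m[1] ?? m[2] ?? "").trim();
    const words = alt.split(/\s+/).filter(Boolean);
    const roleWords: string[] = [];
    // Peel trailing all-caps tokens off the name and treat them as the role.
    while (words.length > 0) {
      const last = words[words.length - 1];
      if (last === undefined || !ALL_CAPS.test(last)) break;
      roleWords.unshift(last);
      words.pop();
    }
    if (words.length < 2) continue; // single words are not person names
    out.push({
      name: words.join(" "),
      role: roleWords.length > 0 ? roleWords.join(" ") : null,
    });
  }
  return out;
}
```

Calling it on the fixture from the list returns one contact with the name and role split apart; an alt text such as `Logo` yields nothing because a single word is rejected.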
firecrawl.test.ts
Tests src/enrichment/sources/firecrawl.ts without a real Firecrawl API key. Uses the test-only hooks _setClientForTest() and _resetClientForTest().
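The test-only hooks follow a common module-level injection pattern, which might look roughly like the sketch below. Only the hook names `_setClientForTest()` and `_resetClientForTest()` come from the source; the `ScrapeClient` interface, the `getClient` helper, and passing the environment in as a parameter are assumptions made to keep the sketch self-contained, and the real Firecrawl client has a different shape.

```typescript
// Hypothetical minimal client shape, not the real Firecrawl client.
interface ScrapeClient {
  scrape(url: string): Promise<{ markdown: string }>;
}

// Module-level slot that a test can fill with a mock.
let testClient: ScrapeClient | null = null;

function _setClientForTest(client: ScrapeClient): void {
  testClient = client;
}

function _resetClientForTest(): void {
  testClient = null;
}

// Hypothetical resolver: prefers the injected mock, otherwise insists on
// an API key so a missing key fails loudly rather than silently.
function getClient(env: Record<string, string | undefined>): ScrapeClient {
  if (testClient) return testClient;
  if (!env["FIRECRAWL_API_KEY"]) {
    throw new Error("FIRECRAWL_API_KEY is not set; cannot create a Firecrawl client");
  }
  // Constructing the real client is deliberately left out of this sketch.
  throw new Error("real client construction is omitted from this sketch");
}
```

A test would call `_setClientForTest(mock)` in its setup and `_resetClientForTest()` in its teardown, so that one test's mock cannot leak into the next.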
Coverage:
- `discoverContactPages`: caps at 3 results; accepts `/kontakt`, `/contact`, `/om-oss`, `/team`, `/medarbetare`, `/ledning`, `/styrelse`; depth filter accepts depth ≤ 2 (`/en/contact` ok, `/nyheter/post/about` rejected)
- `CONTACT_PAGE_PATTERNS` regex matches expected paths and rejects `/blog`, `/products`, `/nyheter`
- `https://` guard: `https://acme.se` not double-prefixed to `https://https://acme.se`
- Missing `FIRECRAWL_API_KEY` throws an informative error (does not silently return failed)
- With mock client: returns a shape compatible with `scrapeWebsite()`: `{contacts, emails, phones, tech_stack, social_links, services, method, headline, description}`
- `isValidPersonName` filter applied to LLM output: `World Trade Center` and `Kontakt` filtered out
- Dedupe by lowercase name; emails lowercased and deduped; services capped at 8 and deduped
- `tech_stack` always `[]` (Firecrawl extraction does not produce it)
- Multi-page merge: contacts combine across pages; headline from first successful page only
- `guessContactUrls`: probes common paths via injected fetch, falls back to homepage on all-404, capped at 3
- `extractContactsFromHtml` cheerio fallback: finds names from `<h3>`, attaches nearby `mailto:` email, sets `confidence: 'low'`, `source: 'firecrawl-html-fallback'`
- HTML fallback used when Firecrawl returns 0 contacts but HTML contains names
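The URL-selection rules covered above (path pattern, depth ≤ 2, cap of 3, and the `https://` guard) can be sketched as follows. The regex `contactPathPattern` is an assumed stand-in for `CONTACT_PAGE_PATTERNS`, built only from the paths listed above, and `ensureHttps`, `pathOf`, and `pickContactPages` are hypothetical names rather than the functions in `firecrawl.ts`.

```typescript
// Assumed stand-in for CONTACT_PAGE_PATTERNS: last path segment must be
// one of the listed contact-style pages.
const contactPathPattern =
  /\/(kontakt|contact|om-oss|team|medarbetare|ledning|styrelse)\/?$/i;

// Adds a scheme only when one is missing, so an existing one is never doubled.
function ensureHttps(url: string): string {
  return /^https?:\/\//i.test(url) ? url : "https://" + url;
}

// Extracts the path, ignoring scheme, host, query string, and fragment.
function pathOf(url: string): string {
  const noOrigin = url.replace(/^https?:\/\/[^/?#]+/i, "");
  const path = noOrigin.split(/[?#]/)[0] ?? "";
  return path === "" ? "/" : path;
}

// Hypothetical selector: keeps contact-like pages at depth <= maxDepth,
// in input order, capped at max results.
function pickContactPages(urls: string[], max = 3, maxDepth = 2): string[] {
  return urls
    .map(ensureHttps)
    .filter((u) => {
      const path = pathOf(u);
      const depth = path.split("/").filter(Boolean).length;
      return depth <= maxDepth && contactPathPattern.test(path);
    })
    .slice(0, max);
}
```

With this sketch, `/en/contact` passes at depth 2, while `/nyheter/post/about` fails both the depth check and the pattern, mirroring the cases in the list.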
board-members-integration.test.ts
Two integration tests that invoke enrichV7() against org_nr 5565672655 (Gordons Project AB) with bypass_cache: true. 30 second timeout each.
Warning
These tests hit the live BV API, Serper, and target websites, and will fail without network access or with rate-limited keys. They assert only structural shape (`contacts` is an array, `updated_fields` is an array, `company.org_nr` matches), not specific contact content.
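The structural check described above amounts to something like the following sketch. The interface `EnrichResultLike` and the helper `hasExpectedShape` are hypothetical and model only the three fields named in the warning; the actual return type of `enrichV7()` is not shown in this document.

```typescript
// Hypothetical result shape: only the fields named above are modelled.
interface EnrichResultLike {
  contacts?: unknown;
  updated_fields?: unknown;
  company?: { org_nr?: unknown };
}

// Returns true when the result has the structure the integration tests
// check for, without looking at any contact content.
function hasExpectedShape(result: EnrichResultLike, orgNr: string): boolean {
  return (
    Array.isArray(result.contacts) &&
    Array.isArray(result.updated_fields) &&
    result.company?.org_nr === orgNr
  );
}
```

Because live data changes over time, checking shape rather than specific names keeps the tests from failing whenever the underlying sources are updated.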
src/enrichmentEngine.v7.test.ts
Unit tests for the v7 helpers exported from src/enrichment/index.ts. Overlaps significantly with processors.test.ts — same inferRoleType / isValidPersonName / normalizeCompanyName / normSe / extractPhones / extractEmails cases — plus the Article 14 notification module (src/lib/article14Notification.ts) tested with a mock pg pool: NO_EMAIL short-circuit, alreadySent detection via SELECT, RoPA insert on attempt, bulk notifyDataSubjects. See Article 14.
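For readers unfamiliar with the mock-pool approach, the sketch below shows how a recording pool can verify the NO_EMAIL short-circuit, the SELECT-based already-sent check, and the RoPA insert. Everything here is hypothetical: `PoolLike`, `notifyOne`, `makeMockPool`, the outcome strings other than `NO_EMAIL`, and the table and column names are invented for illustration and are not the API of `article14Notification.ts`.

```typescript
// Hypothetical pool shape: only a query() method, loosely modelled on pg.
interface QueryResult {
  rows: unknown[];
}
interface PoolLike {
  query(sql: string, params: unknown[]): Promise<QueryResult>;
}

type NotifyOutcome = "NO_EMAIL" | "ALREADY_SENT" | "ATTEMPTED";

// Hypothetical flow; table and column names are invented for this sketch.
async function notifyOne(
  pool: PoolLike,
  subject: { id: string; email?: string },
): Promise<NotifyOutcome> {
  if (!subject.email) return "NO_EMAIL"; // short-circuit before any query
  const prior = await pool.query(
    "SELECT 1 FROM notifications WHERE subject_id = $1",
    [subject.id],
  );
  if (prior.rows.length > 0) return "ALREADY_SENT";
  // Record the attempt before any send would happen.
  await pool.query("INSERT INTO ropa_log (subject_id) VALUES ($1)", [subject.id]);
  return "ATTEMPTED";
}

// Mock pool that records each SQL string and returns canned SELECT rows.
function makeMockPool(selectRows: unknown[]): PoolLike & { calls: string[] } {
  const calls: string[] = [];
  return {
    calls,
    async query(sql: string): Promise<QueryResult> {
      calls.push(sql);
      return { rows: /^SELECT/.test(sql) ? selectRows : [] };
    },
  };
}
```

Inspecting `calls` after each run shows which statements were issued, which is how a test can confirm that a subject without an email triggers no queries at all.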
Note
`processors.test.ts` and `src/enrichmentEngine.v7.test.ts` duplicate ~40% of their assertions. The duplicated cases could be removed from either file with no coverage loss. The same applies to the role-mapping cases in `autoresearch/regression.test.ts`.
See also
Test Strategy, EnrichV7, Crawlee Scraper, Firecrawl, Name Validation, Autoresearch Loop.