Repository Layout Complete

Every file in src/ mapped to its purpose

89 TypeScript files across 12 directories. Complete catalog for navigation.

File Count by Directory

| Directory | Files | Purpose |
| --- | --- | --- |
| src/api/ | 16 | REST API endpoints + middleware |
| src/enrichment/ | 16 | Enrichment pipeline + sources + processors |
| src/lib/ | 9 | Shared utilities (Redis, SMTP, notifications, etc.) |
| src/fetchers/ | 9 | External data fetchers (Bolagsverket, SCB, news) |
| src/import/ | 8 | Bulk import scripts |
| src/workers/ | 6 | BullMQ worker implementations |
| src/validation/ | 3 | Validation layer implementations |
| src/extractors/ | 1 | Website extraction (Playwright) |
| src/db/ | 3 | Database connection + schema |
| src/compliance/ | 1 | GDPR compliance utilities |
| src/mocks/ | 2 | Mock data for testing |
| Root src/ | 15 | Entry points, engines, scrapers |

API Layer (src/api/)

| File | Exports | Purpose |
| --- | --- | --- |
| index.ts | app (Express/Bun.serve) | API entry point, route registration |
| auth.ts | authHandlers | Login, register, JWT token refresh |
| companies.ts | companyHandlers | CRUD for companies table |
| leads.ts | leadHandlers | CRUD for leads table + validation |
| kundkort.ts | kundkortHandlers | Company card UI backend (search, detail, enrich, export) |
| search.ts | searchHandlers | Full-text, advanced, semantic search |
| scrape.ts | scrapeHandlers | On-demand website scraping |
| export.ts | exportHandlers | CSV/JSON export with GDPR filtering |
| users.ts | userHandlers | User management |
| organizations.ts | organizationHandlers | Tenant/org management |
| projects.ts | projectHandlers | Project management |
| documents.ts | documentHandlers | Document storage with embeddings |
| enrichmentErrors.ts | enrichmentErrorHandlers | Error tracking API |
| validation.ts | validationHandlers | Validation layer API |
| middleware/auth.ts | requireAuth, requireRole | JWT + RBAC middleware |
| middleware/rate-limit.ts | rateLimitMiddleware | Redis-based sliding window rate limiting |
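
To illustrate the sliding-window idea behind middleware/rate-limit.ts, here is a minimal sketch. It is not the repository's implementation: the class name `SlidingWindowLimiter` is hypothetical, and an in-memory `Map` stands in for Redis so the example runs on its own. It keeps a timestamp log per key and allows a request only if fewer than `limit` requests fall inside the trailing window.

```typescript
// Hypothetical sketch: in-memory sliding-window log. The real middleware is
// described as Redis-based; a Map is used here only to keep this runnable.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly hits = new Map<string, number[]>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /** Returns true if the request is allowed, false if the key is over its limit. */
  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only the timestamps that still fall inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(key, recent);
    return allowed;
  }
}
```

A middleware built on this would call `allow()` with a per-client key (for example the client IP or user ID) and respond with HTTP 429 when it returns false.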

Enrichment Pipeline (src/enrichment/)

| File | Exports | Purpose |
| --- | --- | --- |
| pipeline.ts | enrichV7() | Main enrichment orchestrator (7 steps) |
| types.ts | EnrichmentInput, EnrichmentResult | Type definitions |
| config.ts | INVALID_NAMES, ROLE_PATTERNS, thresholds | Configuration + blocklists (~900 lines) |
| ecoApiIntegration.ts | fetchDataGaps(), fetchCompanyInsights() | ECOAPI integration |
| sources/domain.ts | findCompanyDomain(), validateCompanyDomain() | Domain discovery (4 tiers) |
| sources/crawlee.ts | scrapeWithCrawlee() | Crawlee multi-page scraper (6 strategies) |
| sources/firecrawl.ts | scrapeWebsiteFirecrawl() | Firecrawl LLM fallback |
| sources/maps.ts | queryGooglePlaces() | Google Places API |
| sources/serper.ts | freeSearch() | Free DNS/HTTP search (replaced Serper) |
| sources/registry.ts | queryDomainRegistry() | IIS .se zone PostgreSQL lookup |
| sources/website.ts | scrapeWebsite() | Website scraping coordinator |
| sources/linkedin.ts | linkedInSearch() | DISABLED: returns empty |
| sources/newsJobs.ts | checkNewsAndJobs() | DISABLED: returns empty |
| sources/hunter.ts | hunterSearch() | DEPRECATED: replaced by SMTP |
| processors/nameUtils.ts | isValidPersonName(), inferRoleType() | Name validation + role inference |
| processors/emailUtils.ts | generateEmailGuesses(), detectEmailPattern() | Email generation + pattern detection |
| processors/phoneUtils.ts | extractPhoneNumbers() | Phone extraction |
| processors/scoring.ts | calculateLeadScore() | 0-10 lead scoring |
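
As a rough illustration of what email generation in processors/emailUtils.ts might involve, the sketch below builds candidate addresses from a contact name and a domain. Everything here is an assumption: the function names `normalizeNamePart` and `guessEmails`, the signature, and the list of patterns are hypothetical and do not reflect the actual `generateEmailGuesses()` export. The one detail worth showing is folding Swedish letters (å, ä, ö) to ASCII before building the local part.

```typescript
/** Lowercases a name part and folds diacritics (å, ä, ö, é ...) to plain ASCII letters. */
function normalizeNamePart(part: string): string {
  return part
    .trim()
    .toLowerCase()
    .normalize("NFD") // split letters from their combining marks
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .replace(/[^a-z]/g, ""); // drop anything that is not a-z
}

/** Hypothetical pattern list; the real pipeline may use different or more patterns. */
function guessEmails(firstName: string, lastName: string, domain: string): string[] {
  const f = normalizeNamePart(firstName);
  const l = normalizeNamePart(lastName);
  if (!f || !l) return [];
  const locals = [
    `${f}.${l}`, // anna.berg
    `${f}${l}`, // annaberg
    `${f[0]}.${l}`, // a.berg
    `${f[0]}${l}`, // aberg
    f, // anna
    `${l}.${f}`, // berg.anna
  ];
  // De-duplicate while preserving order, then attach the domain.
  return [...new Set(locals)].map((local) => `${local}@${domain.toLowerCase()}`);
}
```

Candidates produced this way are only guesses; in this codebase they would presumably be confirmed by the SMTP validation step listed under src/lib/.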

Workers (src/workers/)

| File | Exports | Purpose |
| --- | --- | --- |
| enrichDispatcher.ts | dispatchEnrichmentBatches() | Batch enrichment dispatch |
| enrichWorker.ts | enrichWorker | BullMQ enrich job processor |
| updateWorker.ts | updateWorker | UPSERT to DB + RoPA + Art.14 trigger |
| art14Worker.ts | art14Worker | GDPR Article 14 notification delivery |
| playwrightWorker.ts | playwrightWorker | Dedicated Playwright browser worker |
| updateWorker.ts | updateWorker (v2) | Improved update with better error handling |

Library (src/lib/)

| File | Exports | Purpose |
| --- | --- | --- |
| redis.ts | sharedRedis (IORedis) | Shared Redis connection |
| redisClients.ts | enrichmentCacheRedis, domainCacheRedis, etc. | Named Redis clients per DB |
| article14Notification.ts | notifyDataSubject(), sendNotificationEmail() | Art.14 email logic |
| smtpEmailValidator.ts | validateEmailViaSMTP(), validateEmailsBatch() | SMTP RCPT TO probing (939 lines) |
| embeddings.ts | generateEmbedding() | OpenAI text-embedding-3-small |
| keycloak.ts | keycloakClient | Keycloak auth integration |
| apiTokenCounter.ts | ApiTokenCounter | Cost tracking per enrichment |
| webhooks.ts | triggerWebhook() | Webhook delivery |
| validation.ts | validateCompany() | Validation logic |

Fetchers (src/fetchers/)

| File | Exports | Purpose |
| --- | --- | --- |
| bolagsverket/index.ts | fetchBolagsverketData() | BV VärdefullaDatamängder API |
| bolagsverket/openApi.ts | fetchBolagsverketOpenApi() | BV Öppet API (IP-blocked, unused) |
| bolagsverket/real-api.ts | fetchBolagsverketRealApi() | Authenticated BV API |
| bolagsverket/mapper.ts | mapBolagsverketToCompany() | Data transformation |
| bolagsverket/download.ts | downloadBulkFile() | Bulk file download |
| bolagsverket/extract.ts | extractZipData() | ZIP extraction |
| scb/index.ts | fetchCompanies() | SCB PxWebApi v2 (synthesized data) |
| scb/rate-limiter.ts | scbRateLimiter | 10 req/10s limiter |
| news/real-api.ts | fetchNews() | News API (disabled) |

Import Scripts (src/import/)

| File | Exports | Purpose |
| --- | --- | --- |
| index.ts | importBolagsverket(), importScb() | CLI entry points |
| bolagsverket-import.ts | importBolagsverketBulk() | Streaming CSV import |
| scb-import.ts | importScbBulk() | TSV import |
| merge-sources.ts | mergeScbAndBolagsverket() | Smart merge |
| validate-merge.ts | validateMergedData() | Quality checks |
| delta-import.ts | runDeltaImport() | Incremental updates |
| parser.ts | parseOrgNr() | Swedish org number parsing |
| copy-import.ts | copyImport() | Copy-based import |
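
For readers unfamiliar with Swedish organisation numbers, the sketch below shows the kind of work a parser such as `parseOrgNr()` in parser.ts has to do. It is an assumption-laden illustration rather than the actual code: the name `parseOrgNumber`, the return shape, and the handling of a leading `16` prefix are hypothetical. What it relies on is that an organisation number has 10 digits, is usually written `NNNNNN-NNNN`, and ends in a Luhn check digit.

```typescript
/**
 * Hypothetical sketch: normalizes a Swedish organisation number to
 * "NNNNNN-NNNN" and returns null if the format or check digit is wrong.
 */
function parseOrgNumber(input: string): string | null {
  // Strip whitespace and hyphens so "556016-0680" and "5560160680" are equivalent.
  let digits = input.replace(/[\s-]/g, "");

  // Assumption: 12-digit inputs carry a leading "16" prefix, which is dropped.
  if (/^16\d{10}$/.test(digits)) digits = digits.slice(2);
  if (!/^\d{10}$/.test(digits)) return null;

  // Luhn check over all 10 digits: double every other digit starting from the
  // leftmost, subtract 9 from two-digit products, and require sum % 10 === 0.
  let sum = 0;
  for (let i = 0; i < 10; i++) {
    let d = Number(digits[i]);
    if (i % 2 === 0) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0 ? `${digits.slice(0, 6)}-${digits.slice(6)}` : null;
}
```

Normalizing to a single canonical form before import makes it possible to use the organisation number as a merge key across the SCB and Bolagsverket sources.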

Root Source Files

| File | Exports | Purpose |
| --- | --- | --- |
| index.ts | | Main entry point, starts pipeline |
| validationEngine.ts | validateCompany() | 4-layer validation wrapper |
| leadEnrichment.ts | enrichLead() | Lead-specific enrichment |
| emailDiscovery.ts | discoverEmails() | Email discovery logic |
| contactPositionMapper.ts | mapContactPositions() | Position mapping |
| swedishOfficialsExtractor.ts | extractSwedishOfficials() | Bolagsverket official extraction |
| googleMapsScraper.ts | scrapeGoogleMaps() | Maps scraping |
| platformScrapers.ts | scrapeWix(), scrapeWordPress() | Platform-specific scrapers |
| websiteScraper.ts | scrapeCompanyWebsite() | Generic scraper |
| eniroScraper.ts | scrapeEniro() | Eniro scraping (ToS-blocked) |
| eniroIntegration.ts | searchEniro() | Eniro integration |
| hunterIntegration.ts | searchHunter() | Hunter.io (deprecated) |
| cache.ts | getCachedEnrichment(), setCachedEnrichment() | Redis caching layer |
| compliance.ts | hash_contact(), logRoPA() | GDPR utilities |
| logger.ts | logger (Pino) | Structured logging |
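
The test catalog describes compliance.ts as covering SHA-256 hashing, so a contact-hashing helper along the following lines is plausible. This is only a sketch: the name `hashContact` and the normalization step (trim plus lowercase) are assumptions, not the behavior of the actual `hash_contact()` export. It uses Node's built-in `node:crypto` module, which Bun also provides.

```typescript
import { createHash } from "node:crypto";

/**
 * Hypothetical sketch: returns a hex-encoded SHA-256 digest of a contact
 * identifier. Trimming and lowercasing first (an assumption) makes
 * "Anna@Example.se" and " anna@example.se " hash to the same value.
 */
function hashContact(value: string): string {
  const normalized = value.trim().toLowerCase();
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}
```

An unkeyed SHA-256 of a low-entropy value such as an email address can be reversed by guessing inputs, so a keyed variant (for example `createHmac("sha256", secret)`) is a common hardening step where the hash is meant to protect the data subject.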

Key Dependencies

| Package | Version | Used For |
| --- | --- | --- |
| bullmq | 5.70.1 | Job queues (Scrape/Enrich/Update/Art14/Playwright) |
| @crawlee/playwright | 3.16.0 | Multi-page web scraping |
| playwright | 1.58.2 | Browser automation |
| @mendable/firecrawl-js | 4.18.0 | LLM-structured extraction |
| pg | 8.13.3 | PostgreSQL (legacy, migrating to Bun.sql) |
| ioredis | 5.9.3 | Redis (legacy, migrating to Bun.redis) |
| @anthropic-ai/sdk | 0.78.0 | Claude Vision for Playwright fallback |
| axios | 1.13.6 | HTTP client |
| cheerio | 1.2.0 | Server-side HTML parsing |
| zod | 3.23.8 | Schema validation |
| pino | 10.3.1 | Structured logging |
| csv-parse | 6.1.0 | CSV parsing for bulk import |
| react | 19.2.4 | Frontend (Kundkort) |
| @tremor/react | 3.18.7 | UI components |

Hardcoded Values

| Value | Location | What It Controls |
| --- | --- | --- |
| 0.4 | src/enrichment/sources/domain.ts:303 | Domain validation minimum score |
| 0.35 | src/enrichment/sources/registry.ts | Registry fuzzy match threshold |
| 6 months | src/enrichment/pipeline.ts | Enrichment cache TTL |
| 30 days | src/cache.ts | Domain cache TTL |
| 12 pages | src/enrichment/sources/crawlee.ts | Max pages per Crawlee crawl |
| 5 concurrent | src/lib/smtpEmailValidator.ts | Max SMTP probes |
| 20 workers | src/workers/enrichWorker.ts | Enrich worker concurrency |
| 50 workers | src/workers/updateWorker.ts | Update worker concurrency |
| 4 workers | src/workers/playwrightWorker.ts | Playwright worker concurrency |
| 10,000 | src/api/export.ts | Max export records |
| 200/day | src/api/kundkort.ts | Enrichment daily limit |
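
Because these values are spread across many files, one option is to collect them in a single typed constants module. The sketch below shows what that could look like. The values come from the table above, but the object name `TUNING`, the key names, the `tuning()` helper, and the `TUNING_<key>` override convention are all hypothetical and do not exist in the repository.

```typescript
// Hypothetical central constants module; values are taken from the table above.
const TUNING = {
  domainValidationMinScore: 0.4,
  registryFuzzyMatchThreshold: 0.35,
  enrichmentCacheTtlMonths: 6,
  domainCacheTtlDays: 30,
  crawleeMaxPages: 12,
  smtpMaxConcurrentProbes: 5,
  enrichWorkerConcurrency: 20,
  updateWorkerConcurrency: 50,
  playwrightWorkerConcurrency: 4,
  exportMaxRecords: 10_000,
  enrichmentDailyLimit: 200,
} as const;

type TuningKey = keyof typeof TUNING;

/**
 * Returns the default for `key`, unless `env` holds a numeric override under
 * the (assumed) name `TUNING_<key>`. Non-numeric or blank overrides are ignored.
 */
function tuning(key: TuningKey, env: Record<string, string | undefined> = {}): number {
  const raw = env[`TUNING_${key}`];
  const parsed = raw === undefined || raw.trim() === "" ? NaN : Number(raw);
  return Number.isFinite(parsed) ? parsed : TUNING[key];
}
```

With this in place a worker could read `tuning("enrichWorkerConcurrency", process.env)` instead of a literal `20`, which would keep the defaults documented in one place while allowing per-environment overrides.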

TODO/FIXME Markers

The entire codebase contains only 9 TODO/FIXME markers. The table below lists the two in the SCB fetcher:

| File | Line | Marker | Description |
| --- | --- | --- | --- |
| src/fetchers/scb/index.ts | 78 | TODO | SCB provides aggregates, not real companies |
| src/fetchers/scb/index.ts | 206 | TODO | Production path for real company data |

Test Files

Unit / Integration Tests (in src/)

| File | Coverage | Lines |
| --- | --- | --- |
| src/compliance.test.ts | SHA-256 hashing, RoPA logging | |
| src/validationEngine.test.ts | 4-layer validation logic | |
| src/integration.test.ts | Database, queues, workers | |
| src/enrichmentEngine.v7.test.ts | v7 enrichment pipeline | |
| src/lib/validation.test.ts | Validation utilities | |
| src/fetchers/bolagsverket/mapper.test.ts | BV data mapping | |
| autoresearch/regression.test.ts | Name validation, role mapping guards | |

API Tests (tests/api/)

| File | Coverage | Lines | Key Areas |
| --- | --- | --- | --- |
| tests/api/auth.test.ts | Authentication API | 317 | Registration, login, JWT tokens, rate limiting |
| tests/api/companies.test.ts | Companies CRUD | 472 | Pagination, filtering, RoPA logging |
| tests/api/documents.test.ts | Documents CRUD | 468 | Embeddings, similarity search, pgvector |
| tests/api/index.test.ts | End-to-end all APIs | 655 | Health, auth, orgs, users, projects, docs, companies, leads, search, rate limits, cleanup |
| tests/api/kundkort-enrich.test.ts | Kundkort enrich endpoint | 88 | On-demand enrichment, auth, 404/400 handling |
| tests/api/leads.test.ts | Leads CRUD | 536 | Filtering, validation endpoint, RoPA |
| tests/api/organizations.test.ts | Organizations CRUD | 308 | UUID validation, referential integrity |
| tests/api/projects.test.ts | Projects CRUD | 322 | Org filtering, create/update/delete |
| tests/api/search.test.ts | Search endpoints | 332 | Basic + advanced filters, multi-table, Unicode |
| tests/api/security.test.ts | Security hardening | 134 | JWT forgery, env vars, CSV injection, rate limiting |
| tests/api/structure.test.ts | Handler exports | 85 | Static validation of handler method presence |
| tests/api/users.test.ts | Users CRUD | 343 | Email/password validation, create/update/delete |

Enrichment Tests (tests/enrichment/)

| File | Coverage | Lines | Key Areas |
| --- | --- | --- | --- |
| tests/enrichment/board-members-integration.test.ts | BV board members in pipeline | 31 | enrichV7 integration, updated_fields, contacts |
| tests/enrichment/crawlee-quality.test.ts | Crawlee extraction quality | 308 | Contact extraction from text/img alts, name normalization, role inference, false-positive guards |
| tests/enrichment/firecrawl.test.ts | Firecrawl extractor | 512 | Contact page discovery, mock-client extraction, multi-page merge, HTML fallback, error handling |
| tests/enrichment/processors.test.ts | Processor utilities | 183 | Role inference, email/phone extraction, lead scoring, name validation |

Fetcher Tests (tests/fetchers/)

| File | Coverage | Lines | Key Areas |
| --- | --- | --- | --- |
| tests/fetchers/bolagsverket.test.ts | Bolagsverket fetcher | 113 | ZIP download orchestration, download → extract → cleanup |
| tests/fetchers/scb.test.ts | SCB API client | 160 | PxWebApi v2 table list, metadata, data fetch, company synthesis |

Integration / E2E Tests (tests/)

| File | Coverage | Lines | Key Areas |
| --- | --- | --- | --- |
| tests/domainDiscovery.test.ts | Domain discovery guards | 17 | Rejects hosting providers (180.se), municipalities (uppsala.se) |
| tests/speed.test.ts | Performance benchmarks | 22 | Registry lookup latency (<50ms), correctness |

Test File Summary

| Category | Files | Approx. Lines |
| --- | --- | --- |
| API Tests | 12 | ~4,060 |
| Enrichment Tests | 4 | ~1,034 |
| Fetcher Tests | 2 | ~273 |
| Integration / E2E | 2 | ~39 |
| Unit (in src/) | 7 | |
| Total | 27 | ~5,406+ |