Repository Layout Complete
Every file in src/ mapped to its purpose
89 TypeScript files across 12 directories. Complete catalog for navigation.
File Count by Directory
| Directory | Files | Purpose |
|---|---|---|
| src/api/ | 16 | REST API endpoints + middleware |
| src/enrichment/ | 16 | Enrichment pipeline + sources + processors |
| src/lib/ | 9 | Shared utilities (Redis, SMTP, notifications, etc.) |
| src/fetchers/ | 9 | External data fetchers (Bolagsverket, SCB, news) |
| src/import/ | 8 | Bulk import scripts |
| src/workers/ | 6 | BullMQ worker implementations |
| src/validation/ | 3 | Validation layer implementations |
| src/extractors/ | 1 | Website extraction (Playwright) |
| src/db/ | 3 | Database connection + schema |
| src/compliance/ | 1 | GDPR compliance utilities |
| src/mocks/ | 2 | Mock data for testing |
| Root src/ | 15 | Entry points, engines, scrapers |
API Layer (src/api/)
| File | Exports | Purpose |
|---|---|---|
| index.ts | app (Express/Bun.serve) | API entry point, route registration |
| auth.ts | authHandlers | Login, register, JWT token refresh |
| companies.ts | companyHandlers | CRUD for companies table |
| leads.ts | leadHandlers | CRUD for leads table + validation |
| kundkort.ts | kundkortHandlers | Company card UI backend (search, detail, enrich, export) |
| search.ts | searchHandlers | Full-text, advanced, semantic search |
| scrape.ts | scrapeHandlers | On-demand website scraping |
| export.ts | exportHandlers | CSV/JSON export with GDPR filtering |
| users.ts | userHandlers | User management |
| organizations.ts | organizationHandlers | Tenant/org management |
| projects.ts | projectHandlers | Project management |
| documents.ts | documentHandlers | Document storage with embeddings |
| enrichmentErrors.ts | enrichmentErrorHandlers | Error tracking API |
| validation.ts | validationHandlers | Validation layer API |
| middleware/auth.ts | requireAuth, requireRole | JWT + RBAC middleware |
| middleware/rate-limit.ts | rateLimitMiddleware | Redis-based sliding window rate limiting |
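
The rate-limit middleware is described as a Redis-based sliding window. The idea can be sketched in memory as follows. This is only an illustration: the class name, the limits, and the in-memory `Map` are assumptions, and the real middleware presumably keeps its timestamps in Redis rather than in process memory.

```typescript
// In-memory sliding-window limiter (hypothetical sketch, not the code in
// middleware/rate-limit.ts; key layout and limits there are not documented here).
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;    // max requests per window
  private readonly windowMs: number; // window length in milliseconds

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /** Returns true and records the request if the key is under its limit. */
  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only the timestamps that still fall inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(key, recent);
    return allowed;
  }
}

const limiter = new SlidingWindowLimiter(2, 1_000);
console.log(limiter.allow("client-1", 0));   // true
console.log(limiter.allow("client-1", 100)); // true
console.log(limiter.allow("client-1", 200)); // false (third request inside 1 s)
```

Rejected requests are not recorded, so a client that keeps retrying does not extend its own lockout. A shared-store version would need the trim, count, and append steps to run atomically.
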
Enrichment Pipeline (src/enrichment/)
| File | Exports | Purpose |
|---|---|---|
| pipeline.ts | enrichV7() | Main enrichment orchestrator (7 steps) |
| types.ts | EnrichmentInput, EnrichmentResult | Type definitions |
| config.ts | INVALID_NAMES, ROLE_PATTERNS, thresholds | Configuration + blocklists (~900 lines) |
| ecoApiIntegration.ts | fetchDataGaps(), fetchCompanyInsights() | ECOAPI integration |
| sources/domain.ts | findCompanyDomain(), validateCompanyDomain() | Domain discovery (4 tiers) |
| sources/crawlee.ts | scrapeWithCrawlee() | Crawlee multi-page scraper (6 strategies) |
| sources/firecrawl.ts | scrapeWebsiteFirecrawl() | Firecrawl LLM fallback |
| sources/maps.ts | queryGooglePlaces() | Google Places API |
| sources/serper.ts | freeSearch() | Free DNS/HTTP search (replaced Serper) |
| sources/registry.ts | queryDomainRegistry() | IIS .se zone PostgreSQL lookup |
| sources/website.ts | scrapeWebsite() | Website scraping coordinator |
| sources/linkedin.ts | linkedInSearch() | DISABLED: returns empty |
| sources/newsJobs.ts | checkNewsAndJobs() | DISABLED: returns empty |
| sources/hunter.ts | hunterSearch() | DEPRECATED: replaced by SMTP |
| processors/nameUtils.ts | isValidPersonName(), inferRoleType() | Name validation + role inference |
| processors/emailUtils.ts | generateEmailGuesses(), detectEmailPattern() | Email generation + pattern detection |
| processors/phoneUtils.ts | extractPhoneNumbers() | Phone extraction |
| processors/scoring.ts | calculateLeadScore() | 0-10 lead scoring |
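
To give a feel for what email generation in `processors/emailUtils.ts` might involve, here is a minimal sketch. The function names, the pattern list, and the way Swedish characters are folded to ASCII are all assumptions made for illustration; the catalog does not document which patterns `generateEmailGuesses()` actually produces.

```typescript
// Hypothetical helper: lowercases and folds accented letters (å, ä, ö) to ASCII.
function asciiFold(s: string): string {
  return s
    .toLowerCase()
    .normalize("NFD")                 // split "å" into "a" + combining ring
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .replace(/[^a-z]/g, "");          // drop anything that is not a-z
}

// Hypothetical pattern list; the real generateEmailGuesses() may differ.
function guessEmails(first: string, last: string, domain: string): string[] {
  const f = asciiFold(first);
  const l = asciiFold(last);
  if (!f || !l) return [];
  const locals = [`${f}.${l}`, `${f}${l}`, `${f[0]}.${l}`, `${f[0]}${l}`, f];
  // De-duplicate while keeping the order of the patterns.
  return Array.from(new Set(locals)).map((local) => `${local}@${domain}`);
}

console.log(guessEmails("Åsa", "Öberg", "example.se"));
// [ "asa.oberg@example.se", "asaoberg@example.se", "a.oberg@example.se",
//   "aoberg@example.se", "asa@example.se" ]
```

Candidates produced this way are only guesses; in this codebase they would presumably still need confirmation, for example through the SMTP probing in `src/lib/smtpEmailValidator.ts`.
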
Workers (src/workers/)
| File | Exports | Purpose |
|---|---|---|
| enrichDispatcher.ts | dispatchEnrichmentBatches() | Batch enrichment dispatch |
| enrichWorker.ts | enrichWorker | BullMQ enrich job processor |
| updateWorker.ts | updateWorker | UPSERT to DB + RoPA + Art.14 trigger |
| art14Worker.ts | art14Worker | GDPR Article 14 notification delivery |
| playwrightWorker.ts | playwrightWorker | Dedicated Playwright browser worker |
| updateWorker.ts | updateWorker (v2) | Improved update with better error handling |
Library (src/lib/)
| File | Exports | Purpose |
|---|---|---|
| redis.ts | sharedRedis (IORedis) | Shared Redis connection |
| redisClients.ts | enrichmentCacheRedis, domainCacheRedis, etc. | Named Redis clients per DB |
| article14Notification.ts | notifyDataSubject(), sendNotificationEmail() | Art.14 email logic |
| smtpEmailValidator.ts | validateEmailViaSMTP(), validateEmailsBatch() | SMTP RCPT TO probing (939 lines) |
| embeddings.ts | generateEmbedding() | OpenAI text-embedding-3-small |
| keycloak.ts | keycloakClient | Keycloak auth integration |
| apiTokenCounter.ts | ApiTokenCounter | Cost tracking per enrichment |
| webhooks.ts | triggerWebhook() | Webhook delivery |
| validation.ts | validateCompany() | Validation logic |
Fetchers (src/fetchers/)
| File | Exports | Purpose |
|---|---|---|
| bolagsverket/index.ts | fetchBolagsverketData() | BV VärdefullaDatamängder API |
| bolagsverket/openApi.ts | fetchBolagsverketOpenApi() | BV Öppet API (IP-blocked, unused) |
| bolagsverket/real-api.ts | fetchBolagsverketRealApi() | Authenticated BV API |
| bolagsverket/mapper.ts | mapBolagsverketToCompany() | Data transformation |
| bolagsverket/download.ts | downloadBulkFile() | Bulk file download |
| bolagsverket/extract.ts | extractZipData() | ZIP extraction |
| scb/index.ts | fetchCompanies() | SCB PxWebApi v2 (synthesized data) |
| scb/rate-limiter.ts | scbRateLimiter | 10 req/10s limiter |
| news/real-api.ts | fetchNews() | News API (disabled) |
Import Scripts (src/import/)
| File | Exports | Purpose |
|---|---|---|
| index.ts | importBolagsverket(), importScb() | CLI entry points |
| bolagsverket-import.ts | importBolagsverketBulk() | Streaming CSV import |
| scb-import.ts | importScbBulk() | TSV import |
| merge-sources.ts | mergeScbAndBolagsverket() | Smart merge |
| validate-merge.ts | validateMergedData() | Quality checks |
| delta-import.ts | runDeltaImport() | Incremental updates |
| parser.ts | parseOrgNr() | Swedish org number parsing |
| copy-import.ts | copyImport() | Copy-based import |
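
Swedish organisation numbers are ten digits, usually written `NNNNNN-NNNN`, with the last digit acting as a Luhn check digit. The sketch below shows one way a parser like `parseOrgNr()` could normalize and validate them. The function names and the `string | null` return type are assumptions; the actual signature in `src/import/parser.ts` is not shown in this catalog.

```typescript
// Luhn check for a string of digits with even length (here: 10).
function luhnValid(digits: string): boolean {
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    // For a 10-digit number, every other digit starting with the first is doubled.
    if (i % 2 === 0) {
      d *= 2;
      if (d > 9) d -= 9; // same as adding the two digits of the product
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Hypothetical stand-in for parseOrgNr(): returns "NNNNNN-NNNN" or null.
function parseOrgNrSketch(input: string): string | null {
  const digits = input.replace(/[\s-]/g, ""); // accept spaces and hyphens
  if (!/^\d{10}$/.test(digits)) return null;
  if (!luhnValid(digits)) return null;
  return `${digits.slice(0, 6)}-${digits.slice(6)}`;
}

console.log(parseOrgNrSketch("5560125790"));  // "556012-5790"
console.log(parseOrgNrSketch("556012-5791")); // null (checksum fails)
```

Returning a canonical hyphenated form makes the value safe to use as a join key when merging sources, though the real importer may well store the bare digits instead.
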
Root Source Files
| File | Exports | Purpose |
|---|---|---|
| index.ts | - | Main entry point; starts pipeline |
| validationEngine.ts | validateCompany() | 4-layer validation wrapper |
| leadEnrichment.ts | enrichLead() | Lead-specific enrichment |
| emailDiscovery.ts | discoverEmails() | Email discovery logic |
| contactPositionMapper.ts | mapContactPositions() | Position mapping |
| swedishOfficialsExtractor.ts | extractSwedishOfficials() | Bolagsverket official extraction |
| googleMapsScraper.ts | scrapeGoogleMaps() | Maps scraping |
| platformScrapers.ts | scrapeWix(), scrapeWordPress() | Platform-specific scrapers |
| websiteScraper.ts | scrapeCompanyWebsite() | Generic scraper |
| eniroScraper.ts | scrapeEniro() | Eniro scraping (ToS-blocked) |
| eniroIntegration.ts | searchEniro() | Eniro integration |
| hunterIntegration.ts | searchHunter() | Hunter.io (deprecated) |
| cache.ts | getCachedEnrichment(), setCachedEnrichment() | Redis caching layer |
| compliance.ts | hash_contact(), logRoPA() | GDPR utilities |
| logger.ts | logger (Pino) | Structured logging |
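
`compliance.ts` exports `hash_contact()`, and its unit tests are described as covering SHA-256 hashing. A minimal sketch of that kind of helper, using Node's built-in `crypto` module, could look like this. The function name, the trim-and-lowercase normalization, and the absence of a salt are assumptions; the real implementation may differ on all three.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for hash_contact(): SHA-256 over a normalized email.
function hashContactSketch(email: string): string {
  // Normalizing first means "Anna@Example.se " and "anna@example.se" hash alike.
  const normalized = email.trim().toLowerCase();
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}

const a = hashContactSketch(" Anna@Example.se ");
const b = hashContactSketch("anna@example.se");
console.log(a === b);  // true
console.log(a.length); // 64 (hex characters for a 256-bit digest)
```

An unsalted hash of an email address can be reversed by hashing candidate addresses, so whether this counts as sufficient pseudonymization depends on how the project's compliance design uses it.
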
Key Dependencies
| Package | Version | Used For |
|---|---|---|
| bullmq | 5.70.1 | Job queues (Scrape/Enrich/Update/Art14/Playwright) |
| @crawlee/playwright | 3.16.0 | Multi-page web scraping |
| playwright | 1.58.2 | Browser automation |
| @mendable/firecrawl-js | 4.18.0 | LLM-structured extraction |
| pg | 8.13.3 | PostgreSQL (legacy, migrating to Bun.sql) |
| ioredis | 5.9.3 | Redis (legacy, migrating to Bun.redis) |
| @anthropic-ai/sdk | 0.78.0 | Claude Vision for Playwright fallback |
| axios | 1.13.6 | HTTP client |
| cheerio | 1.2.0 | Server-side HTML parsing |
| zod | 3.23.8 | Schema validation |
| pino | 10.3.1 | Structured logging |
| csv-parse | 6.1.0 | CSV parsing for bulk import |
| react | 19.2.4 | Frontend (Kundkort) |
| @tremor/react | 3.18.7 | UI components |
Hardcoded Values
| Value | Location | What It Controls |
|---|---|---|
| 0.4 | src/enrichment/sources/domain.ts:303 | Domain validation minimum score |
| 0.35 | src/enrichment/sources/registry.ts | Registry fuzzy match threshold |
| 6 months | src/enrichment/pipeline.ts | Enrichment cache TTL |
| 30 days | src/cache.ts | Domain cache TTL |
| 12 pages | src/enrichment/sources/crawlee.ts | Max pages per Crawlee crawl |
| 5 concurrent | src/lib/smtpEmailValidator.ts | Max SMTP probes |
| 20 workers | src/workers/enrichWorker.ts | Enrich worker concurrency |
| 50 workers | src/workers/updateWorker.ts | Update worker concurrency |
| 4 workers | src/workers/playwrightWorker.ts | Playwright worker concurrency |
| 10,000 | src/api/export.ts | Max export records |
| 200/day | src/api/kundkort.ts | Enrichment daily limit |
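
Because these values are spread over many files, one option is to gather them in a single typed constant. The sketch below copies the numbers from the table; the constant name, the property names, and the inclusive comparison in `isAcceptableDomainScore` are assumptions. The 6-month enrichment cache TTL is left out because the table does not say how a month is counted.

```typescript
// Values copied from the Hardcoded Values table; all names are hypothetical.
const LIMITS = {
  domainMinScore: 0.4,
  registryFuzzyThreshold: 0.35,
  domainCacheTtlDays: 30,
  crawleeMaxPages: 12,
  smtpMaxConcurrentProbes: 5,
  enrichWorkerConcurrency: 20,
  updateWorkerConcurrency: 50,
  playwrightWorkerConcurrency: 4,
  exportMaxRecords: 10_000,
  enrichmentDailyLimit: 200,
} as const;

// Redis expirations are given in seconds, so day-based TTLs need converting.
const daysToSeconds = (days: number): number => days * 24 * 60 * 60;

// Assumes the minimum score is inclusive; the source does not say.
function isAcceptableDomainScore(score: number): boolean {
  return score >= LIMITS.domainMinScore;
}

console.log(daysToSeconds(LIMITS.domainCacheTtlDays)); // 2592000
console.log(isAcceptableDomainScore(0.39));            // false
```

With `as const`, each property keeps its literal type and is read-only, so an accidental reassignment fails at compile time.
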
TODO/FIXME Markers
The codebase contains 9 TODO/FIXME markers in total. Two of them, both in the SCB fetcher, are listed here:
| File | Line | Marker | Description |
|---|---|---|---|
| src/fetchers/scb/index.ts | 78 | TODO | SCB provides aggregates, not real companies |
| src/fetchers/scb/index.ts | 206 | TODO | Production path for real company data |
Test Files
Unit / Integration Tests (in src/)
| File | Coverage | Lines |
|---|---|---|
| src/compliance.test.ts | SHA-256 hashing, RoPA logging | - |
| src/validationEngine.test.ts | 4-layer validation logic | - |
| src/integration.test.ts | Database, queues, workers | - |
| src/enrichmentEngine.v7.test.ts | v7 enrichment pipeline | - |
| src/lib/validation.test.ts | Validation utilities | - |
| src/fetchers/bolagsverket/mapper.test.ts | BV data mapping | - |
| autoresearch/regression.test.ts | Name validation, role mapping guards | - |
API Tests (tests/api/)
| File | Coverage | Lines | Key Areas |
|---|---|---|---|
| tests/api/auth.test.ts | Authentication API | 317 | Registration, login, JWT tokens, rate limiting |
| tests/api/companies.test.ts | Companies CRUD | 472 | Pagination, filtering, RoPA logging |
| tests/api/documents.test.ts | Documents CRUD | 468 | Embeddings, similarity search, pgvector |
| tests/api/index.test.ts | End-to-end all APIs | 655 | Health, auth, orgs, users, projects, docs, companies, leads, search, rate limits, cleanup |
| tests/api/kundkort-enrich.test.ts | Kundkort enrich endpoint | 88 | On-demand enrichment, auth, 404/400 handling |
| tests/api/leads.test.ts | Leads CRUD | 536 | Filtering, validation endpoint, RoPA |
| tests/api/organizations.test.ts | Organizations CRUD | 308 | UUID validation, referential integrity |
| tests/api/projects.test.ts | Projects CRUD | 322 | Org filtering, create/update/delete |
| tests/api/search.test.ts | Search endpoints | 332 | Basic + advanced filters, multi-table, Unicode |
| tests/api/security.test.ts | Security hardening | 134 | JWT forgery, env vars, CSV injection, rate limiting |
| tests/api/structure.test.ts | Handler exports | 85 | Static validation of handler method presence |
| tests/api/users.test.ts | Users CRUD | 343 | Email/password validation, create/update/delete |
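
The security tests list CSV injection among their key areas. A common mitigation is to neutralize cells that a spreadsheet would treat as a formula before writing them out. The helper below is a sketch of that idea only; its name and the exact character set it guards against are assumptions, and it is not taken from `src/api/export.ts`.

```typescript
// Hypothetical helper for CSV export; not the project's actual implementation.
function escapeCsvCell(value: string): string {
  // Prefix cells that a spreadsheet might evaluate as a formula.
  const guarded = /^[=+\-@\t\r]/.test(value) ? `'${value}` : value;
  // Standard CSV quoting: wrap in quotes when needed and double inner quotes.
  return /[",\n\r]/.test(guarded) ? `"${guarded.replace(/"/g, '""')}"` : guarded;
}

console.log(escapeCsvCell("=SUM(A1:A9)")); // '=SUM(A1:A9)
console.log(escapeCsvCell("Volvo, AB"));   // "Volvo, AB"
console.log(escapeCsvCell("plain"));       // plain
```

One trade-off of this approach is that legitimate values starting with `-` or `+`, such as negative numbers or international phone numbers, also receive the prefix.
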
Enrichment Tests (tests/enrichment/)
| File | Coverage | Lines | Key Areas |
|---|---|---|---|
| tests/enrichment/board-members-integration.test.ts | BV board members in pipeline | 31 | enrichV7 integration, updated_fields, contacts |
| tests/enrichment/crawlee-quality.test.ts | Crawlee extraction quality | 308 | Contact extraction from text/img alts, name normalization, role inference, false-positive guards |
| tests/enrichment/firecrawl.test.ts | Firecrawl extractor | 512 | Contact page discovery, mock-client extraction, multi-page merge, HTML fallback, error handling |
| tests/enrichment/processors.test.ts | Processor utilities | 183 | Role inference, email/phone extraction, lead scoring, name validation |
Fetcher Tests (tests/fetchers/)
| File | Coverage | Lines | Key Areas |
|---|---|---|---|
| tests/fetchers/bolagsverket.test.ts | Bolagsverket fetcher | 113 | ZIP download orchestration, download → extract → cleanup |
| tests/fetchers/scb.test.ts | SCB API client | 160 | PxWebApi v2 table list, metadata, data fetch, company synthesis |
Integration / E2E Tests (tests/)
| File | Coverage | Lines | Key Areas |
|---|---|---|---|
| tests/domainDiscovery.test.ts | Domain discovery guards | 17 | Rejects hosting providers (180.se), municipalities (uppsala.se) |
| tests/speed.test.ts | Performance benchmarks | 22 | Registry lookup latency (<50ms), correctness |
Test File Summary
| Category | Files | Approx. Lines |
|---|---|---|
| API Tests | 12 | ~4,060 |
| Enrichment Tests | 4 | ~1,034 |
| Fetcher Tests | 2 | ~273 |
| Integration / E2E | 2 | ~39 |
| Unit (in src/) | 7 | - |
| Total | 27 | ~5,406+ |