Decision Register
Architecture Decision Records (ADRs)
Every significant decision that shaped this project. Format: context → decision → consequences.
ADR-001: Bun over Node.js
Date: 2026-03-04
Context: Need fast TypeScript execution, built-in bundling, native APIs.
Decision: Use Bun as the exclusive runtime. Ban node, npm, npx, ts-node.
Consequences:
- ✅ Fast startup, built-in TS support, Bun.sql, Bun.redis, Bun.serve()
- ❌ Some npm packages incompatible; must use Bun-native alternatives
- ❌ Team must learn Bun-specific APIs
ADR-002: PostgreSQL over SQLite
Date: 2026-03-04
Context: Initial POC used SQLite. Needed concurrency, pgvector, robustness.
Decision: Migrate to PostgreSQL 15 + pgvector extension.
Consequences:
- ✅ Concurrent connections, vector embeddings, GIN indexes, trigram search
- ✅ Docker Compose infrastructure
- ❌ More complex local setup (port 5433)
ADR-003: BullMQ over Custom Queue
Date: 2026-03-04
Context: Need reliable job queue for scrape → enrich → update pipeline.
Decision: Use BullMQ on Redis 7.
Consequences:
- ✅ Dead letter queues, retry logic, rate limiting, job progress tracking
- ✅ Multiple worker types (scrape, enrich, update, art14, playwright)
- ❌ Redis dependency adds infrastructure complexity
ADR-004: Remove Serper.dev
Date: 2026-03-25
Context: Serper.dev used for Google search API (domain discovery, LinkedIn, news). ToS concerns + credit exhaustion.
Decision: Replace with free tier: Google Places API → IIS .se zone registry → DNS/HTTP scoring.
Consequences:
- ✅ Zero cost for domain discovery
- ✅ No ToS risk
- ❌ Lost LinkedIn and news signals
- ❌ Slightly lower domain discovery accuracy for edge cases
ADR-005: Crawlee over Playwright-Only
Date: 2026-04-01
Context: Playwright alone was slow and missed JS-rendered content. Needed multi-page crawling.
Decision: Add Crawlee as primary scraper, Playwright as fallback, Firecrawl as last resort.
Consequences:
- ✅ Multi-page crawling (up to 12 pages)
- ✅ 6 extraction strategies (DOM, text, email, prose, alt text, JSON-LD)
- ✅ 29 rounds of optimization, 145 companies tested
- ❌ Playwright inline in workers = scaling bottleneck (>40 concurrent = OOM)
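The primary → fallback → last-resort ordering in this decision could be sketched roughly as below. This is an illustration only: the `Scraper` type, `scrapeWithFallback`, and the stub scrapers are hypothetical names, and the real Crawlee, Playwright, and Firecrawl integrations are not shown in this register.

```typescript
// Hypothetical shapes; the project's actual scraper interfaces may differ.
type ScrapeResult = { source: string; text: string };
type Scraper = { name: string; run: (url: string) => Promise<string | null> };

// Try each scraper in priority order and return the first non-empty result.
async function scrapeWithFallback(
  url: string,
  scrapers: Scraper[],
): Promise<ScrapeResult | null> {
  for (const scraper of scrapers) {
    try {
      const text = await scraper.run(url);
      if (text !== null && text.trim().length > 0) {
        return { source: scraper.name, text };
      }
    } catch {
      // A failing scraper is treated like an empty result: fall through.
    }
  }
  return null; // every strategy failed or returned nothing
}

// Stub scrapers standing in for the real integrations (assumed behaviour).
const chain: Scraper[] = [
  { name: "crawlee", run: async () => null },
  { name: "playwright", run: async () => { throw new Error("timeout"); } },
  { name: "firecrawl", run: async (url) => `content of ${url}` },
];
```

Keeping the chain as an ordered array means the last-resort scraper can be removed or gated without touching the loop.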
ADR-006: Article 6(1)(f) Legitimate Interest
Date: 2026-03-10
Context: GDPR legal basis for B2B contact data processing.
Decision: Rely on Article 6(1)(f), legitimate interests. The data is corporate, concerns people in their professional capacity, and is publicly available.
Consequences:
- ✅ No consent required for corporate registry data
- ✅ Swedish law mandates publication of board members
- ⚠️ Must complete LIA (Legitimate Interests Assessment) document (currently KB articles only)
- ⚠️ Bisnode precedent risk: proactive Art. 14 notification mandatory
ADR-007: SHA-256 Opt-Out (Not Plaintext)
Date: 2026-03-04
Context: GDPR requires an opt-out mechanism. Storing the plaintext contact details of people who opted out would itself violate GDPR.
Decision: Store only HMAC-SHA256 hashes, no raw contact data.
Consequences:
- ✅ No PII in opt-out table
- ✅ Salted hashes prevent rainbow table attacks
- ❌ Cannot verify what contact a hash represents (by design)
- ❌ Salt rotation invalidates all existing hashes
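A minimal sketch of how such an opt-out check might look, using `createHmac` from `node:crypto` (available in Bun). The function names, the normalisation rule, and the use of a `Set` in place of the opt-out table are assumptions for illustration; the secret key here plays the role of the salt mentioned above.

```typescript
import { createHmac } from "node:crypto";

// Normalise so that " Anna@Example.se " and "anna@example.se" hash identically.
// This normalisation rule is an assumption, not taken from the project.
function normalizeContact(contact: string): string {
  return contact.trim().toLowerCase();
}

// HMAC-SHA256 of the normalised contact, hex-encoded (64 characters).
// Changing `secret` changes every resulting hash.
function optOutHash(contact: string, secret: string): string {
  return createHmac("sha256", secret)
    .update(normalizeContact(contact))
    .digest("hex");
}

// The opt-out table is modelled as a set of hashes (hypothetical stand-in).
function isOptedOut(
  contact: string,
  secret: string,
  table: Set<string>,
): boolean {
  return table.has(optOutHash(contact, secret));
}
```

Because only hashes are stored, a lookup has to re-hash the candidate contact with the same secret. That is also why rotating the secret invalidates every existing entry.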
ADR-008: Feature-Flag Firecrawl
Date: 2026-03-26
Context: Firecrawl LLM extraction promising but expensive (~$0.0018/credit).
Decision: Implement Firecrawl as optional fallback, not default. A/B test before switching.
Consequences:
- ✅ Controlled cost exposure
- ✅ Can compare Crawlee vs Firecrawl quality
- ❌ Two code paths to maintain
- ❌ Phase 2 (full comparison) not yet done
ADR-009: Archive Inactive Companies
Date: 2026-04-06
Context: 2.9M Bolagsverket records, only ~651K active AB companies relevant.
Decision: Archive inactive and non-AB companies to separate tables.
Consequences:
- ✅ Faster queries on active dataset
- ✅ Reduced memory footprint
- ✅ Backup sizes reduced (~2.4GB → smaller working set)
- ❌ Archive ≠ deletion under GDPR
- ❌ Need restore capability
ADR-010: Kundkort Frontend (Tremor + React)
Date: 2026-03-30
Context: Need user-facing interface for company search and enrichment.
Decision: Build React frontend with Tremor UI components, integrated with backend API.
Consequences:
- ✅ Modern UI with data visualization
- ✅ Search, detail view, manual enrich, CSV export
- ❌ Additional frontend complexity
- ❌ Auth/Keycloak integration required
ADR-011: Autoresearch Loop for Quality
Date: 2026-04-02
Context: Manual experimentation too slow for 29 rounds of optimization.
Decision: Build autonomous loop: run experiment → analyze → suggest changes → repeat.
Consequences:
- ✅ 29 rounds completed, 145 companies tested
- ✅ Metrics-driven improvement (composite score formula)
- ✅ Regression tests guard against quality degradation
- ❌ Autonomous changes need human review before production
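The run → analyze → suggest → repeat cycle could look roughly like the loop below. It is a sketch under assumptions: `autoresearch`, `evaluate`, and `suggest` are hypothetical names, the project's composite score formula is not reproduced here, and the regression guard is reduced to "accept only if the score improves".

```typescript
type Round<C> = { round: number; config: C; score: number; accepted: boolean };

// Generic improvement loop: propose a change, score it, and keep it only
// when it beats the best score so far.
function autoresearch<C>(
  initial: C,
  rounds: number,
  evaluate: (config: C) => number, // stands in for the composite score
  suggest: (best: C, bestScore: number) => C, // stands in for the analysis step
): { best: C; bestScore: number; history: Round<C>[] } {
  let best = initial;
  let bestScore = evaluate(initial);
  const history: Round<C>[] = [];
  for (let round = 1; round <= rounds; round++) {
    const candidate = suggest(best, bestScore);
    const score = evaluate(candidate);
    const accepted = score > bestScore; // reject anything that does not improve
    if (accepted) {
      best = candidate;
      bestScore = score;
    }
    history.push({ round, config: candidate, score, accepted });
  }
  return { best, bestScore, history };
}

// Toy usage: the score peaks at x = 3 and each suggestion nudges x upward by 1.
const result = autoresearch(
  { x: 0 },
  5,
  (c) => -((c.x - 3) ** 2),
  (c) => ({ x: c.x + 1 }),
);
```

The returned `history` records rejected rounds as well as accepted ones, which gives a human reviewer the full trail before anything reaches production.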
ADR-012: Multi-Tenancy with PostgreSQL RLS
Date: 2026-03-07
Context: Need to serve multiple customers from same database.
Decision: Implement org-scoped tables with PostgreSQL Row Level Security.
Consequences:
- ✅ Data isolation at database level
- ✅ Single schema for all tenants
- ❌ RLS not yet enabled in production code
- ❌ Performance overhead of RLS checks
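Since RLS is not yet enabled, the statements a migration would need can be sketched as a small generator. The column name `org_id`, its `uuid` type, the setting name `app.current_org_id`, and the policy naming are all assumptions; only the PostgreSQL syntax itself (`ENABLE`/`FORCE ROW LEVEL SECURITY`, `CREATE POLICY ... USING`) is standard.

```typescript
// Builds the statements that would put one org-scoped table under RLS.
// Assumes each table has an `org_id uuid` column (hypothetical schema).
function rlsStatements(table: string): string[] {
  // Table names are interpolated into SQL, so accept plain identifiers only.
  if (!/^[a-z_][a-z0-9_]*$/.test(table)) {
    throw new Error(`unsafe table name: ${table}`);
  }
  return [
    `ALTER TABLE ${table} ENABLE ROW LEVEL SECURITY;`,
    // FORCE makes the policy apply to the table owner as well.
    `ALTER TABLE ${table} FORCE ROW LEVEL SECURITY;`,
    `CREATE POLICY ${table}_org_isolation ON ${table} ` +
      `USING (org_id = current_setting('app.current_org_id')::uuid);`,
  ];
}
```

With a policy of this shape, each request would set the tenant inside its transaction, for example with `SELECT set_config('app.current_org_id', $1, true)`, before running queries.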