Decision Register

Architecture Decision Records (ADRs)

This register records every significant decision that shaped this project. Each record follows the format: context → decision → consequences.

ADR-001: Bun over Node.js

Date: 2026-03-04
Context: Need fast TypeScript execution, built-in bundling, native APIs.
Decision: Use Bun as the exclusive runtime. Ban node, npm, npx, ts-node.
Consequences:

  • ✅ Fast startup, built-in TS support, Bun.sql, Bun.redis, Bun.serve()
  • ❌ Some npm packages incompatible; must use Bun-native alternatives
  • ❌ Team must learn Bun-specific APIs

ADR-002: PostgreSQL over SQLite

Date: 2026-03-04
Context: Initial POC used SQLite. Needed concurrency, pgvector, robustness.
Decision: Migrate to PostgreSQL 15 + pgvector extension.
Consequences:

  • ✅ Concurrent connections, vector embeddings, GIN indexes, trigram search
  • ✅ Docker Compose infrastructure
  • ❌ More complex local setup (PostgreSQL on non-default port 5433)

ADR-003: BullMQ over Custom Queue

Date: 2026-03-04
Context: Need reliable job queue for scrape → enrich → update pipeline.
Decision: Use BullMQ on Redis 7.
Consequences:

  • ✅ Dead letter queues, retry logic, rate limiting, job progress tracking
  • ✅ Multiple worker types (scrape, enrich, update, art14, playwright)
  • ❌ Redis dependency adds infrastructure complexity

ADR-004: Remove Serper.dev

Date: 2026-03-25
Context: Serper.dev was used as a Google search API (domain discovery, LinkedIn, news). ToS concerns and credit exhaustion.
Decision: Replace with a free-tier pipeline: Google Places API → IIS .se zone registry → DNS/HTTP scoring.
Consequences:

  • ✅ Zero cost for domain discovery
  • ✅ No ToS risk
  • ❌ Lost LinkedIn and news signals
  • ❌ Slightly lower domain discovery accuracy for edge cases

ADR-005: Crawlee over Playwright-Only

Date: 2026-04-01
Context: Playwright alone was slow and missed JS-rendered content. Needed multi-page crawling.
Decision: Add Crawlee as primary scraper, Playwright as fallback, Firecrawl as last resort.
Consequences:

  • ✅ Multi-page crawling (up to 12 pages)
  • ✅ 6 extraction strategies (DOM, text, email, prose, alt text, JSON-LD)
  • ✅ 29 rounds of optimization, 145 companies tested
  • ❌ Playwright running inline in workers is a scaling bottleneck (above 40 concurrent, workers hit OOM)
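
The primary → fallback → last-resort order in the decision above could be sketched as a simple chain. This is a minimal illustration, not the project's actual scraper code: the `Scraper` and `ScrapeResult` types, the `scrapeWithFallback` name, and the stub scrapers are all hypothetical stand-ins for Crawlee, Playwright and Firecrawl.

```typescript
// Hypothetical types: the project's real scraper interfaces are not shown in this register.
type ScrapeResult = { source: string; contacts: string[] };
type Scraper = { name: string; run: (url: string) => Promise<ScrapeResult | null> };

// Try each scraper in order; move on when one throws or finds nothing.
async function scrapeWithFallback(
  url: string,
  scrapers: Scraper[],
): Promise<ScrapeResult | null> {
  for (const scraper of scrapers) {
    try {
      const result = await scraper.run(url);
      if (result && result.contacts.length > 0) return result;
    } catch (err) {
      // A failed tier should not abort the chain; log and continue.
      console.warn(`${scraper.name} failed for ${url}:`, err);
    }
  }
  return null;
}

// Stub scrapers standing in for Crawlee, Playwright and Firecrawl (in that order).
const chain: Scraper[] = [
  { name: "crawlee", run: async () => null },
  { name: "playwright", run: async () => { throw new Error("timeout"); } },
  { name: "firecrawl", run: async () => ({ source: "firecrawl", contacts: ["info@example.se"] }) },
];
```

With these stubs, the first tier finds nothing, the second throws, and the third supplies the result. Keeping the tiers behind one interface would also make it easier to move Playwright out of the worker process later.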

ADR-006: Article 6(1)(f) Legitimate Interest

Date: 2026-03-10
Context: GDPR legal basis for B2B contact data processing.
Decision: Rely on Article 6(1)(f), legitimate interests. Corporate data, professional capacity, publicly available.
Consequences:

  • ✅ No consent required for corporate registry data
  • ✅ Swedish law mandates publication of board members
  • ⚠️ Must complete a formal Legitimate Interests Assessment (LIA) document (the reasoning currently exists only in KB articles)
  • ⚠️ Bisnode precedent risk: proactive Art. 14 notification is mandatory

ADR-007: SHA-256 Opt-Out (Not Plaintext)

Date: 2026-03-04
Context: GDPR requires an opt-out mechanism. Storing the plaintext contact details of people who opted out would itself violate GDPR.
Decision: Store only HMAC-SHA256 hashes, no raw contact data.
Consequences:

  • ✅ No PII in opt-out table
  • ✅ Keyed hashes (the HMAC secret acts as the salt) prevent rainbow table attacks
  • ❌ Cannot verify what contact a hash represents (by design)
  • ❌ Rotating the HMAC secret (salt) invalidates all existing hashes
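
A minimal sketch of the hash-only opt-out check, assuming contacts are normalized (trimmed and lowercased) before hashing. The function names, the normalization rule, and the example key are hypothetical; only the use of HMAC-SHA256 comes from the decision above. `createHmac` is the standard `node:crypto` API, which Bun also provides.

```typescript
import { createHmac } from "node:crypto";

// Assumption: contacts are normalized before hashing, so that
// " Anna@Example.SE " and "anna@example.se" map to the same opt-out entry.
function normalizeContact(contact: string): string {
  return contact.trim().toLowerCase();
}

// Returns a hex-encoded HMAC-SHA256 digest; only this value is stored.
function optOutHash(contact: string, secretKey: string): string {
  return createHmac("sha256", secretKey)
    .update(normalizeContact(contact))
    .digest("hex");
}

// Checks a contact against stored hashes without ever storing the contact itself.
function isOptedOut(contact: string, storedHashes: Set<string>, secretKey: string): boolean {
  return storedHashes.has(optOutHash(contact, secretKey));
}

// Hypothetical key for illustration only; a real key belongs in a secret store.
const exampleKey = "example-key-not-for-production";
const storedHashes = new Set([optOutHash("anna@example.se", exampleKey)]);

console.log(isOptedOut(" Anna@Example.SE ", storedHashes, exampleKey)); // true
console.log(isOptedOut("anna@example.se", storedHashes, "rotated-key")); // false
```

The second call illustrates the rotation consequence: once the key changes, previously stored hashes no longer match, so rotation would need a re-collection or dual-key lookup strategy.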

ADR-008: Feature-Flag Firecrawl

Date: 2026-03-26
Context: Firecrawl LLM extraction promising but expensive (~$0.0018/credit).
Decision: Implement Firecrawl as optional fallback, not default. A/B test before switching.
Consequences:

  • ✅ Controlled cost exposure
  • ✅ Can compare Crawlee vs Firecrawl quality
  • ❌ Two code paths to maintain
  • ❌ Phase 2 (full comparison) not yet done
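
One way to gate Firecrawl behind a flag while still supporting an A/B comparison is deterministic bucketing, sketched below. Everything here is an assumption for illustration: the `FirecrawlFlags` shape, the `useFirecrawl` name, and the choice of an FNV-1a style hash over the company identifier are not taken from the project.

```typescript
// Hypothetical flag shape; the real flag names are not given in this register.
type FirecrawlFlags = { enabled: boolean; abSharePercent: number };

// FNV-1a style 32-bit hash over UTF-16 code units: small and deterministic,
// so the same company always lands in the same A/B bucket.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Firecrawl is used only when the flag is on AND the company falls in the test share.
function useFirecrawl(companyId: string, flags: FirecrawlFlags): boolean {
  if (!flags.enabled) return false;
  return fnv1a(companyId) % 100 < flags.abSharePercent;
}

// Example: flag off means Firecrawl is never used, regardless of the share.
console.log(useFirecrawl("5560000001", { enabled: false, abSharePercent: 100 })); // false
```

Because the bucket depends only on the identifier, a company stays in the same arm across runs, which keeps Crawlee and Firecrawl results comparable and caps cost exposure at the configured share.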

ADR-009: Archive Inactive Companies

Date: 2026-04-06
Context: 2.9M Bolagsverket records, only ~651K active AB companies relevant.
Decision: Archive inactive and non-AB companies to separate tables.
Consequences:

  • ✅ Faster queries on active dataset
  • ✅ Reduced memory footprint
  • ✅ Backup sizes reduced (~2.4GB → smaller working set)
  • ❌ Archiving is not deletion under GDPR: archived personal data remains subject to erasure and retention obligations
  • ❌ Need restore capability

ADR-010: Kundkort Frontend (Tremor + React)

Date: 2026-03-30
Context: Need user-facing interface for company search and enrichment.
Decision: Build React frontend with Tremor UI components, integrated with backend API.
Consequences:

  • ✅ Modern UI with data visualization
  • ✅ Search, detail view, manual enrich, CSV export
  • ❌ Additional frontend complexity
  • ❌ Auth/Keycloak integration required

ADR-011: Autoresearch Loop for Quality

Date: 2026-04-02
Context: Manual experimentation too slow for 29 rounds of optimization.
Decision: Build autonomous loop: run experiment → analyze → suggest changes → repeat.
Consequences:

  • ✅ 29 rounds completed, 145 companies tested
  • ✅ Metrics-driven improvement (composite score formula)
  • ✅ Regression tests guard against quality degradation
  • ❌ Autonomous changes need human review before production
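
The run → analyze → suggest → repeat cycle, with a guard against regressions, might look roughly like the sketch below. The `autoresearch` function, its parameters, and the toy scoring function are hypothetical; the register does not define the real experiment format or the composite score formula.

```typescript
// Hypothetical signature: `run` returns a composite score for a config,
// `suggest` proposes the next config based on the current best.
function autoresearch<C>(
  initial: C,
  run: (config: C) => number,
  suggest: (config: C, score: number) => C,
  rounds: number,
): { best: C; bestScore: number; history: number[] } {
  let best = initial;
  let bestScore = run(initial);
  const history: number[] = [bestScore];
  for (let round = 1; round <= rounds; round++) {
    const candidate = suggest(best, bestScore);
    const score = run(candidate);
    history.push(score);
    // Regression guard: keep a candidate only if it does not lower the score.
    if (score >= bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return { best, bestScore, history };
}

// Toy example: the score peaks at x = 3, and each suggestion increments x by 1.
const outcome = autoresearch(
  { x: 0 },
  (c) => 10 - (c.x - 3) * (c.x - 3),
  (c) => ({ x: c.x + 1 }),
  5,
);
console.log(outcome.best, outcome.bestScore); // { x: 3 } 10
```

In the toy run, the suggestions that overshoot the peak are rejected, so the best config stays at x = 3. The returned `history` is what a human reviewer would inspect before any change reaches production.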

ADR-012: Multi-Tenancy with PostgreSQL RLS

Date: 2026-03-07
Context: Need to serve multiple customers from same database.
Decision: Implement org-scoped tables with PostgreSQL Row Level Security.
Consequences:

  • ✅ Data isolation at database level
  • ✅ Single schema for all tenants
  • ❌ RLS not yet enabled in production code
  • ❌ Performance overhead of RLS checks
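
As a rough illustration of what enabling RLS per table involves, the helper below generates the PostgreSQL statements for one org-scoped table. It assumes each tenant table has an `org_id` column of type `uuid` and that the application sets a per-connection setting named `app.current_org_id`; both names, and the `rlsStatements` helper itself, are hypothetical.

```typescript
// Assumptions: tenant tables carry an `org_id` uuid column, and the application
// sets a custom setting `app.current_org_id` per connection. Both are hypothetical.
function rlsStatements(table: string): string[] {
  // Table names are interpolated into SQL, so accept plain identifiers only.
  if (!/^[a-z_][a-z0-9_]*$/.test(table)) {
    throw new Error(`unsafe table name: ${table}`);
  }
  return [
    `ALTER TABLE ${table} ENABLE ROW LEVEL SECURITY;`,
    // FORCE applies the policies to the table owner as well.
    `ALTER TABLE ${table} FORCE ROW LEVEL SECURITY;`,
    `CREATE POLICY ${table}_org_isolation ON ${table} ` +
      `USING (org_id = current_setting('app.current_org_id')::uuid);`,
  ];
}

for (const statement of rlsStatements("companies")) {
  console.log(statement);
}
```

At request time the application would then set the tenant inside a transaction, for example with `SELECT set_config('app.current_org_id', $1, true)`, before running queries. Since RLS is not yet enabled in production, a sketch like this would need testing against the real schema first.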