Scope
The 20 most important commits across the 50-commit history of master. Selected for: irreversibility, scope, or surfaced bug class. Each entry has hash, date, the verbatim subject in quotes, and an interpretation of why it matters.
For chronological grouping see History Overview.
The 20
006a770 — 2026-03-11 — “fix: Security hardening — 12 issues resolved across codebase”
The first commit on master. JWT verification was non-functional before this commit (broken crypto.subtle.importKey('spki')). Opt-out hashing also switched from SHA256 to HMAC-SHA256 — opt-out hashes from before this date are invalid. See History Foundation Era and Opt-Out Hashes.
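Why hashes from before this commit no longer match can be seen in a minimal sketch. It assumes Node's built-in crypto module; the normalisation step, the function names, and the secret are hypothetical and not taken from the codebase.

```typescript
import { createHash, createHmac } from "node:crypto";

// Hypothetical normalisation: trim and lower-case before hashing.
function normalise(identifier: string): string {
  return identifier.trim().toLowerCase();
}

// Old style: plain, unkeyed SHA-256 of the identifier.
function legacyOptOutHash(identifier: string): string {
  return createHash("sha256").update(normalise(identifier)).digest("hex");
}

// New style: keyed HMAC-SHA256. The secret is a placeholder here.
function optOutHash(identifier: string, secret: string): string {
  return createHmac("sha256", secret).update(normalise(identifier)).digest("hex");
}
```

Both functions return 64 hex characters, but for the same input the values differ, so a lookup table built with the unkeyed hash cannot be matched against HMAC output and has to be regenerated from the source identifiers.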
2f1119d — 2026-03-24 — “feat: Phase 1 critical fixes — reklamspärr gate, real validation laye[rs]…”
Two of the four validation layers were hash(orgNr) % 2 and random < 0.2 mocks until this commit. Lead scores from before this date are not comparable. Also moved Article 14 trigger from export to collection time. See Reklamspärr and Article 14.
714a500 — 2026-03-24 — “feat: Phase 2 — decompose enrichmentEngine.v7.ts god file into modules”
The 1176-line monolith was split into the current src/enrichment/ tree (types.ts, config.ts, processors/, sources/). Every later refactor descends from this layout. See Repository Layout.
c006048 — 2026-03-25 — “feat: Phase 3 — scale infrastructure for 1.4M enrichment runs/year”
Promoted JSONB fields (domain, lead_score, contacts_count, has_vd_contact, last_enriched_at) to real columns with indexes. Six per-purpose Redis logical DBs. RoPA_Log.month_bucket for partitioning at 1.4M+ rows/year. Does NOT decouple Playwright from the enrich worker — that blocker remains open.
23f6c2f — 2026-03-25 — “fix: domain resolution hardening, eniro ToS removal, kundkort v2”
Three things at once: blocked all 290 Swedish municipality .se domains, content-validates top-3 Serper candidates, removed dead registry imports that were crashing every run. The day’s domain-discovery binge starts here. See History Domain Discovery Era.
ce92174 — 2026-03-25 — “feat: IIS .se zone registry (Tier 0) for domain resolution”
Loaded the ~1.47M-row IIS .se zone into domain_registry with pg_trgm. Enables free domain lookup before spending Serper budget. The Tier 0 cache layer in Domain Discovery.
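For intuition about what a pg_trgm lookup scores, here is a rough sketch of trigram similarity in TypeScript. It follows the scheme described in the pg_trgm documentation (lower-case, split into words, pad each word with two leading spaces and one trailing space, compare trigram sets); the word-splitting character class is a simplifying assumption, and the project's actual query is not shown in the commit.

```typescript
// Build the trigram set for a string, pg_trgm style.
function trigrams(text: string): Set<string> {
  const out = new Set<string>();
  // Assumption: treat ASCII letters, digits and å/ä/ö as word characters.
  const words = text.toLowerCase().split(/[^a-z0-9åäö]+/).filter((w) => w.length > 0);
  for (const word of words) {
    const padded = "  " + word + " ";
    for (let i = 0; i + 3 <= padded.length; i++) {
      out.add(padded.slice(i, i + 3));
    }
  }
  return out;
}

// Shared trigrams divided by the size of the union of both sets.
function similarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  if (ta.size === 0 && tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared++;
  });
  return shared / (ta.size + tb.size - shared);
}
```

With this definition, "word" against "two words" shares 4 of 11 distinct trigrams. A Tier 0 lookup would rank registry rows by such a score before any paid search call is made.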
5da08d0 — 2026-03-25 — “fix: domain resolution — find .com domains, block junk .se directories”
The commit body cites a concrete win: TMP I Uppsala AB lead score 0 → 4.1 after the fix. Added 10 Swedish business-directory aggregators (bisnode.se, creditsafe.se, vainu.com, etc.) to INVALID_DOMAINS so they stop winning the discovery race against the real corporate .com.
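A blocklist check of this kind can be sketched as follows. Only the three aggregators named above are included; the suffix-matching rule and the helper names are assumptions, not the project's code.

```typescript
// Subset of the blocklist: only the entries named in the commit note.
const INVALID_DOMAINS: ReadonlySet<string> = new Set([
  "bisnode.se",
  "creditsafe.se",
  "vainu.com",
]);

// True when the hostname is a blocked domain or a subdomain of one.
function isInvalidDomain(hostname: string): boolean {
  const host = hostname.trim().toLowerCase().replace(/\.$/, "");
  const labels = host.split(".");
  for (let i = 0; i < labels.length - 1; i++) {
    if (INVALID_DOMAINS.has(labels.slice(i).join("."))) return true;
  }
  return false;
}

// Hypothetical helper: first candidate that is not a directory aggregator.
function pickCandidate(candidates: string[]): string | undefined {
  return candidates.find((c) => !isInvalidDomain(c));
}
```

Matching on label suffixes rather than exact strings means www.bisnode.se is rejected along with bisnode.se, while an unrelated domain that merely contains the same text is not.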
d78d33e — 2026-03-25 — “fix: address all code-review findings — CRITICAL/HIGH/MEDIUM”
The compliance audit pass. Reklamspärr filter on export endpoints, MAX_EXPORT_LIMIT enforced unconditionally, Gate 0 added inside enrichV7() itself so callers cannot bypass it. Critical for any GDPR audit defence.
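The "enforced unconditionally" idea can be illustrated with a small sketch in which the cap and the Reklamspärr filter live inside the export function, so no caller can skip them. The row shape, the reklamsparr field name, and the cap value of 1000 are placeholders, not values from the codebase.

```typescript
interface ExportRow {
  orgNr: string;
  reklamsparr: boolean; // hypothetical flag: true when the company has opted out of marketing
}

const MAX_EXPORT_LIMIT = 1000; // placeholder value

// Applies the filter and the cap regardless of what the caller requested.
function prepareExport(rows: ExportRow[], requestedLimit?: number): ExportRow[] {
  const valid =
    requestedLimit !== undefined && Number.isFinite(requestedLimit) && requestedLimit >= 0;
  const requested = valid ? Math.floor(requestedLimit as number) : MAX_EXPORT_LIMIT;
  const limit = Math.min(requested, MAX_EXPORT_LIMIT);
  return rows.filter((row) => !row.reklamsparr).slice(0, limit);
}
```

A missing, negative, or non-numeric limit falls back to the cap rather than to "no limit", which is the failure mode an optional check would allow.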
b0122f3 — 2026-03-25 — “feat: rebuild kundkort with canonical field spec”
The kundkort field list became canonical here: 18 named fields, GAP cards when missing, role normalisation via ROLE_PATTERNS. Any kundkort generated before this commit has a different shape.
8952c8d — 2026-03-26 — “feat: Firecrawl LLM extractor — Phase 1 (feature-flagged)”
First LLM-based extractor in the project. Feature-flagged via USE_FIRECRAWL. Eventually superseded by Crawlee Scraper but remains in the codebase. See History Firecrawl Era.
73d0aba — 2026-03-31 — “feat: Kundkort frontend + BV VärdefullaDatamängder API + frontend build fix”
Discovered the working Bolagsverket VärdefullaDatamängder gateway (gw.api.bolagsverket.se) — same credentials as the IP-blocked Öppet API but a different hostname. Without this commit there is no Bolagsverket data in production. Also fixed the dev frontend’s CJS double-transpile bug.
2324c1f — 2026-04-01 — “feat(scraper): add Crawlee-based multi-page website scraper”
Crawlee landed. Wraps PlaywrightCrawler with MemoryStorage (no disk writes), crawls the homepage plus sub-pages, and extracts alt-text and text. Wired behind USE_CRAWLEE=true. Becomes the default scraper after f6c4d30 below.
cb91206 — 2026-04-02 — “autoresearch: add autonomous extraction improvement system”
The autoresearch loop — agent-driven A/B testing of extraction strategies. JSON-LD added in round 1 contributed +18.7 composite. False-positive rate cited as 37.5% → 6.3%. See Autoresearch Loop.
23d7207 — 2026-04-02 — “feat: Crawlee quality loop complete — 29 rounds, 145 companies tested”
The 29 autoresearch rounds collapsed into one commit. Body claims: 0 false positives, ~60% extraction rate on companies with live websites, Serper removed from fallbacks (TOS), 82/82 unit tests pass, “production-ready for ~650K active AB companies”. Followed minutes later by cleanup commit f6c4d30 (1356 deletions of stale tests and staged dist/).
6a27fb5 — 2026-04-06 — “feat: Add ECOAPI, Tremor frontend with data visualization, and enrichment integration”
Replaced the old kundkort with a Tremor-based dashboard (FinancialChart, ContactDistributionChart, DataCompletenessChart, GapsPanel, InsightsPanel). Introduced ECOAPI at localhost:3001 — later archived in commit 2ce49b6.
4deddb7 — 2026-04-06 — “feat: add database backup script and archive tables migration for inactive companies”
Start of the data-lifecycle infrastructure: backup script + archive-tables migration (originally 007_add_archive_tables.sql, later renamed to 009).
15f01d8 — 2026-04-06 — “feat: archive inactive companies - 2026-04-06”
The actual archive run for inactive companies. The dated subject line marks it as a one-off operation rather than a feature. Hot-table size drops here.
8ea0a44 — 2026-04-06 — “feat: archive non-AB companies - 2026-04-06”
Second archive run, this one for non-AB legal forms. Together with 15f01d8 these are the first significant data deletions in the project’s history.
b3fecc0 — 2026-04-22 — “docs: add SYSTEM_OVERVIEW and SYSTEMDOKUMENTATION_V2 with senior-review fixes”
The first externally reviewed documentation. Five fixes from senior review: file-path attribution, mock-vs-production confusion in src/mocks/validation.ts, dropping of a circular “100% domain accuracy” claim, baseline clarification on Places “+29.4”, FPR-revert reconciliation. Basis for the current System Overview note.
81c2568 — 2026-04-27 — “feat(migrations): add schema_migrations tracking and resolve 007 issues”
Added schema_migrations table on commit 50 of 50. The project ran for ~7 weeks with 10 migrations and no record of which had been applied. Also resolved the duplicate 007_* filename collision and removed an illegal BEGIN/COMMIT wrapper around CREATE INDEX CONCURRENTLY. See Schema Migrations and History Migrations Era.
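The three problems named here (no applied-migration record, a duplicate numeric prefix, and CREATE INDEX CONCURRENTLY inside a transaction) can each be checked mechanically. The sketch below is illustrative only: the function names and the filename convention are assumptions, and the real runner is not described in the commit.

```typescript
// Numeric prefix of a file such as "007_add_archive_tables.sql".
function prefixOf(filename: string): string {
  const match = /^(\d+)_/.exec(filename);
  if (!match) throw new Error("Migration filename lacks a numeric prefix: " + filename);
  return match[1];
}

// Prefixes used by more than one file (the 007_* collision case).
function duplicatePrefixes(filenames: string[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const filename of filenames) {
    const prefix = prefixOf(filename);
    if (seen.has(prefix)) duplicates.add(prefix);
    seen.add(prefix);
  }
  return Array.from(duplicates).sort();
}

// Files not yet recorded in schema_migrations, in filename order.
function pendingMigrations(all: string[], applied: Set<string>): string[] {
  return all.filter((filename) => !applied.has(filename)).sort();
}

// PostgreSQL rejects CREATE INDEX CONCURRENTLY inside a transaction block,
// so such a file must not be wrapped in BEGIN/COMMIT by the runner.
function mustRunOutsideTransaction(sql: string): boolean {
  return /\bCREATE\s+(UNIQUE\s+)?INDEX\s+CONCURRENTLY\b/i.test(sql);
}
```

Sorting by filename assumes zero-padded prefixes, which is what makes a collision such as two 007_* files ambiguous in the first place.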
d10de38 — 2026-05-06 — “feat(procurement): Landing B frontend — full table redesign + drawer lots”
Sprint Landing B-frontend, 16 files +1444/-275. Closes the operator-flagged “we cannot infinite scroll” + “way to sort the list and more rows with info to sort with” requests synthesised from three parallel research agents (Trend Researcher / UX Researcher / Evidence Collector). Reality Checker approved the design pre-build.
Seven-column table: Title · Buyer · Notice Type · Published · Deadline · Value · County. Progressive disclosure across sm/md/lg/xl breakpoints. Server-driven pagination (page-numbers + perPage 25/50/100) replaces the previous hardcoded limit: 2000. Tri-state click-to-sort headers cycling ASC → DESC → cleared (back to default published_at DESC). URL state via useSearchParams — ?page=&perPage=&sort=&dir=&q=&type=&nuts=&within=&valueMin=&valueMax=&cpv=&status= — refresh-safe + shareable. First URL-state usage in the project; convention documented in Procurements URL State (TODO).
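The tri-state header cycle and its URL serialisation can be sketched as below. The query parameter names (page, perPage, sort, dir) and the published_at default come from the description above; the type and function names, and the example column, are hypothetical.

```typescript
type SortDir = "asc" | "desc";

interface SortState {
  sort: string;
  dir: SortDir;
}

// Used by the caller whenever the explicit sort state is null (cleared).
const DEFAULT_SORT: SortState = { sort: "published_at", dir: "desc" };

// Header click: new column -> ASC, same column ASC -> DESC, same column DESC -> cleared.
function nextSort(current: SortState | null, column: string): SortState | null {
  if (current === null || current.sort !== column) return { sort: column, dir: "asc" };
  if (current.dir === "asc") return { sort: column, dir: "desc" };
  return null;
}

// Serialise pagination and sort into a query string; a cleared sort omits both keys.
function toQuery(page: number, perPage: number, state: SortState | null): string {
  const params = new URLSearchParams();
  params.set("page", String(page));
  params.set("perPage", String(perPage));
  if (state !== null) {
    params.set("sort", state.sort);
    params.set("dir", state.dir);
  }
  return params.toString();
}
```

Keeping "cleared" as null rather than as the default pair means the URL stays short in the default view, and a shared link only pins a sort order when the sender actually chose one.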
New utilities: lib/sv-format.ts (Swedish locale formatters with 23 unit tests; “55 MSEK” / “850 tkr” / “500 kr” + “19 maj 2026”), lib/nuts.ts (NUTS-3 → Swedish län display with 17 unit tests; covers all 21 län plus NUTS-2 fallback). New components: DeadlineCell (countdown chip; destructive ≤7d, amber ≤14d, muted ≤30d), PaginationBar (windowed page list [1, "…", 4, 5, 6, "…", 47]), SortableHeader (aria-sort + arrow indicator), SkeletonRow. New hook: useProcurementSearchParams (URL-state + UI-state-to-server-params adapter).
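Three of the behaviours listed here are small pure functions, sketched below under stated assumptions. The thresholds for the deadline chip and the example outputs come from the description; the function names, the rounding rule, and the treatment of deadlines beyond 30 days are guesses, and the real lib/sv-format.ts may differ.

```typescript
// Compact SEK amount: "55 MSEK", "850 tkr", "500 kr". Rounding to whole units is an assumption.
function formatSek(amount: number): string {
  if (amount >= 1000000) return Math.round(amount / 1000000) + " MSEK";
  if (amount >= 1000) return Math.round(amount / 1000) + " tkr";
  return Math.round(amount) + " kr";
}

type DeadlineTone = "destructive" | "amber" | "muted" | "none";

// Countdown chip tone. Past deadlines are not distinguished in this sketch.
function deadlineTone(daysLeft: number): DeadlineTone {
  if (daysLeft <= 7) return "destructive";
  if (daysLeft <= 14) return "amber";
  if (daysLeft <= 30) return "muted";
  return "none";
}

// Windowed page list: first page, last page, and the pages around the current one.
function pageWindow(current: number, total: number, radius: number = 1): Array<number | "…"> {
  const out: Array<number | "…"> = [];
  let previous = 0;
  for (let page = 1; page <= total; page++) {
    const visible = page === 1 || page === total || Math.abs(page - current) <= radius;
    if (!visible) continue;
    if (page - previous > 1) out.push("…"); // gap since the last visible page
    out.push(page);
    previous = page;
  }
  return out;
}
```

For page 5 of 47, pageWindow produces the list shown above: 1, an ellipsis, 4, 5, 6, an ellipsis, 47.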
Drawer LotItem fix: buyers commonly leave their procurement system’s default template name in place (verified in DB: ~70 notices use “Generell del”, “Grundmall upphandling”, “Kravspecifikation”, “Administrativa föreskrifter”). The display now shows a “Del N” position prefix even when the buyer didn’t customise the name, and lot descriptions truncate at a word boundary with a “Visa mer” expand control instead of a mid-word line-clamp-3 cut. The raw LOT-id is shown muted underneath for traceability.
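Word-boundary truncation is easy to get subtly wrong, so here is a minimal sketch. The function name, the character budget, and the example text are hypothetical; the component's actual limit is not stated in the commit.

```typescript
interface Truncated {
  text: string;
  truncated: boolean; // true when a "Visa mer" control would be needed
}

// Cut at the last space that fits within maxChars, never in the middle of a word.
function truncateAtWord(text: string, maxChars: number): Truncated {
  const clean = text.trim();
  if (clean.length <= maxChars) return { text: clean, truncated: false };
  // Look one character past the limit so a word ending exactly at the limit is kept whole.
  const slice = clean.slice(0, maxChars + 1);
  const lastSpace = slice.lastIndexOf(" ");
  // No space within the limit: fall back to a hard cut rather than returning nothing.
  const cut = lastSpace > 0 ? lastSpace : maxChars;
  return { text: clean.slice(0, cut).replace(/\s+$/, "") + "…", truncated: true };
}
```

For example, truncateAtWord("Leverans av kontorsmaterial till kommunen", 20) keeps "Leverans av" and appends an ellipsis instead of cutting "kontorsmaterial" in half.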
NoticeTypeBadge tightened — operator flagged “Aktiv upphandling” wrapping to 2 lines. Shortened to “Aktiv” / “Avgjord (sociala)” / “Ändrat” / “Periodisk” plus whitespace-nowrap. Full label preserved in tooltip.
Two pre-existing issues are intentionally bundled into Landing C: smart-chip pills lacking keyboard a11y (B5) and the drawer missing role="dialog" / focus trap (B6). The auth-persist gap (B1) stays in G1 globally per operator decision.
9890bcc — 2026-05-05 — “feat(procurement): Landing B server — numeric value, sort, server-side filters”
Server-side companion to d10de38. Three additions:
- ProcurementWire.uppskattat_varde_belopp (numeric SEK alongside the existing pre-formatted uppskattat_varde string) so the frontend can do Intl.NumberFormat, sort by value, and apply value-range filters without re-parsing.
- SORTABLE_COLUMNS allowlist + isSortColumn() predicate + dynamic ORDER BY composition. SQL injection guard tested with 8 attack strings (16 unit tests in tests/procurement/sortAllowlist.test.ts). NULLS LAST + id tie-breaker for deterministic pagination.
- New ListParams filters: nuts_prefix, value_min, value_max, deadline_within_days. The deadline filter has a >= now() lower bound so “closing within 14 days” means future-only (caught by live-data sanity check before commit; original query was returning past-dated rows).
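The allowlist approach to dynamic ORDER BY can be sketched as follows. SORTABLE_COLUMNS and isSortColumn() are named in the commit; the column list, the deadline column name, the fallback clause, and the helper functions are assumptions for illustration.

```typescript
// Hypothetical contents; the real allowlist is not listed in the commit.
const SORTABLE_COLUMNS = ["published_at", "deadline", "estimated_value_amount"] as const;
type SortColumn = (typeof SORTABLE_COLUMNS)[number];

// Only exact allowlist members pass; everything else is rejected.
function isSortColumn(value: unknown): value is SortColumn {
  return typeof value === "string" && (SORTABLE_COLUMNS as readonly string[]).indexOf(value) !== -1;
}

// User input never reaches the SQL text unless it matched the allowlist.
function buildOrderBy(sortBy: unknown, sortDir: unknown): string {
  if (!isSortColumn(sortBy)) return "ORDER BY published_at DESC NULLS LAST, id DESC";
  const dir = sortDir === "asc" ? "ASC" : "DESC";
  // The id tie-breaker keeps page boundaries stable when many rows share a value.
  return "ORDER BY " + sortBy + " " + dir + " NULLS LAST, id " + dir;
}

// "Closing within N days" with a lower bound so past deadlines are excluded.
function deadlineWithin(days: number, paramIndex: number): { sql: string; value: number } {
  return {
    sql: "deadline >= now() AND deadline <= now() + make_interval(days => $" + paramIndex + ")",
    value: days,
  };
}
```

Column names cannot be bound as query parameters, which is why the sort column is validated against a fixed list while the day count still travels as a normal bound value.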
Live-verified: combined ?nuts_prefix=SE11&deadline_within_days=30&sort_by=estimated_value_amount&sort_dir=desc returns 172 high-value Stockholm tenders closing soon. SQL injection attempts return default-sort fallback (200, no execution).
7dabb10 — 2026-05-05 — “fix(procurement): frontend hardening — routing, layout, drawer, filter”
Same-day follow-up to 5c4db1f. Two QA gates on the live UI surfaced 9 issues, among them: a /upphandlingar 404 (Swedish alias the user hit), table-layout: auto clipping STATUS/DEADLINE/VALUE columns off the right edge at 1440px, frontend hardcoded limit: 100 against a 2113-row corpus (backend ALSO capped at 200), foreign-country winner_org_country: "ESP" returned by the API but never rendered, consortium_size > 1 was a silent icon swap with zero text affordance, awards with no real bid showing 0 SEK, and a notice_type query parameter wired in ListParams but ignored by the SQL builder. All fixed.
This commit reads short but is a good case study: the original Feature 1 ship (5c4db1f) was approved by Reality Checker and 47/47 unit tests passed. The frontend hardening issues weren’t visible until the operator opened the page and tried to use it. The QA gates that matter are the live-UI ones, not the unit tests. Evidence preserved at vault/Wiki/Tests/screenshots/2026-05-05-feature1-frontend-hardening/ (11 screenshots — 2 pre-fix demonstrating the bugs, 9 post-fix demonstrating the fixes).
5c4db1f — 2026-05-05 — “feat(procurement): Feature 1 Buyer Intel + Feature 2 ABANDON probe”
Three intertwined tracks landed together:
- Feature 1 (Buyer Intel): procurement_awards table joining 5 eForms sections (NoticeResult / LotResult / LotTender / TenderingParty / Tenderer / Organization) to expose contract winners, bid amounts, bid ranges, received bid counts, and award dates per LotResult on can-* notices. Framework-agreement and consortium aware. Live data: 6163 awards across 719 notices in a 15-day rolling window; top winners are medical-supplies frameworks (Mediplast 72 wins; SWECO 218M SEK).
- Feature 2 (sub-threshold sourcing): ABANDON. Probe verified the same structural lockout as Path 3B: Swedish annonsdatabaser (notice databases) are paywalled commercial platforms with no public RSS/JSON feeds, the KKV registry has 5 entries (e-Avrop, KommersAnnons, Mercell, Konstpool, Clira), and net-new volume is bounded above by ~7-8k/yr versus the existing TED 25k SE. Geographic expansion (DK/NO/FI TED = 34,313 notices/yr verified) parked per operator: Sweden only for now.
- Destructive-DB protection layer: post-incident hardening after a too-wide DELETE caught 53 rows from prior successful runs (predicate ingested_at > now() - 1 hour, but ingested_at had been touched by re-upserts). Added: _truncateAllForTest() 2-layer guard requiring DB name to end in _test, separate enrichnodedb_test database for integration tests, PreToolUse hook blocking DELETE/UPDATE/DROP/TRUNCATE/ALTER TABLE, and helper scripts safe-cleanup-failed-batch.ts + reclassify-awards-pii.ts (count-first, transactional, row-count assertion).
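The guard and the count-first pattern described in the last item can be sketched as two small checks. The _test suffix rule and the row-count assertion come from the commit; using NODE_ENV as the second layer, and both function names, are assumptions.

```typescript
// Layered guard: refuse destructive test helpers unless both checks pass.
function assertTestDatabase(dbName: string, nodeEnv: string | undefined): void {
  // Layer 1 (assumed): the process must be running in test mode.
  if (nodeEnv !== "test") {
    throw new Error("Refusing destructive operation: NODE_ENV is not 'test'");
  }
  // Layer 2 (from the commit): the database name must end in _test.
  if (!/_test$/.test(dbName)) {
    throw new Error('Refusing destructive operation: database "' + dbName + '" does not end in _test');
  }
}

// Count-first cleanup: compare the rows a statement affected with the count taken
// beforehand, and throw so the surrounding transaction rolls back on a mismatch.
function assertRowCount(expected: number, affected: number): void {
  if (expected !== affected) {
    throw new Error("Row-count mismatch: expected " + expected + ", affected " + affected);
  }
}
```

In the incident described above, a count taken before the DELETE would have shown 53 more rows than the failed batch contained, and the assertion would have rolled the transaction back.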
PII guard for Swedish enskild firma went through two QA gates. Gate D found 170 winners flagged with ~150 false positives (Aktiebolaget X, MOmentum Industrial, Hushållningssällskapet Västra). Gate E (Reality Checker) found three false-NEGATIVE blockers (apostrophe-cap, Mc/Mac/De prefix, single-letter middle initial) — all GDPR-relevant. After fixes: 56/56 unit tests pass; live data reclassified 170 → 94 flagged (76 TRUE→FALSE, 0 FALSE→TRUE).
This is the second probe (after Path 3B) that killed an SE procurement data-source hypothesis on the same structural reason. Standing rule reinforced: probe → design → QA → implement → QA → commit. See Build Inventory for the shipped/stub registry.
Selection criteria
A commit qualifies as notable if at least one of:
- Irreversible — schema change, security primitive change, data deletion.
- Scope — touches more than five files in different subsystems.
- Surfaces a bug class — exposes a category of bugs (mock validation, double-transpilation, JSONB-as-column, blocked APIs).
Commits like seeds: sync, mulch: update expertise, chore: remove temporary test companies, and the duplicate f6c4d30 are excluded as housekeeping.
See also
History Overview, System Overview, Known Issues.