Scope
Two-day window, 2026-03-26 → 2026-03-27. Three commits introduce Firecrawl as a feature-flagged alternative to Playwright extraction, then patch it.
The experiment ran in parallel with Playwright but was eventually superseded by Crawlee Scraper. Firecrawl remains in the codebase, gated behind USE_FIRECRAWL=true. See Firecrawl.
Commits
8952c8d — 2026-03-26 — feat: Firecrawl LLM extractor — Phase 1 (feature-flagged)
Co-author: Claude Sonnet 4.6.
- New flag USE_FIRECRAWL=true routes website extraction through Firecrawl's structured JSON extraction instead of Playwright. Playwright remains the default; no existing behaviour changes.
- src/enrichment/sources/firecrawl.ts: extractor using @mendable/firecrawl-js. Lazy client singleton (test-isolatable via _setClientForTest). Zod schema for contacts, address, services, social_links. Scrapes all contact pages in parallel and merges. Applies isValidPersonName() + inferRoleType() on LLM output. Throws on missing FIRECRAWL_API_KEY.
- src/enrichment/sources/website.ts: dispatches to Firecrawl or Playwright.
- src/enrichment/types.ts: 'firecrawl' added to website_method union.
- tests/enrichment/firecrawl.test.ts: 25 tests, no API key required.
- docs/FIRECRAWL_MIGRATION.md: architecture, credit model, phase roadmap.
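The flag dispatch and the lazy, test-isolatable client described in this commit can be sketched roughly as follows. This is a minimal illustration, not the repository's code: ExtractorClient, getClient, chooseMethod and the injected create factory are hypothetical names, and the environment is passed in as a plain object so the sketch runs without any SDK or API key. Only USE_FIRECRAWL, FIRECRAWL_API_KEY, _setClientForTest and the 'firecrawl' union member come from the commit message.

```typescript
// Sketch only: every name here is hypothetical except USE_FIRECRAWL,
// FIRECRAWL_API_KEY, _setClientForTest and the 'firecrawl' union member.
type WebsiteMethod = 'playwright' | 'firecrawl';
type Env = Record<string, string | undefined>;

// Stand-in for the real SDK client; the actual client surface is not shown in the source.
interface ExtractorClient {
  scrape(url: string): Promise<unknown>;
}

// Module-level singleton, created on first use rather than at import time.
let client: ExtractorClient | null = null;

// Test hook: inject a fake client, or pass null to reset the singleton.
function _setClientForTest(fake: ExtractorClient | null): void {
  client = fake;
}

// Returns the cached client, or builds one; throws if the API key is missing.
function getClient(env: Env, create: (apiKey: string) => ExtractorClient): ExtractorClient {
  if (client) return client;
  const apiKey = env.FIRECRAWL_API_KEY;
  if (!apiKey) throw new Error('FIRECRAWL_API_KEY is not set');
  client = create(apiKey);
  return client;
}

// Playwright stays the default; only the exact string 'true' opts in to Firecrawl.
function chooseMethod(env: Env): WebsiteMethod {
  return env.USE_FIRECRAWL === 'true' ? 'firecrawl' : 'playwright';
}
```

Because the client is created lazily and can be replaced through the test hook, a test suite can exercise the extractor without a key, which is consistent with the note that the 25 tests need no API key.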
8f2d1a5 — 2026-03-26 — fix: Firecrawl code review fixes
Sparse body. Code-review patch the same day as introduction.
549dd51 — 2026-03-27 — feat: improve Firecrawl contact extraction with URL guessing, HTML fallback, and better prompt
Co-author: Claude Sonnet 4.6.
- Fixed SDK v4 format: json must be inline in formats array (not jsonOptions).
- Added guessContactUrls(): probes 8 common Swedish contact paths via HEAD when Firecrawl map() returns nothing. Prevents falling back to homepage only.
- Added extractContactsFromHtml(): cheerio fallback for zero-contact LLM results. Parses h2/h3/h4/strong for person names with role and mailto:.
- Improved LLM prompt: explicitly asks for team members, executives, board members. Swedish section names (Om oss, Medarbetare, Ledning).
- Added HTML format to Firecrawl scrape response for fallback without extra requests.
- Extracted ENRICH_USER_AGENT to shared constant in config.ts (used by domain.ts too).
- Fixed double inferRoleType() call.
- 13 new tests: guessContactUrls (4), extractContactsFromHtml (7), html fallback (2).
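The URL-guessing fallback in this commit can be sketched roughly as below. It is an illustration under stated assumptions, not the committed implementation: the candidate paths are illustrative guesses (the commit probes 8 paths that this page does not list), and the HeadProbe type, the headProbe default and the extra parameters are hypothetical. The default probe uses the global fetch available in Node 18+.

```typescript
// Illustrative guesses only: the actual eight paths are not listed in the source.
const CANDIDATE_PATHS: readonly string[] = ['/kontakt', '/kontakta-oss', '/om-oss', '/medarbetare'];

// Hypothetical probe signature: resolves to true when the URL appears to exist.
type HeadProbe = (url: string) => Promise<boolean>;

// Default probe: a HEAD request, so no response body is downloaded.
const headProbe: HeadProbe = async (url) => {
  try {
    const res = await fetch(url, { method: 'HEAD', redirect: 'follow' });
    return res.ok;
  } catch {
    // Network errors count as "not found" rather than aborting the whole probe.
    return false;
  }
};

// Probes every candidate path in parallel and keeps the ones that respond OK,
// preserving the order of the candidate list.
async function guessContactUrls(
  baseUrl: string,
  probe: HeadProbe = headProbe,
  paths: readonly string[] = CANDIDATE_PATHS,
): Promise<string[]> {
  const candidates = paths.map((p) => new URL(p, baseUrl).toString());
  const hits = await Promise.all(
    candidates.map(async (url) => ((await probe(url)) ? url : null)),
  );
  return hits.filter((url): url is string => url !== null);
}
```

Injecting the probe keeps the function testable offline. A caller would presumably invoke it only when the map step returns no contact pages, and scrape the homepage alone only if the probe also finds nothing.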
Significance
- Firecrawl was the first attempt to use an LLM-based extractor for contacts.
- Phase 2 A/B comparison vs Playwright is referenced in docs/FIRECRAWL_MIGRATION.md but never appears in commit history. By 2026-04-01 the project pivots to Crawlee. See History Crawlee Era.
- The HTML and URL-guessing fallbacks added in 549dd51 hedge against the LLM returning empty, an implicit acknowledgement that LLM extraction alone was unreliable.
- 2026-04-02: USE_CRAWLEE is the recommended scraper. Firecrawl is dormant unless explicitly enabled.
See also
Firecrawl, Crawlee Scraper, History Crawlee Era, History Overview.