Name Validation

Name validation is the most important quality gate in the system. Every candidate name from every extraction strategy must pass isValidPersonName() in src/enrichment/processors/nameUtils.ts before it is accepted.

Source: docs/SYSTEM_OVERVIEW.md § Name validation.

Function

export function isValidPersonName(name: string): boolean {
  if (!name || name.length < 3 || name.length > 60) return false;
 
  const words = name.trim().split(/\s+/);
 
  // ~400-entry blocklist: nav items, UI phrases, industry terms, honorifics.
  // Examples: "kontakt", "om", "cookies", "performance", "analytics", "podcast", "golf"
  if (words.some(w => INVALID_NAME_STANDALONE_WORDS.has(w.toLowerCase()))) return false;
 
  // Legal entity suffixes: AB, HB, EF, KB, LLC, Ltd, GmbH, förening.
  if (INVALID_SUFFIXES.some(s => name.toLowerCase().endsWith(s))) return false;
 
  // Compass-direction / region suffixes ("Regionchef Syd").
  if (LOCATION_TERMINATING_WORDS.some(w =>
    name.toLowerCase().endsWith(w.toLowerCase()))) return false;
 
  // No digits and none of @ / \ ( ) in any word
  if (words.some(w => /\d|[@/\\()]/.test(w))) return false;
 
  // 2–5 words, each starting uppercase
  if (words.length < 2 || words.length > 5) return false;
  if (!words.every(w => /^[\p{Lu}]/u.test(w))) return false;
 
  // Reject all-caps strings > 4 chars
  if (/^[\p{Lu}\s]+$/u.test(name) && name.length > 4) return false;
 
  return true;
}
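When a candidate is rejected, it helps to know which rule fired. The following is a minimal diagnostic sketch that mirrors the checks above in the same order and returns a label instead of a boolean. The helper name explainRejection, the labels, and the three short lists are hypothetical stand-ins; the real lists are much longer, and their exact entry format (for example the leading space on suffixes) is an assumption.

```typescript
// Hypothetical stand-ins for the real lists; entry format is assumed.
const STANDALONE_WORDS = new Set<string>(["kontakt", "om", "cookies", "podcast", "golf"]);
const SUFFIXES: string[] = [" ab", " hb", " llc", " ltd", " gmbh"];
const LOCATION_WORDS: string[] = [" syd", " nord"];

// Returns the label of the first check that rejects `name`, or null if the
// name would be accepted. Checks run in the same order as isValidPersonName().
function explainRejection(name: string): string | null {
  if (!name || name.length < 3 || name.length > 60) return "length";

  const lower = name.toLowerCase();
  const words = name.trim().split(/\s+/);

  if (words.some(w => STANDALONE_WORDS.has(w.toLowerCase()))) return "blocklisted-word";
  if (SUFFIXES.some(s => lower.endsWith(s))) return "legal-suffix";
  if (LOCATION_WORDS.some(w => lower.endsWith(w.toLowerCase()))) return "location-suffix";
  if (words.some(w => /\d|[@\/\\()]/.test(w))) return "digit-or-symbol";
  if (words.length < 2 || words.length > 5) return "word-count";
  if (!words.every(w => /^[\p{Lu}]/u.test(w))) return "capitalisation";
  if (/^[\p{Lu}\s]+$/u.test(name) && name.length > 4) return "all-caps";

  return null;
}

console.log(explainRejection("Anna Svensson"));    // null (accepted)
console.log(explainRejection("Kontakt Oss"));      // "blocklisted-word"
console.log(explainRejection("Svensson Bygg AB")); // "legal-suffix"
console.log(explainRejection("Regionchef Syd"));   // "location-suffix"
console.log(explainRejection("anna svensson"));    // "capitalisation"
```

Because the blocklist checks run before the structural ones, a single-word candidate such as "Kontakt" is reported as a blocklisted word and not as a word-count failure.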

Blocklists

The blocklists live in src/enrichment/config.ts (~790 lines). They are the result of 29 autoresearch rounds: every false positive that got through had an entry added.
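One property follows directly from how isValidPersonName() compares against these lists: INVALID_NAME_STANDALONE_WORDS and INVALID_SUFFIXES are matched against a lowercased candidate, so an entry containing an uppercase letter can never match. A small check along these lines could catch such entries when the lists are edited. This is a sketch only; findUnmatchableEntries and the sample arrays are hypothetical and not part of config.ts.

```typescript
// Returns entries that can never match, because the validator lowercases
// the candidate before comparing it against the list.
function findUnmatchableEntries(entries: readonly string[]): string[] {
  return entries.filter(e => e !== e.toLowerCase());
}

// Hypothetical sample data, not the real lists.
const sampleStandaloneWords = ["kontakt", "Cookies", "podcast"];
const sampleSuffixes = [" ab", " GmbH", " förening"];

console.log(findUnmatchableEntries(sampleStandaloneWords)); // ["Cookies"]
console.log(findUnmatchableEntries(sampleSuffixes));        // [" GmbH"]
```

LOCATION_TERMINATING_WORDS does not need this check, since the function lowercases each of its entries at comparison time.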

Known gap

Warning

The production config (extraction-v7) still has a false-positive rate (FPR) of 21.4%: UI phrases such as “OSS FÖR”, “LÅT OSS”, “Send Message”, “Plesk Setup Guide”, “Not Found”, and “Log In” slip through. The best FPR achieved so far is 0% (quick-test) and 2.6% (jsonld-v2). The fix is to batch additions to INVALID_NAME_STANDALONE_WORDS. This is listed as P1 in Known Issues.
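A batch of additions could be prepared roughly as follows. The sketch assumes that FPR means the share of known non-names the validator still accepts, and it models only the standalone-word step, not the full function. All helper names and the starting blocklist are hypothetical.

```typescript
// Simplified stand-in for the standalone-word step only; the real validator
// applies further checks.
function passesWordBlocklist(name: string, blocklist: ReadonlySet<string>): boolean {
  return !name.trim().split(/\s+/).some(w => blocklist.has(w.toLowerCase()));
}

// Share of known non-names that the check still accepts (assumed FPR definition).
function falsePositiveRate(nonNames: string[], blocklist: ReadonlySet<string>): number {
  if (nonNames.length === 0) return 0;
  const accepted = nonNames.filter(n => passesWordBlocklist(n, blocklist)).length;
  return accepted / nonNames.length;
}

// Collects the lowercased words of observed false positives that are not
// blocklisted yet, sorted for a stable diff.
function proposeAdditions(falsePositives: string[], blocklist: ReadonlySet<string>): string[] {
  const proposed = new Set<string>();
  for (const phrase of falsePositives) {
    for (const word of phrase.trim().split(/\s+/)) {
      const lower = word.toLowerCase();
      if (!blocklist.has(lower)) proposed.add(lower);
    }
  }
  return Array.from(proposed).sort();
}

const observed = ["Send Message", "Plesk Setup Guide", "Not Found", "Log In"];
const before = new Set<string>(["kontakt", "om", "cookies"]); // hypothetical subset
const additions = proposeAdditions(observed, before);
const after = new Set<string>([...Array.from(before), ...additions]);

console.log(additions);
// ["found", "guide", "in", "log", "message", "not", "plesk", "send", "setup"]
console.log(falsePositiveRate(observed, before)); // 1
console.log(falsePositiveRate(observed, after));  // 0
```

The proposed words are candidates for review, not a list to merge blindly: the standalone-word check rejects any name containing a blocklisted word, so one distinctive word per phrase (for example "plesk" or "message") is usually enough and carries less risk of rejecting real names.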

See also

Name Reversal, Blocklists, Crawlee Scraper, Lead Scoring, Known Issues.