The Problem
Family history is a data problem disguised as a hobby. The raw material is century-old handwritten civil registers, typewritten certificates, scanned photographs, newspaper clippings, census images, and online profiles of wildly varying reliability — heterogeneous, unstandardized, and mostly trapped on paper or behind archive viewers.
The challenge isn't storing names. It's reading documents that range from clean typescript to 19th-century cursive in a foreign bureaucratic hand; aligning each fact to the correct person when names repeat across generations and records disagree; and tracking provenance — which record supports which claim, at what confidence — so conflicting sources can be adjudicated instead of silently overwritten.
Do all of that across ~1,400 pages and 800+ people without the work collapsing into an unmaintainable pile of notes, and prove the pipeline is reliable enough that its output can be trusted.