Case Study — Healthcare Forms Platform

How a routine data flow mapping exercise surfaced 8 categories of Protected Health Information in a live analytics pipeline.

FormDR processes 6.6M clinical events per week. HIPAA compliance was assumed but never audited end-to-end. One data flow mapping exercise changed everything.

Stack: PostHog
8 PHI categories found in the analytics pipeline
6.6M clinical events per week flowing through unscrubbed surfaces
4-layer PHI scrubber: ingestion sanitizer, allowlist, masking, weekly scan
6 months with zero PHI leaks post-remediation; passed external HIPAA audit

The setup.

FormDR is a healthcare forms platform that processes clinical intake, questionnaires, consent documents, and patient communications. The platform handles 6.6M clinical events per week across 12 major practice deployments, supporting everything from routine check-in forms to mental health screening instruments.

PostHog was deployed as the analytics layer — chosen for its HIPAA-compliant cloud offering with a signed BAA in place. The configuration was standard: autocapture enabled on most surfaces, session recording active for UX analysis, GeoIP enrichment on by default. Everything the documentation said should work for a BAA-covered deployment.

The team had never audited what the analytics pipeline was actually ingesting. HIPAA compliance was assumed by contract, but no end-to-end data flow mapping had ever been done. The BAA covered PostHog as a processor. It did not cover what the client’s own SDK configuration was sending into the pipeline.

The Assumption Gap
  • BAA signed with PostHog — covered the processor, not the shipper
  • Autocapture running on patient-facing clinical form surfaces
  • No end-to-end data flow mapping ever performed
  • Session replay and console log capture enabled on clinical surfaces
  • 6.6M events/week flowing through an unfiltered ingestion pipeline

What we found.

A systematic audit of PostHog event properties, autocapture output, SDK configuration, session recording metadata, and tracked URL patterns identified 8 distinct categories of PHI exposure in the live analytics pipeline. Each finding was classified against 45 CFR 164.514(b)(2) Safe Harbor de-identification standards.

Category 1
Clinical Questionnaire Responses
Autocapture was recording element text from clinical form fields, including responses to PHQ-9 depression screening, GAD-7 anxiety assessment, and suicide risk evaluation instruments. The text values of radio button groups and dropdown selections on mental health forms were flowing directly into PostHog event properties.
Found in: autocapture element text on patient-facing clinical forms
Category 2
Patient Demographics
Sex, race, ethnicity, marital status, language preference, insurance relationship, and family relationship data were being captured by autocapture on intake form surfaces. These fields are not among the 18 identifiers enumerated under HIPAA Safe Harbor, but once attached to an identifiable patient’s event stream they are individually identifiable health information. The autocapture configuration had no way to distinguish clinical form surfaces from marketing pages.
Found in: autocapture element text on patient intake forms
Category 3
Medical Record Numbers in URL Parameters
Sequential integer patient record identifiers and encounter IDs were embedded in the URL parameters of tracked pageview and submission events. These were present in every event generated from patient record and clinical workflow pages. Medical record numbers are an explicit HIPAA identifier category under Safe Harbor and cannot be treated as anonymised.
Found in: tracked URL parameters on patient record pages
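
As a minimal sketch of how this class of exposure can be closed, the function below rewrites a URL before it is attached to an event. The parameter names (`patient_id`, `encounter_id`, `mrn`) and the example path shape are hypothetical; the case study does not state FormDR's actual URL scheme.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names, chosen for illustration only.
SENSITIVE_PARAMS = {"patient_id", "encounter_id", "mrn"}
NUMERIC_SEGMENT = re.compile(r"^\d+$")


def scrub_url(url: str) -> str:
    """Replace numeric path segments and ID-like query values with placeholders.

    The URL fragment is dropped entirely, since it can also carry identifiers.
    """
    parts = urlsplit(url)
    path = "/".join(
        ":id" if NUMERIC_SEGMENT.match(segment) else segment
        for segment in parts.path.split("/")
    )
    query = urlencode(
        [
            (key, "REDACTED" if key.lower() in SENSITIVE_PARAMS or value.isdigit() else value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
        ]
    )
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))
```

Replacing identifiers with a fixed placeholder keeps page-level funnels intact, because every patient record page collapses to the same route.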
Category 4
Patient Contact Identifiers in Custom Events
Backend-originated custom events were emitting patient email addresses, phone numbers, and mailing addresses as event property values. Boolean indicators revealing whether SSN, email, phone, or address data had been provided by the patient were also present. The combination of direct identifiers with data-presence flags created a clear PHI exposure path.
Found in: backend custom event properties
Category 5
GeoIP Resolution at Full Fidelity
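All events carried PostHog’s default GeoIP enrichment at city, state, postal code, and latitude/longitude resolution. HIPAA Safe Harbor requires removing all geographic subdivisions smaller than a state; the only exception is the first three digits of a ZIP code, and only where that three-digit area contains more than 20,000 people. City, postal code, and coordinate values on patient-associated events are all finer than that limit, and in sparsely populated areas they can single out individuals.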
Found in: default GeoIP enrichment on all events
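
One way to coarsen location data is to drop every GeoIP property finer than state level. The sketch below assumes property names that follow PostHog's `$geoip_*` convention; verify the exact names against the events in your own project. Where this filter runs depends on where enrichment happens in your pipeline, which the case study does not specify.

```python
# Assumed property names modeled on the `$geoip_*` convention; verify locally.
ALLOWED_GEO_KEYS = {"$geoip_country_code", "$geoip_subdivision_1_code"}


def coarsen_geoip(properties: dict) -> dict:
    """Return a copy of `properties` without any `$geoip_*` key finer than state level."""
    return {
        key: value
        for key, value in properties.items()
        if not key.startswith("$geoip_") or key in ALLOWED_GEO_KEYS
    }
```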
Category 6
Referrer Headers Leaking Source Context
HTTP referrer headers from patient portal access points were being captured in event properties, revealing the specific patient portal URL and inbound path. This created a linkage between the patient’s clinical session and the analytics event stream — information that could associate a specific patient cohort with a specific practice or clinical pathway.
Found in: HTTP referrer header values in event properties
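
A referrer can be reduced to a coarse category before it is stored. This is a sketch under stated assumptions: the portal domain suffix is hypothetical, and the three output labels are illustrative rather than taken from the FormDR remediation.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical suffix shared by all patient portal hosts.
PORTAL_SUFFIXES = (".portal.example.com",)


def coarsen_referrer(referrer: Optional[str]) -> str:
    """Map a raw referrer URL to a category that does not name a practice or path."""
    if not referrer:
        return "direct"
    host = (urlsplit(referrer).hostname or "").lower()
    if host.endswith(PORTAL_SUFFIXES):
        # Collapse every practice-specific portal host into one label.
        return "patient_portal"
    return host or "unknown"
```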
Category 7
Console Errors Containing PHI
Console log capture was active on clinical surfaces. Browser console errors generated during form interactions were being recorded in session replay metadata. These errors sometimes contained patient data values passed in JavaScript stack traces or network error responses. Most analytics audits never look at console logs, so this surface is easy to miss.
Found in: session replay console log metadata
Category 8
Anti-Pattern Tracking of Hidden Fields
Several forms had hidden input fields containing patient record context data that were being captured by autocapture alongside visible fields. These fields were not intended for analytics; they were implementation artifacts from form templates. Autocapture did not distinguish hidden from visible inputs in the DOM, so hidden PHI fields were captured exactly like visible non-PHI fields.
Found in: autocapture of hidden DOM input fields on forms

The fix: a 4-layer PHI scrubber.

Rather than disable analytics on clinical surfaces entirely — which would have removed the team’s ability to measure product engagement — we designed a layered defence that scrubs PHI at four distinct points in the ingestion pipeline.

Layer 1 — Ingestion-Time Regex Sanitizer
A server-side middleware layer runs in the backend ingestion pipeline. Regex patterns match and redact known PHI formats: phone numbers, email addresses, SSNs, medical record numbers, and ZIP+4 codes. The sanitizer runs before any event reaches the PostHog API, so PHI is blocked at the pipeline boundary rather than removed retroactively.
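
A minimal sketch of this kind of sanitizer is shown below. The patterns are illustrative and deliberately simple; the MRN format in particular is hypothetical, since the case study does not publish the production patterns. Real deployments would need patterns tuned to their own data and tested for false negatives.

```python
import re

# Illustrative patterns only. Order matters: narrower formats run before the phone pattern.
PHI_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "zip4": re.compile(r"\b\d{5}-\d{4}\b"),
    "mrn": re.compile(r"\bMRN[-:\s]?\d{6,10}\b", re.IGNORECASE),  # hypothetical MRN format
    "phone": re.compile(
        r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
    ),
}


def sanitize_value(value):
    """Recursively redact PHI-shaped substrings in strings, dicts, and lists."""
    if isinstance(value, str):
        for label, pattern in PHI_PATTERNS.items():
            value = pattern.sub(f"[REDACTED:{label}]", value)
        return value
    if isinstance(value, dict):
        return {key: sanitize_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [sanitize_value(item) for item in value]
    return value  # numbers, booleans, and None pass through unchanged


def sanitize_event(event: dict) -> dict:
    """Return a copy of the event with sanitized properties, ready to forward."""
    return {**event, "properties": sanitize_value(event.get("properties", {}))}
```

Regex redaction only catches values with a recognisable shape, which is why it is paired with an allowlist rather than used alone.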
Layer 2 — Event Property Allowlisting
A strict allowlist of permitted event property names replaced the original ad-hoc blocklist approach. Only properties explicitly reviewed and approved for analytics were allowed through. Any event property not on the allowlist was dropped at ingestion. This eliminated autocapture surface exposure without disabling autocapture — it just filtered what autocapture could contribute.
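
The allowlist step can be sketched as a filter that keeps only approved property names and reports what it dropped. The names in `ALLOWED_PROPERTIES` are hypothetical placeholders, not the list approved for FormDR.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical allowlist; a real one comes out of a property-by-property review.
ALLOWED_PROPERTIES: FrozenSet[str] = frozenset(
    {"form_type", "step_index", "duration_ms", "$browser"}
)


def apply_allowlist(
    event: Dict, allowed: FrozenSet[str] = ALLOWED_PROPERTIES
) -> Tuple[Dict, List[str]]:
    """Keep only allowlisted properties; also return the sorted names that were dropped."""
    properties = event.get("properties", {})
    kept = {name: value for name, value in properties.items() if name in allowed}
    dropped = sorted(name for name in properties if name not in allowed)
    return {**event, "properties": kept}, dropped
```

Returning the dropped names makes it possible to log them, so a newly introduced property is noticed and reviewed instead of vanishing silently.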
Layer 3 — Session Replay Input Masking
PostHog’s session replay input masking was configured at the field level across all clinical form surfaces using CSS selectors. Every input, textarea, select, and hidden field on patient-facing forms was masked. The configuration was validated by replaying sample session recordings and confirming that no unmasked clinical data was visible in replay.
Layer 4 — Weekly PHI Scan Job
An automated scan job runs weekly against the PostHog event database, sampling recent events for any new PHI patterns. The scan checks for: new property names not on the allowlist, unrecognised URL parameter patterns, unmasked session recording elements, and referrer URL patterns from clinical surfaces. Any finding generates an alert to the engineering team with the specific event ID and property value that triggered it. The scan tooling was open-sourced.
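
The checks in the scan can be sketched as a pure function over a sample of events. This is not the published tool: the `uuid` field, the allowlist, and the clinical host suffix are assumptions made for illustration, and fetching the sample from PostHog is left out. Unlike the alert described above, this sketch records the property name rather than its value, so the finding itself does not copy PHI.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List
from urllib.parse import parse_qsl, urlsplit

ALLOWED_PROPERTIES = {"$current_url", "$referrer", "form_type", "step_index"}  # hypothetical
CLINICAL_HOST_SUFFIX = ".portal.example.com"  # hypothetical
ID_LIKE = re.compile(r"^\d{4,}$")  # long all-digit values look like record identifiers


@dataclass(frozen=True)
class Finding:
    event_id: str
    rule: str
    detail: str


def scan_event(event: Dict) -> List[Finding]:
    """Run the allowlist, URL parameter, and referrer checks against one event."""
    findings = []
    event_id = str(event.get("uuid", "unknown"))  # assumed ID field name
    properties = event.get("properties", {})

    for name in sorted(set(properties) - ALLOWED_PROPERTIES):
        findings.append(Finding(event_id, "unlisted_property", name))

    url = properties.get("$current_url") or ""
    for key, value in parse_qsl(urlsplit(url).query):
        if ID_LIKE.match(value):
            findings.append(Finding(event_id, "id_like_url_param", key))

    referrer_host = urlsplit(properties.get("$referrer") or "").hostname or ""
    if referrer_host.endswith(CLINICAL_HOST_SUFFIX):
        findings.append(Finding(event_id, "clinical_referrer", referrer_host))

    return findings


def scan_events(events: Iterable[Dict]) -> List[Finding]:
    """Scan a sample of events and return every finding, in event order."""
    return [finding for event in events for finding in scan_event(event)]
```

An empty result from `scan_events` corresponds to a clean weekly run; any non-empty result would be routed to the engineering team's alerting channel.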

The result.

0 PHI leaks detected in 6 months post-remediation; the weekly scan has produced zero findings
Pass: external HIPAA audit result, with no findings related to the analytics pipeline
4-layer defence: ingestion sanitizer, property allowlist, replay masking, weekly scan
8 PHI exposure categories closed with documented remediation per finding
1 open-source PHI scan tool published, reusable for any PostHog deployment
100% analytics retention: the product team kept visibility without exposing PHI

What you can do now.

Your compliance team has a complete audit trail: 8 PHI exposure categories documented with the exact source location, data type, and regulatory classification for each. Every remediation action is traceable to the specific finding that triggered it. The weekly scan job provides ongoing verification — not a one-time snapshot but continuous monitoring.

Your product team kept full analytics visibility. The 4-layer scrubber didn’t disable analytics — it filtered out PHI while preserving all non-PHI event data and session recordings. Engagement metrics, feature adoption, funnel analysis, and cohort retention all remained intact. No data loss. No blind spots.

The scan tooling is open-source and reusable. Any PostHog deployment with HIPAA, GDPR, or SOC 2 requirements can adopt the same weekly scan approach. The patterns and allowlist methodology are documented and transferable — not tied to the specific SDK configuration that triggered the audit.

Jake McMahon
ProductQuant

10 years building analytics and growth systems for B2B SaaS at $1M–$50M ARR. BSc Behavioural Psychology, MSc Data Science. HIPAA analytics compliance is not about the BAA — it’s about what your SDK configuration actually sends before the BAA-covered processor touches it. The gap between assumed compliance and actual data flow is usually where the exposure lives.

What this looks like for your company

Analytics PHI Audit.

A structured end-to-end data flow audit of your analytics pipeline — identifying every PHI exposure point before a compliance review or breach notification finds them for you.

  • End-to-end data flow mapping: SDK configuration, autocapture surfaces, event properties, URL patterns, session recording settings, and referrer exposure
  • Live event property sampling across all tracked surfaces to determine what is actually being collected vs. assumed
  • PHI classification with regulatory citation for each finding
  • 4-layer scrubber design: ingestion sanitizer, property allowlist, replay masking, and automated scan job
  • Post-fix validation methodology with weekly scan automation
$3,497 · 10 days
Right for you if
  • Running PostHog or any analytics tool on healthcare or regulated data surfaces
  • Signed a BAA but never audited what your SDK sends before the BAA-covered processor
  • Autocapture or session recording enabled on surfaces that might handle PHI

Do you know what’s in your analytics events?

A 15-minute call is enough to know whether what we do is relevant to where you are. No pitch. Just a conversation about your specific situation.