PHI Exposure Audit: 8 PHI Types Found in 6.6M Clinical Events/Week

The setup.

FormDR is a healthcare forms platform that processes clinical intake, questionnaires, consent documents, and patient communications. The platform handles 6.6M clinical events per week across 12 major practice deployments, supporting everything from routine check-in forms to mental health screening instruments.

PostHog was deployed as the analytics layer — chosen for its HIPAA-compliant cloud offering with a signed BAA in place. The configuration was standard: autocapture enabled on most surfaces, session recording active for UX analysis, GeoIP enrichment on by default. Everything the documentation said should work for a BAA-covered deployment.

The team had never audited what the analytics pipeline was actually ingesting. HIPAA compliance was assumed by contract, but no end-to-end data flow mapping had ever been done. The BAA covered PostHog as a processor. It did not cover what the client’s own SDK configuration was sending into the pipeline.

The Assumption Gap

BAA signed with PostHog — covered the processor, not the shipper
Autocapture running on patient-facing clinical form surfaces
No end-to-end data flow mapping ever performed
Session replay and console log capture enabled on clinical surfaces
6.6M events/week flowing through untouched ingestion pipeline

What we found.

A systematic audit of PostHog event properties, autocapture output, SDK configuration, session recording metadata, and tracked URL patterns identified 8 distinct categories of PHI exposure in the live analytics pipeline. Each finding was classified against 45 CFR 164.514(b)(2) Safe Harbor de-identification standards.

Category 1

Clinical Questionnaire Responses

Autocapture was recording element text from clinical form fields, including responses to PHQ-9 depression screening, GAD-7 anxiety assessment, and suicide risk evaluation instruments. The text values of radio button groups and dropdown selections on mental health forms were flowing directly into PostHog event properties.

Found in: autocapture element text on patient-facing clinical forms

Category 2

Patient Demographics

Sex, race, ethnicity, marital status, language preference, insurance relationship, and family relationship data were being captured by autocapture on intake form surfaces. None of these fields is one of the 18 identifiers enumerated under HIPAA Safe Harbor, but each is PHI when it can be linked to an identifiable patient, as it can in an event stream that also carries record numbers and location data. The platform had no way to distinguish clinical form surfaces from marketing pages in the autocapture configuration.

Found in: autocapture element text on patient intake forms

Category 3

Medical Record Numbers in URL Parameters

Sequential integer patient record identifiers and encounter IDs were embedded in the URL parameters of tracked pageview and submission events. These were present in every event generated from patient record and clinical workflow pages. Medical record numbers are an explicit HIPAA identifier category under Safe Harbor and cannot be treated as anonymised.

Found in: tracked URL parameters on patient record pages
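
One way to neutralise this finding is to rewrite URLs before they are attached to an event. The sketch below is illustrative rather than the audit's actual code: the parameter names in `SENSITIVE_PARAMS` and the `:id` placeholder are assumptions, and it treats every all-digit path segment as a potential record identifier.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; the real ones depend on the application's routes.
SENSITIVE_PARAMS = {"patient_id", "encounter_id", "mrn"}
# Any all-digit path segment is treated as a potential record identifier.
NUMERIC_SEGMENT = re.compile(r"^\d+$")


def scrub_url(url: str) -> str:
    """Return the URL with identifier-like path segments and query values redacted."""
    parts = urlsplit(url)
    path = "/".join(
        ":id" if NUMERIC_SEGMENT.match(segment) else segment
        for segment in parts.path.split("/")
    )
    query = urlencode(
        [
            (key, "REDACTED" if key.lower() in SENSITIVE_PARAMS else value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
        ]
    )
    # The fragment is dropped entirely, since it can also carry identifiers.
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


print(scrub_url("https://app.example.com/patients/48213/forms?encounter_id=99107&tab=intake"))
```

Replacing the segment with a placeholder instead of deleting it keeps page-level funnels intact, because every patient record page collapses to the same route.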

Category 4

Patient Contact Identifiers in Custom Events

Backend-originated custom events were emitting patient email addresses, phone numbers, and mailing addresses as event property values. Boolean indicators revealing whether SSN, email, phone, or address data had been provided by the patient were also present. The combination of direct identifiers with data-presence flags created a clear PHI exposure path.

Found in: backend custom event properties

Category 5

GeoIP Resolution at Full Fidelity

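All events carried PostHog’s default GeoIP enrichment at city, state, postal code, and lat/long resolution. HIPAA Safe Harbor requires removing every geographic subdivision smaller than a state, with one exception: the first three digits of a ZIP code may be kept when the area they cover contains more than 20,000 people. City, full postal code, and coordinate values are all finer than that, and city-level resolution on patient-associated events could identify individuals in sparsely populated areas.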

Found in: default GeoIP enrichment on all events
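
A minimal sketch of coarsening location data to state level follows. It assumes the filter runs at a point where the enriched properties are already present, such as an export or a post-enrichment transformation; if enrichment happens inside the analytics service after ingestion, the filter has to run there, or enrichment has to be switched off instead. The property names follow the `$geoip_` prefix convention and should be checked against the properties on your own events.

```python
# Property names are assumptions based on the "$geoip_" prefix convention;
# verify them against real event data before relying on this list.
GEOIP_PREFIX = "$geoip_"
KEEP_GEOIP = {"$geoip_country_code", "$geoip_subdivision_1_code"}


def coarsen_geoip(properties: dict) -> dict:
    """Drop every GeoIP property finer than state level, leaving other keys alone."""
    return {
        key: value
        for key, value in properties.items()
        if not key.startswith(GEOIP_PREFIX) or key in KEEP_GEOIP
    }


sample = {
    "$geoip_city_name": "Bozeman",
    "$geoip_postal_code": "59715",
    "$geoip_latitude": 45.68,
    "$geoip_longitude": -111.04,
    "$geoip_subdivision_1_code": "MT",
    "$geoip_country_code": "US",
    "form_type": "intake",
}
print(coarsen_geoip(sample))
```

Keeping an explicit set of permitted keys, rather than listing the ones to drop, means any new location property added upstream is removed by default.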

Category 6

Referrer Headers Leaking Source Context

HTTP referrer headers from patient portal access points were being captured in event properties, revealing the specific patient portal URL and inbound path. This created a linkage between the patient’s clinical session and the analytics event stream — information that could associate a specific patient cohort with a specific practice or clinical pathway.

Found in: HTTP referrer header values in event properties

Category 7

Console Errors Containing PHI

Console log capture was active on clinical surfaces. Browser console errors generated during form interactions were being recorded in session replay metadata. These errors sometimes contained patient data values passed in JavaScript stack traces or network error responses. Console logs are overlooked in most analytics audits.

Found in: session replay console log metadata

Category 8

Anti-Pattern Tracking of Hidden Fields

Several forms had hidden input fields containing patient record context data that were being captured by autocapture alongside visible fields. These fields were never intended for analytics; they were implementation artifacts from form templates. The autocapture setup in use did not distinguish hidden from visible fields in the DOM, so hidden PHI fields were treated the same as visible non-PHI fields.

Found in: autocapture of hidden DOM input fields on forms

The fix: a 4-layer PHI scrubber.

Rather than disable analytics on clinical surfaces entirely — which would have removed the team’s ability to measure product engagement — we designed a layered defence that scrubs PHI at four distinct points in the ingestion pipeline.

Layer 1 — Ingestion-Time Regex Sanitizer

A server-side middleware layer running in the backend ingestion pipeline. Regular expressions matched and redacted known PHI formats: phone numbers, email addresses, SSNs, medical record numbers, and ZIP+4 codes. The sanitizer ran before any event reached the PostHog API, so PHI was blocked at the pipeline boundary rather than removed retroactively.
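
The shape of such a sanitizer can be sketched as below. This is an assumption-laden illustration, not the production middleware: the patterns are deliberately simple, US-centric, and will miss variants, and the `MRN` pattern is hypothetical because record number formats differ between systems.

```python
import re

# Illustrative patterns only. Real MRN formats vary by EHR; the "MRN" entry
# below is a hypothetical example and must be replaced per deployment.
PHI_PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"(?<!\d)(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}(?!\d)")),
    ("ZIP4", re.compile(r"\b\d{5}-\d{4}\b")),
    ("MRN", re.compile(r"\bMRN[:# ]?\d{6,10}\b", re.IGNORECASE)),
]


def redact_text(text: str) -> str:
    """Replace every match of every pattern with a bracketed label."""
    for label, pattern in PHI_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


def sanitize(value):
    """Recursively redact string values inside dicts and lists."""
    if isinstance(value, str):
        return redact_text(value)
    if isinstance(value, dict):
        return {key: sanitize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    return value  # numbers, booleans, and None pass through unchanged


event = {"note": "Call 555-867-5309 or jane@example.com", "count": 3}
print(sanitize(event))
```

Regex matching only catches values with a recognisable format. Free-text answers and enumerated selections have none, which is why a format-based layer like this needs a second control behind it.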

Layer 2 — Event Property Allowlisting

A strict allowlist of permitted event property names replaced the original ad-hoc blocklist approach. Only properties explicitly reviewed and approved for analytics were allowed through. Any event property not on the allowlist was dropped at ingestion. This eliminated autocapture surface exposure without disabling autocapture — it just filtered what autocapture could contribute.
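
As a rough sketch of the allowlist step, assuming events are plain dictionaries with a `properties` key: the names in `ALLOWED_PROPERTIES` are hypothetical stand-ins for whatever a real review process approves.

```python
# Hypothetical approved names; the real list comes out of a property review.
ALLOWED_PROPERTIES = frozenset({"form_type", "step_index", "completed", "duration_ms"})


def apply_allowlist(event: dict, allowed=ALLOWED_PROPERTIES):
    """Return (filtered_event, dropped_names) without mutating the input event."""
    properties = event.get("properties", {})
    kept = {name: value for name, value in properties.items() if name in allowed}
    dropped = sorted(set(properties) - set(kept))
    return {**event, "properties": kept}, dropped


event = {
    "event": "form_step_completed",
    "properties": {"form_type": "intake", "step_index": 2, "patient_email": "jane@example.com"},
}
filtered, dropped = apply_allowlist(event)
print(filtered["properties"], dropped)
```

Returning the dropped names alongside the filtered event makes it possible to log what was removed, which helps when deciding whether a new property should be reviewed and added.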

Layer 3 — Session Replay Input Masking

PostHog’s session replay input masking was configured at the field level across all clinical form surfaces using CSS selectors. Every input, textarea, select, and hidden field on patient-facing forms was masked. The configuration was validated by replaying sample session recordings and confirming that no unmasked clinical data was visible in replay.

Layer 4 — Weekly PHI Scan Job

An automated scan job runs weekly against the PostHog event database, sampling recent events for any new PHI patterns. The scan checks for: new property names not on the allowlist, unrecognised URL parameter patterns, unmasked session recording elements, and referrer URL patterns from clinical surfaces. Any finding generates an alert to the engineering team with the specific event ID and property value that triggered it. The scan tooling was open-sourced.
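
The core of a scan like this can be sketched in a few lines. This is not the open-sourced tooling itself; it assumes events have already been sampled into dictionaries with `uuid` and `properties` keys, and the allowlist and patterns are placeholders. The sketch reports the property name rather than echoing the raw value, so that alerts do not become another copy of the PHI.

```python
import re
from dataclasses import dataclass

# Hypothetical allowlist and value patterns, for illustration only.
ALLOWED = frozenset({"form_type", "step_index"})
VALUE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass(frozen=True)
class Finding:
    event_id: str
    property_name: str
    reason: str


def scan_events(events):
    """Flag unapproved property names and string values that look like PHI."""
    findings = []
    for event in events:
        event_id = str(event.get("uuid", "unknown"))
        for name, value in event.get("properties", {}).items():
            if name not in ALLOWED:
                findings.append(Finding(event_id, name, "property not on allowlist"))
            if isinstance(value, str):
                for label, pattern in VALUE_PATTERNS.items():
                    if pattern.search(value):
                        findings.append(Finding(event_id, name, f"value matches {label} pattern"))
    return findings


sampled = [
    {"uuid": "e1", "properties": {"form_type": "intake", "step_index": 2}},
    {"uuid": "e2", "properties": {"form_type": "intake", "contact": "jane@example.com"}},
]
for finding in scan_events(sampled):
    print(finding)
```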

What you can do now.

Your compliance team has a complete audit trail: 8 PHI exposure categories documented with the exact source location, data type, and regulatory classification for each. Every remediation action is traceable to the specific finding that triggered it. The weekly scan job provides ongoing verification — not a one-time snapshot but continuous monitoring.

Your product team kept full analytics visibility. The 4-layer scrubber didn’t disable analytics; it filtered out PHI while preserving approved non-PHI event data and masked session recordings. Engagement metrics, feature adoption, funnel analysis, and cohort retention all remained intact. No data loss. No blind spots.

The scan tooling is open-source and reusable. Any PostHog deployment with HIPAA, GDPR, or SOC 2 requirements can adopt the same weekly scan approach. The patterns and allowlist methodology are documented and transferable — not tied to the specific SDK configuration that triggered the audit.

What this looks like for your company

Analytics PHI Audit.

A structured end-to-end data flow audit of your analytics pipeline — identifying every PHI exposure point before a compliance review or breach notification finds them for you.

End-to-end data flow mapping: SDK configuration, autocapture surfaces, event properties, URL patterns, session recording settings, and referrer exposure
Live event property sampling across all tracked surfaces to determine what is actually being collected vs. assumed
PHI classification with regulatory citation for each finding
4-layer scrubber design: ingestion sanitizer, property allowlist, replay masking, and automated scan job
Post-fix validation methodology with weekly scan automation

$3,497 · 10 days

Right for you if

Running PostHog or any analytics tool on healthcare or regulated data surfaces
Signed a BAA but never audited what your SDK sends to the BAA-covered processor
Autocapture or session recording enabled on surfaces that might handle PHI

How a routine data flow mapping exercise surfaced 8 categories of Protected Health Information in a live analytics pipeline.

Do you know what’s in your analytics events?