PostHog PII and PHI Exposure: Where Personal Data Hides and How to Audit It

The short version

The distinct_id is the highest-leverage exposure point: if it is an email address, username, or phone number, every event in your PostHog project carries PII
Autocapture records element text from the DOM, which can include form field values, clinical questionnaire responses, and demographic data on patient-facing surfaces
GeoIP resolves to city and coordinate level by default, which is finer-grained than HIPAA’s Safe Harbor standard permits. The raw IP ($ip) is also stored and is personal data under GDPR
URLs in pageview events can contain record IDs, email addresses, JWT tokens, and OAuth codes depending on your URL structure and auth flow
Authentication tokens (JWTs, session tokens) in tracked URLs are a dual risk: identity exposure and credential exfiltration via analytics
Session recording captures keystrokes and form interactions unless explicitly masked per surface
Console log capture records JS console output — which often contains user objects, API error responses, and state dumps with PII that was never intended for production logging
Backend SDK events bypass front-end review and commonly carry direct identifiers passed via raw objects
This list is not exhaustive. PostHog maintains detailed privacy documentation; review it alongside any compliance audit

The problem with “we’re not collecting PII”

Most teams that end up with PII or Protected Health Information (PHI) in their PostHog instance didn’t put it there deliberately. The exposure usually comes from:

A default feature that was never reviewed against the data it was running on
An SDK integration that carried more properties than the engineer realised
A URL structure that embeds identifiers in tracked page paths
An identity call that set a user property with a real name or email address

The gap is rarely between “we wanted to collect PHI” and “we didn’t.” It’s between “we checked the events we built” and “we checked everything the SDK was doing.” Those are different audits.

For HIPAA-covered entities, this distinction matters under 45 CFR §164.514(b)(2) — the Safe Harbor de-identification standard. Data that appears anonymised at the event level may still be PHI if it carries geographic data below the state level, dates alongside medical context, or account numbers that function as patient identifiers.

For GDPR purposes, the threshold is any data that is or could be used to identify a natural person — which is a broader category than HIPAA’s enumerated identifier list.

The checklist below covers the most common PostHog exposure points. It is based on common patterns found in analytics audits, supplemented by PostHog’s own privacy documentation. It is not a substitute for a full compliance review.

1. Autocapture

Autocapture is PostHog’s feature for automatically tracking user interactions — clicks, form interactions, input changes, and page events — without requiring manual event instrumentation. It is enabled by default.

The risk depends entirely on which pages it is running on and what content those pages display. On a marketing site or logged-out onboarding flow, autocapture is largely harmless. On a patient intake form, a clinical questionnaire, or any page where users type or select personal information, autocapture captures that content as element text in the event payload.

High-risk surfaces for autocapture

Patient-facing form pages (intake, demographics, clinical screening)
Any form that collects health, financial, or identity information
Account settings pages where users update name, address, or contact details
Checkout and billing pages where card details or addresses are entered
Internal admin tools where patient or user records are displayed

What to check

Is autocapture left at its default (enabled) or explicitly set to true in your PostHog initialisation config?
Are there any patient-facing, clinical, or form-heavy pages in your app where autocapture is not explicitly disabled?
Have you sampled live autocapture events from these surfaces and reviewed the $elements property to see what element text is being captured?

How to address it

PostHog provides two main mechanisms. The first is to set autocapture: false globally and enable it selectively only on surfaces you have reviewed. The second is to restrict autocapture by URL, using the allowlist and ignorelist options in the autocapture config (url_allowlist and url_ignorelist in recent versions of the JS SDK; confirm the names for your version). For healthcare and financial applications, the global-off approach is generally safer: you instrument what you need explicitly rather than excluding what you don’t want.

PostHog also provides the ph-no-capture CSS class for marking individual elements to exclude from autocapture, which is useful for surgical exclusions within a page that is otherwise safe to track.

Review PostHog’s official autocapture documentation for the current configuration options, as the implementation has evolved across SDK versions.
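The sampling step in the checklist above can be partly automated. The following is a minimal Python sketch, assuming you have exported autocapture events as dictionaries with a properties["$elements"] list whose items may carry a "$el_text" field. That export shape and the function name find_pii_in_element_text are illustrative assumptions, and the two regular expressions only catch obvious emails and phone-like strings.

```python
import re

# Deliberately simple patterns: they flag obvious cases, not every form of PII.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def find_pii_in_element_text(events):
    """Return (event_index, matched_text) pairs for element text that looks like PII."""
    hits = []
    for index, event in enumerate(events):
        # Assumed export shape: properties -> $elements -> list of element dicts.
        elements = event.get("properties", {}).get("$elements", [])
        for element in elements:
            text = element.get("$el_text") or ""
            for pattern in (EMAIL_RE, PHONE_RE):
                for match in pattern.findall(text):
                    hits.append((index, match))
    return hits


sample_events = [
    {"properties": {"$elements": [{"tag_name": "button", "$el_text": "Save"}]}},
    {"properties": {"$elements": [{"tag_name": "span", "$el_text": "Contact: jane@example.com"}]}},
]
hits = find_pii_in_element_text(sample_events)
```

A clean result from a scan like this does not prove a surface is safe, since free-text clinical answers will not match any pattern. Treat it as a way to prioritise manual review.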

2. GeoIP enrichment and raw IP address

PostHog automatically enriches all events with two types of location data. The first is the raw IP address, stored in the $ip property. The second is derived geographic data — country, region, city, postal code, and latitude/longitude coordinates — resolved from that IP via GeoIP lookup.

Both matter for compliance, but for different reasons.

Raw IP address ($ip)

Under GDPR, an IP address is personal data. It can identify a natural person, either directly or in combination with other information held by a controller or processor. PostHog stores the IP in the $ip event property by default. If you are processing events from EU-based users without a lawful basis covering IP retention, this is a GDPR exposure regardless of whether you use GeoIP enrichment.

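You can prevent PostHog from capturing the IP by setting ip: false in the JS SDK init config, or by stripping $ip via PostHog’s property filter before ingestion. PostHog Cloud also has a project-level setting for discarding client IP data. Support for these options varies by SDK version, so confirm which one applies to your setup. On self-hosted instances, IP processing can be disabled at the ingestion pipeline level.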

GeoIP enrichment

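For most SaaS applications, city-level location data is acceptable. For healthcare applications, it is not. Under HIPAA’s Safe Harbor de-identification standard (45 CFR §164.514(b)(2)(i)(B)), all geographic subdivisions smaller than a state must be removed, including street address, city, county, and ZIP code. The only exception is the first three digits of a ZIP code, which may be kept when the area covered by all ZIP codes sharing those three digits contains more than 20,000 people; where it contains 20,000 or fewer, the three digits must be replaced with 000.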

Full coordinates, city names, and postal codes on patient-associated events fail this threshold by definition. This applies to every event from patient-facing surfaces — not just form submission events.
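The ZIP code rule is easier to see as code. This is a minimal Python sketch of the Safe Harbor generalisation step, assuming you supply the set of three-digit prefixes whose combined population is 20,000 or fewer from current Census data. The function name generalise_zip and the placeholder prefix used below are illustrative, not real population data.

```python
def generalise_zip(zip_code, low_population_prefixes):
    """Reduce a ZIP code to the three-digit form Safe Harbor permits.

    low_population_prefixes is a caller-supplied set of three-digit prefixes
    covering 20,000 people or fewer; those are replaced with "000".
    """
    prefix = zip_code.strip()[:3]
    # Anything that is not three leading digits is suppressed rather than guessed at.
    if len(prefix) < 3 or not prefix.isdigit():
        return "000"
    if prefix in low_population_prefixes:
        return "000"
    return prefix


# "999" is a placeholder standing in for a real low-population prefix.
placeholder_low_population = {"999"}
kept = generalise_zip("90210", placeholder_low_population)
suppressed = generalise_zip("99901", placeholder_low_population)
```

City names and coordinates have no equivalent reduced form under this rule, so for those properties the only compliant option is to drop them.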

What to check

Is the $ip property present on events from EU-based users? Do you have a lawful basis for retaining it?
Are GeoIP properties ($geoip_city_name, $geoip_postal_code, $geoip_latitude, $geoip_longitude) present on events from patient-facing or covered surfaces?
Does your BAA with PostHog (if applicable) cover GeoIP data processing?

How to address it

To suppress IP capture: set ip: false in the JS SDK, or use PostHog’s property filter to drop $ip at ingestion. Note that person_profiles: 'identified_only' limits person profile creation to identified users and does not by itself stop IP capture. To suppress GeoIP: use the property filter to drop $geoip_* fields, or disable GeoIP enrichment on self-hosted instances at the ingestion stage. Neither change retroactively removes data already ingested.
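The filtering described above amounts to dropping a small set of keys from each event. Here is a minimal Python sketch of that logic, useful for checking an export or for a proxy that cleans events before forwarding them. It is not the property filter’s own implementation, and the function name strip_location_properties is an illustrative assumption.

```python
def strip_location_properties(properties):
    """Return a copy of an event's properties without $ip or any $geoip_* key."""
    return {
        key: value
        for key, value in properties.items()
        if key != "$ip" and not key.startswith("$geoip_")
    }


raw_properties = {
    "$ip": "203.0.113.7",
    "$geoip_city_name": "Springfield",
    "$geoip_postal_code": "12345",
    "$current_url": "https://app.example.com/dashboard",
}
cleaned = strip_location_properties(raw_properties)
```

Running the same function over a sample of stored events and comparing key counts before and after is a quick way to confirm whether location data is still arriving after a configuration change.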

3. Record identifiers and PII in tracked URLs

PostHog automatically captures the current URL in pageview, pageleave, and other events. The $current_url property is present on most events by default.

If your application URL structure embeds user identifiers, record IDs, or email addresses in the path or query string, those will appear in PostHog event data across every pageview event.

Common patterns to check:

/patients/12345/records: numeric patient IDs in path segments
/appointments/67890: appointment or encounter IDs that map to patient records
/users/user@example.com: email addresses embedded in URL paths
/<path>?email=user@example.com: email addresses in query parameters

Under HIPAA Safe Harbor, medical record numbers, account numbers, and any other unique identifying number or code are enumerated identifiers, so they cannot appear in de-identified data. Sequential integers in application URLs that map to patient records should be treated as falling into one of these categories.

What to check

Query your PostHog instance for distinct URL patterns in $current_url values across recent pageview events
Identify any path segments that contain numeric IDs or identifying strings
Determine whether those IDs map to patient or user records
Check query parameters for email addresses, names, or any PII passed in the URL

How to address it

PostHog supports URL sanitisation via the sanitize_properties configuration option in the JavaScript SDK, allowing you to modify or strip specific properties before they are sent (newer SDK versions also provide a before_send hook for the same purpose; check which your version supports). The better long-term fix is URL architecture: using opaque identifiers (random UUIDs with no external meaning) rather than sequential integers in application URLs, so captured URLs cannot be read or enumerated as record numbers.
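The sanitisation logic itself is small. The sketch below is written in Python so it can double as an audit helper over exported $current_url values; the same steps would need porting into whichever SDK hook you use. The function name sanitize_url and the :id and :email placeholders are illustrative assumptions, and percent-encoded emails (with %40) are not handled.

```python
import re
from urllib.parse import urlsplit, urlunsplit

EMAIL_SEGMENT_RE = re.compile(r"[^/@\s]+@[^/@\s]+\.[^/@\s]+")
NUMERIC_ID_RE = re.compile(r"\d+")


def sanitize_url(url):
    """Replace ID-like and email-like path segments; drop the query string and fragment."""
    parts = urlsplit(url)
    cleaned_segments = []
    for segment in parts.path.split("/"):
        if NUMERIC_ID_RE.fullmatch(segment):
            cleaned_segments.append(":id")
        elif EMAIL_SEGMENT_RE.fullmatch(segment):
            cleaned_segments.append(":email")
        else:
            cleaned_segments.append(segment)
    # Query and fragment are removed entirely, which is blunt but safe by default.
    return urlunsplit((parts.scheme, parts.netloc, "/".join(cleaned_segments), "", ""))


record_url = sanitize_url("https://app.example.com/patients/12345/records?email=a@b.co")
email_url = sanitize_url("https://app.example.com/users/jane@example.com")
```

Dropping the whole query string loses legitimate parameters such as UTM tags. If you need those, switch to an allowlist of known-safe parameter names rather than trying to enumerate the unsafe ones.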

4. Authentication tokens in the analytics pipeline

JSON Web Tokens (JWTs) and session tokens appearing in PostHog event data are a distinct exposure category — different in kind from a patient ID in a URL, and often missed in standard privacy reviews because they don’t look like PII on the surface.

The risk is twofold. First, a JWT typically carries a signed payload containing user identity claims: user ID, email, role, organisation, and sometimes additional profile data. If a JWT appears in PostHog event properties, that payload is effectively exposing user identity through your analytics pipeline. Second, a valid authentication token in an analytics system is a credential exfiltration risk — not just a de-identification failure. Anyone with read access to PostHog data could, in principle, extract tokens that were valid at the time of capture.

Where tokens appear in PostHog events

In $current_url: URLs with ?token=eyJ... or /auth/callback?code=... are captured on every pageview
In $referrer: If a token-bearing URL is the referrer for the next pageview, it appears in the referrer property
In custom event properties: Backend SDK calls that include a token in the event payload — often by passing a full request object
In $set person properties: Auth context passed to posthog.identify() that inadvertently includes a token field

Tokens in URLs are the most common source. Many authentication flows pass tokens as query parameters during the OAuth callback, magic link confirmation, or password reset steps. If PostHog is initialised before those redirects complete, the token-bearing URL is captured as a pageview.

What to check

Query PostHog for $current_url values containing token=, code=, eyJ (JWT prefix), or access_token=
Check $referrer values for the same patterns — a token in a redirected-from URL appears here
Search your backend PostHog SDK calls for any properties derived from request headers, authorization context, or session objects
Inspect posthog.identify() calls for any property that looks like a token or bearer credential

How to address it

The primary fix is URL sanitisation: use the sanitize_properties config option to strip or redact token-bearing query parameters before PostHog captures the URL. For OAuth flows, ensure PostHog does not initialise until after the token exchange is complete and the URL has been cleaned. For backend SDK calls, enforce an explicit event schema that does not include any field derived from auth context. Never pass request objects or session objects directly as event properties.
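Both the detection patterns and the redaction step can be sketched in a few lines. This Python example assumes a hypothetical set of sensitive parameter names (SENSITIVE_PARAMS) that you would adapt to your own auth flow, and treats any value shaped like a JWT as sensitive regardless of its parameter name. It is an illustration for auditing exported URLs, not a PostHog API.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameter names; extend this to match your own auth flow.
SENSITIVE_PARAMS = {"token", "code", "access_token", "id_token", "refresh_token"}
# Three base64url segments starting with "eyJ", the usual shape of a JWT.
JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")


def contains_token(url):
    """Return True if the URL has a sensitive parameter or a JWT-shaped value."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs += parse_qsl(parts.fragment, keep_blank_values=True)
    keys = {key.lower() for key, _ in pairs}
    return bool(keys & SENSITIVE_PARAMS) or bool(JWT_RE.search(url))


def redact_tokens(url):
    """Replace sensitive query values with REDACTED and drop the fragment."""
    parts = urlsplit(url)
    query = [
        (key, "REDACTED" if key.lower() in SENSITIVE_PARAMS or JWT_RE.search(value) else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


callback = "https://app.example.com/auth/callback?code=abc123&state=xyz"
redacted_callback = redact_tokens(callback)
```

Apply the same check to $referrer values. Any token found this way should also be treated as compromised and rotated or expired, since redaction only affects future events.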

5. Session recording and heatmaps

Session recording captures a replay of a user’s interactions: DOM state, mouse movements, scroll position, and keystrokes. It is PostHog’s most powerful UX research tool and its highest-risk feature from a PII perspective.

Session recording on a form page that processes personal data can capture everything the user types and everything the page displays: names, dates of birth, addresses, health responses, card numbers. If the recording is from a patient-facing surface, that content is PHI. Do not assume PostHog masks this for you: input masking defaults have varied across SDK versions and do not cover text rendered outside input elements, so masking must be explicitly configured and verified.

Heatmaps aggregate click and scroll patterns at the page level. They do not typically capture form field content as text, but the heatmap infrastructure processes the page DOM during data collection. On pages with PHI content, this warrants the same scrutiny as session recording.

Configuration flags to check

disable_session_recording — must be true on any surface processing PHI or sensitive PII
maskAllInputs — set to true as a baseline if session recording runs on any authenticated surface
maskInputOptions — for surgical masking of specific input types (passwords, email, etc.)
blockSelector — use to block entire DOM subtrees (e.g., a patient info panel) from recording
enable_heatmaps — review which pages this is active on

What to check

Is session recording enabled (disable_session_recording: false)? On which pages?
Have you reviewed session recordings from any authenticated or form-heavy pages to verify what is actually captured?
Is maskAllInputs set globally, or are individual inputs explicitly masked?
Are heatmaps enabled on pages where patient or user data is displayed?

How to address it

For any surface that processes PHI or sensitive PII, disable session recording using posthog.stopSessionRecording() called conditionally based on the current path, or set disable_session_recording: true in the init config and enable it selectively only on surfaces you have reviewed. If session recording runs on authenticated surfaces, set maskAllInputs: true as a baseline minimum and add blockSelector rules for any component that displays patient or user records.

6. Console log capture

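Console log capture is a PostHog feature that records JavaScript console.log, console.warn, and console.error output and attaches it to session recordings. It is enabled via the capture_console_log option (the exact name differs between the project settings and the JS SDK init config, where it has appeared as enable_recording_console_log, so confirm it for your SDK version) and appears as log entries within the session recording timeline in PostHog.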

This matters because many applications log user-context data to the console during development — and those console statements are never removed before production. Common patterns:

Debug logging that prints user objects: console.log('User loaded:', user) — where user contains email, name, and ID
Error logging that includes request context: API errors that echo back user IDs, email addresses, or patient identifiers from the failed request
State management logging (Redux DevTools, Zustand debug mode) that prints full application state including any loaded user or patient data
Third-party library logs that include session tokens, API keys, or auth context

The risk is easy to underestimate because console log content is not reviewed during a standard analytics audit — it requires inspecting session recordings, not event properties. In practice, console logs can carry highly sensitive data that was never intended to be visible outside of local development.

What to check

Is capture_console_log: true (or capture_console_log_opt_in: true) in your PostHog init config?
Open a session recording for an authenticated user and review the console log panel. What is actually being logged?
Search your codebase for console.log calls that include user objects, API responses, or any variable that could contain PII

How to address it

Disable console log capture by setting capture_console_log: false in the PostHog init config. This is generally the right default for any production environment that processes personal data. If console logs are genuinely needed for debugging in production, consider a structured logging approach that sanitises PII before output — but this is a broader application hygiene question that extends beyond PostHog.
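The codebase search from the checklist can be roughed out as a script. This Python sketch flags console calls whose arguments, after quoted string literals are removed, reference names that suggest user or auth data. The list of suspect name fragments and the function name find_risky_console_calls are assumptions to adapt, and a line-based regex will miss multi-line calls.

```python
import re

CONSOLE_CALL_RE = re.compile(r"console\.(log|warn|error|info|debug)\s*\((.*)\)")
# Single- and double-quoted literals are removed so message text is not flagged.
STRING_LITERAL_RE = re.compile(r"'[^']*'|\"[^\"]*\"")
# Assumed name fragments that hint at user, auth, or state data.
SUSPECT_RE = re.compile(r"user|patient|email|token|session|response|state", re.IGNORECASE)


def find_risky_console_calls(source):
    """Return (line_number, line) pairs for console calls that pass suspect variables."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        call = CONSOLE_CALL_RE.search(line)
        if not call:
            continue
        arguments = STRING_LITERAL_RE.sub("", call.group(2))
        if SUSPECT_RE.search(arguments):
            findings.append((line_number, line.strip()))
    return findings


sample_source = "\n".join([
    "console.log('User loaded:', user);",
    "console.log('ready');",
    "console.error('Request failed', apiResponse);",
])
findings = find_risky_console_calls(sample_source)
```

Template literals in backticks are deliberately left in place, so an interpolated value such as user.email inside one is still flagged.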

7. Person properties and $identify calls

When you call posthog.identify() in the JavaScript SDK, you are setting a distinct ID for the current user and optionally setting person properties. Person properties persist on the PostHog person profile indefinitely — they are not event-level data, they are durable user-level attributes.

If a $set call includes a user’s name, email address, phone number, or any other PII, that data lives on the person profile in PostHog until it is explicitly deleted. This is separate from the event data — and separate from what gets cleaned up when you archive or delete events.

Common PII in person properties

email — set directly from authentication context
name or $name — full name from user profile
phone — from account settings or signup flow
company — organisation name that could identify a small-practice user
Custom properties from staff login events (e.g., staff_email, practitioner_name)

For GDPR, person profiles containing name and email constitute personal data requiring a lawful basis for processing, subject access request obligations, and deletion on request. PostHog provides a person deletion endpoint, but the obligation to honour it is yours.

For HIPAA, person properties linking staff identity to patient interaction events can create an association that constitutes PHI — particularly if the person profile also contains event history from patient-facing surfaces.

What to check

Search your codebase for all calls to posthog.identify() and posthog.people.set()
List every property being set in those calls
Determine whether any of those properties contain PII directly, or a key that maps to PII in your database
Review whether staff identity is being set on the same person profile as events from patient-facing surfaces

How to address it

Replace direct PII identifiers with internal hashed or opaque IDs. Instead of identifying users by email, use a random opaque ID stored in your own database, or a keyed hash (for example HMAC-SHA-256 with a secret held outside PostHog) of the internal user ID. A plain unsalted hash of an email address or sequential ID can be reversed by hashing candidate values, so it is not sufficient. Do not set name, email, or phone as person properties unless you have a specific product reason and have assessed the data protection implications.

8. Backend and server-side events

PostHog provides server-side SDKs (Python, Node.js, Ruby, Go, PHP, and others) for capturing events from your backend. Backend events are often overlooked in privacy audits because they don’t go through the browser SDK and are harder to discover by inspecting network traffic in DevTools.

Common patterns that introduce PII via backend events:

User lifecycle events (user_created, account_activated) that pass the raw user object as event properties
Form submission or processing events that carry the submission payload
Webhook processing events that echo back the incoming payload (which may contain customer data)
Email or notification events that include recipient addresses as properties
Billing events that include customer name, billing address, or last-four card digits

The most common pattern: an engineer adds a contact_created event and passes the contact record object directly as properties. The object contains every field in the database row — including name, email, phone, address, and any identifiers. It’s the path of least resistance in the moment and creates a compliance problem that is easy to miss.

What to check

Search your backend codebases for all PostHog SDK calls
For each server-side event, list the properties being passed
Check whether any property is an object or array that may expand to contain more fields than intended
Review whether user or contact IDs are direct identifiers (e.g., sequential integers, email addresses) or opaque internal references

How to address it

Define an explicit event schema for each backend event and enforce it at the call site — pass only the properties in the schema, not the full data object. Replace direct identifiers (email, name) with hashed or opaque internal IDs. If you need to correlate backend events to frontend events, use a consistent internal ID rather than a PII-containing identifier as the distinct_id.
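Enforcing a schema at the call site can be as simple as an allowlist per event name. The following Python sketch assumes a hypothetical EVENT_SCHEMAS registry and a helper named build_event_properties; the event and property names are illustrative. The returned dictionary is what you would hand to your server-side SDK’s capture call in place of the raw record.

```python
# Hypothetical registry: each event name maps to the only properties it may carry.
EVENT_SCHEMAS = {
    "contact_created": {"plan", "source", "has_phone"},
}

# Nested objects and lists are rejected because they can hide extra fields.
SCALAR_TYPES = (str, int, float, bool, type(None))


def build_event_properties(event_name, record):
    """Return only the allowlisted scalar properties for this event."""
    allowed = EVENT_SCHEMAS.get(event_name)
    if allowed is None:
        raise ValueError(f"No schema registered for event {event_name!r}")
    return {
        key: record[key]
        for key in sorted(allowed)
        if key in record and isinstance(record[key], SCALAR_TYPES)
    }


contact_record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "plan": "pro",
    "source": "import",
    "has_phone": True,
    "address": {"city": "Springfield"},
}
safe_properties = build_event_properties("contact_created", contact_record)
```

Raising on an unregistered event name is a deliberate choice: a new event then has to go through schema review before it can send anything.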

9. The distinct_id: your universal event identifier

Every event in PostHog carries a distinct_id, the identifier that links events to a person profile. Before identification it is an anonymous ID generated by the SDK; once you call posthog.identify(), it is the value you pass, and it persists across sessions until reset.

This is the single highest-leverage exposure point in the entire post, because it affects every event in your PostHog project — not just specific event types or surfaces. If your distinct_id is a PII-containing value, then 100% of your PostHog data has PII attached to it by definition.

Common distinct_id patterns that introduce PII:

Email address — the most common mistake. PostHog even uses email as the default suggested distinct_id in some documentation examples. Any email used as a distinct_id links identity to every event, feature flag call, and session recording in your project.
Username or display name — directly identifying and often searchable
Sequential integer database ID — not PII on its own, but if it maps directly to a patient or user record and appears alongside demographic event data, it functions as an identifier under re-identification risk analysis
Phone number — occasionally used as a user identifier in consumer apps, carries the same risk as email

The problem compounds because the distinct_id also appears in feature flag evaluation events (auto-emitted), group events, session recording metadata, and all backend SDK events. A PII-containing distinct_id that is set in the browser propagates to every part of your PostHog data.

What to check

What value are you passing to posthog.identify()? Search your codebase for every call.
Inspect person profiles in PostHog — the ID shown at the top of each person is the distinct_id. Are those recognisable as email addresses, names, or phone numbers?
Is the same distinct_id used consistently across frontend and backend SDK calls?
If you have anonymous pre-identification events (before identify() is called), what happens when those events are merged into the identified profile? Does the merged profile now carry PII?

How to address it

Use an internal opaque ID, such as a random UUID or a keyed hash of your internal user ID, as the PostHog distinct_id. This ID should be meaningless outside your own system: it cannot be reverse-engineered to an email address or matched to a patient record without access to your internal mapping table or hashing key. Never use email, username, or phone as the distinct_id. If your current setup uses PII as the distinct_id, migrating involves linking the old ID to a new opaque ID via posthog.alias(), which preserves event history under one person while the new identifier is used going forward. Aliasing does not remove the old identifier: events already ingested still carry the PII-containing distinct_id, so they need the historical remediation described later in this article.
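One way to derive such an ID is a keyed hash. This is a minimal Python sketch using HMAC-SHA-256 from the standard library, assuming the secret key is held in your own secret manager and never sent to PostHog. The function name opaque_distinct_id and the example key are illustrative.

```python
import hashlib
import hmac


def opaque_distinct_id(internal_user_id, secret_key):
    """Derive a stable, non-enumerable identifier from an internal user ID.

    secret_key must be bytes. Without it, hashing candidate IDs does not
    reproduce the output, which is what a plain unsalted hash fails to provide.
    """
    message = str(internal_user_id).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Placeholder key for illustration only; load the real one from a secret store.
example_key = b"replace-with-a-secret-from-your-secret-manager"
distinct_id = opaque_distinct_id(12345, example_key)
```

The same function should run wherever a distinct_id is produced, frontend session bootstrap and backend alike, so both sides agree on the identifier. Rotating the key changes every ID, so plan rotation as a migration rather than a routine operation.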

10. Feature flags and group analytics

Feature flag evaluation events ($feature_flag_called) are emitted automatically when PostHog evaluates a flag for a user. These events inherit whatever distinct_id is in use — which is why the distinct_id section above is the primary fix for feature flag privacy exposure. There is no separate flag-specific remediation if the underlying identifier is already opaque.

Group analytics allows you to associate events with a group such as an organisation, account, or practice. Group properties persist on the group profile and can include any data you set — company name, plan tier, contact name, billing email. If your groups are small enough that a group is individually identifiable (a solo-practitioner medical practice, for example), the group profile itself constitutes an identifiable entity.

What to check

Is your distinct_id opaque? (Covered above — but feature flag events make this more urgent since they fire automatically)
What properties are set on group profiles? List them and assess whether any are PII
Are any groups small enough that the group name or ID could identify a specific individual?
Is a contact email, owner name, or billing address stored as a group property?

How to address it

Apply the same opaque ID principle to group identifiers — use an internal group ID rather than a company name or email domain as the group key. For group properties, only set what is necessary for product analytics (plan, account type, created date) and avoid setting contact-level details that constitute PII.

The audit checklist

Use this as a starting point for an internal review. It is not exhaustive — PostHog’s privacy documentation and your specific application architecture will surface additional items.

Area	What to check	Risk if unchecked
distinct_id	What value is passed to identify()? Is it opaque or PII-containing?	High — affects every event in the project
Autocapture	Which surfaces is it running on? Sample $elements on events from those surfaces	High on clinical/form surfaces
Raw IP ($ip)	Is $ip present on events from EU users? Do you have a lawful basis?	High for GDPR
GeoIP enrichment	Check $geoip_* properties on events from covered surfaces	High for HIPAA; Medium for GDPR
URL record IDs & PII	Query distinct $current_url values; check for numeric IDs and emails in paths	High if patient record IDs present
Auth tokens in analytics	Query $current_url and $referrer for token=, eyJ, code= patterns	High — credential + identity exposure
Session recording	Is it enabled? Which pages? Is maskAllInputs on?	High on form pages without masking
Console log capture	Check capture_console_log; review session recordings for what is actually logged	Medium–High — depends on log content
Heatmaps	Check enable_heatmaps; review whether it runs on sensitive pages	Medium
Person properties	Audit all identify() and $set calls; list every property set	High — persists indefinitely on person profile
Backend events	Grep backend codebase for PostHog SDK calls; review all properties passed	High — bypasses front-end review
Feature flags	Auto-emitted events inherit distinct_id — fix that first	Resolved by opaque distinct_id
Group analytics	List all group properties; assess whether groups are individually identifiable	Medium — depends on group size and properties

A note on BAAs, hosting, and the difference they make

PostHog offers a HIPAA Business Associate Agreement (BAA) on paid plans without requiring an enterprise contract — which distinguishes it from Mixpanel and Amplitude, where BAA access typically requires enterprise pricing. However, a BAA does not substitute for a compliant configuration. It establishes shared responsibility; it does not prevent PHI from entering the analytics pipeline in the first place.

PostHog also supports self-hosting via Docker or Kubernetes, which means all data stays within your own infrastructure. For organisations with strict data residency requirements, self-hosting removes the question of data processed by a third-party service entirely. The tradeoff is infrastructure maintenance overhead.

Whether you are on PostHog Cloud or self-hosted, the configuration audit described above applies equally. Self-hosting does not eliminate the compliance risk of misconfigured autocapture, URL tracking, or session recording — it changes where the data goes, not what gets collected.

For current BAA terms, pricing, and self-hosting documentation, refer to PostHog’s official HIPAA compliance and privacy pages directly. These terms change and the linked documentation will be more current than anything cited here.

The harder question: what about data already collected?

Fixing your PostHog configuration stops new PHI from being ingested. It does not remove PHI from events already in your PostHog project.

PostHog provides several mechanisms for historical data remediation:

Person deletion: PostHog’s API supports deleting a person profile and all associated events. For GDPR right-to-erasure requests, this is the primary path
Event property filtering: The PostHog property filter app drops specified properties from events at ingestion. It applies to new data only; it does not rewrite events already stored, so it cannot be relied on to remove historical PHI from raw storage
Project reset: If the exposure is pervasive and historical data is not required, resetting the PostHog project removes all data. This is a significant step and requires careful planning

For HIPAA, the presence of PHI in historical analytics data may require notification obligations under the Breach Notification Rule (45 CFR § 164.400–414) depending on the nature of the disclosure. This is a legal assessment, not a technical one. Engage your privacy counsel before determining whether historical data remediation is sufficient or whether breach notification obligations apply.

PostHog’s own resources

PostHog maintains detailed documentation on privacy configuration that goes well beyond what is covered here. The checklist in this article reflects common patterns found in audits — PostHog’s documentation reflects the full range of configuration options, which change across SDK versions.

Key areas to review in PostHog’s official documentation:

Autocapture configuration and the ph-no-capture CSS class
Session recording privacy controls and input masking options
HIPAA compliance guide and BAA information
GDPR compliance and data subject request handling
Self-hosting options and data residency
The PostHog property filter app for dropping properties at ingestion

The PostHog team is generally responsive on these topics and has published detailed guides on HIPAA-compliant configurations. Use those guides alongside this checklist, not instead of it.

