The short version
- The distinct_id is the highest-leverage exposure point: if it is an email address, username, or phone number, every event in your PostHog project carries PII
- Autocapture records element text from the DOM, which can include form field values, clinical questionnaire responses, and demographic data on patient-facing surfaces
- GeoIP resolves to city and coordinate level by default, which is finer-grained than HIPAA’s Safe Harbor standard permits. The raw IP ($ip) is also stored and is personal data under GDPR
- URLs in pageview events can contain record IDs, email addresses, JWT tokens, and OAuth codes depending on your URL structure and auth flow
- Authentication tokens (JWTs, session tokens) in tracked URLs are a dual risk: identity exposure and credential exfiltration via analytics
- Session recording captures keystrokes and form interactions unless explicitly masked per surface
- Console log capture records JS console output — which often contains user objects, API error responses, and state dumps with PII that was never intended for production logging
- Backend SDK events bypass front-end review and commonly carry direct identifiers passed via raw objects
- This list is not exhaustive. PostHog maintains detailed privacy documentation; review it alongside any compliance audit
The problem with “we’re not collecting PII”
Most teams that end up with PII or Protected Health Information (PHI) in their PostHog instance didn’t put it there deliberately. The exposure usually comes from:
- A default feature that was never reviewed against the data it was running on
- An SDK integration that carried more properties than the engineer realised
- A URL structure that embeds identifiers in tracked page paths
- An identity call that set a user property with a real name or email address
The gap is rarely between “we wanted to collect PHI” and “we didn’t.” It’s between “we checked the events we built” and “we checked everything the SDK was doing.” Those are different audits.
For HIPAA-covered entities, this distinction matters under 45 CFR §164.514(b)(2) — the Safe Harbor de-identification standard. Data that appears anonymised at the event level may still be PHI if it carries geographic data below the state level, dates alongside medical context, or account numbers that function as patient identifiers.
For GDPR purposes, the threshold is any data that is or could be used to identify a natural person — which is a broader category than HIPAA’s enumerated identifier list.
The checklist below covers the most common PostHog exposure points. It is based on common patterns found in analytics audits, supplemented by PostHog’s own privacy documentation. It is not a substitute for a full compliance review.
1. Autocapture
Autocapture is PostHog’s feature for automatically tracking user interactions — clicks, form interactions, input changes, and page events — without requiring manual event instrumentation. It is enabled by default.
The risk depends entirely on which pages it runs on and what those pages display. On a marketing site or logged-out onboarding flow, autocapture is largely harmless. On a patient intake form, a clinical questionnaire, or any page where users type or select personal information, autocapture can record that content as element text in the event payload.
High-risk surfaces for autocapture
- Patient-facing form pages (intake, demographics, clinical screening)
- Any form that collects health, financial, or identity information
- Account settings pages where users update name, address, or contact details
- Checkout and billing pages where card details or addresses are entered
- Internal admin tools where patient or user records are displayed
What to check
- Is autocapture left at its default (enabled) or explicitly set to true in your PostHog initialisation config?
- Are there any patient-facing, clinical, or form-heavy pages in your app where autocapture is not explicitly disabled?
- Have you sampled live autocapture events from these surfaces and reviewed the $elements property to see what element text is being captured?
How to address it
PostHog provides two main mechanisms. The first is to set autocapture: false globally and enable it selectively only on surfaces you have reviewed. The second is to use PostHog’s URL allowlist or blocklist to exclude specific paths from autocapture. For healthcare and financial applications, the global-off approach is generally safer — you instrument what you need explicitly rather than excluding what you don’t want.
PostHog also provides the ph-no-capture CSS class for marking individual elements to exclude from autocapture, which is useful for surgical exclusions within a page that is otherwise safe to track.
Review PostHog’s official autocapture documentation for the current configuration options, as the implementation has evolved across SDK versions.
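As a minimal sketch of the global-off approach, the helper below turns autocapture on only for path prefixes that have already been reviewed. The `REVIEWED_PREFIXES` list and both function names are hypothetical; only the `autocapture` option itself comes from the text above.

```javascript
// Hypothetical list of path prefixes reviewed and found free of PII/PHI.
const REVIEWED_PREFIXES = ["/pricing", "/blog"];

// True only for a reviewed prefix itself or a path nested beneath it.
function isReviewedPath(pathname) {
  return REVIEWED_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(prefix + "/")
  );
}

// Build the init options: autocapture stays off unless the surface was reviewed.
function buildInitOptions(pathname) {
  return { autocapture: isReviewedPath(pathname) };
}

// In the browser this would be handed to the SDK, for example:
// posthog.init("<project key>", buildInitOptions(window.location.pathname));
```

This decision is made once at initialisation, so a single-page app that navigates from a reviewed route to a sensitive one without a reload would likely still need the path-based exclusions or the ph-no-capture class described above.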
2. GeoIP enrichment and raw IP address
PostHog automatically enriches all events with two types of location data. The first is the raw IP address, stored in the $ip property. The second is derived geographic data — country, region, city, postal code, and latitude/longitude coordinates — resolved from that IP via GeoIP lookup.
Both matter for compliance, but for different reasons.
Raw IP address ($ip)
Under GDPR, an IP address is personal data. It can identify a natural person, either directly or in combination with other information held by a controller or processor. PostHog stores the IP in the $ip event property by default. If you are processing events from EU-based users without a lawful basis covering IP retention, this is a GDPR exposure regardless of whether you use GeoIP enrichment.
You can prevent PostHog from capturing the IP by setting ip: false in the JS SDK init config, or by stripping $ip via PostHog’s property filter before ingestion. On self-hosted instances, IP processing can be disabled at the ingestion pipeline level.
GeoIP enrichment
For most SaaS applications, city-level location data is acceptable. For healthcare applications, it is not. Under HIPAA’s Safe Harbor de-identification standard (45 CFR §164.514(b)(2)(i)(B)), all geographic subdivisions smaller than a state must be removed. The only exception is the first three digits of a ZIP code, and only when the area formed by all ZIP codes sharing those three digits contains more than 20,000 people; otherwise the three digits must be replaced with 000.
Full coordinates, city names, and postal codes on patient-associated events fail this threshold by definition. This applies to every event from patient-facing surfaces — not just form submission events.
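The ZIP code rule can be worked through in a few lines. This is an illustrative sketch rather than a compliance tool: `safeHarborZip` is a hypothetical helper, and the set of populous three-digit prefixes is an assumed input that would have to come from current census data.

```javascript
// populousPrefixes: assumed input, the three-digit ZIP prefixes whose combined
// area contains more than 20,000 people.
function safeHarborZip(zip, populousPrefixes) {
  const text = String(zip).trim();
  // Anything that is not a 5-digit or ZIP+4 code is suppressed outright.
  if (!/^\d{5}(-\d{4})?$/.test(text)) return "000";
  const prefix = text.slice(0, 3);
  // Keep the prefix only when its area is populous enough; otherwise use 000.
  return populousPrefixes.has(prefix) ? prefix : "000";
}

const examplePrefixes = new Set(["941"]); // illustrative value only
console.log(safeHarborZip("94107", examplePrefixes));
console.log(safeHarborZip("03601", examplePrefixes));
```

City names and coordinates have no equivalent reduced form under this rule, so for those properties the only option is to drop them.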
What to check
- Is the $ip property present on events from EU-based users? Do you have a lawful basis for retaining it?
- Are GeoIP properties ($geoip_city_name, $geoip_postal_code, $geoip_latitude, $geoip_longitude) present on events from patient-facing or covered surfaces?
- Does your BAA with PostHog (if applicable) cover GeoIP data processing?
How to address it
To suppress IP capture: set ip: false in the JS SDK, or use PostHog’s property filter to drop $ip at ingestion. Setting person_profiles: 'identified_only' limits person profile creation to identified users but does not by itself stop IP capture. To suppress GeoIP: use the property filter to drop $geoip_* fields, or disable GeoIP enrichment on self-hosted instances at the ingestion stage. Neither change retroactively removes data already ingested.
3. Record identifiers and PII in tracked URLs
PostHog automatically captures the current URL in pageview, pageleave, and other events. The $current_url property is present on most events by default.
If your application URL structure embeds user identifiers, record IDs, or email addresses in the path or query string, those will appear in PostHog event data across every pageview event.
Common patterns to check:
- /patients/12345/records (numeric patient IDs in path segments)
- /appointments/67890 (appointment or encounter IDs that map to patient records)
- /users/user@example.com (email addresses embedded in URL paths)
- /?email=user@example.com (email addresses in query parameters)
Under HIPAA Safe Harbor, medical record numbers and account numbers are enumerated identifiers and cannot appear in de-identified data. Sequential integers in application URLs that map to patient records can function as medical record or account numbers for this purpose, and the standard also covers any other unique identifying number or code.
What to check
- Query your PostHog instance for distinct URL patterns in $current_url values across recent pageview events
- Identify any path segments that contain numeric IDs or identifying strings
- Determine whether those IDs map to patient or user records
- Check query parameters for email addresses, names, or any PII passed in the URL
How to address it
PostHog supports URL sanitisation via the sanitize_properties configuration option in the JavaScript SDK, allowing you to modify or strip specific properties before they are sent. The better long-term fix is URL architecture: using opaque identifiers (UUIDs with no external meaning) rather than sequential integers in application URLs, so even captured URLs carry no identifying information.
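A URL sanitiser along these lines might look like the sketch below. `sanitizeUrl` and `safeDecode` are hypothetical helpers, and the placeholder strings are arbitrary; the way the result is wired into sanitize_properties (shown in the trailing comment) reflects my understanding of that option's callback shape and should be confirmed for your SDK version.

```javascript
// Decode a path segment, falling back to the raw text if it is malformed.
function safeDecode(segment) {
  try {
    return decodeURIComponent(segment);
  } catch {
    return segment;
  }
}

// Replace identifier-like path segments and drop the query string and fragment.
function sanitizeUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return "[unparseable-url]";
  }
  const path = url.pathname
    .split("/")
    .map((segment) => {
      const decoded = safeDecode(segment);
      if (/^\d+$/.test(decoded)) return ":id"; // all-digit record IDs
      if (decoded.includes("@")) return ":email"; // email-like segments
      return segment;
    })
    .join("/");
  return url.origin + path;
}

// Assumed wiring, to be verified against the SDK documentation:
// sanitize_properties: (props) => ({ ...props, $current_url: sanitizeUrl(props.$current_url) })
```

Dropping the whole query string is the conservative choice; if some parameters are needed for analysis, an explicit allowlist of parameter names is safer than trying to enumerate the sensitive ones.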
4. Authentication tokens in the analytics pipeline
JSON Web Tokens (JWTs) and session tokens appearing in PostHog event data are a distinct exposure category — different in kind from a patient ID in a URL, and often missed in standard privacy reviews because they don’t look like PII on the surface.
The risk is twofold. First, a JWT typically carries a signed payload containing user identity claims: user ID, email, role, organisation, and sometimes additional profile data. If a JWT appears in PostHog event properties, that payload is effectively exposing user identity through your analytics pipeline. Second, a valid authentication token in an analytics system is a credential exfiltration risk — not just a de-identification failure. Anyone with read access to PostHog data could, in principle, extract tokens that were valid at the time of capture.
Where tokens appear in PostHog events
- In $current_url: URLs with ?token=eyJ... or /auth/callback?code=... are captured on every pageview
- In $referrer: If a token-bearing URL is the referrer for the next pageview, it appears in the referrer property
- In custom event properties: Backend SDK calls that include a token in the event payload — often by passing a full request object
- In $set person properties: Auth context passed to posthog.identify() that inadvertently includes a token field
Tokens in URLs are the most common source. Many authentication flows pass tokens as query parameters during the OAuth callback, magic link confirmation, or password reset steps. If PostHog is initialised before those redirects complete, the token-bearing URL is captured as a pageview.
What to check
- Query PostHog for $current_url values containing token=, code=, eyJ (JWT prefix), or access_token=
- Check $referrer values for the same patterns — a token in a redirected-from URL appears here
- Search your backend PostHog SDK calls for any properties derived from request headers, authorization context, or session objects
- Inspect posthog.identify() calls for any property that looks like a token or bearer credential
How to address it
The primary fix is URL sanitisation: use the sanitize_properties config option to strip or redact token-bearing query parameters before PostHog captures the URL. For OAuth flows, ensure PostHog does not initialise until after the token exchange is complete and the URL has been cleaned. For backend SDK calls, enforce an explicit event schema that does not include any field derived from auth context. Never pass request objects or session objects directly as event properties.
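The redaction step could be sketched as follows. The parameter list, the JWT pattern, and the function names `redactTokens` and `redactUrlProperties` are illustrative assumptions, not part of the PostHog SDK; the output is shaped so it could be used from the sanitize_properties option mentioned above.

```javascript
// Hypothetical list of query parameter names that commonly carry credentials.
const SENSITIVE_PARAMS = new Set(["token", "code", "access_token", "id_token", "refresh_token"]);
// Three dot-separated base64url parts starting with "eyJ" look like a JWT.
const JWT_RE = /^eyJ[\w-]+\.[\w-]+\.[\w-]*$/;

function redactTokens(value) {
  let url;
  try {
    url = new URL(value);
  } catch {
    // Not an absolute URL: keep it unless it looks token-bearing.
    return /eyJ|token=|code=/.test(value) ? "[redacted]" : value;
  }
  for (const key of new Set(url.searchParams.keys())) {
    const values = url.searchParams.getAll(key);
    if (SENSITIVE_PARAMS.has(key.toLowerCase()) || values.some((v) => JWT_RE.test(v))) {
      url.searchParams.set(key, "REDACTED");
    }
  }
  url.hash = ""; // some auth flows return tokens in the fragment
  return url.toString();
}

// Apply the redaction to both URL-bearing properties named in this section.
function redactUrlProperties(properties) {
  const cleaned = { ...properties };
  for (const key of ["$current_url", "$referrer"]) {
    if (typeof cleaned[key] === "string") cleaned[key] = redactTokens(cleaned[key]);
  }
  return cleaned;
}
```

Redaction at capture time is a second line of defence. Delaying initialisation until the callback URL has been cleaned, as described above, avoids relying on the pattern list being complete.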
5. Session recording and heatmaps
Session recording captures a replay of a user’s interactions: DOM state, mouse movements, scroll position, and keystrokes. It is PostHog’s most powerful UX research tool and its highest-risk feature from a PII perspective.
Session recording on a form page that processes personal data captures everything the user types — names, dates of birth, addresses, health responses, card numbers. If the recording is from a patient-facing surface, that content is PHI. PostHog does not automatically mask form fields; masking must be explicitly configured.
Heatmaps aggregate click and scroll patterns at the page level. They do not typically capture form field content as text, but the heatmap infrastructure processes the page DOM during data collection. On pages with PHI content, this warrants the same scrutiny as session recording.
Configuration flags to check
- disable_session_recording — must be true on any surface processing PHI or sensitive PII
- maskAllInputs — set to true as a baseline if session recording runs on any authenticated surface
- maskInputOptions — for surgical masking of specific input types (passwords, email, etc.)
- blockSelector — use to block entire DOM subtrees (e.g., a patient info panel) from recording
- enable_heatmaps — review which pages this is active on
What to check
- Is session recording enabled (disable_session_recording: false)? On which pages?
- Have you reviewed session recordings from any authenticated or form-heavy pages to verify what is actually captured?
- Is maskAllInputs set globally, or are individual inputs explicitly masked?
- Are heatmaps enabled on pages where patient or user data is displayed?
How to address it
For any surface that processes PHI or sensitive PII, disable session recording using posthog.stopSessionRecording() called conditionally based on the current path, or set disable_session_recording: true in the init config and enable it selectively only on surfaces you have reviewed. If session recording runs on authenticated surfaces, set maskAllInputs: true as a baseline minimum and add blockSelector rules for any component that displays patient or user records.
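One way to express that policy in code is sketched below. The path prefixes, the `[data-phi]` selector, and the helper names are hypothetical. Placing maskAllInputs and blockSelector under a `session_recording` key reflects my understanding of the JS SDK's config shape and should be checked against the current documentation.

```javascript
// Hypothetical prefixes where PHI or sensitive PII is entered or displayed.
const SENSITIVE_PREFIXES = ["/patients", "/intake", "/billing", "/settings"];

function isSensitivePath(pathname) {
  return SENSITIVE_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(prefix + "/")
  );
}

// Init options: recording is off by default and inputs are masked when it runs.
const recordingInitOptions = {
  disable_session_recording: true,
  session_recording: {
    maskAllInputs: true,
    blockSelector: "[data-phi]", // hypothetical marker for record panels
  },
};

// Call on each route change with the SDK client (or a stand-in for testing).
// Returns true when recording is allowed on the given path.
function applyRecordingPolicy(client, pathname) {
  if (isSensitivePath(pathname)) {
    client.stopSessionRecording();
    return false;
  }
  client.startSessionRecording();
  return true;
}
```

Taking the client as a parameter keeps the policy testable without a browser; in the application it would be called as `applyRecordingPolicy(posthog, window.location.pathname)`.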
6. Console log capture
Console log capture is a PostHog feature that records JavaScript console.log, console.warn, and console.error output and attaches it to session recordings. It is enabled via the capture_console_log option and appears as log entries within the session recording timeline in PostHog.
This matters because many applications log user-context data to the console during development — and those console statements are never removed before production. Common patterns:
- Debug logging that prints user objects: console.log('User loaded:', user) — where user contains email, name, and ID
- Error logging that includes request context: API errors that echo back user IDs, email addresses, or patient identifiers from the failed request
- State management logging (Redux DevTools, Zustand debug mode) that prints full application state including any loaded user or patient data
- Third-party library logs that include session tokens, API keys, or auth context
The risk is easy to underestimate because console log content is not reviewed during a standard analytics audit — it requires inspecting session recordings, not event properties. In practice, console logs can carry highly sensitive data that was never intended to be visible outside of local development.
What to check
- Is capture_console_log: true (or capture_console_log_opt_in: true) in your PostHog init config?
- Open a session recording for an authenticated user and review the console log panel. What is actually being logged?
- Search your codebase for console.log calls that include user objects, API responses, or any variable that could contain PII
How to address it
Disable console log capture by setting capture_console_log: false in the PostHog init config. This is generally the right default for any production environment that processes personal data. If console logs are genuinely needed for debugging in production, consider a structured logging approach that sanitises PII before output — but this is a broader application hygiene question that extends beyond PostHog.
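If production logging has to stay, a redaction wrapper is one possible form of the structured approach. The sketch below is a hypothetical helper, not a PostHog feature, and it matches on property names only, so PII embedded in free-text strings would still pass through.

```javascript
// Property names treated as sensitive; an illustrative list, not exhaustive.
const SENSITIVE_KEYS = /^(email|name|first_?name|last_?name|phone|address|dob|ssn|token|password|authorization)$/i;

// Return a copy of the value with sensitive properties replaced.
function redactForLog(value, seen = new WeakSet()) {
  if (value === null || typeof value !== "object") return value;
  if (seen.has(value)) return "[repeated reference]"; // also guards against cycles
  seen.add(value);
  if (Array.isArray(value)) return value.map((item) => redactForLog(item, seen));
  const out = {};
  for (const [key, inner] of Object.entries(value)) {
    out[key] = SENSITIVE_KEYS.test(key) ? "[redacted]" : redactForLog(inner, seen);
  }
  return out;
}

// Usage: log the redacted copy instead of the raw object.
const user = { id: 7, email: "user@example.com", profile: { phone: "555-0100", plan: "pro" } };
console.log("User loaded:", JSON.stringify(redactForLog(user)));
```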
7. Person properties and $identify calls
When you call posthog.identify() in the JavaScript SDK, you are setting a distinct ID for the current user and optionally setting person properties. Person properties persist on the PostHog person profile indefinitely — they are not event-level data, they are durable user-level attributes.
If a $set call includes a user’s name, email address, phone number, or any other PII, that data lives on the person profile in PostHog until it is explicitly deleted. This is separate from the event data — and separate from what gets cleaned up when you archive or delete events.
Common PII in person properties
- email — set directly from authentication context
- name or $name — full name from user profile
- phone — from account settings or signup flow
- company — organisation name that could identify a small-practice user
- Custom properties from staff login events (e.g., staff_email, practitioner_name)
For GDPR, person profiles containing name and email constitute personal data requiring a lawful basis for processing, subject access request obligations, and deletion on request. PostHog provides a person deletion endpoint, but the obligation to honour it is yours.
For HIPAA, person properties linking staff identity to patient interaction events can create an association that constitutes PHI — particularly if the person profile also contains event history from patient-facing surfaces.
What to check
- Search your codebase for all calls to posthog.identify() and posthog.people.set()
- List every property being set in those calls
- Determine whether any of those properties contain PII directly, or a key that maps to PII in your database
- Review whether staff identity is being set on the same person profile as events from patient-facing surfaces
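Once the properties are listed, a rough heuristic can flag the ones that deserve a closer look. The patterns and the `findPiiProperties` name below are assumptions for illustration. The check is deliberately over-inclusive (a key such as plan_name would be flagged) and is no substitute for reading the list.

```javascript
// Key names that suggest PII; illustrative, not exhaustive.
const PII_KEY_RE = /(email|name|phone|address|dob|birth)/i;
const EMAIL_VALUE_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Treat a value as phone-like if it is made of phone characters and has 7 to 15 digits.
function looksLikePhone(text) {
  if (!/^\+?[\d\s().-]+$/.test(text)) return false;
  const digitCount = text.replace(/\D/g, "").length;
  return digitCount >= 7 && digitCount <= 15;
}

// Return the property keys whose name or value looks like PII.
function findPiiProperties(properties) {
  const flagged = [];
  for (const [key, value] of Object.entries(properties)) {
    const text = typeof value === "string" ? value.trim() : "";
    if (PII_KEY_RE.test(key) || EMAIL_VALUE_RE.test(text) || looksLikePhone(text)) {
      flagged.push(key);
    }
  }
  return flagged;
}
```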
How to address it
Replace direct PII identifiers with internal opaque IDs. Instead of identifying users by email, use a random UUID stored in your own mapping table, or a keyed hash (for example an HMAC) of the user ID, so the PostHog person profile cannot be matched to a real person without access to your internal systems. A plain unsalted hash of an email address or sequential ID is not sufficient, because anyone can hash candidate values and compare the results. Do not set name, email, or phone as person properties unless you have a specific product reason and have assessed the data protection implications.
8. Backend and server-side events
PostHog provides server-side SDKs (Python, Node.js, Ruby, Go, PHP, and others) for capturing events from your backend. Backend events are often overlooked in privacy audits because they don’t go through the browser SDK and are harder to discover by inspecting network traffic in DevTools.
Common patterns that introduce PII via backend events:
- User lifecycle events (user_created, account_activated) that pass the raw user object as event properties
- Form submission or processing events that carry the submission payload
- Webhook processing events that echo back the incoming payload (which may contain customer data)
- Email or notification events that include recipient addresses as properties
- Billing events that include customer name, billing address, or last-four card digits
The most common pattern: an engineer adds a contact_created event and passes the contact record object directly as properties. The object contains every field in the database row — including name, email, phone, address, and any identifiers. It’s the path of least resistance in the moment and creates a compliance problem that is easy to miss.
What to check
- Search your backend codebases for all PostHog SDK calls
- For each server-side event, list the properties being passed
- Check whether any property is an object or array that may expand to contain more fields than intended
- Review whether user or contact IDs are direct identifiers (e.g., sequential integers, email addresses) or opaque internal references
How to address it
Define an explicit event schema for each backend event and enforce it at the call site — pass only the properties in the schema, not the full data object. Replace direct identifiers (email, name) with hashed or opaque internal IDs. If you need to correlate backend events to frontend events, use a consistent internal ID rather than a PII-containing identifier as the distinct_id.
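Enforcing a schema at the call site can be as small as the sketch below. The event names, property names, and the `buildEventProperties` helper are hypothetical; the point is that the full record never reaches the SDK call.

```javascript
// Hypothetical schemas: the only properties each backend event may carry.
const EVENT_SCHEMAS = new Map([
  ["contact_created", ["source", "plan", "has_phone"]],
  ["user_created", ["plan", "signup_channel"]],
]);

// Copy only schema-listed primitive values from the candidate record.
function buildEventProperties(eventName, candidate) {
  const allowed = EVENT_SCHEMAS.get(eventName);
  if (!allowed) throw new Error(`No schema defined for event "${eventName}"`);
  const properties = {};
  for (const key of allowed) {
    const value = candidate[key];
    // Nested objects and arrays are rejected because they can hide extra fields.
    if (["string", "number", "boolean"].includes(typeof value)) {
      properties[key] = value;
    }
  }
  return properties;
}

// Usage: a raw database row goes in, only the reviewed fields come out.
const contactRow = { source: "import", plan: "pro", has_phone: true, email: "user@example.com", name: "A. User" };
console.log(buildEventProperties("contact_created", contactRow));
```

Throwing on an unknown event name means a new event cannot ship without someone writing down which properties it is allowed to send.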
9. The distinct_id: your universal event identifier
Every event in PostHog carries a distinct_id — the identifier that links events to a person profile. It is set when you call posthog.identify(), and it persists across sessions until reset.
This is the single highest-leverage exposure point in the entire post, because it affects every event in your PostHog project — not just specific event types or surfaces. If your distinct_id is a PII-containing value, then 100% of your PostHog data has PII attached to it by definition.
Common distinct_id patterns that introduce PII:
- Email address — the most common mistake. PostHog even uses email as the default suggested distinct_id in some documentation examples. Any email used as a distinct_id links identity to every event, feature flag call, and session recording in your project.
- Username or display name — directly identifying and often searchable
- Sequential integer database ID — not PII on its own, but if it maps directly to a patient or user record and appears alongside demographic event data, it functions as an identifier under re-identification risk analysis
- Phone number — occasionally used as a user identifier in consumer apps, carries the same risk as email
The problem compounds because the distinct_id also appears in feature flag evaluation events (auto-emitted), group events, session recording metadata, and all backend SDK events. A PII-containing distinct_id that is set in the browser propagates to every part of your PostHog data.
What to check
- What value are you passing to posthog.identify()? Search your codebase for every call.
- Inspect person profiles in PostHog — the ID shown at the top of each person is the distinct_id. Are those recognisable as email addresses, names, or phone numbers?
- Is the same distinct_id used consistently across frontend and backend SDK calls?
- If you have anonymous pre-identification events (before identify() is called), what happens when those events are merged into the identified profile? Does the merged profile now carry PII?
How to address it
Use an internal opaque ID — a non-sequential UUID or a one-way hash of your internal user ID — as the PostHog distinct_id. This ID should be meaningless outside your own system: it cannot be reverse-engineered to an email address or matched to a patient record without access to your internal mapping table. Never use email, username, or phone as the distinct_id. If your current setup uses PII as the distinct_id, migrating requires aliasing the old ID to a new opaque ID via posthog.alias() — which preserves event history while replacing the identifier going forward.
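For the hashed variant, a keyed hash is a safer reading of "one-way" than a bare digest: an unkeyed hash of an email address or a sequential integer can be reversed by hashing candidate values. The sketch below uses Node's built-in crypto module on the server; `opaqueDistinctId` is a hypothetical helper, and the key is assumed to live in your secret store and never reach the browser.

```javascript
const { createHmac } = require("node:crypto");

// Derive a stable, opaque identifier from an internal user ID.
// secretKey: assumed to be a server-side secret that is never shipped to clients.
function opaqueDistinctId(internalUserId, secretKey) {
  if (!secretKey) throw new Error("A secret key is required");
  return createHmac("sha256", secretKey)
    .update(String(internalUserId))
    .digest("hex");
}

// The same input and key always give the same ID, so frontend and backend
// events can share it once the server passes the derived value to the client.
const exampleId = opaqueDistinctId(12345, "example-key-for-illustration-only");
console.log(exampleId.length); // 64 hex characters for SHA-256
```

Rotating the key changes every derived ID, so the key has to be treated as long-lived unless you plan a migration alongside the rotation.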
10. Feature flags and group analytics
Feature flag evaluation events ($feature_flag_called) are emitted automatically when PostHog evaluates a flag for a user. These events inherit whatever distinct_id is in use — which is why the distinct_id section above is the primary fix for feature flag privacy exposure. There is no separate flag-specific remediation if the underlying identifier is already opaque.
Group analytics allows you to associate events with a group such as an organisation, account, or practice. Group properties persist on the group profile and can include any data you set — company name, plan tier, contact name, billing email. If your groups are small enough that a group is individually identifiable (a solo-practitioner medical practice, for example), the group profile itself constitutes an identifiable entity.
What to check
- Is your distinct_id opaque? (Covered above — but feature flag events make this more urgent since they fire automatically)
- What properties are set on group profiles? List them and assess whether any are PII
- Are any groups small enough that the group name or ID could identify a specific individual?
- Is a contact email, owner name, or billing address stored as a group property?
How to address it
Apply the same opaque ID principle to group identifiers — use an internal group ID rather than a company name or email domain as the group key. For group properties, only set what is necessary for product analytics (plan, account type, created date) and avoid setting contact-level details that constitute PII.
The audit checklist
Use this as a starting point for an internal review. It is not exhaustive — PostHog’s privacy documentation and your specific application architecture will surface additional items.
| Area | What to check | Risk if unchecked |
|---|---|---|
| distinct_id | What value is passed to identify()? Is it opaque or PII-containing? | High — affects every event in the project |
| Autocapture | Which surfaces is it running on? Sample $elements on events from those surfaces | High on clinical/form surfaces |
| Raw IP ($ip) | Is $ip present on events from EU users? Do you have a lawful basis? | High for GDPR |
| GeoIP enrichment | Check $geoip_* properties on events from covered surfaces | High for HIPAA; Medium for GDPR |
| URL record IDs & PII | Query distinct $current_url values; check for numeric IDs and emails in paths | High if patient record IDs present |
| Auth tokens in analytics | Query $current_url and $referrer for token=, eyJ, code= patterns | High — credential + identity exposure |
| Session recording | Is it enabled? Which pages? Is maskAllInputs on? | High on form pages without masking |
| Console log capture | Check capture_console_log; review session recordings for what is actually logged | Medium–High — depends on log content |
| Heatmaps | Check enable_heatmaps; review whether it runs on sensitive pages | Medium |
| Person properties | Audit all identify() and $set calls; list every property set | High — persists indefinitely on person profile |
| Backend events | Grep backend codebase for PostHog SDK calls; review all properties passed | High — bypasses front-end review |
| Feature flags | Auto-emitted events inherit distinct_id — fix that first | Resolved by opaque distinct_id |
| Group analytics | List all group properties; assess whether groups are individually identifiable | Medium — depends on group size and properties |
A note on BAAs, hosting, and the difference they make
PostHog offers a HIPAA Business Associate Agreement (BAA) on paid plans without requiring an enterprise contract — which distinguishes it from Mixpanel and Amplitude, where BAA access typically requires enterprise pricing. However, a BAA does not substitute for a compliant configuration. It establishes shared responsibility; it does not prevent PHI from entering the analytics pipeline in the first place.
PostHog also supports self-hosting via Docker or Kubernetes, which means all data stays within your own infrastructure. For organisations with strict data residency requirements, self-hosting removes the question of data processed by a third-party service entirely. The tradeoff is infrastructure maintenance overhead.
Whether you are on PostHog Cloud or self-hosted, the configuration audit described above applies equally. Self-hosting does not eliminate the compliance risk of misconfigured autocapture, URL tracking, or session recording — it changes where the data goes, not what gets collected.
For current BAA terms, pricing, and self-hosting documentation, refer to PostHog’s official HIPAA compliance and privacy pages directly. These terms change and the linked documentation will be more current than anything cited here.
The harder question: what about data already collected?
Fixing your PostHog configuration stops new PHI from being ingested. It does not remove PHI from events already in your PostHog project.
PostHog provides several mechanisms for historical data remediation:
- Person deletion: PostHog’s API supports deleting a person profile and all associated events. For GDPR right-to-erasure requests, this is the primary path
- Event property filtering: The PostHog property filter app drops specified properties at ingestion, so it applies to new data. Treat it as a way to stop further collection, not as a guarantee that those properties are removed from events already in raw storage
- Project reset: If the exposure is pervasive and historical data is not required, resetting the PostHog project removes all data. This is a significant step and requires careful planning
For HIPAA, the presence of PHI in historical analytics data may require notification obligations under the Breach Notification Rule (45 CFR § 164.400–414) depending on the nature of the disclosure. This is a legal assessment, not a technical one. Engage your privacy counsel before determining whether historical data remediation is sufficient or whether breach notification obligations apply.
PostHog’s own resources
PostHog maintains detailed documentation on privacy configuration that goes well beyond what is covered here. The checklist in this article reflects common patterns found in audits — PostHog’s documentation reflects the full range of configuration options, which change across SDK versions.
Key areas to review in PostHog’s official documentation:
- Autocapture configuration and the ph-no-capture CSS class
- Session recording privacy controls and input masking options
- HIPAA compliance guide and BAA information
- GDPR compliance and data subject request handling
- Self-hosting options and data residency
- The PostHog property filter app for dropping properties at ingestion
The PostHog team is generally responsive on these topics and has published detailed guides on HIPAA-compliant configurations. Use those guides alongside this checklist, not instead of it.