TL;DR
- The PostHog HogQL Data Warehouse connects Stripe and Postgres as separate data sources and lets you write JOIN queries across them using a single SQL interface
- Stripe sync brings 8 core tables into PostHog including customers, subscriptions, invoices, and events, with automatic scheduled refresh from the Stripe API every 6 hours by default
- Postgres sync imports your custom tables such as user profiles, feature usage logs, and activation records that you already store in your application database
- HogQL JOIN queries match Stripe customer emails against PostHog person data through person properties or the persons table, enabling cross-table analysis
- You can build actionable reports for LTV by feature usage, churn risk by plan type, and expansion revenue segmented by activation status
- Common pitfalls include schema mismatches on join keys, timezone drift between Stripe UTC and your Postgres timestamps, and duplicate Stripe webhook events
The Problem: Revenue and Behavior Live in Separate Silos
You have Stripe dashboards showing $47,300 in monthly recurring revenue. You have Postgres tables tracking every feature your users touched this week. You do not have a single view that connects these two datasets.
This gap costs you answers to questions like which features drive plan upgrades, whether free trial users who activate a specific workflow convert at higher rates, and which plan tier has the highest expansion revenue over 90 days. Your revenue team looks at Stripe. Your product team looks at Postgres. Nobody looks at the intersection.
The traditional fix involves building a data pipeline with Airbyte or Fivetran, loading both sources into Snowflake or BigQuery, then writing dbt models to normalize the schemas. That setup takes weeks to build and requires ongoing maintenance when any source changes its schema.
PostHog's HogQL Data Warehouse offers a different approach. You connect Stripe and Postgres as linked data sources inside PostHog itself. Then you write standard SQL JOIN queries across them. No external data warehouse. No dbt. No pipeline orchestration. The queries run directly against the synced tables inside PostHog.
This article walks through the entire setup process using real HogQL examples from work we did with HackingHR, an HR platform that needed to connect its Stripe subscription data with Postgres user behavior to understand which product features drove revenue growth.
"Most SaaS teams sit on revenue and behavior data that never meet. The join is where the real insights live."
— Jake McMahon, ProductQuant
The sections that follow walk through each step of the setup process, from connecting your data sources to writing production-ready JOIN queries.
What You Need Before You Start
Before you write any queries, you need a working PostHog instance with the Data Warehouse feature enabled and at least 2 data sources connected. Here is what that looks like.
PostHog instance with Data Warehouse access
The Data Warehouse feature ships with every PostHog cloud and self-hosted deployment from v1.120 onward. You will find it under the "Data Warehouse" section in your PostHog settings panel. If you are on a self-hosted deployment, verify that the external data service is running by checking your docker compose or Helm chart configuration.
You need either cloud PostHog or a self-hosted instance with the data warehouse service enabled. This service handles the sync jobs that pull data from Stripe and Postgres on a scheduled basis.
Stripe API credentials
You will need a Stripe API key with read access to the objects you want to sync. Either an sk_live_ or sk_test_ key works. PostHog's Stripe connector pulls from the Stripe API using your secret key and syncs 8 core tables into the Data Warehouse.
The recommended approach is to create a dedicated restricted key in Stripe Dashboard under Developers → API keys → Restricted keys. Grant read-only access to customers, subscriptions, invoices, and events. This limits blast radius if the key leaks and keeps permissions scoped to what PostHog actually reads.
Postgres connection string
PostHog connects to Postgres using a standard connection URI in the format postgresql://user:password@host:port/database. You will need credentials for a read-only database user. Never use your application's primary database user for external syncs.
Create a dedicated read-only role in Postgres with SELECT access on the specific tables you want to sync. This gives PostHog the data it needs without write access to your production database.
Tables you should sync from Postgres
Not every Postgres table is useful for this analysis. Focus on the tables that carry user identity, behavior signals, and activation markers. At HackingHR, we synced 5 tables from their Postgres database.
- user_profiles contains email, signup date, company name, and role for every registered account in the system
- feature_usage_logs records each time a user accesses a specific feature with timestamps and feature identifiers
- activation_events stores the timestamp when users hit their activation milestone for the first time
- support_tickets tracks inbound support requests with categories and resolution status for churn correlation
- team_memberships maps users to their teams and organizations for multi-seat revenue analysis
These tables map directly to the revenue questions you want to answer. If you lack equivalent tables, start by syncing whatever user identity and event tables you already have.
Step 1: Connect Stripe to PostHog Data Warehouse
The Stripe connection is the faster of the 2 setup tasks. PostHog provides a built-in Stripe connector that handles authentication, schema mapping, and scheduled sync automatically.
Configure the Stripe source
Navigate to Data Warehouse → Sources in your PostHog settings and select Stripe from the connector list. Paste your Stripe secret key into the configuration form. PostHog will validate the key by making a test API call to the Stripe customers endpoint.
Once validated, you will see a list of available Stripe objects to sync. Select the ones relevant to your revenue analysis. The 8 core tables you get are customers, subscriptions, invoices, payments, events, products, prices, and balance_transactions. For most analyses, you need 4 of these.
- stripe_customers maps Stripe customer IDs to email addresses and company names for join key matching
- stripe_subscriptions holds plan IDs, status, current period dates, and recurring amounts per subscription
- stripe_invoices contains invoice amounts, due dates, payment status, and linked subscription identifiers
- stripe_events logs every Stripe webhook event including subscription created, updated, and deleted events
Configure sync frequency
The default sync interval is every 6 hours. For revenue reporting that needs near-real-time accuracy, you can reduce this to every 1 hour. Note that more frequent syncs increase your Stripe API usage and PostHog storage load. For most teams running monthly or weekly revenue analysis, the 6-hour default works fine.
After the initial sync completes, you will see the Stripe tables appear in your Data Warehouse schema browser. The first sync for a large Stripe account with thousands of customers can take 10 to 15 minutes to complete.
You can verify the sync completed by running a quick count query in the HogQL editor. This confirms the tables populated correctly before moving to the next step.
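A minimal version of that check, assuming the synced tables use the stripe_ prefix listed above:
-- Confirm the Stripe customers table populated after the first sync
SELECT count(*) as total_stripe_customers FROM stripe_customers;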
Step 2: Connect Postgres to PostHog Data Warehouse
The Postgres connection requires a bit more setup because you are importing custom tables from your own database rather than using a pre-built connector like Stripe.
Create a read-only Postgres user
Run these commands in your Postgres database before configuring the connection in PostHog. You need a dedicated user with SELECT access on only the tables you plan to sync.
CREATE ROLE posthog_sync LOGIN PASSWORD 'your_secure_password';
GRANT CONNECT ON DATABASE your_app_db TO posthog_sync;
GRANT USAGE ON SCHEMA public TO posthog_sync;
GRANT SELECT ON user_profiles, feature_usage_logs, activation_events, support_tickets, team_memberships TO posthog_sync;
This approach follows the principle of least privilege. The posthog_sync role can read 5 specific tables and nothing else. It cannot modify data, create tables, or access schemas you have not explicitly granted.
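To confirm the grants landed as intended before handing the credentials to PostHog, you can run this check in psql as a superuser or the table owner:
-- List every table posthog_sync can read; expect exactly the 5 synced tables
SELECT table_name, privilege_type
FROM information_schema.role_table_grants
WHERE grantee = 'posthog_sync';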
Configure the Postgres source in PostHog
Go to Data Warehouse → Sources and select Postgres. Enter your connection details using the connection URI format. PostHog will test the connection and list available tables from your database schema.
Select the 5 tables you created read-only access for. PostHog will import the schema including column names and data types. Review the schema mapping before confirming the sync, because PostHog needs to know which columns are timestamps, which are text fields, and which are numeric values for query performance.
Verify the Postgres sync
Once the sync completes, run a verification query to confirm your Postgres tables populated correctly. This query checks that your user_profiles table loaded with expected data.
SELECT count(*) as total_users FROM user_profiles;
Compare the count against your Postgres database directly. If the numbers match within a small delta, the sync worked correctly. If the counts are wildly different, check that your Postgres user has proper SELECT grants on all target tables and that no firewall rules are blocking PostHog's IP range.
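If you prefer to check all 5 tables in one pass, a UNION ALL over the synced tables works in the HogQL editor; compare each row against the same counts run directly in psql:
-- Row counts for every synced Postgres table
SELECT 'user_profiles' as table_name, count(*) as row_count FROM user_profiles
UNION ALL SELECT 'feature_usage_logs', count(*) FROM feature_usage_logs
UNION ALL SELECT 'activation_events', count(*) FROM activation_events
UNION ALL SELECT 'support_tickets', count(*) FROM support_tickets
UNION ALL SELECT 'team_memberships', count(*) FROM team_memberships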
Step 3: Write HogQL JOIN Queries Between Stripe and Postgres Tables
This is the core of the tutorial. Now that both data sources are synced, you can write JOIN queries that combine Stripe revenue data with Postgres user behavior data in a single SQL statement.
Join Stripe subscriptions with PostHog person data
The most fundamental join connects Stripe subscription records to PostHog person profiles using email as the shared key. This query gives you a unified view of who pays what and how active they are.
-- Join Stripe subscriptions with PostHog person data
SELECT
p.properties['$email'] as email,
s.plan_id,
s.amount,
s.status,
count(e.event) as total_events
FROM persons p
JOIN stripe_subscriptions s ON p.properties['$email'] = s.customer_email
LEFT JOIN events e ON e.person_id = p.id
GROUP BY 1, 2, 3, 4
ORDER BY 5 DESC
This query returns every person with a Stripe subscription record alongside their plan details and total event count in PostHog. The LEFT JOIN on events ensures you get results even for subscribers who have not yet triggered any tracked events. Add s.status = 'active' to a WHERE clause to narrow the results to current paying customers.
Monthly recurring revenue by plan tier
A simple aggregation that revenue teams need every reporting cycle. This query groups active Stripe subscriptions by plan and sums the monthly revenue.
-- MRR breakdown by Stripe plan
SELECT
s.plan_id,
count(*) as active_subscriptions,
sum(s.amount) / 100 as total_mrr_usd,
avg(s.amount) / 100 as avg_revenue_per_subscription
FROM stripe_subscriptions s
WHERE s.status = 'active'
GROUP BY 1
ORDER BY 3 DESC
Stripe stores amounts in the smallest currency unit, so the division by 100 converts cents to dollars. The same division applies to most other currencies; just relabel the output column, and skip the division for zero-decimal currencies such as JPY.
LTV by feature usage cohort
This query answers one of the most valuable questions a SaaS team can ask: which features correlate with higher lifetime value. It joins Stripe subscription amounts with PostHog feature usage events.
-- LTV by feature usage cohort
SELECT
s.plan_id,
sum(s.amount) / 100 as monthly_revenue,
count(DISTINCT CASE WHEN e.event = 'feature_used' THEN e.person_id END) as active_users,
avg(s.amount) / 100 as avg_monthly_revenue_per_subscriber
FROM stripe_subscriptions s
JOIN persons p ON s.customer_email = p.properties['$email']
LEFT JOIN events e ON e.person_id = p.id AND e.event = 'feature_used'
GROUP BY 1
The result shows each plan tier, its total monthly revenue, the count of distinct subscribers who triggered the feature_used event, and the average monthly revenue per subscriber; multiply that average by your typical customer lifespan to approximate lifetime value. The event filter sits in the JOIN condition rather than the WHERE clause so the LEFT JOIN still counts subscribers who never used the feature toward the plan totals. You can replace 'feature_used' with any specific event name from your PostHog tracking to isolate individual feature cohorts.
Churn risk by plan type and recent inactivity
This query identifies subscribers who have not triggered any events in the past 14 days, flagging them as potential churn risks before they actually cancel.
-- Churn risk: active subscribers with 14-day inactivity
SELECT
p.properties['$email'] as email,
s.plan_id,
s.amount / 100 as monthly_amount,
max(e.timestamp) as last_activity,
date_diff('day', max(e.timestamp), now()) as days_inactive
FROM stripe_subscriptions s
JOIN persons p ON s.customer_email = p.properties['$email']
LEFT JOIN events e ON e.person_id = p.id
WHERE s.status = 'active'
GROUP BY 1, 2, 3
HAVING days_inactive > 14
ORDER BY 5 DESC
This gives you a ranked list of at-risk accounts with their last known activity date. You can pipe this list into a customer success workflow or trigger automated re-engagement emails through PostHog's actions system.
| Query type | Stripe tables used | PostHog tables used | Business use case |
|---|---|---|---|
| Subscription-person join | stripe_subscriptions | persons, events | Unified customer view for account management |
| MRR by plan | stripe_subscriptions | None required | Monthly revenue reporting and forecasting |
| LTV by feature cohort | stripe_subscriptions | persons, events | Product-led revenue growth strategy |
| Churn risk detection | stripe_subscriptions | persons, events | Proactive customer success intervention |
Each of these queries runs in seconds on a typical PostHog deployment with thousands of synced records. For larger datasets, consider adding date range filters to limit the scan scope and improve query latency.
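For example, pushing a 90-day window into the events join keeps the churn risk scan small; the window length is an arbitrary choice you can tune:
-- Subscription-person join limited to the last 90 days of events
SELECT
p.properties['$email'] as email,
s.plan_id,
max(e.timestamp) as last_activity
FROM stripe_subscriptions s
JOIN persons p ON s.customer_email = p.properties['$email']
LEFT JOIN events e ON e.person_id = p.id
AND e.timestamp >= now() - INTERVAL 90 DAY
WHERE s.status = 'active'
GROUP BY 1, 2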
Step 4: Build Actionable Insights From Joined Data
Raw queries are a starting point. The real value comes from turning these queries into recurring reports that drive decisions. Here are 3 patterns we built for HackingHR that you can adapt to your own product.
Expansion revenue by activation status
HackingHR wanted to know whether users who completed their activation workflow generated more expansion revenue through seat additions and add-on purchases. The query joins Stripe subscription changes with Postgres activation records.
-- Expansion revenue segmented by activation
SELECT
CASE WHEN a.activated_at IS NOT NULL THEN 'activated' ELSE 'not_activated' END as activation_status,
count(DISTINCT s.customer_email) as unique_customers,
sum(s.amount) / 100 as total_expansion_revenue,
avg(s.amount) / 100 as avg_expansion_per_customer
FROM stripe_subscriptions s
LEFT JOIN activation_events a ON s.customer_email = a.user_email
WHERE s.status = 'active'
AND s.amount > 0
GROUP BY 1
This query revealed that activated customers at HackingHR generated 2.3x more expansion revenue than non-activated ones. That finding directly informed their onboarding redesign priorities for the next quarter. The activation event became the north star metric for the product team.
Feature adoption funnel by plan tier
This insight shows which plan tiers adopt specific features and where adoption drops off. It combines Postgres feature usage data with Stripe plan assignments to reveal upgrade-blocking friction points.
-- Feature adoption rate across plan tiers
SELECT
s.plan_id,
e.event as feature_name,
count(DISTINCT e.person_id) as users_who_used,
plan_totals.total_users_on_plan as total_users_on_plan,
count(DISTINCT e.person_id) * 100.0 / plan_totals.total_users_on_plan as adoption_pct
FROM stripe_subscriptions s
JOIN persons p ON s.customer_email = p.properties['$email']
JOIN events e ON e.person_id = p.id
AND e.event IN ('report_generated', 'export_used', 'team_invited')
JOIN (
SELECT s2.plan_id, count(DISTINCT p2.id) as total_users_on_plan
FROM stripe_subscriptions s2
JOIN persons p2 ON s2.customer_email = p2.properties['$email']
WHERE s2.status = 'active'
GROUP BY 1
) plan_totals ON s.plan_id = plan_totals.plan_id
WHERE s.status = 'active'
GROUP BY 1, 2, 4
ORDER BY 1, 5 DESC
The output shows the adoption percentage for each feature within each plan tier. The plan_totals subquery supplies the denominator, so the percentage is measured against every active user on the plan rather than only the users who appear in the events join. Low adoption on higher-tier plans signals a gap between what you charge and what users actually discover in the product. That gap is your product-led upgrade opportunity.
Churn correlation with support ticket volume
HackingHR suspected that support ticket volume correlated with subscription cancellations. The query joins Stripe cancellation events with Postgres support ticket counts to validate or reject this hypothesis.
-- Cancellation rate by support ticket volume
SELECT
CASE
WHEN coalesce(ticket_count.ticket_ct, 0) = 0 THEN '0_tickets'
WHEN ticket_count.ticket_ct BETWEEN 1 AND 2 THEN '1-2_tickets'
WHEN ticket_count.ticket_ct BETWEEN 3 AND 5 THEN '3-5_tickets'
ELSE '6+_tickets'
END as ticket_bucket,
count(DISTINCT s.customer_email) as total_customers,
count(DISTINCT CASE WHEN s.status = 'canceled' THEN s.customer_email END) as canceled_customers,
count(DISTINCT CASE WHEN s.status = 'canceled' THEN s.customer_email END) * 100.0 / count(DISTINCT s.customer_email) as churn_rate
FROM stripe_subscriptions s
LEFT JOIN (
SELECT user_email, count(*) as ticket_ct
FROM support_tickets
GROUP BY 1
) ticket_count ON s.customer_email = ticket_count.user_email
GROUP BY 1
ORDER BY 4 DESC
The results at HackingHR showed a clear inflection point: customers with 3 or more support tickets had a churn rate 4x higher than those with 0 to 2 tickets. This finding justified hiring an additional support team member dedicated to their highest-volume accounts.
Activated users at HackingHR generated 2.3 times more expansion revenue than users who skipped the activation workflow. The activation event became the single highest-impact metric for their product team.
Churn prediction analysis for your product
We build custom churn prediction models using your PostHog event data and Stripe records. $4,500 flat fee, 2-week delivery, includes the exact JOIN queries shown in this article applied to your data.
Step 5: Create Saved Insights and Dashboards
Running queries in the HogQL editor works for exploration. For recurring reporting, you need saved insights that update automatically and dashboards that stakeholders can view without writing SQL.
Save individual queries as insights
After you write and validate a HogQL query in the Data Warehouse editor, click "Save as insight" to persist it as a reusable report. Give each insight a descriptive name that reflects the business question, not the query structure. Use "Monthly Recurring Revenue by Plan" instead of "stripe_subs_group_by_plan."
Set the insight visualization to match the data shape. Use bar charts for plan comparisons, line charts for time-series trends, and tables for detailed account-level lists. The right visualization saves stakeholders from misreading the numbers.
Build a revenue dashboard
Create a dashboard in PostHog that groups your saved Stripe-related insights into a single view. HackingHR's dashboard had 6 tiles covering the core metrics their leadership team reviewed weekly.
- MRR trend line shows monthly recurring revenue over the past 12 months using Stripe invoice data aggregated by month (a sketch of this query follows the list)
- Plan distribution bar chart displays active subscription counts per plan tier for quick capacity planning
- LTV by feature cohort table lists lifetime value for users who used each tracked feature to guide product investment
- Churn risk list surfaces inactive subscribers ranked by days since last activity for customer success outreach
- Expansion revenue by activation compares revenue growth between activated and non-activated customer segments
- Support ticket correlation shows churn rate by ticket volume to identify the support capacity threshold
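For reference, the MRR trend tile from the first bullet can be backed by a query along these lines. This is a sketch that assumes the synced stripe_invoices table exposes status, created, and amount_paid columns; check your schema browser for the exact names PostHog imported:
-- Paid invoice revenue by month over the past 12 months (column names are assumptions)
SELECT
toStartOfMonth(i.created) as month,
sum(i.amount_paid) / 100 as revenue_usd
FROM stripe_invoices i
WHERE i.status = 'paid'
AND i.created >= now() - INTERVAL 12 MONTH
GROUP BY 1
ORDER BY 1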
Set the dashboard to refresh on the same schedule as your Stripe sync so the numbers stay current. Share it with stakeholders via the PostHog dashboard sharing link rather than exporting to spreadsheets that go stale between manual updates.
Common Pitfalls and How to Avoid Them
The setup works smoothly until it does not. Here are the 5 issues we hit at HackingHR and how to prevent or fix each one.
Schema mismatches on join keys
The most common join failure happens when the email field in Stripe does not exactly match the email field in your Postgres tables. A single whitespace character, different casing, or a stale email in one system breaks the join entirely.
Fix this by normalizing email addresses on both sides before joining. Use lower(trim(s.customer_email)) = lower(trim(p.properties['$email'])) in your JOIN clause to handle casing and whitespace differences. If you have users who changed emails, you need a secondary join key on a stable identifier like a user UUID that persists across email changes.
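Here is the join condition from Step 3 with that normalization applied, shown as a trimmed query; everything else stays the same:
-- Join on normalized emails to survive casing and whitespace differences
SELECT
lower(trim(p.properties['$email'])) as email,
s.plan_id,
s.status
FROM stripe_subscriptions s
JOIN persons p
ON lower(trim(s.customer_email)) = lower(trim(p.properties['$email']))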
Timezone drift between data sources
Stripe timestamps everything in UTC. Your Postgres database may store timestamps in a local timezone like America/New_York. When you filter queries by date ranges, this drift causes records to appear in the wrong month or get excluded entirely.
Normalize all timestamps to UTC in your queries. Use toTimeZone(column, 'UTC') in HogQL to convert Postgres timestamps to UTC before comparing them against Stripe timestamps. This ensures your date range filters capture the correct records from both sources.
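A minimal sketch of that conversion, assuming feature_usage_logs carries a used_at timestamp column (the column name is a placeholder for your own schema):
-- Convert a Postgres timestamp to UTC before applying a date range filter
SELECT
count(*) as recent_feature_uses
FROM feature_usage_logs f
WHERE toTimeZone(f.used_at, 'UTC') >= now() - INTERVAL 30 DAY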
Duplicate Stripe webhook events
Stripe webhooks can deliver the same event twice due to network retries. If your Postgres tables log webhook deliveries, you may end up with duplicate rows that inflate counts and sums in your queries.
Deduplicate by selecting only the latest event per Stripe event ID. Use a window function or a subquery with ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY received_at DESC) to pick the most recent delivery of each event. Apply this deduplication in a CTE before joining to Stripe tables.
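A sketch of that deduplication, assuming you synced a webhook log table named webhook_events with event_id and received_at columns; both names are hypothetical, so swap in your own:
-- Keep only the most recent delivery of each Stripe webhook event
WITH deduped_webhooks AS (
SELECT
event_id,
received_at,
ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY received_at DESC) as rn
FROM webhook_events
)
SELECT event_id, received_at
FROM deduped_webhooks
WHERE rn = 1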
Postgres table sync lag
The Postgres sync runs on a schedule, typically every 6 hours. If you write a query right after a major user behavior change in your Postgres tables, the Data Warehouse may still hold stale data from the previous sync cycle.
Check the sync status in the PostHog Data Warehouse UI before running time-sensitive queries. The sync page shows the last completed sync timestamp for each table. If you need fresher data, trigger a manual sync from the same interface before running your query.
Missing columns from schema changes
When you add columns to your Postgres tables, the PostHog sync does not automatically pick them up on the next run. You need to manually update the source configuration to include the new columns in the sync schema.
After any schema change in Postgres, go to the Postgres source in PostHog's Data Warehouse settings and refresh the schema. Select the new columns you want to sync before the next scheduled run. Until you do this, queries referencing the new columns will return null values.
Real Example: HackingHR Revenue Intelligence Setup
HackingHR is an HR platform that needed to answer a specific question from their board: which features in our product drive subscription upgrades and expansions. They had Stripe subscription data and Postgres user behavior logs but no way to connect them.
We set up the PostHog Data Warehouse with their Stripe account and their primary Postgres database. The sync pulled 8 Stripe tables and 5 Postgres tables into PostHog's query engine. Then we wrote the JOIN queries outlined in this article to build their first unified revenue dashboard.
The first insight from the joined data showed that teams who used the "report_generated" event within their first 7 days had a 38% higher expansion revenue rate over the following 90 days. This finding shifted HackingHR's onboarding focus toward getting new users to generate their first report as quickly as possible.
The second finding came from the churn risk query. HackingHR discovered that 22% of their canceled subscriptions belonged to accounts where no team member had logged in for more than 21 days. This led to a re-engagement email flow targeting inactive team members before they reached the 21-day threshold.
The complete setup took 3 hours from initial source configuration to a working dashboard with 6 revenue insights. The alternative, building a full data pipeline to Snowflake with dbt models, would have taken 3 to 4 weeks of engineering time at an estimated cost of $15,000 to $25,000 in engineering salary plus infrastructure costs.
"The value is not in having the data. It is in being able to join revenue to behavior without waiting 3 weeks for a data engineering team."
— Jake McMahon, ProductQuant
If you want to see how this approach applies to your own retention and revenue data, the churn prediction with PostHog events guide covers a related workflow. For a broader look at how behavioral data informs growth strategy, read our growth operating system for B2B SaaS. And if you are evaluating whether to build or buy churn prediction, the Python churn prediction walkthrough compares both approaches. For a direct consultation on your data setup, reach out at ProductQuant.
FAQ
Does the PostHog Data Warehouse replace Snowflake or BigQuery?
No. The Data Warehouse is designed for small to mid-size SaaS teams that need to join 2 to 5 data sources for revenue and product analysis. If you have 10 plus data sources, complex dbt transformations, or need to serve data to 50 plus stakeholders across different tools, a dedicated warehouse like Snowflake makes more sense. The PostHog Data Warehouse is the fast path for teams that want answers this week, not next quarter.
Can I join more than 2 data sources in a single HogQL query?
Yes. HogQL supports standard SQL JOIN syntax, so you can chain multiple JOINs across as many synced tables as you need. A single query can join stripe_subscriptions to persons, join persons to feature_usage_logs, and join feature_usage_logs to activation_events. The practical limit depends on query performance. Queries joining 5 plus tables on large datasets may run slower, so test with date range filters to keep the scan scope manageable.
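A sketch of that chain using the table names from this article; the user_email and feature_id columns on the Postgres tables are assumptions for illustration:
-- Chain Stripe, PostHog persons, and two Postgres tables in one query
SELECT
s.plan_id,
count(DISTINCT f.feature_id) as distinct_features_used,
min(a.activated_at) as first_activation
FROM stripe_subscriptions s
JOIN persons p ON s.customer_email = p.properties['$email']
LEFT JOIN feature_usage_logs f ON f.user_email = p.properties['$email']
LEFT JOIN activation_events a ON a.user_email = p.properties['$email']
WHERE s.status = 'active'
GROUP BY 1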
How often does Stripe data sync into PostHog?
The default sync interval is every 6 hours. You can configure this to run more frequently, down to every 1 hour, in the Data Warehouse source settings. For most revenue analysis workflows, 6-hour sync is sufficient because MRR and LTV metrics do not change meaningfully within a single day. If you need intraday accuracy for a specific use case like monitoring a pricing change launch, you can trigger a manual sync on demand.
What happens if my Postgres password rotates?
When your Postgres password changes, the sync will fail on its next scheduled run. You will see an authentication error in the Data Warehouse source status page. To fix it, update the connection credentials in the Postgres source configuration with the new password and trigger a manual sync to resume normal operation. The synced table data remains intact during this process.
Do I need to know SQL to use the Data Warehouse?
Yes, writing JOIN queries requires working knowledge of SQL. You need to understand SELECT, FROM, JOIN, WHERE, GROUP BY, and HAVING clauses at minimum. The HogQL editor provides a schema browser so you can see available tables and columns, but it does not generate queries for you. If your team lacks SQL skills, start with the 4 query templates in this article and adapt them to your table names and column structure before writing custom queries from scratch.
Get Your Revenue and Behavior Data Connected
We will set up your PostHog Data Warehouse, write the JOIN queries, and build your first revenue dashboard in 2 weeks. No pipeline engineering required.