TL;DR

  • HogQL is PostHog's SQL dialect for querying raw event data. It lets you write SQL queries against events, persons, and groups tables — without exporting to a warehouse.
  • HogQL is included on all plans, including free. There is no additional cost. The ClickHouse-powered engine handles queries that would require subqueries or window functions in standard SQL.
  • Use HogQL when the question requires a calculation the UI cannot do. Median time to activation by plan, feature correlations with 90-day retention, revenue impact of funnel drop-off — these require SQL, not three stitched-together insights.
  • Time filtering in your WHERE clause is the single highest-leverage performance habit. PostHog's columnar storage scans only the columns you request; filtering by time first dramatically reduces the data scanned.
  • The three tables you query most are events, persons, and groups. Events are your tracked actions. Persons are your users. Groups are your organizational entities (companies, teams, workspaces).

The Gap Between UI Answers and Real Decisions

The PostHog UI is built for standard analytics questions. Trends, funnels, retention curves, path analysis — the visual interface handles these well. You point, you click, you see a chart.

Then your CEO asks a question that requires the chart to do something the UI cannot.

"What's the median time to activation by plan type, excluding users who hit a billing error?" You cannot answer this in the UI. You would need to run three separate insights and mentally stitch them together — if you remember what you saw in each one.

"Which feature, when used in the first 14 days, has the strongest correlation with 90-day retention?" The UI shows retention curves. It does not show feature-to-retention correlation across your entire event schema.

"What's the revenue impact of users who dropped off at step 3 of onboarding versus step 1?" The funnel shows you where users drop off. It does not show you revenue by drop-off step.

The PostHog UI answers questions that fit its data model. HogQL answers questions that fit your business model.

This is the gap HogQL fills. It is not a replacement for the UI — it is an escape hatch for the questions that do not fit the standard templates. The engineers who built PostHog knew this gap existed. They introduced HogQL to give users direct SQL access without requiring a warehouse export.

The pattern across engineering-led product teams is consistent. After six months of PostHog, someone on the team — analyst, PM, or engineer — learns enough SQL to ask the questions the UI cannot. HogQL is what makes that possible without leaving PostHog.

The Data Model: Three Tables and One Gotcha

Before writing a single query, you need to understand the data model. PostHog stores your product analytics in three tables. The structure is straightforward — but there is one gotcha that trips up every analyst who comes from a relational database background.

The Events Table

Every tracked action is an event. When a user clicks a button, submits a form, or completes a workflow — that is an event. This is the table you will query most often.

| Column | Type | What It Contains |
|---|---|---|
| event | string | The event name: signup_completed, feature_first_used |
| timestamp | datetime | When the event occurred |
| distinct_id | string | The user who triggered the event |
| person_id | uuid | Links to the persons table |
| properties | JSON | Event-level properties: {'plan': 'pro', 'source': 'organic'} |

The gotcha: properties is a JSON object, not a set of individual columns. You access a property with bracket notation, properties['plan']. A misspelled property name will not error; it silently returns null for every row. This silent-null behavior is the most common source of wasted debugging time for new HogQL users.

The insight: Always verify your property access by running a SELECT properties['your_property'] before building a complex query around it. Null results in the first 30 seconds save hours of debugging later.
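
A minimal version of that check, assuming a 'plan' property (substitute whatever property you are about to build on):

SELECT
    properties['plan'] as plan,
    count() as rows_with_value
FROM events
WHERE timestamp > now() - INTERVAL 7 DAY
GROUP BY plan
ORDER BY rows_with_value DESC
LIMIT 10

If the property name is wrong or never set, the only value you will see is null.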

The Persons Table

Every tracked user has a row in the persons table. This table is updated whenever posthog.identify() is called — which means it contains properties that may not be present for users who never reached an identification call.

| Column | Type | What It Contains |
|---|---|---|
| id | uuid | The person identifier |
| properties | JSON | User properties: {'email': '[email protected]', 'plan': 'pro'} |
| created_at | datetime | When the person was first seen |

The insight: Person properties are eventually consistent. If you set a property via an identify call and query one minute later, the property may not appear. Event properties are always available immediately. Build this latency into your data freshness expectations.

The Groups Table

If your product is team-based — SaaS where companies, not individuals, are the customer — the groups table tracks organizational entities.

| Column | Type | What It Contains |
|---|---|---|
| group_type_index | integer | 0 = organization, 1 = team, 2 = project, etc. |
| group_key | string | Unique identifier for the group |
| group_properties | JSON | Group-level properties: {'industry': 'healthcare', 'size': 50} |

For products with individual users (personal finance, consumer apps), the persons table is sufficient. For team-based products, group analytics — and the groups table — is where your revenue-level analytics live.

The insight: Group analytics must be enabled in your PostHog settings before data flows into the groups table. If you do not see data there, check your group analytics configuration before assuming your queries are wrong.
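
A quick sanity check for that configuration, using the column names from the table above (the 'industry' property is only an example):

SELECT
    group_key,
    group_properties['industry'] as industry
FROM groups
WHERE group_type_index = 0   -- 0 is your first group type, e.g. organization
LIMIT 10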

Free Resource

PostHog Setup Checklist for New Analysts

A one-page checklist covering event naming conventions, property structure, and the three queries to run on day one. Includes the setup queries from this article ready to copy-paste.

The Five Queries That Answer Most Product Questions

The following five patterns cover the queries that appear repeatedly across product analytics work. Once you know these patterns, you can answer the majority of questions that require SQL.

1. Event Volume Over Time

The simplest query. How many times did an event occur each day for the last 30 days?

SELECT
    toDate(timestamp) as day,
    count() as event_count
FROM events
WHERE event = 'signup_completed'
  AND timestamp > now() - INTERVAL 30 DAY
GROUP BY day
ORDER BY day

Add a HAVING clause to filter by minimum count. Add a property to the GROUP BY to break the trend down by plan, source, or region, or add a property filter to the WHERE clause to restrict it to one segment. The pattern stays the same: group by time, filter in the WHERE clause, aggregate with count().
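
As a sketch of that variation, assuming a 'plan' event property and an arbitrary minimum of ten events per day:

SELECT
    toDate(timestamp) as day,
    properties['plan'] as plan,
    count() as event_count
FROM events
WHERE event = 'signup_completed'
  AND timestamp > now() - INTERVAL 30 DAY
GROUP BY day, plan
HAVING event_count >= 10   -- arbitrary threshold; adjust to your volume
ORDER BY day, plan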

The insight: Use toDate(timestamp) for grouping by day, not timestamp itself. Grouping by timestamp groups by second — which produces a row per event, not a row per day.

2. Activation Rate by Segment

What percentage of signups complete activation, broken down by plan? This is the most common first HogQL query. Teams install PostHog, see their overall activation rate, and immediately want to segment it.

SELECT
    properties['plan'] as plan,
    countIf(event = 'signup_completed') as signups,
    countIf(event = 'activation_completed') as activations,
    round(activations * 100.0 / signups, 1) as activation_rate
FROM events
WHERE event IN ('signup_completed', 'activation_completed')
  AND timestamp > now() - INTERVAL 30 DAY
GROUP BY plan
ORDER BY activation_rate DESC

countIf() is ClickHouse's conditional count function. It counts rows where the condition is true. This is more readable — and faster — than wrapping a SUM(CASE WHEN ... THEN 1 ELSE 0 END) in standard SQL.
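
For comparison, a minimal side-by-side, reusing the signup event from the query above:

-- standard SQL form
SELECT sum(CASE WHEN event = 'signup_completed' THEN 1 ELSE 0 END) as signups
FROM events
WHERE timestamp > now() - INTERVAL 30 DAY

-- the HogQL / ClickHouse equivalent
SELECT countIf(event = 'signup_completed') as signups
FROM events
WHERE timestamp > now() - INTERVAL 30 DAY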

The insight: When activation rates vary significantly by segment, the variance is a signal. A plan with 15% activation rate versus 45% for another plan has an onboarding problem specific to that segment — not an overall onboarding problem.

3. Median Time to Activation

What's the typical time between signup and activation? This requires calculating the time difference between two events for the same user, then taking the median.

SELECT
    median(dateDiff('minute', signup_time, activation_time)) as median_minutes_to_activation
FROM (
    SELECT
        distinct_id,
        minIf(timestamp, event = 'signup_completed') as signup_time,
        minIf(timestamp, event = 'activation_completed') as activation_time
    FROM events
    WHERE event IN ('signup_completed', 'activation_completed')
      AND timestamp > now() - INTERVAL 30 DAY
    GROUP BY distinct_id
    HAVING signup_time > toDateTime(0)      -- the user has a signup in the window
       AND activation_time >= signup_time   -- and activated after signing up
)

In standard SQL, calculating a median requires sorting all values and finding the middle one — typically requiring window functions and subqueries. In ClickHouse, the median() function handles this directly. For 10,000 activation events, this query runs in under 100 milliseconds.

The insight: The median hides the distribution. Run a second query using quantile(0.9) to see the 90th percentile. If the median is 30 minutes but the 90th percentile is 72 hours, your activation flow has a bimodal problem — most users activate quickly, but a segment takes days.
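
A companion sketch for that distribution check, using the same per-user subquery shape and the same assumed event names:

SELECT
    quantile(0.5)(dateDiff('minute', signup_time, activation_time)) as p50_minutes,
    quantile(0.9)(dateDiff('minute', signup_time, activation_time)) as p90_minutes
FROM (
    SELECT
        distinct_id,
        minIf(timestamp, event = 'signup_completed') as signup_time,
        minIf(timestamp, event = 'activation_completed') as activation_time
    FROM events
    WHERE event IN ('signup_completed', 'activation_completed')
      AND timestamp > now() - INTERVAL 30 DAY
    GROUP BY distinct_id
    HAVING signup_time > toDateTime(0)
       AND activation_time >= signup_time
)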

4. Churn Signals: Feature Disengagement

Which users used a core feature regularly but have not used it in the last 14 days? This query builds your at-risk account list.

SELECT
    distinct_id,
    person.properties['email'] as email,
    person.properties['plan'] as plan,
    max(timestamp) as last_usage,
    dateDiff('day', last_usage, now()) as days_since_usage
FROM events
WHERE event = 'core_feature_used'
GROUP BY distinct_id, person.properties['email'], person.properties['plan']
HAVING days_since_usage > 14
  AND plan = 'pro'
ORDER BY days_since_usage DESC

Export this as CSV. Send it to your customer success team every Monday. Accounts with 21+ days of disengagement on a core feature are your highest churn risk — not because they complained or churned, but because they simply stopped using the thing you built the product around.

The insight: Feature disengagement is one of the earliest churn signals. Users who stop using the core feature are significantly more likely to churn in the next 30 days than users who are active. Build this query into your weekly operational rhythm.

5. Feature-to-Retention Correlation

Which feature, when used in the first 14 days, correlates most strongly with 90-day retention? This is the query that changes how you think about activation.

SELECT
    e.properties['feature_name'] as feature,
    count(DISTINCT e.distinct_id) as users_who_tried,
    count(DISTINCT CASE WHEN r.event = 'active_90d' THEN e.distinct_id END) as retained_users,
    round(retained_users * 100.0 / users_who_tried, 1) as retention_rate
FROM events e
LEFT JOIN events r ON e.distinct_id = r.distinct_id AND r.event = 'active_90d'
WHERE e.event = 'feature_first_used'
  AND e.timestamp > now() - INTERVAL 90 DAY
  AND e.timestamp < e.person.created_at + INTERVAL 14 DAY   -- first 14 days after the person was first seen
GROUP BY feature
ORDER BY retention_rate DESC

The result tells you which features are activation drivers. Once you know which feature correlates with retention, you can redesign your onboarding to get users to that feature faster. Teams that do this analysis correctly often see activation rates improve substantially.

The insight: Correlation is not causation — but it is close enough for product decisions. When users who touch feature X in week one retain at 3× the rate of users who do not, that feature is your activation driver. Optimize the path to it.

The Real Cost of Not Using SQL

The questions your CEO asks on a Tuesday afternoon are not the questions your dashboards answer. They are edge cases, segment breakdowns, and calculations that require joining data across events. These are the questions that require SQL.

$0

HogQL is included in every PostHog plan, including free. There is no additional cost. ClickHouse's query engine processes your data directly — no warehouse, no export pipeline, no per-query charges.

Consider the gap between the two approaches for the same question: "Of users who signed up from our LinkedIn campaign last month, how many activated within 7 days, and what was their median time to activation?"

With the PostHog UI: You need a funnel with a 7-day conversion window, broken down by UTM source, combined with a separate retention query. You run two insights, copy numbers into a spreadsheet, and calculate the median manually. Total time: 15-20 minutes. Accuracy: depends on whether you remembered to apply the same date filter to both insights.

With HogQL: One query. 30 seconds to write once you know the pattern.

SELECT
    countIf(activated) as activated_users,
    countIf(NOT activated) as not_activated,
    round(activated_users * 100.0 / (activated_users + not_activated), 1) as activation_rate,
    medianIf(time_to_activation, activated) as median_minutes
FROM (
    SELECT
        distinct_id,
        minIf(timestamp, event = 'signup_completed') as signup_time,
        minIf(timestamp, event = 'activation_completed') as activation_time,
        dateDiff('minute', signup_time, activation_time) as time_to_activation,
        (activation_time > signup_time
            AND activation_time < signup_time + INTERVAL 7 DAY) as activated
    FROM events
    WHERE event IN ('signup_completed', 'activation_completed')
      AND timestamp > now() - INTERVAL 30 DAY
      AND properties['utm_source'] = 'linkedin'
    GROUP BY distinct_id
    HAVING signup_time > toDateTime(0)
)

"We built HogQL because we kept seeing the same pattern: teams outgrowing the UI, then either exporting to a warehouse or asking us to build custom queries. HogQL gives them direct access to the same data the UI uses — without the export."

— PostHog Engineering Blog, Introducing HogQL

When the UI Wins

HogQL is not always the right tool. The PostHog UI is purpose-built for specific question types, and using it is faster than writing SQL for these queries.

| Question Type | Tool | Why |
|---|---|---|
| How many signups this week? | UI (Trends) | Simple count over time — point, click, done |
| What's our funnel conversion? | UI (Funnels) | Visual funnel builder handles this natively |
| Show me retention curves by cohort | UI (Retention) | Visual cohort analysis built for this exact question |
| Median time to activation by plan? | HogQL | Requires median + group by — not in the UI |
| Which features correlate with 90-day retention? | HogQL | Requires cross-event correlation across your event schema |
| Revenue impact of funnel drop-off by step? | HogQL | Requires joining events with revenue data |
| 90th percentile time to activation? | HogQL | Requires quantile calculation — not available in the UI |

The rule: Use the UI for standard analytics patterns. Use HogQL when the question requires a calculation, segmentation, or correlation that the UI templates do not support.

For Engineering-Led Teams

PostHog + Your Warehouse: When and Why

Not every team needs a warehouse. But when you need to join PostHog data with Stripe revenue, Zendesk support tickets, or Google Ads spend — that's when a warehouse makes sense. We cover the architecture, the sync options, and the cost considerations.

What to Do Instead

If you are considering exporting PostHog data to a data warehouse instead of using HogQL, the decision depends on your data architecture and question type.

Use HogQL when:

  • Your question can be answered from PostHog data alone. No external data sources (Stripe, Salesforce, Zendesk) are required.
  • You need the answer now. Warehouse syncs run on schedules — hourly, daily, or weekly. HogQL queries live data.
  • Your team is under 50 people and engineering time is scarce. Setting up a warehouse pipeline requires schema mapping, type handling, and ongoing maintenance.
  • You are doing ad-hoc analysis. Running one-off queries without building a pipeline is the right use case for HogQL.

Use a warehouse (BigQuery, Snowflake, Redshift) when:

  • You need to join PostHog data with non-PostHog data. Revenue from Stripe, support tickets from Zendesk, ad spend from Google Ads — these require a warehouse.
  • You are building recurring dashboards that require complex transformations or ML models. The warehouse is the right home for production analytics.
  • Your data science team needs raw event data for modeling. Exporting to a warehouse gives them a familiar environment.

The cost argument matters at scale. BigQuery charges per query. Snowflake charges for compute time. At 10M+ events per month, your per-query warehouse costs add up. HogQL is included in your PostHog plan — no per-query charges.

The complexity argument matters even more for small teams. A warehouse pipeline is not a weekend project. It requires ongoing maintenance as event schemas evolve and property types change. For a team under 20 people, that engineering time is better spent on the product.

The insight: HogQL and the warehouse are not competing tools — they are tools for different question types. Use HogQL for PostHog-only questions. Use the warehouse when you need to join across data sources. Most teams start with HogQL and add a warehouse when they hit the limits of single-source analysis.

FAQ

Is HogQL available on the free plan?

Yes. HogQL is included on all PostHog plans, including the free tier. There is no additional cost and no feature restriction. Over 90% of PostHog companies stay on the free tier, which includes full HogQL access.

Do I need to know SQL to use PostHog?

No. The PostHog UI handles most product analytics questions without SQL. Trends, funnels, retention curves, and path analysis are all point-and-click. HogQL is for the questions the UI cannot answer — typically complex aggregations, cross-table joins, or custom metrics like median time to activation segmented by multiple dimensions simultaneously.

Can I export HogQL query results?

Yes. Query results can be exported as CSV or JSON from the PostHog interface. You can also connect PostHog to external BI tools (Tableau, Looker, Metabase) via the HogQL endpoint, allowing your analysts to query PostHog data from their existing tools.

What's the difference between HogQL and standard SQL?

HogQL is based on ClickHouse SQL, which is similar to standard SQL with extensions for time-series and analytics queries. Most standard SQL patterns — SELECT, WHERE, GROUP BY, JOIN, HAVING — work identically. ClickHouse-specific functions like countIf, median, quantile, and dateDiff give you analytics capabilities that standard SQL would require subqueries or window functions to replicate. If you know standard SQL, you can learn HogQL in an afternoon.

Why is my query slow?

The most common cause of slow HogQL queries is missing time filters in the WHERE clause. ClickHouse's columnar storage scans only the columns you request — but it scans all rows unless you filter by time. Always include AND timestamp > now() - INTERVAL 30 DAY or similar in your WHERE clause. The second most common cause is SELECT * — specify only the columns you need.
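
As an illustration of both fixes, with an assumed event name:

-- slow: unbounded scan that touches every column
SELECT *
FROM events
WHERE event = 'signup_completed'

-- faster: time-bounded and limited to the columns the question needs
SELECT
    toDate(timestamp) as day,
    count() as signups
FROM events
WHERE event = 'signup_completed'
  AND timestamp > now() - INTERVAL 30 DAY
GROUP BY day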

What happens if a property doesn't exist?

If a property does not exist on an event, properties['missing_property'] returns null, not an error. Use coalesce(properties['plan'], 'unknown') to handle missing properties gracefully, or add AND properties['plan'] IS NOT NULL in your WHERE clause to filter out rows with missing properties.
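
A small sketch of that pattern, reusing the 'plan' property from earlier examples:

SELECT
    coalesce(properties['plan'], 'unknown') as plan,
    count() as event_count
FROM events
WHERE timestamp > now() - INTERVAL 30 DAY
GROUP BY plan
ORDER BY event_count DESC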

About the Author

Jake McMahon is the founder of ProductQuant and holds a Master's in Behavioural Psychology and Big Data. Australian by birth, currently based in Tbilisi, Georgia. He works with engineering-led product teams on analytics architecture, activation analysis, and the SQL patterns that answer the questions dashboards cannot. The research for this article was conducted via Firecrawl and Playwright against PostHog's official documentation, blog, and tutorial content.

Next Step

Run Your First HogQL Query Today

If you have PostHog installed, you have HogQL. The queries in this article run on your existing data. Start with the activation rate by segment query — it is the one that most often leads to the next question.