Facts, Dimensions, and the Star Schema: Data Modelling for Non-Engineers

Your analytics dashboard is slow. Your analysts keep re-joining the same five tables in every query. Two people calculate “monthly revenue” and get different numbers. Your BI tool struggles with the schema. These aren't tool problems or performance problems — they're modelling problems. And the solution is a pattern that's been around since the 1990s: the star schema.

The analogy: receipts and catalogues

Imagine a busy retail store at the end of the day. The manager has a stack of till receipts and a set of reference catalogues.

Each receipt records what happened: item #4821 sold, quantity 2, price $34.99, at 14:32, paid by card, customer #9012. The receipt doesn’t contain the product name, the customer’s address, or the product category — just IDs and numbers. Thousands of receipts pile up every day.

The catalogues explain the IDs: the product catalogue says item #4821 is a “Leather Wallet, Men’s, Accessories, Brand X”. The customer register says customer #9012 is “Alice, Sydney, Loyalty Tier Gold, acquired via Instagram”.

This is a star schema. The receipts are your fact table. The catalogues are your dimension tables. The fact table sits in the middle and references the dimensions around it — forming a shape that, if you draw it, looks like a star.

What is a fact table?

A fact table records events or transactions — things that happened. It has three characteristics:

It grows quickly. Every order, every click, every payment adds a row. Fact tables often have hundreds of millions of rows.
It contains measures. Numbers you aggregate: revenue,quantity, discount_amount, session_duration.
It contains foreign keys. IDs that point to dimension tables:customer_id, product_id, date_id. The fact table doesn’t store the customer’s name — it stores their ID and lets you look up the name in the dimension table when you need it.

What is a dimension table?

A dimension table describes the context around a fact. Who did it? What was involved? When? Where?

It’s smaller. You have millions of orders but only thousands of products and hundreds of thousands of customers. Dimension tables are narrow.
It contains attributes. Descriptive fields you filter or group by:product_category, customer_city, acquisition_channel.
It changes slowly. A customer’s city might change. A product might be re-categorised. But these changes are infrequent — hence the term “Slowly Changing Dimension” (SCD).

A real e-commerce example

Here’s how a star schema looks for an e-commerce order analytics use case. The fact_orders table is your central table. It has one row per order line item, and it contains:

order_id, order_line_id — identifiers
customer_id, product_id, date_id, channel_id — foreign keys to dimensions
quantity, revenue, discount — the measures

To answer the question “What was our revenue from Sydney customers buying Accessories in Q1?”, your query joins fact_orders todim_customer (filter: city = ‘Sydney’), todim_product (filter: category = ‘Accessories’), and todim_date (filter: quarter = 1). Three joins, all straightforward, and your BI tool can generate this automatically.

Why this matters for performance

The alternative to a star schema is a wide, denormalised “flat table” — every column in one place. order_id, customer_name,customer_city, product_name, product_category,revenue — all in one massive table.

Flat tables feel simple at first. But:

Customer data is duplicated in every order row. If a customer changes their city, you have to update millions of rows.
Column count explodes. One table for orders, customers, products, dates, channels, campaigns — you end up with 80+ columns, most of which you never query together.
Metric definitions diverge. Analyst A calculates revenue including discounts. Analyst B excludes them. Neither is wrong — the flat table doesn’t enforce a single definition. A star schema with a well-defined fact table does.

The key insight: a star schema isn’t about performance alone — it’s about creating a single, agreed-upon definition of your business metrics. When fct_orders.revenue means one specific thing and every dashboard uses it, your numbers stop disagreeing.

How this connects to dbt

In a dbt project, the star schema maps neatly to the marts layer:

models/marts/core/fct_orders.sql — your fact table, joining and aggregating from staging models
models/marts/core/dim_customers.sql — your customer dimension
models/marts/core/dim_products.sql — your product dimension

dbt’s ref() function handles the dependencies automatically. Build the staging models first, then the dimensions, then the facts. The lineage graph makes this hierarchy visible to everyone on the team.

When to use a star schema (and when not to)

The star schema is the right choice when:

You have a clear set of business events to measure (orders, sessions, payments)
Multiple teams or dashboards query the same data
You want consistent metric definitions across reports
Your BI tool needs a well-structured schema to auto-generate queries

It’s overkill when:

You’re doing one-off analysis or exploration — a flat query is faster to write
Your data is truly simple — one source, one team, one dashboard
You’re at a very early stage and the schema changes weekly — model it properly once things stabilise

For most growing companies with more than one analyst and more than one dashboard, the star schema pays for itself quickly. The upfront modelling work is replaced by dramatically faster query writing, consistent numbers, and a data warehouse that new team members can understand without a 2-hour onboarding session.

Facts, Dimensions, and the Star Schema: Data Modelling for Non-Engineers

The analogy: receipts and catalogues

What is a fact table?

What is a dimension table?

A real e-commerce example

Why this matters for performance

How this connects to dbt

When to use a star schema (and when not to)

Working on something similar?