Most data quality problems are not caused by bad data. They are caused by not knowing which version of the data you are looking at. Is this table the raw export from Shopify, or the cleaned version, or the one with business logic applied? When the answer is “it depends on which analyst built it,” you have a structure problem. The medallion architecture solves it with a simple rule: data moves through named layers, each with a clear contract, and nothing skips a layer.
The analogy: sorted mail tray, filing cabinet, and executive report
Think of how information flows through an office, starting from the moment mail arrives.
The raw mail arrives in the building's post room — invoices, contracts, PDFs, handwritten notes. This is your source data: what Fivetran or Airbyte loads directly from Shopify, Stripe, or your application database. It is raw and unprocessed.
The sorted mail tray is Bronze. A receptionist opens each envelope, reads the label, and places it in the correctly named slot — “Finance,” “Legal,” “HR.” The content has not changed, but it is now consistently labelled and accessible. In dbt, this is the staging layer: rename the columns, cast the data types, one model per source table. Nothing more.
The filing cabinet is Silver. Someone went through the mail tray, removed duplicates, combined related documents, and filed them by topic and date. In dbt, this is the intermediate layer: deduplicate records, join staging models together, and prepare the shared building blocks that multiple marts will use.
The executive report is Gold. It does not contain raw invoices — it contains revenue by region, outstanding receivables, and cost trends. Business logic is applied. In dbt, these are your fact and dimension tables in the marts layer.
The Bronze layer — staging (stg_)
Bronze is the first dbt model layer. Its job is strictly mechanical: take a source table and make it usable, without applying any logic beyond what is necessary to clean up the source system's quirks.
The dbt Fundamentals course defines three rules for staging models:
- 1:1 with source tables. One staging model per source table, no more, no less.
stg_orders.sqlreads from the raw orders table and nothing else. - Rename and cast only.
o_idbecomesorder_id. AVARCHARdate column becomes a properTIMESTAMP. BasicCASE WHENcategorisation is acceptable. Nothing that requires joining another table. - No joins, no aggregations, no deduplication. These belong in Silver. If you find yourself writing a
JOINin a staging model, stop — it belongs in an intermediate model.
Staging models are typically materialised as views rather than tables, because they are lightweight and always derived directly from the source. Running them costs almost nothing.
Raw source tables — the tables Fivetran or Airbyte load into your warehouse — sit upstream of Bronze. They are referenced in dbt via source()and defined in sources.yml, but dbt does not transform them. Bronze (staging) is the first layer dbt actually creates.
The Silver layer — intermediate (int_)
Silver is where the real cleaning happens. Once you have consistent, correctly typed data coming out of your staging models, intermediate models handle everything that requires combining or further processing that data.
- Deduplication. Source systems often send the same record twice due to retries, webhook replays, or CDC (change data capture) edge cases. Intermediate models deduplicate using
ROW_NUMBER()or similar window functions, so that every downstream model works with clean, deduplicated data. - Joins across staging models. An intermediate model might join
stg_orderswithstg_order_itemsto create a single enriched entity. This join does not belong in staging (which is 1:1 with source) or in marts (where business logic would get mixed with structural joins). - Shared building blocks. If multiple mart models need the same joined or cleaned dataset, build it once in intermediate and reference it with
ref()wherever it is needed.
Intermediate models use the int_ prefix by convention —int_orders_joined.sql, int_order_items_enriched.sql. They are not exposed to BI tools directly. They exist purely to serve the Gold layer.
The Gold layer — marts (fct_ and dim_)
Gold is where business logic lives. Marts are the models that analysts and BI tools query directly. They are built from intermediate models (Silver) and contain the agreed-upon definitions of your company's metrics.
Two types of models live in Gold:
- Fact tables (
fct_). Records of events: orders, payments, sessions, clicks. Narrow and long — mostly foreign keys and measures. Example:fct_orders.sqlwith one row per order line item, containingrevenue,quantity, and keys to dimension tables. - Dimension tables (
dim_). Descriptive context: customers, products, dates, channels. Example:dim_customers.sqlwith one row per customer, containingcity,acquisition_channel,loyalty_tier.
Gold models are typically materialised as tables (not views) because they are queried frequently by BI tools and analysts. Materialising them avoids re-running all the upstream logic on every query.
Why layers matter: the reprocessing argument
The most important property of a layered architecture is what happens when something goes wrong — or when business logic changes.
Suppose a stakeholder says: “We have been calculating revenue including refunds. It should exclude them. Can you fix the last 18 months of data?”
In a layered architecture: you update the Gold model that calculates revenue, re-run it, and the fix propagates everywhere that model is used. Bronze and Silver are untouched. The raw source data is preserved. The correction takes minutes.
Without layers — if analysts query staging models directly, or business logic is scattered across staging and intermediate — the same fix requires tracking down every model that touched revenue logic, and hoping nothing else broke in the process.
The data swamp problem
A data platform without a medallion architecture becomes a data swamp. Models accumulate without structure, and within months nobody knows:
- Which models are safe to query and which are intermediate work-in-progress
- Whether deduplication has been applied
- Which version of business logic produced a given number
- Why a model exists and who maintains it
The medallion architecture is not a technical constraint — it is a team agreement. Bronze handles structure. Silver handles cleanup and joining. Gold handles business logic. When everyone follows the convention, a new team member can open the dbt project and immediately understand where to look.
The practical rules: analysts should only query Gold. BI tools should only connect to Gold. If you find a JOIN in a staging model, move it to intermediate. If you find business logic in an intermediate model, move it to a mart. Keeping each layer to its purpose is what makes the architecture worth having.
Medallion architecture and dbt
dbt does not use the Bronze/Silver/Gold terminology officially, but its recommended project structure maps directly onto it:
models/staging/→ Bronze (stg_ prefix, 1:1 with source)models/intermediate/→ Silver (int_ prefix, deduplication and joins)models/marts/→ Gold (fct_ and dim_ prefixes, business logic)
The raw tables loaded by your ingestion tool (Fivetran, Airbyte) sit upstream of all three layers. dbt reads them via source() references defined insources.yml but does not modify them.
The ref() function enforces the layer contracts automatically. A mart model that tries to ref() a source table instead of a staging model will still run — but dbt's lineage graph makes the skip visible, and code review catches it. The folder structure and naming conventions make violations obvious.
The medallion architecture predates dbt and applies equally to Spark, SQL Server, or any other transformation engine. But dbt is the cleanest implementation — the layer boundaries become folder boundaries, the naming conventions are enforced by convention, and the lineage graph makes the entire flow visible to everyone on the team.