You need to know how many orders shipped to Australia last quarter. 50 million rows in your database. PostgreSQL takes 42 seconds. The same query in BigQuery takes 0.8 seconds. Same data, same SQL, same result — 52× faster. This isn't magic or better hardware. It's a fundamentally different way of storing data on disk.
The analogy: the filing cabinet vs. the card index
Imagine you manage records for 10,000 employees at a large company.
Row-oriented (the filing cabinet): One physical folder per employee. Each folder contains their name, age, salary, department, hire date — everything together. To find the average salary for everyone in Engineering, you must open every single folder, find the salary field, ignore everything else, and calculate. You open 10,000 folders to get one number.
Columnar (the card index): One index drawer per data field. All salaries in one drawer. All departments in another. All hire dates in a third. To find average Engineering salary, you pull the salary drawer and the department drawer only — you never touch names, ages, or anything else. You read two drawers out of twenty.
That’s the entire insight. Columnar storage lets you read only the columns your query touches, skipping everything else.
How row-oriented storage works
PostgreSQL, MySQL, and most traditional relational databases store data row by row. When you insert a record — say, an order with id=1, customer=“Alice”, amount=45.99, country=“AU” — those values are written adjacent to each other on disk: [1, “Alice”, 45.99, “AU”]. The next row follows immediately after.
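To make “adjacent on disk” concrete, here is a minimal sketch using Python’s struct module — the field widths (5-byte name, 2-byte country code) are invented for illustration, not any database’s real layout:

```python
import struct

# One row, all fields packed next to each other — hypothetical fixed widths:
# 4-byte int id, 5-byte name, 8-byte float amount, 2-byte country code.
ROW_FORMAT = "<i5sd2s"

row_bytes = struct.pack(ROW_FORMAT, 1, b"Alice", 45.99, b"AU")
print(len(row_bytes))  # → 19 (4 + 5 + 8 + 2 bytes, no padding)
# The next row's bytes would follow immediately after these on disk.
```

Reading this row back gives you every field at once — which is exactly why single-row lookups are cheap and column scans are not.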
Row-oriented is excellent for:
- Single-row lookups: “Give me order #12345” reads one block, gets all fields instantly.
- Writes and updates: Adding or changing one row touches one location on disk.
- Transactions: ACID compliance is natural at the row level — either the whole row is written or none of it is.
Row-oriented is painful for analytics because:
- SELECT AVG(amount) FROM orders must read every field of every row — including customer name, address, product ID, and 36 other columns you don’t need — just to get to amount.
- At 50M rows with 40 columns, you’re reading ~40× more data than the query actually requires.
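That ~40× figure can be sketched directly: when rows are stored as whole tuples, even a one-column aggregate has to materialise every tuple. A toy illustration (row count shrunk from the article’s 50M):

```python
# Toy row store: each order is a tuple of 40 fields; amount is field index 2.
NUM_COLS, AMOUNT_IDX = 40, 2
rows = [(i, f"customer-{i}", 10.0 + i) + (None,) * (NUM_COLS - 3)
        for i in range(1000)]

# AVG(amount): the query needs 1 field per row, but the whole 40-field
# tuple is read to reach it.
avg = sum(row[AMOUNT_IDX] for row in rows) / len(rows)

fields_touched = len(rows) * NUM_COLS   # what a row store reads
fields_needed = len(rows)               # what the query actually uses
print(avg, fields_touched // fields_needed)  # → 509.5 40
```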
How columnar storage works
BigQuery, Snowflake, Amazon Redshift, and Apache Parquet store data column by column. All amount values live together. All country values live together. Different columns are stored in separate files or file sections.
When you run SELECT AVG(amount) FROM orders WHERE country = 'AU', the database reads exactly two columns: amount and country. The other 38 columns? Never touched. Never read from disk. Never loaded into memory.
At 50M rows with 40 columns, you’ve just reduced your I/O by 95%.
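The columnar layout can be sketched as one Python list per column — a “drawer” per field. The query below touches only the two lists it names:

```python
# Toy column store: each column lives in its own contiguous list.
columns = {
    "id":       [1, 2, 3, 4],
    "customer": ["Alice", "Bob", "Cara", "Dan"],
    "amount":   [45.99, 12.50, 99.00, 7.25],
    "country":  ["AU", "US", "AU", "DE"],
    # ...imagine 36 more columns here, none of which we will read.
}

# SELECT AVG(amount) FROM orders WHERE country = 'AU'
matches = [amt for amt, c in zip(columns["amount"], columns["country"])
           if c == "AU"]
result = sum(matches) / len(matches)
print(result)  # only "amount" and "country" were ever touched
```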
Why columnar storage is also cheaper
A hidden benefit: columns compress dramatically better than rows.
Imagine a country column in your orders table with 50M rows. Most countries appear thousands of times. A columnar database can use run-length encoding: instead of storing “AU”, “AU”, “AU”… 400,000 times, it stores “AU” × 400,000. One entry. The column shrinks dramatically.
Row-oriented storage can’t do this — adjacent values are different fields with different types and wildly different values, so compression is far less effective.
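Run-length encoding itself is simple enough to sketch in a few lines; itertools.groupby does the run detection:

```python
from itertools import groupby

def rle_encode(values):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    return [(v, len(list(run))) for v, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [v for v, n in pairs for _ in range(n)]

country = ["AU"] * 5 + ["DE"] * 2 + ["US"] * 3  # stand-in for 400,000 "AU"s
encoded = rle_encode(country)
print(encoded)  # → [('AU', 5), ('DE', 2), ('US', 3)]
assert rle_decode(encoded) == country
```

Note that RLE pays off most when equal values sit next to each other, which is why columnar systems often sort or cluster data within storage blocks before encoding.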
Going deeper: why columnar is even faster than it sounds
Column pruning (reading fewer columns) is just the first layer. Two more reasons columnar systems are fast:
Predicate pushdown: The database reads your filter column first, identifies which file segments (called “row groups”) contain matching values, and then reads other columns only for those segments. If only 5% of rows have country = 'AU', BigQuery skips 95% of the data for every other column. The filter happens at the storage layer, before data even enters memory.
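The skipping logic can be sketched with per-row-group min/max statistics — the same idea Parquet footers and warehouse block metadata use. The group boundaries and data here are invented, and the pruning assumes reasonably clustered data:

```python
# Each "row group" carries min/max stats for its filter column, computed at
# write time. A reader consults the stats before touching the group's data.
row_groups = [
    {"country_min": "AU", "country_max": "BR", "data": ...},
    {"country_min": "CA", "country_max": "FR", "data": ...},
    {"country_min": "GB", "country_max": "US", "data": ...},
]

def groups_to_read(groups, value):
    """Keep only groups whose [min, max] range could contain `value`."""
    return [g for g in groups
            if g["country_min"] <= value <= g["country_max"]]

# WHERE country = 'AU': only the first group can possibly match,
# so the other two are never read from disk.
print(len(groups_to_read(row_groups, "AU")))  # → 1
```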
Vectorised execution: Modern CPUs have SIMD instructions — single operations that process a whole batch of values at once (for example, 8 or 16 values per 256- or 512-bit register). Columnar databases store values of the same type contiguously, making it easy to apply these instructions. Processing prices a register-full at a time is much faster than processing them one by one.
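Python can’t issue SIMD instructions directly, but the shape that enables them — same-typed values packed contiguously — can be shown with the standard-library array module (NumPy does the same with real vectorised kernels underneath):

```python
from array import array

# A columnar "amount" column: one machine type, contiguous in memory.
amounts = array("d", [45.99, 12.50, 99.00, 7.25])
print(amounts.itemsize)  # → 8 (bytes per double, packed back to back)

# A row store interleaves types (int, str, float, ...), so no such uniform
# block of doubles exists for the CPU to sweep through in wide batches.
total = sum(amounts)     # a tight C-level loop over the contiguous buffer
print(round(total, 2))   # → 164.74
```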
OLTP vs. OLAP — the real distinction
You’ll see these terms constantly:
- OLTP (Online Transaction Processing): Many small reads and writes, one row at a time. Row-oriented. PostgreSQL, MySQL, DynamoDB, Aurora. Your application database.
- OLAP (Online Analytical Processing): Few large reads, scanning many rows but few columns. Columnar. BigQuery, Snowflake, Redshift. Your analytics warehouse.
The most common pattern for growing companies: use both. Keep your operational data in PostgreSQL (your app writes individual records to it constantly). Replicate it to BigQuery or Snowflake on a schedule — nightly export or real-time CDC (Change Data Capture). Run all your analytics there.
The practical rule of thumb: If you’re running queries that aggregate more than 1 million rows — counts, averages, GROUP BYs — you want a columnar database. If you’re reading or writing individual records (a user profile, a payment, an order), stay with row-oriented. The two are complementary, not competing.
What this means for your stack
You don’t need to choose one or the other. The standard architecture for data-driven companies looks like this:
- Source of truth: PostgreSQL or MySQL — your app writes here. Fast reads and writes for individual records.
- Analytics warehouse: BigQuery or Snowflake — your analysts query here. Columnar, compressed, designed for aggregate queries at scale.
- The connector: A pipeline (Fivetran, Airbyte, custom Airflow job) that moves data from PostgreSQL to BigQuery, typically every hour or nightly.
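A real connector is a product (Fivetran, Airbyte) or an orchestrated job, but its core loop can be sketched: pull rows changed since the last sync from the OLTP side, pivot them into columns, hand them to the warehouse loader. Every name here — the functions, the updated_at watermark column — is a hypothetical stand-in:

```python
from datetime import datetime, timezone

def extract(source_rows, since):
    """Incremental extract: rows updated after the last sync watermark."""
    return [r for r in source_rows if r["updated_at"] > since]

def to_columnar(rows):
    """Pivot row dicts into column lists — the shape the warehouse wants."""
    if not rows:
        return {}
    return {key: [r[key] for r in rows] for key in rows[0]}

# Hypothetical OLTP snapshot (in practice: a query against PostgreSQL).
orders = [
    {"id": 1, "amount": 45.99, "country": "AU",
     "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "amount": 12.50, "country": "US",
     "updated_at": datetime(2024, 1, 2, tzinfo=timezone.utc)},
]
watermark = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
batch = to_columnar(extract(orders, watermark))
print(batch["id"])  # → [2] — only the row changed since the last sync moves
```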
That pipeline — moving data from your OLTP system to your OLAP system — is often one of the first things a data engineer builds. And the reason it exists should now be clear: you’re moving data from a format optimised for your app to a format optimised for your analysts.