ACID vs BASE: The Database Properties Every Data Engineer Should Know

Every database makes promises. When you commit a transaction, it tells you what it guarantees about that data — whether it will survive a crash, whether it will be immediately visible to other users, whether a partial failure leaves garbage behind. ACID and BASE are two fundamentally different sets of promises. Knowing which one your system makes — and which one it does not make — changes how you design pipelines, handle failures, and trust your data.

The analogy: a bank vault and a postal service

ACID is a bank transfer. When you send money to someone, the bank guarantees that either the full amount leaves your account AND arrives in theirs, or nothing happens at all. There is no scenario where the money leaves your account and disappears in transit. The bank enforces this guarantee at every level — hardware, software, network. It is slower than handing someone cash on the street, but the guarantee is absolute.

BASE is a postal service during peak season. Your package will get there eventually. The tracking system might show “in transit” on one website and “out for delivery” on another at the same moment, because different nodes have different views of the current state. That is not wrong; it is just temporarily inconsistent. Eventually the package arrives, the systems reconcile, and everything agrees.

ACID — strong guarantees for relational databases

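ACID stands for Atomicity, Consistency, Isolation, and Durability. These four properties define how a relational database handles transactions. Atomicity means all of a transaction's changes apply or none do. Consistency means each transaction moves the database from one valid state to another, honouring its constraints. Isolation means concurrent transactions do not see each other's intermediate states. Durability means committed changes survive a crash. Together they guarantee that a transaction either happens completely and correctly, or it does not happen at all.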

ACID databases include PostgreSQL, MySQL, Oracle, SQL Server, and SQLite. These are the systems that power the operational side of almost every business: the application database, the payments system, the inventory management system.
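To make atomicity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names alice and bob, and the helper functions are hypothetical, chosen only for illustration. The transfer credits the receiver first, so a failed debit would leave money created out of nothing unless the database rolls the whole transaction back.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, sender: str, receiver: str, amount: int) -> bool:
    """Move `amount` between accounts; return False if the transfer was rolled back."""
    try:
        # `with conn` commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            # The CHECK constraint rejects an overdraft, which aborts the debit.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

If alice tries to send more than she has, the credit to bob that already ran inside the transaction is undone along with the failed debit, so both balances stay exactly as they were.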

BASE — availability over consistency in distributed systems

BASE stands for Basically Available, Soft state, and Eventually consistent. It describes the behaviour of distributed databases that prioritise availability and partition tolerance over strict consistency.

Classic BASE databases include Apache Cassandra, Amazon DynamoDB, and Apache CouchDB. These systems are designed to scale horizontally across many nodes, survive node failures without going down, and handle write volumes that relational databases struggle with.

The trade-off is consistency. In a BASE system, different nodes may return slightly different answers to the same query during the window between a write and its full propagation. This is acceptable for many use cases — a social media like count that reads 1,042 on one server and 1,041 on another for a few seconds causes no harm. A bank balance that reads incorrectly for even a millisecond is a serious problem.
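The window between a write and its full propagation can be illustrated with a toy model. This is a sketch only: the class names and the explicit propagate step are assumptions made for illustration, and real systems replicate in the background using far more sophisticated protocols.

```python
from collections import deque


class Replica:
    """One node holding its own copy of the like count."""

    def __init__(self) -> None:
        self.likes = 0


class EventuallyConsistentCounter:
    """Toy model: a write lands on one replica and reaches the others later."""

    def __init__(self, replica_count: int = 2) -> None:
        self.replicas = [Replica() for _ in range(replica_count)]
        # Queue of (target replica index, delta) updates not yet delivered.
        self.pending = deque()

    def like(self, via: int) -> None:
        """Apply the write locally and queue it for every other replica."""
        self.replicas[via].likes += 1
        for index in range(len(self.replicas)):
            if index != via:
                self.pending.append((index, 1))

    def read(self, via: int) -> int:
        """Read whatever the chosen replica currently believes."""
        return self.replicas[via].likes

    def propagate(self) -> None:
        """Deliver all queued updates, after which the replicas agree."""
        while self.pending:
            target, delta = self.pending.popleft()
            self.replicas[target].likes += delta
```

After a call to like(via=0), reading through replica 0 and replica 1 gives different counts until propagate() runs. That gap is the "soft state" in BASE.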

Not every database fits cleanly into one category. MongoDB, for example, has supported multi-document ACID transactions on replica sets since version 4.0, making it more nuanced than a simple BASE label suggests. Many modern distributed databases have evolved to offer configurable consistency levels — letting you choose the trade-off per operation.

The CAP theorem

In 2000, Eric Brewer proposed what became known as the CAP theorem, which Seth Gilbert and Nancy Lynch of MIT formally proved in 2002. It states that a distributed system can guarantee at most two of three properties simultaneously:

Consistency — every read returns the most recent write
Availability — every request receives a response (not an error)
Partition tolerance — the system continues operating when network partitions occur

In a real distributed system, network partitions are inevitable, so partition tolerance is not optional. The real choice is between Consistency and Availability when a partition occurs. Systems built around ACID guarantees generally choose Consistency: they may refuse to respond rather than return stale data. BASE systems choose Availability: they respond, even if the response might be slightly out of date.
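The choice can be reduced to a few lines. The sketch below is a deliberately simplified model, not how any particular database implements it: Node, its mode flag, and the Unavailable exception are hypothetical names.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk a stale read."""


class Node:
    """A single replica that favours either consistency (CP) or availability (AP)."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = "v1"         # last value this node received
        self.partitioned = False  # True when cut off from the other replicas

    def read(self) -> str:
        # A CP node cannot confirm it holds the latest write while partitioned,
        # so it returns an error instead of a possibly stale value.
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm the latest value during a partition")
        # An AP node always answers, accepting that the value may be out of date.
        return self.value
```

During a partition, the AP node still returns "v1" even if a newer write exists elsewhere, while the CP node raises Unavailable. Both behaviours are correct by design; they simply make different promises.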

Why this matters for data engineers

Most data engineers work primarily with ACID systems on the source side and something closer to BASE on the data platform side. This gap is the root cause of several common pipeline problems.

Your application database (Postgres, MySQL) is ACID. When your ingestion tool (Fivetran, Airbyte) reads from it and loads into S3 or a cloud warehouse, the load as a whole is not transactional by default. Object storage (S3, GCS) writes each individual object atomically but has no multi-object transaction semantics. If your pipeline crashes partway through writing 1 million rows, the destination might hold 600,000 of them and be missing the other 400,000. There is no rollback.

This is why two engineering patterns matter in data pipelines:

Idempotency — running the pipeline twice should produce the same result as running it once. Design your jobs so a re-run does not create duplicates. Use MERGE or INSERT OVERWRITE instead of plain INSERT where possible.
Deduplication — expect duplicates at the ingestion layer (Bronze) and handle them explicitly in staging (Silver). Do not assume your source will only send each record once.
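Both patterns can be sketched together. The example below uses SQLite's upsert syntax (INSERT ... ON CONFLICT, which needs SQLite 3.24 or newer) as a stand-in for a warehouse MERGE; the orders table, its columns, and the function names are hypothetical.

```python
import sqlite3


def make_target() -> sqlite3.Connection:
    """Create an in-memory stand-in for the destination table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
    )
    return conn


def dedupe_latest(records: list) -> list:
    """Keep one record per id, preferring the highest updated_at."""
    latest = {}
    for rec in records:
        current = latest.get(rec["id"])
        if current is None or rec["updated_at"] > current["updated_at"]:
            latest[rec["id"]] = rec
    return list(latest.values())


def load_batch(conn: sqlite3.Connection, records: list) -> None:
    """Idempotent load: upsert by primary key instead of a plain INSERT."""
    rows = [(r["id"], r["status"], r["updated_at"]) for r in dedupe_latest(records)]
    with conn:  # one transaction per batch
        conn.executemany(
            "INSERT INTO orders (id, status, updated_at) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET "
            "status = excluded.status, updated_at = excluded.updated_at",
            rows,
        )
```

Calling load_batch twice with the same batch leaves the table in the same state as calling it once, which is what makes a retry after a crash safe.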

The modern solution: open table formats like Delta Lake, Apache Iceberg, and Apache Hudi bring ACID transactions to data lakes. They sit on top of object storage (S3, GCS) and add atomicity, rollback, and time travel. If your data platform uses Databricks + Delta Lake or BigQuery with Iceberg, you get ACID guarantees on what would otherwise be a non-transactional storage layer. This significantly simplifies pipeline design.
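The core idea behind these formats can be shown with a toy: write the data files first, then make them visible in a single atomic metadata step. The sketch below is not how Delta Lake, Iceberg, or Hudi are actually implemented. It runs on a local filesystem, where os.replace is atomic, and the manifest.json file and function names are hypothetical.

```python
import json
import os
import tempfile
import uuid


def read_manifest(table_dir: str) -> list:
    """Return the committed data files (empty if nothing has been committed)."""
    manifest_path = os.path.join(table_dir, "manifest.json")
    if not os.path.exists(manifest_path):
        return []
    with open(manifest_path) as f:
        return json.load(f)


def commit_files(table_dir: str, new_files: dict) -> None:
    """Write data files, then publish them with one atomic manifest swap."""
    visible = read_manifest(table_dir)
    for name, content in new_files.items():
        # A crash in this loop leaves orphan files that no reader can see.
        with open(os.path.join(table_dir, name), "w") as f:
            f.write(content)
    tmp_path = os.path.join(table_dir, f".manifest-{uuid.uuid4().hex}.tmp")
    with open(tmp_path, "w") as f:
        json.dump(visible + sorted(new_files), f)
    # The swap is the commit point: readers see the old list or the new one.
    os.replace(tmp_path, os.path.join(table_dir, "manifest.json"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table_dir:
        commit_files(table_dir, {"part-0.csv": "id\n1\n"})
        print(read_manifest(table_dir))
```

Readers that only trust the manifest never observe a half-written batch. Real table formats add concurrency control, schema tracking, and version history on top of this basic idea.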

When to use ACID databases and when to use BASE

The choice is usually made for you by the use case:

OLTP workloads — orders, payments, user accounts, inventory — require ACID. The cost of an inconsistent read is a wrong bank balance or a double-sold item. Use PostgreSQL, MySQL, or similar.
High-scale write-heavy workloads — event tracking, IoT sensor data, activity logs at millions of events per second — are well suited to BASE systems like Cassandra or DynamoDB. Strict consistency is not required; scale and availability are.
Analytics workloads — your data warehouse (BigQuery, Snowflake) is somewhere in between. It is not a BASE system — it enforces schema and handles failures gracefully — but it is also not designed for OLTP write patterns. It is optimised for read-heavy analytical queries.

The most important practical lesson: understand what guarantees your source system makes, and design your pipeline to handle the gaps. A Cassandra source may deliver the same event twice. A Postgres source with CDC (change data capture) emits changes in commit order, but most CDC pipelines deliver them at least once rather than exactly once, so you still need to handle duplicates and, once events pass through a partitioned queue, out-of-order arrival. Knowing whether you are working with ACID or BASE upstream determines how much defensive engineering your ingestion layer needs.
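One common defensive technique is to apply a change only if it is newer than what you already hold. The sketch below assumes each event carries a monotonically increasing sequence number, such as a log position supplied by the CDC tool; the key, seq, and value field names are hypothetical.

```python
def apply_changes(state: dict, events: list) -> dict:
    """Apply change events so duplicates and late arrivals cannot regress a row.

    Each event is a dict with "key", "seq", and "value". An event is applied
    only when its seq is greater than the seq already stored for that key.
    """
    for event in events:
        current = state.get(event["key"])
        if current is None or event["seq"] > current["seq"]:
            state[event["key"]] = {"seq": event["seq"], "value": event["value"]}
    return state
```

A redelivered event has the same sequence number as the stored row and is skipped, and an event that arrives late has a lower one and is skipped too, so replaying a batch is harmless.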
