What is multiversion concurrency control?
Multiversion concurrency control (MVCC) is a database optimization technique that creates duplicate copies of records so that data can be safely read and updated at the same time.
With MVCC, DBMS reads and writes don’t block each other.
What are the benefits of an MVCC architecture?
When implemented properly by a DBMS, multiversion concurrency control provides the following benefits:
- Diminished need for database locks
- Fewer database access contention issues
- Improved read access performance
- Continued record isolation for write operations
- Reduced number of database deadlocks
How does an MVCC database work?
While each DBMS vendor is free to implement MVCC in their own way, multiversion concurrency control usually works like this:
- Every database record has a version number
- Concurrent reads happen against the record with the highest version number
- Write operations operate on copy of the record, not the record itself
- Users continue to read the older version while the copy is updated
- After the write operation is successful, the version id is incremented
- Subsequent concurrent reads use the updated version
- When a new update occurs, a new version is again created, continuing the cycle
What is the difference between MVCC vs locking?
Unlike traditional a DMBS, multiversion concurrency control does not lock a record when a write operation is about to occur. Instead, a new version of the record, with an incremented version number, is created.
Users can continue to read the old version of the record while the new record is transitionally edited and updated. This eliminates the need for locks, and it eliminates contention and deadlock issues.
When the new version of the record is committed to the database, all future read operations will work on the updated version. New write operations will again create a new version, continuing the cycle.
What are the drawbacks to multiversion concurrency control (MVCC)?
While MVCC offers many benefits, there are two key drawbacks to multiversion concurrency control:
- Concurrent update control methods are difficult to implement
- The database will grow in size and become bloated by multiple versions of DBMS records
For the user, or even the developer, the complexity associated with the implementation of MVCC concurrency control methods is completely hidden, as the functionality is provided by the database vendor.
The developer can write SQL, and the end user can consume applications as they normally would. The fact that the DBMS uses multiversion concurrency control behind the scenes is completely transparent to them.
PostgreSQL MVCC multiversion concurrency control
Every vendor has their own strategy for minimizing the main drawback of MVCC, which is the ever expanding size of the database given the number of record versions that can get created.
To deal with version bloat, a PostgreSQL MVCC database uses process, cleverly named VACCUM, to identify and delete duplicate records created by the multiversion concurrency control process that are no longer needed.
Addressing the MVCC database space exceeded problem
The PostgreSQL MVCC VACCUM process solves the version bloat problem, but it can be resource intensive when it runs, causing locks of its own as it inspects the state of each record in the database.
Despite the fact that it may consume extra resources, RDBMS admins should not turn PostgreSQL MVCC VACCUM process off, or else the system will eventually generate an MVCC database space exceeded error, and the database will need to be taken offline in order to resolve it.