I have worked on several applications that required full versioning of all database entities, not just to meet audit log requirements but so that older versions of entities can be viewed at any point. Often this is required for a document-based workflow system, where there is a legal requirement to see who entered what at each stage in workflow. As a document progresses through workflow, new versions are saved out but the old data cannot be removed or overwritten. We've typically implemented this using a combination of custom transaction management code, DAOs and stored procedures, but I'm sure there's a better (simpler, less complex) way of doing this?
Essentially, what is required is that we never UPDATE any data, only INSERT new versions. This is often handled by having a composite primary key (sequence number and version number, which increments) or a sequential primary key and a "parent" ID used to tie versions of an entity together.
Sometimes the older versions are saved in a different table (e.g. an audit table) but sometimes all data has to be in the same table, as which is "logically" the latest version of each entity might depend on who is asking (e.g. using Oracle FGAC/VPD access control).
I'm not sure whether EJB or O-R persistence frameworks could handle this sort of thing, since essentially each entity's primary key is changing mid-transaction?
Has anyone seen any design patterns that cover this sort of thing? I'd be interested what other approaches have been taken on this issue.
I do something very similar. I use JDO for persistence for use the following
1. Every table has a version number and a parent id (which is similir to your solution).
2. A use a mini-rules engine to allow for "callbacks" before and after create/ modify and delete operations (at the object level).
3. I do not insert the new data, but rather insert a copy of the data before it was modified and then modify the existing row. This removes the need for id/version number primary keys as follows:
id version parent name
0 0 null bruce
After Modify (change name from bruce to fred):
id version parent name
0 1 null fred
1 0 0 bruce
The nice thing here is that i do not need to use version number in my keys. Id is always at the "top" of the stack, i.e. it always points to the most recent edit. I use the version numbers for optimistic locking as well and the parent points for traversal.
The only issue I haven't licked is link tables. Should I copy all of the links with the row ? Obviosuly that could be expensive in terms of time and space.
Any ideas ?
Interesting, bruce. How scalable have you found JDO to be, BTW?
In terms of links, I have used two solutions:
1. reference <any> version of an entity
2. reference a specific version of an entity.
With (1) your links remain valid after saving out a new version, whilst with (2) your links "disappear" unless copied as part of the save.
Obviously maintaining referential integrity inside your database gets difficult if your entities are in one of two tables, or the foreign keys are referencing a non-unique column... ;-(
In terms of scalability I have not used JDO (the KODO implementation for
more than a few thousand rows. However, this would really depend on the JDO
In terms of your point 1) I do not think this is viable under a relatrional model. Say you have a one to many relationship of people to address. If I edit a person then my options are to a) let the address objects continue to reference the edited people object or b) copy the address so that both the historic people object and the edited people object have the same address. For your solution we would need to basically make every relationship a time based many to many relationship so I would have links from both the history and edited people object ot the same address objects. N umber of link rows can become quite large.
Note that my solution has the links "disappear" for the older objects while the edited objects maintain their links.
As to maintaining reference integrity that is not an issue if the data is all stored in one table. What gets complicated is that in order to be space efficient you probably want to store "diffs" in terms of adding/removing objects from child collections. So if a user is edited and an address is removed you want to store the edited row and potentially a row saying that the address has been removed rather than copying all of the address except the removed one. Sucks right.
We start to get into true version control issues where we have a starting point and then we save the diffs (or we save the current and have the diffs to go backwards). Its possible but it takes a lot of code to now load objects. I like the save the current and have the diffs to go backward myself as it makes the sql cleaner for most ops (I think).