News: Configuration Management Using Subversion

  1. An ONJava article by Swaminathan Radhakrishnan, "Configuration Management in Java EE Applications Using Subversion," shows how to use the popular source control system to archive changes in data instead of using archive tables.
    Relational databases are generally the preferred choice for storing application data. They help organize, store, and retrieve the data in a very efficient manner. Since application data is stored in these relational databases, applications try to use these databases for tracking historical data as well. The most pervasive approach to storing historical data is to have a time-stamped history table for every table that stores important application entities. Updates made to the main table result in actions that push out the previous values of data to the history tables. This is either done through triggers or by the applications themselves.

    There are several issues with storing the historical information in history tables.
    • Relational databases and the relational data modeling concentrate on efficient data storage and retrieval. The history tables do not model naturally with relational databases.
    • There is no support for versioning from the database. The application has to carefully store the entries into the history tables either through triggers or some other custom techniques.
    • It's up to the application to determine what changed between versions. The retrieval of historical information from the history tables is also specific to the history table storing the data.
    The relational databases should still be the repository for storing and retrieving transactional data. They excel in managing these critical data assets. The shortcomings listed above are confined to storing multiple versions of data entities within the relational data store and tracking such entities over time.
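    To make the history-table approach concrete, here is a minimal in-memory sketch in Java in which the application (rather than a trigger) pushes old values out on each update. The names `ProductTable`, `HistoryRow`, `upsert`, `versionsOf` and `historySize` are hypothetical and do not come from the article; a map and a list stand in for the main table and its history table.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One archived row: the value that was replaced and when it was replaced. */
class HistoryRow {
    final int id;
    final String oldValue;
    final Instant replacedAt;

    HistoryRow(int id, String oldValue, Instant replacedAt) {
        this.id = id;
        this.oldValue = oldValue;
        this.replacedAt = replacedAt;
    }
}

/** Hypothetical main table plus its time-stamped history table. */
class ProductTable {
    private final Map<Integer, String> main = new HashMap<>();
    private final List<HistoryRow> history = new ArrayList<>();

    /** Writes the new value and pushes any previous value to the history table. */
    void upsert(int id, String name) {
        String previous = main.put(id, name);
        if (previous != null) {
            history.add(new HistoryRow(id, previous, Instant.now()));
        }
    }

    /** Rebuilds the version list for one id: archived values first, current value last. */
    List<String> versionsOf(int id) {
        List<String> versions = new ArrayList<>();
        for (HistoryRow row : history) {
            if (row.id == id) {
                versions.add(row.oldValue);
            }
        }
        if (main.containsKey(id)) {
            versions.add(main.get(id));
        }
        return versions;
    }

    int historySize() {
        return history.size();
    }
}

class HistoryTableDemo {
    public static void main(String[] args) {
        ProductTable table = new ProductTable();
        table.upsert(1, "Widget");
        table.upsert(1, "Widget v2");
        System.out.println(table.versionsOf(1)); // [Widget, Widget v2]
    }
}
```

    Note that `versionsOf` is written specifically for this one table, which is the retrieval problem the third bullet above describes.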
    The thrust of the article can be found in this paragraph:
    Applications can use a combination of relational databases and Subversion to satisfy data management and data tracking requirements. Any updates that are made to the critical data assets present in the relational database would be accompanied with a commit into Subversion. Subversion would be the primary data source for use-cases for tracking, while the relational database would be used for all other purposes. An additional advantage is that due to Subversion's copy-modify-merge concept, there is no requirement to lock an object every time it's retrieved from the relational database.
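    A rough sketch of that "update plus commit" pattern might look like the following. It assumes a hypothetical `VersionStore` interface standing in for a Subversion client (it is not a real SVN API), an in-memory implementation so the example runs without a repository, and a map standing in for the relational table. The XML layout and the `customers/<id>.xml` path are likewise invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a Subversion client; not a real SVN API. */
interface VersionStore {
    /** Stores a new revision of the file at path and returns the new revision number. */
    long commit(String path, String content, String message);

    /** Returns every committed revision of path, oldest first. */
    List<String> history(String path);
}

/** In-memory implementation so the sketch runs without a repository. */
class InMemoryVersionStore implements VersionStore {
    private final Map<String, List<String>> revisions = new HashMap<>();
    private long head = 0;

    public long commit(String path, String content, String message) {
        revisions.computeIfAbsent(path, p -> new ArrayList<>()).add(content);
        return ++head;
    }

    public List<String> history(String path) {
        List<String> found = revisions.get(path);
        if (found == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(found);
    }
}

/** Keeps current state in a map (standing in for the relational table). */
class CustomerService {
    private final Map<Integer, String> customerTable = new HashMap<>();
    private final VersionStore versions;

    CustomerService(VersionStore versions) {
        this.versions = versions;
    }

    /** Updates the current row, then records the new state as a revision. */
    long updateEmail(int id, String email) {
        customerTable.put(id, email); // the normal transactional update
        // Serialized snapshot for tracking; no XML escaping, sketch only.
        String xml = "<customer id=\"" + id + "\"><email>" + email + "</email></customer>";
        return versions.commit("customers/" + id + ".xml", xml, "email changed");
    }

    String currentEmail(int id) {
        return customerTable.get(id);
    }
}

class DualWriteDemo {
    public static void main(String[] args) {
        VersionStore store = new InMemoryVersionStore();
        CustomerService service = new CustomerService(store);
        service.updateEmail(42, "old@example.com");
        service.updateEmail(42, "new@example.com");
        System.out.println(service.currentEmail(42));                  // new@example.com
        System.out.println(store.history("customers/42.xml").size()); // 2
    }
}
```

    Current-state reads go to the table, while tracking questions ("what did this record look like before?") go to the version store. The two writes here are not atomic.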
    Other coders have tried (successfully?) to use engines such as Lucene for actual data storage as well. Lucene doesn't really apply version control for data (as far as I know, although it can be bolted on), but the idea of using relational databases solely for looking up references to externally-controlled data can be found in more and more places (such as SVN, as in this article, or possibly even in APIs such as the Java Content Repository.)

    What do you think of the idea?

    Threaded Messages (9)

  2. We actually do this

    Polarion for SVN (http://polarion.com/) actually works this way: it stores the tracker data as well as all configuration in Subversion, so it's possible to track any change made to the tracker or configuration by anyone (e.g. to view a work item's history or to compare baselines of the project tracker).

    Subversion is used as the primary storage, and on top of it we use Lucene and some caching to achieve good search and object-retrieval performance.

    Michal Dobisek
    Polarion Software
  3. We actually do this

    Actually, Oracle DOES provide such capabilities. You can perform an SQL query specifying the point in time to reference, that is, "query the database as it was two days ago," for example (only in SQLish...)

    It does so by using its undo data so it can trace back the database state back to...ummm...whenever ;-) Not sure how it fares performance-wise though.
  4. We actually do this

    This is an interesting point; I didn't know about this Oracle feature, thanks. In our situation there is still a strong advantage in that both the source code and the tracker and configuration data are stored in one place (SVN) in a readable form (XML), which makes the whole thing easier to manage and better integrated.
  5. Re: We actually do this

    ...it can trace back the database state back to...ummm...
    Let me finish that: until ORA-01555 is raised. Or ORA-08177, if in a transaction. ;-)

    Sorry, couldn't resist. Silly jokes aside, could you please explain how it works exactly or post a link to Oracle documentation?
  6. If I'm not completely wrong, SDOs are in essence XML documents and have a similar concept of "data change".
    These data changes could easily be stored in a different table, using an aspect that wraps the DAOs' write methods or any other method.

    Am I missing something?
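
    One way to picture the "aspect that wraps the DAOs' write methods" idea is a JDK dynamic proxy, used here in place of a real AOP framework. This is only a sketch: the `AccountDao` interface, the `save`/`find` method names and the convention that write methods start with "save" are all assumptions made for the example, and a list stands in for the separate change table.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical DAO; the interface and its method names are illustrative only. */
interface AccountDao {
    void save(String id, String owner);

    String find(String id);
}

/** Map-backed DAO standing in for one that talks to a database. */
class MapAccountDao implements AccountDao {
    private final Map<String, String> rows = new HashMap<>();

    public void save(String id, String owner) {
        rows.put(id, owner);
    }

    public String find(String id) {
        return rows.get(id);
    }
}

/** Records every successful call to a method whose name starts with "save". */
class ChangeRecorder implements InvocationHandler {
    private final Object target;
    final List<String> changeLog = new ArrayList<>(); // stands in for the change table

    ChangeRecorder(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Object result;
        try {
            result = method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the DAO's own exception
        }
        if (method.getName().startsWith("save")) {
            changeLog.add(method.getName() + Arrays.toString(args));
        }
        return result;
    }
}

class AuditDemo {
    public static void main(String[] args) {
        ChangeRecorder recorder = new ChangeRecorder(new MapAccountDao());
        AccountDao dao = (AccountDao) Proxy.newProxyInstance(
                AccountDao.class.getClassLoader(),
                new Class<?>[] { AccountDao.class },
                recorder);
        dao.save("a1", "alice");
        dao.find("a1"); // reads are not recorded
        dao.save("a1", "bob");
        System.out.println(recorder.changeLog); // [save[a1, alice], save[a1, bob]]
    }
}
```

    The recorder only captures that a write happened and with which arguments; working out what actually changed between two versions would still be up to the application.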
  7. ACIDity is the challenge

    I've contemplated systems like this in the past (including one that would have been RDBMS + SVN + Search engine) but guaranteeing ACIDity across all of the storage mechanisms is problematic (particularly with SVN, which, AFAIK, cannot be referenced as an XA Resource Manager).

    Even if all of the constituent storage systems can be treated as XA Resource Managers you're then into 2 phase commit land, and that's territory that's best avoided...
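
    To illustrate the gap, here is a small sketch of what an application might fall back on without XA: write to the database, attempt the history commit, and undo the database write by hand if the commit fails. The `HistoryStore` interface and `CompensatingWriter` class are hypothetical, and a map stands in for the RDBMS. Compensation of this kind is not atomicity; it only narrows the window.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a version-control commit that may fail. */
interface HistoryStore {
    void commit(String key, String value) throws Exception;
}

class CompensatingWriter {
    private final Map<String, String> table = new HashMap<>(); // stands in for the RDBMS
    private final HistoryStore history;

    CompensatingWriter(HistoryStore history) {
        this.history = history;
    }

    /** Returns true if both writes succeeded, false if the table write was undone. */
    boolean update(String key, String value) {
        boolean existed = table.containsKey(key);
        String previous = table.put(key, value);
        try {
            history.commit(key, value);
            return true;
        } catch (Exception e) {
            // Compensate by restoring the old state. Other readers may already
            // have seen the new value, and a crash before this line would leave
            // the two stores inconsistent.
            if (existed) {
                table.put(key, previous);
            } else {
                table.remove(key);
            }
            return false;
        }
    }

    String read(String key) {
        return table.get(key);
    }
}

class CompensationDemo {
    public static void main(String[] args) {
        CompensatingWriter ok = new CompensatingWriter((k, v) -> { });
        System.out.println(ok.update("cfg", "v1")); // true

        CompensatingWriter failing = new CompensatingWriter((k, v) -> {
            throw new Exception("repository unreachable");
        });
        System.out.println(failing.update("cfg", "v1")); // false
        System.out.println(failing.read("cfg"));         // null
    }
}
```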
  8. ACIDity is the challenge

    I agree with your points, but I think it's an interesting and very usable solution that would suit a lot of small-budget projects. As a quick solution to requirements such as auditing and versioning, I think you could do a lot worse than using Subversion for historical data.
  9. JSR 170 (see Apache Jackrabbit, the reference implementation) supports JTA for transactions. So I assume this means any JSR 170 compliant repository could participate in transactions with a separate RDBMS.
  10. JCR vs Subversion

    In simple circumstances, say a small number of configured objects, a JCR solution might work quite well. However, when trying to manage a complex software configuration of, say, thousands of interdependent objects, one needs to be able to snapshot configuration space quickly and cheaply. Subversion does this. But, as others have pointed out, Subversion does not give you strong ACID guarantees. Sometimes, of course, such guarantees are unneeded. At other, perhaps most, times they are needed, and that's where systems like Alfresco's upcoming Web Content Management driven and (partially) Subversion-inspired features will come in handy. In brief, Alfresco will be offering the rich versioning semantics of Subversion plus powerful configuration composing features (layering) in an ACID and scalable package.
    Now, in fairness to the JCR, we (Alfresco, I do work for them by the way) hope to get some richer versioning features into JSR-283, the follow-on to the JCR.