SymmetricDS 1.0.0 Released - synchronizes databases

  1. SymmetricDS is open source (LGPL), web-enabled, database-independent data synchronization software. It uses web and database technologies to replicate tables between relational databases in near real time. The software is designed to scale to a large number of databases, work across low-bandwidth connections, and withstand periods of network outage.

    Features
    • Data Channels - Table synchronizations are grouped into independent channels
    • Guaranteed Delivery - Synchronized data is guaranteed to arrive at the target destination. If a synchronization fails, the same batch of data is retried until it succeeds or until someone intervenes manually. Synchronization is halted only for the failed channel; other channels continue to synchronize.
    • Transaction Aware - Data updates are recorded and replayed with the same atomicity
    • Centralized Configuration - All configuration is downloaded from a central registration server
    • Multiple Deployment Options - Standalone engine, web application, embedded software component
    • Data Filtering and Rerouting - Allows for localized passwords and sensitive data filtering/routing
    • HTTP Transport - Pluggable transport defaults to Representational State Transfer (REST-style) HTTP services
    • Payload Compression - Optionally compresses data on transport
    • Notification Schemes - Push (trickle-back data) or Pull (trickle-poll data) changes
    • Symmetric Data Protocol - A fast streaming data format that is easy to generate, parse, and load
    • Plug-In API - Add customizations through extensions and plug-in points
    • Two-Way Table Synchronization - The same table can be synchronized both to and from the host system while avoiding update loops
    • Database Versioning - Specify data synchronization by version of target database
    • Auto Database Creation - Optionally allow creating and upgrading of database schema
    • Embeddable - Small enough to embed or bootstrap within another application (e.g. a POS application)
    • Multiple Schemas - Supports multiple database schemas naturally through the existence of Data Channels
    • Primary Key Updates - Captures the "before" and "after" data being changed, allowing updates to primary key data
    • Remote Management - Administration through a Java Management Extensions (JMX) console
    • Remote Database Administration - SQL can be delivered and run at remote databases via the synchronization infrastructure
    • Initial Data Load - Prepare the satellite database with an initial or recovery load of data

    Threaded Messages (17)

  2. The feature set of SymmetricDS sounds great! I wonder if it is possible to restrict the rows to be synchronized with a where clause. I want to build a multi-site system, where some nodes are synchronized only to certain other nodes, but synchronization must not be transitive. For example, if A is synchronized to B and B to C, I don't want data from A to end up in C as a side effect. Is this possible? Regards, Holger Engels
  3. Thanks for the interest. SymmetricDS supports exactly what you described. The documentation is still a work in progress, but it sounds like you are interested in the Trigger table. Database triggers are created from the metadata provided in this table. You can accomplish what you are looking to do by specifying node_select criteria or sync_on_*_condition(s); a rough sketch follows below.
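
    As an illustrative sketch only (the column names sync_on_*_condition and node_select come from this thread; the sym_trigger table name, any prefix, the example table 'customer', and the exact expression syntax are assumptions, not the documented 1.0.0 schema):

      -- Illustrative sketch, not the exact 1.0.0 schema.
      INSERT INTO sym_trigger
        (source_table_name, channel_id, sync_on_update_condition, node_select)
      VALUES
        ('customer',                  -- table whose changes are captured
         'customer_channel',          -- data channel the changes travel on
         'new.region_id is not null', -- capture an update only when this condition holds
         'region_id = ''EAST''');     -- deliver captured rows only to nodes matching this criteria
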
  4. I'll probably try this in an upcoming project. It looks like an awesome project.
  5. Syncing with a local DB

    I have an application running on multiple JVMs and nodes, and I would like a lot of the data stored locally for performance, possibly even in memory using Apache Derby. But I need all these local copies to synchronize with the central Oracle cluster periodically when the data changes. Is this possible?
  6. Re: Syncing with a local DB

    Yes, this is possible. You would deploy an instance of SymmetricDS per 'local' database with a node_group_id of 'local' (for example). You would also deploy an instance of SymmetricDS configured with a node_group_id of 'origin' (for example) with a DataSource pointed to the Oracle cluster. You would then configure a node_group_link to 'pull' data, with the source_node_group_id equal to 'origin' and the target_node_group_id equal to 'local'. The 'pull' frequency is configurable; a rough sketch of the configuration follows below. If you are running in a web container, you can deploy the SymmetricDS instances as WARs. You could also deploy them as a standalone process or bootstrap SymmetricDS as part of your application. We plan on supporting Apache Derby eventually, but 1.0.0 does not have a DerbyDialect. The Getting Started tutorial should help get you started. Feel free to post additional questions at the Forum!
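
    As a rough sketch of that configuration (node_group_link and its source/target columns are named in this thread; the node_group table, any sym_ prefix, and the data_event_action value convention are assumptions, not the exact 1.0.0 schema):

      -- One central 'origin' group and many 'local' nodes that pull from it.
      INSERT INTO node_group (node_group_id, description)
      VALUES ('origin', 'Central Oracle cluster');

      INSERT INTO node_group (node_group_id, description)
      VALUES ('local', 'Per-instance local databases');

      -- 'local' nodes pull changes from 'origin' on a configurable schedule
      -- ('W' assumed to mean "wait for the target to pull"; 'P' would mean push).
      INSERT INTO node_group_link
        (source_node_group_id, target_node_group_id, data_event_action)
      VALUES ('origin', 'local', 'W');
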
  7. Additional dialects

    Is there any plan or roadmap for implementing additional dialects?
  8. Re: Additional dialects

    Yes, additional dialects are a priority for the next release. We are working on PostgreSQL, MS SQL Server, and Apache Derby. We have a roadmap posted so you can see what other features and bug fixes are planned.
  9. Does SymmetricDS support undo/redo log replication as well as the trigger-based approach? If not, how much additional performance overhead can one expect on the source database?
  10. Using database triggers is the only approach SymmetricDS supports. We keep the trigger code to a minimum so the triggers are efficient. Capturing data instead of statements can be an advantage. For example, if you update data in the source database that is missing in the destination database, the update can be translated into an insert (called a "fallback insert"); a schematic illustration follows below. For performance-sensitive tasks like a bulk load of large tables, it is advisable to disable the triggers. Having said this, we routinely bulk-load 200,000 rows into tables on an Oracle RAC without noticing a performance problem caused by the triggers. The same production system also loads over 7 million rows of transactional data per day. After SymmetricDS was installed on the system, there was no increase in utilization or degradation in performance that we could detect. To address this question in the future, I will try to quantify the overhead by posting benchmarks on our website. Thank you for the excellent question.
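
    Schematically, a fallback insert amounts to something like this (an illustration of the idea only, not SymmetricDS's actual loader code; the table and values are hypothetical):

      -- The loader first applies the captured update at the destination...
      UPDATE customer
         SET name = 'Acme Corp', region_id = 'EAST'
       WHERE customer_id = 42;

      -- ...and if zero rows were affected (the row does not exist there),
      -- it falls back to inserting the same values instead.
      INSERT INTO customer (customer_id, name, region_id)
      VALUES (42, 'Acme Corp', 'EAST');
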
  11. Using database triggers is the only approach SymmetricDS supports.

    OK, I understand. I was thinking of a comparison between the trigger approach and redo/undo log replication (like SharePlex or Oracle Data Guard), where the latter theoretically places a much lower load on the source database and moves much of the replication overhead outside the scope of the original transaction.
  12. Some Questions

    This project looks really promising; we are in desperate need of two-way replication. I have some questions, though:

    1) Is PostgreSQL really on the roadmap? You mention it in your response in this forum, but it's not on the roadmap on the site.

    2) If a client can push and pull data to the server, what is the purpose of labeling them server/client? I would think that if you have a cluster of databases all keeping data in sync, you would just label every machine as a "node". I am not so much concerned about the nomenclature as I am interested in what I am missing conceptually. Why designate one machine the server?

    Thank you, Jacob
  13. Re: Some Questions

    Yes, we recently decided to add PostgreSQL because of the interest. I just added it to the roadmap on the site. Thanks for pointing that out. I agree that the client and server terms are used loosely, usually to describe who initiated the connection. In fact, an instance of SymmetricDS is called a node. We are primarily using SymmetricDS to sync subsets of data between retail stores and a central office server, so we might be throwing around "client" and "server" because of the configuration we use. But your configuration might sync all data in both directions between nodes.
  14. performance question

    Hi, after reading the documentation of SymmetricDS, a performance question came to my mind. I'm wondering how you know which data has already been replicated to a particular node and which hasn't. I got the impression that for every changed record in the master database you create one data_event record per client node. If I have 1000 client nodes and I change a single record in the master database, 1000 data_event records would be created in the master db. Am I right? Isn't it a performance issue? Thanks, Petr Matejka
  15. Re: performance question

    Yes, there will be some overhead. As you probably noticed, the row data is only inserted once. The data_event table is as sparse as we could make it: data_id, event_id, and batch_id. We had to make some sacrifices in order to guarantee delivery, similar to the way a durable JMS message would work. The performance overhead is exactly what you would expect. If you are doing a bulk load, the triggers do allow you to specify a condition under which they don't capture data. The Oracle and MySQL dialects also support setting a database session variable that prevents the trigger logic from executing.
  16. Re: performance question

    Yes, there will be some overhead. As you probably noticed, the row data is only inserted once. The data_event table is as sparse as we could make it: data_id, event_id, and batch_id. We had to make some sacrifices in order to guarantee delivery, similar to the way a durable JMS message would work.
    So, if a transaction generates one insert and I want it replicated to 10 nodes, this will generate 11 additional inserts, all within the scope of the actual business transaction, while the client is sitting there waiting for it to complete? And isn't it actually quite possible, even reasonable, to implement guaranteed delivery without this overhead?
  17. Re: performance question

    In your example, one insert would cause the trigger to insert 1 Data and 10 DataEvents; a rough illustration of this bookkeeping follows below. The DataEvents are small and relate a Data to a Node so it can be synchronized and tracked through the system. The configuration allows the user to specify an expression on the row in order to select the Nodes that will receive the Data. In the trigger, the old and new values are already available, so it's a natural place to run the expression and generate DataEvents. The model needs the DataEvent in order to route Data to nodes, and it needs the Batch to keep track of statistics and acknowledgments for the data sent. The model supports our features and makes troubleshooting easy. It's possible we could move this overhead out of the trigger and into the OutgoingBatchService instead. There would be more overall processing, but less of it associated with the online transactions that originally modified data. Another possibility is an option to perform just replication for those looking for higher performance but limited features. This is definitely something I want to think about more for a future release. I appreciate your questions and challenges to help us improve the software.
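
    As a rough illustration of that bookkeeping (column lists here are simplified and partly assumed; the earlier reply names data_id, event_id, and batch_id, and each event also has to identify its target node, so a node_id column is shown here for illustration):

      -- Illustration only, not the actual trigger body.
      -- The changed row is captured once...
      INSERT INTO data (data_id, table_name, event_type, row_data)
      VALUES (1001, 'customer', 'U', '"42","Acme Corp","EAST"');

      -- ...and one small data_event row is written per target node
      -- (10 nodes -> 10 rows); batch_id is assigned when the events are
      -- grouped into an outgoing batch.
      INSERT INTO data_event (data_id, node_id, batch_id) VALUES (1001, 'store-001', NULL);
      INSERT INTO data_event (data_id, node_id, batch_id) VALUES (1001, 'store-002', NULL);
      -- ... one row per remaining node ...
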
  18. Re: performance question

    Another possibility is an option to perform just replication for those looking for higher performance but limited features. This is definitely something I want to think about more for a future release.
    OK, sounds good. The typical scenarios for me are either a heavily loaded OLTP application that needs to replicate changes to a reporting database, or a set of regional databases that need to replicate changes to a central database. The trigger approach might be OK for the second scenario, but it is pretty much impossible to use for the first. Thanks for your answers.