Things they don't tell you about MongoDB


  1. MongoDB is by far the most popular NoSQL database in Brazil (at least judging by the number of blog posts and articles written about it here that I read). It’s really an amazing solution, but what really bothers me is that very few people know about its limitations. So I see the same story repeating itself: people unhappy with it, treating its limitations as if they were bugs.

    This post is about some of its limitations that really caught me by surprise, so that if you are thinking of adopting it you’ll at least be warned about them and can avoid these headaches.

    Hungry for bytes

    This was my first surprise: MongoDB consumes a lot of disk space. This is a consequence of how it avoids disk fragmentation: it pre-allocates space in large files. Here is how it works: when you create a database, a file named [db name].0 is created with (by default) 64 MB of size. When more than half of this file is used, another file named [db name].1 is created with 128 MB. This happens again and again, so files of 256, 512, 1024 and finally 2048 MB will be written to your disk. All subsequent files are 2048 MB in size.
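    The growth pattern described above can be sketched in a few lines of Python (a toy model of the doubling scheme, not the actual allocator):

```python
def preallocated_files(data_mb):
    """Datafile sizes (in MB) that the doubling scheme described above
    would create for `data_mb` MB of data: files start at 64 MB, double
    up to a 2048 MB cap, and a new file is allocated as soon as the
    newest one is more than half full. (Toy model, not the allocator.)"""
    files = [64]
    while data_mb - sum(files[:-1]) > files[-1] / 2:
        files.append(min(files[-1] * 2, 2048))
    return files

# Storing just 200 MB of documents already allocates 448 MB on disk:
print(preallocated_files(200))       # [64, 128, 256]
print(sum(preallocated_files(200)))  # 448
```

    Running it for larger sizes shows the overhead shrinking in relative terms, since every file past the sixth is a flat 2048 MB.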

    If storage space is a constraint for your project you MUST take this into consideration. There’s a commercial solution for this problem named TokuMX. I haven’t tried it, but if it works as advertised, storage consumption can decrease by up to 90%. The repairDatabase and compact commands can also help you in the long run.

    More on http://www.itexto.com.br/devkico/en/?p=44

     

    Threaded Messages (21)

  2. It loses data.

    It doesn't scale.

    What else do you need to know?

    Peace,

    Cameron.

    For  the sake of full disclosure, I work at Oracle. The opinions and views  expressed in this post are my own, and do not necessarily reflect  the opinions or views of my employer.

  3. Sounds scary.

    Can you elaborate or provide links that explain in details?

     

  4. (Wow, do I ever hate the forum software here that allowed me to lose 15 minutes of writing .. let me try again ..)

    It's pretty simple to find tons of information about how MongoDB loses data. While I've read many of these accounts over the years, I don't keep track, so I just use Google to find them easily ;-)

    From http://aphyr.com/posts/284-call-me-maybe-mongodb

    clients for MongoDB didn't bother to check whether or not their writes succeeded, by default: they just sent them and assumed everything went fine. This goes about as well as you'd expect. [..] 42% write loss.

    From http://www.infoq.com/news/2011/11/MongoDB-Criticism

    The operational complexity of configuring a MongoDB cluster is daunting with each component bringing its own caveats ..

    Unfortunately, mongos is complete garbage. Under load, it crashed anywhere from every few hours to every few days. Restart supervision didn't always help b/c sometimes it would throw some assertion that would bail out a critical thread, but the process would stay running. Double fail.

    It got so bad the only usable way we found to run mongos was to run haproxy in front of dozens of mongos instances, and to have a job that slowly rotated through them and killed them to keep fresh/live ones in the pool. No joke.

    From http://pastebin.com/FD3xe6Jt

    Don't use MongoDB
    =================

    I've kept quiet for awhile for various political reasons, but I now feel a kind of social responsibility to deter people from banking their business on MongoDB. Our team did serious load on MongoDB on a large (10s of millions of users, high profile company) userbase, expecting, from early good experiences, that the long-term scalability benefits touted by 10gen would pan out. We were wrong, and this rant serves to deter you from believing those benefits and making the same mistake we did. If one person avoids the trap, it will have been worth writing. Hopefully, many more do.

    Note that, in our experiences with 10gen, they were nearly always helpful and cordial, and often extremely so. But at the same time, that cannot be reason alone to suppress information about the failings of their product.

    From https://groups.google.com/forum/#!topic/mongodb-user/k82ULvZBpkE

    We use mongo as database of our system. But found that some times,
    It will "drop some data" unexpectedly. We had make sure that there are
    even no "remove" functions called in our code. (We are using C++
    interface of mongo. DBClientConnection).  but occasionally, some
    records just missed.

    From http://diegobasch.com/ill-give-mongodb-another-try-in-ten-years

    I’ll Give MongoDB Another Try. In Ten Years.

    I was happily inserting documents into a MongoDB server on my puny AWS Micro instance somewhere in Oregon. It worked just fine for all of three weeks.

    Yesterday I decided to compute some stats, and I discovered that the most recent document was four days old. Hmmm. I checked my script that fetches and inserts documents; it was running and there were no errors in the log. MongoDB seemed fine too. What could be the problem? Long story short, this blog post from three years ago explains it:

    32-bit MongoDB processes are limited to about 2 gb of data.  This has come as a surprise to a lot of people who are used to not having to worry about that.  The reason for this is that the MongoDB storage engine uses memory-mapped files for performance.

    By not supporting more than 2gb on 32-bit, we’ve been able to keep our code much simpler and cleaner.  This greatly reduces the number of bugs, and reduces the time that we need to release a 1.0 product. The world is moving toward all 64-bit very quickly.  Right now there aren’t too many people for whom 64-bit is a problem, and in the long term, we think this will be a non-issue.

    Sure enough, my database had reached 2GB in size and the inserts started failing silently. WTF zomg LOL zombie sandwiches!

    This is a horrendous design flaw for a piece of software that calls itself a database. From the Zen of Python:

    Errors should never pass silently. Unless explicitly silenced.
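    For the arithmetic behind that quoted "about 2 gb" figure: a 32-bit process can address 2^32 bytes in total, and roughly half of that address space is typically unavailable for memory-mapping, which is where the limit comes from:

```python
# Why 32-bit MongoDB tops out at "about 2 gb": pointer arithmetic.
GiB = 2 ** 30
address_space = 2 ** 32          # bytes addressable with a 32-bit pointer
print(address_space // GiB)      # 4 (GiB of total address space)

# Roughly half of that is unavailable for mmap (kernel split, code,
# heap, stacks), leaving on the order of 2 GiB for mapped datafiles:
usable_for_mmap = address_space // 2
print(usable_for_mmap // GiB)    # 2
```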

    From http://cloudcomments.net/2011/11/07/mongodb-against-the-ropes/

    The basis of the trouble is that MongoDB, under certain load conditions, has a tendency to fall over and, crucially, lose data. That begs the question about the quality of the code, the involvement of 10gen and whether or not things will improve over time, or at all.

    From http://www.borntosegfault.com/2013/03/is-mongodb-still-on-course.html

    Jira hall of fame

    After 3 or 4 years, have a look of the different project's Jiras. I picked up 3 classical MongoDB bugs (subjective point of view), still open and marked as blocking/major bugs :
    • Broken indexes [SERVER-8336]
    • A new server added to a Replica Set Fails to start Replication [SERVER-8023]
    • Socket exception [SEND_ERROR] on Mongo Sharding [SERVER-7008]
    [..] Obviously, every solution has its own (stupid) bugs lot. Some of them will occur under very specific conditions and thus become very hard to fix: full moon, Dvorak keyboard layout or exotic linux kernel (THIS looks crazy, isn't it?). But getting a still open "broken indexes" ticket in 2013 is clearly a youth issue for such a project.

    From http://hackingdistributed.com/2013/01/29/mongo-ft/

    Broken by Design: MongoDB Fault Tolerance

    So, let's imagine that you're building an Instagram clone that you'll eventually sell to Facebook for $1B, and see if we can accomplish that with MongoDB. Your site probably has a front-end web server where users upload data, like putting up new pictures or sending messages or whatever else. You can't really drop this data on the ground -- if you could lose data without consequences, you would just do precisely that all the time, skip having to configure MongoDB, and read all those dev blogs. And if you possess the least bit of engineering pride, you'll want to do an honest stab at storing this data properly. So you need to store the data, shown in yellow, in the database in a fault-tolerant fashion, or else face a lot of angry users. Luckily for you, MongoDB can be configured to perform replication. Let's assume that you indeed configured MongoDB correctly to perform such replication. So, here's the $1 billion dollar question: what does it mean when MongoDB says that a write (aka insert) is complete?

    The answer is none of the above. MongoDB v2.0 will consider a write to be complete, done, finito as soon as it has been buffered in the outgoing socket buffer of the client host. Read that sentence over again.
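    The behavior quoted above is easy to model. In the toy sketch below (an illustration, not driver code), w=0 stands in for the old fire-and-forget default and w=1 for an acknowledged write:

```python
class ToyServer:
    def __init__(self):
        self.stored = []
        self.alive = True

    def receive(self, doc):
        if self.alive:
            self.stored.append(doc)
            return True
        return False              # write silently dropped

def insert(server, doc, w):
    ok = server.receive(doc)
    if w == 0:
        return True               # fire-and-forget: always reports success
    return ok                     # w=1: report what the server actually said

server = ToyServer()
acked = 0
for i in range(10):
    if i == 5:
        server.alive = False      # the server dies mid-stream
    if insert(server, i, w=0):
        acked += 1

print(acked)                      # 10 -- the client believes every write landed
print(len(server.stored))         # 5  -- only half of them actually did
```

    Flipping w=0 to w=1 in the loop makes the client see exactly the five failures.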

    From http://hackingdistributed.com/2013/02/07/10gen-response/

    Anyhow, if they're happy with their performance, who am I to complain? It's weird, though, to give up consistency and fault-tolerance for performance, but then to fail at achieving performance as well. If you're going to sell your soul and your data integrity to the devil in exchange for speed, well, make sure he delivers, or else it'll look really bad.

    [..]

    [MongoDB's response] says "we give you what you get." I guess if the system loses data, it's always the developer's fault for not understanding the internal workings of Mongo.

     

    From http://www.kchodorow.com/blog/2010/09/17/choose-your-own-adventure-mongodb-crash-recovery-edition/

    Choose your own adventure: MongoDB crash recovery edition

    If you have a single instance that shut down uncleanly, you may lose data! Use this as a painful learning experience ..

    From https://blog.serverdensity.com/notes-from-a-production-mongodb-deployment/

    There is no full single server durability in MongoDB.

    You can use master/slave replication (ideally with the slave server in a different data centre) but if there’s a failure on the master you need to failover manually.

    From http://forum.spring.io/forum/spring-projects/data/nosql/110470-mongodb-replication-set-facing-some-issues-like-loss-of-data

    Data Loss

    While the data is being written to MongoDB I kill the primary node so
    that secondary node becomes primary.After few seconds I bring back the
    old primary up and now the primary becomes secondary because the old
    secondary had become primary. After all the inserts are done when I
    check the count of messages I find some messages are missing.

    From http://blog.engineering.kiip.me/post/20988881092/a-year-with-mongodb

    This week marks the one year anniversary of Kiip running MongoDB in production. As of this week, we’ve also moved over 95% of our data off of MongoDB onto systems such as Riak and PostgreSQL, depending which solution made sense for the way we use our data. This post highlights our experience with MongoDB over the past year.

    Although MongoDB has a lot of nice features on the surface, most of them are marred by underlying architectural issues. These issues are certainly fixable, but currently limit the practical usage we were able to achieve with MongoDB.

    We never attempted to horizontally scale MongoDB since our confidence in the product was hurt by the time that was offered as a solution, and because we believe horizontally scaling shouldn’t be necessary for the relatively small amount of ops per second we were sending to MongoDB. Over the past 6 months, we’ve “scaled” MongoDB by moving data off of it. ..

    And so on.

    Peace,

    Cameron.

     

     


  6.  

    Allow me to use Cameron's brilliant MongoDB analysis algorithm to compare Oracle, Coherence and MongoDB.

     

    I just googled "oracle losing data". 24.3 million hits.

    Then I googled "mongodb losing data" and got only 65,000 hits.

    SO CLEARLY MONGO IS MORE RELIABLE. LOL.

    Seriously CP. Seriously. Still? The same schtick. All these years. The sad part is how far it got you.

    I admire your low IQ thoughtless drivel to personal wealth ratio. Look how far you got with so little content. 

     

    Then I googled "Coherence sucks"... 1.7 million hits, while "MongoDB sucks" only has 43,000 hits. 

     

    If you can't find a blog complaining about some tech, then you are not looking very hard. 

     

    MongoDB has some warts. It fixed some. Some are being fixed. It is not right for every application. 

    Developers have to read the documentation before using it, but the same can be said for Oracle and Coherence. 

     

    I was in a meeting the other day. We wanted to use MySQL to stage some data so QA could see the results for testing. It's a high-speed-writes app. 

    A dev wanted to use MongoDB... I was like NOOOO! Then he said MySQL would not scale... I WAS LIKE ARGH!

     

    But you spilling this FUD does not help either. 

     

    There are many apps where MongoDB would work out fine. 

     

  7. bullshit[ Go to top ]

    MongoDB does not lose data. And yes, MongoDB does scale. I understand the lies you as an Oracle drone have to spew or you could be fired by the mothership. It is amazing how scared Oracle is of little ole MongoDB.

  8. Nice FUD. Real quality coming from an Oracle "manager". Are revenues being hit by MongoDB uptake so much that they have people like you spouting this stuff as well? Or maybe it's down to the amount of time you spend at that company and the corporate mantra takes over?

  9. Trying to change the subject?[ Go to top ]

    Hi Neil -

    Nice FUD. Real quality coming from an Oracle "manager". Are revenues being hit by MongoDB uptake so much that they have people like you spouting this stuff as well? Or maybe it's down to the amount of time you spend at that company and the corporate mantra takes over?

    Why are you attacking me? Either what I say has technical merit, or it doesn't. Disagree with my technical points, if you'd like, but try to leave the ad hominem attacks where they belong: Congress. ;-)

    Peace,

    Cameron.

     

  10. mongo bongo[ Go to top ]

    I agree with Cameron, tons of user experience to back this up, don't know why the kids are so irate today.

  11. loses data?[ Go to top ]

    Having used it for the last 2.5 years on a big data project, and having built or beta tested a lot of the new drivers that were being built at the time... I have been pretty happy with the performance, the disk space and, most importantly, with not losing my data. In fact I have never heard of data loss anywhere online... unless of course it's a programming/config issue. I would love to hear more details. The post seems to be cross-selling another product more than explaining MongoDB's problems.

  12. Hi Nathaniel -

    MongoDB does not lose data.

    MongoDB does lose data. Just because some users have not lost data yet does not mean that MongoDB does not lose data. MongoDB was simply not built to not lose data. Even calling it a "database" is an insult to all the good database software out there that was designed explicitly to not lose data -- and many of those databases are also free and open source!

    In addition to the dozen links above, see: http://inthecloud247.com/2013/02/17/mongodb-still-devnull-webscale/

    Little things like:

    • "The data never left the front-end box, no packet was sent on the network, and yet MongoDB believes it to be safely committed for posterity."
    • "MongoDB drivers now internally set WriteConcern to 1 instead of 0 by default. While this is a change for the better, it is still not sufficient. To wit: This change does not make MongoDB fault-tolerant. A single failure can still lead to data loss."
    • "MongoDB can lose data in many startling ways"
    • "MongoDB actually once deleted the entire dataset"
    • "The real problem is that so many of these problems existed in the first place. Database developers must be held to a higher standard than your average developer. Namely, your priority list should typically be something like: 1. Don’t lose data, be very deterministic with data, 2. Employ practices to stay available, [..] 5. Raw req/s per resource.10gen’s order seems to be, #5, then everything else in some order. #1 ain’t in the top 3."

    So don't tell me that MongoDB does not lose data. In the face of overwhelming evidence, you cannot claim that MongoDB does not lose data. MongoDB loses data. End of story.

    And yes, MongoDB does scale.

    I should have clarified my comments. MongoDB can obviously scale local (i.e. inconsistent) reads -- what most people would refer to as cached replicated data; what it cannot scale are consistent or durable writes. MongoDB is not architected to scale, and it struggles under scale. Again, see the links above.
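    The distinction between scaling inconsistent reads and consistent writes can be sketched with a toy replica set (purely illustrative; names like ToyReplicaSet are made up, and real replication is far more involved):

```python
class Replica:
    def __init__(self):
        self.data = {}

class ToyReplicaSet:
    """Toy model of read scaling with replication lag: the primary takes
    writes, secondaries serve cheap local reads that may be stale."""
    def __init__(self, n_secondaries):
        self.primary = Replica()
        self.secondaries = [Replica() for _ in range(n_secondaries)]
        self.oplog = []            # writes not yet replicated

    def write(self, key, value):
        self.primary.data[key] = value
        self.oplog.append((key, value))

    def replicate(self):           # apply the backlog to the secondaries
        for key, value in self.oplog:
            for s in self.secondaries:
                s.data[key] = value
        self.oplog.clear()

rs = ToyReplicaSet(2)
rs.write("x", 1)
rs.replicate()
rs.write("x", 2)                   # not replicated yet

print(rs.primary.data["x"])        # 2 -- consistent read from the primary
print(rs.secondaries[0].data["x"]) # 1 -- stale read from a secondary
```

    Adding secondaries multiplies read capacity, but every consistent write still funnels through the single primary, which is the scaling limit being described.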

    So why don't more people know about this problem? Because almost no one runs more than one node of MongoDB. That's right -- 99%+ of MongoDB installations are a single node -- which (as you can see from the links above) is a very bad idea unless you want to lose data!

    I understand the lies you as an Oracle drone have to spew or you could be fired by the mothership. It is amazing how scared Oracle is of little ole MongoDB.

    I don't even work in the database group, and I have absolutely no fear of being fired. Living in fear is an awful thing (which is why people should use a reliable database .. ;-)

    Just to give you some context, I work primarily with our Enterprise Java products. We've even prototyped JPA support for MongoDB (see EclipseLink, which is open source.)

    MongoDB doesn't suck because I work at Oracle. MongoDB sucks because it loses data and doesn't scale.

    Peace,

    Cameron.

  13. In fact, all of these limitations are documented and there is no conspiracy, e.g. the exclusive database-level locking for write operations.

    This simply cannot scale on a regular desktop machine, and I would not expect it to scale on a cluster.

    I would not compare it to high-end databases, but there are many mature open source databases like PostgreSQL.

    I have no idea how PostgreSQL scales on a cluster, but the newest versions at least scale on a single machine.

    Edited by: Cameron McKenzie on Nov 5, 2013 11:21 AM
  14. ?[ Go to top ]

    Cameron, all you can do is point to some random guy's blog who obviously has an axe to grind with 10gen? Color me unimpressed.

    In my experience with Oracle RAC, we have randomly lost data to database crashes with the "unbreakable RAC" on several occasions. Or the problems we have with dirty reads (or, more exactly, null reads) on RAC because the servers are not in sync. Or the common JDBC connection pool crashes on RAC which cause our applications to die. I can go on and on about the buggy RAC software and the outrageously overpriced garbage Oracle database software. But I don't, as we are moving off Oracle onto SQL Server, which not only performs better in our testing but runs much more reliably.

    We have also done some performance testing and prototyping on a MongoDB cluster and have had no issues with data loss or non-scalability. Perhaps you should do some of your own testing before citing some clueless web blog. I would not expect much more from an Oracle drone.

  15. this is serious ...[ Go to top ]

    Mr. Peace Purdy

        These are some serious observations. Curious: when you developed the JPA support, did you see these issues yourself?

    I'm now going to check back with the vendor too. It's astonishing.

    thanks TSS

     

  16. this is serious ...[ Go to top ]

    Shawn -

    No, I haven't personally run into the types of issues with data loss that I pointed out, because I haven't operated any production systems actually built on MongoDB. What I did run into were results from performance testing on MongoDB that didn't make any sense, which is why I started digging in initially to figure out what MongoDB was doing (or more correctly, as it turned out, what it wasn't doing.)

    I'm just surprised that so many people are willing to overlook such obvious flaws, when there are so many good options out there.

    Peace,

    Cameron.

  17. this is serious ...[ Go to top ]

    I'm just surprised that so many people are willing to overlook such obvious flaws, when there are so many good options out there.

    Peace,

    Cameron.

    Any chance you'd share some of these other "good" options? I have encountered the error issues you speak of with MongoDB, but to be fair most of the "eventually consistent" architectures seem to have the problems being bandied about in this thread. The CAP theorem is beautiful in theory, but in the real world ...

  18. this is serious ...[ Go to top ]

    All production systems lose data, or can potentially lose it, and it is important to have a backup strategy; but high availability is at odds with scalability.

    MongoDB has optional journaling and can block until the log is flushed; it should be possible to recover the DB if journaling is enabled, but then it won't scale any better than any "elephant DB" with journaling.

    I guess all the disappointed users expect both scalability and high availability at the same time. That is non-trivial, and it will be expensive with any DB.

     

  19. WTF?[ Go to top ]

    I don't understand why people are attacking Cameron on a personal level. He has shown that he is competent and doesn't play silly spoilt child games.

    No, I don't work at Oracle or for Oracle, but I've known Cameron for a while, from before he was at Oracle, which isn't the case for those who are attacking him on a personal level.

    If you have a case to prove, take it up on the technical level; otherwise, back off.

  20. WTF?

    Cedric -

    I'm afraid it's human nature. I think we all get defensive about the things we like, and look to discredit those who would undermine our choices. (I know that I personally suffer from this, even if I try not to.)

    So I guess I'd say that I've learned not to take it personally. Besides, I think my posts can be a bit rude sometimes (like my original and very terse comment on this thread), so it's good that people held me to account and forced me to actually back up my claims.

    And who knows? Maybe MongoDB will find a way to fix these problems and end up much better. That would be a good ending.

    I think what always bothers me is when companies avoid the hard challenges (like making working products) and focus on the marketing, the bells and whistles, etc. And while MongoDB hasn't done a great job on making the product internals solid, they have done a nice job on APIs and usability (both of which are important), so I do have respect for those aspects of the product.

    Peace,

    Cameron.


  21. If you pick a DB that is geared for high-speed reads and low writes, and try to scale it out to do high-speed writes, you will hurt yourself. At a minimum you will spend a lot more money on hardware. Last I checked, MongoDB uses mmap, and mmap uses virtual memory, which gets paged in and out by the OS in the most efficient manner possible, which is actually quite good, but... if you store a giant BTree in memory and then ask the OS to page it in and out to disk in small pieces, by nature this does a lot of disk seeking. It also by nature is going to get behind, and the longer it gets behind, the more data you will lose on a single machine. This is why the last five versions of MongoDB have shipped with journaling turned on by default: if you are running MongoDB on a single machine, you have to flush periodically or you have a high probability of data loss in an outage, due to the lag inherent in the eventual-consistency model. If you replicate to at least two machines, you can reduce the data loss (especially if you use the it-must-exist-on-the-other-box option of data safety that comes with MongoDB).
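    The mmap behavior described above can be sketched with Python's stdlib `mmap` module: writes land in the OS page cache, and only an explicit flush (msync under the hood) forces the dirty page to disk. Until that flush, the OS is allowed to "get behind" on purpose. The file and byte values here are purely illustrative:

    ```python
    import mmap
    import os
    import tempfile

    # Create a small file and memory-map it, the way an mmap-backed store would.
    fd, path = tempfile.mkstemp()
    os.write(fd, b"\x00" * 4096)           # pre-allocate one OS page

    with mmap.mmap(fd, 4096) as mm:
        mm[0:5] = b"hello"                 # lands in the in-memory page only
        # Until flush() the OS decides when the dirty page reaches disk --
        # crash before then and the write can be lost.
        mm.flush()                         # force the dirty page to disk

    os.close(fd)
    with open(path, "rb") as f:
        recovered = f.read(5)
    os.unlink(path)
    print(recovered)  # b'hello'
    ```

    A database that maps a giant BTree this way pays a disk seek for every scattered page the OS writes back, which is the seeking cost described above.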

    Data safety is an application requirement. It has a cost. Less = faster. More = less fast (or more expensive to go fast). You have to balance that cost with how safe you want the data. Can you afford a few virtual tractor sales getting lost on Farmville? Then dial back the data safety a bit. Security is a continuum just like data safety. It is more secure to keep your servers in a closet with no Internet access. See? Clearly, since it is more secure to do it this way, all apps must be done with servers locked in a closet with no Internet. My point is that each app has a different tolerance for security, just as it does for data safety and availability. There is no one-size-fits-all solution, because each problem has its own requirements. Hell, banks don't use distributed two-phase commits, not because they don't want consistency, but because they need some level of performance, and it is easier to manually correct the 1-in-10,000,000 times something goes wrong than to buy and engineer the hardware to handle a two-phase commit at the scale they need. Engineering tradeoffs. Period. Most of the early errors with MongoDB were operator error. Know your tools.

    Disks are still slow at seeking after all these years. Disks are really fast at reading/writing long sequences (300 MB per second; higher if you use a 20-disk RAID 0 machine) but bad at seeking around the disk for small bits. This is why Cassandra and Couchbase are so good at high-speed writes, and MongoDB not so great (yet... maybe they have improved... it is hard to keep up). You can speed up MongoDB writes by sharding, but that is not a free lunch (you've got to pick a good shard key, setup is a lot harder, etc.). MySQL and Oracle can be set up to be very fast at high-speed writes. If an app has a lot of writes vs. reads, I shy away from MongoDB.

    LevelDB is good at high-speed writes. It is more or less a write-only database that uses bloom filters (is my data in the gak?), perfectly balanced BTrees written into gak blocks, and the equivalent of GC for the gak: long sequences of gak get compressed and filtered into other, longer sequences; gak never gets updated, just consolidated and deleted/merged into large sequences, which lets it prune/consolidate/merge quickly -- same as Google's tablets. LevelDB is good at high-speed writes but mediocre at high-speed reads (not bad, but not great). This is why, for example, Apollo uses LevelDB to store messages.
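    The bloom-filter check mentioned above ("is my data in the gak?") can be sketched in a few lines: the filter may answer "maybe present" for a key it never saw, but it never misses a key that was actually added, which is what lets an LSM-style store skip whole files during reads. This is a minimal illustration, not LevelDB's actual implementation; the sizes and hash scheme are arbitrary:

    ```python
    import hashlib

    class BloomFilter:
        """Minimal bloom filter: false positives possible, false negatives never."""

        def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
            self.size = size_bits
            self.num_hashes = num_hashes
            self.bits = 0  # one big int used as a bit array

        def _positions(self, key: bytes):
            # Derive num_hashes bit positions from salted SHA-256 digests.
            for i in range(self.num_hashes):
                digest = hashlib.sha256(bytes([i]) + key).digest()
                yield int.from_bytes(digest[:8], "big") % self.size

        def add(self, key: bytes) -> None:
            for pos in self._positions(key):
                self.bits |= 1 << pos

        def might_contain(self, key: bytes) -> bool:
            # If any bit is unset, the key was definitely never added.
            return all(self.bits >> pos & 1 for pos in self._positions(key))

    bf = BloomFilter()
    bf.add(b"user:42")
    print(bf.might_contain(b"user:42"))  # True -- an added key is always found
    print(bf.might_contain(b"user:43"))  # usually False; rare false positives
    ```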

    How can you mitigate MongoDB's disk-seeking weakness? Using SSDs would help; so would shards. SSDs have no issues seeking. Shards spread the writes across many machines, but they are complex to set up. Still, though: are you sure MongoDB is right for this app? Have you compared and contrasted the way it works with Couchbase, Cassandra, MySQL, Oracle, LevelDB? Do you know how much data you are going to handle? Do you know how many writes per second, and at what size? Have you benchmarked it? Have you done load testing? When you deploy it, are you monitoring it? How far behind is the eventual consistency?

    If it gets too far behind, what is your mitigation plan? At some level of writes, all DBs become hard to deal with: MySQL, Oracle, NoSQL, NewSQL, etc. There is no free lunch and no magic web-scale sauce... there are physical limitations of hardware. Physics will eventually stop you. Oracle DB can and does handle petabytes of data, so NoSQL fanboys, step off. You don't have to use a fully normalized RDBMS; you can skip some normalization and set up really aggressive internal caching, which would get you close to NoSQL ease of use with some very tried-and-true products. MySQL is actually pretty damn amazing tech, as is Oracle DB. MongoDB has its place too. :) LevelDB, MariaDB, Drizzle, etc. don't get enough attention. MongoDB and Redis get all of the NoSQL glory, but they are not the only two NoSQL solutions by a long shot. MySQL is solid and fast.

     

    There is no magic in the world. If you replicate the data, then you have a better chance of not losing it.

     

    Here is a point to take home. If you don't fsync on every call, you can lose data. If you do fsync on every call, you will not scale! (For Java devs, think OutputStream.flush().) If you use a single machine and have a DBA who is monitoring things from time to time and doing backups, that is probably OK for a lot of apps. If you can't afford downtime or data loss, you need replication. As reliable as disks are, they fail. You can mitigate failure with RAID and hot-swappable backups. All replication has failure edge cases. But even disks die, so you have RAID, replicated data, Fibre, high-speed NAS, etc., etc. If you have the money and an IQ above 85, it can run like a bat out of hell and be very fast... but nothing is free. There is no magic web-scale sauce.
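    The fsync tradeoff above can be sketched with the stdlib: `write()` plus `flush()` only gets data as far as the OS page cache, while `os.fsync()` blocks until the disk acknowledges it, and that blocking is exactly what kills throughput if you do it on every call. The function and file names here are invented for illustration:

    ```python
    import os
    import tempfile

    def append_record(path: str, record: bytes, durable: bool) -> None:
        """Append one record; with durable=True, block until it is on disk."""
        with open(path, "ab") as f:
            f.write(record)
            f.flush()                 # user-space buffer -> OS page cache
            if durable:
                os.fsync(f.fileno())  # OS page cache -> disk (the slow part)

    path = os.path.join(tempfile.mkdtemp(), "journal.log")
    append_record(path, b"fast, but may vanish on power failure\n", durable=False)
    append_record(path, b"durable, but each call waits on the disk\n", durable=True)

    with open(path, "rb") as f:
        lines = f.read().splitlines()
    print(len(lines))  # 2
    ```

    A database picks a point on this spectrum for you (journal commit interval, write concern); knowing which point it picked is part of knowing your tool.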

    How many RDBMS projects have I been on where someone says, "Why is this listing so slow?" How many records are in the table? 10,000,000. OK, what are you sorting on? Last name? I see. What indexes are on this table? BLANK STARE? PUNCH ME IN THE NUTS! F! F! F! Really? Do I go on TSS and say how slow SQL is? No! Knowing the tool is the developer's job! Writing a foolproof tool is impossible.
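    The missing-index story above is easy to reproduce with stdlib sqlite3: EXPLAIN QUERY PLAN shows the ORDER BY falling back to a temp B-tree sort until an index on the sort column exists. The table and column names are invented for the sketch:

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, last_name TEXT)")
    conn.executemany(
        "INSERT INTO customers (last_name) VALUES (?)",
        [("name%06d" % i,) for i in range(1000)],
    )

    def plan(sql: str) -> str:
        # The last column of each EXPLAIN QUERY PLAN row is the detail text.
        return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

    query = "SELECT * FROM customers ORDER BY last_name"
    before = plan(query)   # sorts with a temp B-tree: painful on 10,000,000 rows
    conn.execute("CREATE INDEX idx_last_name ON customers (last_name)")
    after = plan(query)    # walks the index in order: no sort step at all
    print("TEMP B-TREE" in before, "TEMP B-TREE" in after)  # True False
    ```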


    The same things they say about MongoDB, the mainframe DB guys were saying about Oracle DB 20 years ago. I think Mongo's success has more to do with marketing and ease of use than technical merit, but that does not mean it does not have a place or some merit. The same could be said about a lot of companies. Cough! Cough!

    MongoDB has some warts. It fixed some. Some are being fixed. It is not right for every application. Developers have to read the documentation before using it, but the same can be said for Oracle DB and Coherence.
    I was in a meeting the other day. We wanted to use MySQL to stage some data so QA could see the results for testing. It is a high-speed-writes app. A dev wanted to use MongoDB... I was like NOOOO! Then he said MySQL would not scale... I was like ARGH! Sometimes I want to punch myself in the nuts when I hear people regurgitate crap they read.


    But you spilling this FUD does not help either. There are many apps where MongoDB would work out fine. There are many where MySQL, Oracle, or Cassandra would be a much better choice.

     

    My point, CP: if you can't find a blog complaining about some tech, then you are not looking very hard. As a VP at Oracle, I expect you not to openly spread FUD about competitors. For Christ's sake, that is why you have a marketing department, my friend. At your level, they are never supposed to see who fired the shots. Think stealth. You are not hawking the world's most expensive HashMap anymore. You are in the big leagues now, so shut up and let your minions do the dirty work. How are your sailing skills? You need to focus! How is the cut of your jib? These are the things you need to think about.

     

    http://rick-hightower.blogspot.com/2013/11/20-years-of-fud-cameron-purdy-still.html

    --War

    --Rick Hightower

  22. Was I just "Rick Rolled"?

    Rick -

    That giant post of yours didn't really make any sense. Sorry.

    I can understand that you do or don't like something, although I can't tell which, or what.

    This isn't about something or someone "losing data". It's about how a product wasn't designed to not lose data. There is a logical difference.

    Peace,

    Cameron.