Discussions

News: Microsoft .NET PetShop 3.x: Design Patterns and Architecture

  1. In my opinion, it looks like the .NET PetShop is taking the right steps by introducing a clean architecture and a database access layer.

    It will be interesting to hear what you think about the caching of data. As I understand it, they cache the entire results from the business tier, and the cache expires after 12 hours. Doesn't that mean that, in a performance test, all reads are just calls to the in-memory cache?

    I am curious how the J2EE version will respond to this. Tangosol Coherence would be the right answer, I think...but it does seem to be a nice feature of the .NET Framework to have a Cache API included.

    .NET Caching Code:
    // Get the category from the query string
    string categoryKey =
        WebComponents.CleanString.InputText(Request["categoryId"], 50);

    // Check to see if the contents are in the Data Cache
    if (Cache[categoryKey] != null) {
        // If the data is already cached, then use the cached copy
        products.DataSource = (IList)Cache[categoryKey];
    } else {
        // If the data is not cached,
        // then create a new products object and request the data
        Product product = new Product();
        IList productsByCategory = product.GetProductsByCategory(categoryKey);

        // Store the results of the call in the Cache
        // and set the timeout to 12 hours
        Cache.Add(categoryKey, productsByCategory, null,
            DateTime.Now.AddHours(12), Cache.NoSlidingExpiration,
            CacheItemPriority.High, null);

        products.DataSource = productsByCategory;
    }

    // Bind the data to the control
    products.DataBind();

    Read about the .NET PetShop Architecture

    Threaded Messages (57)

  2. Caching for hours is not a good practice.
    Caching data at the app layer is not a good practice.

    You should cache data at the data layer or the model layer (I use iBatis DAO) with auto-flush (in case of updates).
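
    As a rough Java sketch of that idea (the class names here are made up for illustration and are not iBatis APIs): a DAO decorator caches reads and flushes the stale entry whenever the underlying data is updated, so callers never think about the cache at all.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical types, for illustration only.
    interface ProductDao {
        Product findById(String id);
        void update(Product p);
    }

    class Product {
        private final String id;
        Product(String id) { this.id = id; }
        String getId() { return id; }
    }

    // DAO decorator: callers see an ordinary DAO; reads are cached,
    // and every update auto-flushes the stale entry.
    class CachingProductDao implements ProductDao {
        private final ProductDao delegate;
        private final Map cache = Collections.synchronizedMap(new HashMap());

        CachingProductDao(ProductDao delegate) { this.delegate = delegate; }

        public Product findById(String id) {
            Product p = (Product) cache.get(id);
            if (p == null) {
                p = delegate.findById(id);   // cache miss: go to the database
                cache.put(id, p);
            }
            return p;
        }

        public void update(Product p) {
            delegate.update(p);              // write through to the database
            cache.remove(p.getId());         // auto-flush on update
        }
    }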

    The View, Controller, and App layers just use data, and should not be aware of any caching or flushing.

    .V
  3. Caching for hours is not a good practice.

    > You should cache data at data layer or the model layer

    Completely agreed, Vic. They got greedy, and completely trashed their design to squeeze a few extra nanoseconds out of the benchmark.

    From the looks of the API, the Cache that is provided is not transactional (rollback will not restore the original state of the cache). That means it is a fancy HashMap, significantly less powerful than what is provided by other solutions such as Hibernate, CMP Commit-Option A, Coherence, etc...

    So you don't have to wait for JCache to be able to do this. You already have many, more powerful options to choose from.
  4. Data caching

    12 hours of data caching makes sense for this type of application. It is indeed usual practice to keep prices constant during the business day (mistakes are obviously fixed asap, but they are hopefully the exception rather than the rule). For example, BarnesAndNoble.com and Overstock.com provide data to JavaShelf.com (and other affiliates) once a day via FTP transfer.

    Also, you don't necessarily need a lot of memory to keep a catalog in cache as a whole. It all depends on how large the catalog is. For example, JavaShelf.com does it with a 1,600+ item catalog on a 512 MB RAM *non-dedicated* server. I would personally consider a PetShop with 1,600+ items a rather large pet shop...

    Bertrand Fontaine
    JavaShelf.com: Your Java bookstore on the Web!
  5. Data caching

    I don't think too many object to the caching idea; it's more a question of
    where and how the caching should be done. As people have already pointed out,
    caching of data should be done at.....(drum roll), the data layer!
    I can't back this up with links to research, but it seems obvious to me.
  6. Data caching

    There are costs to caching at the data layer...network latency, etc. It is sometimes a very good idea to cache static or nearly static data at the app layer. App servers cache data for crying out loud.

    If MS came out with a cure for cancer you guys would still rip 'em.
  7. Data caching

    If MS came out with a cure for cancer you guys would still rip 'em.


    I think that caching at the app level is pretty smart, and probably everybody on this board has coded something like this at least once in their lifetime (regardless of what they said here). Au contraire, caching done by a data layer that is agnostic of the business rules can be dangerous.
  8. Data caching

    There are costs to caching at the data layer...network latency, etc. It is sometimes a very good idea to cache static or nearly static data at the app layer. App servers cache data for crying out loud.

    >
    > If MS came out with a cure for cancer you guys would still rip 'em.


    If M$ did something according to computer science, no one would rip them.

    It is easy to implement a cache for a pet store with 200 items, but how would this "design pattern" be used in a real application?
    A cache keeps the things that are used frequently, disposes of others (according to LRU or some other algorithm), and keeps itself and the underlying storage up to date.
    Where the information is kept, and network latency, come later. The cache may be kept at the application level, which would not necessarily be on another (remote) machine but rather in the cluster, so network latency won't be of that much significance.

    Milan
  9. Relax

    There are costs to caching at the data layer...network latency, etc. It is sometimes a very good idea to cache static or nearly static data at the app layer. App servers cache data for crying out loud.

    >
    > If MS came out with a cure for cancer you guys would still rip 'em.

    Several people have presented valid reasons for caching at the data layer or the application layer. Both have their place. However, that is not what this code snippet is doing. The cache rules are being managed by a fine-grained web component. I think we can all agree that this is A Bad Idea that is just begging for errors. If I saw a Java application where a Struts action was making cache decisions, I would say this is bad design, too.

    Just because a Microsoft application is being ripped by Java developers doesn't mean that it isn't deserved. We call 'em like we see 'em ;-)

    Ryan
  10. Relax

    "Several people have presented valid reasons for caching at the data layer or the application layer. Both have their place. However, that is not what this code snippet is doing."

    My argument was targeted at this statement:

    "caching of data should be done at .....(drum role), the data layer!"

    "The cache rules are being managed by a fine grained web component. I think we all can agree that that this is A Bad Idea that is just begging for errors."

    I agree.

    "Just because a Microsoft application is being ripped by Java developers doesn't mean that it isn't deserved. We call 'em like we see 'em ;-) "

    The .NET train is coming. Hop aboard before it's too late!
  11. The .NET train is coming. Hop aboard before it's too late!


    Oh, even before the .NET train, there is a LINUX bullet train. If you don't move off the M$ track, you may get run over.

    Regards,
    Musaddique Husain Qazi
    http://www.the5ants.com/tech_news.html
  12. Microsoft cures cancer?

    Sartoris: If MS came out with a cure for cancer you guys would still rip 'em.

    From http://www.microsoft.com/press/gates-cancer-cure.aspx:

    Redmond, WA - 29 May 2003 - FOR IMMEDIATE RELEASE

    Speaking today at the Seattle Press Club Association's annual Search for the Cure banquet, Bill Gates announced Microsoft's new Cure.NET initiative. "What we have now is the ability to cure cancer using these inventions of high technology that the ordinary American has now in your living room," said the founder of Microsoft, standing in front of a giant screen with his trademark Blue Screen of Death demonstration. "While we have not yet decided how to license Cure.NET, we feel it is a valuable addition to our set of market-leading products."

    The US Department of Justice is considering action against Microsoft. "What we have here is the leveraging of the Windows monopoly to secure a dominant position for Microsoft in the cancer cure market. We are concerned that only Windows customers can be cured of cancer, and further that smaller cancer curing vendors will not have a chance to gain a toe-hold in this market should Microsoft bundle Cure.NET with Windows." An aide interrupted the announcement to provide the spokesperson with information that the current president has received his requested campaign donation from Microsoft. The spokesperson continued, "However, in light of the public's need for a simpler wizard-based cure for cancer, we are now dropping any and all further litigation against Microsoft for the next four years."

    Speaking from the Palm Trees private golf course, Scott McNealy claimed that Sun is leading the drive to organize an open standards organization to organize an organization to provide organizations with organized choices to counter the new Microsoft product organization. Said Scott: "Microsoft is acting alone. They may have cured cancer, but do you want to buy your cancer cure from only a single provider? Sun is part of a 50-member consortium, the Standard Open All-Inclusive Peer-Based Cancer Alliance Consortium Committee, which will be announcing elections for board level members by mid-2004. Microsoft doesn't stand a chance. Fore!"

    Larry Ellison was unavailable for comment, due to prior commitments in the Vendée Globe Race. However, an official Oracle spokesperson claimed that Oracle cured cancer 138 times as fast as Microsoft did, and the results were Unremissionable!

    To settle the question once and for all, The Middleware Company, Inc., a subsidiary of TheServerSide.com, a subsidiary of Precise Software Solutions, Inc., a subsidiary of Veritas, Inc., a subsidiary of The Middleware Company, Inc. will be conducting a bake-off between the various solutions. According to Salil Deshpande, President of The Middleware Company, Inc., a subsidiary of TheServerSide.com, a subsidiary of Precise Software Solutions, Inc., a subsidiary of Veritas, Inc., a subsidiary of The Middleware Company, Inc., "We will be measuring the speed at which these products can actually cure cancer, and comparing the Lines Of Colon (LOC) measurements to decide which solutions are the most complex. This is, of course, a completely independent study, and as such has been commissioned and paid for and hosted by Microsoft."

    [end transmission]
  13. Microsoft cures cancer?

    Good one!

    What about the folks at Mac? Designer pill bottles?????
  14. Microsoft cures cancer?

    :-)

    Pretty Prosy Parody, Purdy.


    Salil Deshpande
    The Middleware Company
  15. Data caching

    Well... it depends on what you mean by the data layer... But I'm not too convinced that caching should only be done at the data layer. By data layer, do you mean the layer that contains Entity EJBs and DAOs? Caching at the app layer (closer to the client, in the web tier say...) will also improve performance, maybe even more so, i.e. rather than making some (potentially) remote call to your data access layer (Entities behind a Session Bean facade). Rod Johnson discusses this in his book. Any thoughts...?

    Cheers

    Smythe
  16. Data caching

    12 hours of data caching makes sense for this type of application. It is indeed usual practice to keep prices constant during the business day (mistakes are obviously fixed asap, but they are hopefully the exception rather than the rule). For example, BarnesAndNoble.com and Overstock.com are providing data to JavaShelf.com (and other affiliates) once a day via FTP transfer.


    I'm glad somebody pointed this out. How long an object is cached should depend on its stability. If a product catalog changes at scheduled intervals and caching the entire catalog is not taxing on the application, why not? Plus, if the dataset is particularly large, you can cache a subset of the data. Building on the catalog example again, you could cache the entire catalog by caching just the key data - name, short description, price, etc. The rest of the data, such as the long description and customer reviews, can be pulled at run time (or from an LRU cache?).
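
    To illustrate that "cache only the key data" idea, here is a small hypothetical Java sketch (the class names are invented, not from the PetShop or any product): summaries are held in memory while the bulky fields are always fetched on demand.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical catalog cache: only lightweight "key data" is cached;
    // bulky fields (long description, reviews) are loaded on demand.
    public class CatalogCache {
        public static class Summary {
            public final String id, name, shortDescription;
            public final double price;
            public Summary(String id, String name, String shortDescription, double price) {
                this.id = id; this.name = name;
                this.shortDescription = shortDescription; this.price = price;
            }
        }

        public interface DetailLoader { String loadLongDescription(String id); }

        private final Map summaries = Collections.synchronizedMap(new HashMap());
        private final DetailLoader loader;

        public CatalogCache(DetailLoader loader) { this.loader = loader; }

        public void putSummary(Summary s) { summaries.put(s.id, s); }

        public Summary getSummary(String id) { return (Summary) summaries.get(id); }

        // Never cached: always pulled at request time (or from a bounded LRU cache).
        public String getLongDescription(String id) { return loader.loadLongDescription(id); }
    }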

    That said, I found the .NET code awful. A web-layer component making caching decisions? Is every web component in this application that needs to extract data responsible for this? Quite an error prone design. If this was a hacked attempt to speed things up for a ridiculous benchmark, shame on them. If this is an example of good design, they failed. Either way, this code should be trashed.

    Ryan
  17. Data caching

    So, this is NOT a clustered cache? NOT a write-through cache? NOT a write-behind cache? What kind of cache is this?

    Regards,
    Horia

    P.S. use JBoss instead. :)
  18. Data caching

    So, this is NOT a clustered cache? NOT a write-through cache? NOT a write-behind cache? What kind of cache is this?

    This thing looks to me like just a simple hash table with a timeout. You can implement that very easily using a HashMap.
    What is the point of calling it a cache? I mean, it is a cache, but an effective cache should be transparent.
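
    For what it's worth, a bare-bones "HashMap with a timeout" in Java might look something like this sketch (no eviction thread, no size limit, invented class name):

    import java.util.HashMap;
    import java.util.Map;

    // Minimal timeout cache: each entry remembers when it was stored and is
    // treated as a miss once it is older than timeoutMillis.
    public class TimeoutCache {
        private static class Entry {
            final Object value;
            final long storedAt;
            Entry(Object value, long storedAt) { this.value = value; this.storedAt = storedAt; }
        }

        private final Map map = new HashMap();
        private final long timeoutMillis;

        public TimeoutCache(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

        public synchronized void put(Object key, Object value) {
            map.put(key, new Entry(value, System.currentTimeMillis()));
        }

        public synchronized Object get(Object key) {
            Entry e = (Entry) map.get(key);
            if (e == null) return null;
            if (System.currentTimeMillis() - e.storedAt > timeoutMillis) {
                map.remove(key);   // expired: drop the entry and report a miss
                return null;
            }
            return e.value;
        }
    }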
  19. Data caching

    So?
    If I don't need any of these features - and let's face it, lots of small/midsize business sites don't - I'm happy to use an extremely simple cache.
    Vlad
  20. lots of fish

    Bertrand: Also, you don't necessarily need a lot of memory to keep a catalog in cache as a whole. It all depends on how large the catalog is. For example, JavaShelf.com does it with a 1,600+ item catalog on a 512 MB RAM *non-dedicated* server.

    For very large caches, you can use any cache product that supports regions, or, even easier, use the Coherence distributed cache that automatically partitions the data across the cluster. So for an application running on a couple dozen cluster nodes caching half a gig each (in-proc or out-of-proc, on-heap or off-heap or disk-spooled), you can support a 10GB+ cache rather easily.

    Bertrand: I would personally consider a PetShop with 1,600+ items a rather large pet shop...

    I don't know ... have you ever looked in the fish sections of these pet stores? ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  21. Data persistence at the app level is n
  22. Data persistence at the app level is not a good practice, but even a grommet coder like me knows that if the caching is regulated by business logic, there should be business layer code regulating that cache.

    While hard-coding twelve hours isn't the best move, you should look at the code (displayed on this page) before commenting with your J2EE Best Practice Expertise.

    In this case, they are caching the product catalog. You'd be a fool not to. And twelve hours is a likely period for catalog expiration. The only reasons not to cache the catalog would be if the catalog is not a commonly used part of the application, or if you couldn't easily fit it all in memory (in which case, you'd want to cache the frequently requested items, something that could best be determined, again, by business logic.)

    Maybe it's the former BOFH in me, but if you think it's a good idea to call up the DBA or Sysadmin whenever you want to update the catalog or tweak the cache, I think you're nuts.
  23. Caching for hours is not a good practice.

    I am afraid that is a bit of a generic statement. In one project (JSP) we used a DB that was updated from the mainframe twice per day. So it was exactly a 12-hour cache.


    >Caching data at the app layer is not a good practice.
    That is, if you follow the model (paradigm) where an application layer, data layer, etc. exist, right? :-)

    Dmitry Namiot
    Coldbeans
  24. me tarzan, caching good

    Vic: Caching for hours is not a good practice.

    Such a carte blanche statement needs some explanation ... Sybase and Microsoft SQL Server cache certain things from the moment the RDBMS starts until the moment it shuts down. For some applications, that could be a very long time ;-)

    Similarly, if data is read-only, the data set is relatively small and is used often (or has a high cache hit ratio) then you'd be remiss not to cache it!

    Similarly, if data is read-mostly and you can guarantee that the cached data is up-to-date, then you'd be remiss not to cache it!

    Similarly, if the data is write-intensive and can be write-coalesced, you would be remiss not to cache the writes if you could guarantee the safety of the data!

    Here's a quote from a customer of ours that just went live: "one of our integration servers was updated to use coherence .. a 10min job now takes 1-2s". (No surprise here: they are a referenceable customer.) Ask your end users the following question: "Would you rather wait 10 minutes or just 2 seconds for the same task to complete?"

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  25. me tarzan, caching good

    \Cameron\
    Similarly, if data is read-mostly and you can guarantee that the cached data is up-to-date, then you'd be remiss not to cache it!
    \Cameron\

    And if the locality of reference of data is poor?

    \Cameron\
    Similarly, if the data is write-intensive and can be write-coalesced, you would be remiss not to cache the writes if you could guarantee the safety of the data!
    \Cameron\

    Umm...by definition you can't normally guarantee the safety of cached-writes. You can get high throughput by queueing writes and checking their result asynchronously, but that's quite another beasty.

    \Cameron\
    Here's a quote from a customer of ours that just went live: "one of our integration servers was updated to use coherence .. a 10min job now takes 1-2s". (No surprise here: they are a referenceable customer.) Ask your end users the following question: "Would you rather wait 10 minutes or just 2 seconds for the same task to complete?"
    \Cameron\

    I would retort by saying that caching should not be applied blindly, but instead judiciously. The end result of blind caching is _always_ either out-of-memory or cache thrashing. I have a feeling you already know that, but it does need to be said that there are many cases where caching can noticeably degrade performance (or even cause unnecessary outright failures).

        -Mike
  26. you mike, caching good

    Hi Mike,

    Cameron: Similarly, if data is read-mostly and you can guarantee that the cached data is up-to-date, then you'd be remiss not to cache it!

    Mike: And if the locality of reference of data is poor?

    PMI but what does that mean? (There's too many terms in this industry for me to keep track of.) What I remember from my days designing the Pentium (just kidding) is that this has to do with the CPU pulling in a 64-byte line and hoping that the second access (the one that isn't predicted yet) will be in the same cache line, right?

    Cameron: Similarly, if the data is write-intensive and can be write-coalesced, you would be remiss not to cache the writes if you could guarantee the safety of the data!

    Mike: Umm...by definition you can't normally guarantee the safety of cached-writes. You can get high throughput by queueing writes and checking their result asynchronously, but that's quite another beasty.

    I meant write-behind, as in "the app changes the data 100 times, and an hour later, the data gets written to the database one time" write-behind. By setting the write-behind latency sufficiently high, and having a large enough cluster (to cache huge amounts of data with the distributed cache), you can reduce database reads by almost 100% and writes by well over 90%. That means reducing costs for high-scale applications, since the database (or God-forbid, the mainframe) is usually the most expensive piece of server in the datacenter. And when you cut the 32-CPU database server ($4MM) requirement down to a commodity 2-CPU database server ($40k), you save about 99% of the cost, or in the datacenter, you end up getting a lot more apps on the same shared database servers and mainframes and message buses, and the end user gets better response times.
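
    As a rough illustration of the write-coalescing idea (this is a made-up sketch, not how Coherence itself is implemented): repeated writes to the same key simply overwrite each other in memory, so 100 changes to one object still produce a single store write when the buffer is flushed, e.g. by a background timer.

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    // Toy write-behind buffer: write() coalesces updates per key,
    // and flush() pushes only the latest value of each key to the store.
    public class WriteBehindBuffer {
        public interface Store { void save(Object key, Object value); }

        private final Map pending = new HashMap();
        private final Store store;

        public WriteBehindBuffer(Store store) { this.store = store; }

        public synchronized void write(Object key, Object value) {
            pending.put(key, value);                  // newest value wins
        }

        public void flush() {
            Map snapshot;
            synchronized (this) {                     // swap out the pending batch
                snapshot = new HashMap(pending);
                pending.clear();
            }
            for (Iterator i = snapshot.entrySet().iterator(); i.hasNext();) {
                Map.Entry e = (Map.Entry) i.next();
                store.save(e.getKey(), e.getValue()); // one write per key
            }
        }
    }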

    Cameron: Here's a quote from a customer of ours that just went live: "one of our integration servers was updated to use coherence .. a 10min job now takes 1-2s"...

    Mike: I would retort by saying that caching should not be applied blindly, but instead judiciously. The end result of blind caching is _always_ either out-of-memory or cache thrashing. I have a feeling you already know that, but it does need to be said that there are many cases caching can noticably degrade performance (or even cause unnecessary outright failures).

    I would not try to convince anyone to use caching for bad uses of it. That includes apps that run fine without it. That includes blindly caching. There are too many obviously good uses for it to try to apply it to everything ... unless it means winning a benchmark ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  27. you mike, caching good

    \Cameron\
    PMI but what does that mean? (There's too many terms in this industry for me to keep track of.) What I remember from my days designing the Pentium (just kidding) is that this has to do with the CPU pulling in a 64-byte line and hoping that the second access (the one that isn't predicted yet) will be in the same cache line, right?
    \Cameron\

    Sorry if I mixed and matched jargon - I have a hard time keeping it all in line at times.

    What I meant by "locality of reference of data" is a data analogy to CPU code caches. In a nutshell - if your code jumps all around, your cache is going to end up flushing and re-filling itself constantly. AKA thrashing. Precisely the same thing can happen to a data cache. Assuming your cache size is somewhat smaller than your total data set (e.g. it doesn't all fit in memory), there are many algorithms and applications that can very, very easily lead to cache thrash. If they tend to sample pieces of data from many places in a way that appears random to an LRU algorithm (or other cache-eviction policy), or if the app just happens to scan through a lot of data, you can easily end up thrashing your cache. And that's always worse, performance-wise, than having no cache at all.
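
    To make the eviction-policy point concrete, a bounded LRU cache can be sketched in a few lines on top of java.util.LinkedHashMap (JDK 1.4); if the working set is much larger than maxEntries and access looks random, most lookups will miss and the cache only adds overhead:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Simple bounded LRU cache built on LinkedHashMap's access ordering.
    // When the map grows past maxEntries, the least-recently-used entry is evicted.
    public class LruCache extends LinkedHashMap {
        private final int maxEntries;

        public LruCache(int maxEntries) {
            super(16, 0.75f, true);          // true = access-order iteration
            this.maxEntries = maxEntries;
        }

        protected boolean removeEldestEntry(Map.Entry eldest) {
            return size() > maxEntries;      // evict when over capacity
        }
    }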

     
    \Cameron\
    I meant write-behind, as in "the app changes the data 100 times, and an hour later, the data gets written to the database one time" write-behind. By setting the write-behind latency sufficiently high, and having a large enough cluster (to cache huge amounts of data with the distributed cache), you can reduce database reads by almost 100% and writes by well over 90%. That means reducing costs for high-scale applications, since the database (or God-forbid, the mainframe) is usually the most expensive piece of server in the datacenter. And when you cut the 32-CPU database server ($4MM) requirement down to a commodity 2-CPU database server ($40k), you save about 99% of the cost, or in the datacenter, you end up getting a lot more apps on the same shared database servers and mainframes and message buses, and the end user gets better response times.
    \Cameron\

    I agree 100%, with a caveat - you're gaining performance at the expense of data integrity. Pull the plug on your server, and all that data the system swore up and down it "committed" can either disappear or become corrupted.

    In general what you said here and before made sense - except for the "guarantee the safety of your data" part. Not everyone needs the higher levels of safety, of course, but that's what you're really getting with the high end machines - speed _and_ safety. In the scenario you've described, you're sacrificing one for the sake of the other. This is valid - so long as you understand what can go wrong and accept it.

         -Mike
  28. you mike, caching good

    Mike: I agree 100%, with a caveat - you're gaining performance at the expense of data integrity. Pull the plug on your server, and all that data the system swore up and down it "committed" can either disappear or become corrupted.

    No, that isn't true. Pull the plug on a server, and responsibility for the write-behind fails over to another server (actually, the write-behind data on the failed server will fail over in a load-balanced manner to all remaining servers). Coherence is built on clustering, and all services fail over and fail back transparently and without data loss. Otherwise you'd be insane to use write-behind ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  29. you mike, caching good

    Mike: I agree 100%, with a caveat - you're gaining performance at the expense of data integrity. Pull the plug on your server, and all that data the system swore up and down it "committed" can either disappear or become corrupted.

    >
    > No, that isn't true. Pull the plug on a server, and responsibility for the write-behind fails over to another server (actually, the write-behind data on the failed server will fail over in a load-balanced manner to all remaining servers). Coherence is built on clustering, and all services fail over and fail back transparently and without data loss. Otherwise you'd be insane to use write-behind ;-)

    That's okay if you're only planning for a single-server failover.

    For certain types of applications, you need to assume that anything that was said to be committed was committed, and if all your app servers fall over you can pull it back out. Nothing less than a full collapse of your database should be taken as a reason for losing data.

    Whether you need this level of reliability is a good question; there are relatively few sorts of failures that will take out an entire clustered system but leave the DB intact. I also suspect that most systems which claim to handle this don't (I know that's the case for the ones I've had to deal with).

    The most likely scenario I've come up with: your servers (unknown to you) are unstable at really high load. One server hits that load and falls over. Its workload is immediately picked up by the other nodes in the cluster (which were already stretched). Maybe they stay up, but most likely one goes down. You now have a cascading system failure which knocks out all of your cluster nodes in a relatively short time (too short for manual intervention, at least). Of course, this is a catastrophic situation, and probably needs special recovery procedures anyway, but sometimes you can't convince people of that.
  30. Cameron: Pull the plug on a server, and responsibility for the write-behind fails over to another server (actually, the write-behind data on the failed server will fail over in a load-balanced manner to all remaining servers). Coherence is built on clustering, and all services fail over and fail back transparently and without data loss. Otherwise you'd be insane to use write-behind ;-)

    Robert: That's okay if you're only planning for a single server fail over. For certain types of applications, you need to assume that anything that was said to be committed was committed, and if all your app servers fall over you can pull it back out. Nothing less than a full collapse of your database should be taken as a reason for losing data.

    That's why we support write-through caching, so if you commit, it commits. Write-behind caching is a feature that you can choose to use.

    I will say that in practice, a cluster of app servers probably has higher availability than a database. I know of at least one customer of ours (an ASP) that lost their database and didn't even notice for several hours because everything kept working. (They cache their entire data set, and use write-behind.) When the database came back up, the queued writes that had been re-queueing while the database was down finally went through and everything was back to "normal". The end users were never impacted.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  31. you mike, caching good

    \Purdy\
    No, that isn't true. Pull the plug on a server, and responsibility for the write-behind fails over to another server (actually, the write-behind data on the failed server will fail over in a load-balanced manner to all remaining servers). Coherence is built on clustering, and all services fail over and fail back transparently and without data loss. Otherwise you'd be insane to use write-behind ;-)
    \Purdy\

    First, a question - on a write, does Coherence propagate the data to all other servers and get a response that it was received before returning to the caller? It's important that the data actually makes it to all other cluster nodes (and you know it has) for failover to work properly. Assuming you do so, what's the cost of doing this vs. writing to the database anyway?

    Second, and this is a bit of a nit pick - what about the non-clustered case? I wasn't initially thinking in clustered terms.

        -Mike
  32. you mike, caching good

    Hi Mike,

    Mike: \Purdy\

    Please, call me Cameron. "Purdy" is my brother. ;-)

    Mike: on a write, does Coherence propagate the data to all other servers and get a response that it was received before returning to the caller? It's important that the data actually makes it to all other cluster nodes (and you know it has) for failover to work properly. Assuming you do so, what's the cost of doing this vs. writing to the database anyway?

    Absolutely! Our partitioned cache has a configurable number of backups; most customers set it to one for write-behind caching or to zero for who-cares-if-it's-lost caching. For this example, I will assume it is one.

    When the application changes the data and puts it into the cache, the data goes directly to the cluster node that owns it (the "primary"). The primary then sends the data to the backup node (or "backup nodes" if the backup count is greater than one), which responds to the primary that it has been received, and then the primary acknowledges the receipt to the application. Even during this delicate processing, either the primary or the backup server can die and the processing will still complete successfully (meaning the data is on two servers) because of Coherence's transparent clustered failover.

    Mike: Assuming you do so, what's the cost for doing this vs. writing to the database anyway?

    In a set of performance tests with an application running on a high-end Oracle server, write performance increased over 25x. Under load, that "25x" gets even bigger, because the database becomes a bottleneck. The cost is not the network time (which for this example would be in the very low single digits of milliseconds total), it's the disk I/O and the data transformations. We eliminate disk I/O altogether and defer the data transformations until the write occurs.

    Mike: Second, and this is a bit of a nit pick - what about the non-clustered case? I wasn't initially thinking in clustered terms.

    It's not a nit pick; that's a reasonable question, since 90% of the apps out there are not clustered! Apps that are not clustered should not use write-behind caching, because the app server is a single point of failure in that case. In general, though, caching in the non-clustered scenario is way easier because you don't have to worry about coherency among cluster nodes. That's why there is no market for selling "non-clustered" caching products -- anyone can write a "local" cache in an afternoon. In Java, there are half a dozen open source projects that do this, and for C# there's even one built into the .NET Framework.

    However, Coherence is for clustering, so we focus on the 10% of apps that are clustered. And since our software is so affordable, I'm sure that the 10% number will be growing ;-)

    p.s. What's your email? Drop me an email at cpurdy/tangosol.com.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  33. They might have infinite memory to cache results for 12 hours.

    Is putting in 12 hours of caching a new marketing tool, or just voodoo?

    They should publish "The Best Practices in Voodoo by M$"
  34. I have never in my life seen a system where something is explicitly cached for 12 hours.. unless you count static HTML pages....

    Well, I guess it's "screw realism and useful code, as long as we win this useless benchmark". :)
    Although I am a bit surprised they haven't "integrated" a PetShop written in assembler into the OS (hey, "it worked" with IE, didn't it?) to be able to say it's 0 LOC (just a bookmark in IE) and fast as ****..
    Ok, just being a bit sarcastic. I just think it's a bit pathetic what lengths people are willing to go to in order to prove a point (even if they have to make up the point).. And this doesn't just go for MS and .NET; it basically goes for most of the industry.
  35. By the way..

    Before anyone pedantic tries to put me in my place: I don't think IE is written in assembler; it was just a metaphor for doing whatever it takes to win the PetShop benchmark..
  36. By the way..

    Wille: Before anyone pedantic tries to put me in my place: I don't think IE is written in assembler; it was just a metaphor for doing whatever it takes to win the PetShop benchmark..

    IIRC - The last Microsoft application built in assembler was "write.exe" (the precursor to WordPad). My memory is a bit fuzzy, but I think it was written during a Christmas holiday by the then VP of the Windows group. ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  37. PetStore and PetShop links

    http://jinx.swiki.net/428

    http://java.sun.com/blueprints/code/index.html#java_pet_store_demo

    http://www.ibatis.com/jpetstore/jpetstore.html

    http://xpetstore.sourceforge.net/

    http://www.theserverside.com/home/thread.jsp?thread_id=18161

    http://blueprints.macromedia.com/petstore/

    http://dreambean.com/petstore.html

    http://www.theserverside.com/home/thread.jsp?thread_id=9797

    http://www.theserverside.com//resources/article.jsp?l=PetStore

    http://www.middleware-company.com/j2eedotnetbench/

    http://www.oreillynet.com/pub/wlg/2253

    Microsoft PetShop

    http://www.gotdotnet.com/team/compare/petshop.aspx

    http://www.theserverside.com/home/thread.jsp?thread_id=19636

    http://msdn.microsoft.com/architecture/default.aspx?pull=/library/en-us/dnbda/html/PetShop3x.asp
  38. What is so bad about caching, if things are done the right way?
    I think it's okay and a step in the right direction.

    http://www.prevayler.org/

    cheers, Allan
  39. Re: What's wrong with caching?

    what is so bad about caching, if things are done the right way?

    > I think it's okay and a step in the right direction.

    Doing it in the right way's the hard part.

    Let's say you've got a distributed, clustered system. If you put a cache in, you can get issues where one node in the cluster updates the data (and its local cache), but other nodes don't see the change. Ouch.

    In a multi-node environment, your caches have to be able to talk to each other. Otherwise you've improved your performance at the cost of scalability. Remember: the Pet Store is meant to show how you design for scalability. Everyone pretty much agrees that if you're really building such a small system, you wouldn't go as far as they do.
  40. Re: What's wrong with caching?

    hi,
    >
    > Let's say you've got a distributed, clustered system. If you put a cache in, you can get issues where one node in the cluster updates the data (and its local cache), but other nodes don't see the change. Ouch.

    yup, that's why the link: http://www.prevayler.org/
    why not go all the way :-)

    cheers, Allan
  41. By the way, you can use an MS-like cache in Java too. See:
    www.servletsuite.com/servlets/cachetag.htm for component level cache and
    www.servletsuite.com/servlets/cacheflt.htm for page level cache
  42. JDO and Caching

    Hi All

    "cache expires after 12 hours ... it seems to be a nice feature from the .NET framework to have the a Cache API included."

    Several JDO implementations provide much better caching than this. And unlike this solution the caching is completely transparent to the application. The cached results are automatically evicted when changes are made to the underlying objects without any dodgy timeouts.

    Cheers
    David
    JDO Genie - High Performance JDO for JDBC
  43. JDO and Caching

    David,

    I agree that a 12-hour cache timeout is "a bit" unrealistic for a real-world app. But having the cache updated every time a change is made to the data might not necessarily be a requirement of the app, i.e. the cache being "stale" for a period of time may be acceptable for a particular use case. So I don't think "dodgy timeouts" are always a bad thing. IMHO... use whichever is suitable for the use case.

    Cheers

    Smythe
  44. JDO and Caching

    Hi Smythe

    Yes you are correct that having a fixed timeout for cached data makes sense for some applications. I probably should not have used the word "dodgy" :)

    JDO caching is done at the data level and is transparent to the application. If you run a JDOQL query and sometime later run the same query with the same parameters you avoid going to the database. Because the JDO implementation knows when you modify data it can evict the cached query results automatically. This is more broadly applicable than a timed cache.
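
    As a rough sketch of what that looks like from the application side (the Product class below is a made-up persistence-capable class; which executions actually reach the database is entirely up to the JDO implementation's cache):

    import java.util.Collection;
    import java.util.Properties;
    import javax.jdo.JDOHelper;
    import javax.jdo.PersistenceManager;
    import javax.jdo.PersistenceManagerFactory;
    import javax.jdo.Query;

    class Product { String name; double price; }  // hypothetical persistence-capable class

    public class QueryCacheDemo {
        public static void main(String[] args) {
            Properties props = new Properties();   // vendor-specific JDO settings go here
            PersistenceManagerFactory pmf = JDOHelper.getPersistenceManagerFactory(props);
            PersistenceManager pm = pmf.getPersistenceManager();

            Query q = pm.newQuery(Product.class, "price < limit");
            q.declareParameters("double limit");

            Collection first  = (Collection) q.execute(new Double(20.0)); // goes to the database
            Collection second = (Collection) q.execute(new Double(20.0)); // may be served from the cache

            // Modifying a Product in between would evict the cached result,
            // so the second execution would go back to the database instead.
            pm.close();
        }
    }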

    Cheers
    David
  45. It would seem that Microsoft's recommendations for architecting systems are edging closer to principles and practices that have been used in the Java realm for years. And I mean "USED" - as in the past tense. This kind of stuff has been around for years now, and MS seems to be catching up on it right now.

    It becomes problematic if you have to actually build big systems based on these architectures. The lack of working equivalents of ORM software or JDO-like tools, for instance, is a particular problem.

    Sandeep
  46. Quote from the abstract of the MSDN article that accompanies the PetShop 3.x demo app:

    "The third revision is also fully compliant with the Middleware Company Application Server Benchmark Specification, and will serve as Microsoft's entry in the upcoming Middleware Application Server Benchmark this spring: a second round of testing by the Middleware Company to compare the scalability of .NET and the J2EE platforms for building and hosting enterprise Web applications."

    How come we haven't heard any official word about this "second round of testing" from TMC, except for the original propaganda stuff? Doesn't TMC want input from the J2EE community? If not, people can't help speculating about a "conspiracy" ...
  47. More details from the same MSDN article about the Petstore vs. PetShop rematch:

    "The upcoming Middleware Benchmark will test the new .NET Pet Shop 3.0 implementation to compare its performance to two new J2EE implementations—one based on CMP 2.0 and one based on a pure JSP/Servlet architecture with no EJBs. The testing will be conducted by the Middleware Company and published on the ServerSide with a variety of J2EE application servers. A panel of J2EE experts has been established and is overseeing this benchmark to ensure fairness across tested products and adherence to specification/best practice for all implementations. In addition, the major J2EE application server vendors have been invited, along with Microsoft, to participate by commenting on the specification, providing implementations to be tested, and being on site for tuning/optimizations and the testing process. Microsoft has opted to fully participate in this second round of testing."

    Sounds promising. Let's hope it is a truly fair competition. It would be nice if TMC could publish the specs of Round 2 to get input from the developer community before the rematch starts.
  48. PetShop Rematch Specs

    "How come we haven't heard any official word about this "second round of testing" from the TMC, except for the original propaganda stuff? Doesn't TMC want input from the J2EE community? If not, people can't help but speculating about "conspiracy" ..."
    There was word and a whole discussion about the specs for the second round in this thread. Have a look, although it looks like activity on that thread has died down.
  49. Caching

    I'd just like to quickly point out that MANY applications have some kinds of data for which long (hours) timeout caching makes sense. Obvious examples are:

    (1) CMS-type applications
    (2) meta-data driven applications
    (3) applications with a lot of "reference data"


    Of course, very simple caching mechanisms are not good for *transactional* data.


    Nobody has addressed a question in the original post: should Java have a standard caching API?

    I absolutely agree that it should; caches generally have quite simple APIs - most of the "innovation" is under the hood. It seems like a perfect example of something that *could* be standardized profitably.

    I would love to be able to avoid the cost of writing an adaptor layer (even if it is admittedly very thin) for each new cache mechanism Hibernate will support (JCS, SwarmCache, etc).
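
    The adaptor layer in question usually amounts to an interface along these lines (a generic sketch, not Hibernate's actual SPI); today every framework defines its own version and needs one adaptor per cache product, which is exactly the duplication a standard API would remove:

    // Generic cache SPI sketch. With a standard Java caching API,
    // cache vendors could implement this once instead of frameworks
    // writing an adaptor for each product (JCS, SwarmCache, ...).
    public interface Cache {
        Object get(Object key);
        void put(Object key, Object value);
        void remove(Object key);
        void clear();
        void destroy();
    }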
  50. Caching

    I'd just like to quickly point out that MANY applications have some kinds of data for which long (hours) timeout caching makes sense. Obvious examples a
  51. Caching

    I'd just like to quickly point out that MANY applications have some kinds of data for which long (hours) timeout caching makes sense. Obvious examples are:

    >
    > (1) CMS-type applications
    > (2) meta-data driven applications
    > (3) applications with a lot of "reference data"

    Are you talking about web-layer caching here, i.e. caching of the HTML? The software I'm working on is a CMS with heavy use of metadata, and precisely because of that it's (pretty much) impossible to cache the output. If the content contains embedded metadata-references then caching the rendered content could result in showing stale metadata values, which is not acceptable.

    We *do* cache quite heavily on the model layer though, so caching as such is definitely important. But caching of HTML is very rarely useful, at least in our case.

    > Nobody has addressed a question in the original post: should Java have a standard caching API?
    >
    > I absolutely agree that it should; caches generally have quite simple APIs - most of the "innovation" is under the hood. It seems like a perfect example of something that *could* be standardized profitably.

    Isn't this what JSR-107 is all about?

    http://jcp.org/en/jsr/detail?id=107
  52. Are you talking about web-layer caching here, i.e. caching of the HTML? The software I'm working on is a CMS with heavy use of metadata, and precisely because of that it's (pretty much) impossible to cache the output. If the content contains embedded metadata-references then caching the rendered content could result in showing stale metadata values, which is not acceptable.

    >
    > We *do* cache quite heavily on the model layer though, so caching as such is definitely important. But caching of HTML is very rarely useful, at least in our case.
    >

    Hej Rickard,

    I have some experience with CMS systems and caching which I would like to share. It might contradict (parts of) your point of view on the usefulness of caching at the presentation layer (generated HTML), and put the to-cache-or-not-to-cache (and where) debate into a slightly different perspective.

    A few months ago, the site for which I am partly responsible started to show signs of gradually decreasing performance, up to a point where it caused general instability and eventually a server (cluster) hang. A project was formed with the mission to find the causes of this and to come up with fast and effective solutions. In the end, we managed to decrease response times for the most-accessed URLs by 50-95% on average. And how did we do that? Actually there was no one solution to rule them all, but a number of solutions applied in different layers of the application and to the system configuration.

    So what has that to do with caching in general, or with caching of static HTML in particular? Well, the thing is that we added a presentation caching layer built on top of the current CMS system, which in turn is already more of a content (page element) object cache with rudimentary CMS mechanisms on top of our business model of content. Added complexity for a small benefit, you might say, but I argue to the contrary. Why? Well, since our CMS system already had hooks for knowing when a content page was published (changed in the production/runtime environment), adding invalidation and refresh of the HTML data in the presentation layer caches was straightforward.

    Also - and this is the core of my argument - content pages are now loaded in an instant from the HTML cache (page objects were already loaded into memory through lazy-load mechanisms, and now so is the static HTML), instead of having to be rendered from cached objects and "content business rules". All in all, this results in extremely short response times, tying up the scarcest and most precious resource in most high-traffic systems - network sockets - for the shortest possible time, which gives a much better end-user experience and adds a lot to the overall stability of the site in the process. In addition, when fetching data from external resources, with both incoming and outgoing sockets on our part, and the system being fetched from being anything but quick and rock stable, the positive effect on overall stability was even greater.

    Granted, if our content mechanisms had been more affected by constantly changing underlying metadata, or for some other reason had to be re-rendered on (nearly) every request, the overhead of flushing and refreshing would have given a negative net effect. Thus, my take on the (where-)to-cache-or-not-to-cache issue is that it, like every other technique in systems development, has to be applied judiciously, on a case-by-case basis, and that caching in the presentation layer might yield greater positive effects than first expected, because optimization is not all about CPU cycles and memory, but sometimes even more about network sockets.
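
    In servlet terms, the presentation-layer cache described above often boils down to a filter in front of the rendering pipeline. Below is a deliberately simplified, hypothetical sketch (a real one needs proper handling of output streams and content types, memory limits, and the CMS publish hook calling invalidate()):

    import java.io.CharArrayWriter;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import javax.servlet.*;
    import javax.servlet.http.*;

    // Simplified HTML page cache as a servlet filter. A CMS publish hook
    // would call invalidate(uri) whenever a page changes, so stale HTML never lingers.
    public class PageCacheFilter implements Filter {
        private static final Map CACHE = Collections.synchronizedMap(new HashMap());

        public static void invalidate(String uri) { CACHE.remove(uri); }

        public void init(FilterConfig config) {}
        public void destroy() {}

        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            HttpServletRequest request = (HttpServletRequest) req;
            HttpServletResponse response = (HttpServletResponse) res;
            String key = request.getRequestURI();

            String html = (String) CACHE.get(key);
            if (html == null) {
                // Miss: let the CMS render the page, capturing its output.
                final CharArrayWriter buffer = new CharArrayWriter();
                chain.doFilter(request, new HttpServletResponseWrapper(response) {
                    public PrintWriter getWriter() { return new PrintWriter(buffer); }
                });
                html = buffer.toString();
                CACHE.put(key, html);
            }
            response.getWriter().write(html);   // serve straight from the HTML cache
        }
    }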

    Keep up the good work!
    /Par

    par@middleman.se
  53. Funnier still is the .NET users' discussion of the Pet Store..

    http://msdn.microsoft.com/library/shared/comments/asp/threadbody.asp?aID=983329&collapse=0
  54. Pretty amusing..
    If their own proponents are criticizing their implementation then.. Ok, won't go there.
    But I guess I wasn't too far off with my "Assembler" metaphor perhaps.. :)
  55. PetShop Comparisons

    A really important question is why, for example, The Middleware Company is doing such a comparison of J2EE PetShop(s) and .NET PetShop(s).
    For me it's not clear what the goals of such a comparison are, especially for real-life projects.

    Do they want to compare the best design decisions of one PetShop (e.g. using a cache in the middle tier), or do they want to compare the power of a language? I think the first part is much more important; whether the best PetShop in the end is the one using J2EE or .NET doesn't mean anything to me, because the decision about the language used in a project is only one of many design decisions, and in many cases more a political and/or skills decision.

    I like the PetShop tests, but for me they only show which design is best for the non-functional requirement that is most important in these tests: performance!

    Mirko
  56. <Mirko>
    I am curious how the J2EE version will respond to this. Tangosol Coherence would be the right answer I think...but it seems to be a nice feature from the .NET framework to have a Cache API included.
    </Mirko>

    I would like to reiterate, as it was already mentioned several times in this thread, that .NET-framework Cache API is non-clustered and non-transactional. As Cameron cleverly suggested, creating local caches is for the most part very trivial, which is exactly the reason why we don’t see many commercial offerings for non-clustered cache implementations.

    As to how J2EE would respond to .NET in regard to a cache API, there are several far more sophisticated cache products in today's J2EE market; Coherence, Spiritsoft, and xNova™ (shipped by our company) could serve as effective examples.

    We at Fitech Labs took it much further and also provide a .NET distributed Cache service (among many other pre-built services) that shares exactly the same features (non-replicable design, clustering, JTS-like cache transactions with distributed two-phase commit) as our Java/J2EE Cache service. Furthermore, we even allow users to cache the same objects concurrently in both Java and .NET runtimes, which means that both Java and .NET caches may participate in the same cache transaction and the same invalidation process.

    Hence, in my opinion, you can find a feature-rich distributed cache implementation not only for Java/J2EE, but for .NET as well.

    Best regards,
    Dmitriy Setrakyan
    Fitech Labs, Inc.
    xNova™ - Service Oriented Technology for Java and .NET
  57. Grow up Microsoft!

    Do you think we get carried away by such results? .NET will take at least 2 years to reach the architectural maturity of J2EE. Till then, M$ can continue their FUD. I am not impressed.

    Regards,
    Musaddique Husain Qazi
    CEO & CTO , 5 Ants Inc
    http://www.the5ants.com
  58. Compare on the Same Level

    The Cache in question is an API and is simply part of the .Net Framework in the same way that the DriverManager is just part of the Java API (as downloaded directly from Sun). If we look at it on the same level, Cache is simply a class that does not have an equivalent in Java.

    It seems that there are too many criticisms on something that came "out-of-the-box". An API is what it is: an API. It can never be a solution by itself. Groups may choose to build solutions using the API. Or they may choose to create their own from scratch. Come to think of it, at least in this case, something can be used out-of-the-box.

    I guess expectations are set too high as to what an out-of-the-box API can do. I don't think Microsoft can provide a caching API to be as powerful as the products described by some in this thread. In fact, I don't think even Sun would provide an API for caching in Java as sophisticated as what some sought in this thread. We already know people, groups and companies who capitalized well on these facts by investing and dedicating resources in order to build the solution.

    As I see it, the way Cache is used here, it is enough to serve its purpose...