News: JBoss Founder Marc Fleury: Why I love EJBs

  1. JBoss Founder Marc Fleury: Why I love EJBs (154 messages)

    Marc Fleury of JBoss has published a paper entitled "Why I Love EJBs - an Introduction to Modern Java Middleware".

    He discusses the merits of CMP, the JBoss concept of the generalized AOP/EJB container, proxy clients and the like. A very interesting read.

    It's at: http://www.jboss.org/blue.pdf.

    Cheers
    Ray

    ------------------------------
    Floyd (Editor's) Note December 4th.
    ------------------------------
    I attended the Toronto JUG last night, where Marc Fleury gave a talk. Rather than write another news post about it, here are some of the interesting things he said (that are not already in the BLUEpaper).

    - Web Services are a marketing trick from Sun. Marc can't understand why J2EE would embrace them, since they let .NET talk to Java. He compared J2EE's embrace of WS to 'letting the wolf in the barn'.

    - He quoted a TogetherSoft survey of companies that put app server market share at 48% for JBoss and 28% for BEA (with the others lower).

    - "Open source is like pee, you can put it in the swimming pool but you can't take it out".

    - JBoss often wins at companies that have hundreds of CPUs to put app servers on, thanks to its zero per-CPU licensing cost.

    Personally, I was pretty blown away by some of the features Marc is planning for JBoss 4. He's taking all the things we learned from EJB, generalizing them, and making them available to plain Java objects. For example, in JBoss 4 you will be able to take a plain Java object and transparently remote it, persist it, apply security to it, or wrap its methods with transactions. All of these EJB-style services will be available to any Java object, using dynamic interceptors to add this behaviour to objects in an AOP-like fashion.
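To make the interceptor idea above a little more concrete, here is a minimal sketch in Java using plain JDK dynamic proxies (`java.lang.reflect.Proxy`). It is not the JBoss 4 mechanism: the thread says JBoss will rely on run-time bytecode modification, whereas JDK proxies can only intercept calls made through an interface. All names here (`Interceptors`, `Interceptor`, `Invocation`, `TxInterceptor`, `Counter`) are hypothetical, and `TxInterceptor` merely logs begin/commit/rollback instead of talking to a real transaction manager.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that wraps a plain object in a chain of interceptors.
final class Interceptors {
    // Wraps target in a JDK dynamic proxy that routes every interface call through chain.
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface, List<Interceptor> chain) {
        InvocationHandler handler = (proxy, method, args) ->
                new Invocation(target, method, args, chain).proceed();
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        TxInterceptor tx = new TxInterceptor();
        Counter counter = wrap(new SimpleCounter(), Counter.class, List.<Interceptor>of(tx));
        System.out.println(counter.increment(5)); // 5
        System.out.println(tx.log);               // [begin increment, commit]
    }
}

// Hypothetical business interface and its plain Java implementation.
interface Counter {
    int increment(int by);
}

class SimpleCounter implements Counter {
    private int value;
    public int increment(int by) { value += by; return value; }
}

// Hypothetical interceptor contract: runs around a call and decides when to proceed.
interface Interceptor {
    Object invoke(Invocation inv) throws Throwable;
}

// One method call travelling through the interceptor chain to the target object.
class Invocation {
    private final Object target;
    private final Method method;
    private final Object[] args;
    private final List<Interceptor> chain;
    private int next = 0;

    Invocation(Object target, Method method, Object[] args, List<Interceptor> chain) {
        this.target = target;
        this.method = method;
        this.args = args;
        this.chain = chain;
    }

    Method method() { return method; }

    // Calls the next interceptor, or the real method once the chain is exhausted.
    Object proceed() throws Throwable {
        if (next < chain.size()) {
            return chain.get(next++).invoke(this);
        }
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the target's own exception
        }
    }
}

// Stand-in for a transaction service: it only records begin/commit/rollback.
class TxInterceptor implements Interceptor {
    final List<String> log = new ArrayList<>();

    public Object invoke(Invocation inv) throws Throwable {
        log.add("begin " + inv.method().getName());
        try {
            Object result = inv.proceed();
            log.add("commit");
            return result;
        } catch (Throwable t) {
            log.add("rollback");
            throw t;
        }
    }
}
```

The business class never mentions transactions; the behaviour is attached from outside, which is the point being made about EJB-style services on plain objects.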

    If they can really pull all this off, JBoss 4 could have a huge advantage over all the other stick-to-the-spec vendors. Think about it: you can get all the power of EJB without the API and compile-time burdens!

    From a business perspective, I was also quite impressed. Unlike my previous encounters with Marc at the last couple of JavaOnes, the Marc I saw last night sounded more like a CEO than the anti-corporate guerrilla I remembered. At the last JavaOne they were trying to make money by selling t-shirts, but last night Marc mentioned training and consulting deals worth hundreds of thousands of dollars, and at one point even mentioned JBoss Group having a million dollars in the bank (I think I heard that, anyway).

    So, things have come a long way for open source J2EE, and from the looks of JBoss 4, there's a lot more excitement yet to come.

    Floyd

    Threaded Messages (154)

  2. I'm glad that Marc has spoken up in favour of EJB. Caching has always been the big boon for entity beans, when used properly. However, it wasn't until modern application servers began shipping with distributed entity bean caches that this boon actually made entity beans viable. Unfortunately, it might be a little too late, since the general sentiment I am observing in TSS forums is that people have given up on entity beans in favour of lighter-weight approaches (the plethora of O/R mappers on the market is a good indicator of that too).

    The entity bean caching we have on TSS allowed the site to be super scalable since it launched in early 2000. Now that we have a cluster running BEA and Oracle, Tangosol Coherence is providing entity bean caching across these two servers really well.

    If only appservers supported clustered entity bean caching back when EJB was first launched, I think entity beans would have had a lot more industry adoption today.

    Now, I am speaking purely from a performance perspective. From a development perspective, entity beans are still quite heavyweight compared to JDO.
  3. > From a development perspective, entity beans are still quite heavyweight compared to JDO.

    And what do you think should be changed or added in EJB for developers?

    Dmitry Namiot
    Coldbeans
  4. I think entity beans should be changed from a persistent component specification to a persistent object specification (or a simple standard for O/R mapping). There are too many changes to list here. Just compare entity beans to JDO/Toplink/Cocobase and note how much easier it is to code with a plain object model rather than a component model.

    Everyone is using entity beans as an O/R framework anyway (hidden behind session beans), so what's the sense in pretending?

    The J2EE spec leads have already set a precedent for simplifying complex, unnecessarily kludgy APIs with JSP 2.0 and JMS 1.1. Both of these new specs introduced new, simpler ways to do what was API-heavy in previous versions. I recommend that the EJB expert group adopt a similar mindset and simplify entity beans.

    Floyd
  5. Yes, but with JDO we have to implement the cache ourselves, and we lose the solution for a big class of problems (lots of users, small database modifications).
    Maybe we should combine the two, but then we have cache synchronization problems. Maybe EJB would be the answer if it offered the possibility to deactivate the cache and use JDBC for heavyweight processes.
  6. I would say that combining the caching features that make EBs so great with the simplicity of JDO would be the best combination.

    Most O/R JDO vendors already include distributed caching in their products; I couldn't say about ODB-based implementations.

    --Matthew
  7. <Floyd>
    I think entity beans should be changed from a persistent component specification to a persistent object specification
    </Floyd>

    And what if you add another two specs, interceptors (AOP) and caching, plus the ability to use implementations of these three specs as plug-ins? Isn't that the direction JBoss is moving in? At least that was the impression I got from "BLUE".
    But then, wasn't JDO supposed to be the persistent object specification, and why doesn't Marc like JDO?

    oleg
  8. Oleg,

    "
    Isn't that the direction JBoss is moving in? At least that was the impression I got from "BLUE".
    But then, wasn't JDO supposed to be the persistent object specification, and why doesn't Marc like JDO?
    "

    Marc doesn't like JDO because he is going to replace it with features provided by JBoss 4, which will allow you to persist a plain java object model without compile-time byte code modification.

    JBoss 4 will include run-time byte code modification - very cool stuff!

    Floyd
  9. Floyd, will this application of the JBoss CMP engine to POJOs work outside the JBoss container? If so, it might be useful as an O/R solution that I can pick and choose independently of my app server choice.
  10. Mike, Marc said that they are building their persistence stuff on top of an open source project from Japan called Javasyst (I haven't been able to find any links to this project). So it's possible that this functionality might be available standalone.

    Floyd
  11. http://www.csg.is.titech.ac.jp/~chiba/javassist/ - God bless google ;-)
  12. Floyd

    Marc doesn't like JDO because he has put so much development effort into EJBs. Marc has done an excellent job, and the downsides of EJB entities are not under his control; he has tried to implement them in what he sees as the best way.

    I suspect that Marc and other App Server vendors view JDO as duplication of effort.

    Java needs a simple, standard way to persist objects, like JDO; there are too many non-standard persistence mechanisms already. So why doesn't he make the API for his new persistence functionality JDO-compliant AND still have cool features like run-time bytecode modification?

    Dan
  13. Floyd:
    [ ... Marc doesn't like JDO because he is going to replace it with features provided by JBoss 4, which will allow you to persist a plain java object model without compile-time byte code modification. ...]

    I thought the debate was settled a long time ago as far as Java developers are concerned -- Standards are good, proprietary stuff is bad. We never doubted that proprietary stuff ran more efficiently on the respective vendors' preferred platforms (in most cases at least), but didn't we all make that sacrifice in the name of cross-platform deployment and interoperability? I'm sure what Marc is proposing is wonderful, but somehow, I fear that Bill Gates is going to have the last laugh.
  14. Excellent points, and I am with you. The last thing we need is to have every vendor coming up with their own proprietary persistence schemes. I am quite happy with the rate at which EJB specs and implementations are progressing and have no interest in getting locked in to any J2EE provider.
  15. "I thought the debate was settled a long time ago as far as Java developers are concerned -- Standards are good, proprietary stuff is bad"

    IMHO this rule does not apply under all circumstances. Proprietary is bad in so far as a single company cannot be relied upon to fix their bugs, produce new releases, and do all that in an affordable fashion. Standards solve the problem because of competition - they create pressure on each vendor by opening up alternatives for customers.

    However, this reasoning doesn't apply if you look at GPL products with huge market share. When you install Apache or Linux, do you really need to worry about an exit strategy?

    That said, I believe that JBoss still doesn't have the necessary market share, but maybe one day it will, if they get the right attitude.
  16. Floyd Marinescu said above: "Marc doesn't like JDO because he is going to replace it with features provided by JBoss 4, which will allow you to persist a plain java object model without compile-time byte code modification. JBoss 4 will include run-time byte code modification - very cool stuff!"

        Marc Fleury wrote in blue.pdf: "In other words, if the CMP2.0 engine’s applicability goes beyond EJB alone, why couldn’t we imagine a CMP engine working on abstract plain old java objects? We will look at making it the default service for persistence in JBoss."
     
    Maybe I am missing something... but this seems VERY EXCITING! However, what API will JBoss use for persisting plain Java objects? Some sort of API will still be required... how will they do queries, for example? With EJBQL, even though in their future model persistent entities would no longer need to be written as entity beans?

    Probably obvious where I am going... ;) Why not use JDO as the API? JDO is an API, an API for plain vanilla object persistence. Nothing more, nothing less.

    Granted, of course it's not quite that trivial, and there are differences. But if you take away the "distributed" from entity beans (note Marc repeatedly mentioning "collocated" in the PDF), remove the reliance on the EJB API as they seem to intend, and realize that JDO PC instances are definitely transactional objects as well... then wouldn't this be a) absolutely technically feasible and b) extremely interesting? (And no, before somebody screams: the JDO specification no longer explicitly mandates compile-time bytecode enhancement; it's just what most JDO implementation vendors do today. But if JBoss can do something cooler with AOP-based run-time bytecode modification, go for it!)

    So what am I missing? This could be really nice. Thoughts? Anybody from the JBoss dev folks with us here?
  17. Floyd,

    Again and again, the entity bean cache becomes a problem when you're using shared databases. With CMP you can't manually tell the container to invalidate only some of the cached instances, which means that with every transaction hundreds of cached beans automatically get synchronized with the database... What a performance overhead! Correct me if I am wrong.
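As a rough illustration of what per-entry invalidation means, here is a hypothetical hand-rolled primary-key cache in Java. `EntityCache`, `get`, `invalidate` and `loads` are made-up names for this sketch, not a container API. It assumes that something tells the cache which rows another application has changed, which is exactly the hard part in a shared-database setup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical primary-key cache with per-entry invalidation; not a container API.
final class EntityCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // reads one row from the database on a miss
    private final AtomicInteger loads = new AtomicInteger();

    EntityCache(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    // Drops just this entry; every other cached row stays warm.
    void invalidate(K key) { entries.remove(key); }

    int loads() { return loads.get(); }

    public static void main(String[] args) {
        // A HashMap stands in for the shared database table.
        Map<Integer, String> table = new HashMap<>(Map.of(1, "alice", 2, "bob"));
        EntityCache<Integer, String> cache = new EntityCache<>(table::get);

        cache.get(1);
        cache.get(2);                      // two loads so far
        table.put(1, "alicia");            // another application updates row 1 directly
        System.out.println(cache.get(1));  // alice (stale)
        cache.invalidate(1);               // told about the change to row 1 only
        System.out.println(cache.get(1));  // alicia
        System.out.println(cache.get(2) + " " + cache.loads()); // bob 3
    }
}
```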

    Alex
  18. <Q>
    Again and again, entity beans' cache becomes a trouble when you're using shared databases
    </Q>
    Then don't share the database. Those who think of it as a database, which is different from knowing it is one, will have problems using things like EJBs.

    So explain how business logic is kept in synch in a shared database scenario? (I think I know and it makes your DB vendor very happy)
  19. "<Q>Again and again, entity beans' cache becomes a trouble when you're using shared databases</Q>
    => Then don't share the database."

    Like you have the choice!... Sharing a database is usually not an implementation detail, it is most of the time an architectural constraint you have to deal with when doing your analysis.

    "So explain how business logic is kept in synch in a shared database scenario?"

    Keeping business logic in sync in a shared DB scenario does not really make any sense as a statement (business logic is not data and only data needs to be kept in sync in a caching approach), or did I miss something? As far as data is concerned, it all eventually boils down to staleness management. Having a database that is not shared removes that concern at the cache level, which improves performance, but as I said, you don't always have the choice...
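One common way to do the staleness management mentioned above is optimistic versioning: every row carries a version number, and a write only succeeds if the version is still the one that was read. The sketch below simulates this in memory with a hypothetical `VersionedStore` class; in a real database the check would be a conditional `UPDATE ... WHERE id = ? AND version = ?`, and it only works if every application sharing the database bumps the version column on its writes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a shared table with a version column.
final class VersionedStore {
    static final class Row {
        final String value;
        final long version;
        Row(String value, long version) { this.value = value; this.version = version; }
    }

    private final Map<Integer, Row> rows = new HashMap<>();

    synchronized void insert(int id, String value) { rows.put(id, new Row(value, 0)); }

    synchronized Row read(int id) { return rows.get(id); }

    // Optimistic check, in the spirit of
    //   UPDATE t SET value = ?, version = version + 1 WHERE id = ? AND version = ?
    // Returns false when the caller's copy is stale and must be reloaded.
    synchronized boolean update(int id, String newValue, long expectedVersion) {
        Row current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return false;
        }
        rows.put(id, new Row(newValue, expectedVersion + 1));
        return true;
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        store.insert(1, "balance=100");

        Row cached = store.read(1);                                    // copy held in the app server cache
        boolean other = store.update(1, "balance=90", 0);              // another application writes first
        boolean ours = store.update(1, "balance=110", cached.version); // our write from a stale copy
        System.out.println(other + " " + ours);                        // true false

        Row fresh = store.read(1);                                     // reload, then retry if needed
        System.out.println(fresh.value + " v" + fresh.version);        // balance=90 v1
    }
}
```

The cache never has to know in advance that its copy went stale; the conflict is detected at write time, at the price of a reload and retry.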

                  Yann
  20. <Q>
    Like you have the choice!... Sharing a database is usually not an implementation detail, it is most of the time an architectural constraint you have to deal with when doing your analysis.
    </Q>

    You should have a choice. If the database already exists then you probably are already in trouble - on many levels. If you are starting from scratch, the database should never be shared. In fact, don't even mention database - use persistence instead. If you are not starting from scratch - fix the problem - stop sharing the database or forget using techniques and technologies that treat databases as persistence.

    <Q>
    Keeping business logic in sync in a shared DB scenario does not really make any sense as a statement
    </Q>

    My point exactly.

    <Q>
    business logic is not data and only data needs to be kept in sync in a caching approach
    </Q>

    What use is data without the info (business logic) on how it is used or to be used? OO techniques combine data with logic.
  21. <quote from Mark Nuttall>
    You should have a choice. If the database already exists then you probably are already in trouble - on many levels. If you are starting from scratch, the database should never be shared. In fact, don't even mention database - use persistence instead. If you are not starting from scratch - fix the problem - stop sharing the database or forget using techniques and technologies that treat databases as persistence.
    </quote from Mark Nuttall>

    I really don't think that this applies to most enterprise database scenarios.

    The ability to keep a database, even a new one, from being touched by other sources is very difficult.

    In my experience, the data stored within an application often has more value than the software used to access it. This value usually increases with time, as the volume of data increases. The data is considered as "belonging to the business", and often "the company at large", and not any "individual application or piece of software".

    I've never seen a situation where I've been able to work with an enterprise-level database (not the RDBMS - the actual database containing the data) and be able to deny sharing that information with other people in the business: people who might use a totally different set of technologies to access that information.

    Data sharing is a way of life in enterprise systems. In fact, it is often part of the ROI statement in many project proposals: the data will be available for people to use, and the more access to the data that can be supported, the merrier. (I do think that web services can potentially make access more centralized. That, IMHO, is the easiest way to define a standard interface to a business logic layer of the kind you mentioned, one that has sole access to the data.)

    I'm not saying that the ideal ISN'T an unshared database. It's just that I've yet to come across such a situation in any of the projects I've worked on.

    Therefore, cache coherency is a major design issue that significantly limits the utility of data caching within app servers.

    In fact, from personal experience, I would say that if you are working on functionality that uses only read-only data (or data with very low volatility), go ahead and cache away till hell freezes over. Otherwise, let the database do what it does best: manage data.
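For the read-only or low-volatility case described above, one simple approach is a time-to-live cache that reloads an entry once it is older than a fixed age, so staleness is bounded by the TTL. This is a minimal sketch, assuming a hypothetical `TtlCache` class and an injectable clock so the demo is deterministic; it is not tied to any EJB container.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical cache for read-only or low-volatility data: entries are reloaded
// once they are older than ttlNanos, which bounds how stale a read can be.
final class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long loadedAt;
        Entry(T value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> loader; // e.g. a database read
    private final long ttlNanos;
    private final LongSupplier clock;    // System::nanoTime in real use
    private int loads;

    TtlCache(Function<K, V> loader, long ttlNanos, LongSupplier clock) {
        this.loader = loader;
        this.ttlNanos = ttlNanos;
        this.clock = clock;
    }

    synchronized V get(K key) {
        long now = clock.getAsLong();
        Entry<V> e = entries.get(key);
        if (e == null || now - e.loadedAt >= ttlNanos) {
            e = new Entry<>(loader.apply(key), now);
            entries.put(key, e);
            loads++;
        }
        return e.value;
    }

    synchronized int loads() { return loads; }

    public static void main(String[] args) {
        long[] now = {0}; // fake clock for the demo
        TtlCache<String, String> rates =
                new TtlCache<>(code -> code.toUpperCase(), 100, () -> now[0]);
        rates.get("usd");                  // miss: load
        now[0] = 50;
        rates.get("usd");                  // still fresh: cache hit
        now[0] = 100;
        rates.get("usd");                  // expired: reload
        System.out.println(rates.loads()); // 2
    }
}
```

The TTL is the knob: the lower the data's volatility, the longer it can safely be.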

    Also, I believe that the evil devil child "serialization" is not as big a monster when it comes to database connectivity. My experience is that databases are good (and very fast) at working with data. Cost-effective "connectivity" (minimal network packets, CPU cycles, disk I/O) is something that we need to continually focus on.

    Meanwhile, without caching, CMP 2.0 might not be the best approach to persistence. At least not in its current avatar.

    Sandeep.
  22. Hi,

    Another huge (and commonplace) consideration in terms of cache coherency issues is reporting and other business analytics needs that have a real-time nature. You just cannot have every single technology in the world be able to work off the same data cache. Because that is what you will need to do to ensure data synchronization across the entire breadth of an enterprise solution.

    Of course, you could always come up with a web service interface to the entire cache (either via session beans or to entity beans directly) and restrict access to the database that way. That would be an interesting solution, but we aren't there yet.

    And of course, SOAP is an inefficient protocol. But then again, you can't have everything.

    Sandeep.
  23. If you are reading this thread and you are using Oracle's Real Application clusters or shared DB2 on z/OS (using the Coupling Facility), can you say something about your experience and costs? I suspect that those two are viable approaches to scaling a database so you don't have to cache ANYTHING.

    Greg Pfister claims that clustering using the z/OS Coupling Facility, with a FULLY SHARED WORKLOAD, adds only about 13% overhead. The reason is basically that the data sharing is done using synchronous data transfers taking a few microseconds each: no context switches, no TCP stack, nothing! I would love to use this stuff, but we are talking huge budgets.
  24. <Sandeep>
    Another huge (and commonplace) consideration in terms of cache coherency issues is reporting and other business analytics needs that have a real-time nature. You just cannot have every single technology in the world be able to work off the same data cache. Because that is what you will need to do to ensure data synchronization across the entire breadth of an enterprise solution.
    </Sandeep>

    There is such a coherent cache - it's called a RDBMS :-)

    Or have people forgotten the meanings of ACID and RDBMS transaction benchmarks?

        -Mike
  25. Um, er, Mike.

    I am not quite sure that you followed all pieces of the thread.

    I AM an advocate of leveraging the inherent data management capabilities of RDBMS. I AM also trying to say that caches don't quite work.

    Sandeep.
  26. <Sandeep>
    Um, er, Mike.
    I am not quite sure that you followed all pieces of the thread.
    I AM an advocate of leveraging the inherent data management capabilities of RDBMS. I AM also trying to say that caches don't quite work.
    </Sandeep>

    Sorry, I was attempting to back up your statements by being overly clever (well, at least I put in the smiley). Specifically, the rant on ACID properties and TPC benchmarks was not directed at you, but at the faceless horde of developers who follow the spec du jour but never bother to actually load test various scenarios themselves.

    <Side Note> About a year and a half ago, a very senior developer actually argued the great virtues of entity beans to me, going on and on about the performance of the caching. After some confusion and much in-depth conversation, I found that he had never even tried straight JDBC (or lightweight stateless session beans). He just "assumed" that entity caching would be faster and that I was an idiot to rely on the database.

    A year later and he's one of the most bitter people I know - and someone who will take any opportunity to cuss-out the entity bean spec. Apparently he got nailed bad by his entities getting pessimistically locked and blowing out his cache.
    </Side Note>

        -Mike
  27. <Q>
    Data sharing is a way of life in enterprise systems
    </Q>
    Sharing 'Data' is different from sharing a database.

    <Q>
    I really don't think that this applies to most enterprise database scenarios.
    </Q>

    It does apply. It probably doesn't and won't happen because so many are short sighted.


    <Q>
    The ability to keep a database, even a new one, from being touched by other sources is very difficult
    </Q>

    True. But that can change if people's presuppositions change. As long as people think 'data' and 'database' - it won't.

    <Q>
    In my experience, the data stored within an application often has more value than the software used to access it.
    </Q>

    It may seem that way but what good is the data without descriptors and constraints and ... ?
  28. <Mark Nuttall>
    You should have a choice. If the database already exists then you probably are already in trouble - on many levels. If you are starting from scratch, the database should never be shared. In fact, don't even mention database - use persistence instead. If you are not starting from scratch - fix the problem - stop sharing the database or forget using techniques and technologies that treat databases as persistence.
    </Mark Nuttall>

    God bless everyone who has the option to use a private database. This solves a whole raft of issues, of which data caching is only a small part.

    Sadly for me and many others, having a private database is not an option. Often you have to work with a database schema developed a while back (sometimes way back!). In other cases, you have no choice but to accept data coming in from a background process/daemon/whatever that has no hooks into your app server. That's life in doing true enterprise applications with many data sources. You can try partitioning off only the information you need and creating a private database based on that, but this only moves your problems to another level: instead of worrying about stale beans, you have to worry about keeping multiple databases in sync.

    Personally, I think application/EJB caching is overrated and used far too often. Databases often do a fantastic job of caching, and are only slow because a surprising number of application developers don't know how to optimize queries or add indexes to tables. If you think about it, who's going to cache data better - Oracle/Postgres/et al, or your CMP entity bean provider? I'll put my money on the guys who've been doing it for 20 or more years.

        -Mike
  29. "Personally, I think application/EJB caching is overrated and used far too often. Databases often do a fantastic job of caching, and are only slow because a surprising number of application developers don't know how to optimize queries or add indexes to tables. If you think about it, who's going to cache data better - Oracle/PostGres/et al, or your CMP entity bean provider? I'll put my money on the guys who've been doing it for 20 or more years."

    It depends. What about this scenario (probably way off - but I think it shows my point ;):

    java-database/database-java (total) serialization cost: 200ns (each translation as fast as one database query...)
    java ejb cache (when hit in about 80%): 10ns
    database cache (as above): 1ns
    database no-cache: 100ns

    So let's do some math. If you disable the Java EJB cache and let the database "do its magic" with its cache, you get:
    1.0*200 + (0.8*1 + 0.2*100) = 220.8 ns/call

    If you enable the Java EJB cache and disable the database cache, you get:
    (0.8*10 + 0.2*200) + 1.0*100 = 148.0 ns/call

    So it seems that this depends entirely on how much time the database query takes compared to the total time spent serializing data between the database and Java.
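The two formulas above can be written out as a small Java program so the assumptions are easy to vary. The figures are Anders's illustrative guesses, not measurements, and the method names are made up for this sketch. The comments spell out each term exactly as the post defines it.

```java
// Anders's back-of-the-envelope model; every figure is his illustrative guess, in ns.
final class CacheCostModel {
    public static void main(String[] args) {
        System.out.printf("EJB cache off, DB cache on: %.1f ns/call%n",
                dbCacheOnly(200, 1, 100, 0.8));   // 220.8
        System.out.printf("EJB cache on, DB cache off: %.1f ns/call%n",
                ejbCacheOnly(200, 10, 100, 0.8)); // 148.0
    }

    // Every call pays serialization; the database answers from its cache on a hit
    // and from disk on a miss: 1.0*200 + (0.8*1 + 0.2*100).
    static double dbCacheOnly(double serialization, double dbHit, double dbMiss, double hitRate) {
        return 1.0 * serialization + (hitRate * dbHit + (1 - hitRate) * dbMiss);
    }

    // As written in the post: EJB-cache hits cost ejbHit, misses pay serialization,
    // and the uncached database query is charged on every call: (0.8*10 + 0.2*200) + 1.0*100.
    static double ejbCacheOnly(double serialization, double ejbHit, double dbMiss, double hitRate) {
        return (hitRate * ejbHit + (1 - hitRate) * serialization) + 1.0 * dbMiss;
    }
}
```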

    Which one do you believe (know?) is faster in your case?

    (disclaimer: with CMP 2.0 local entities in *one* JVM, i.e. no other serialization needed in the EJB cache)
    /Anders
  30. I am *tired* of uselsess ' architechts ', ' Developers ' , ' System architects ' ' <who-everthef**-u-r'
    The fact is, you know JACK SHIT. All this nonsense re ejb VS BLAH BLAH vs Performance.

    Shut the f***up and just do it *PROPERLY* and see. You must be * really* rubbish/incompetent .......just like roman, **** why do i wast my time with u loosers ?

    My box

    jboss-version = jboss 3.0
    [dummies] impressions/hits = 25,000 a day


    argue hits/impressions etc i do not care, i server and i earn ...

    suck it hard.


    K
  31. Wrong planet Kal El, the people who want to read your rude ignorant opinions are several million AU's away :-)
  32. Seems like a last breath of EJBs when Marc Fleury wants to convince people about the glory of EJBs. There are so many articles about why I hate EJBs, so this one was predictable. Still, EJBs are nothing; sorry, I have been there, done that. Nobody cares, but the fact is that EJBs should be hated instead of loved (no offence here, but they are not very practical, nor performant (yeah, give those blazing numbers; I can tell you, you pay overhead)). The only thing that can be used (in the case of EJBs) is stateless session beans. Not a very thoughtful post, but it's my opinion and I think that I'm not alone in that.
  33. I found Kal El's comments to be both insightful and thought provoking. It is about time that intellectually stimulating discourse on the merits of J2EE is put forth on this forum.

    Two things about this brilliantly lucid commentary struck me:

    1. The cunning use of "u" and "r" in place of "you" and "are". What a tremendous time saver!

    2. The icing on the cake for me was the quote: "I server and I earn".

    Didn't Cicero once say that? Powerful.

    Obviously, this is a superior mind. Take note, serverside!
  34. Jason,

    There should be a law against you! I almost fell off my chair and injured myself. ;-)

    Sandeep

    PS: You beat me to it.
  35. <snip>
    I am *tired* of uselsess ' architechts ', ' Developers ' , ' System architects ' ' <who-everthef**-u-r'
    </snip>

    So what are you? A butcher or sumthin'? If you are tired of all those, shouldn't you be on IRC somewhere injuring people? I tell you that here you'll find only these "uselsess ' architechts ', ' Developers ' , ' System architects '"

    Argh !
  36. hehe back in the day in the amiga scene, a sure sign of a loser was that he spelled loser with 2 o's

    l8r
    morten wilken
  37. <Anders>
    It depends. What about this scenario (probably way off - but I think it shows my point ;):
    java-database/database-java (total) serialization cost: 200ns (each translation as fast as one database query...)
    java ejb cache (when hit in about 80%): 10ns
    database cache (as above): 1ns
    database no-cache: 100ns
    so lets do some math, if you disable java ejb cache and let the database "do its magic" with cache you get:
    1.0*200 + (0.8*1 + 0.2*100) = 220.8 ns/call
    if you enable java ejb cache and disable database cache you get:
    (0.8*10 + 0.2*200) + 1.0*100 = 148.0 ns/call
    </Anders>

    Let me start by saying that the point is to always be "fast enough" - not fastest. Second, the price of a single call is uninteresting. What's interesting is the price of 200 calls/second (or whatever load is appropriate in your system).

    That said - using your numbers, I'd never want an EJB cache. For all the complexity that's in there, you're getting less than a 2 times improvement. A cache just isn't worth it if you're only halving retrieval time.

    Scalability is also very important (at least to me). How well does that EJB cache do with 500 clients, and say 50 simultaneous requests in flight? How does Entity EJB pessimistic locking affect the results? Are you getting real hits out of your cache under that load, or are you thrashing the cache?

    The nice thing about database caches is that they've been tuned literally over decades of very hard-core application use, and they scale wonderfully. An individual client may see some latency - but 50 clients will all get around the same response times. Can you say that about your EJB cache? Is it really scaling under load?

    Three other factors should also be kept in mind. First, your database is going to guarantee ACID no matter what. Clustering is not going to be the big bugaboo it can be with EJBs. Second, it's very common to tier machines in an enterprise system: the web server goes on blades or other very light boxes, the app server instances go on medium boxes, and the database goes on the biggest hardware you've got. In this kind of setup your RDBMS is going to soak up all of that big hardware and give you performance you'll be hard pressed to even approach in a clustered EJB environment, particularly if you're stuck on "medium" sized boxes. Third, an RDBMS is infinitely more tunable for performance than any EJB container I've ever seen. There are people whose main job is to maintain and tune databases - they're called database administrators. A good one can give you an order of magnitude speed boost to your app without touching a line of code.

    My main point here - performance isn't about latency on one request. It's about serving hundreds or thousands of requests without bottlenecks.

    I'm not saying that application-level caches are useless. There are times when they give a big boost. But a surprising number of developers use caches blindly without even timing straight-through stateless session bean or JDBC code under load. A cache should be a very special case that's undertaken because of problems found during direct performance measuring under load. It should not be the way you routinely handle data.

        -Mike
  38. "Let me start by saying that the point is to always be "fast enough" - not fastest. Second, the price of a single call is uninteresting. What's interesting is the price of 200 calls/second (or whatever load is appropriate in your system)."

    yep, I agree :)

    "That said - using your numbers, I'd never want an EJB cache. For all the complexity that's in there, you're getting less than a 2 times improvement. A cache just isn't worth it if you're only halving retrieval time."

    Did you notice the part "same time as a database query..."? In general, I believe the time taken to serialize from Java -> bytes -> over the wire to the database machine and back again will be orders of magnitude slower than the database query part! I deliberately chose the numbers to be the same for query and serialization just to show that the EJB cache is almost twice as fast, not counting that serialization could be 1000 times more costly than the database query. Now do you see my point?
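To follow the point about serialization possibly costing far more than the query, the same two formulas from the earlier post can be swept over larger serialization costs while holding the other guesses fixed. This is a hypothetical sketch of that model (class and method names are made up), not a benchmark.

```java
// Hypothetical sweep of the model from the earlier post: only the serialization cost
// varies; the other guesses stay fixed (80% hit rate, EJB hit 10 ns, DB hit 1 ns, DB miss 100 ns).
final class SerializationSweep {
    static final double HIT_RATE = 0.8, EJB_HIT = 10, DB_HIT = 1, DB_MISS = 100;

    public static void main(String[] args) {
        // 100_000 ns is the "1000 times the database query" case from the post.
        for (double s : new double[] {200, 2_000, 20_000, 100_000}) {
            System.out.printf("serialization %7.0f ns: EJB cache %.2fx faster%n",
                    s, withoutEjbCache(s) / withEjbCache(s));
        }
    }

    // EJB cache off, database cache on: every call pays serialization.
    static double withoutEjbCache(double s) {
        return s + HIT_RATE * DB_HIT + (1 - HIT_RATE) * DB_MISS;
    }

    // EJB cache on, database cache off, following the post's second formula.
    static double withEjbCache(double s) {
        return HIT_RATE * EJB_HIT + (1 - HIT_RATE) * s + DB_MISS;
    }
}
```

Within this model the EJB cache's advantage grows with serialization cost but stays below 1 / (1 - hitRate), i.e. 5x at an 80% hit rate, because every miss still pays the full serialization.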

    I'm not arguing against database cache being a lot faster than the ejb cache - the main point is that the *transport* over wire (or even just out/in from jvm to native - maybe direct buffers will help here in a not so distant future though) from database into a real java object and back *hurts* :)

    "My main point here - performance isn't about latency on one request. It's about serving hundreds or thousands of requests without bottlenecks."

    True, we seem to be arguing about different things here - I was trying to show that using an EJB cache will be *faster* for a Java solution - not that it will be more reliable or scalable. I find it quite easy: if it doesn't work/scale, just disable it and live with slower response times - that's just life. Or find a product that does work - that's what I like most about J2EE - there are just so incredibly many options!
  39. <Anders Dahlberg>
    Did you notice the part "same time as a database query..."? In general I believe the time taken to serialize from Java -> bytes -> send over the wire to the database machine and back again will be orders of magnitude slower than the database query itself! I deliberately chose the same numbers for query and serialization just to show that the EJB cache is almost twice as fast, even without counting that serialization could be 1000 times more costly than the database query - now do you see my point?
    </Anders Dahlberg>

    What concerns me is the phrase "I believe". There's no reason to guess, or extrapolate, or even to apply logic to this sort of situation. Instead - just go and test it.

    For my systems, I'm not really serializing objects - there's code that explicitly transforms Java types into SQL parameters (and vice versa), and that code is not nearly as slow as you seem to imply. On Windows systems the time it takes to do this is less than the resolution of the Windows timer calls.

    On the sending-over-the-wire part - 100MBit connections are common now. Is that network send and receive really as slow as you believe? Have you timed it?

    <Anders Dahlberg>
    I'm not disputing that the database cache is a lot faster than the EJB cache - the main point is that the *transport* over the wire (or even just out of/into the JVM to native code - maybe direct buffers will help here in the not-so-distant future) from the database into a real Java object and back *hurts* :)
    </Anders Dahlberg>

    It does? Then quantify it for 1 request/second, 10 req/sec, 100 req/sec. How badly does it hurt? (In my experience, as long as you're not literally using the Java serialization mechanism, it doesn't hurt at all.)

    <Anders Dahlberg>
    "My main point here - performance isn't about latency on one request. It's about serving hundreds or thousands of requests without bottlenecks."

    True, we seem to be arguing about different things here - I was trying to show that using an EJB cache will be *faster* for a Java solution - not that it will be more reliable or scalable. I find it quite easy: if it doesn't work/scale, just disable it and live with slower response times - that's just life. Or find a product that does work - that's what I like most about J2EE - there are just so incredibly many options!
    </Anders Dahlberg>

    Scalable and "faster" go hand in hand, and you really can't look at just one or the other. For the type of work I do at least, the speed of one message end-to-end is meaningless - literally no one cares. What they care about is user-perceived response times when 50, 100, 500 clients are hitting the system simultaneously. I know my database cache can handle it. Up to now, EJB entity caches have invariably failed to handle that sort of load properly. Plus, as others have mentioned, you can't even properly use an EJB cache if the database is shared.

    Also, you mention "I find it quite easy, if it doesn't work/scale - just disable it and live with slower response times - that's just life". You do realize the reason it's slower has nothing to do with the database, but with entity bean semantics, right? Try straight-out JDBC or JDBC used from within a stateless session bean, and time _that_ against the entity solution - cached and uncached - and see how it works. If you haven't tried it, the results might surprise you.

        -Mike
  40. Suppose we use the DB cache only. What if we need some cache management, e.g. some sort of cache-event-driven callbacks? How would we do that without EJB?
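    One way to get cache-event callbacks without EJB is an ordinary observer pattern wrapped around a map. This is a minimal sketch; `ObservableCache` and `CacheListener` are hypothetical names, not part of any product's API, and a real cache would also need eviction and a policy for listeners that throw.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical callback interface for cache events. */
interface CacheListener<K, V> {
    void onPut(K key, V value);
    void onRemove(K key);
}

/** A map-backed cache that notifies registered listeners on every change. */
final class ObservableCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    // copy-on-write so listeners can be added while notifications are in progress
    private final List<CacheListener<K, V>> listeners = new CopyOnWriteArrayList<>();

    void addListener(CacheListener<K, V> listener) {
        listeners.add(listener);
    }

    V get(K key) {
        return entries.get(key);
    }

    void put(K key, V value) {
        entries.put(key, value);
        for (CacheListener<K, V> l : listeners) {
            l.onPut(key, value);
        }
    }

    /** Removes the entry and notifies listeners only if something was actually removed. */
    void remove(K key) {
        if (entries.remove(key) != null) {
            for (CacheListener<K, V> l : listeners) {
                l.onRemove(key);
            }
        }
    }
}
```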

    Dmitry Namiot
    Coldbeans
  41. Benchmarks please

    Mike, can you post your results? I would be very interested to see them.
  42. Mike,

    I could not agree more. Let the database take the load.
     
    The big databases are incredible monsters capable of 200,620 to 455,000 tpmC. Unclustered! It is a little funny that application servers that have problems with only 3000-4000 users think it’s necessary to "help" the database.

    Regards
    Rolf Tollerud
  43. Rolf,

    Why not cache? So what if the database screams? Getting data from an intra-JVM cache WILL BE FASTER. Why is wanting to do this a bad thing?

    Say you have a complex data structure (like a deep product catalog category tree) that all of your users access multiple times, and the data is reasonably stable. Are you advocating getting this data from the database every time - thousands of hierarchically-related objects thousands of times per minute?
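    For a read-mostly tree like this, one common approach is to load all the rows once, build the parent/child structure in memory, and swap in a new immutable snapshot when the data changes. The sketch below assumes a hypothetical `rowSource` (for example a single JDBC query returning id/parent-id pairs); `CategoryRow` and `CategoryTreeSnapshot` are illustrative names, and nothing here detects changes made to the database by other applications.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** One row from a hypothetical category table: id plus parent id (null for top-level categories). */
final class CategoryRow {
    final String id;
    final String parentId;

    CategoryRow(String id, String parentId) {
        this.id = id;
        this.parentId = parentId;
    }
}

/**
 * Keeps the whole category tree in memory as one immutable snapshot.
 * Readers never touch the database; refresh() reloads all rows and swaps in
 * a new snapshot through a volatile field, so readers see either the old
 * tree or the new one, never a half-built one.
 */
final class CategoryTreeSnapshot {
    private static final String TOP = ""; // key for top-level categories (assumes no real id is "")

    private final Supplier<List<CategoryRow>> rowSource; // e.g. one JDBC query (hypothetical)
    private volatile Map<String, List<String>> childrenById;

    CategoryTreeSnapshot(Supplier<List<CategoryRow>> rowSource) {
        this.rowSource = rowSource;
        refresh();
    }

    void refresh() {
        Map<String, List<String>> built = new HashMap<>();
        for (CategoryRow row : rowSource.get()) {
            String parent = row.parentId == null ? TOP : row.parentId;
            built.computeIfAbsent(parent, k -> new ArrayList<>()).add(row.id);
        }
        built.replaceAll((k, v) -> Collections.unmodifiableList(v));
        childrenById = Collections.unmodifiableMap(built);
    }

    /** Child ids of the given category, or of the top level when id is null. */
    List<String> children(String id) {
        return childrenById.getOrDefault(id == null ? TOP : id, List.of());
    }
}
```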

    Or what about if you have several clustered app servers. Should they all pound on the database for every piece of data, or should they try to localize some of it?

    Finally, you reap the benefits of caching NOT because the database is slow, but because you have to serialize the data across the wire. Sure, the database may cache things great, but that does you no good unless you can get that data back to your app - and that is slow compared to in memory cache.

    Whacking the database for everything is not really a scalable approach. Perhaps you should reconsider your thinking - caching can be a good thing when done right.

    Ryan
  44. Ryan,

    Duplication of data is always a great source of trouble. That is why, for instance, it is so important to normalize the database in the first place.

    The aspect of sharing the data with other users and applications is very important - fundamental. This is what business computing is all about!

    It seems to me that the Unix and the Windows worlds have very different views of users. Correct me if I am wrong, but the Unix idea of users is to keep them down as much as possible and hit them hard if they try to stick their heads up. In the Windows world, by contrast, I can do nothing without thinking "how shall I make this easy for my superusers", so that they can make their Crystal Reports, Excel aggregate summations, OLAP analyses and so on, without which the business world would come to a standstill tomorrow.

    This is reason number one why I cannot use caching. The database can be updated from some other source at any time.

    Second, if you just keep to this simple rule - always stateless - then you will have no problem with scaling. At least I have never succeeded in making my database even sweat a little. (BTW, what do databases do in between the calls, do they have a hobby?) So why should I create a problem with something that was not a problem in the first place? Isn't it unnecessary to "create" problems?

    EJB should only be used in the place of Corba IMO and even then only when for some reason you can not use SOAP/Web Services.

    "It is not much use when the tools that are supposed to help you with the problem are more complicated than the problem that is to be solved."


    Regards
    Rolf Tollerud
  45. Rolf,

    You're spewing garbage. First of all, OS has little to do with usability.

    But if it did, take for example, my boss. Head of the department of Computer Science at Oregon State University, *Intel* Faculty Fellow, Director of the Northwest Alliance for Computational Science and Engineering (NACSE, where I work) and recently conferred the status of Senior Fellow at IEEE (highest you can get I think) for.......

    That's right, "For technical leadership in improving the usability of computing technology." -- IEEE. Now all that sounds pretty good to me.

    Did I mention that we are primarily a Unix shop?

    Some people, including NACSE and a lot of other development shops, spend an awful lot of time working on usability. Despite my sarcasm, our usability work has ZERO to do with what operating system we use.

    -Jason McKerr
    Northwest Alliance for Computational Science and Engineering
  46. Jason,

    Sorry Jason, my respect for "Heads of the department of Computer Science" is zero and my respect for "Computer Science" and impractical theorists in general is very low.

    Programming is closer to an artform than to a science IMO. Do you think you can teach people to become Michelangelo? And in that case who will be the teachers?

    Regards
    Rolf Tollerud
  47. Domenico Ghirlandaio? Even Michelangelo studied his art..
  48. <Rolf>
    Programming is closer to an artform than to a science IMO. Do you think you can teach people to become Michelangelo?
    </Rolf>

    Are you saying that nothing you know about programming was taught to you? That you never learned any designs/techniques from those who came before you? That all of your development knowledge was "gifted" to you as an artist? That programming isn't about applying proven solutions to new requirements in similar contexts (like other "sciences")?

    Damn, I guess you really are a genius. When does your code gallery open :-)

    Ryan
  49. Ryan,

    <q>Damn, I guess you really are a genius. When does your code gallery open :-)</Q>

    Certainly I am not a genius. I learn from reading code written by people better than myself, people like Jurgen Hoeller, Vic Cekvenich, Jonathan Gibbons, Mike Spille, Sandeep Dath..

    In other words, someone with a lot of real world experience. Then you have authors like Rod Johnson and Bruce Tate.

    It so happens that Java attracts people who are interested in computer science - it is very unfortunate. Those who can, do; the others teach.

    Regards
    Rolf Tollerud
  50. <Rolf>
    Programming is closer to an artform than to a science IMO. Do you think you can teach people to become Michelangelo?
    </Rolf>

    <Ryan Breidenbach>
    Are you saying that nothing you know about programming was taught to you? That you never learned any designs/techniques from those who came before you? That all of your development knowledge was "gifted" to you as an artist? That programming isn't about applying proven solutions to new requirements in similar contexts (like other "sciences")?
    </Ryan>

    I'm guessing that you aren't an artist, or haven't had one in your life (please correct me if I'm wrong). The artists that I've known (well, the ones that showed real talent IMHO) had enormous training. They study the masters. They learn various mediums (sculpture, oils, pastels, old junkyard metal :-). They study interactions of lights and shadow, the human figure, perspective.

    After all that study - some of them excel at making cartoons. Others at selling ocean oils at the local mall. Some make millions bolting together old iron junk. And every once in a while there's a genius who creates a new genre. But they never would have gotten there without study - and none of their study has much to do with engineering, but instead with creation.

    Scientists, lest we forget, discover fundamental laws of nature through postulation and experimentation. There _is_ no fundamental law in the background of computer programming - it's all made up. HTML, Java, .Net, and COBOL have no basis in reality, they're all made up by people to get a job done. Our only scientific reality is processor speed, bus speed, memory size/speed, I/O throughput - which increases every year. We deal in ideas, not science.

    Likewise - there can be no engineering without a science to back it up. Feel free to throw out Big O notation and algorithms if you like - but you'll find that's all mathematics, not science or engineering.

    Not all artists are Michelangelo, for sure. But please don't mistake an artist for an EE, or Mech E, or a physicist, or biologist, or....there's just no fundamental aspect of nature behind any of it that qualifies as science or engineering. Ultimately it's pure thought as expressed by today's (soon to be subsumed) computer.

        -Mike
  51. Mike,

    This is unbelievable, are you deliberately misinterpreting? I should have known better than to introduce some non computer subject in TSS. Ok, then again:

    I don't say that you do not learn from others.

    What I was (am) saying is that you will not find them (persons to learn from) in any school, but in the real world (as Michelangelo).

    And as to whether programming is a science, please. Science requires controlled experiments that can be repeated in other laboratories and agreed upon.

    People in TSS for example can not agree even on the color of an orange.

    "Computers are useless. All they can do is give you answers."--Pablo Picasso

    Regards
    Rolf Tollerud
  52. <Rolf>
    And as to whether programming is a science, please. Science requires controlled experiments that can be repeated in other laboratories and agreed upon. </Rolf>

    Broaden your horizons, man. While I agree with you in general principle that there are aspects to programming that are more "craft" (which I prefer to "art", because it implies an apprentice/journeyman/master progression) than "science", your baseless denigration of computer science is ignorant, uncalled for, and straight-up makes you look like an egocentric idiot. For reference, try http://dictionary.reference.com/search?q=science .

    sci·ence n.

    1. The observation, identification, description, experimental investigation, and theoretical explanation of phenomena.
    Such activities restricted to a class of natural phenomena.
    Such activities applied to an object of inquiry or study.
    2. Methodological activity, discipline, or study: I've got packing a suitcase down to a science.
    3. An activity that appears to require study and method: the science of purchasing.
    4. Knowledge, especially that gained through experience.

    Anyone who can't apply definitions 2, 3, or 4 to computer science is just deluded. Just because YOU program and are not a computer scientist does not necessitate that no computer scientists are programmers.

    We now return you to your regularly scheduled discussion on caching.
  53. John,

    You can call it what you want John.

    I am happy as long as I do not have any "Computer Scientists" on my team.

    Regards
    Rolf Tollerud
  54. Rolf, you're a prickly customer all right :-)

    I'd like to know what your ideal development team would look like, and what sort of technologies you would use?

    You really seem to be a fan of the 'Techies are the High Priests of the Machine God' style of thinking.

    That's very attractive to us techies; unfortunately the business people don't like it, and they control the purse strings now...
  55. Hi Rolf,

    First of all, I'm not sure if I really deserve your praise, but thanks for it anyway ;-) Funny that I'm a computer scientist originally... in the sense that I hold a M.Sc. degree in computer science.

    But I agree with you that pragmatic thinking and experience are far more important than any degree. I know some computer science graduates that are very fond of the academic approach of solving "pure computer science" problems instead of the customers', even at rather small software companies. I wouldn't invite them to join my current team, for that matter.

    Regards,
    Juergen
  56. Rolf,

    >>"EJB should only be used in the place of Corba IMO and even then only when for some reason you can not use SOAP/Web Services."

    When someone uses "should only", "never" and "always" when giving technical advice, this is really spreading FUD.

    Why do you lurk here when you don't give sound J2EE advice?

    What is your agenda?

    I remember you writing how great Visual Basic is for developing software, thus, how can you be credible?

    Stick to .NET, so you can point, click, drag and drop your way to code.

    J2EE is not for "non-technical" programmers...
  57. George,

    There is so much to say about Visual Basic and EJB.

    (Old) Visual Basic, even after five years, is still a better ("more cost effective, faster execution") way to make Windows client programs than Java + Swing, despite the fact that there has not been a new version in five years. It can be regarded as one of the biggest successes in the history of computer software.

    EJB has a certain place as a replacement for Corba, but all in all it must be regarded as one of the biggest failures in the history of computer software.

    <Q>J2EE is not for "non-technical" programmers</Q>

    Is it me or you who are the non-technical programmer? Please explain.

    Regards
    Rolf Tollerud
  58. Rolf...I'm an information architect, not a developer, but in my experience, I'd much rather work with Java-oriented/Unix/Linux people. They're more open to thinking of the business domain apart from the technology itself. They are much more supportive of the usability work I do. They don't freak out when something comes across to them that doesn't fit the current application or technology they're building. They work with me and respect my job.

    Sorry, but I've been doing this since 1998, and that's been my experience. It's the reverse of what you said here. I don't like absolutes, but I'd choose those developers to work with any day, and they with me.
  59. <Ryan Breidenbach>
    Why not cache? So what if the database screams? Getting data from an intra-JVM cache WILL BE FASTER. Why is wanting to do this a bad thing?
    </Ryan Breidenbach>

    Short answer: because it's more complex.

    Medium answer: Well - will it really be faster? Many people assume that a cache is always faster, even a naive one. Then they experience cache thrashing, deadlocks, and bottlenecks because the cache isn't well-tuned to the application usage patterns.

    <Ryan Breidenbach>
    Finally, you reap the benefits of caching NOT because the database is slow, but because you have to serialize the data across the wire. Sure, the database may cache things great, but that does you no good unless you can get that data back to your app - and that is slow compared to in memory cache.
    </Ryan Breidenbach>


    Do you say this because you've tried it and found it slow? Have you timed the cost of various SQL requests and determined that they're too slow?

    Or are you just assuming that an application level cache always wins?

    <Ryan Breidenbach>
    Whacking the database for everything is not really a scalable approach. Perhaps you should reconsider your thinking - caching can be a good thing when done right.
    </Ryan Breidenbach>

    Funny, I've done just that on several projects that needed to support hundreds of requests per second. And the database (sans application cache) scaled beautifully.

         -Mike
  60. Mike,

    It's great to hear that you have written high-volume, clustered apps that do not use cache and perform well. So have I. That is not the point.

    The point is that there are several circumstances (not even complex circumstances) where caching makes sense:

    1. Data that changes predictably.
    2. Data that rarely changes.
    3. Data that you can live with being invalid (that will later be validated at a critical point).

    If caching data that fits one of these profiles leads to a significant performance boost, I say cache away!

    To say caching is always bad is as naive as to say caching is good for all circumstances. And in answer to your other question - yes, I have compared cache/no-cache performance several times. Sometimes the performance payoff is not worth the complexity, but sometimes the performance payoff is HUGE. Especially when multiple complex queries are needed to build the object graph that is cached (for example, a dynamic HTML page that is made up of many page components that can each be individually updated and given date/time criteria for which they are valid).

    Finally, in my original post, I noted that database lookups (unless they are quite complex) lose most of their performance going across that wire. I noticed you never acknowledged that portion of my post. Why is that?

    Hope this clarifies my stance.

    Ryan
  61. Hi,

    Here is my 2 cents. EJB are an overkill for 90% of webapps.
    Don't get me wrong, I love EJBs, but I have seen so many "EJB Enterprise" guestbook apps out there, it's crazy.
    After all, there are only a handful of enterprise-level clients in the world anyway ;-)

    If you follow the design patterns for EJB, or try it, you will see that JDBC is preferred for reading. That is usually 90% of the webapp. Updates are in most cases confined to a single row and table, never mind multiple databases, so transactions are used with optimistic row locking.
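    Optimistic row locking usually means a version column checked in the UPDATE's WHERE clause. So that the example runs without a database, the sketch below keeps a hypothetical "table" in memory and performs the same compare-and-set; the table and column names in the comment are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * In-memory stand-in for a table with a version column. Against a real
 * database the same check is typically one statement (hypothetical names):
 *   UPDATE account SET balance = ?, version = version + 1
 *   WHERE id = ? AND version = ?
 * where "0 rows updated" means another writer changed the row first.
 */
final class VersionedTable {
    /** Immutable snapshot of a row: its value plus the version it was read at. */
    static final class Row {
        final long value;
        final int version;

        Row(long value, int version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Row> rows = new ConcurrentHashMap<>();

    void insert(String id, long value) {
        rows.put(id, new Row(value, 0));
    }

    Row read(String id) {
        return rows.get(id);
    }

    /** Returns true if the update won; false if the row changed since it was read. */
    boolean update(String id, int expectedVersion, long newValue) {
        Row current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return false;
        }
        // replace(key, old, new) is atomic (Row uses identity equality),
        // so two writers holding the same version cannot both succeed
        return rows.replace(id, current, new Row(newValue, expectedVersion + 1));
    }
}
```

    The losing writer would normally re-read the row and either retry or report a conflict to the user; no lock is held between the read and the update.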

    Here are my criteria for EJB:

    1) Use only for domain objects that are mutable.
    2) and only when updates span tables and db's or concurrent updates could occur. (Financial and Healthcare worlds)

    Session beans are handier, but are mostly used in conjunction with EJB in the Facade pattern, or as business rule containers.

    EJBs do add a certain panache to any app, but trust me, they are not as portable as one would like.

    In the business world open source = "whose line is it anyway" ;-) and for companies to gain a competitive edge, they need to add and tweak features, so BOOM! there goes the cross-vendor spec.
  62. Really simple JBoss getting started guide:

    http://sammaher.com/jboss/

    More to come,
    toastchum
  63. <Ryan Breidenbach>
    It's great to hear that you have written high-volume, clustered apps that do not use cache and perform well. So have I. That is not the point.
    </Ryan Breidenbach>

    I believe that's the fundamental point. You can write high-volume, clustered apps that don't use a cache and perform well. That one statement speaks volumes. Specifically: you don't need a cache for general purpose use.

    <Ryan Breidenbach>
    The point is that there are several circumstances (not even complex circumstances) where caching makes sense:
    1. Data that changes predictably.
    2. Data that rarely changes.
    3. Data that you can live with being invalid (that will later be validated at a critical point).

    If caching data that fits one of these profiles leads to a significant performance boost, I say cache away!
    </Ryan Breidenbach>

    I agree with your list, but I'll take your last statement and turn it into a fourth point:

    4. The cache makes a perceptible difference to users under load.

    Without point 4, there's no point in caching. As to "cache away" - perhaps. But keep in mind that this can come back to bite you hard at a later time. Like it suddenly becomes important that the data is indeed valid. Or the data's not so read-mostly anymore. This happens more often as projects progress than some cache advocates care to admit.

    <Ryan Breidenbach>
    To say caching is always bad is as naive as to say caching is good for all circumstances. And in answer to your other question - yes, I have compared cache/no-cache performance several times. Sometimes the performance payoff is not worth the complexity, but sometimes the performance payoff is HUGE. Especially when multiple complex queries are needed to build the object graph that is cached (for example, a dynamic HTML page that is made up of many page components that can each be individually updated and given date/time criteria for which they are valid).
    </Ryan Breidenbach>

    I don't believe I said that caching is always bad. I think caching should be used only when there's a demonstrable need for it - and that those turn out to be rare occasions, not the norm.

    <Ryan Breidenbach>
    Finally, in my original post, I noted that database lookups (unless they are quite complex) lose most of their performance going across that wire. I noticed you never acknowledged that portion of my post. Why is that?
    </Ryan Breidenbach>

    Possibly because it's not a very valid argument. What is the cost of going over the wire? This may shock some, but if your RDBMS machine and app server machine are on the same LAN segment, "going over the wire" is sub-millisecond for many queries. The impact of easily available 100MBit ethernet connections has been overlooked by many, and "serialization" is only a significant burden if you actually are letting Java literally serialize an object for you. When doing a query via JDBC, no Java serialization is happening.

    I'll say it again - time your total database round trip under load before you say it's too slow. For some cases it may indeed be slow, but I assert that these are the exceptions, not the rule.
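    When you do time the round trip under load, look at the distribution rather than a single average, since users notice the slow tail. A small helper for that, assuming you record per-call latencies in milliseconds; the `Percentiles` class is illustrative, and the nearest-rank method used here is one of several percentile definitions.

```java
import java.util.Arrays;

/** Hypothetical helpers for summarizing per-call latencies measured under load. */
final class Percentiles {
    private Percentiles() {}

    /** Times one call (e.g. a JDBC query) in milliseconds using System.nanoTime(). */
    static double timeMillis(Runnable call) {
        long start = System.nanoTime();
        call.run();
        return (System.nanoTime() - start) / 1e6;
    }

    /** Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it. */
    static double percentile(double[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

    Reporting, say, the 50th and 95th percentiles at 50, 100 and 500 concurrent clients says far more about user-perceived response time than one request timed on an idle system.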

         -Mike
  64. Mike: "I'll say it again - time your total database round trip under load before you say it's too slow."

    Absolutely. A customer we're working with sees an average of over 200ms for reads and 800ms for writes for their user data, which is used and updated by most requests based on the business requirements of the application. Their application scales beautifully and cost-effectively, except for the database. So they cache.

    Caching is almost always the cheapest way to get more performant scalability out of an application. Not all apps require "clustered" caching or fancy-shmancy distributed architectures. Even with single-server environments, caching can be a huge win. Under load, no database that I've seen handles anything in 1ms. From our testing, a tuned large-scale Oracle server on a 100Mb network under reasonable load will turn around simple queries out of cache in about 11ms. If the data were pulled from a cache in the JVM, the latency would be around 0ms.

    That's why databases cache. That's why operating systems cache. That's why CPUs cache. Reducing latency, particularly inside the big-O, means significantly better performance improvements.
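    A back-of-the-envelope version of this latency argument: if a request performs n lookups and a fraction h of them hit an in-JVM cache, the time spent on lookups is roughly as below. Only the 11 ms figure comes from the post above; n = 10 and h = 0.9 are assumptions for illustration, and the formula ignores the cost of maintaining the cache itself.

```latex
\begin{align*}
T_{\text{req}} &= n\,\bigl[\,h\,t_{\text{cache}} + (1-h)\,t_{\text{db}}\,\bigr] \\
\text{no cache } (h = 0):\quad T_{\text{req}} &= 10 \times 11\,\text{ms} = 110\,\text{ms} \\
h = 0.9,\ t_{\text{cache}} \approx 0:\quad T_{\text{req}} &\approx 10 \times 0.1 \times 11\,\text{ms} = 11\,\text{ms}
\end{align*}
```

    Because the per-lookup latency is multiplied by n, the saving grows with the number of lookups per request, but only as long as the hit ratio h stays high.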

    As for relying on JDBC to be scalable, that's not a bad idea for most applications. Relatively few applications require the complexity that most application developers enjoy employing. Furthermore, most applications don't have to support a massive amount of load. Caching is about "bang for the buck". It also follows the law of diminishing returns. You can get a lot of bang for a little buck at first (actually saving huge amounts on hardware and software in some cases by caching). By the time you get to things like distributed transactional caching, very few applications actually need it, and the complexity is higher, and the cost to purchase and implement the cache is higher.

    A developer should neither dismiss nor assume caching without thought. Likewise, evaluate the problem before oversimplifying or overcomplicating the design. Make it as simple as you can, but no simpler.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  65. "That's why databases cache. That's why operating systems cache. That's why CPUs cache. Reducing latency, particularly inside the big-O, means significantly better performance improvements."


    And by the way, MS PetShop from the latest test does the same. See their code:

    HttpContext.Current.Cache.Insert(CACHE_KEY_PRODUCTS, products, null, DateTime.Now.AddDays(1), Cache.NoSlidingExpiration, CacheItemPriority.High, null);

    and it is used for SQL like this:
    SELECT ProductId, Name, Descn FROM Product
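    For comparison, a rough Java analogue of that PetShop call (absolute expiration one day after loading, no sliding expiration) could look like the sketch below. `ExpiringCache` is a hypothetical class, not an ASP.NET or J2EE API; it has no equivalent of `CacheItemPriority`, and the clock is passed in as a `LongSupplier` so expiry can be tested without waiting a day.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical absolute-expiration cache: an entry expires a fixed time after
 * it was loaded, no matter how often it is read (no sliding expiration).
 */
final class ExpiringCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    ExpiringCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, calling loader only when the entry is missing or expired. */
    V get(K key, Supplier<V> loader) {
        long now = clock.getAsLong();
        // compute() is atomic per key, so concurrent callers don't both reload;
        // note it blocks other updates to that key while the loader runs
        Entry<V> entry = entries.compute(key, (k, old) ->
                (old == null || now >= old.expiresAtMillis)
                        ? new Entry<V>(loader.get(), now + ttlMillis)
                        : old);
        return entry.value;
    }
}
```

    In production you would pass `TimeUnit.DAYS.toMillis(1)` as the TTL and `System::currentTimeMillis` as the clock. Like the PetShop code, this accepts up to a day of stale data if the Product table changes underneath it.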

    Dmitry Namiot
    Coldbeans
  66. I get the feeling from this discussion that the benefit of EJB caching depends heavily on how optimistic one dares to be.
  67. Mike,

    I think we agree for the most part, especially about only caching when "The cache makes a perceptible difference to users under load." I never meant to imply that you thought caching is always bad. I was originally replying to Rolf, who was saying that app servers shouldn't "help" the database through caching.

    The decision to cache should be made like any other performance optimization decision - don't do it unless you have to. Getting data straight from the database is the way to go unless there is an unacceptable performance penalty in doing so. This is when you examine whether or not caching can solve your problem.

    As for whether a cache is faster than getting data from the database - it is. Period. It doesn't even make sense to argue otherwise. I don't care how fast your network is, how fast your hardware is, how fast your database is - it is not faster than an in-memory read. That said, worrying about micro-optimizations (e.g. 2ms vs. 18ms) is pointless, so caching won't be needed in a great number of cases (the majority by far). However, caching has its place and should not be dismissed.

    Enjoying the discussion, Mike.

    Ryan
  68. I thought we were discussing EJB caches, not caches in general. If you don't use EJB you can use some cache system, if you can take the penalty of not sharing the database.

    The big mistake is using EJB at all. The performance drop is significant, and no caching in the world can bring you back to the performance of, for example, Resin.

    <Marc Fleury>
    EJB’s are natural caches. I think that one of the greatest things EJBs have introduced is a formal requirement, at the specification level for distributed middleware cache. This is really powerful, as having the data from the back end systems already mirrored in the java layers, means blazingly fast access times to the data from the web applications. Working with data from cache, meaning from memory, as opposed to retrieving it say from a database or another VM can yield 10x to 50x speed increases. We are not talking about 20% increase or 50% increase or whatever code-clowns like to call optimization, we are talking about orders of magnitude faster
    </Marc Fleury>

    This is quite simply not true; it has been discussed extensively on TSS before.

    So - you don't need EJB for "distributed middleware cache", and therefore, you do not need a J2EE compliant Server at all (in non Corba situations).

    With this statement Marc has taken his place beside Larry Ellison in the department of exaggerations and downright lies.

    Regards
    Rolf Tollerud
  69. <Ryan Breidenbach>
    As for whether a cache is faster than getting data from the database - it is. Period. It doesn't even make sense to argue otherwise. I don't care how fast your network is, how fast your hardware is, how fast your database is - it is not faster than an in-memory read. That said, worrying about micro-optimizations (e.g. 2ms vs. 18ms) is pointless, so caching won't be needed in a great number of cases (the majority by far). However, caching has its place and should not be dismissed.
    </Ryan Breidenbach>

    You're making a big assumption here - that the data you want is in the cache. This is a key point that's easily missed.

    For example: let's say your cache size is 1000 objects. Average users are requesting, say, 10 objects at a time, with some overlap. Great - your cache is in good shape. Let's say there are 20 of these guys doing simple, light requests.

    Now a super-user-type (floor manager or what have you) comes in and regularly requests 900 objects at a time.

    What happens? Cache thrash!! Your super-user-type is going to be repeatedly flushing out the results of all the users with his own stuff. In this very-real-world situation, your 10 users doing light 10-object requests are rarely going to get a cache hit.

    A cache is only fast when it generates a lot of hits. A cache that's too small, or a cache that doesn't adapt to the pattern of incoming requests, will actually be noticeably slower than no caching at all, because you spend significant amounts of time re-filling the cache constantly.
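    Mike's scenario can be simulated directly. The sketch below assumes an LRU eviction policy (Mike doesn't name one), a capacity of 1000, 20 light users who each reread the same 10 keys with no overlap, and a heavy user who rereads the same 900 keys, none shared with the light users; `LruCache` and `ThrashDemo` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Fixed-capacity LRU cache of keys (values don't matter for hit/miss counting). */
final class LruCache<K> {
    private final Map<K, Boolean> entries;

    LruCache(int capacity) {
        // accessOrder = true: get() moves an entry to the most-recently-used end,
        // and removeEldestEntry evicts the least-recently-used one when over capacity
        this.entries = new LinkedHashMap<K, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns true on a hit; on a miss, "loads" the key (possibly evicting another). */
    boolean access(K key) {
        if (entries.get(key) != null) {
            return true;
        }
        entries.put(key, Boolean.TRUE);
        return false;
    }
}

/** 20 light users x 10 objects, optionally followed each round by a heavy user reading 900. */
final class ThrashDemo {
    private ThrashDemo() {}

    /** Returns the light users' hit ratio over the given number of rounds. */
    static double lightUserHitRatio(boolean withHeavyUser, int rounds) {
        LruCache<String> cache = new LruCache<>(1000);
        long hits = 0;
        long requests = 0;
        for (int r = 0; r < rounds; r++) {
            for (int user = 0; user < 20; user++) {
                for (int item = 0; item < 10; item++) {
                    if (cache.access("user" + user + "-item" + item)) {
                        hits++;
                    }
                    requests++;
                }
            }
            if (withHeavyUser) {
                for (int item = 0; item < 900; item++) {
                    cache.access("report-item" + item); // the same 900 objects every round
                }
            }
        }
        return (double) hits / requests;
    }
}
```

    With these assumptions, over 10 rounds the light users hit the cache 90% of the time when they have it to themselves (only the first round misses), and 0% of the time once the heavy user runs every round: his scan evicts the light users' entries before they come back, and their refills in turn evict part of his set, so neither side ever gets a hit.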

    This is why you only want to cache very specific items, after a lot of analysis. If your data is too changeable, or vulnerable to cache thrash due to large requests, you can run into problems. Going back to EJB caching, pessimistic locking alone can also show grossly worse performance than avoiding EJBs altogether and hitting the database "naively" using JDBC.

        -Mike
  70.

    <mark>You should have a choice.</mark>

    Sounds like utopia (i.e. NOT the real world). There are many, many, many corporate IS departments out there that are a cog in the enterprise, that (necessarily) don't control the enterprise data architecture, and that are bound by real business constraints in competitive environments (aka schedules driven by product and service rollouts). These are the challenges many of us deal with every day. We architects and engineers need to remember who signs the paycheck, and strike a balance between the utopian solution and what can really get done for the sake of the business, all without proliferating stovepipe solutions. No room for elitism.

    <mark>If the database already exists then you probably are already in trouble - on many levels</mark>

    These are the challenges we get paid for. Multiple apps writing to the same DB happens all the time (heard of EAI?). It requires discipline and coordination in design and change control. It's a reality of a heterogeneous IT environment, particularly in M&A-heavy businesses. I don't know about anyone else, but we can't afford to throw away legacy DBs with many years (20+ here) of business, technical, and operational inertia for the sake of purity.

    Mike
  71.

    Mark,

    There are different ways to keep the business logic in sync. One of the best: do not keep data that could be shared in memory. If you really need it there (e.g. for caching purposes), you can invalidate it using RPC (e.g. RMI or Web Services calls) or JMS from the second application. You can set up invalidation timeouts as well (it's great that JBoss can do that with CMP, but others can't).
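    A minimal Java sketch of the timeout idea, assuming a simple read-through cache: entries older than a configured TTL are reloaded on the next read, and `invalidate` is the hook an RMI or JMS listener would call. `TtlCache` and its loader function are hypothetical, not any container's API. Two threads may both reload an expired entry at the same moment, which this sketch tolerates.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical read-through cache with time-based invalidation.
public class TtlCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long loadedAt;

        Entry(V value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for the database read
    private final long ttlMillis;
    private final LongSupplier clock;    // injected so expiry can be tested

    public TtlCache(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, reloading it if absent or older than ttlMillis. */
    public V get(K key) {
        long now = clock.getAsLong();
        Entry<V> e = entries.get(key);
        if (e == null || now - e.loadedAt >= ttlMillis) {
            e = new Entry<>(loader.apply(key), now);
            entries.put(key, e);
        }
        return e.value;
    }

    /** Explicit invalidation, e.g. when an RMI or JMS message arrives from the other application. */
    public void invalidate(K key) {
        entries.remove(key);
    }
}
```

    In production the clock would typically be `System::currentTimeMillis`; the timeout bounds how stale a value can get, while explicit invalidation shortens that window when the other application is able to send a message.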

    Alex
  72. After reading this article and some of the other specs, I am getting the impression that as an industry we have a very myopic view of "enterprise" data access, transactions, and persistence issues. In their current state, EJBs, JDO, and ADO, with any application server, can't truly solve "enterprise" data persistence needs. They are just point solutions that solve quite a few problems but not all of them, especially at the enterprise level. Now that I have made this bold statement, let me explain it by providing some background on the "enterprise problems" that I have seen.

    Typically, applications in an enterprise are written in different technologies using different data sources (databases, files, etc.). Every once in a while someone in the corporation comes to the realization: why don't we consolidate our data in a central place, or use the same technology for all the applications? Organizations rarely succeed in these endeavors. If they do get lucky, they can use a single technology for data access and transactions, and have caches that support transactions, including read/write transactions. But even when they succeed, centralization and standardization don't last very long: the business goes and acquires companies with totally different technologies and database schemas, and asks the IT organization to integrate the data sources and applications. Pretty soon the smart IT managers realize that there is no way they can keep up with continuous migration of applications and data to a "corporate standard platform". This realization led to a different approach to integration: instead of trying to consolidate data and applications, why don't we let the applications talk to each other regardless of the technology and language they are written in? This approach has evolved to the point where SOAP shows some promise of success, since all the vendors, including Microsoft, are planning to support it. The problem that is still not being addressed is that corporate data sits in different data sources, and one needs a central transaction manager that can handle transactions on logical business objects distributed across those data sources. An interesting constraint is that this transaction/data persistence layer should support different applications written in different technologies.

    In my opinion, ideally we need a data persistence layer that supports transactions on distributed data sources and provides a generic (e.g. SOAP) interface that every application can use, regardless of the language in which the application was written. To truly solve enterprise data persistence/transaction needs, we need a solution that supports the following:

    1) Can be used by various applications using different technologies, e.g. J2EE, .NET, etc. (e.g. SOAP support for both synchronous and asynchronous data exchange).
    2) Distributed transactions and support for long-running transactions.
    3) Allow homogeneous access to enterprise data using logical data objects instead of a varying set of heterogeneous physical access methods. Basically, this solves the problem of accessing logical objects spread across various data sources. Different clients of the layer may be interested in accessing entire logical objects or different subsets of a logical business object.
    4) Distributed object locking (optimistic as well as pessimistic).
    5) Integration and synchronization with the distributed back-end data sources.
    6) Object caching.
    7) Separate access for reporting tools, so that they can leverage their own core competencies, but through the same style of abstraction as other clients.
    8) Dynamically generated Data Access Objects (DAOs), to remove the need to regenerate and redeploy server-side code to reflect schema changes at the enterprise level.
    9) Notification (events to interested parties) when an object's state changes.
    10) High availability and failover support.
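    Item 4 is the most code-shaped requirement on the list. As an illustration only, here is a hypothetical Java sketch of optimistic locking with a version number: readers take no locks, and an update succeeds only if the row's version is unchanged since it was read. Against a database, the same check is usually written as `UPDATE ... SET ..., version = version + 1 WHERE id = ? AND version = ?`, testing whether one row was updated; here a `synchronized` method stands in for that atomic single-row update.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory store illustrating optimistic locking via version numbers.
public class OptimisticStore {

    /** An immutable snapshot of a row plus the version it was read at. */
    public static final class Versioned {
        public final String value;
        public final long version;

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Versioned> rows = new HashMap<>();

    public synchronized void insert(String id, String value) {
        rows.put(id, new Versioned(value, 0));
    }

    public synchronized Versioned read(String id) {
        return rows.get(id);
    }

    /** Applies the update only if nobody changed the row since expectedVersion was read. */
    public synchronized boolean update(String id, long expectedVersion, String newValue) {
        Versioned current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return false; // stale read: caller must re-read and retry, or report a conflict
        }
        rows.put(id, new Versioned(newValue, expectedVersion + 1));
        return true;
    }
}
```

    The pessimistic alternative would hold a lock from read to write; the optimistic version trades that for an occasional failed update, which suits read-mostly data.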


    Jamal
  73. "After reading this article and some of the other specs I am getting the impression that as an industry we have very myopic view of the "enterprise" data access, transactions, and persistnace issues."

    I agree that the article barely scratches the surface of persistence problems (basically it says that it's good to have a cache...). But I don't know that the whole industry is myopic. If this article represents the state of the industry, then yes, but does it?

    Thank you for your comments on the true challenges of accessing enterprise data. IMHO, I think the SQL specification is obsolete - especially concurrency control - and that the time has come for a more sophisticated approach. Unfortunately all the really innovative database efforts are for object databases, and the relational and object industries don't talk to each other.
  74. "If this article represents the state of the industry, then yes - does it?"

    At least I haven't seen any technology/standard that provides a comprehensive solution to the enterprise data persistence/access issues listed in my original post. Given that JBoss is the leading solution for "enterprise" data persistence issues, arguably we are at a very primitive stage, and as an industry we are not even thinking about solving the true enterprise data persistence/access challenges....

    Jamal
  75. Floyd,

    Spot on! The lack of caching surprised me in an 'enterprise' solution aimed at high scalability, since the lack of data reuse made one wonder what the advantage of EBs was. The 'roll your own' approaches to read-only data were another mess, since an EB 'Singleton' was not supported in the spec. The simplistic DB-to-EB mapping approach taken prior to EJB 2.0 was also a problem. Post EJB 2.0, I think the mapping is still a mess, convoluted to the point where I think EBs may be dead. Remember the old saying 'first impressions are lasting impressions'? I think Sun put the persistence aspects of EJB into the 'too hard' basket early on, which was a major shortcoming. In my experience, 80% of performance problems (excluding 'bad' design) tend to be tied back to database access, so I would have thought this might have been considered in the spec rather than ending up with the mess we have now. The lack of usefulness of the spec with regard to persistence is clearly shown by the number of O-R mapping tools being used.
  76. I disagree that EJB was too incomplete when it came out.

    Good and usable things take time to develop (anyone who used Windows before 3.0?).
    Any research takes time to make it into the real world, and takes time to be refined to work. EJB is no different. .NET is no different IF it is new technology (which it probably isn't, so it works from day one).

    Java only recently became able to replace the previous wave of technologies in the server side, which were CGIs written in C.

    I am also happy that Marc is telling the truth about SOAP, which will put applications and computers on their knees, and we developers will be executed for that extra millisecond of computation while it takes seconds to read all the useless bytes of xml, verify them, parse them and finally starting to do something with it.

    At the same time, RPC is the only way to do reliable computing. You need distributed computing because that's the way your brain works, and it proves that millions of slow processors will beat any billion-hertz computer.

    It's only a matter of time before the distributed revolution becomes mainstream and the client-server model (along with EJBs, web servers, HTML, web browsers, etc.) dies.
  77. Mr Bonechi writes

    "...truth about SOAP, which will put applications and computers on their knees, and we developers will be executed for that extra millisecond of computation while it takes seconds to read all the useless bytes of xml, verify them, parse them and finally starting to do something with it..."

    Sorry to have to tell you Marco, but you are in the minority. Web services/SOAP is a Good Thing (TM) and it's going to become ubiquitous.

    The rule is simple: in general, anything that adds machine instructions but simplifies things for people is a good tradeoff. SOAP does this over RPC, CORBA and any other binary distributed IPC system.

    Previous examples of this principle: OSes, GUIs, assemblers, compilers, shell scripts, HTML, Java and so on.
  78. Probably because the 'majority' know jack about the problems of distributed systems... SOAP is a solution for a small problem space that is being applied to a much larger problem space. SOAP will not simplify; it will add complexity, create fragility and dramatically increase the cost of change in large systems.
  79. Right. Almost every legitimate use of SOAP is better implemented directly as XML messages over RPC, or HTTP, or even SMTP. Yes, web services are a good way to interoperate between disparate systems, but I can't think of a good synchronous use of SOAP, except Microsoft's original idea of controlling your computer from Redmond by pushing everything through port 80 to circumvent firewalls.
  80. "The rule is simple, in general anything that adds machine instructions but simplifies things for people is a good tradeoff. SOAP does this over RPC, Corba and any other binary distributed IPC system.
    Previous examples of this principal, O/S's, GUI's, Assemblers, Compilers, Shell scripts, HTML, java on so on."

    Based on my experience, I'm forced to disagree. There's a big difference between languages, runtime systems, and data transfer protocols.

    The examples you cite are all runtime systems or languages, and that's very different from a protocol for transferring data. Super compilers and ultra-smart runtimes, in addition to ever increasing processor speed can perform miracles in turning what used to be a slow language or system into a faster one. But a fat protocol will always remain fat.

    Transferring XML back and forth between systems is going to be very expensive for the foreseeable future. Besides the obvious size of XML messages, every language out there is effectively going to have to translate its binary representation into text-based XML, and vice versa. This may seem cheap - until you have to do it hundreds or thousands of times a second. And the killer is, you can't optimize it, unless you happen upon a language that uses XML as its native data type.

    I've seen some systems become successful using various canonical/self-defining data formats in vertical markets, such as the FIX protocol in brokerage houses. But unfortunately, XML is by far the fattest data format I've ever seen. The redundancy is just incredible. It may seem like a great thing - until you need to parse 1,000 of them a second, with each message 5K or 10K in length. Or you have a 50MB log of XML data to deal with.
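    A rough Java sketch of the size argument, assuming a hypothetical three-field trade record: the same values encoded as XML text and as a fixed binary layout via `DataOutputStream`. It measures only bytes on the wire, not parse time, and a real SOAP message would add an envelope on top of the element markup shown here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical comparison of a text (XML) encoding and a fixed binary encoding.
public class EncodingSize {

    /** Text encoding: every field wrapped in an opening and a closing tag. */
    public static byte[] asXml(long orderId, int quantity, double price) {
        String xml = "<trade>"
                + "<orderId>" + orderId + "</orderId>"
                + "<quantity>" + quantity + "</quantity>"
                + "<price>" + price + "</price>"
                + "</trade>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    /** Binary encoding: fixed-width big-endian fields, 8 + 4 + 8 bytes. */
    public static byte[] asBinary(long orderId, int quantity, double price) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(orderId);
            out.writeInt(quantity);
            out.writeDouble(price);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("XML bytes:    " + asXml(123456789L, 500, 42.25).length);
        System.out.println("binary bytes: " + asBinary(123456789L, 500, 42.25).length);
    }
}
```

    The binary form is always 20 bytes; the XML form is several times that for this record, and the receiver must also turn text back into numbers instead of reading fixed offsets.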

    The only thing XML has going for it is ubiquity, and ease of parsing. Ever wonder why people convert XML streams into something native as early in their software layers as possible?

        -Mike
  81. <quote>
    Sorry to have to tell you Marco, but you are in the minority. Web services/SOAP is a Good Thing (TM) and it's going to become ubiquitous.
    </quote>

    Being in the minority does not mean being wrong....
     
    I would agree that the _document_-oriented model makes some sense, but an XML-based RPC implementation looks... inefficient, at least. Axis, GLUE and so on require exactly the same steps as ORB tools during development and at runtime.
    WSDL->wsdlc-> Connection library generation + runtime library.
     
    That is convenient enough, but the price! XML-based RPC is 15-50 (fifty!) times slower than the "slow" CORBA implementation in the JDK, not to mention the lack of features...
     
     
    (I do not mention the SOAP::Lite-like approach because I do not like the lack of type control. I do not like hashtables and Java collections for the same reason: I never know what might be inside.)
     
    Hey! What is really wrong with a binary transport protocol? If it works (meaning a library/tool implements it correctly), nobody would ever bother to look at the transport layer. Has anybody looked at TCP/IP protocol messages recently? HTTPS?
     
    Why not use XML instead of TCP/IP, for readability?
     
    How many of us try to work with HTTP messages directly rather than relying on the HttpRequest API?
     
    I doubt that anybody _really_ wants to see XML messages when doing RPC. We have to look at them sometimes, but only because SOAP frameworks simply do not work correctly.
     
    Hashtable/key-value pairs are useful, but that does not mean they should always be used instead of data structures and classes.
     
    PS: I love EJB.
    Did you notice that RMI is stepping back in favor of IIOP?
    Did you notice that .NET uses the .NET Remoting protocol (it is binary, not XML-based) for anything other than "Hello World"?

    Good article Marc!!!
  82. Hi Konstantin Ignatyev,

    I agree with your idea. Actually, when I first saw XML used as a transport protocol, I knew it was going to fail.

    James
  83. Konstantin,

    >I do not like hashtables and java collections by the same reason: I never know what might be inside...

    Queer approach...

    Alex
  84. JBoss rocks, no doubt, but I don't think Marc should write off JDO so easily...I don't see JDO as threatening JBoss in any way. JBoss offers tons of great things besides CMP.

    One of the major advantages of JDO over EJB 2.0 CMP is that there is no need in JDO to deal with relationships explicitly. Though relationships are not particularly tricky with EJB 2.0 CMP, there are hassles such as the creation of redundant value-object graphs. Basically, I see entity beans trying to cobble together functionality that JDO has inherently.

    As with everyone else who has worked with entity beans through a few versions of the spec, it has been disappointing not to see clustered EB coherence as part of the spec, and to have to rely on proprietary "smart stubs" or, worse, implement your own cache invalidation layer using JMS (as I saw presented as a pattern at JavaOne in '01).
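    For readers who haven't seen that pattern, here is a hypothetical Java sketch of a hand-rolled invalidation layer: each node keeps a local cache, and a write publishes the changed key so every node evicts its copy. `InvalidationTopic` is an in-process stand-in for a JMS topic, so delivery here is synchronous; with real JMS it is asynchronous, and a node can serve a stale value in the window before the message arrives.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch of cache invalidation across cluster nodes via a topic.
public class ClusterInvalidation {

    /** In-process stand-in for a JMS topic: publish delivers the key to every subscriber. */
    public static final class InvalidationTopic {
        private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

        public void subscribe(Consumer<String> subscriber) {
            subscribers.add(subscriber);
        }

        public void publish(String key) {
            for (Consumer<String> s : subscribers) {
                s.accept(key);
            }
        }
    }

    /** One cluster node with its own local cache over a shared backing store. */
    public static final class Node {
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        private final Map<String, String> database; // shared "database" for all nodes
        private final InvalidationTopic topic;

        public Node(Map<String, String> database, InvalidationTopic topic) {
            this.database = database;
            this.topic = topic;
            topic.subscribe(cache::remove); // on message: drop the stale entry
        }

        /** Read-through: serve from the local cache, loading from the database on a miss. */
        public String read(String key) {
            return cache.computeIfAbsent(key, database::get);
        }

        /** Write to the database, then tell every node (this one included) to evict the key. */
        public void write(String key, String value) {
            database.put(key, value);
            topic.publish(key);
        }
    }
}
```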

    I hardly think CMP will eclipse JDO ... quite the opposite, many people who are excited about JDO are particularly interested in JDO because they have learned the ins and outs of EB CMP and are ready for "the right thing".
  85. Wow! Is this list becoming useful?
  86. I believe JDO (or writing your own persistence framework) is still better than EJB 2.0. I was a defender of entity beans when EJB 1.1 first came out, and I was really disappointed by their performance and usability.

    I had hoped that the 2.0 spec would resolve the performance issues, but I am still not convinced by the results. Here is why (the main reasons; I have others too):

    1- If you need to install your application on many application servers, you have to retype the vendor-specific XML files (jbosscmp-jdbc.xml, weblogic-cmp-rdbms-jar.xml, ejb-inprise.xml, etc.), and maintain these files and test them all for each release.

    Using JDO, you have only ONE XML file for all application servers.

    The application server vendor (or free provider, Marc Fleury for example) may want to SELL the idea of CMP since JBoss supports CMP 2.0, but using it will make you dependent on JBoss. What if a prospect refuses to change application servers because he already owns another? Imagine retyping weblogic-cmp-rdbms-jar.xml, including the relationships you may have used because they are available in CMP 2.0, if you had 100 entities and about 50 relationships between them.

    2- If you need to access a read-only collection (like the catalog in the Pet Store example) and you used ENTITY beans (which the blueprints do not recommend, even in the latest Pet Store 1.3.1), you still have the drawback of the container maintaining the references after the home finder is called. (I admit that local references, if you use EJB 2.0 local homes, and the pass-by-reference semantics that come with local interfaces give a performance boost.) But the container still needs to keep the objects in a memory cache after the finder call until the remote (or local) business methods of each object found are invoked (for a greedy finder), OR it fetches them one by one (ouch) from the DB when the values are read (for a lazy finder).
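    To illustrate the greedy/lazy distinction, here is a hypothetical Java sketch that counts round trips to a fake table instead of using a real container: the greedy finder fetches keys and state in one query, while the lazy finder fetches the keys first and then one row per bean as its state is read (the classic N+1 pattern). The SQL in the comments only describes what each step stands for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical round-trip counter contrasting greedy and lazy finder strategies.
public class FinderRoundTrips {
    private final Map<Integer, String> table = new TreeMap<>(); // fake "catalog" table
    private int roundTrips = 0;

    public FinderRoundTrips(int rows) {
        for (int i = 0; i < rows; i++) {
            table.put(i, "item-" + i);
        }
    }

    /** Greedy finder: one query returns the keys and the bean state together. */
    public List<String> greedyFind() {
        roundTrips++; // stands for: SELECT id, name FROM catalog
        return new ArrayList<>(table.values());
    }

    /** Lazy finder: one query for the keys, then one query per bean when its state is read. */
    public List<String> lazyFind() {
        roundTrips++; // stands for: SELECT id FROM catalog
        List<Integer> keys = new ArrayList<>(table.keySet());
        List<String> result = new ArrayList<>();
        for (Integer key : keys) {
            roundTrips++; // stands for: SELECT name FROM catalog WHERE id = ?
            result.add(table.get(key));
        }
        return result;
    }

    public int roundTrips() {
        return roundTrips;
    }
}
```

    For 100 catalog rows, the greedy finder costs 1 round trip and the lazy one 101; the greedy one pays for it by holding all 100 beans' state at once.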

    This (the caching of entities) can be an advantage if the same data is accessed regularly, but what if the data is personal to the connected user and not shareable?

    Using JDO (with a stateless session bean as a façade to JDO objects instead of entity beans) will give you the read-only collection of data you need, without the container having to maintain the data in a cache between the home finder and the business calls.

    AND if you need to cache the data (for data that has to be shared), you can manage your own cache using, for example, stateless session beans or the servlet context (depending on the level of distribution you need, and without depending on clustering options specific to an application server).


    3- The recommendation to use a session bean as a facade to entities (the one Marc Fleury defended in the beginning, which has now become standard), coupled with the new recommendation to access entity beans through local interfaces (EJB 2.0), is a must.

    SO WHY use entities, since you have all the advantages of EJB in the session façade (transactions, distribution, caching, etc.)?

    I think JBoss is great, open source IS great, JDO is great, but CMP 2.0 is a trap (like the CMP that came with EJB 1.1).
  87. Typing deployment descriptors should not be a concern. One thing you need to realize is that EJB is a spec written for code generators. I don't think Sun envisioned that everyone would write EJBs and deployment descriptors by hand. Most of the EJB spec is boilerplate. Most of it can be automated or generated.

    For example, you can use XDoclet[1] to generate most EJB boilerplate code and deployment descriptors for you. Most of the so-called "enterprise IDEs" also have something like this.
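    As a toy illustration of the generator argument (this is not XDoclet, whose tags and output are not reproduced here), a hypothetical Java sketch that renders one field-to-column description into a JBoss-style and a WebLogic-style mapping fragment. The element names follow the fragments Gabriel quotes later in this thread; a real generator would also emit the surrounding descriptor structure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generator: one field-to-column map, two vendor-style mapping fragments.
public class MappingGenerator {

    /** Renders a jbosscmp-jdbc.xml-style fragment (cmp-field / column-name pairs). */
    public static String jbossFragment(String table, Map<String, String> fieldToColumn) {
        StringBuilder sb = new StringBuilder();
        sb.append("<table-name>").append(table).append("</table-name>\n");
        for (Map.Entry<String, String> e : fieldToColumn.entrySet()) {
            sb.append("<cmp-field>\n")
              .append("  <field-name>").append(e.getKey()).append("</field-name>\n")
              .append("  <column-name>").append(e.getValue()).append("</column-name>\n")
              .append("</cmp-field>\n");
        }
        return sb.toString();
    }

    /** Renders a WebLogic-style fragment (field-map / dbms-column pairs). */
    public static String weblogicFragment(String table, Map<String, String> fieldToColumn) {
        StringBuilder sb = new StringBuilder();
        sb.append("<table-name>").append(table).append("</table-name>\n");
        for (Map.Entry<String, String> e : fieldToColumn.entrySet()) {
            sb.append("<field-map>\n")
              .append("  <cmp-field>").append(e.getKey()).append("</cmp-field>\n")
              .append("  <dbms-column>").append(e.getValue()).append("</dbms-column>\n")
              .append("</field-map>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>(); // keeps field order stable
        mapping.put("key", "ID");
        mapping.put("value", "LAST_VALUE");
        System.out.print(jbossFragment("TABLE_NAME", mapping));
        System.out.print(weblogicFragment("TABLE_NAME", mapping));
    }
}
```

    The mapping itself still has to be written once somewhere; generation removes the retyping per vendor, not the decision of which column each field maps to.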

    Well, even taking out the retyping/boilerplate factor, there is still something wrong with EJB... but that's another thread.
  88. Gabriel,

    <quote>
    1-If you need to install your application on many application servers you have
    to retype the specific xml files (jbosscmp-jdbc.xml weblogic-rdbms-jar.xml ejb-inprise-xml etc ...).
    And maintain these files and tests them all for each release.
    </quote>

    When I use an EJB 2.0-based app server that has its own CMP engine (many good ones do: OC4J, JBoss, and, I am assuming, WebLogic, etc.), I put my relationships in ejb-jar.xml and that's it. So there's just one standard file to deal with, which should deploy across all app servers.

    One thing I have noticed, though, is that these EJB 2.0 CMP engines are still maturing. One advantage of your point about using some other persistence framework is that the behaviour will be consistent across app servers, which isn't always true of some vendors' versions of CMP. But those engines are always improving.

    Of course, each app server will have its own tuning mechanisms and that will typically require you to fiddle with the various config files specific to the product. So if you are deploying on multiple app servers, you will need to work with multiple types of config files, JDO or not. But such is life.

    Cheers
    Ray
  89. Ray

    "Of course, each app server will have its own tuning mechanisms and that will typically require you to fiddle with the various config files specific to the product. So if you are deploying on multiple app servers, you will need to work with multiple types of config files, JDO or not. But such is life. "

    NO. When using JDO (or TopLink), you can deploy your work on any application server without being concerned about the mapping (provided you are using a JDO persistence engine, or your own engine, that is not bound to a particular application server).

    "you will need to work with multiple types of config files"

    Config files are easy to maintain for multiple app servers and may not change every day (and the standard one, ejb-jar.xml, may be sufficient).

    But the O/R mapping file WILL change with the database every time you create a table, and the standard ejb-jar.xml is not enough.

    Cheers

    Gabriel
  90. Marcus

    <quote>
    Typing deployment descriptors should not be a concern. One thing you need to realize is that EJB is a spec written for code generators. I don't think Sun envisioned that everyone would write EJB and deployment descriptors by hand. Most of EJB spec is boiler plate. Most can be automated or generated.
    </quote>

    Provided that you buy the latest development tool with deployment capabilities for multiple application servers (which is not always possible).

    Remember when WebSphere 4.0 first came out and your only tool was VisualAge for Java 3.5 (bearing in mind that the WebSphere 4.0 deployment descriptors had completely changed compared to WebSphere 3.5). And JBuilder 5 (which was still in beta, code-named Darwin) could only deploy to WebSphere 3.5.
  91. Gabriel -
    Maybe I'm not understanding what you are trying to say. When I use an EJB 2.0 CMP container, my ejb-jar.xml file contains the information I need for my entities and their relationships. I don't need to edit other config files to define those relationships. I can of course tweak various bits in vendor-specific files, O/R-related or otherwise. Is your point more along the lines that if I am starting with an existing database, or if I change a table, I may need to change a vendor-specific O/R-related setting?

    Cheers
    Ray
  92. Ray

    <quote>
    Maybe I'm not understanding what you are trying to say. When I use an EJB2.0 CMP container, my ejb-jar.xml file contains the information I need for my entities and their relationships. I don't need to edit other config files to define those relationships. I can of course tweek various bits in vendor specific files, O/R related or otherwise. Is your point more along the lines of if I am starting with an existing database or if I change a table I may need to change a vendor specific O/R related setting?
    </quote>

    ejb-jar.xml is not enough. You may be relying on the default mapping, which only works if all the fields in the database follow a naming pattern, and that pattern will not be the same for all containers. The default mapping may also not conform to your database dictionary (length of varchars, of numbers, etc.).

    Are you using this on a real-world project?

    EXAMPLE:

    In ejb-jar.xml you describe your entities and their fields, NOT THEIR MAPPING TO the database:

    ....
      <persistence-type>Container</persistence-type>
    ...
      <cmp-field>
        <field-name>key</field-name>
      </cmp-field>
      <cmp-field>
        <field-name>value</field-name>
      </cmp-field>
    ....

    In weblogic-cmp-rdbms-jar.xml you add THE MAPPING:

    <table-name>TABLE_NAME</table-name>
    <field-map>
      <cmp-field>key</cmp-field>
      <dbms-column>ID</dbms-column>
    </field-map>
    <field-map>
      <cmp-field>value</cmp-field>
      <dbms-column>LAST_VALUE</dbms-column>
    </field-map>

    In jaws.xml OR jbosscmp-jdbc.xml you add THE MAPPING:

    <table-name>TABLE_NAME</table-name>
    <cmp-field>
      <field-name>key</field-name>
      <column-name>ID</column-name>
      <sql-type>NUMBER(10)</sql-type>
    </cmp-field>
    <cmp-field>
      <field-name>value</field-name>
      <column-name>LAST_VALUE</column-name>
      <sql-type>NUMBER(10)</sql-type>
    </cmp-field>



    Cheers
  93. Gabriel,

    Yes, I use entity ejbs all the time in real world projects - thanks for asking! We do EAI/data warehouse work.

    Technically speaking you are correct, of course: additional files are part of the whole process (orion-ejb-jar.xml for Orion/OC4J, for instance), and they contain the mappings you describe in your posting. However, those files are typically generated automatically from the ejb-jar.xml. We typically don't have to touch them (except for container-specific performance tweaks), so maintenance isn't an issue. We still end up maintaining one file for our persistence++: the ejb-jar.xml.

    In this real-world experience, we find that we only need to maintain the ejb-jar.xml in the majority of cases. It has the full entity schema, including the table name as the abstract-schema-name. It has the relationships, declarative security and transaction scopes. The only thing left is the Java/SQL mapping for the database(s) used. Most good containers have a file or group of files that maintain that mapping; Orion, for instance, has a set of XML schema defs that provide the Java/SQL mapping, disallowed fields, etc. JBoss has standardjbosscmp-jdbc.xml, which contains mappings for a large number of databases, along with a few tidbits like SQL templates for foreign key constructs and the like. And typically those mappings don't change, so maintaining the file or files is a non-issue.

    We are after more than persistence and relationship management in our use of EJBs, so it is unlikely that we would go down the route of a different persistence manager even if EJB2.0/CMP required the management of multiple files.
    Don't get me wrong, I absolutely see the value in JDO and the like, but there is certainly value in EJBs as well and their use isn't as difficult as people think.

    Cheers
    Ray
  94. Ray

    So you have always targeted the application servers that your development tool supports.

    But I do have some other concerns when choosing the elements of an architecture:

    I gave Marcus my opinion on the automatic generation of the XML files, and it comes from an experience I had more than a year ago:

    Marcus said:

    <quote>
    Typing deployment descriptors should not be a concern. One thing you need to realize is that EJB is a spec written for code generators. I don't think Sun envisioned that everyone would write EJB and deployment descriptors by hand. Most of EJB spec is boiler plate. Most can be automated or generated.
    </quote>

    And I answered:

    Provided that you buy the latest development tool with deployment capabilities for multiple application servers (which is not always possible).

    Remember when WebSphere 4.0 first came out and your only tool was VisualAge for Java 3.5 (bearing in mind that the WebSphere 4.0 deployment descriptors had completely changed compared to WebSphere 3.5). And JBuilder 5 (which was still in beta, code-named Darwin) could only deploy to WebSphere 3.5.

    If you had 100 entities in your application, you may have spent many days retyping the mappings (not to mention the time you needed to understand the new descriptors, which may be quick if you have some clear examples).
  95. <quote>
    those files are typically generated automatically from the ejb-jar.xml - we typically don't have to touch
    </quote>

    No, they are generated by JDeveloper; ejb-jar.xml does not contain the names of the database fields.

    Cheers
  96. <quote>
    No, they are generated by JDevelopper, in the ejb-jar.xml you do not have the name of the database field
    </quote>

    JDeveloper will create an orion-ejb-jar.xml file for you, but you don't need it. In fact, we rarely use JDeveloper. You can deploy an app without the orion-ejb-jar.xml file; it depends on how you define your entities, I suppose. We either create our entities from scratch, in which case the CMP persistence fields are the same as the database fields, or we reverse-engineer the entities/relationships, in which case the CMP persistence fields are again the same as the database fields. The only file we touch is the ejb-jar.xml file, believe me. The app server does the rest. In the case of, say, Orion/OC4J, it generates the orion-ejb-jar.xml for you automatically. We rarely have to touch it for persistence purposes.

    Here are the situations where we have to monkey around with the app-server specific file:

    1.) Tuning the container
    2.) Entities associated with one ejb-jar.xml file refer to more than one database (as opposed to some default data source) - in our case, rare.
    3.) I have some specific persistence requirement which isn't easily handled by the CMP engine. In our case, rare.

    Cheers
  97. Ray

    <quote>
    In this real world experience, we find that we only need maintain the ejb-jar.xml in the majority of cases. It has the full entity schema - including table name as the abstract-schema-name
    </quote>

    ejb-jar.xml is not supposed to contain the TABLE NAME (it's not in the spec, is it? I haven't seen it).

    Is this specific to the Orion application server?

    And the abstract-schema-name is used for EJB QL queries (nothing to do with the table name or database schema).

    And what about the mapping between database fields and bean properties?
  98. Gabriel -
    Maybe it is OC4J/Orion/JBoss-specific, I don't know. By default, the abstract schema name is used as the table (entity) name in the persistence mechanism, which makes sense. Again, we have the luxury of doing that; not everyone does. So when we deploy an app with the ejb-jar.xml, the container generates, by default, the tables named after the abstract schema names and the relationship components (foreign key relationships or relationship tables), in the default data source. The container knows the Java/SQL mappings for the database we happen to be using. Again, if there are special-case persistence needs or other components like dependent value classes, that sort of thing, then this lovely world falls apart and perhaps the simpler concepts of JDO hold sway. Unfortunately for us, we don't use EJBs only for persistence, but for all of the other services provided, as explained earlier. For our time and money, it is still worthwhile to occasionally edit those extra files. Done right, those edits are infrequent.
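    A hypothetical Java sketch of the "by default" behaviour Ray describes: the table name comes from the abstract schema name, column names from the CMP field names, and SQL types from a per-database Java-to-SQL type table (the role a file like standardjbosscmp-jdbc.xml plays). The Oracle-style type table below is illustrative, not any container's actual defaults, and a type with no entry (such as a dependent value class) is exactly where the convention breaks down.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical convention-based DDL generation, as a CMP container might do by default.
public class DefaultMapping {

    // Illustrative Java-to-SQL type table for one database; real containers ship their own.
    public static final Map<Class<?>, String> EXAMPLE_ORACLE_TYPES = Map.of(
            Integer.class, "NUMBER(10)",
            Long.class, "NUMBER(19)",
            String.class, "VARCHAR2(255)");

    /** Builds DDL by convention: table = abstract schema name, column = CMP field name. */
    public static String createTable(String abstractSchemaName,
                                     Map<String, Class<?>> cmpFields,
                                     Map<Class<?>, String> typeMap) {
        StringJoiner columns =
                new StringJoiner(", ", "CREATE TABLE " + abstractSchemaName + " (", ")");
        for (Map.Entry<String, Class<?>> field : cmpFields.entrySet()) {
            String sqlType = typeMap.get(field.getValue());
            if (sqlType == null) {
                // e.g. a dependent value class: no default exists, so a vendor file must say
                throw new IllegalArgumentException(
                        "no default SQL type for " + field.getValue().getName());
            }
            columns.add(field.getKey() + " " + sqlType);
        }
        return columns.toString();
    }

    public static void main(String[] args) {
        Map<String, Class<?>> fields = new LinkedHashMap<>(); // column order follows field order
        fields.put("id", Integer.class);
        fields.put("lastValue", Integer.class);
        System.out.println(createTable("Counter", fields, EXAMPLE_ORACLE_TYPES));
    }
}
```

    This is also why Gabriel's point holds for existing schemas: when the table and column names are dictated by the database rather than the beans, the convention no longer applies and the vendor-specific file has to carry the mapping.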

    Cheers
    Ray
  99. I love the fact that so many folks are mentioning shared databases and asynchronous data feeds from non-Java sources. This is exactly where I work, and every system I work on will at some point have some of its data taken away from us, to be 'centralised' or some such.

    In this situation, EJB caching is irrelevant. Add replicated data centers to the equation and you can kiss all caches farewell.

    The database is tried and tested, has its own caching, and supports every technology in the world, including custom feeds. Any app presenting data has to deal with timely and correct data, so it ain't worth caching much. Even reference data has to poll the database for updates, and that only works if the database design supports auditing; so many folks leave audit columns out of reference tables that you can't even cache those.
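    The polling approach described above can be sketched as a read-through cache that revalidates each hit against an audit column. This is a minimal illustration under assumptions, not code from the thread: `AuditedCache`, the row loader and the audit lookup are hypothetical stand-ins for two JDBC queries (the full row, and just its last-modified value). It only works if the table actually carries such an audit column, which is the poster's point.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToLongFunction;

/**
 * Hypothetical read-through cache for reference data. Each entry remembers the
 * audit value (e.g. a last-modified timestamp) it was loaded with. On every get,
 * the cache asks the database for the current audit value and reloads the row
 * only if that value has changed.
 */
class AuditedCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long auditValue;

        Entry(V value, long auditValue) {
            this.value = value;
            this.auditValue = auditValue;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> rowLoader;      // stands in for: SELECT * FROM ref_table WHERE id = ?
    private final ToLongFunction<K> auditLookup; // stands in for: SELECT last_modified FROM ref_table WHERE id = ?
    private int reloads;                         // how many times the full row was fetched

    AuditedCache(Function<K, V> rowLoader, ToLongFunction<K> auditLookup) {
        this.rowLoader = rowLoader;
        this.auditLookup = auditLookup;
    }

    V get(K key) {
        long current = auditLookup.applyAsLong(key); // cheap poll of the audit column
        Entry<V> cached = entries.get(key);
        if (cached == null || cached.auditValue != current) {
            cached = new Entry<>(rowLoader.apply(key), current);
            entries.put(key, cached);
            reloads++;
        }
        return cached.value;
    }

    int reloads() {
        return reloads;
    }
}
```

    Every `get` still makes a round trip for the audit value, so the saving is in row size rather than in round trips. A production version would also read the row and its audit value in one query, to avoid a race between the two lookups.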

    Therefore, EJB is a waste of time on any product with a lifespan of more than 12 months - especially for global enterprise infrastructure systems.

    Hey, I use them on websites, just to keep the skills up, but websites aren't rocket science, and non-EJB O/R mapping does the job with far less effort.

    Jonathan
  100. And in the Pet Store example with Sun's reference EJB container, this is declared in sun-j2ee-ru.xml, in a more detailed manner.

    If it were in the spec (the table and field names in the ejb-jar.xml file), Sun would have used it in the Pet Store.

    Extract from sun-j2ee-ru.xml for the AddressEJB bean:


        <ejb>
          <ejb-name>AddressEJB</ejb-name>
          <jndi-name>ejb/local/petstore/customer/Address</jndi-name>
          <ior-security-config>
            <transport-config>
              <integrity>supported</integrity>
              <confidentiality>supported</confidentiality>
              <establish-trust-in-target>supported</establish-trust-in-target>
              <establish-trust-in-client>supported</establish-trust-in-client>
            </transport-config>
            <as-context>
              <auth-method>username_password</auth-method>
              <realm>default</realm>
              <required>true</required>
            </as-context>
            <sas-context>
              <caller-propagation>supported</caller-propagation>
            </sas-context>
          </ior-security-config>
          <gen-classes />
          <ejb20-cmp>
            <sql-statement>
              <operation>storeRow</operation>
              <sql>UPDATE "AddressEJBTable" SET "__reverse_address___PMPrimaryKey" = ? , "city" = ? , "country" = ? , "state" = ? , "streetName1" = ? , "streetName2" = ? , "zipCode" = ? WHERE "__PMPrimaryKey" = ? </sql>
            </sql-statement>
            <sql-statement>
              <operation>loadRow</operation>
              <sql>SELECT "__reverse_address___PMPrimaryKey" , "city" , "country" , "state" , "streetName1" , "streetName2" , "zipCode" FROM "AddressEJBTable" WHERE "__PMPrimaryKey" = ? </sql>
            </sql-statement>
            <sql-statement>
              <operation>deleteRow</operation>
              <sql>DELETE FROM "AddressEJBTable" WHERE "__PMPrimaryKey" = ? </sql>
            </sql-statement>
            <sql-statement>
              <operation>createRow</operation>
              <sql>INSERT INTO "AddressEJBTable" ( "__PMPrimaryKey" , "__reverse_address___PMPrimaryKey" , "city" , "country" , "state" , "streetName1" , "streetName2" , "zipCode" ) VALUES ( ? , ? , ? , ? , ? , ? , ? , ? )</sql>
            </sql-statement>
            <sql-statement>
              <operation>deleteTable</operation>
              <sql>DROP TABLE "AddressEJBTable"</sql>
            </sql-statement>
            <sql-statement>
              <operation>findByPrimaryKey</operation>
              <sql>SELECT "__PMPrimaryKey" FROM "AddressEJBTable" WHERE "__PMPrimaryKey" = ? </sql>
            </sql-statement>
            <sql-statement>
              <operation>createTable</operation>
              <sql>CREATE TABLE "AddressEJBTable" ("__PMPrimaryKey" LONGINT , "__reverse_address___PMPrimaryKey" LONGINT , "city" VARCHAR(255) , "country" VARCHAR(255) , "state" VARCHAR(255) , "streetName1" VARCHAR(255) , "streetName2" VARCHAR(255) , "zipCode" VARCHAR(255), CONSTRAINT "pk_AddressEJBTabl" PRIMARY KEY ("__PMPrimaryKey") )</sql>
            </sql-statement>
            <create-table-deploy>true</create-table-deploy>
            <delete-table-undeploy>true</delete-table-undeploy>
          </ejb20-cmp>
        </ejb>
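    The SQL in the extract above is mechanical: every statement follows from the table name, the primary-key column and the list of CMP field columns. The sketch below rebuilds those statements to make that concrete. It is a hypothetical illustration; `CmpSqlGenerator` is not part of the reference implementation, and the double-quoting and " , " spacing are simply copied from the extract.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: derives per-operation CMP SQL from table, key and column names. */
class CmpSqlGenerator {
    private final String table;
    private final String pk;
    private final List<String> columns; // non-key columns, in the container's order

    CmpSqlGenerator(String table, String pk, List<String> columns) {
        this.table = table;
        this.pk = pk;
        this.columns = columns;
    }

    /** Wraps an identifier in double quotes, as in the extract. */
    private static String q(String id) {
        return "\"" + id + "\"";
    }

    /** Quotes each identifier, appends the suffix, and joins with " , ". */
    private static String quotedList(List<String> ids, String suffix) {
        return ids.stream().map(id -> q(id) + suffix).collect(Collectors.joining(" , "));
    }

    private String wherePk() {
        return " WHERE " + q(pk) + " = ? ";
    }

    String loadRow() {
        return "SELECT " + quotedList(columns, "") + " FROM " + q(table) + wherePk();
    }

    String storeRow() {
        return "UPDATE " + q(table) + " SET " + quotedList(columns, " = ?") + wherePk();
    }

    String deleteRow() {
        return "DELETE FROM " + q(table) + wherePk();
    }

    String findByPrimaryKey() {
        return "SELECT " + q(pk) + " FROM " + q(table) + wherePk();
    }

    String createRow() {
        List<String> all = new ArrayList<>();
        all.add(pk);
        all.addAll(columns);
        String placeholders = String.join(" , ", Collections.nCopies(all.size(), "?"));
        return "INSERT INTO " + q(table) + " ( " + quotedList(all, "") + " ) VALUES ( " + placeholders + " )";
    }
}
```

    Constructed with "AddressEJBTable", "__PMPrimaryKey" and the seven non-key columns in the order shown, it reproduces the loadRow, storeRow, deleteRow, createRow and findByPrimaryKey strings above. That is the sense in which the mapping is vendor-specific configuration rather than spec content.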
  101. The real deal with CMP and SOAP and all these inefficient, complex monstrosities is that they are created (designed/implemented) by very bright people who like to solve complex problems, and who tend to think that normal people are vastly stupider than we really are, and tend to "dumb down everything". Just because I don't fully grasp every JSR and RFC ever written doesn't mean that I can't grok a simple SQL statement, or figure out how to open and close a socket.

    If I'm having trouble with some of the calculus in Einstein's theory of relativity, would you try to teach me matrix division so I don't have to do long division?
  102. Well, he has to. Sun, WebLogic and IBM have to love EJB for the same reason: without EJB, all app servers are merely Apache Tomcat. Look at the JBoss testsuite: half of it is about EJBs!
    Faisal
  103. JDO has some of the same problems as CMP

    <Gabriel>...you need to install your application on many application servers you have to retype the specific xml files...</Gabriel>

    1. You need to define the mapping for each JDO implementation, since the mapping is left to the vendor

    2. JDO faces all the same caching and object graph problems as any CMP implementation

    3. When using a session facade, having transactions and distribution at the entity bean layer isn't needed. That is why I like where CMP is heading in JBoss 4, and I agree with Floyd that the component interfaces for entity beans add development overhead that isn't needed. But caching is needed: as indicated in the BLUE whitepaper, the database becomes a bottleneck.
  104. <geoff hendry>
    As with everyone else who has worked with entity beans through a few versions of the spec, it has been dissapointing to not see clustered EB coherence as part of the spec, and to have to rely on proprietary "smart stubs" or worse, implement your own cache invalidation layer using JMS (as I saw as a pattern presented at Java One in '01).
    </geoff hendry>

    Why is sending JMS messages to indicate stale data (or a cache update) a bad thing?

    I thought the JMS & JCACHE architecture talk at JavaOne '02 was the best darn architecture I've seen for one of the most interesting real-world problems: financial data. Basically, it advocates having JCaches in many places: on the client, regionally distributed on the network, on servers of course, etc. The JCache (a product from SpiritSoft, which has a JCache JSR) is loaded via rules, and the rules can load any kind of object in any way: JDO, EJB, JDBC, JMS message, etc. JMS messages invalidate the cache as necessary. (See the JavaOne JCache talk.)

    I'm about to embark on analyzing this, and it seems pretty cool. Am I missing something? What should I watch out for?

    There's an OpenSource impl of JCache

    On JDO/EJB, I think of it this way: JDO is an abstraction of persistence implementations, a Platform-Independent Model design (in Model Driven Architecture terms, warped a bit). By contrast, EJB is an interface for a specific persistence model, not one for persistence models generally. So JDO can be implemented via EJB (not that anyone will). If the data clients code to JDO and EJB doesn't work out, they can use another JDO implementation.

    I'm hoping to take the idea one step further. Clients should get data from a JCache. This way, data can be loaded into the cache via JDO (EJB/JDBC), JMS, SOAP, whatever. Isolating data management this way gives you the flexibility to change the implementation when something doesn't work or when the app needs to work under a new scenario.
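    The "clients read from a cache, loaders are pluggable, messages invalidate" idea can be sketched in plain Java. This is an illustration under assumptions, not the JCache API and not SpiritSoft's product: `DataLoader` and `InvalidatingCache` are hypothetical names. The loader hides whether data comes from JDO, EJB, JDBC or SOAP, and `invalidate` is what a JMS `MessageListener` would call when a change notification arrives.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical pluggable source: JDO, EJB, JDBC, SOAP... the cache does not care. */
interface DataLoader<K, V> {
    V load(K key);
}

/** Hypothetical cache that clients read from; change notifications evict entries. */
class InvalidatingCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final DataLoader<K, V> loader;

    InvalidatingCache(DataLoader<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, loading it through the pluggable loader on a miss. */
    V get(K key) {
        return store.computeIfAbsent(key, loader::load);
    }

    /** What a JMS MessageListener would call on receiving a "key changed" message. */
    void invalidate(K key) {
        store.remove(key);
    }

    boolean isCached(K key) {
        return store.containsKey(key);
    }
}
```

    Between a change and its invalidation message the cache serves stale data, so how quickly messages arrive is part of the design, not an afterthought.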

    Hope this isn't a regurgitation of old news; it's been a while since I've followed this site. (Floyd gave me a t-shirt at JavaOne years ago for posting to the then-new TSS. No one knew who he was then or who he was going to be! He got my respect instantly; he's a special person.)

    Michael Bushe
  105. Oh yeah, this thread is about the blue paper ("Remember Alice? This is a song about Alice..."). The blue paper was fun - but I want to see the red paper - that sounds like great stuff. Is it not out yet?
  106. >There's an OpenSource impl of JCache


    There is no JCache API yet, so there are no implementations of JCache.

    Products that claim JCache compatibility are based on a spec submitted by Oracle which was set aside by the expert group. It's unfortunate that the JCP is not more transparent.

    You will probably see open source JCache solutions soon after the API is available. A developer involved with JBoss will be (or is already) on the expert group, for example.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  107. >>I do not like hashtables and java collections by the same reason: I never know what might be inside...
    >Queer approach...

    He means that they are not strongly typed - you cannot declare the type of the contents.

    Roll on Generics...
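    Generics from JSR 14 eventually shipped in J2SE 5.0. The sketch below contrasts a raw, pre-generics `Map`, where a value of the wrong type is only caught (if at all) by a cast at run time, with a parameterized `Map<String, Integer>`, where the compiler rejects it. The class and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

class GenericsDemo {
    /** Pre-generics style: nothing stops a value of the wrong type going in. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int rawTotal() {
        Map prices = new HashMap();
        prices.put("widget", Integer.valueOf(3));
        prices.put("gadget", "4");                // compiles: the map's contents are untyped
        int total = 0;
        for (Object value : prices.values()) {
            try {
                total += ((Integer) value).intValue();
            } catch (ClassCastException e) {
                // the bad entry is only discovered here, far from the put() that caused it
            }
        }
        return total;
    }

    /** Generic style: the key and value types are part of the declaration. */
    static int typedTotal() {
        Map<String, Integer> prices = new HashMap<>();
        prices.put("widget", 3);
        // prices.put("gadget", "4");             // rejected at compile time
        prices.put("gadget", 4);
        int total = 0;
        for (int value : prices.values()) {       // no cast needed
            total += value;
        }
        return total;
    }
}
```

    `rawTotal()` quietly returns 3 because the string "4" is skipped at run time; `typedTotal()` returns 7.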

    BTW: Has anyone tried the prototype Generics compiler (JSR014) for code deployed into an AppServer?

    /david
  108. 1. I used to develop for Windows 2.04. It sucked then, and the underlying API still sucks on account of its Windows 2.04 compatibility.

    2. As one of the Axis developers, I want to disagree a bit with your SOAP criticisms. Yes, it is slow, as it is almost the worst case for optimisation you can have: spelling out ints and floats, Base64-encoding binaries if you want interop, and of course treating XML as binaries to avoid ID complications.
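    To put rough numbers on "spelling out ints" and Base64-encoding binaries, the sketch below compares payload sizes only. It ignores the SOAP envelope, namespaces and type attributes, and the element name `count` is a hypothetical example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class EncodingCost {
    /** Bytes for an int in fixed-width big-endian binary form (always 4). */
    static int binaryIntSize(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array().length;
    }

    /** Bytes for the same int "spelled out" as a bare XML element. */
    static int xmlIntSize(String element, int value) {
        String xml = "<" + element + ">" + value + "</" + element + ">";
        return xml.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Bytes for a binary payload after standard Base64 encoding (4 chars per 3 bytes, padded). */
    static int base64Size(byte[] payload) {
        return Base64.getEncoder().encode(payload).length;
    }
}
```

    `xmlIntSize("count", 123456789)` is 24 bytes against 4 for the binary form, and Base64 turns every 3 bytes into 4 characters, so a 300-byte attachment grows to 400 bytes before any envelope overhead.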

    But nobody sells SOAP on performance. SOAP is about interop. It's about anything calling your server: Java, .NET, Perl, C++, and you calling a server written in any other language. And it's not just an inefficient RPC serialisation format; in doc/lit mode it is about the exchange of XML documents between applications, documents in whatever XML Schema you put together. I'm prototyping a SOAP server in Java now, but we may want to implement the product in C++ and then embed it into hardware. The fact that we have switched implementations won't matter to the client apps, and 10 years from now you will be able to talk to the same device with whatever SOAP implementation you have to hand. Try that with EJB/RMI-IIOP, or indeed DCOM/COM+. (Actually, you could do it with pure Corba, which raises a side issue: why web services haven't yet finished stealing even a fraction of the Corba design.)

    -steve
  109. <quote>
    The rule is simple, in general anything that adds machine instructions but simplifies things for people is a good tradeoff. SOAP does this over RPC, Corba and any other binary distributed IPC system.
    </quote>

    <quote>
    But nobody sells SOAP on performance. SOAP is about interop.
    </quote>

    I totally agree with these statements; yet, as always, I think SOAP is being sold as the new silver bullet, and every salesman is convincing managers that it can do anything easily. Which is true. But it will never do anything efficiently.

    It's the same problem with EJBs: each technology has its own small domain. Someone was saying that working with 400k records at once with EJBs was not working, so they used JDBC.

    The same will be true in SOAP.

    Some, though, have correctly noticed that SOAP is about transport, and which human ever cares about that? It's a thing for machines.
    Surely there will be web services that humans use directly, but the vast majority will be used in B2B communications.

    SOAP is where EJB was 4 years ago, it is starting out and people should not fall in love so easily or they will be hurt.

    How about a linux kernel module for handling XML?

    BTW, in my current project, out of total indecision and inability to have a plan, the integration layer for a new corporate component we are adding will have to do it all, that is, CORBA and SOAP.
    Internally we will use a custom protocol over sockets, plus RMI for the Java-only world (the odd thing is that J2EE was not an option). In my opinion SOAP is adding complexity, not reducing it, because the truth is that these are all different protocols and you can't easily throw the others out and keep just one.
  110. I know this is off-topic (again), but there are a number of academic papers indicating why SOAP is Not A Good Thing (NAGT), or at the very least not the silver bullet that everybody is looking for. For example, have a look at a paper called "Latency Performance of SOAP Implementations"; I've forgotten the authors, but put the title into Google and you're guaranteed to get a hit.

    Basically they conclude that if you're communicating with another server and you're not behind a firewall, don't use SOAP - use CORBA IIOP for disparate clients, RMI for Java-to-Java, and the COM binary protocol for Microsofties. In other words, use it if you have to, but not otherwise.

    Don't believe the hype - be a real software engineer, do your homework and ignore the vendors.
  111. I definitely agree that you should use it only when you have to, but in my experience the need to use it is coming up more often, at least as the organization you work in grows. For example, it is trivial enough to control how a system you are developing communicates internally with another system you are responsible for. But as soon as anything has to travel outside of your organization, whether intra-corp or outside completely, SOAP becomes more compelling (for me at least :-).

    From what I have experienced, it isn't enough to go to developers in another department that needs to integrate with your system and say "just put a trigger in the database" or "poll our database" when they are downstream from you, as in order management systems, supply chain systems, whatever. Sometimes even worse is taking the single-technology stance when you are working on a J2EE platform and they are working on VMS or Wintel or whatever. Worse still is telling your business partner that you will build a single-technology programmatic bridge for their system to integrate into yours.

    Agreed that SOAP is fatter, but in many situations it seems to make more sense. Part of the reason I think SOAP will succeed where other technologies like Corba have failed is that it seems to have broader industry support, or maybe it's because MS seems to be trying :-).
  112. <Q>... BTW in my current project, out of total indecision and inability to have a plan the integration layer for a new corporate component we are adding,...
    </Q>

    Check out this article on layering applications for a way to deal with this. http://www.javadude.com/articles/

    We've had problems with developers thinking about the technology or technique being used rather than about the business logic and proper class design. So the code becomes about how to deal with SOAP, or HTML and servlets, and not about how to "Create a Customer", for instance. "Separation of Concerns", anyone? The same goes for databases. Solve the business problems first, then the technology ones. Another useful habit is to think about at least two ways the code could be used, without coding both implementations.
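    A minimal sketch of that advice, with hypothetical names: "Create a Customer" lives in `CustomerService` and knows nothing about HTTP or SOAP, while `FormCustomerAdapter` is the only code that knows about form parameters. In the spirit of "think about two uses, code one", a SOAP adapter would be a second thin class calling the same method, and is deliberately not written here.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain business object: no HTTP, SOAP or SQL in sight. */
class Customer {
    final String name;
    final String email;

    Customer(String name, String email) {
        this.name = name;
        this.email = email;
    }
}

/** "Create a Customer" as business logic only (in-memory store for the sketch). */
class CustomerService {
    private final Map<String, Customer> byEmail = new HashMap<>();

    Customer createCustomer(String name, String email) {
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("name is required");
        }
        if (email == null || !email.contains("@")) {
            throw new IllegalArgumentException("invalid email");
        }
        if (byEmail.containsKey(email)) {
            throw new IllegalStateException("customer already exists");
        }
        Customer customer = new Customer(name.trim(), email);
        byEmail.put(email, customer);
        return customer;
    }

    int count() {
        return byEmail.size();
    }
}

/** Thin adapter: HTML form parameters (as a servlet would receive them) to a service call. */
class FormCustomerAdapter {
    private final CustomerService service;

    FormCustomerAdapter(CustomerService service) {
        this.service = service;
    }

    String handle(Map<String, String> params) {
        Customer created = service.createCustomer(params.get("name"), params.get("email"));
        return "Created " + created.name;
    }
}
```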
  113. Cache

    How does the JBoss cache compare to, let's say, Tangosol Coherence? AFAIK, JBoss does not support cache updates across a cluster.
    I agree with marcf that collocating the cache and the servlets is a good thing, but how well does it scale in the absence of a distributed cache? If you have a farm of web servers behind a load balancer, the only way to make this work without losing the benefit of your cache is to have sticky sessions.
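    Sticky sessions mean that a given session always reaches the same web server, so that server's local cache stays warm for it. A minimal sketch, assuming routing by a hash of the session id; `StickyRouter` and the node names are hypothetical, and real load balancers more often use a cookie or a route suffix on the session id.

```java
import java.util.List;

/** Hypothetical sticky-session router: the same session id always lands on the same node. */
class StickyRouter {
    private final List<String> nodes;

    StickyRouter(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.nodes = nodes;
    }

    String route(String sessionId) {
        // floorMod keeps the index non-negative even when hashCode() is negative
        return nodes.get(Math.floorMod(sessionId.hashCode(), nodes.size()));
    }
}
```

    With plain modulo hashing, adding or removing a node remaps most sessions (and so cools most caches), which is one reason cookie-based affinity is common.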
    -Vincent.
  114. Cache

    Vincent: "How does the JBoss cache compare to, let's say, Tangosol Coherence? AFAIK, JBoss does not support cache updates across a cluster."

    We don't usually get compared to JBoss, since it's an application server. JBoss 3.0 has integrated a multicast messaging implementation (Javagroups), which allows it to implement certain features in a clustered manner. JBoss also has some forms of EJB caching built in; I'm not certain how its built-in EJB cache relates to the multicast messaging implementation.

    Several of our customers deploy on JBoss. One, quite interestingly, uses JBoss to host common business logic that (in addition to J2EE apps) is exposed to .NET and even old VB applications (VB to COM provided by .NET to ja.NET to JBoss to Coherence). I think that there's something very ironic about that. ;-)

    Generally, whenever you need to keep data in sync across multiple nodes in the cluster, that's when you'll use Coherence. Coherence isn't an app server at all though.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  115. >If only appservers supported clustered entity bean caching back when EJB was first launched, I think entity beans would have had a lot more industry adoption today.


    I could be wrong, but I think Persistence was doing it.

    Also, here is a quote from our friend Roger Sessions:

    "So whereas Sun is frantically adding features to EJB, Microsoft is quietly removing them from COM+. In Memory Databases (IMDB) was one of the highly touted features of the COM+ Beta. Unfortunately IMDB had many of the same problems as EJB's Entity Beans, namely a poorly defined integration between the component level cache and a database level cache. This is a recipe for havoc for applications that need to work with legacy systems. Sun's attitude seems to be to let the vendors sort out this mess and to let the customers take their chances. Microsoft's attitude is to make sure that what is there works; if Microsoft isn't absolutely convinced that a particular feature makes sense, out it goes. So for COM+, IMDB that once was in, is out. And for EJB, the Entity Bean model that should have been out, is in."

    "Roman/Oberg point out correctly that COM+ has no built in support for any equivalent COM+ component caching, since In Memory DataBases (IMDB) have been withdrawn from the COM+ product. Personally, I consider this a good thing. I have long publicly advocated that both IMDB and data object caches must be considered highly dangerous technologies. The reason is that there is no mechanism to keep the data object cache/IMDB cache coherent with the back end database; applications that do not use the component system are able to update the database without the data object/IMDB cache knowing that its cached data is stale.

    Without a cache coherency mechanism, applications will eventually succumb to database corruption. It may happen slowly and may take a long time to notice. But unless you can guarantee that every single database application you ever run goes through one and only one component vendor's technology, database corruption is part of your future. You can bank on it.

    The odd thing about this is that none of the Entity Beans technologies, including CMP, offers a significant advantage over the EJB component model called stateless session beans. In the ObjectWatch Newsletter #20, I showed that any component that can be written as an entity bean can be written just as well as a stateless session bean with none of the performance problems and none of the risk of database corruption. The stateless session beans model is exactly the same as the COM+ component model, so in this regard, COM+ and J2EE are exactly equivalent."
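    Sessions' concern is that a component cache can act on data another application has already changed. One common mitigation, which is not what he proposes and is swapped in here for illustration, is an optimistic version column: every write is conditional on the version the writer last read. The sketch simulates the table in memory; `VersionedTable` and its methods are hypothetical stand-ins for `UPDATE ... WHERE id = ? AND version = ?` and its update count.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * In-memory stand-in for a table with a version column. update() mimics
 *   UPDATE t SET value = ?, version = version + 1 WHERE id = ? AND version = ?
 * and returns the update count: 0 means the caller's copy was stale.
 */
class VersionedTable {
    private final Map<Integer, String> values = new HashMap<>();
    private final Map<Integer, Integer> versions = new HashMap<>();

    void insert(int id, String value) {
        values.put(id, value);
        versions.put(id, 0);
    }

    int version(int id) {
        return versions.get(id);
    }

    String value(int id) {
        return values.get(id);
    }

    int update(int id, String newValue, int expectedVersion) {
        Integer current = versions.get(id);
        if (current == null || current != expectedVersion) {
            return 0; // row missing, or someone else changed it since it was read
        }
        values.put(id, newValue);
        versions.put(id, current + 1);
        return 1;
    }
}
```

    This only protects the data if every writer, including the non-component applications, bumps the version (or a database trigger does it for them). It detects stale writes; it does not refresh stale reads.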

    --
    Dimitri
  116. <Q>
    applications that do not use the component system are able to update the database without the data object/IMDB cache knowing that its cached data is stale.
    </Q>

    And that is why they shouldn't. If one thinks persistence instead of database, it will go a long way to solving these kinds of problems. The main problem is that most developers still think GUI and Database and then want to throw objects in the middle. This is usually bound to fail. Something like trying to read 40,000 records, etc. - hmmmm, where did I see that? The problem with the standard thinking that "it is just data" is that most data is very closely tied to the application logic and is pretty much useless without it. How many times have I seen 'business logic' duplicated in reporting tools? Too many to count.
  117. Mark said:
    "And that is why they shouldn't. If one thinks persistance instead of database, it will go a long way to solving these kinds of problems. The main problem is that most developers still think GUI and Database and then want to throw objects in the middle. This is usually bound to fail. Something like trying to read 40,000 records, etc. - hmmmm where did I see that? The problem with the standard thinking that "it is just data" is that most data is very closely tied to the application logic and is pretty much useless without it. How many times have I seen 'business logic' duplicated in Reporting tools? Too many to count. "

    Um, how do we avoid duplicating business logic in reporting? Are you suggesting that we all use something like JReport?

    Thanks!

    Mark W
  118. Yes, Persistence was doing it back then.
  119. I wonder how many of the people here dissing EJBs have actually implemented a full-blown system using EJB 2.0.

    The way I see it, EJBs, JDBC, Web Services et al. are tools in the box. CMP 2.0 is a huge leap forward compared to its 1.1 counterpart, and although it has its disadvantages (in particular, EJB-QL is limited in what it can do), when used properly it can work very well, as it has for us. Your overriding factor when deciding for or against EJBs should be whether your application actually requires them.

    In my experience, 80-85% of the data in most real-world deployments is read-only or read-mostly. Caching such data matters because the advantages are twofold: first, we found that CMP caching improved performance 10-30 times compared with hitting the database directly through JDBC calls (WebLogic/Oracle configuration); second, it saves a database connection, which is precious under heavy concurrent usage.

    To all the skeptics out there, I would suggest giving CMP a go. You might be surprised.

    Some patterns that we followed here that led to a successful deployment:
    1. Session facade - Local Entity beans, wrapped by Session beans.
    2. Stateless session EJB + JDBC for reading large read only data
    3. Stayed away from BMP, used Data Access Objects and prepared statements where complex queries were involved
    4. 'Required' or 'RequiresNew' transaction attributes for most session bean methods. You will find that a method executes faster when enclosed in such a transactional scope than when it is not.

    For those who complain that EJBs are a pain in the arse to develop and maintain, I would agree to some extent. If you do not already, use a tool like ejbgen or XDoclet in your build. It makes life a lot easier.
  120. This is the best article (white paper, whatever) that I've read on EJB in a long time. There has been a lot of EJB bashing going on in the press and newsgroups lately, and it's refreshing to see someone provide such an articulate and fun defense of EJB.

    Bravo!

    Richard Monson-Haefel
    Author of Enterprise JavaBeans, 3rd Edition (O'Reilly 2001)
  121. Looks like Dr. Marc is already working on his autobiography. I liked seeing what was going on in his head, and what kinds of experiences led to his leadership and creation of the lovely JBoss platform.
  122. The twins are beautiful.

    However, I feel that a one-size-fits-all solution is impossible and that EJBs are not always better than JDBC (maybe in 90% of cases). I think Mr. Fleury makes two assumptions:
    1. That the number of users is pretty high
    2. That every user performs limited database access

    We started a 100% EJB project and finished 80% EJB. If we had insisted on finishing 100% EJB, we would simply have failed. I want to give just 2 situations where EJB was out of the question.

    a. We had to compare 40000 rows from two sources (one a database, the other a file from an external vendor application). The process was started from a web page, and the results had to be displayed. When we used EJB + cache the application server simply crashed after several hours. When we used JDBC the process was done in 5 min. The maximum number of users was 10, so scalability was not an issue.
    b. We had to process 4 million rows and, if certain conditions were met, insert the rows into the DB. Again, with EJB server the application server crashed; with JDBC it took 30 min. There was only 1 user.

    For the 80% of the application where we used EJB, scalability was an issue, but every user accessed on average fewer than 10 database rows (records).

    I think that EJB is better than JDBC when we have a large number of users and every user does a small number of transactions.

    I have a question regarding JBoss + EJB. Does its performance remain better than JDBC when we have to process, for example, more than 40000 rows, or do we see a significant degradation across all the session beans activated inside? How about 1 million rows?
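    For the 40000-row comparison in (a), a plain-Java approach is a one-pass hash join: load one side into a map keyed by id, stream the other side against it, and report what is left over. This is a sketch under assumptions: `Reconciler` and `Row` are hypothetical, and in practice `dbRows` would wrap a JDBC `ResultSet` and `fileRows` a reader over the vendor's text file. It does not explain why the EJB version crashed; it only shows how little state the comparison itself needs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class Reconciler {
    /** One record from either source: a key plus the data to compare (assumed non-null). */
    static final class Row {
        final String id;
        final String payload;

        Row(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    /**
     * One-pass hash join: dbRows is loaded into a map once, fileRows is streamed
     * against it, and whatever is left in the map exists only in the database.
     */
    static List<String> reconcile(Iterator<Row> dbRows, Iterator<Row> fileRows) {
        Map<String, String> db = new HashMap<>();
        while (dbRows.hasNext()) {
            Row row = dbRows.next();
            db.put(row.id, row.payload);
        }
        List<String> differences = new ArrayList<>();
        while (fileRows.hasNext()) {
            Row row = fileRows.next();
            String dbPayload = db.remove(row.id); // removing marks the key as matched
            if (dbPayload == null) {
                differences.add("only in file: " + row.id);
            } else if (!dbPayload.equals(row.payload)) {
                differences.add("changed: " + row.id);
            }
        }
        for (String id : db.keySet()) {
            differences.add("only in database: " + id);
        }
        return differences;
    }
}
```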
  123. Adrian

    "when we have to process for example more than 40000 rows or we see an important degradation across all the session beans activated inside? How about 1 million rows?"

    Perhaps there is a flaw in your database design.
  124. ("when we have to process for example more than 40000 rows or we see an important degradation across all the session beans activated inside? How about 1 million rows?"

    Perhaps there is a flaw in your database design)

    Kal,

    Things are different when you have to integrate your application with other products. This time we had to integrate the application with an old vendor application that can only transfer data (in and out) as plain text files.
    Besides, I can think of some situations where accessing large quantities of data makes sense: reports, audits, yearly processing, etc. For those processes we used JDBC too.
    Here is a problem with using the PetStore application as a benchmark: it is a standalone application, which is not always the case in the real world. Also, it does not have a process that manipulates, let's say, 1 million rows.
    I hope that JBoss team will allocate resources trying to solve the heavyweight processes performance problems (opportunities?).
  125. I think that entity beans are one of Sun's mistakes. I have worked on EJB 2.0 projects. EJB can be used in some situations, but there are limitations: no dynamic queries in the standard, no dynamic loading of fields, and a limited query language. If your project cannot fit within these limitations, you will not use CMP. It is worse if it does fit: you start the project in CMP, then your requirements change, and you can trash your work. The problem with J2EE is the lack of a good universal recordset/datasource (or whatever you call it) that can be used independently of the selected technology. Anyone who has used Delphi, .NET or even MFC knows how much that helps. In Java there is JDBC and some kind of JavaBeans (EJB/JDO/DAO), but no standard. JavaBeans carry no metadata at all (you cannot even validate string length without additional information). There are also differences in relationship mappings, etc. With a standard datasource you could attach your data to any data-aware control (a JSP tag, a Swing control or whatever). Now, if you change from DAO to JDBC, you have to change your MVC code, because there is no standard adapter to a datasource. It is paranoia.
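    The missing-metadata complaint (no way to validate string length without extra information) is the kind of gap that field annotations later addressed, although annotations only arrived in J2SE 5.0, after this thread. A minimal sketch with a hypothetical `@MaxLength` annotation and a reflective validator; neither is a standard API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical column metadata carried on the field itself. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface MaxLength {
    int value();
}

/** Example bean: the column widths travel with the class. */
class AddressBean {
    @MaxLength(10) String zipCode;
    @MaxLength(255) String city;

    AddressBean(String zipCode, String city) {
        this.zipCode = zipCode;
        this.city = city;
    }
}

class MetadataValidator {
    /** Reports every String field annotated with @MaxLength whose value is too long. */
    static List<String> validate(Object bean) {
        List<String> errors = new ArrayList<>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            MaxLength max = field.getAnnotation(MaxLength.class);
            if (max == null || field.getType() != String.class) {
                continue;
            }
            try {
                field.setAccessible(true);
                String value = (String) field.get(bean);
                if (value != null && value.length() > max.value()) {
                    errors.add(field.getName() + " is longer than " + max.value());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return errors;
    }
}
```

    Because the limits live on the class, any data-aware layer (a JSP tag, a Swing form, a DAO) could read the same metadata instead of duplicating it.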

    Marek Mosiewicz
    Jotel Poland
  126. An earlier post said that when EJB was used to compare 40000 rows from two sources, the server crashed, while with JDBC it took five minutes.

    Can somebody with insight into EJB container implementation explain why EJB performance is so bad in this situation? Is it true that what EJB does is execute a query just as JDBC does, plus some additional security/transaction/RMI invocation work?

    Marc's article claims that these EJB-specific operations are not that expensive. Then why does EJB perform so badly?

    Here is the original text regarding EJB vs. JDBC:


    =================
    We had to compare 40000 rows from two sources (one a database, the other a file from an external vendor application). The process was started from a web page, and the results had to be displayed. When we used EJB + cache the application server simply crashed after several hours. When we used JDBC the process was done in 5 min. The maximum number of users was 10, so scalability was not an issue.
    b. We had to process 4 million rows and, if certain conditions were met, insert the rows into the DB. Again, with EJB server the application server crashed; with JDBC it took 30 min. There was only 1 user.
  127. >> When we used EJB + cache the application server simply crashed after several hours
    >> ...
    >> Again, with EJB server the application server crashed

    This is not really saying anything unless you say *why* it crashed. Was it an appserver bug? Was it a design problem? It seems premature to jump to the conclusion that "it crashed because it was EJB..."
    (but oh, so common amongst voodoo-programmers, those who don't know *why* stuff happens).

    Also, some of the EJB criticisms would be a little more interesting if they contained some actual detail. (if nothing else, it would give some indication that the author knows what they are talking about). There certainly are valid criticisms of EJB - but I rarely see them discussed in detail.

    -Nick
  128. Once caching becomes part of the JDO spec (currently it's a vendor add-on), the main thrust of this paper (that EJBs are to be loved over other options because of their caching) is invalidated.
  129. "A brief history of EJB" should be the name of this great article. The topic is complex, and Marc manages to give some good arguments for EJBs in 8 pages. The JBoss history part is also very funny; I would like to hear more about it.

    JDO, Castor, EJB, CocoBase, TopLink, JDBC, SQLJ, JCA: isn't it great that we can choose whichever technology best fits our problem and our environment (budget, developers, time, etc.)?


    transient Mirko
    :wq
  130. Great debate!

    “I like EJB” or “I *think* JDBC is faster” is not scientific.

    A benchmark that can be reproduced is scientific, and tests show that EJBs are slow; there are at least 4 threads on TheServerSide that show that.
    (I __think__ practical experience shows that EJBs are relatively harder to develop and that their cost of ownership is very high.)

    Compare this to .NET ADO, which is disconnected.
    A disconnected RowSet is the closest thing to ADO and is very fast. More on the JDBC 2.1 RowSet: http://www.javaworld.com/javaworld/jw-02-2001/jw-0202-cachedrow.html

    I have used roll-your-own beans with great success, with real SQL for performance under huge loads; EJB would, at the very least, have made me purchase many more servers.

    The assumption in this article that people do not cache JDBC results, or that doing so is hard, is silly. Look at the Poolman JDBC connection pool: it caches results automatically and transparently.
    The assumption that the slow part of SQL access is reading (what caching speeds up) is also questionable, I think.
    SQL engines already cache index and data leaves on an LRU basis!
    The slow part of a SQL engine is updates and writes: splitting index and data leaves in the tree is slow. So a queuing algorithm, such as the queue in http://gee.cs.oswego.edu/dl/classes/collections/, is needed.
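    The queuing idea for slow writes can be sketched as a write-behind buffer: callers enqueue writes and return, and a flusher drains them in batches so the database sees fewer, larger operations (for example one JDBC batch per drained group). This sketch uses `java.util.concurrent` (standardized in J2SE 5.0 through JSR 166, led by Doug Lea) rather than the older package linked above; `WriteBehindQueue` is a hypothetical name, and the actual database write is left as a comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical write-behind buffer: enqueue now, write to the database later in batches. */
class WriteBehindQueue<W> {
    private final BlockingQueue<W> pending = new LinkedBlockingQueue<>();
    private final int batchSize;

    WriteBehindQueue(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
    }

    /** Called by request threads; returns immediately. */
    void submit(W write) {
        pending.add(write);
    }

    /** Drains everything currently queued, in FIFO batches of at most batchSize. */
    List<List<W>> flush() {
        List<List<W>> batches = new ArrayList<>();
        while (true) {
            List<W> batch = new ArrayList<>(batchSize);
            pending.drainTo(batch, batchSize);
            if (batch.isEmpty()) {
                break;
            }
            batches.add(batch); // a real flusher would send this batch to the database here
        }
        return batches;
    }
}
```

    Submitting five writes with a batch size of 2 flushes batches of 2, 2 and 1. The trade-off is durability: anything still queued is lost if the server dies before a flush.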
    Describing EJBs as sort of a distributed database with replication? I do not think that is a good architecture.
    And when I need to do master detail updates, working with OO Rows is easier than an array list of objects, what to do when you have many to many. It does not match the business need.
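The write queue the post asks for can be sketched as a write-behind buffer: callers enqueue updates and return immediately, and a flusher drains them in batches. This is a minimal sketch under assumptions not in the post: `PendingUpdate` and `WriteBehindBuffer` are hypothetical names, the batch writer stands in for something like a JDBC batch update, and it uses `java.util.concurrent` rather than the Doug Lea collections package linked above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical update record: which row to change and its new value. */
final class PendingUpdate {
    final long id;
    final String value;

    PendingUpdate(long id, String value) {
        this.id = id;
        this.value = value;
    }
}

/**
 * Callers enqueue updates and return at once; a flush drains up to
 * batchSize pending updates and hands them to the batch writer in one call
 * (in a real system, e.g. a JDBC executeBatch(), not shown here).
 */
final class WriteBehindBuffer {
    private final BlockingQueue<PendingUpdate> queue = new LinkedBlockingQueue<>();
    private final int batchSize;
    private final Consumer<List<PendingUpdate>> batchWriter;

    WriteBehindBuffer(int batchSize, Consumer<List<PendingUpdate>> batchWriter) {
        this.batchSize = batchSize;
        this.batchWriter = batchWriter;
    }

    /** The caller never waits on the database: the update is only queued. */
    void submit(PendingUpdate update) {
        queue.add(update);
    }

    /** Drains one batch and writes it; returns how many updates were written. */
    int flushOnce() {
        List<PendingUpdate> batch = new ArrayList<>(batchSize);
        queue.drainTo(batch, batchSize);
        if (!batch.isEmpty()) {
            batchWriter.accept(batch);
        }
        return batch.size();
    }

    int pending() {
        return queue.size();
    }
}
```

A scheduled thread would typically call `flushOnce()` in a loop. The trade-off is that a queued update is not yet in the database, so a crash between `submit` and the flush loses it.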

    EJBs are the Achilles heel of J2EE and should be deprecated, IMO. EJBs should *not* be used just for persistence; they should only be used if you need CORBA.
    I __think__ that even people who love EJB (emotionally, regardless of performance, cost and needless complexity) switch to JDBC for the parts that need scalability.
    I __think__ that the only reason Sun pushes EJB is that it licenses it, but in the long run companies will find that .NET ADO is faster/cheaper/simpler... unless Sun dumps EJB beforehand. Even the idea that you do not need to know SQL to do DB access is silly. You need to know more than Java to make DB access fast!

    If EJBs are fast enough, why do we have to cache things towards the view?
    With Poolman, it is cached in the back.
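Caching results "in the back", with LRU eviction as the database engines do, fits in a few lines. A minimal sketch, assuming results are keyed by the SQL text (plus parameters) and that slightly stale reads are acceptable; `QueryResultCache` is a hypothetical name, not Poolman's API. It relies on `LinkedHashMap`'s access-order mode and its `removeEldestEntry` hook.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Caches query results by key, evicting the least recently used entry. */
final class QueryResultCache<R> {
    private final Map<String, List<R>> entries;
    private int misses = 0;

    QueryResultCache(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently accessed
        this.entries = new LinkedHashMap<String, List<R>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<R>> eldest) {
                return size() > maxEntries; // drop the LRU entry once over capacity
            }
        };
    }

    /** Returns cached rows, or runs the query (e.g. via JDBC) on a miss. */
    synchronized List<R> get(String sqlKey, Supplier<List<R>> query) {
        List<R> rows = entries.get(sqlKey); // a hit also marks the entry as recent
        if (rows == null) {
            misses++;
            rows = query.get();
            entries.put(sqlKey, rows);
        }
        return rows;
    }

    /** Any write to the underlying tables must invalidate affected entries. */
    synchronized void invalidateAll() {
        entries.clear();
    }

    synchronized int misses() {
        return misses;
    }

    synchronized boolean contains(String sqlKey) {
        return entries.containsKey(sqlKey); // does not change access order
    }
}
```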

    Shameless plug: I have sample vertical applications, FREE, at basicportal.sf.net that follow “good practices”, and
    I do training on this at http://64.253.60.21/do/classReservation
    And I would be happy to help anyone convert slow EJB to JDO or DAO before a PHB who wants to save money forces you onto .NET.

    Anyway, show me the benchmark!

    .V

    PS: I think open source JBoss is the best EJB implementation, and it lowers the cost of ownership.
  131. JBoss Founder Marc Fleury: Why I love EJBs

    <q>
    A bench mark that can be reproduce is scientific
    </q>
    Actually, it takes a little more than that. Most benchmarks are far from being consistently reproducible, let alone scientifically proven.


    As for ADO and JDBC: yeah, they are faster (where is my scientific proof?). But that is not the only issue, and that is why things like EJBs and JDO exist. Computers are fast enough, and we are tired of coding SELECT * FROM CUSTOMERS and then doing customer.Name = rec("CUST_NAME") over and over again, especially when we want to work with objects. I just want to get my objects, work with them and then say I'm done for now.
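The repetitive `customer.Name = rec("CUST_NAME")` code is what O/R layers write for you; even without one, the mapping can at least be written once per type. A minimal sketch, assuming each row arrives as a column-name map (in real JDBC code it would come from a `ResultSet`); `Customer`, `RowMapper`, `Rows`, and the `CUST_ID` column are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical domain object. */
final class Customer {
    final long id;
    final String name;

    Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

/** Turns one row (column name -> value) into one object. */
interface RowMapper<T> {
    T map(Map<String, Object> row);
}

final class Rows {
    private Rows() {}

    /** Applies the mapper to every row, so the column-to-field code exists once. */
    static <T> List<T> mapAll(List<Map<String, Object>> rows, RowMapper<T> mapper) {
        List<T> out = new ArrayList<>(rows.size());
        for (Map<String, Object> row : rows) {
            out.add(mapper.map(row));
        }
        return out;
    }

    /** Column names are illustrative, not from any real schema. */
    static final RowMapper<Customer> CUSTOMER =
        row -> new Customer(((Number) row.get("CUST_ID")).longValue(),
                            (String) row.get("CUST_NAME"));
}
```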

    I would say that not having a standard persistence store that is OO is more of an Achilles heel for EJB and J2EE. As it is, we have to deal with RDBMSs.
  132. Propaganda. It is so obvious - EJBs are distributed components, with all the overhead needed to maintain the proxy layer. Since persistence is typically fine-grained and process-local, it is/was plain wrong to graft it onto a component model for distributed components.

    Caching is unrelated and can be applied in any architecture.

    IMO it shows that Marc started out as a Sun marketeer.

    Christian
  133. The use of interceptors to encapsulate and isolate functionality in middleware is not a new idea. It is, for example, the basis of IONA's ART architecture they used in Orbix2000. It's a good approach, but not revolutionary in 2002.
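For readers new to the pattern, an interceptor chain can be sketched like this: each interceptor does its work, passes control down the chain, and the target call sits at the end. This is a generic illustration, not the JBoss or Orbix interfaces; `Interceptor`, `Invocation`, and `TraceInterceptor` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** One link in the chain; it may act before and after calling next.proceed(). */
interface Interceptor {
    Object invoke(Invocation next) throws Exception;
}

/** Walks the interceptor list, then calls the target at the end of the chain. */
final class Invocation {
    private final List<Interceptor> chain;
    private final Supplier<Object> target;
    private int position = 0;

    Invocation(List<Interceptor> chain, Supplier<Object> target) {
        this.chain = chain;
        this.target = target;
    }

    Object proceed() throws Exception {
        if (position < chain.size()) {
            return chain.get(position++).invoke(this); // hand off to the next link
        }
        return target.get(); // end of chain: the actual business call
    }
}

/** Records when it is entered and left, standing in for security, tx, etc. */
final class TraceInterceptor implements Interceptor {
    private final String name;
    private final List<String> trace;

    TraceInterceptor(String name, List<String> trace) {
        this.name = name;
        this.trace = trace;
    }

    @Override
    public Object invoke(Invocation next) throws Exception {
        trace.add(name + ":before");
        try {
            return next.proceed();
        } finally {
            trace.add(name + ":after");
        }
    }
}
```

With a "security" and a "tx" interceptor in that order, the trace reads security:before, tx:before, target, tx:after, security:after: the nesting a container relies on when it wraps a call in a security check and then a transaction.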

    Greg
  134. Entity bean technology is still evolving, and entity beans are still very useful if used correctly. Currently they do not scale very well when used to represent large collections. I think the introduction of local CMP is a move in the right direction: it shows that the spec authors are striving to get rid of the current problems of entity beans and to meet the demand that entity beans behave as a pure persistence layer (like TopLink or JDO).

    Finally, if we are solving a problem that does not require distributed objects, then JDBC or JDO is the best choice for persistence. If performance is the only criterion, then JDBC is the way to go. However, EJB offers the best solution if you are creating distributed objects (or you envision a distributed environment), especially if they need to be accessed from a language-independent presentation layer.
  135. I am really glad that the JBoss design isolates the dependency on the EJB specification as much as possible. If it finally becomes an industry standard, it will be more authoritative than the EJB spec, and to me a nice GPLed product is better than a spec. The vendors can keep relying on the spec so that they can put "standards compliant" in their marketing literature.

    However, I feel that the lack of transparency of the JBoss project is holding it back. They should stop locking up the docs. I know it's only $10, but when you are just looking around you don't want to pay $10. Also, if you follow Linus' advice to "release early, release often", it follows that new docs should also be released often. What a drag if we need to pay each time. Why don't they just ask people to donate money? The whole idea of OSS is based on one's self-image, so if they asked, people who end up using the app server would probably pay up (I definitely would, if I did use it).

    That's the reason Linux and Apache are blowing away their competition, whereas JBoss hasn't. The key indicator is what kind of jobs people look for: most developers feel that they are worth more money if they work with WebLogic or WebSphere, not JBoss.

    What do you say, Marc, is it time to revisit your personal implementation of open-source? Since you are a theoretical physicist, may I suggest that you go back and look at Feynman's "Cargo Cult Science" essay. I think he mentions something about bending over backwards to publicize not only what went right in your experiment, but also all the possible problems with it. It takes open docs to do that.
  136. I agree with Guglielmo Lichtner... I was going to review JBoss for a project. I downloaded the software and tried to find the docs, but didn't. Then I went to the site and found out that I had to pay for them (other than the small Getting Started guide). Since I was just "looking at it", there's no way I was going to spend money on a product that might end up in the Recycle Bin. Sorry to say, I didn't review it. I can download a 60-day trial of WebLogic and have access to all the documentation! What about that??

    Documentation is a very important part of the product itself. Sometimes it's even more important than the product. I understand that the project needs money, but maybe there are other ways...

    About the white paper: after reviewing EJB CMP 2.0, JDO and a lot of other O/R solutions (commercial or open), I can't agree that EJB is that good. There are way too many problems with it for it to be a "godsend". And CMP 2.0 is nowhere near the level needed by most "real" enterprise systems...

    JBoss looks like a very nice app server. I hope I will get the chance to review it, eventually...
  137. JBoss and documentation

    Strange: "evaluation kind" getting-started information has been available for a long time now for the new (soon to become old ;) JBoss 3.0 version at
    http://www.jboss.org/docs/#free-30x


    Happy hunting - it sure helped me "getting started" at least.

    If you want to know how to make it scale or perform at its best, you probably have to pay (or browse the code ;), but that's only fair, isn't it?
  138. Interesting article, and nice to hear a strong pro-EJB voice, but parts of the history don't ring true. The idea of using session beans to talk to entity beans was (at least among developers) conventional thinking right from the beginning. In fact, designing applications with "application" objects talking to "business" objects was a common "best practice" in CORBA applications. CORBA did not have formal "session" and "entity" class definitions, but design-wise it amounted to the same thing. It is hard to imagine an EJB specification lead laughing at Marc for suggesting this. Why else would both kinds of bean have been included in the spec? Also, his claim to have coined the phrase "He who owns the transaction web, owns the web..." seems shaky. People had been saying things like that publicly ever since the Web emerged, though maybe not in those exact words. In any case, that idea was fairly mainstream by 1998.
  139. Nice article, but a few points

    1) Caching has an associated cost: RAM/swap space. If you want to carry the caching idea to its logical conclusion, you will load your cache on application boot and never access the DBMS. Why pay for Oracle in that situation? Why not just have a custom optimised binary format and load from a flat file? Isn't that just like having a DBMS on a flash disk? :-)

    2) Interceptors, whilst being a cool (not new) idea, are not J2EE. What's the good of EAR files if they are not compatible across app servers (apart from vendor-specific deployment descriptors, of course)?

    3) SOAP: SOAP is big and inefficient. Can we all stop repeating it? We all know the sky is blue :-)
    But, like anyone who has worked with a binary protocol and had to write their own diagnostic utils, I know I'd rather use XML Spy or Cape Clear to handle that sort of stuff. How easy is it to test a POP3 server using telnet? And why is that so? Answers on a postcard please :-)
  140. hehe, well:

    1, yes, I've heard of a couple of companies that keep their data entirely in RAM all the time (RAM is probably quite cheap these days...)

    2, sure, but why would interceptors have anything at all to do with portability of ear/ejb.jar/etc files???

    3, yep, if you don't need high performance - yay SOAP! But if you do...
    ...well "machine code or direct wire logic rulez" ;)
  141. Re 1): have a look at Prevayler (http://www.prevayler.org/). They seem to be doing exactly what you're describing (loading the whole DB into memory).
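The prevalence idea can be sketched in a few lines: the whole model lives in memory, every change is a command that is logged before it is applied, and recovery means replaying the log. This is a hypothetical illustration of the concept, not Prevayler's API; `Command`, `Deposit`, and `PrevalentSystem` are made-up names, and the journal here is an in-memory list where a real system would append to a file and take periodic snapshots.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A change to the in-memory model; must be deterministic so replay is exact. */
interface Command {
    void applyTo(Map<String, Long> balances);
}

final class Deposit implements Command {
    final String account;
    final long amount;

    Deposit(String account, long amount) {
        this.account = account;
        this.amount = amount;
    }

    @Override
    public void applyTo(Map<String, Long> balances) {
        balances.merge(account, amount, Long::sum);
    }
}

final class PrevalentSystem {
    private final Map<String, Long> balances = new HashMap<>();
    private final List<Command> journal; // stand-in for an append-only file

    PrevalentSystem(List<Command> journal) {
        this.journal = journal;
        // Recovery: rebuild the in-memory state by replaying every logged command
        for (Command c : journal) {
            c.applyTo(balances);
        }
    }

    /** Log first, then apply, so a crash after logging is repaired by replay. */
    void execute(Command c) {
        journal.add(c);
        c.applyTo(balances);
    }

    /** Reads never touch the log or any database. */
    long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }
}
```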
  142. I don't see anything original in Marc's article. It does sound a lot like the Sun marketing hype around EJB that we heard 4 years ago, and which many of us have learnt through bitter experience was naive.

    The argument that caching can only be accomplished using entity beans is flawed. JDO offers more control of caching than standard entity beans. And some O/R mapping products such as TopLink are still streets ahead of EJB CMP. JDBC is often the best approach--many applications can't benefit so much from caching.

    _Many_ projects have run into problems using entity beans, even with CMP 2.0. There are major problems with EJB QL, and CMP is still little use without proprietary extensions. In my new book, Expert One-on-One J2EE I discuss the arguments surrounding EJB and entity beans in detail.

    Marc's discussion of the interceptor-based architecture in JBoss isn't an argument for EJB--it's an argument for an AOP-ish approach that is potentially far more powerful than EJB. I'm coming to think that EJB is a transitional technology that will be seen in a few years time as the path towards an AOP approach to enterprise middleware.

    Dismissing web application architectures that don't use EJB as "retrograde" is absurd. This kind of argument has cost the industry hundreds of millions in unnecessarily complex solutions. Most of the power of J2EE isn't EJB-specific. Many web apps are unnecessarily complicated by using EJB without good reason--see chapter 1 of my book for more details.

    Btw, I like JBoss. EJBs have their place, but they aren't the be all and end all of J2EE, and just as well.

    Rod
  143. JBoss Founder Marc Fleury: Why I love EJBs

    I have to say Rod's book is one of the best J2EE books I have ever read: objective and insightful. I thought we had learned our lesson to take articles from "insiders" with a grain of salt. One wouldn't expect objectivity from Sun, BEA, or JBoss, right?
  144. <snip>
    Marc's discussion of the interceptor-based architecture in JBoss isn't an argument for EJB--it's an argument for an AOP-ish approach that is potentially far more powerful than EJB. I'm coming to think that EJB is a transitional technology that will be seen in a few years time as the path towards an AOP approach to enterprise middleware.
    </snip>

    Yes, Yes, Yes. You are one of the very few people posting on this site that understood the article.
  145. "and then we decorate the bleep out of it for transactionality etc" . . . a quote from a colleague of mine.

    Yep. JBoss isn't alone. The new BEA Workshop product will be doing the same, and WebSphere as well.

    Most of the emerging Web services standards are being implemented along the same interceptor principle described in the article.

    Someday it will get easier... till then I'm employed.
  146. Rod,

    This is so very true... CMP in its current state is nothing without the EJB-QL extensions (where they exist) and is still not appropriate for the majority of projects. CMP is not recommended for shared databases, nor for unshared databases without a cache layer. CMP is extremely restrictive as far as relational data is concerned. In short, CMP only really fits in the object database playground. In other circumstances it is only one alternative, which can be quickly dismissed in many cases in favour of JDBC or advanced O/R mapping tools.

    However, when object databases take off, EJB and JDO will get the lion's share. Until then...

                    Yann
  147. Rod,

    I just bought your book on Amazon by the way! :)

                    Yann
  148. It would be great to discuss the implications of caching with some real figures. This time it is not J2EE vs. .NET, it is our internal deal ;). Maybe Marc will give us something in the next paper? Otherwise it is too emotional (like, dislike, etc.).

    Dmitry
  149. I don't find Marc's article all that interesting. Here are my points:

    - EJB is a solution to a real problem; there's no need to tell the history of enterprise computing to get to that point. The big argument is that it could have been a better solution. Unfortunately, EJB ended up being a complicated technology. It doesn't work in the real world with an iterative/incremental approach to application development. It's overkill for small apps, and you almost always start from a smaller app that grows into a big app iteratively. Java's core works from JavaCard/mobiles/etc. up to desktops, servers and mainframes. EJBs should work the same way too. It's almost impossible to do test-driven development effectively with EJB. I can't start from an object serialization or Jisp-based persistence system just to make the tests pass and then, in the next iteration, switch to RDBs. It's such a non-transparent model that it impacts every method and class of your application.

    - Regarding caching: caching works wonders! 95% of your application overhead is database access, and whatever you can do to minimize it is worth the hassle. I don't understand why something along the lines of JBoss's commit option D is not in the spec. The best Christmas gift for me would be a JCache API :-) But even with caching, EJBs still carry too much overhead (the EJB finder methods argument, etc.).

    - Everyone here, IMHO, puts too much emphasis on tools to make EJB development easier. Many people just tell you "use XDoclet and develop EJBs happily!". Well, I've been XDoclet's lead for 2 years now and have had the chance to see people use our tool with various application servers, on different small/big projects/teams and in different scenarios. Read my thoughts about correct and incorrect uses of XDoclet at http://freeroller.net/page/ara_e/20021203#codegenerationisadesignsmell. We *have* to use such tools, and they make life much easier, but that doesn't mean the EJB architecture is correct.
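Commit option D, as JBoss describes it, caches entity state between transactions (like option A) but resynchronizes it with the database periodically. The same idea as a standalone class might look like this. It is a minimal sketch, assuming a pluggable loader and clock; `RefreshingCache` and its parameters are hypothetical, not JBoss configuration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Serves cached values, but reloads an entry once it is older than refreshMillis. */
final class RefreshingCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long loadedAt;

        Entry(T value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> loader; // e.g. a SELECT by primary key
    private final LongSupplier clock;    // injectable so tests control time
    private final long refreshMillis;

    RefreshingCache(Function<K, V> loader, LongSupplier clock, long refreshMillis) {
        this.loader = loader;
        this.clock = clock;
        this.refreshMillis = refreshMillis;
    }

    synchronized V get(K key) {
        long now = clock.getAsLong();
        Entry<V> e = entries.get(key);
        if (e == null || now - e.loadedAt >= refreshMillis) {
            // Missing or too old: go back to the database and restart the timer
            e = new Entry<>(loader.apply(key), now);
            entries.put(key, e);
        }
        return e.value;
    }
}
```

Outside a test, `System::currentTimeMillis` would be the natural clock. Between refreshes, reads may be stale by up to `refreshMillis`, which is the price of skipping the database.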

    Btw,

    Ara.
  151. I have enjoyed reading a lot of the postings in this thread. What strikes me is that quite a few of the posters seem to look at caching as an all or nothing proposition, either caching is bad or it's good.

    I believe the results of caching can be really good (even exceptional), but the problem is that caching is diametrically opposed to data integrity. There are quite a few situations where this does not matter: many applications have read-only data or can accept quite a large loss of data integrity. Then there are the 'J2EE pure-play' applications, where you have a dedicated database and a non-clustered EJB container, in which case object caching can be 'easily' achieved and be very effective.

    On the other hand there are quite a few situations where a certain level of data integrity is required, the solution is deployed in a clustered environment and has a shared database. All of a sudden things go from being relatively 'easy' to quite complex. In some of these situations, having a distributed object cache will not be a good solution. One reason is that nowhere in the EJB spec is it stated that a distributed caching scheme is required, so it might not even be provided by the container. Another is that even if it does exist, the cost of trying to synchronize the object states across many servers in the cluster becomes too expensive. Transaction concurrency is not easy, not even when everything is executing in the same address space, much less when you have the data duplicated across many nodes connected by potentially relatively frail network connections. I guess this is one of the reasons people pay big bucks for relational databases.

    To compound the issue, I believe the CMP spec (with all its other flaws, many pointed out by Marc in his paper) has many shortcomings as an O/R persistence mechanism, even in the 2.0 spec. Two big ones are:

    1. The spec does not provide any caching semantics. In many applications you want to use an object cache in most situations (the user is browsing data in screens where guaranteed freshness is not critical), but when a user performs certain application functions (editing an order, transferring money) it becomes absolutely crucial that the object cache is not used and that those operations take place directly against the underlying datastore, so as to benefit from the strong features of an RDBMS such as concurrency control. If the spec had recognized that selective caching is part of the real-world problem, and had contained semantics for saying explicitly in code 'Bypass any object cache here', the result would be a more flexible model in which the use of object caching can be dictated dynamically, rather than as an 'all-or-nothing' choice at deployment time, or perhaps not at all. Even if caching can be configured at deployment time, that still does not allow selective use of the cache to be decided at runtime. I know some people will say 'but the point of a cache is for it to be transparent; developers should not have to worry about it'. I believe the benefits of providing that type of control outweigh the downside of providing perhaps another way for developers to shoot themselves in the foot ;)


    2. The CMP spec does not concern itself with optimistic concurrency. This seems like a pretty serious omission, considering that many (most?) of the applications being developed using J2EE use HTTP as the invocation protocol between the client and the server. Optimistic concurrency is a necessity in a lot of situations; it has been since the client/server days and will be until the end of time or until true shared, clusterable object databases see the light of day, whichever happens first. Its absence from the spec is bad enough, but to make matters worse, the spec precludes you from adding your own optimistic locking scheme in clustered environments.
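Point 2 refers to the usual version-column scheme: read a row together with its version number, and let the update succeed only if the version is still the same. In SQL that is typically `UPDATE ... SET ..., version = version + 1 WHERE id = ? AND version = ?`, followed by a check that exactly one row changed. A minimal in-memory sketch of the same check, assuming a single store; `VersionedStore`, `Versioned`, and `StaleDataException` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Thrown when someone else updated the record after we read it. */
final class StaleDataException extends RuntimeException {
    StaleDataException(String message) {
        super(message);
    }
}

/** A value plus the version it was read at. */
final class Versioned {
    final String value;
    final int version;

    Versioned(String value, int version) {
        this.value = value;
        this.version = version;
    }
}

final class VersionedStore {
    private final Map<Long, Versioned> rows = new HashMap<>();

    synchronized void insert(long id, String value) {
        rows.put(id, new Versioned(value, 0));
    }

    synchronized Versioned read(long id) {
        return rows.get(id);
    }

    /**
     * Same check as UPDATE t SET value = ?, version = version + 1
     * WHERE id = ? AND version = ?, then "did exactly one row change?".
     */
    synchronized Versioned update(long id, String newValue, int expectedVersion) {
        Versioned current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            throw new StaleDataException("row " + id + " was modified or deleted");
        }
        Versioned next = new Versioned(newValue, expectedVersion + 1);
        rows.put(id, next);
        return next;
    }
}
```

No lock is held between the read and the update, which is what makes the scheme fit HTTP round trips: the second user to save simply gets a `StaleDataException` and must re-read.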

    I know that both of these issues have been addressed by product vendors. It would be nice if they had been recognized by the CMP EJB specification itself, and were thereby available in all containers.
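The explicit 'Bypass any object cache here' control from point 1 could be exposed as a per-call read policy. A minimal sketch; `ReadPolicy` and `SelectiveCache` are hypothetical, not part of the CMP spec or any vendor's API, and the loader function stands in for a direct database read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-call choice between cached and fresh reads. */
enum ReadPolicy { CACHED_OK, BYPASS_CACHE }

final class SelectiveCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> database; // stand-in for a direct SELECT

    SelectiveCache(Function<K, V> database) {
        this.database = database;
    }

    synchronized V find(K key, ReadPolicy policy) {
        if (policy == ReadPolicy.CACHED_OK) {
            V cached = cache.get(key);
            if (cached != null) {
                return cached; // possibly stale: fine for browsing screens
            }
        }
        V fresh = database.apply(key); // BYPASS_CACHE always reaches the store
        cache.put(key, fresh);         // later cached reads see the fresh value
        return fresh;
    }
}
```

Browsing screens would call `find(id, ReadPolicy.CACHED_OK)`, while an order edit or money transfer would call `find(id, ReadPolicy.BYPASS_CACHE)`, so the decision is made at runtime per operation rather than once at deployment.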

    Another general issue I have with the paper (and some posts) is that it takes such a black and white view of object relational persistence. It seems to say 'CMP is the only way to go for object relational persistence, ever'. There will however be situations where you find yourself having to do an eight-way join between de-normalized legacy database tables with 200000 rows in them. That just can not be effectively expressed through CMP.
    It all comes back to properly architecting a solution. Yes, I do want to be able to use caching where appropriate, but it is a means, not an end. In certain situations I want to avoid caching because it would result in an unacceptably low level of data integrity. CMP can be a good hammer in a lot of situations, but beware that not all object/relational needs are shaped like nails.

    For the record, I like Java, J2EE, JBoss and have nothing but respect for Marc Fleury.

    /Fredrik Sjodin
  152. I find the technical content in Marc Fleury's blue.pdf paper
    peripheral at best. Better to read the following PDF, which deals
    with the same topics, i.e.,

    1. Java Dynamic Proxy
    2. Interceptor chain design pattern
    3. Aspect Oriented Programming

    4. Use of all of the above in designing flexible software
       systems (one of which is object middleware like
       JBoss)

    The paper is:

    http://eden.dei.uc.pt/~nsantos/papers/nsantos02rmiproxy.pdf
    ==========================================================
    A Framework for Smart Proxies and Interceptors in RMI

    Nuno Santos, Paulo Marques and Luis Silva

    CISUC, University of Coimbra, Portugal
    {nsantos, pmarques, luis}@dei.uc.pt
    ==========================================================
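Items 1 and 2 of that list combine naturally: `java.lang.reflect.Proxy` can wrap a plain object behind its interface and route every call through an `InvocationHandler`, which is where an interceptor such as logging, security, or transaction demarcation would go. A minimal sketch; `AccountService`, `PlainAccountService`, and `Proxies.withCallLog` are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;

/** Hypothetical business interface. */
interface AccountService {
    long balance(String account);
}

/** Plain Java object with no middleware code in it. */
final class PlainAccountService implements AccountService {
    @Override
    public long balance(String account) {
        return account.length() * 100L; // dummy business logic
    }
}

final class Proxies {
    private Proxies() {}

    /** Returns a proxy that records each call, then delegates to target. */
    static <T> T withCallLog(T target, Class<T> iface, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add("enter " + method.getName()); // "before" advice
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            } finally {
                log.add("exit " + method.getName()); // "after" advice
            }
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}
```

Callers only see `AccountService`; swapping the logging handler for one that begins and commits a transaction would not change `PlainAccountService` at all.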

    Soumen Sarkar
  153. red.pdf 404

    Thanks, Marc, for the rare insight into JBoss from a higher level. I look forward to your next article, "red".
  154. I wonder when Sun will clear up our minds about when to use EJB, how it can fail, data caching, JDO, JDBC...


    My wish for Christmas: I would like to see someone with great experience in this field compile a book that is practical and clear, and that shows why one technology or pattern is the better choice.

    I would like to see EJB QL working as fully as ANSI SQL-92.
  155. http://www.precisejava.com/javaperf/j2ee/EJB.htm