Effective Enterprise Java: State Management

  1. Effective Enterprise Java: State Management (19 messages)

    Effective Enterprise Java talks about what is hard in building enterprise applications. One of those hard parts is state management, which is the topic of this free book chapter.

    In the chapter you will read about items such as:
    • Using HttpSessions sparingly
    • Using object-first persistence to preserve your domain model
    • Using relational-first persistence to expose the power of the relational model
    • Using procedural-first persistence to create an encapsulation layer
    • Recognising the object-hierarchical impedance mismatch
    • Using in-process or local storage to avoid the network
    • Never assuming you own the data or the database
    • Lazy-loading infrequently used data
    • Eagerly-loading frequently used data
    • Batching SQL work to avoid round-trips
    • Knowing your JDBC provider
    • Tuning your SQL
    Read more: Effective Enterprise Java

    Threaded Messages (19)

  2. Use HttpSessions sparingly

    The author gives a litany of reasons why HttpSessions are a real resource drain, but says it would be ludicrous not to use HttpSessions at all. I think he's wrong on the last point. It's not that hard to design webapps with semantically meaningful URLspaces that can completely describe the application state; it's much more useful in the long run, and it turns out to be more performant as well. Furthermore, maintaining webapp state in HttpSession variables can turn out to be a real maintenance nightmare.

    I use HttpSessions to store authentication credentials, that's about it.
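    The idea above can be sketched with plain JDK classes: every piece of view state round-trips through the query string, so any URL fully describes the page and no HttpSession is needed. This is a minimal illustration with hypothetical names (`UrlState`, `page`, `sort`), not code from the book or thread.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: carries application state in the URL's query string
// instead of an HttpSession, so every URL completely describes its page.
public final class UrlState {
    // Encode a state map as a query string, e.g. {page=2, sort=price} -> "page=2&sort=price"
    public static String encode(Map<String, String> state) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : state.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Decode a query string back into the state map (assumes key=value pairs).
    public static Map<String, String> decode(String query) {
        Map<String, String> state = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) return state;
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) continue; // skip malformed pairs in this sketch
            state.put(URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8),
                      URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
        }
        return state;
    }
}
```

    Because the full state survives the encode/decode round trip, bookmarks, the back button, and load balancing all work without server-side session storage.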
  3. Use HttpSessions sparingly

    The author gives a litany of reasons why HttpSessions are a real resource drain, but says it would be ludicrous not to use HttpSessions at all. I think he's wrong on the last point. It's not that hard to design webapps with semantically meaningful URLspaces that can completely describe the application state; it's much more useful in the long run, and it turns out to be more performant as well. Furthermore, maintaining webapp state in HttpSession variables can turn out to be a real maintenance nightmare. I use HttpSessions to store authentication credentials, that's about it.
    +1
  4. Use HttpSessions sparingly

    I'm not sure what you're criticizing, because you're basically repeating the author's argument: sessions should be avoided unless necessary, as in caching per-user authentication results.

    If you read the book in its entirety, you'll realize that the author specifically makes such an exception. From page 361:
    "(This [caching of JAAS Subject instance] is one case where we have to have some kind of per-user state, so we have to ignore the arguments made in Item 39 and store it in an HttpSession or equivalent)."
    A single chapter won't give you the full picture. It's just a sample.

    This is an excellent book; I'm going to write a comprehensive review of it soon on my blog.
  5. Use HttpSessions sparingly

    "(This [caching of JAAS Subject instance] is one case where we have to have some kind of per-user state, so we have to ignore the arguments made in Item 39 and store it in an HttpSession or equivalent)."
    We are doing this, but found some strange problems with it. Since sessions can be passivated, the credentials must be serializable, which is "ok". But Subject's credentials are transient, so when the session is activated and the Subject is deserialized, it will not be the same. That means you have to implement an HttpSessionActivationListener to extract the credentials and put them into the session before passivation, and reinsert them into the Subject upon activation.

    Works, but is a lot of hassle.
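    The failure mode described above can be reproduced with nothing but JDK serialization. `FakeSubject` below is a hypothetical stand-in for `javax.security.auth.Subject`, whose credential sets are likewise transient; the stash/restore steps mirror what `sessionWillPassivate` and `sessionDidActivate` would do in the listener.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stand-in for javax.security.auth.Subject: the credential field is
// transient, so it vanishes when the session is passivated (serialized).
class FakeSubject implements Serializable {
    String name;
    transient String credential; // lost on serialization, like Subject's credentials
    FakeSubject(String name, String credential) { this.name = name; this.credential = credential; }
}

public final class PassivationDemo {
    // Serialize and deserialize, as the container does on passivate/activate.
    static FakeSubject roundTrip(FakeSubject s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(s);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (FakeSubject) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

    After the round trip the transient credential is null, which is why the listener has to stash it as a plain session attribute before passivation and push it back into the Subject after activation.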
  6. Use HttpSessions sparingly

    I'm not sure what you're criticizing, because you're basically repeating the author's argument: sessions should be avoided unless necessary, as in caching per-user authentication results.
    I'm merely criticizing the bit in this sample chapter where he says it would be ludicrous to advocate not using HttpSessions. I'm glad the book expands on this theme; indeed, I'm glad someone's willing to tackle this issue in print.

    Just out of curiosity, how do y'all document your semantically meaningful URLspaces?
  7. Item #40 leaves me thinking that the objects-first method as he explains it is doomed to failure. You can't ignore persistence issues... so no pretty, pure OO is possible.
  8. Item #40 leaves me thinking that the objects-first method as he explains it is doomed to failure. You can't ignore persistence issues... so no pretty, pure OO is possible.
    As with all things, tradeoffs are what's at the heart of the question. Are you willing to trade off a certain amount of performance and/or scalability in order to see a pure (or as pure as possible) object model? Are you willing to sacrifice the relational model in order to support the object model? When the answer to these two questions is "yes", then an O/R mapping layer will usually work quite well for you. When the answer is "no" or a qualified "it depends", then things are going to get sticky.

    In all situations, it's a question of the context in which you find yourself, and the consequences you're willing to live with. Scott Meyers tells a story of teaching "Effective C++" to a defense contractor, and when he gets to the item entitled "If you write operator new, write operator delete" (in other words, if you do custom memory allocation, you must do custom memory deallocation), a guy in the room tells him, "Scott, you really don't need to do this one."

    Scott insists, "No, you really need to worry about this. See, if you use the standard deallocator to clean up your objects...."

    The guy interrupts, "Scott, we write the guidance control programs for the Sidewinder missile--if our program runs to completion, the computer is destroyed."

    As with all things, you must temper the advice in EEJ with your project's needs and desires. :-)
  9. What can I say, the author is just soooo right. The points he makes are not particularly new, but somehow most developers don't know them - witness the endless persistence and SFSB good/bad debates on this site. Even better, he gets straight to the point, without any hand-waving. A refreshing difference from other J2EE books (which are better left unnamed).

    There are of course some minor mistakes. For instance, in Item 41 he botched the description of the relational model. If "each relation is a row" (quote), then what is a tuple? Of course, a relation is a table, and a tuple is a row. (Well, not quite "is", as SQL tables are bags, aka multisets, and vary with time, whereas mathematical relations are immutable.) But a reference to "An Introduction to Database Systems" is given, so enquiring minds can straighten out the facts. Also, the rest of the text does not suffer from this mistake.

    Disclaimer: so far I have thoroughly read just the first four items of Chapter 5 and skimmed the rest.
  10. As usual I started reading from the end ;)
    SELECT * FROM Table WHERE column1='A' AND column2='B'
    SELECT * FROM Table WHERE column2='B' AND column1='A'
    Which would you say executes faster?
    It is not explained why the second query may be faster if "the likelihood of column2 being B is a lot less likely". If it is assumed that everyone is smart enough to recall short-circuit boolean expression evaluation, then half of the book's content could be removed, because there are a lot of other seemingly obvious things.
    Or, as another example, how about these two statements?
    SELECT * FROM Table WHERE column1=5 AND
    NOT (column3=7 OR column1=column2)
    SELECT * FROM Table WHERE column1=5 AND column3<>7 AND
    column2<>5
    Answer: For five of eight popular databases, the second turns out to be faster.
    Again, it is not explained why the first statement may be slower. A reader can only guess about self-joins and poor optimizer implementations.

    I understand that SQL optimization is a completely separate topic, but its scope and complexity should at least be outlined, so Java developers would lose their funny ideas about how simple SQL is ;)
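    For readers who don't recall it, here is the short-circuit evaluation the post refers to, as a tiny Java aside (names are hypothetical): in `a && b`, the right operand never runs when the left one is already false. A SQL optimizer, by contrast, makes no such left-to-right promise, which is why predicate order alone doesn't explain the book's timings.

```java
// Demonstrates Java's short-circuit && : the right-hand predicate is
// evaluated only when the left-hand one is true.
public final class ShortCircuit {
    static int expensiveCalls = 0;

    // Stands in for the costlier column test in the WHERE clause.
    static boolean expensivePredicate() {
        expensiveCalls++;
        return true;
    }

    static boolean matches(boolean cheapPredicate) {
        return cheapPredicate && expensivePredicate(); // skipped when cheapPredicate is false
    }
}
```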
  11. Any cost-based query optimizer that uses short-circuit boolean evaluation based on the position of the operands, without checking index selectivities and column positions in a composite index, is braindead. For example, if you have a composite index on (column1, column2), the query optimizer should evaluate the sarg on column1 first (because it's the ordered, leading column) regardless of whether it's to the right or left of the AND operator.

    You can't optimize a query based solely on its syntax. You have to check your indexes and tables first.
  12. Frightfully hard?

    "As it turns out, by the way, pinning HTTP requests against the same
    machine turns out to be frightfully hard to do."

    If you have a hardware load balancer, it is likely that it can pin sessions to a particular server based on a cookie. Each server needs to write a cookie with a value unique to that server in the farm. For example, you configure the load balancer to look for a cookie called "HOST". If the value is equal to www1, it sends the request to host www1 in the farm. If there is no HOST cookie, the load balancer sends the request wherever it wants (usually based on some algorithm like least connections). As a developer, you just have to be sure that the client is accepting cookies and that you don't put anything in the HttpSession until you are sure the user has a HOST cookie.
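    The pinning rule described above boils down to a few lines. This is a hypothetical sketch, not any real balancer's API; the HOST cookie and www1/www2 names follow the post's example, and the fallback is least-connections.

```java
import java.util.Map;

// Sketch of a cookie-pinning decision: honor a HOST cookie when it names a
// live server, otherwise fall back to the least-loaded server.
public final class CookiePinning {
    // cookies: the request's cookie name->value map
    // connectionsPerServer: current connection count per live server
    static String pickServer(Map<String, String> cookies,
                             Map<String, Integer> connectionsPerServer) {
        String pinned = cookies.get("HOST");
        if (pinned != null && connectionsPerServer.containsKey(pinned)) {
            return pinned; // sticky: back to the server that wrote the cookie
        }
        // No (valid) HOST cookie: pick the server with the fewest connections.
        String best = null;
        for (Map.Entry<String, Integer> e : connectionsPerServer.entrySet()) {
            if (best == null || e.getValue() < connectionsPerServer.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }
}
```

    Note the second branch also covers a stale cookie naming a server that has left the farm, which is exactly the failover case discussed below.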
  13. Frightfully hard?

    These are called "sticky sessions", and they're supported, for example, by the WebLogic proxy plug-in, where all session-aware requests are routed to the same server that created the session. If that server fails, requests are redirected to a secondary server in the cluster, where the sessions are replicated.

    There are other controversial bits scattered around. For example, on page 113, it says:
    So one cheap way to get a certain amount of clustering is to set up the application servers on two or more machines, then set up UDP/IP listeners on each. When it comes time for the middleware layer to find a server to execute some processing, issue the UDP/IP broadcast and take the first server that responds. Taking the first one to respond also provides a certain amount of load balancing, since a server that's being hammered will most likely take longer to respond anyway.
    This is not good :) This setup will create huge broadcast storms on the subnet for any enterprise-grade cluster and number of requests. Usually hardware load balancers, like f5, ping a JSP page, for example, on each member of the cluster on a regular basis to measure their heartbeat and determine which server has the least load. Broadcast for one-time or infrequent discovery/election is OK, but not for regular requests.

    Honestly, these glitches are very few. The majority of the content is first-grade, IMO.
  14. Frightfully hard?

    The problem comes when you have dynamic hosts for things like failover. DNS isn't written in stone just because it happens at a lower level than Servlets.
  15. Frightfully hard?

    Unless you turn off caching on your client DNS resolver, you're getting sticky sessions for free, because once the DNS server sends your resolver a given IP, it will be reused for the next 48 hours or whatever the TTL happens to be.
  16. Ted writes exactly as he talks, so it's funny reading this and hearing his voice read it aloud in my head, as if he were there reading it to me himself .. and I've heard him talk about some of these things using the same words, so it's like deja vu all over again ;-). I have a whole bunch of comments on this chapter, not surprisingly since Java state management for clusters is what Coherence is designed for. However, I thought I'd just start with the HTTP session part, and there are two main comments that I had:

    1. Look at the article on TSS labeled Monitoring Session Replication in J2EE Clusters. The points in this chapter are similar: Keep the HTTP sessions small if you have to use them at all, because the app servers can't manage them (many sessions, large sessions, or combination thereof) efficiently in a distributed environment. Ted is correct on several points, including that sessions should only be used if they are necessary, but isn't that the same as for any feature? On the other hand, there are very scalable solutions for distributed HTTP session support (sticky or not) including Coherence*Web, which has no problem transparently managing millions of concurrent sessions across hundreds of servers.

    2. The use of a filter to implement custom sessions is a neat trick, but it doesn't work for a lot of app servers :-(. We had a precursor to Coherence*Web that did just that (and there are several other companies that tried to do the same thing) but the app servers will often cast the session object to its own internal class. There are two possible results: (1) if you let the app server see its own request/response object, it may create a second session and try to manage that too .. I've seen this on Weblogic for example, and the side-effects cause all sorts of application bugs, and (2) if the app server does get a copy of your session object, it will blow up with class cast exceptions. What does this impact? Well, if you're using declarative security, it impacts that directly .. it just won't work with most app servers. This is one of the complexities (seamless app server support) that took our HTTP session management module from 1,200 lines up to 30,000 lines. Here's how the custom request (substituted by the filter) would create the session:
    /**
    * Returns the requested HttpSession.
    *
    * @param fCreate <code>true</code> to create a new session for this request if
    * necessary; <code>false</code> to return null if there's
    * no current session
    *
    * @return the requested HttpSession
    */
    public HttpSession getSession(boolean fCreate)
        {
        // get id from cookie
        String sId = m_sSessionId;
        boolean fNew = false;

        // check whether the session has been invalidated
        if (sId != null && !getSessionCatalog().containsKey(sId))
            {
            sId = null;
            }

        if (sId == null)
            {
            if (fCreate)
                {
                m_sSessionId = sId = generateSessionId();
                preserveSession(m_response);
                fNew = true;
                }
            else
                {
                return null;
                }
            }
        CoherenceSession session = instantiateSession(sId);
        session.setNew(fNew);

        NamedCache catalog = getSessionCatalog();
        long lTime = System.currentTimeMillis();

        //do the new session operations
        if (fNew)
            {
            catalog.put(sId, null);
            session.setCreationTime(lTime);
            }

        session.setLastAccessedTime(lTime);
        session.setMaxInactiveInterval(m_iMaxInActive);
        
        return session;
        }
    For specific app servers, you can plug in ways to do more custom session management, but it's not portable. For example, in Tomcat, you can implement a session manager, and then you get called to create a session:
    /**
    * Construct and return a new session object, based on the default
    * settings specified by this Manager's properties. The session
    * id will be assigned by this method, and available via the getId()
    * method of the returned session. If a new session cannot be created
    * for any reason, return <code>null</code>.
    *
    * @exception IllegalStateException if a new session cannot be
    * instantiated for any reason
    */
    public Session createSession()
        {
        NamedCache catalog = getSessionCatalog();
        Cluster cluster = catalog.getCacheService().getCluster();
        int nThread = Thread.currentThread().hashCode();
        int nMember = cluster.getLocalMember().getId();
        String sDomain = toHexString((nMember << 24) | ((nThread & 0x0FFFFFF0) >>> 4), 8);

        // find a unique ID; since we use the 1 LSD of the member id & the
        // 3 LSD (assuming 16 byte paragraph boundary) of the thread hashcode
        // as part of the ID, and the 4 LSD of the time (in cluster time) as
        // the other part, the ID should already be unique, but we will
        // verify that just in case
        String sId = null;
        while (sId == null)
            {
            sId = sDomain + toHexString((int) getClusterTime(), 8);
            if (catalog.containsKey(sId))
                {
                // need to give up the CPU to let a little time pass so the
                // next time through the ID will be unique
                Thread.yield();
                sId = null;
                }
            }

        Session session = instantiateSession(sId);
        session.setNew(true);
        add(session);
        session.setCreationTime(getClusterTime());
        return session;
        }
    It's also worth pointing out that if you are creating these sessions, you're also going to be responsible for distributing them, invalidating them, timing them out, etc. In the end, you are writing a significant chunk of an application server, just to achieve what (IMHO) it should have done out-of-the-box.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  17. Customer Experience Not Important

    From the sample chapter, page 225 (emphasis mine)...

    Transient state is data that the enterprise cares little about—in the event of a crash, nothing truly crucial has been lost, so no tears will be shed. The classic example of transient state is the e-commerce shopping cart. Granted, we don’t ever want the system to crash, but let’s put the objectivity glasses on for a moment: if the server crashes in the middle of a customer’s shopping experience, losing the contents of the shopping cart, nothing is really lost (except for the time the customer spent building it up). Yes, the customer will be annoyed, but there are no implications to the business beyond that.

    This seems shortsighted. The customer is the reason one has written that fancy e-commerce shopping cart in the first place. The customer experience is of utmost importance. I don't give e-commerce websites a second chance after failure: if the shopping cart crashes inexplicably, can I really trust the site with my credit card info? The loss of a new customer is a rather big implication to the business.

    best,
    assmund
  18. Customer Experience Not Important

    Agreed. The shopping cart is important and should be recoverable after a crash. This is why most e-commerce sites bind the basket to a long-lived cookie and store it in non-volatile storage. Some sites, like Amazon, bind the basket to a log-in cookie, so you get the same basket if you log in from any computer.
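    The durable-cart pattern above amounts to this: the browser keeps only a long-lived cart-id cookie, and the items live in non-volatile storage keyed by that id. A minimal sketch with hypothetical names (`CartStore`, with a Map standing in for the database):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the durable-cart pattern: cart contents are keyed by the value
// of a long-lived cart-id cookie. A Map stands in for the database here,
// so a real implementation would survive server crashes.
public final class CartStore {
    private final Map<String, List<String>> db = new HashMap<>(); // cartId -> items

    // Look up the cart named by the cookie, creating it on first visit.
    public List<String> cartFor(String cartIdCookie) {
        return db.computeIfAbsent(cartIdCookie, id -> new ArrayList<>());
    }
}
```

    Because neither the cookie (in the browser) nor the items (in the database) live in server memory, a crashed server or a failed-over request finds the same basket waiting.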
  19. Ted has tried to address all the points raised in this thread here:
    http://www.neward.net/ted/weblog/index.jsp?date=20041008#1097222655376
  20. I'm surprised nobody's flaming about the author's skepticism regarding many developers' ideal of eliminating SQL from their Java code. I actually agree with him, but I was expecting to see many comments to the contrary.