iCommerce Design Issues and Solutions
How to Build a Really Big Distributed J2EE System Using Tools You Have Around the Office
by: The Advanced Application Architecture Team (A3T)
GemStone Professional Services White Paper Series
"Your mission, should you decide to accept it..."
Your company has been sailing along on its current course and speed for quite some time. Suddenly, the seas change: the Internet vortex looms on the horizon. Your traditional way of doing business is now threatened. Your company desperately needs "Web presence." This desperation is quickly translated into a need for complex, scalable e-commerce applications. There is talk of tens of thousands of Web hits per hour, thousands of concurrent users, on-line catalogs, shopping carts, e-mail, dynamic content, cookies (not the chocolate chip kind).... Persons high on the technonerdity scale suddenly appear in the corporate hallways. (Boggle factor cuts in here and the mind implodes.)
Question: How do you take all the technobabble and all the business needs and actually build the e-commerce system? That is the subject of this paper. Whether you're a manager, designer, or down-in-the-bits developer, we hope this paper will help you understand the issues involved in building large, scalable Java 2 Enterprise Edition (J2EE) e-commerce systems, and that it will familiarize you with some design patterns to address those issues. We'll start at a high level and spiral down, so in any section, just read for as much detail as you need. The idea is to jumpstart your system planning and design and help you avoid several traps we've learned about through experience. Read on, and hopefully you'll find some answers to questions you don't yet know you have.
Who Are We, Anyway?
GemStone has been in the distributed object world for well over a decade, first with Smalltalk and now with Java. This document and FoodSmart, the archetype J2EE application on the "Developer's Guide" CD[1], are brought to you by GemStone's Advanced Application Architecture Team (A3T), a group of dedicated folks with rich experience in the design of distributed object systems. In designing FoodSmart, we took our collective experience with distributed systems of all kinds and "translated" it into appropriate Java paradigms and patterns, discovering some new patterns along the way. The ideas in this paper are a stake in the ground. They are offered to the technical community as a basis for design and discussion, in the belief that they represent a good starting point in the development of large-scale, distributed e-commerce J2EE systems.
What Makes Large-scale Internet Commerce Systems Complex?
We all love the idea of big distributed systems, right? They're what "enterprise" computing is all about. They provide lots of different people a consistent view of the business at the same time. They help us collaborate, streamline processes, grow our business with fewer resources. And, joy of joys, they open up new business opportunities, the most notable of which is the possibility of making money in new ways over the Internet.
So we just quickly produce a nice big distributed system and we're done, right? Wrong. There's an ugly truth out there: performant, large-scale systems are complex to design and implement. This was true in the mainframe days of on-line transaction processing (OLTP), and it's still true in the brave new world of e-commerce and distributed object systems. Even with the power of J2EE, large-scale, transactional Web-based systems present complex technical issues that must be addressed carefully.
- Modern e-commerce sites generate many of their pages dynamically, using content assembled from a variety of sources. For example, a retail site page might present product descriptions and images from one or more databases, pricing calculated by business applications, and availability and shipping information from back-end business systems. In J2EE, the dynamic content is generated by Java servlets or Java Server Pages (JSPs), so a servlet or JSP must be initiated when a client hit arrives at the Web server (see Figure 1).
- To generate its dynamic content, the servlet must either access data directly out of a database or delegate these responsibilities to an application component. Invoking a component incurs the overhead of component creation.
- Pulling data out of the database involves querying, transporting the result set into the object world across a JDBC interface and doing the O/R (object to relational) mapping necessary to populate the needed objects.
- Depending on the application, it might be necessary to retrieve data from one or more legacy systems - each represented by a separate interface requiring more component creation.
- The hit might result in the need to commit data into the database. If several backend data sources are involved, this would incur the extra overhead of a distributed two-phase commit.
- Finally, methods in the objects are exercised to produce the HTML stream in the servlet that must then be routed back through the Web server and across the network to the client.
Figure 1. Servicing a Web Hit: Bird's Eye View
As you can see, servicing a Web hit is a lot of work. A large system incurs this cost for each Web hit that requires dynamic content, so if the system is required to support large numbers of users, scalability can become an enormous challenge. Figure 2 shows how the complexity of an e-commerce system increases with application complexity and scalability requirements.
Figure 2. Large Scale == Complex
To be scalable, systems must constantly juggle the resources available to service clients (CPU, memory, database connections, network bandwidth, etc.). The more clients, the more difficult this juggling becomes. To handle very large numbers of clients, resources must be shared, but sharing resources adds complexity to the application.
- To speed data access, systems must multiplex database connections and cache frequently used data for sharing among users.
- To speed processing, systems must carefully partition computational responsibilities between clients and servers to maximize CPU potential.
- To distribute processing responsibilities, systems must span multiple Java virtual machines (VMs) and multiple physical machines.
- All of the above design factors introduce issues of synchronization and concurrency, object distribution and data integrity.
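To make the caching point concrete, here is a minimal sketch of a cache shared by all request threads in one VM. It assumes a size-bounded least-recently-used eviction policy, which is our choice for illustration rather than anything the paper prescribes; the class name SharedLruCache is hypothetical. It relies on the access-order mode of java.util.LinkedHashMap and its removeEldestEntry hook.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical shared cache: many users reuse recently loaded data instead of
// each Web hit going back to the database.
class SharedLruCache<K, V> {
    private final Map<K, V> entries;
    private int misses = 0;

    SharedLruCache(final int capacity) {
        // accessOrder = true: iteration order runs from least to most recently accessed
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once capacity is exceeded
                return size() > capacity;
            }
        };
    }

    // Returns the cached value, or loads it (for example, from the database) on a miss.
    // synchronized because the cache is shared among concurrent request threads.
    synchronized V get(K key, Function<K, V> loader) {
        V value = entries.get(key);
        if (value == null) {
            misses++;
            value = loader.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    synchronized int misses() { return misses; }

    synchronized int size() { return entries.size(); }
}
```

A production cache would also need an invalidation strategy so that users never see stale data after a commit; that concern ties directly into the synchronization and data-integrity issues listed above.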
Somehow, all these distributed resources must be coordinated to play together nicely. To build a large-scale system, you really have to understand where the performance hot spots are and how to address them. This is what makes it so challenging to build scalable distributed systems, and why a good design and a robust technology platform are critical.
Why J2EE?
Why use J2EE and its Enterprise JavaBeans (EJB) components at all? Good question. We've chosen this model because J2EE is intended to be a complete platform for Web-enabled, multi-tier, secure, transactional Java applications. The goals of J2EE include better quality, maintainability, and portability of systems and increased productivity and economic return for businesses.
J2EE is based on the component usage model. It provides complete services for components and automatic handling of some application behaviors (such as declarative transactions). The promise of the J2EE standard is that third-party vendors will be able to market quality components that businesses can buy and use to build systems faster and more cost effectively than if they had to build their own infrastructure.
In this paper, we are concerned with four major J2EE component types:
- Session beans (a type of EJB)
- Entity beans (another type of EJB)
- Java Server Pages (JSPs)
- JSP beans
Session beans are typically used to model business processes or tasks. A session bean might, for example, model a set of e-mailing services or credit card validation services. Entity beans more often model business objects in the domain. An entity bean might represent a bank account, a customer, a piece of inventory, etc. Entity beans provide a set of methods that allow the state of the business objects to be managed throughout the bean's lifecycle.
A JSP is essentially an HTML page with embedded JSP elements: directives, fragments of Java code, and special action tags. One of these tags, <jsp:useBean>, lets the page create or locate a JSP bean and invoke it. JSP beans are used to generate dynamic content that is returned to the Java Server Page to be included in the stream of HTML that the JSP ultimately sends back to a browser.
The roles of all these components in a J2EE application will become clear as we explore the design issues in this paper.
The Anatomy of a J2EE Web Hit
Figure 3 is a high-level architectural view of an e-commerce application built with the GemStone/J application server with J2EE functionality. This is one architecture recommended by the Architecture team (there are others). Note the use of JSPs, JSP beans, and EJBs.
Figure 3. A J2EE Application Architecture
We'll use this architecture to track the progress of a Web hit through the system in more detail and understand what resources are used and/or consumed in the process.
Let's assume a customer is at an on-line bookstore. A Web hit begins in the browser. In our scenario, the customer has found a book she wants to buy, so she clicks on a button to put it in her shopping cart. This, of course, results in an HTTP hit. Now the fun starts....
- When the customer clicks on the "add to shopping cart" button, the browser creates a target URL, appends any needed parameters (in this case, an identifier for the book), then sends the packet over the network wrapped in the HTTP protocol.
- When the packet arrives at the Web server, it is unwrapped and examined. The server recognizes it to be a request for a JSP, so it passes this request and its parameters on to the servlet engine for processing.
- The servlet engine finds the relevant JSP (compiling it into a servlet if necessary) and runs it on a thread inside its Java VM.
- As the JSP executes, it creates a JSP bean and then delegates the request to it.
- The JSP bean, in turn, uses the home interface of an EJB to obtain a session bean (perhaps a ShoppingCartManager bean). It then invokes a business method (putInCart) on the session bean.
- The session bean interacts with an RDBMS to obtain the state information for business objects of interest (the customer's shopping cart), instantiates them, populates their state (or, alternatively, they might be pulled out of an object cache) and invokes relevant business methods on these objects (addToCart, updateOrder, etc.).
- Once the business methods complete, the flow of information reverses. The shopping cart returns a business object or objects (the current cart contents) back to the session bean, ShoppingCartManager.
- The session bean either returns this directly to the JSP bean or re-maps the information into a more suitable form before returning it.
- The JSP bean takes the object(s) and returns them or some state information derived from them to the JSP.
- The JSP incorporates this information into the HTML stream that it is generating.
- Ultimately, the generated HTML is streamed back over the network to the browser, which then renders the result on the screen (the shopping cart contents).
Figure 4. Servicing a Web Hit: J2EE View
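The layered flow above can be sketched in plain Java, with ordinary objects standing in for the container-managed pieces. ShoppingCartManager, putInCart, and addToCart come from the scenario; CartJspBean, ShoppingCart, CartContents, and all method bodies are hypothetical. In a real deployment the JSP bean would obtain the session bean through its EJB home (looked up via JNDI) rather than receiving it in a constructor, and the manager would load each customer's cart from the RDBMS or a cache rather than holding a single cart.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Domain layer: the business object that enforces cart rules.
class ShoppingCart {
    private final List<String> bookIds = new ArrayList<>();

    void addToCart(String bookId) {
        if (bookId == null || bookId.isEmpty()) {
            throw new IllegalArgumentException("book id required");
        }
        bookIds.add(bookId);
    }

    List<String> contents() { return Collections.unmodifiableList(bookIds); }
}

// Distributable snapshot returned to the Web tier instead of the domain object.
class CartContents implements java.io.Serializable {
    final List<String> bookIds;
    CartContents(List<String> bookIds) { this.bookIds = new ArrayList<>(bookIds); }
}

// Services layer: stands in for the ShoppingCartManager session bean.
class ShoppingCartManager {
    private final ShoppingCart cart = new ShoppingCart(); // simplification: one cart

    CartContents putInCart(String bookId) {
        cart.addToCart(bookId);                      // business method on the domain object
        return new CartContents(cart.contents());    // re-map into a transportable form
    }
}

// Application layer: stands in for the JSP bean the page delegates to.
class CartJspBean {
    private final ShoppingCartManager manager;

    CartJspBean(ShoppingCartManager manager) { this.manager = manager; }

    // Returns the HTML fragment the JSP would splice into its output stream
    // (HTML escaping omitted for brevity).
    String addAndRender(String bookId) {
        CartContents contents = manager.putInCart(bookId);
        StringBuilder html = new StringBuilder("<ul>");
        for (String id : contents.bookIds) {
            html.append("<li>").append(id).append("</li>");
        }
        return html.append("</ul>").toString();
    }
}
```

Each class talks only to the layer directly below it, which mirrors the hand-offs in the numbered steps.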
As you can see in Figure 4, a J2EE Web hit has many functional layers, many moving parts that must correctly interact. Each layer must be designed to implement a specific set of responsibilities and have a clearly defined API. Within the layers, the designer must partition these responsibilities, delegate them to relevant objects, and coordinate resource and data usage to ensure scalability and data integrity. To build a scalable system, you need a distributed object architecture that is designed for performance.
Distributed Object Systems Issues
In the section above, we've explored the software gymnastics of a Web hit. In this section, we'll identify the major design issues that arise when you sit down to implement the gymnastics. Our discussion is organized around the following topics:
- Architectural approach
- Transactional model
- Object state distribution
- Object identity
- Scaling techniques
- Object-to-relational mapping
Of course, there are many other issues (you're probably thinking of a few right now), and all design issues are important. However, we believe that the issues addressed here have the most impact on total system viability.
Architectural Approach
Software architecture is the bones of a system. It gives the system shape and ultimately constrains it in many dimensions. A good architecture makes a system scalable, extensible, and maintainable.
Why Layered Architecture?
With all that's been written on software architecture in recent years, one principle that seems to be generally accepted is the concept of layered architectures, the separation of system responsibilities into functional layers, each with its own responsibilities and its own API. Layering achieves system flexibility in three ways:
- Encapsulation: Each layer can hide details about its operations from other layers. Thus the layer can evolve as needed behind a fixed API without affecting its clients.
- Separation of concerns: Complexity in the system is easier to manage because each layer is focused on a cohesive set of responsibilities.
- Reuse: Adding additional functionality is faster, because each layer can provide services to objects in the layer above it. Furthermore, classes in a given layer can inherit reusable behavior from a superclass, thus abstracting the responsibilities of classes of that layer.
Layered architecture leads to a more flexible and maintainable system. Layers may be quite thin and have very little impact on system performance. Layers can be changed with no effect on other layers, as long as the API remains constant. If you've designed well, you could easily swap out an entire layer to integrate a new data source or take advantage of a new technology.
J2EE Layering
We believe a layered architecture is a good thing. Now, what is the right layer stack for a J2EE e-commerce application? Figure 5 represents our answer to this question. We recommend that the stack consist of five layers: presentation, application, services, domain, and persistence. These layers are physically split across the client and the server, and they are logically partitioned into the J2EE Web container, EJB container, and the database. The responsibilities of each layer are briefly summarized below.
Figure 5. J2EE Layering
The application layer mediates the interaction between the presentation and services layers. Services and domain objects in the lower layers may be shared among multiple applications. This layer calls services to implement the behavior of each individual application. Its primary responsibilities are to adapt the distributed representation of the domain to the user interface, to maintain conversational state for the presentation layer, and to handle exceptions that occur during service invocation and that need to be presented to the user.
The services layer provides an API to the business use cases and utility operations required by the application. The services manipulate the domain objects and store and retrieve data, as appropriate, for the application. Additionally, the services layer is responsible for converting objects into their distributable representations. (See the discussion of service-based architecture below for more detail on this.)
The domain layer models the abstractions in the application's problem domain (for example, in an order entry system we have Orders, Products, Vendors, etc.). Business rules and semantics are embedded in the domain objects in this layer. This layer is responsible for the enforcement of business rules and process; therefore, semantic validation of new information takes place here.
The persistence layer provides the mechanisms necessary to permanently save object state. It provides basic CRUD (create, read, update, delete) services and also deals with the object-to-relational mapping issues. If persistence mechanisms other than relational databases are a possibility, then very simple high-performance alternatives may be considered, e.g., GemStone/J's persistent cache.
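As a minimal sketch of the domain and persistence layers sitting behind fixed APIs, consider the following. The names Order, OrderStore, and InMemoryOrderStore are hypothetical. The in-memory store stands in for a JDBC implementation that would do the O/R mapping; under the layering described above, such an implementation could be swapped in behind the same OrderStore interface without touching the layers above it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Domain layer: business rules live in the domain object itself.
final class Order {
    final long id;
    private int quantity;

    Order(long id, int quantity) {
        this.id = id;
        setQuantity(quantity);
    }

    int quantity() { return quantity; }

    // Semantic validation happens here, not in the persistence layer.
    void setQuantity(int quantity) {
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive");
        }
        this.quantity = quantity;
    }
}

// Persistence layer API: basic CRUD services.
interface OrderStore {
    void create(Order order);
    Optional<Order> read(long id);
    void update(Order order);
    void delete(long id);
}

// One interchangeable implementation; a relational one would map rows to objects here.
class InMemoryOrderStore implements OrderStore {
    private final Map<Long, Integer> rows = new HashMap<>(); // id -> quantity "column"

    public void create(Order order) {
        if (rows.putIfAbsent(order.id, order.quantity()) != null) {
            throw new IllegalStateException("duplicate id " + order.id);
        }
    }

    public Optional<Order> read(long id) {
        Integer qty = rows.get(id);
        return qty == null ? Optional.empty() : Optional.of(new Order(id, qty));
    }

    public void update(Order order) {
        if (rows.replace(order.id, order.quantity()) == null) {
            throw new IllegalStateException("no order " + order.id);
        }
    }

    public void delete(long id) { rows.remove(id); }
}
```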
Service-Based Architecture
One key advantage to this layering model is that it enables the creation of a service-based architecture. In a service-based architecture, groups of operations or behaviors are clustered together in the services layer under an API called a service object (an EJB session bean). Each service bean provides a suite of methods whose semantics are designed around a single "theme". For example, consider a financial application that must deal with major domain subjects such as accounts, customers, etc. As indicated in Figure 6, this application might use services such as AccountManagementService or CustomerManagementService. The theme of each service may be one of the major abstractions in the domain model: in this case, accounts or customers.
However, not all service objects take the lifecycle of a domain object as their theme. Another service object might encapsulate an external interface to a legacy system or an external utility (such as mail or messaging), or provide an essential singleton service, such as creating a timestamp or an object ID. Still another might implement a cluster of use cases. For services whose theme is a domain object, the service provides methods that permit an application to manage the complete lifecycle of the domain object (for example, createNewCustomer, deleteCustomer, modifyCustomer, findCustomer, etc.).
Figure 6. Service-based Architecture
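A service whose theme is a domain object might expose an API along the following lines. CustomerManagementService and the method names createNewCustomer, findCustomer, modifyCustomer, and deleteCustomer come from the text; the Customer fields, the id scheme, and the in-memory map are assumptions made so that the sketch runs on its own. In an EJB deployment these methods would be the business interface of a session bean.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical domain object; it never leaves the service layer.
class Customer {
    final long id;
    final String name;
    String email;

    Customer(long id, String name, String email) {
        this.id = id;
        this.name = name;
        this.email = email;
    }
}

// Service whose theme is the complete Customer lifecycle.
class CustomerManagementService {
    private final Map<Long, Customer> customers = new HashMap<>();
    private long nextId = 1; // stand-in for an object-ID service

    synchronized long createNewCustomer(String name, String email) {
        long id = nextId++;
        customers.put(id, new Customer(id, name, email));
        return id; // callers receive an identifier, not the domain object
    }

    // Returns a display string rather than the Customer itself.
    synchronized Optional<String> findCustomer(long id) {
        Customer c = customers.get(id);
        return c == null ? Optional.empty() : Optional.of(c.name + " <" + c.email + ">");
    }

    synchronized boolean modifyCustomer(long id, String newEmail) {
        Customer c = customers.get(id);
        if (c == null) {
            return false;
        }
        c.email = newEmail;
        return true;
    }

    synchronized boolean deleteCustomer(long id) {
        return customers.remove(id) != null;
    }
}
```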
What are the advantages of a service-based architecture? First, the services layer can be the encapsulation layer for the domain model. Clients interface with the application domain model by asking for services, but they do not touch actual domain objects. This has several ramifications:
- Services' methods can take responsibility for transactions involving multiple domain objects. This can lessen the need to replicate objects into the client and thereby save on both processing and network bandwidth.
- Services permit schema hiding, an important goal in designing a flexible system architecture. When the client is shielded from the implementation details of the schema, the schema can change without affecting the client's code. Thus the schema can be changed as required to meet business or technical needs, while the clients continue to do business as usual. This is especially important in a distributed enterprise system, where one schema may support many clients.
Schema hiding can also improve performance by offering greater flexibility in object distribution. When domain objects are accessed through the service layer, a client has no idea whether the object it receives back from a service is an actual domain object or a structure invented just to return relevant state information. More importantly, the service layer can control the distribution of object state to minimize network traffic and optimize sharing of objects among users and applications.
- Many different applications can share a suite of services, while also using services that are unique to each application. Likewise, different kinds of clients (for example, Web or Swing-based applications) can use the same services, providing a consistent application architecture.
- Services also provide a very high leverage test point. QA (quality assurance) suites can efficiently and exhaustively test services to improve the quality of the application.
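As a sketch of the first advantage listed above, a service method can carry out a use case that touches several domain objects entirely on the server and return only a small, flat result. AccountManagementService is named in the text; the Account class, the open and transfer methods, and the TransferReceipt type are hypothetical. Transaction demarcation, which an EJB container would supply declaratively, is reduced here to validating before anything is mutated.

```java
import java.util.HashMap;
import java.util.Map;

// Domain object: stays on the server in this design.
class Account {
    final String id;
    private long balanceCents;

    Account(String id, long balanceCents) { this.id = id; this.balanceCents = balanceCents; }

    long balanceCents() { return balanceCents; }

    void withdraw(long cents) {
        if (cents > balanceCents) throw new IllegalStateException("insufficient funds");
        balanceCents -= cents;
    }

    void deposit(long cents) { balanceCents += cents; }
}

// Flat result: all the client needs, independent of the domain schema.
final class TransferReceipt implements java.io.Serializable {
    final long fromBalanceCents;
    final long toBalanceCents;

    TransferReceipt(long fromBalanceCents, long toBalanceCents) {
        this.fromBalanceCents = fromBalanceCents;
        this.toBalanceCents = toBalanceCents;
    }
}

class AccountManagementService {
    private final Map<String, Account> accounts = new HashMap<>();

    synchronized void open(String id, long balanceCents) {
        accounts.put(id, new Account(id, balanceCents));
    }

    // One call updates two domain objects; neither is replicated to the client.
    synchronized TransferReceipt transfer(String fromId, String toId, long cents) {
        if (cents <= 0) throw new IllegalArgumentException("amount must be positive");
        Account from = accounts.get(fromId);
        Account to = accounts.get(toId);
        if (from == null || to == null) throw new IllegalArgumentException("unknown account");
        from.withdraw(cents); // throws before any state changes if funds are short
        to.deposit(cents);
        return new TransferReceipt(from.balanceCents(), to.balanceCents());
    }
}
```

Because the client sees only TransferReceipt, the Account class can be restructured without any change to client code, which is the schema-hiding benefit described above.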
A careful layering of the architecture and the development of a robust services layer will form the foundation for a viable, extensible J2EE application.
Transactional Model
The choice of transaction model has strong implications for application scalability and performance. The goal of transaction control is to permit sharing of business objects and data while preserving data integrity. However, an inefficient transaction model can leave critical data unavailable for long periods and seriously degrade system performance. On the other hand, inadequate control may lead to an inconsistent view of business information among users and applications.
One of the most basic and important decisions in designing an e-commerce system is to choose between the long and short transaction models. Figures 7a and 7b are visualizations of long and short transactions, respectively.
Long vs. Short Transactions
A long transaction involves the allocation of a database connection for the duration of a business use case[2]. A user invokes the transaction through the application. The application obtains a database connection and begins a transaction. The transaction remains open while the user retrieves, reviews, and updates objects. The transaction may involve multiple reads, updates, or object creations. During these operations, whole data sets may be locked and hence unavailable to other users. Finally, when the user is satisfied with his work, he signals the application that he is ready to commit. The application issues a commit, causing all the changes to be flushed into the database. At this point, the session is able to release all the database resources it was holding onto during the transaction.
Figure 7a. A Long Transaction
Figure 7b. A Short Transaction
In a short transaction, in contrast, a single application use case must be split across multiple database connection steps. Figure 7b represents the simple case of two database connection steps. The use case starts with the application going into transaction and reading the objects necessary to carry out the task. Once the data is read, the application goes out of transaction. The user then works with the objects outside of transaction, applying modifications and adding new objects as necessary. When the user is satisfied that the work is complete, he signals the application to commit his changes. The application then goes back into a transaction, checks for write-write conflicts (see discussion below), and commits the changes into the database.
Write-Write Conflicts
Transaction control is one of the greatest challenges in designing distributed, multi-user systems. The real benefit of these systems is that users can share access to business information in real time. The challenge is that when many people or processes share access to business objects, it is possible for one user or process to change the state of an object while it is in the process of being used or changed by another, causing what we call a write-write conflict.
Let's look at an example of a write-write conflict and consider the consequences. Suppose we are designing an application that allows someone to update their bank balance on line. We've used a short transaction model, so the update is broken into two database transactions. Joe and Susan have a joint account, which they can both access on line. So now, Joe uses our application to update the account. First, the application reads all the objects involved (Person, Address, CurrentBalance, etc.) and goes out of transaction mode. Joe changes his address and deposits $10,000 to the bank account. The application tracks the changes as they occur. When Joe is ready to commit, the application goes back into transaction. When it checks the database, it may find that Susan has also changed the account objects in the interim. This is the write-write conflict problem.
When a write-write conflict occurs, the application must be able to detect the conflict and take the appropriate action. In some applications, we don't care about write-write conflicts. For example, if two customer support people were adding comments to the same trouble ticket, the comments could both be appended, in whatever order, with no harm done. In the above case, if Susan has changed only the phone number on the account, Joe may be able to commit his changes with no serious consequences. If Susan deposited $10,000 as well, and the application overwrites the new balance rather than adding Joe's deposit to Susan's, Joe stands to lose a lot of money. If Joe and Susan have both withdrawn money at the same time, they could overdraw the account. Either way, if the application overlooks or mishandles the write-write conflict, the result could make Joe and Susan very unhappy.
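The lost-deposit case is easy to reproduce. In this hypothetical sketch, Joe and Susan both read the same balance in their first transactions, each adds $10,000 to a private copy while out of transaction, and the application then blindly writes each copy back. The starting balance of $1,000 and the class name LostUpdateDemo are assumptions for illustration.

```java
// Hypothetical illustration of a lost update under a naive short-transaction design.
class LostUpdateDemo {
    private long storedBalance; // the value "in the database", in whole dollars

    LostUpdateDemo(long initialBalance) { this.storedBalance = initialBalance; }

    long read() { return storedBalance; }

    // Naive commit: overwrites whatever is stored, with no conflict check.
    void blindWrite(long newBalance) { storedBalance = newBalance; }

    // Runs the interleaving from the text and returns the final stored balance.
    static long joeAndSusanEachDeposit(long initialBalance, long deposit) {
        LostUpdateDemo db = new LostUpdateDemo(initialBalance);
        long joeCopy = db.read();     // Joe's first (read) transaction
        long susanCopy = db.read();   // Susan's first (read) transaction
        susanCopy += deposit;         // both work outside a transaction
        joeCopy += deposit;
        db.blindWrite(susanCopy);     // Susan commits first
        db.blindWrite(joeCopy);       // Joe's commit silently overwrites Susan's
        return db.read();
    }
}
```

Calling `LostUpdateDemo.joeAndSusanEachDeposit(1_000, 10_000)` returns 11,000 rather than the correct 21,000: one of the two deposits has vanished.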
Transaction Model Tradeoffs
What are the tradeoffs for each of these modes? In short, long transactions are less scalable, while short transactions can be more complex to program.
A long transaction avoids write-write conflicts by "locking" (blocking access to) objects that the application may change. Other processes have to wait until the object is unlocked, so the system is less scalable. On the positive side, building an application with long transactions is appreciably simpler than using short transactions. The application does not have to maintain update information; it can simply update the objects, and the database will maintain the transactional context. At commit time, the database can do all write-write conflict detection.
A short transaction reads the object in one transaction, does its work, and then tries to commit its work, running the risk of write-write conflicts. Therefore, applications using the short transaction model must track changes to detect and respond to write-write conflicts. (One common response is to abort the commit, report the error to the user, and cause rework of the entire database workstep.) They cannot rely on the database to do transaction control for them. This adds to the complexity of the application but allows the application to minimize its usage of database resources. In fact, short transactions are a necessary technique to allow scalability in large-scale e-commerce applications.
As a general rule, all but the smallest applications should plan on using short transactions. For applications of a departmental scale (from tens to perhaps low hundreds of users), long transactions may be workable.
For applications built with EJBs, short transactions are the de facto model. EJB declarative transactions are designed around the use of short transactions. Long transactions are possible using EJBs, but the application can no longer take advantage of the built-in EJB transaction management. Instead, the application must use client-initiated transactions and the UserTransaction interface to manage its own transactions, and it will pay penalties in system scalability.
Write-Write Conflict Detection
So how does an application detect write-write conflicts? Before trying to commit changes, an application must re-read the objects read earlier and determine whether they have been changed in the interim.
There are at least three techniques to determine whether objects have been changed:
- Timestamps
- Counters
- State comparison
If timestamps are used, an object's state information is expanded to contain a field for the timestamp. As part of the commit process, the application obtains a fresh timestamp and updates this field. So, when the object's state is committed into the database, the timestamp field indicates the time of the commit. To detect a write-write conflict, the application re-reads the object and compares the timestamp in the newly read object with the timestamp in the copy originally read. If they are the same, no write-write conflict exists. If they are different, the conflict does exist and remedial action must be taken. (Note that this only works if every application that can change the object also changes the timestamp.) Another technique, instead of re-reading the object, is to add the timestamp field along with the primary key to the WHERE clause of the UPDATE statement. If no rows are updated, then either the row's timestamp no longer matches the one originally read or the row with that primary key has been deleted. This is more efficient than re-reading and then updating, which costs two database I/Os.
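The single-statement variant can be modeled as follows. The SQL in the comment shows the shape of the statement, with hypothetical table and column names; the Java class simulates its semantics in memory so that the sketch is self-contained. The update succeeds only if both the primary key and the previously read timestamp still match, and the returned count plays the role of the row count that JDBC's executeUpdate reports.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for a table with a last_modified timestamp column.
// Equivalent SQL (hypothetical table and column names):
//   UPDATE account SET address = ?, last_modified = ?
//   WHERE id = ? AND last_modified = ?
class TimestampedTable {
    static final class Row {
        final String address;
        final long lastModified; // e.g. epoch millis written at commit time

        Row(String address, long lastModified) {
            this.address = address;
            this.lastModified = lastModified;
        }
    }

    private final Map<Integer, Row> rows = new HashMap<>();

    synchronized void insert(int id, String address, long ts) { rows.put(id, new Row(address, ts)); }

    synchronized Row select(int id) { return rows.get(id); }

    // Returns the number of rows updated: 1 on success, 0 if the row was
    // changed by someone else (timestamp mismatch) or deleted since it was read.
    synchronized int updateIfUnchanged(int id, long tsRead, String newAddress, long newTs) {
        Row row = rows.get(id);
        if (row == null || row.lastModified != tsRead) {
            return 0;
        }
        rows.put(id, new Row(newAddress, newTs));
        return 1;
    }
}
```

In the usage below, two edits start from the same read. The first commit succeeds, and the second gets a zero row count, which signals the write-write conflict without a separate re-read.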
The counter approach is quite similar. The object contains a field for the counter. Each time the object is successfully committed, the count is incremented. When preparing for the commit, the application re-reads the objects and compares the current counter value with the one in its original copy of the object. Again, you must be sure that all applications which can change the object will increment the counter.
The final technique is the most complex. The object is re-read and then the state of the newly re-read object is compared with the original copy. This might involve comparing a number of fields and is therefore more expensive than the previous techniques. Additionally, for this to work, the application must keep its original copy of the object unchanged. This further complicates the application, since it must now accumulate changes to the object in some other "container" and apply them at commit time. For these reasons, we recommend timestamps or counters for most situations. A variation on this technique is to check individual attributes (columns) for change; this technique adds overhead but provides the richness of field-level detection rather than coarser-grained object-level detection.
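Here is a minimal sketch of state comparison. It assumes that an object's state can be represented as a map from field names to values, a simplification, since real domain objects would need per-field accessors or reflection. The snapshot taken at the first read is never modified, edits accumulate in a separate change map (the "container" mentioned above), and at commit time the freshly re-read state is compared with the snapshot. The class and method names are hypothetical. The field-level check reports a conflict only for fields this edit changed, so in the Joe and Susan example, Susan's phone change would not block Joe's address change.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical edit session for the state-comparison technique.
class StateComparisonEdit {
    private final Map<String, Object> original;                  // snapshot from the first read
    private final Map<String, Object> changes = new HashMap<>(); // accumulated edits

    StateComparisonEdit(Map<String, Object> firstRead) {
        this.original = new HashMap<>(firstRead); // defensive copy keeps the snapshot intact
    }

    void set(String field, Object value) { changes.put(field, value); }

    // Object-level check: any difference anywhere in the re-read state is a conflict.
    boolean objectLevelConflict(Map<String, Object> reRead) {
        return !original.equals(reRead);
    }

    // Field-level check: only fields this edit changed are compared.
    Set<String> fieldLevelConflicts(Map<String, Object> reRead) {
        Set<String> conflicts = new HashSet<>();
        for (String field : changes.keySet()) {
            if (!Objects.equals(original.get(field), reRead.get(field))) {
                conflicts.add(field);
            }
        }
        return conflicts;
    }

    // State to write at commit time: the re-read state with our changes applied.
    Map<String, Object> merged(Map<String, Object> reRead) {
        Map<String, Object> result = new HashMap<>(reRead);
        result.putAll(changes);
        return result;
    }
}
```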
Object State Distribution
At first glance, object distribution doesn't seem like it should be much of a problem. You got objects in the client, you got objects in the server. No big deal. But in actuality, it's a very big deal. To a large extent, it determines the viability of a distributed multi-user system. Why? Let's explore some of the issues.
For an object to take up residence in a client, it must be serialized (its state converted to network-transportable chunks) and sent across the network. This involves three resources in the system. First, CPU and memory in the server are used to serialize the object and packetize it for travel across the network. Second, network bandwidth is used to actually send the packets to the client. Third, CPU and memory in the client are required to unpacketize, deserialize, and reconstruct the object graph. (We know, there's probably no such word as "unpacketize." You know what we mean.) Hence, the movement of an object from server to client comes at a fairly high cost of system resources.
Let's step back and look at the problem in the large picture. Suppose a large e-commerce application is servicing 10,000 concurrent users. Objects must be serialized and sent to each user. If the average object graph sent to a client were 3 KB per Web hit, then to service one hit from every client, the system would have to send 30 MB of information. Because large-scale systems serve many users, relatively small increases in the amount of data sent to any single client are magnified thousands of times. These seemingly small increases can have a significant impact on total system throughput.
So, to minimize serialization overhead, we should always invoke methods from the client and have them execute using objects on the server, right? Sorry. It's not quite that simple. The EJB short transaction model basically precludes the use of remote domain objects (that is, objects that remain strictly in the server and have their methods invoked remotely from the client). The problem is that in the short transaction model, objects can be created and used in a transaction, but once the transaction commits, the objects are released. The only way to pass a remote reference to an object is to wrap it in an entity bean. If you do that, you potentially open yourself up to a long transaction model with all the scalability problems that that creates.
The following sections discuss the tradeoffs between serialization and remote object references and other issues with state distribution, such as security and schema hiding. Finally, we will present some general guidelines for object distribution.
Serialization vs. Remote Object References
When an object is serialized in Java, by default [3] it is the transitive closure of the object that is serialized. (The transitive closure is the complete set of all the objects to which an object holds references, and all the objects referenced by those objects, ad infinitum, ad nauseam.) This means that if you're not careful, you might wind up serializing a lot more data than you intended. Consider a Person object that has fields for Address and Name. The Address object might contain a State object. The State object itself might be a complex object that contains demographic information, and it might hold a reference to the collection of all states to which it belongs, and so on. If you serialize the Person object using the default mechanism, you will wind up serializing this entire network of objects, including the collection of all states and all the objects they reference (see Figure 8). Clearly, this is undesirable in some cases and must be managed.
Figure 8: Transitive Closure
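To see how quickly the transitive closure grows, the sketch below reduces the Person/Address/State example to the State object alone. The class names FullState and LeanState and their fields are hypothetical, and the code assumes Java 11 or later (for String.repeat). It compares the size of the default serialization stream when the back-reference to the collection of all states is an ordinary field versus a transient one; marking a field transient is one standard way to keep default serialization from following a reference.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class TransitiveClosureDemo {

    // Each State refers to the collection of all states, so serializing any
    // one of them drags every other State (and its data) into the stream.
    static class FullState implements Serializable {
        private static final long serialVersionUID = 1L;
        String code;
        String demographics; // stand-in for bulky demographic data
        List<FullState> allStates;
    }

    // Same shape, but the back-reference is transient: default serialization
    // skips it, so only this State's own fields are written.
    static class LeanState implements Serializable {
        private static final long serialVersionUID = 1L;
        String code;
        String demographics;
        transient List<LeanState> allStates; // null after deserialization
    }

    // Number of bytes the default serialization mechanism produces for root.
    static int serializedSize(Serializable root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    static FullState buildFull(int count) {
        List<FullState> all = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            FullState s = new FullState();
            s.code = "S" + i;
            s.demographics = s.code + ":" + "x".repeat(1_000);
            s.allStates = all;
            all.add(s);
        }
        return all.get(0);
    }

    static LeanState buildLean(int count) {
        List<LeanState> all = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            LeanState s = new LeanState();
            s.code = "S" + i;
            s.demographics = s.code + ":" + "x".repeat(1_000);
            s.allStates = all;
            all.add(s);
        }
        return all.get(0);
    }

    public static void main(String[] args) {
        System.out.println("full graph:   " + serializedSize(buildFull(50)) + " bytes");
        System.out.println("pruned graph: " + serializedSize(buildLean(50)) + " bytes");
    }
}
```

With 50 states, the full graph serializes every state's data, while the pruned one writes a single state. The transient approach only works when the receiver can live without that reference; otherwise a state holder (discussed below) is the better tool.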
Now, instead, let's consider the case of using remote objects via entity beans. Suppose clients can gather data for display by calling services that return references to entity beans representing Person and Address objects. For each user interface widget or HTML tag that displays a single field of information, a remote invocation on the entity bean would be required to get that field's value. For the simple case of a person's name and address, six remote invocations might be required to produce the entire display.
Each remote invocation comes at some cost: even the simplest data access incurs a minimum response time, the cost of a network round trip. When these response times are aggregated for an entire screen, the result is a sluggish, poorly performing user interface. This kind of "chatty" architecture also places a heavy burden on the application server and impedes scalability.
Every application is different, so you must weigh the costs of serialization versus remote object invocations. If you serialize, make sure you consider transitive closure and know what you're serializing. If you use remote invocation, consider the impact of "chattiness" on the network.
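The difference in round trips can be made concrete with a small simulation. In the hypothetical sketch below, a counter stands in for network round trips (no real remote calls are made): a fine-grained interface costs one trip per displayed field, while a coarse-grained service method returns everything the screen needs in a single serializable object. All class and method names are invented for illustration.

```java
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

public class ChattinessDemo {

    // Stand-in for network round trips; each increment models one remote call.
    static final AtomicInteger roundTrips = new AtomicInteger();

    // Fine-grained ("chatty") style: every getter is a separate remote invocation.
    static class RemotePerson {
        private String remoteCall(String value) {
            roundTrips.incrementAndGet();
            return value;
        }
        String getFirstName() { return remoteCall("Ada"); }
        String getLastName()  { return remoteCall("Lovelace"); }
        String getStreet()    { return remoteCall("12 Main St"); }
        String getCity()      { return remoteCall("Springfield"); }
        String getState()     { return remoteCall("OR"); }
        String getZip()       { return remoteCall("97477"); }
    }

    // Coarse-grained style: one serializable object carries the whole screen's data.
    static final class PersonDisplay implements Serializable {
        private static final long serialVersionUID = 1L;
        final String firstName, lastName, street, city, state, zip;

        PersonDisplay(String firstName, String lastName, String street,
                      String city, String state, String zip) {
            this.firstName = firstName; this.lastName = lastName; this.street = street;
            this.city = city; this.state = state; this.zip = zip;
        }
    }

    static PersonDisplay getPersonDisplay() {
        roundTrips.incrementAndGet(); // a single remote call for the whole screen
        return new PersonDisplay("Ada", "Lovelace", "12 Main St", "Springfield", "OR", "97477");
    }

    static int tripsForChattyScreen() {
        roundTrips.set(0);
        RemotePerson p = new RemotePerson();
        System.out.println(String.join(" ", p.getFirstName(), p.getLastName(),
                p.getStreet(), p.getCity(), p.getState(), p.getZip()));
        return roundTrips.get();
    }

    static int tripsForCoarseScreen() {
        roundTrips.set(0);
        PersonDisplay d = getPersonDisplay();
        System.out.println(String.join(" ", d.firstName, d.lastName,
                d.street, d.city, d.state, d.zip));
        return roundTrips.get();
    }

    public static void main(String[] args) {
        System.out.println("chatty: " + tripsForChattyScreen() + " round trips"); // 6
        System.out.println("coarse: " + tripsForCoarseScreen() + " round trips"); // 1
    }
}
```

The coarse-grained call trades six round trips for one slightly larger serialized object, which is exactly the balance described above.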
Security Issues in Object Distribution
Security also enters into object distribution considerations. Should some objects never be distributed out to a client? Should parts of an object graph be pruned of sensitive data before an object is distributed? Should certain types of objects only be distributed to certain qualified users? Consider, for example, a hospital system. Perhaps you want to allow a clerk from accounting to examine a patient's financial records but disallow any access to medical records. And even if medical personnel were able to examine medical records, you might not want to replicate them into the client for security reasons. All these issues need to be considered and factored into decisions made about where objects should exist.
Object Distribution and Schema Hiding
Another determinant of object distribution is schema hiding. When objects are serialized and sent to the client, their schema is exposed, which may not be appropriate. If the client does not depend directly on the schema, the schema can change without affecting client code. This is a particular advantage when an object model is used by multiple applications: changes can be rolled into the schema to support one application and be completely transparent to another. (Although some of us think that to achieve true schema hiding, you need to go to a key/value paradigm such as XML. A topic for another day, perhaps.)
One Good Approach to Object Distribution
In a service-based architecture, the services encapsulate the domain model. This leads to a very flexible and performant mechanism to handle object distribution: domain object state holders. Here's how they work.
When a client invokes a service, the service manipulates the real domain objects on behalf of the client. What it returns to the client is not necessarily a domain object. The service can define "state holders" [4], basically containers into which relevant information is placed for shipment to the client. These containers can hold either direct state information or information derived from domain objects. State holders can be defined for different service methods whenever and wherever it is inappropriate to serialize and return domain objects. Once a client is finished with the state holder, it can be discarded.
Figure 9: Domain Object State Holders
In assembling information in a state holder, a service can prune large objects into smaller subgraphs or remove sensitive information, addressing scalability and security issues and hiding the schema from the client application. Alternatively, state holders can be implemented as nested key/value pair representations à la XML, which gives you the ultimate in schema hiding and decoupling of the server object model from the client representation.
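The hospital example from the security discussion makes a convenient illustration. The sketch below is hypothetical (Patient, BillingStateHolder, and the method names are not from FoodSmart or any EJB API) and leaves out the session bean plumbing: a service method reads the real domain object and returns either a small serializable state holder or a key/value map, in both cases omitting the medical history.

```java
import java.io.Serializable;
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StateHolderDemo {

    // Server-side domain object; it is never serialized to the client as-is.
    static final class Patient {
        final long oid;
        final String name;
        final BigDecimal balanceDue;
        final List<String> medicalHistory; // sensitive: stays on the server

        Patient(long oid, String name, BigDecimal balanceDue, List<String> medicalHistory) {
            this.oid = oid;
            this.name = name;
            this.balanceDue = balanceDue;
            this.medicalHistory = List.copyOf(medicalHistory);
        }
    }

    // State holder for an accounting screen: only the fields that screen needs.
    static final class BillingStateHolder implements Serializable {
        private static final long serialVersionUID = 1L;
        final long patientOid;
        final String name;
        final BigDecimal balanceDue;

        BillingStateHolder(long patientOid, String name, BigDecimal balanceDue) {
            this.patientOid = patientOid;
            this.name = name;
            this.balanceDue = balanceDue;
        }
    }

    // Service method: works with the real domain object, returns a pruned copy.
    static BillingStateHolder getBillingInfo(Patient patient) {
        return new BillingStateHolder(patient.oid, patient.name, patient.balanceDue);
    }

    // Key/value alternative: the client sees field names, not server classes.
    static Map<String, Object> getBillingInfoAsMap(Patient patient) {
        Map<String, Object> holder = new LinkedHashMap<>();
        holder.put("patientOid", patient.oid);
        holder.put("name", patient.name);
        holder.put("balanceDue", patient.balanceDue.toPlainString());
        return holder;
    }

    public static void main(String[] args) {
        Patient patient = new Patient(42L, "J. Smith", new BigDecimal("125.50"),
                List.of("confidential diagnosis"));
        BillingStateHolder holder = getBillingInfo(patient);
        System.out.println(holder.name + " owes " + holder.balanceDue);
        System.out.println(getBillingInfoAsMap(patient));
    }
}
```

The typed holder is easier to program against; the map version hides the schema more completely at the cost of compile-time checking.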
General Good Practices for Object Distribution
In a service-based architecture, the service is the natural mapping point for all object distribution policies. So the various services in a system not only vend business behavior, they also become the controllers of object distribution.
We offer the following general guidelines for object distribution:
- Distribute as few objects as possible. Be parsimonious. Each object passed to the client is a cost to the application. This cost is magnified by the number of concurrent users of the system.
- Distribute as few remote references (that is, entity beans) as possible. Each remote reference passed to the client risks a response time cost.
- Don't serialize large domain object graphs unless you really need all the state on the client. (And when you do serialize, be sure you're aware of everything you're serializing.)
- Centralize your object distribution policies in your services.
- Prune large object graphs and only send relevant information to the client. For many operations, the client needs only some basic identity information about the object for selection purposes. The full-blown state of the object is only needed for display or editing purposes.
- Do not distribute large collections of objects. They need to remain on the server while some lightweight representations or subsets are made available to the client (see discussion in next section).
- We have found that in most cases, state holders are the best choice for object distribution. But state holders add complexity to an application: they are one more set of classes to maintain. So whenever it is possible and appropriate, serialize domain objects rather than using state holders.
Using Object Identifiers
The object identifier (oid) is an extremely useful concept in an object system, because it can be a relatively cheap and performant way to look up objects, and it can help cut down on the overhead of object distribution. An oid is a simple, unique key, typically a Java int or long, assigned to a domain object when it is brought into existence. The oid remains with the object throughout its lifetime. If the application is built on an RDBMS, the oid can be the primary key for a table representing an object. Oids can also be used as foreign keys in tables representing objects containing other objects, and oids can be sent down to the client as a placeholder for the real object. (In dealing with a legacy database, the primary key of the table could be used as an oid.)
The bottom line is that using oids in an application is a very good idea. Consider the following problem. You are writing a banking application. As part of the application, you need to display a list of checking account numbers in a listbox, have the user select one, and then open a window with all the relevant information about the account that has been selected. How do you go about this?
Just to make it interesting, suppose there are 5,000 accounts being managed by this application. It is immediately clear that it is not desirable to replicate 5,000 complete account objects onto the client. Instead, this is the strategy (and, we admit it, there are other strategies). Create a small class, called an identity holder, that holds two fields: accountNumberString and an oid for that account object. You then replicate these small, lightweight objects to the client and have it populate the listbox using the strings. When the user selects an account number, the corresponding oid is sent to the server in the invocation of a method that looks up and returns the account object or its state holder. The application can then paint the screen with the account information.
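The identity-holder strategy can be sketched as follows. This is a hypothetical, single-process illustration: AccountService, IdentityHolder, and their methods are invented names, the remote boundary is not modeled, and findByOid returns the account itself where a real service might return its state holder.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IdentityHolderDemo {

    // Lightweight object replicated to the client to populate the listbox.
    static final class IdentityHolder implements Serializable {
        private static final long serialVersionUID = 1L;
        final String accountNumberString;
        final long oid;

        IdentityHolder(String accountNumberString, long oid) {
            this.accountNumberString = accountNumberString;
            this.oid = oid;
        }

        @Override
        public String toString() { return accountNumberString; } // what the listbox displays
    }

    // Full account object; it stays on the server.
    static final class Account {
        final long oid;
        final String accountNumber;
        final String owner;
        final long balanceCents;

        Account(long oid, String accountNumber, String owner, long balanceCents) {
            this.oid = oid;
            this.accountNumber = accountNumber;
            this.owner = owner;
            this.balanceCents = balanceCents;
        }
    }

    // Hypothetical service; a real one would sit behind a session bean.
    static final class AccountService {
        private final Map<Long, Account> accountsByOid = new HashMap<>();

        void add(Account account) { accountsByOid.put(account.oid, account); }

        // Step 1: send only (accountNumberString, oid) pairs to the client.
        List<IdentityHolder> listAccountIdentities() {
            List<IdentityHolder> identities = new ArrayList<>();
            for (Account a : accountsByOid.values()) {
                identities.add(new IdentityHolder(a.accountNumber, a.oid));
            }
            return identities;
        }

        // Step 2: the client sends back the selected oid; the server looks it up.
        Account findByOid(long oid) {
            Account account = accountsByOid.get(oid);
            if (account == null) {
                throw new IllegalArgumentException("No account with oid " + oid);
            }
            return account;
        }
    }

    static AccountService sampleService(int count) {
        AccountService service = new AccountService();
        for (int i = 1; i <= count; i++) {
            service.add(new Account(i, String.format("CHK-%05d", i), "Owner " + i, 100L * i));
        }
        return service;
    }

    public static void main(String[] args) {
        AccountService service = sampleService(5_000);
        List<IdentityHolder> listbox = service.listAccountIdentities();
        IdentityHolder selected = listbox.get(0);          // the user's selection
        Account account = service.findByOid(selected.oid); // one targeted lookup
        System.out.println(selected + " belongs to " + account.owner);
    }
}
```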
An application needs two resources to use oids effectively: a service that vends unique oids and a domain object factory. The oid vending service must guarantee that each oid it hands out is unique, so it may need to keep some persistent data indicating what oids have already been used. It is a good idea to use long (64-bit) variables for oids, because this virtually guarantees that you will never run out of them. (Remember Y2K!) An int might be exhausted over the lifetime of a system unless the application took on the added complexity of recycling oids.
The domain object factory must be used to manufacture all domain objects. Its responsibility is to create an instance of the needed type of object and to assign each one an oid. If all domain objects are created by one factory, you can guarantee that each will have a unique oid.
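A minimal sketch of these two resources follows, under loud assumptions: OidVendor keeps its counter only in memory (a real vending service must persist its high-water mark so oids stay unique across restarts, as noted above), and the class and method names are hypothetical. It uses long oids, per the recommendation above.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongFunction;

public class OidFactoryDemo {

    // Hypothetical oid vending service. This sketch does NOT persist the
    // counter, so uniqueness holds only within a single run.
    static final class OidVendor {
        private final AtomicLong lastIssued;

        OidVendor(long lastIssued) { this.lastIssued = new AtomicLong(lastIssued); }

        long nextOid() { return lastIssued.incrementAndGet(); } // thread-safe
    }

    // Root of the domain model; the oid is fixed for the object's lifetime.
    abstract static class DomainObject {
        private final long oid;

        protected DomainObject(long oid) { this.oid = oid; }

        final long getOid() { return oid; }
    }

    static final class Customer extends DomainObject {
        final String name;

        Customer(long oid, String name) {
            super(oid);
            this.name = name;
        }
    }

    // The single factory through which all domain objects are created.
    static final class DomainObjectFactory {
        private final OidVendor vendor;

        DomainObjectFactory(OidVendor vendor) { this.vendor = vendor; }

        <T extends DomainObject> T create(LongFunction<T> constructor) {
            return constructor.apply(vendor.nextOid());
        }
    }

    public static void main(String[] args) {
        DomainObjectFactory factory = new DomainObjectFactory(new OidVendor(1000L));
        Customer alice = factory.create(oid -> new Customer(oid, "Alice"));
        Customer bob = factory.create(oid -> new Customer(oid, "Bob"));
        System.out.println(alice.getOid() + " " + bob.getOid()); // 1001 1002
    }
}
```

In a real code base the domain constructors would be made non-public so that the factory really is the only way to create domain objects.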
Scaling Techniques
Earlier in this paper we talked about the transaction model and how it affects scalability. This section will cover several more topics related to scalability. Scalability permeates virtually every design decision made in a system. A thousand small implementation details done well can add significantly to the overall scalability of a system, and a few details overlooked can be deadly to system performance. Therefore, O Reader, be ye vigilant.
Stateful versus Stateless Session Beans
EJB offers two forms of session beans: stateful and stateless. The choice between these can significantly affect system scalability. Each has strengths and weaknesses. Stateful session beans provide a convenient stash for conversational state in the application. Consequently, they make an application easier to program. But stateful beans have their dark side. Because they hold state, they must be mapped one to one with application clients. Thus, if an application has 10,000 concurrent clients, it must manage 10,000 stateful session beans. How many beans can a container manage and maintain acceptable performance? Well, more than you might think, but not too many. To help with this problem, stateful beans have a complex lifecycle that includes the possibility of passivation and activation by the container. But passivation and activation are not without cost. The bean's state information must be written to and recovered from secondary storage, and the bean must be reinstantiated and populated upon activation.
By contrast, stateless session beans are more complex to program: if an application must save conversational state, this information must be accommodated somewhere else in the application architecture. For example, the application might commit state information into the database or stash it in some other artifact in the system (HTTP session state, etc.). The important advantage of stateless beans is that they can be shared among many clients. The basic approach is to create a pool of stateless bean references. Clients grab a bean reference out of the pool, use it for a method invocation, and return it to the pool for another client to use. Using this scheme, 10,000 concurrent users might be served by a few hundred beans [5].
Systems using stateless or stateful beans will ultimately differ in their ability to scale. Because stateless beans can be multiplexed across many clients, systems built on stateless beans can scale to a larger number of users. However, stateful beans are a very legitimate choice for small applications (those serving a few hundred concurrent users). By contrast, large-scale applications (those serving many thousands of concurrent users) clearly need to use stateless beans in order to scale. So what of the gray area we run into with applications serving a few hundred to a few thousand clients? At what point do you say that stateful beans are too costly and move on to stateless? Unfortunately, there is not wide industry experience with this technology, and so the answer is unknown. Over time, metrics will arise to help guide this decision. In the meantime, perhaps the best rule of thumb is that if the number of users could possibly ever, in your wildest dreams, grow beyond a few thousand, go stateless.
Entity Beans versus Java Classes
A similar decision arises around how to represent domain objects, with entity beans or simple Java classes. An entity bean, like a session bean, is a heavyweight, remotable object that requires the EJB container to manage its lifecycle and maintain extra artifacts related to its distribution. If an application has thousands of concurrent users and each user interacts with many tens of entity beans in typical usage, the application server's Java VMs could become saturated with entity beans. The server could spend many cycles activating and passivating these beans to share available memory resources. Entity beans are very helpful in synchronizing the state of a domain object representation with persistent storage at well-defined lifecycle and transaction points. However, they do so at some cost in server resources.
JSPs and JSP Beans
JSPs and JSP beans should be seriously considered as an alternative to stateful session beans. JSPs nicely partition the presentation (HTML) and application layers in a system. The JSP contains the HTML necessary to lay out a page, and makes invocations on JSP beans to return dynamic content. The JSP beans live in the application layer. They are a good place to hold onto conversational state outside the transaction semantics of the EJB container. (HTTP session state is another good place to do this.)
Connection Pooling
Database connections are expensive resources in any system. Like any expensive resource, you want to get the most out of them. One consequence of the EJB short transaction model is that database connections are used only for very brief intervals. This allows you to use connections very efficiently by multiplexing each connection across many users. You do this by creating a pool of database connections at system initialization. When a client needs to access the database, it pulls a connection from the pool, begins a transaction, does its work, and commits the result. The connection is then returned to the pool for other clients to use. Connection pooling allows applications to scale to support large numbers of concurrent users. EJB server/containers such as GemStone/J typically provide JDBC connection pooling as a matter of course.
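The borrow-use-return discipline is sketched below as a small generic pool; the same shape applies to the pooled stateless bean references described earlier. This is an illustration only, assuming Java 8 or later: FakeConnection is a hypothetical stand-in for a JDBC connection, and a production system would rely on the container's own pooling (typically exposed through a javax.sql.DataSource) rather than this class.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

public class ResourcePoolDemo {

    // Fixed-size pool, filled at initialization and shared by all clients.
    static final class ResourcePool<T> {
        private final BlockingQueue<T> idle;

        ResourcePool(int size, Supplier<T> factory) {
            idle = new ArrayBlockingQueue<>(size);
            for (int i = 0; i < size; i++) {
                idle.add(factory.get());
            }
        }

        // Borrow a resource for one short unit of work, then always return it.
        <R> R withResource(Function<T, R> work) {
            T resource;
            try {
                resource = idle.take(); // blocks while every resource is in use
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for a resource", e);
            }
            try {
                return work.apply(resource);
            } finally {
                idle.offer(resource); // always succeeds: capacity equals pool size
            }
        }

        int available() { return idle.size(); }
    }

    // Hypothetical stand-in for a JDBC connection.
    static final class FakeConnection {
        final int id;

        FakeConnection(int id) { this.id = id; }

        int runShortTransaction() { return id; } // begin, work, commit
    }

    static ResourcePool<FakeConnection> newConnectionPool(int size) {
        AtomicInteger nextId = new AtomicInteger();
        return new ResourcePool<>(size, () -> new FakeConnection(nextId.getAndIncrement()));
    }

    public static void main(String[] args) throws InterruptedException {
        ResourcePool<FakeConnection> pool = newConnectionPool(3);
        Set<Integer> used = Collections.synchronizedSet(new TreeSet<>());
        Thread[] users = new Thread[20];
        for (int i = 0; i < users.length; i++) {
            users[i] = new Thread(() -> used.add(pool.withResource(FakeConnection::runShortTransaction)));
            users[i].start();
        }
        for (Thread t : users) {
            t.join();
        }
        System.out.println("20 users shared connections " + used);
    }
}
```

Twenty concurrent "users" are served by three connections because each one holds a connection only for the duration of its short transaction.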
Large Query Results
Management of query results presents yet another scalability issue in distributed applications. When a query is issued (particularly if it is built from user-supplied requests), there may be no way to anticipate how much data will be returned. The result set might include a huge amount of data, so you don't want to blindly replicate the result of a query down to the client. The best approach is to divide and conquer through a technique we affectionately call "chunking." With this technique, the results are wrapped in a cursored enumeration object, and the client communicates with this object to obtain what it needs. The client is initially sent a "chunk" of results of a preset size, and then it requests more chunks as appropriate.
However, remember that the EJB short transaction model does not allow the cursored enumeration object to be remote from the client. Therefore, we need to wrap the enumeration in a bean and pass back a reference to this bean for the client to interrogate. What kind of bean should you use? Well, it depends. If the underlying persistence mechanism is an RDBMS, then the results will be accessible through a JDBC result set. This is best wrapped in a stateful session bean. (Note that this requires the associated JDBC connection to be dedicated to the user throughout the manipulation of the result set.) If the persistence mechanism is an object database, then the results would be held in a collection object and could easily be wrapped in an entity bean [6].
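The chunking idea itself can be sketched independently of the bean that wraps it. In the hypothetical ChunkedResult class below, the full result list stays on the server and each call to nextChunk() returns a copy of at most chunkSize elements, so only that slice would be serialized. With an RDBMS, the in-memory list would instead be a cursor over a JDBC result set.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkingDemo {

    // Cursored enumeration over a result that stays on the server.
    static final class ChunkedResult<T> {
        private final List<T> results;
        private final int chunkSize;
        private int cursor = 0;

        ChunkedResult(List<T> results, int chunkSize) {
            if (chunkSize <= 0) {
                throw new IllegalArgumentException("chunkSize must be positive");
            }
            this.results = results;
            this.chunkSize = chunkSize;
        }

        boolean hasMore() { return cursor < results.size(); }

        // Returns a copy of the next slice, so only these elements would be serialized.
        List<T> nextChunk() {
            int end = Math.min(cursor + chunkSize, results.size());
            List<T> chunk = new ArrayList<>(results.subList(cursor, end));
            cursor = end;
            return chunk;
        }
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            rows.add(i);
        }
        ChunkedResult<Integer> result = new ChunkedResult<>(rows, 100);
        while (result.hasMore()) {
            System.out.println("got " + result.nextChunk().size() + " rows"); // 100, 100, then 50
        }
    }
}
```

Copying the sublist matters: a subList view is backed by the full list, and serializing it would not give the intended small, independent chunk.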
Object-to-Relational Mapping Issues
If the persistence mechanism used for an application is an RDBMS, then you must face the issue of object-to-relational (O/R) mapping. Objects inherently exist as interconnected networks. Relational databases must reduce everything to rows and columns in a table. This is the quintessential problem of O/R mapping. Conceptually, the problem is clear. But finding a solution is more difficult and subtle than you might think.
There are two basic approaches to O/R mapping. Fortunately, the choice between them is obvious, as one is pretty good and the other is very bad. The first approach is to embed SQL queries and population of object state directly into business methods. This is a bad thing because mapping semantics are scattered throughout the code base, so maintenance becomes a nightmare when the schema rolls. We've seen systems built this way where the developers were reduced to searching the entire code base for SQL constructs to figure out what needed to be changed. So don't go there!
The second approach is to build (or buy) a framework that provides the services necessary to implement O/R mapping. The framework partitions the mapping logic and machinery away from the business logic, and centralizes it in one place so that it is easier to maintain. This is a good thing. There are several other major issues to be addressed in O/R mapping:
- The creation and maintenance of metadata
- The detection and management of dirty objects
- The management of object identity
Managing Metadata: Mapping Objects to Tables
Metadata is the information that describes how a particular field in a class is mapped into a column of a table in the database. Metadata can be managed either implicitly or explicitly.
Implicit mapping is generally implemented using matching naming conventions for the object and its corresponding table in the database. Suppose you have a class Person with fields firstName, lastName, age, sex, and address. This class could be implicitly mapped into an RDBMS in a table named Person with columns named firstName, lastName, age, sex, and address. The advantage of implicit mapping is that there is no external data to maintain and it can be automated. (See the FoodSmart application on the "Developer's Guide" CD for an example of this.) There are also several disadvantages:
- The class and RDBMS schema can become skewed. This problem can be addressed through automation, using the Java reflection interface and JDBC's metadata capability to detect and report skews.
- It is not always possible to have a one-to-one mapping between classes and tables, particularly if legacy systems are involved. Also, if you are O/R mapping against a legacy database, the metadata may be very complex. When you design a database for a new application, you are likely to design database tables to map closely to objects. With legacy systems, since the tables were not originally designed with the object model in mind, state information for any given class may be scattered across multiple tables.
- Objects that contain other non-primitive objects and/or collections (especially those representing many-to-many relationships) complicate the needed mapping. It becomes necessary to recursively populate such object graphs. (For example, when populating a Person object, you need to get its Address object, its Employment object, etc.) Furthermore the issue arises of when to populate sub-objects. Are the child objects populated at the same time as the parent, or do we defer this until the children are actually needed?
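The sketch below illustrates implicit mapping by naming convention together with the kind of reflection-based skew check mentioned in the first bullet. It is hypothetical: the column set is hard-coded where a real tool would read it through JDBC's DatabaseMetaData.getColumns(), names are compared case-insensitively because many databases report column names in upper case, and Class.getDeclaredFields() does not guarantee any particular field order, so the generated column list may be ordered differently on other JVMs.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ImplicitMappingDemo {

    // Domain class from the text; table "Person" has one column per field.
    static class Person {
        String firstName;
        String lastName;
        int age;
        String sex;
        String address;
    }

    // Implicit mapping: every non-static, non-transient field is a column.
    static List<String> mappedFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            int mods = f.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || f.isSynthetic()) {
                continue;
            }
            names.add(f.getName());
        }
        return names;
    }

    // Table name = class name, column names = field names.
    static String selectSql(Class<?> type) {
        return "SELECT " + String.join(", ", mappedFields(type)) + " FROM " + type.getSimpleName();
    }

    // Reports fields with no matching column and columns with no matching field.
    static Set<String> findSkew(Class<?> type, Set<String> tableColumns) {
        Set<String> fields = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        fields.addAll(mappedFields(type));
        Set<String> columns = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        columns.addAll(tableColumns);

        Set<String> report = new TreeSet<>();
        for (String f : fields) {
            if (!columns.contains(f)) report.add("field without column: " + f);
        }
        for (String c : columns) {
            if (!fields.contains(c)) report.add("column without field: " + c);
        }
        return report;
    }

    public static void main(String[] args) {
        System.out.println(selectSql(Person.class));
        // Hard-coded here; a real check would read these via DatabaseMetaData.getColumns().
        Set<String> columns = Set.of("FIRSTNAME", "LASTNAME", "AGE", "ADDRESS", "ZIP");
        findSkew(Person.class, columns).forEach(System.out::println);
    }
}
```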
Explicit mapping requires that the metadata be declared and stored somewhere (most likely as more database tables or as static data in your Java classes). The metadata specifies what column in what table is mapped to what field in what class. This approach simplifies the mapping task when legacy systems are involved, because it does not require a direct correlation between classes and tables. However, the metadata becomes another system artifact that must be maintained. Further, you then have to maintain a three-way synchronization between classes, tables, and metadata.
In general, we recommend that you use implicit mapping when there's little likelihood of change and no legacy system, and use explicit mapping elsewhere. A combination of techniques may also provide a nice compromise. For example, you could have a static method, getMappings(), that defaults to using introspection/reflection, and can be overridden with an explicit mapping as required.
Object Dirtiness
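One way to realize that compromise is sketched below. Because static methods in Java are hidden rather than overridden, this hypothetical MappingRegistry uses a per-class lookup instead: a class with an explicitly registered mapping uses it, and every other class falls back to the reflective, naming-convention mapping. The table and column names for the legacy-backed Customer class are invented for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class MappingRegistryDemo {

    // Maps each field name to "Table.column"; explicit entries override the default.
    static final class MappingRegistry {
        private final Map<Class<?>, Map<String, String>> explicitMappings = new HashMap<>();

        // Explicit mapping, e.g. for a class whose state is spread over legacy tables.
        void register(Class<?> type, Map<String, String> fieldToColumn) {
            explicitMappings.put(type, Collections.unmodifiableMap(new LinkedHashMap<>(fieldToColumn)));
        }

        Map<String, String> getMappings(Class<?> type) {
            Map<String, String> explicit = explicitMappings.get(type);
            if (explicit != null) {
                return explicit;
            }
            // Default: implicit mapping by naming convention, via reflection.
            Map<String, String> implicit = new LinkedHashMap<>();
            for (Field f : type.getDeclaredFields()) {
                int mods = f.getModifiers();
                if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || f.isSynthetic()) {
                    continue;
                }
                implicit.put(f.getName(), type.getSimpleName() + "." + f.getName());
            }
            return Collections.unmodifiableMap(implicit);
        }
    }

    static class Person { String firstName; String lastName; }

    // Hypothetical class whose state lives in two legacy tables.
    static class Customer { String name; String creditLimit; }

    public static void main(String[] args) {
        MappingRegistry registry = new MappingRegistry();
        registry.register(Customer.class, Map.of(
                "name", "CUST_MASTER.CUST_NM",
                "creditLimit", "CUST_CREDIT.LIMIT_AMT"));
        System.out.println(registry.getMappings(Person.class));
        System.out.println(registry.getMappings(Customer.class));
    }
}
```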
A second issue that must be addressed in an O/R mapping framework is object dirtiness. An object is said to be dirty when a change has been made to the application's copy since it was first accessed or loaded. (In other words, the application's view of the object's state is different than the original view taken from the database.) At commit time, all dirty objects need to have their state flushed back into the database. (This is the value added by entity beans.)
The simplistic approach is to flush the state of all persistent objects back into the database, but this hurts scalability because you are doing useless rewrites of non-dirty objects. For example, say you have a collection of 500 products that has been instantiated in the server. A user changes one product. If you flush back all 500 products, you could be taking a substantial performance hit. The right solution is to write back only the dirty object. But how do you manage this?
First, objects must be able to detect that they are dirty. Second, the application must be able to track the set of objects dirtied in a particular user's transaction. Here's one way to do this:
- Go to the parent class of all the domain objects (oh, by the way, you should have one of these guys and its name should be something like DomainObject).
- Add a boolean field called isDirty and set its default to false.
- Define a method on DomainObject called markDirty().
- In the method, set isDirty to true and include logic to add "this" to the dirty pool (we're coming to that).
- Create a class called DirtyPool with a field containing a HashSet. Implement methods on it to add an object to or remove one or all from the HashSet. (We have put dirty pools in thread local variables, which seems to work out well.)
- Now, for each domain object, make its persistent fields private and write getter and setter methods for them. In the setter method, call markDirty(). The object is then marked and added to the DirtyPool.
Now, at any point during execution, all the currently dirty objects are contained in the DirtyPool. At commit time, the application can remove the objects from the DirtyPool and flush their state back into the database. Et voilà, we have solved the problem! (Well, okay, we've given you the basic approach. Admittedly, the devil is in the details.)
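The recipe above translates fairly directly into Java. The sketch below is one possible reading of it, with hypothetical class names (DomainObject, DirtyPool, Product) and the actual UPDATE statement stubbed out. It relies on the default identity-based equals() and hashCode(), since a HashSet misbehaves if an element's hash code changes while the element is in the set.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DirtyTrackingDemo {

    // Per-thread pool of objects dirtied in the current transaction.
    static final class DirtyPool {
        private static final ThreadLocal<Set<DomainObject>> DIRTY =
                ThreadLocal.withInitial(HashSet::new);

        static void add(DomainObject o) { DIRTY.get().add(o); }

        static void remove(DomainObject o) { DIRTY.get().remove(o); }

        // Removes and returns every dirty object (used at commit time).
        static List<DomainObject> removeAll() {
            Set<DomainObject> dirty = DIRTY.get();
            List<DomainObject> result = new ArrayList<>(dirty);
            dirty.clear();
            return result;
        }
    }

    // Common parent of all domain objects.
    abstract static class DomainObject {
        private boolean isDirty = false;

        protected final void markDirty() {
            if (!isDirty) {
                isDirty = true;
                DirtyPool.add(this);
            }
        }

        final boolean isDirty() { return isDirty; }

        final void markClean() { isDirty = false; }
    }

    static final class Product extends DomainObject {
        private final long oid;
        private String name;
        private long priceCents;

        Product(long oid, String name, long priceCents) {
            this.oid = oid;
            this.name = name;
            this.priceCents = priceCents; // loading does not mark the object dirty
        }

        String getName() { return name; }
        void setName(String name) { this.name = name; markDirty(); }
        long getPriceCents() { return priceCents; }
        void setPriceCents(long priceCents) { this.priceCents = priceCents; markDirty(); }
    }

    // Commit: flush only the dirty objects; the SQL UPDATE itself is stubbed out.
    static int commit() {
        int flushed = 0;
        for (DomainObject o : DirtyPool.removeAll()) {
            // A real framework would write o's state to the database here.
            o.markClean();
            flushed++;
        }
        return flushed;
    }

    static List<Product> loadProducts(int count) {
        List<Product> products = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            products.add(new Product(i, "Product " + i, 100L + i));
        }
        return products;
    }

    public static void main(String[] args) {
        List<Product> products = loadProducts(500);
        products.get(17).setPriceCents(1_999);
        System.out.println("flushed " + commit() + " of " + products.size()); // flushed 1 of 500
    }
}
```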
Object Identity
Object identity becomes important in O/R mapping. To understand why, consider the following example. Suppose you have three objects in the system: A, B, and C. A and B both have references to C. The application maps in A, and because A contains C, C is also instantiated. Next, the application maps in B, and because B also contains C, the application instantiates a new copy of C. Now, there are two copies of C in the server in two different object networks (A-C and B-C). This is inefficient, and the identity of C has been compromised: which is the "real" C? Depending on the application's design, it could lead to failed transactions or inconsistent results. We want a single copy of C with both A and B referencing it.
We can create a shared copy of an object by maintaining a registry of persistent objects that are currently in the server. When an object is brought in by the O/R mapping machinery, the machinery registers the object in the registry using its oid. To return to our example above, before loading in A, we would look up A's oid in the registry to see if A is already present. If it's not present, we map in A and register it. But now we recognize that we need C. It's also not in the registry, so we map and register it. We do the same for B, but in mapping B we check the registry, which recognizes that C is already present and passes back a reference to it for B. Using this scheme, we have only a single copy of C with A and B correctly pointing to it. Gosh, isn't object identity handy?
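A hypothetical sketch of the registry approach, using the A, B, and C example above: a hard-coded map stands in for the database, and each object is registered before its references are followed, which also keeps cyclic references from recursing forever. The names ObjectRegistry, Node, and mapIn are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class IdentityRegistryDemo {

    static final class Node {
        final long oid;
        final String name;
        Node child; // A and B both refer to C

        Node(long oid, String name) {
            this.oid = oid;
            this.name = name;
        }
    }

    // Registry of persistent objects currently in the server, keyed by oid.
    static final class ObjectRegistry {
        private final Map<Long, Node> byOid = new HashMap<>();

        Node lookup(long oid) { return byOid.get(oid); }
        void register(Node node) { byOid.put(node.oid, node); }
        int size() { return byOid.size(); }
    }

    // Stand-in for the database: oid -> name, and oid -> oid of the referenced object.
    static final Map<Long, String> NAMES = Map.of(1L, "A", 2L, "B", 3L, "C");
    static final Map<Long, Long> REFERENCES = Map.of(1L, 3L, 2L, 3L);

    // Maps in an object, reusing any copy already present in the registry.
    static Node mapIn(long oid, ObjectRegistry registry) {
        Node existing = registry.lookup(oid);
        if (existing != null) {
            return existing;
        }
        Node node = new Node(oid, NAMES.get(oid));
        registry.register(node); // register before following references
        Long childOid = REFERENCES.get(oid);
        if (childOid != null) {
            node.child = mapIn(childOid, registry);
        }
        return node;
    }

    public static void main(String[] args) {
        ObjectRegistry registry = new ObjectRegistry();
        Node a = mapIn(1L, registry);
        Node b = mapIn(2L, registry);
        System.out.println("A and B share one C: " + (a.child == b.child)); // true
        System.out.println("objects in registry: " + registry.size());     // 3
    }
}
```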
A Shameless Commercial Moment
Hopefully the discussions above have given you some feeling for the complexities of building an O/R mapping framework, and some workable solutions. As object bigots, we cannot resist noting that if you use an object storage capability, such as GemStone/J's persistence cache, you totally bypass all these issues.
Where Should You Go from Here?
This paper has touched briefly on some of the most important design issues in building scalable distributed object systems. Remember, most of your design challenges come down to this: if you have a distributed application, things have to interoperate. Stimulus comes from one place and response from another. There are two ways to handle this. One is to remotely manipulate objects in the server. The other is to distribute data and put more responsibility on the client. Remote invocations are not cheap, but there is also a cost to passing state out to clients. The trick is to find the balance. So, go sparingly amid the noise and haste, and make each process do its logical part. Be parsimonious about handing out remote references. Be parsimonious about handing out state.
The design issues we've covered are the tip of the iceberg. So, where do you go from here? We suggest you use the topics in this paper as a checklist for your project. Start by reviewing your intended architecture:
- Partition the system into the five layers we've suggested and consider which objects should live where. Identify needed services and the service managers they should live in.
- Define the services API.
- Review the nature of your transactions and decide how (or if) you need to detect write-write conflicts and how you will do that.
- Examine your domain model and determine what objects need to be distributed and how. Should you distribute the entire object? A state holder? A distribution-only object that derives state from several different domain objects?
Determine the scaling needs of your system:
- How many concurrent users must you support?
- Can you go with stateful beans?
- If you use stateless beans, will you need to pool them?
- Do you need to pool database connections?
- How will you handle large query results?
This is all infrastructure needed to scale the system. Identify what you need and get it built and tested early on.
Next, consider what will be your object persistence mechanism:
- An RDBMS? An object database?
- If it's an RDBMS, your entire approach to O/R mapping needs to be thought through. Will you build it? Will you buy it?
- If you're building on top of legacy databases, how will you map the needed object state from the existing tables?
- If you are minting a new database, design the RDBMS schema to smoothly support the object model.
In Conclusion
If you take the time to think through the answers to these questions, you will be well on the way to designing a scalable, maintainable J2EE e-commerce system. These systems are powerful and complex, your business results ride on them, and you will live with the results of your design decisions for many years to come. Therefore, there's no design decision so small it's not worthy of careful consideration. So start thinking....
And if you need help, just give GemStone Professional Services a call.
Glossary
- Dirty object: An object held by an application whose state has changed since it was first loaded from the database. (In other words, the application's view of the object's state is different than the original view taken from the database.)
- Domain: The real-world environment modeled by a business application.
- Domain object: Objects representing things in the business domain, such as people, products, bank accounts, production lines, vehicles, etc.
- HTML: HyperText Markup Language. A markup language used to control the presentation of content in a Web page. HTML is an SGML (Standard Generalized Markup Language) DTD (Document Type Definition).
- HTTP: HyperText Transfer Protocol. A protocol used to exchange information between a Web browser and a Web server.
- JSP: JavaServer Pages. A specialized form of HTML page that supports tags to call out to JSP beans to generate dynamic page content. JSPs effectively separate HTML from the business logic needed to generate dynamic content.
- JSP bean: A specialized Java bean invoked by a JSP page.
- Metadata: Information describing how a particular field in a class is mapped into a column of a table in a database.
- Object graph: A set of connected objects. An object graph starts with a selected object and contains objects to which that object holds direct or indirect references.
- OODB: Object-oriented database. A database that persists objects in their native form, avoiding the need for object-to-relational mapping.
- RDBMS: Relational database management system. A database that stores its information in tables of rows and columns.
- Semantic validation: Validation that an input value makes sense in the context of the application. For example, the presentation layer of a system may check that the input to a field is a well-formed string (syntactic validation). The domain layer does semantic validation to confirm that the string is a valid account number or product code.
- Serialization: Conversion of an object to a serial stream of data for distribution over a network.
- Servlet: A Java program that implements the Java servlet interface. Servlets are run in servlet engines to produce pages with dynamic content. In contrast to JSPs, servlets are raw Java and generate their HTML output through manipulation and concatenation of strings.
- Transitive closure: The complete set of all the objects to which an object holds references, either directly or indirectly. (In other words, a Person object may hold references to several Child objects, each of which holds references to School objects, which hold Address objects, etc. The transitive closure of the Person object includes all of these.)
- Use case: A task that is part of the business process, for example, renewing insurance policies in an insurance application.
- Workstep: Work that is done within the context of a single database transaction. In the long transaction model, workstep and use case tend to coincide. In the short transaction model, the use case must be broken into two or more worksteps.
Roll Credits....
The Advanced Application Architecture Team will be bringing more good stuff your way soon. Visit the GemStone Web site and order the Developer's Guide. We will be publishing more design information and other developer goodies on a continual basis. Soon our entire pattern language will be available on-line.
One last note about our team: we're consulting road warriors, and we like to help people build good systems. So call us any time. Maybe we'll discover some new best design practices together.
Contributing Team Members
BooBoo (a GemStone family dog)
For More Information
For more information about GemStone/J, please call 800-243-9369.
Footnotes
1. You can get a copy of "A Developer's Guide to iCommerce Success with J2EE" at GemStone's Web site.
2. In the database world, the phrase "long transaction" sometimes has another meaning. In that case, it refers to a mode of operation where data is essentially checked out of the database to be worked on for an extended period of time, perhaps days or weeks. An example is a CAD system in which users check out circuit board designs.
3. The programmer can change the serialization scheme on a class-by-class basis by defining private readObject() and writeObject() methods. However, each object still has one and only one serialization behavior; it cannot be serialized one way for one purpose and another way for another.
4. Other names commonly associated with such objects are "data transfer objects" and "replicates." Monson-Haefel calls them "bulk accessors."
5. There are many choices as to where to save state, each with a different degree of scalability. Perhaps the most scalable solution is to push state out to the client and pass it back in to the server when you need it. (This is also the hardest to implement.) The worst case is when the saved state has to be persistent. (For example, Amazon.com saves the contents of your shopping cart for 30 days.) What's important is not to save state if you don't have to. If you do have to save state, avoid making the saving transactional if you can.
6. An alternative approach is to re-issue the query, picking up from the last item streamed to the client on the previous call. This approach is more stateless. However, queries can also be very expensive, especially when tens of thousands of clients are issuing repeated queries.
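The per-class customization in footnote 3 can be sketched in Java. The Customer class below is hypothetical. It defines the private writeObject and readObject hooks that java.io serialization calls, and uses them to send a count of orders instead of the order list itself. Because the hooks belong to the class, they run for every Customer that is serialized, and no caller can request a different form. That is the limitation footnote 3 describes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class CustomSerializationDemo {

    /** Hypothetical domain class whose order history is summarized, not shipped. */
    static class Customer implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String name;
        // Potentially large part of the object graph; transient, so never written by default.
        private transient List<String> orderIds = new ArrayList<>();
        // Summary that writeObject sends in place of the full list.
        private transient int orderCount;

        Customer(String name) {
            this.name = name;
        }

        void addOrder(String orderId) {
            orderIds.add(orderId);
            orderCount = orderIds.size();
        }

        String name() { return name; }
        int orderCount() { return orderCount; }
        int orderIdsHeld() { return orderIds.size(); }

        // Invoked by ObjectOutputStream for every Customer it writes.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();      // writes the non-transient fields (name)
            out.writeInt(orderIds.size()); // then a summary instead of the whole list
        }

        // Invoked by ObjectInputStream; must read back exactly what writeObject wrote.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            orderCount = in.readInt();
            orderIds = new ArrayList<>(); // field initializers do not run during deserialization
        }
    }

    /** Serializes obj to bytes and reads it back, standing in for a network hop. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Customer original = new Customer("Ada");
        original.addOrder("A-1");
        original.addOrder("A-2");
        Customer copy = roundTrip(original);
        // prints: Ada 2 0
        System.out.println(copy.name() + " " + copy.orderCount() + " " + copy.orderIdsHeld());
    }
}
```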
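The re-issued query in footnote 6 can be sketched as keyset-style paging. The client sends back the last key it received, and the server runs the query again starting just past that key, keeping no cursor between calls. In this sketch, a TreeSet of product codes stands in for the database, and the class name, method name, and data are hypothetical. Against an RDBMS, the equivalent would typically be a query of the form WHERE code > ? ORDER BY code with a row limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class RequeryPagingDemo {

    /** Stand-in for a database table of product codes, kept sorted (hypothetical data). */
    private final NavigableSet<String> productCodes = new TreeSet<>();

    RequeryPagingDemo(List<String> codes) {
        productCodes.addAll(codes);
    }

    /**
     * Re-issues the query on every call: returns up to pageSize codes that sort strictly
     * after lastSeen, or from the beginning when lastSeen is null. The server keeps no
     * cursor between calls; the client carries lastSeen.
     */
    List<String> nextPage(String lastSeen, int pageSize) {
        NavigableSet<String> remaining =
            (lastSeen == null) ? productCodes : productCodes.tailSet(lastSeen, false);
        List<String> page = new ArrayList<>();
        for (String code : remaining) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(code);
        }
        return page;
    }

    public static void main(String[] args) {
        RequeryPagingDemo server =
            new RequeryPagingDemo(Arrays.asList("P05", "P01", "P04", "P02", "P03"));
        String lastSeen = null;
        List<String> page;
        // Prints [P01, P02], then [P03, P04], then [P05].
        while (!(page = server.nextPage(lastSeen, 2)).isEmpty()) {
            System.out.println(page);
            lastSeen = page.get(page.size() - 1); // the client remembers where it stopped
        }
    }
}
```

Each call repeats work that a server-side cursor would otherwise have saved. Footnote 6 warns about this cost when many clients page through large results.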
Copyright 2000, GemStone Systems, Inc.