iCommerce
Design Issues and Solutions
or
How to Build a Really Big Distributed J2EE System Using Tools
You Have Around the Office
by:
The Advanced Application Architecture Team (A3T)
GemStone
Professional Services White Paper Series
"Your mission, should you decide to accept"
Your company has been sailing along at its current course and speed for quite
some time. Suddenly, the seas change: the Internet vortex looms
on the horizon. Your traditional way of doing business is now threatened.
Your company desperately needs "Web presence." This desperation is quickly
translated into a need for complex, scalable e-commerce applications.
There is talk of tens of thousands of Web hits per hour, thousands of
concurrent users, on-line catalogs, shopping carts, e-mail, dynamic content,
cookies (not the chocolate chip kind).... Persons high on the technonerdity
scale suddenly appear in the corporate hallways. (Boggle factor cuts
in here and the mind implodes.)
Question:
How do you take all the technobabble and all the business needs and actually
build the e-commerce system? That is the subject of this paper. Whether
you're a manager, designer, or down-in-the-bits developer, we hope this
paper will help you understand the issues involved in building large,
scalable Java 2 Enterprise Edition (J2EE) e-commerce systems, and that
it will familiarize you with some design patterns to address those issues.
We'll start at a high level and spiral down, so on any section, just read
for as much detail as you need. The idea is to jumpstart your system planning
and design and help you avoid several traps we've learned about
through experience. Read on, and hopefully you'll find some answers to
questions you don't yet know you have.
Who Are We, Anyway?
GemStone has been in the distributed object world for well over a decade,
first with Smalltalk and now with Java. This document and FoodSmart,
the archetype J2EE application on the "Developer's Guide" CD [1],
are brought to you by GemStone's Advanced Application Architecture Team
(A3T), a group of dedicated folks with rich experience in the design of
distributed object systems. In designing FoodSmart, we took our collective
experience with distributed systems of all kinds and "translated" it into
appropriate Java paradigms and patterns, discovering some new patterns
along the way. The ideas in this paper are a stake in the ground. They
are offered to the technical community as a basis for design and discussion,
in the belief that they represent a good starting point in the development
of large-scale, distributed e-commerce J2EE systems.
What Makes Large-scale Internet Commerce Systems Complex?
We all love the idea of big distributed systems, right? They're what "enterprise"
computing is all about. They provide lots of different people a consistent
view of the business at the same time. They help us collaborate, streamline
processes, grow our business with fewer resources. And, joy of joys, they
open up new business opportunities, the most notable of which is the possibility
of making money in new ways over the Internet.
So we
just quickly produce a nice big distributed system and we're done, right?
Wrong. There's an ugly truth out there: performant, large-scale systems
are complex to design and implement. This was true in the mainframe days
of on-line transaction processing (OLTP), and it's still true in the brave
new world of e-commerce and distributed object systems. Even with the
power of J2EE, large-scale, transactional Web-based systems present complex
technical issues that must be addressed carefully.
What
makes for the complexity in these systems? The fact that they must tie
together information and applications from many business systems and processes
in a consistent fashion, quickly and in real time, for many concurrent
users. Let's consider how such a system serves its clients. Ultimately,
e-commerce applications result in a stream of HTML (and perhaps JavaScript
code) being delivered to a client Web browser. The challenge is in how
that delivery takes place. Here's a nutshell overview of the operations
and technologies involved:
- Modern e-commerce
sites generate many of their pages dynamically, using content assembled
from a variety of sources. For example, a retail site page might present
product descriptions and images from one or more databases, pricing
calculated by business applications, and availability and shipping information
from back-end business systems. In J2EE, the dynamic content is generated
by Java servlets or Java Server Pages (JSPs), so a servlet or JSP must
be initiated when a client hit arrives at the Web server (see Figure
1).
- To generate its
dynamic content, the servlet must either access data directly out of
a database or delegate these responsibilities to an application component.
Invoking a component incurs the overhead of component creation.
- Pulling data out
of the database involves querying, transporting the result set into
the object world across a JDBC interface and doing the O/R (object to
relational) mapping necessary to populate the needed objects.
- Depending on the
application, it might be necessary to retrieve data from one or more
legacy systems - each represented by a separate interface requiring
more component creation.
- The hit might result
in the need to commit data into the database. If several backend data
sources are involved, this would incur the extra overhead of a distributed
two-phase commit.
- Finally, methods
in the objects are exercised to produce the HTML stream in the servlet
that must then be routed back through the Web server and across the
network to the client.

Figure 1. Servicing a Web Hit: Bird's Eye View
As you
can see, servicing a Web hit is a lot of work. A large system incurs this
cost for each Web hit that requires dynamic content, so if the system
is required to support large numbers of users, scalability can become
an enormous challenge. Figure 2 shows how the complexity of an e-commerce
system increases with application complexity and scalability requirements.

Figure 2. Large Scale == Complex
To be
scalable, systems must constantly juggle the resources available to service
clients (CPU, memory, database connections, network bandwidth, etc.).
The more clients, the more difficult this juggling becomes. To handle
very large numbers of clients, resources must be shared, but sharing resources
adds complexity to the application.
- To speed data
access, systems must multiplex database connections and cache frequently
used data for sharing among users.
- To speed processing,
systems must carefully partition computational responsibilities between
clients and servers to maximize CPU potential.
- To distribute processing
responsibilities, systems must span multiple Java virtual machines (VMs)
and multiple physical machines.
- All of the above
design factors introduce issues of synchronization and concurrency,
object distribution and data integrity.
Somehow,
all these distributed resources must be coordinated to play together nicely.
To build a large-scale system, you really have to understand where the
performance hot spots are and how to address them. This is what makes
it so challenging to build scalable distributed systems, and why a good
design and a robust technology platform are critical.
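As one illustration of the resource sharing described above, here is a minimal sketch of multiplexing a few expensive resources (such as database connections) among many callers. The class name ResourcePool and its methods are hypothetical; a real application server supplies its own pooling, and this sketch only shows the borrow-and-return discipline.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool: many callers share a few expensive resources.
final class ResourcePool<T> {
    private final BlockingQueue<T> idle;

    ResourcePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Borrow a resource, run the work, and always hand the resource back.
    <R> R withResource(Function<T, R> work) {
        T resource;
        try {
            resource = idle.take(); // blocks while every resource is in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        try {
            return work.apply(resource);
        } finally {
            idle.add(resource); // capacity is guaranteed because we took one out
        }
    }

    int idleCount() {
        return idle.size();
    }
}
```

Because the resource is returned in a finally block, a failing request cannot leak a connection, which is the property that matters most when thousands of clients share a small pool.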
Why J2EE?
Why use J2EE and its Enterprise JavaBeans (EJB) components at all? Good
question. We've chosen this model because J2EE is intended to be a complete
platform for Web-enabled, multi-tier, secure, transactional Java applications.
The goals of J2EE include better quality, maintainability, and portability
of systems and increased productivity and economic return for businesses.
J2EE
is based on the component usage model. It provides complete services for
components and automatic handling of some application behaviors (such
as declarative transactions). The promise of the J2EE standard is that
third-party vendors will be able to market quality components that businesses
can buy and use to build systems faster and more cost effectively than
if they had to build their own infrastructure.
In this
paper, we are concerned with four major J2EE component types:
- Session beans (a
type of EJB)
- Entity beans (another
type of EJB)
- Java Server Pages
(JSPs)
- JSP beans
Session
beans are typically used to model business processes or tasks. A session
bean might, for example, model a set of e-mailing services or credit card
validation services. Entity beans more often model business objects in
the domain. An entity bean might represent a bank account, a customer,
a piece of inventory, etc. Entity beans provide a set of methods that
allow the state of the business objects to be managed throughout the bean's
lifecycle.
JSPs
are like HTML text combined with statements in a mark-up language.
JSPs contain special tags that allow them to invoke a JSP bean. JSP beans
are used to generate dynamic content that is returned to the Java Server
Page to be included in the stream of HTML that the JSP ultimately sends
back to a browser.
The roles
of all these components in a J2EE application will become clear as we
explore the design issues in this paper.
The Anatomy of a J2EE Web Hit
Figure 3 is a high-level architectural view of an e-commerce application
built with the GemStone/J application server with J2EE functionality.
This is one architecture recommended by the Architecture team (there are
others). Note the use of JSPs, JSP beans, and EJBs.

Figure 3. A J2EE Application Architecture
We'll
use this architecture to track the progress of a Web hit through the system
in more detail and understand what resources are used and/or consumed
in the process.
Let's
assume a customer is at an on-line bookstore. A Web hit begins in the
browser. In our scenario, the customer has found a book she wants to buy,
so she clicks on a button to put it in her shopping cart. This, of course,
results in an HTTP hit. Now the fun starts....
- When the customer
clicks on the "add to shopping cart" button, the browser creates a target
URL, appends any needed parameters (in this case, an identifier for
the book), then sends the packet over the network wrapped in the HTTP
protocol.
- When the packet
arrives at the Web server, it is unwrapped and examined. The server
recognizes it to be a request for a JSP, so it passes this request and
its parameters on to the servlet engine for processing.
- The servlet engine
finds the relevant JSP (compiling it if necessary) and spawns it in
a thread inside its Java VM.
- As the JSP executes,
it creates a JSP bean and then delegates the request to it.
- The JSP bean, in
turn, invokes the bean home of an EJB and obtains a session bean
(perhaps a ShoppingCartManager bean). It then invokes a business method
(putInCart) on the session bean.
- The session bean
interacts with an RDBMS to obtain the state information for business
objects of interest (the customer's shopping cart), instantiates them,
populates their state (or, alternatively, they might be pulled out of
an object cache) and invokes relevant business methods on these objects
(addToCart, updateOrder, etc.).
- Once the business
methods complete, the flow of information reverses. The shopping cart
returns a business object or objects (the current cart contents) back
to the session bean, ShoppingCartManager.
- The session bean
either returns this directly to the JSP bean or re-maps the information
into a more suitable form before returning it.
- The JSP bean takes
the object(s) and returns them or some state information derived from
them to the JSP.
- The JSP incorporates
this information into the HTML stream that it is generating.
- Ultimately, the
generated HTML is streamed back over the network to the browser, which
then renders the result on the screen (the shopping cart contents).

Figure 4. Servicing a Web Hit: J2EE View
As you
can see in Figure 4, a J2EE Web hit has many functional layers, many moving
parts that must correctly interact. Each layer must be designed to implement
a specific set of responsibilities and have a clearly defined API. Within
the layers, the designer must partition these responsibilities, delegate
them to relevant objects, and coordinate resource and data usage to ensure
scalability and data integrity. To build a scalable system, you need a
distributed object architecture that is designed for performance.
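The delegation chain traced above can be sketched in plain Java. This is not servlet or EJB code: CartPage, CartPageBean, and the method handleAdd are hypothetical stand-ins for the JSP and JSP bean, while ShoppingCartManager, putInCart, and addToCart follow the names used in the walkthrough. Container services (home lookup, transactions, the RDBMS) are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Domain object holding the cart contents.
final class ShoppingCart {
    private final List<String> bookIds = new ArrayList<>();
    void addToCart(String bookId) { bookIds.add(bookId); }
    List<String> contents() { return new ArrayList<>(bookIds); }
}

// Plain-Java stand-in for the session bean.
final class ShoppingCartManager {
    // In a real system this state would come from the RDBMS or an object cache.
    private final ShoppingCart cart = new ShoppingCart();

    List<String> putInCart(String bookId) {
        cart.addToCart(bookId);
        return cart.contents(); // flow reverses: cart contents go back up
    }
}

// Stand-in for the JSP bean: delegates, then reshapes the result for the page.
final class CartPageBean {
    private final ShoppingCartManager manager;
    CartPageBean(ShoppingCartManager manager) { this.manager = manager; }

    String handleAdd(String bookId) {
        return String.join(", ", manager.putInCart(bookId));
    }
}

// Stand-in for the JSP: merges the bean's output into the HTML stream.
// No HTML escaping is done here; the sketch assumes safe identifiers.
final class CartPage {
    static String render(CartPageBean bean, String bookIdParam) {
        return "<html><body><p>Cart: " + bean.handleAdd(bookIdParam) + "</p></body></html>";
    }
}
```

Each class talks only to the one directly below it, which is the property the layered design in the following sections relies on.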
Distributed Object Systems Issues
In the section above, we've explored the software gymnastics of a Web
hit. In this section, we'll identify the major design issues that arise
when you sit down to implement the gymnastics. Our discussion is organized
around the following topics:
- Architectural approach
- Transactional model
- Object state distribution
- Object identity
- Scaling techniques
- Object-to-relational
mapping
Of course,
there are many other issues (you're probably thinking of a few right now),
and all design issues are important. However, we believe that the
issues addressed here have the most impact on total system viability.
Architectural Approach
Software architecture is the bones of a system. It gives the system shape
and ultimately constrains it in many dimensions. A good architecture makes
a system scalable, extensible, and maintainable.
Why Layered Architecture?
With all that's been written on software architecture in recent years,
one principle that seems to be generally accepted is the concept of layered
architectures, the separation of system responsibilities into functional
layers, each with its own responsibilities and its own API. Layering achieves
system flexibility in three ways:
- Encapsulation: Each layer can hide details about its operations from other layers.
Thus the layer can evolve as needed behind a fixed API without affecting
its clients.
- Separation of concerns: Complexity in the system is easier to manage
because each layer is focused on a cohesive set of responsibilities.
- Reuse: Adding functionality is faster, because each layer
can provide services to objects in the layer above it. Furthermore,
classes in a given layer can inherit reusable behavior from a superclass,
thus abstracting the responsibilities of classes of that layer.
Layered
architecture leads to a more flexible and maintainable system. Layers
may be quite thin and have very little impact on system performance. Layers
can be changed with no effect on other layers, as long as the API remains
constant. If you've designed well, you could easily swap out an entire
layer to integrate a new data source or take advantage of a new technology.
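A minimal sketch of the "fixed API" idea follows. The interface CustomerStore and both classes are hypothetical names invented for illustration; the point is only that the upper layer is written against the interface, so the implementation behind it can be replaced without touching its clients.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical persistence-layer API; upper layers depend only on this interface.
interface CustomerStore {
    void save(String id, String name);
    String findName(String id);
}

// One possible implementation. A JDBC-backed class could replace it
// without any change to the code above this layer.
final class InMemoryCustomerStore implements CustomerStore {
    private final Map<String, String> rows = new HashMap<>();
    public void save(String id, String name) { rows.put(id, name); }
    public String findName(String id) { return rows.get(id); }
}

// Services-layer class written against the API, not the implementation.
final class CustomerService {
    private final CustomerStore store;
    CustomerService(CustomerStore store) { this.store = store; }

    String register(String id, String name) {
        store.save(id, name.trim());
        return store.findName(id);
    }
}
```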
J2EE Layering
We believe a layered architecture is a good thing. Now, what is the right
layer stack for a J2EE e-commerce application? Figure 5 represents our
answer to this question. We recommend that the stack consist of five layers:
presentation, application, services, domain, and persistence. These layers
are physically split across the client and the server, and they are logically
partitioned into the J2EE Web container, EJB container, and the database.
The responsibilities of each layer are briefly summarized below.

Figure 5. J2EE Layering
Presentation Layer
The presentation layer manages I/O interactions with the user. This
layer renders HTML, presents application data, intercepts user input and
does rudimentary application-specific range and syntactic checking on
it. In a J2EE application, this layer executes in a Web browser. Behavior
is either native to the browser or supplemented by JavaScript in the HTML
stream. (Note that J2EE supports other types of clients, such as applets,
applications or CORBA clients. In this paper, we are focusing on Web clients.)
Application Layer
The application layer mediates the interaction between the presentation
and services layers. Services and domain objects in the lower layers may
be shared among multiple applications. This layer calls services to implement
the behavior of each individual application. Its primary responsibilities
are to adapt the distributed representation of the domain to the user
interface, to maintain conversational state for the presentation layer,
and to handle exceptions that occur during service invocation and that
need to be presented to the user.
Services Layer
The services layer provides an API to the business use cases and utility
operations required by the application. The services manipulate the domain
objects and store and retrieve data, as appropriate, for the application.
Additionally, the services layer is responsible for converting objects
into their distributable representations. (See the discussion of service-based
architecture below for more detail on this.)
Domain Layer
The domain layer models the abstractions in the application's problem
domain (for example, in an order entry system we have Orders, Products,
Vendors, etc.). Business rules and semantics are embedded in the domain
objects in this layer. This layer is responsible for the enforcement of
business rules and process; therefore, semantic validation of new information
takes place here.
Persistence Layer
The persistence layer provides the mechanisms necessary to permanently
save object state. It provides basic CRUD (create, read, update, delete)
services and also deals with the object-to-relational mapping issues.
If persistence mechanisms other than relational databases are a possibility,
then very simple high-performance alternatives may be considered, e.g.,
GemStone/J's persistent cache.
Service-Based Architecture
One key advantage to this layering model is that it enables the creation
of a service-based architecture. In a service-based architecture, groups
of operations or behaviors are clustered together in the services layer
under an API called a service object (an EJB session bean). Each
service bean provides a suite of methods whose semantics are designed
around a single "theme". For example, consider a financial application
that must deal with major domain subjects such as accounts, customers,
etc. As indicated in Figure 6, this application might use services such
as AccountManagementService or CustomerManagementService.
The theme of each service may be one of the major abstractions in the
domain model (in this case, accounts or customers).
However,
not all service objects take the lifecycle of a domain object as their
theme. Another service object might encapsulate an external interface
to a legacy system or an external utility (such as mail or messaging),
or provide an essential singleton service, such as creating a timestamp
or an object ID. Still another might implement a cluster of use cases.
For services whose theme is a domain object, the service provides methods
that permit an application to manage the complete lifecycle of the domain
object (for example, createNewCustomer, deleteCustomer, modifyCustomer,
findCustomer, etc.).

Figure 6. Service-based Architecture
What
are the advantages of a service-based architecture? First, the services
layer can be the encapsulation layer for the domain model. Clients interface
with the application domain model by asking for services, but they do
not touch actual domain objects. This has several ramifications:
- Services' methods
can take responsibility for transactions involving multiple domain objects.
This can lessen the need to replicate objects into the client and thereby
save on both processing and network bandwidth.
- Services permit
schema hiding, an important goal in designing a flexible system architecture.
When the client is shielded from the implementation details of the schema,
the schema can change without affecting the client's code. Thus the
schema can be changed as required to meet business or technical needs,
while the clients continue to do business as usual. This is especially
important in a distributed enterprise system, where one schema may support
many clients.
Schema hiding
can also improve performance by offering greater flexibility in object
distribution. When domain objects are accessed through the service
layer, a client has no idea whether the object it receives back from
a service is an actual domain object or a structure invented just
to return relevant state information. More importantly, the service
layer can control the distribution of object state to minimize network
traffic and optimize sharing of objects among users and applications.
- Many different
applications can share a suite of services, while also using services
that are unique to each application. Likewise, different kinds of clients
(for example, Web or Swing-based applications) can use the same services,
providing a consistent application architecture.
- Services also provide
a very high leverage test point. QA (quality assurance) suites can efficiently
and exhaustively test services to improve the quality of the application.
A careful layering
of the architecture and the development of a robust services layer will
form the foundation for a viable, extensible J2EE application.
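The encapsulation and schema-hiding points above can be sketched as follows. This is a plain-Java stand-in rather than a real session bean; AccountManagementService is the name used in the text, while Account, AccountSummary, the field names, and the transfer method are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Domain object: stays on the server side (hypothetical fields).
final class Account {
    final String id;
    long balanceCents;
    String internalRiskCode = "R7"; // schema detail that clients never see
    Account(String id, long balanceCents) { this.id = id; this.balanceCents = balanceCents; }
}

// Structure invented just to return relevant state to the client.
final class AccountSummary {
    final String id;
    final long balanceCents;
    AccountSummary(String id, long balanceCents) { this.id = id; this.balanceCents = balanceCents; }
}

// Plain-Java stand-in for a session bean acting as the service object.
final class AccountManagementService {
    private final Map<String, Account> accounts = new HashMap<>();

    AccountSummary createAccount(String id, long openingCents) {
        Account a = new Account(id, openingCents);
        accounts.put(id, a);
        return summarize(a);
    }

    // One coarse-grained call covers a change that spans two domain objects,
    // so neither object has to be replicated into the client.
    AccountSummary transfer(String fromId, String toId, long cents) {
        Account from = accounts.get(fromId);
        Account to = accounts.get(toId);
        if (from == null || to == null) throw new IllegalArgumentException("unknown account");
        if (cents <= 0 || from.balanceCents < cents) throw new IllegalArgumentException("invalid amount");
        from.balanceCents -= cents;
        to.balanceCents += cents;
        return summarize(from);
    }

    private static AccountSummary summarize(Account a) {
        return new AccountSummary(a.id, a.balanceCents);
    }
}
```

The client only ever holds an AccountSummary, so fields such as internalRiskCode can be added, renamed, or removed without affecting client code.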
Transactional Model
The choice of transaction model has strong implications for application
scalability and performance. The goal of transaction control is to permit
sharing of business objects and data while preserving data integrity.
However, an inefficient transaction model can leave critical data unavailable
for long periods and seriously degrade system performance. On the other
hand, inadequate control may lead to an inconsistent view of business
information among users and applications.
One of
the most basic and important decisions in designing an e-commerce system
is to choose between the long and short transaction models. Figures 7a
and 7b are visualizations of long and short transactions, respectively.
Long vs. Short Transactions
A long transaction involves the allocation of a
database connection for the duration of a business use case [2].
A user invokes the transaction through the application. The application
obtains a database connection and begins a transaction. The transaction
remains open while the user retrieves, reviews, and updates objects. The
transaction may involve multiple reads, updates, or object creations.
During these operations, whole data sets may be locked and hence,
unavailable to other users. Finally, when the user is satisfied with his
work, he signals the application that he is ready to commit. The
application issues a commit causing all the changes to be flushed into the
database. At this point, the session is able to release all the database
resources it was holding onto during the transaction.

Figure 7a. A Long Transaction

Figure 7b. A Short Transaction
In a short
transaction, in contrast, a single application use case must be split
across multiple database connection steps. Figure 7b represents the simple
case of two database connection steps. The use case starts with the
application going into transaction and reading the objects necessary to
carry out the task. Once the data is read, the application goes out of
transaction. The user then works with the objects outside of transaction,
applying modifications and adding new objects as necessary. When the user
is satisfied that the work is complete, he signals the application to
commit his changes. The application then goes back into a transaction,
checks for write-write conflicts (see discussion below), and commits the
changes into the database.
Write-Write Conflicts
Transaction control is one
of the greatest challenges in designing distributed, multi-user systems.
The real benefit of these systems is that users can share access to
business information in real time. The challenge is that when many people
or processes share access to business objects, it is possible for one user
or process to change the state of an object while it is in the process of
being used or changed by another, causing what we call a write-write conflict.
Let's look
at an example of a write-write conflict and consider the consequences.
Suppose we are designing an application that allows someone to update
their bank balance on line. We've used a short transaction model, so the
update is broken into two database transactions. Joe and Susan have a
joint account, which they can both access on line. So now, Joe uses our
application to update the account. First, the application reads all the
objects involved (Person, Address, CurrentBalance, etc.) and goes out of
transaction mode. Joe changes his address and deposits $10,000 to the bank
account. The application tracks the changes as they occur. When Joe is
ready to commit, the application goes back into transaction. When it
checks the database, it may find that Susan has also changed the account
objects in the interim. This is the write-write conflict problem.
When a
write-write conflict occurs, the application must be able to detect the
conflict and take the appropriate action. In some applications, we don't
care about write-write conflicts. For example, if two customer support
people were adding comments to the same trouble ticket, the comments could
both be appended, in whatever order, with no harm done. In the above case,
if Susan has changed only the phone number on the account, Joe may be able
to commit his changes with no serious consequences. If Susan deposited
$10,000 as well, and the application overwrites the new balance rather
than adding Joe's deposit to Susan's, Joe stands to lose a lot of money.
If Joe and Susan have both withdrawn money at the same time, they could
overdraw the account. Either way, if the application overlooks or
mishandles the write-write conflict, the result could make Joe and Susan
very unhappy.
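The overwrite scenario can be reproduced in a few lines. The sketch below is an in-memory illustration only: BalanceRow and LostUpdateDemo are hypothetical names, and the $5,000 opening balance is an assumed figure that does not come from the example above.

```java
// Hypothetical shared row holding the joint account balance in cents.
final class BalanceRow {
    private long cents;
    BalanceRow(long cents) { this.cents = cents; }
    long read() { return cents; }
    void overwrite(long newCents) { cents = newCents; } // no conflict check
}

final class LostUpdateDemo {
    // Returns the final balance after two overlapping, unchecked deposits.
    static long run() {
        BalanceRow row = new BalanceRow(500_000);   // assumed $5,000.00 opening balance

        long joeCopy = row.read();                  // Joe reads, then goes out of transaction
        long susanCopy = row.read();                // Susan reads the same state

        row.overwrite(susanCopy + 1_000_000);       // Susan commits her $10,000 deposit
        row.overwrite(joeCopy + 1_000_000);         // Joe commits over it, unaware

        return row.read();
    }
}
```

LostUpdateDemo.run() returns 1,500,000 cents ($15,000), although two $10,000 deposits on a $5,000 balance should leave $25,000. One deposit has silently disappeared.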
Transaction Model Tradeoffs
What are the tradeoffs for
each of these models? In short, long transactions are less scalable, while
short transactions can be more complex to program.
A long
transaction avoids write-write conflicts by "locking" (blocking access to)
objects that the application may change. Other processes have to wait
until the object is unlocked, so the system is less scalable. On the
positive side, building an application with long transactions is
appreciably simpler than using short transactions. The application does
not have to maintain update information, it can simply update the objects
and the database will maintain the transactional context. At commit time,
the database can do all write-write conflict detection.
A short
transaction reads the object in one transaction, does its work, and then
tries to commit its work, running the risk of write-write conflicts.
Therefore, applications using the short transaction model must track
changes to detect and respond to write-write conflicts. (One common
response is to abort the commit, report the error to the user, and cause
rework of the entire database workstep.) They cannot rely on the database
to do transaction control for them. This adds to the complexity of the
application but allows the application to minimize its usage of database
resources. In fact, short transactions are a necessary technique to allow
scalability in large-scale e-commerce applications.
As a
general rule, all but the smallest applications should plan on using short
transactions. For applications of a departmental scale (from tens to
perhaps low hundreds of users), long transactions may be workable.
For
applications built with EJBs, short transactions are the de facto model. EJB declarative transactions are
designed around the use of short transactions. Long transactions are
possible using EJBs, but the application can no longer take advantage of
the built-in EJB transaction management. Instead, the application must use
client-initiated transactions and the UserTransaction interface to manage its own
transactions, and it will pay penalties in system scalability.
Write-Write Conflict Detection
So how does an application detect write-write
conflicts? Before trying to commit changes, an application must re-read
the objects read earlier and determine whether they have been changed in
the interim.
There are
at least three techniques to determine whether objects have been
changed:
- Timestamps
- Counters
- State comparison
If
timestamps are used, an object's state information is expanded to contain
a field for the timestamp. As part of the commit process, the application
obtains a fresh timestamp and updates this field. So, when the object's
state is committed into the database, the timestamp field indicates the
time of the commit. To detect a write-write conflict, the application
re-reads the object and compares the timestamp in the newly read object
with the timestamp in the copy originally read. If they are the same, no
write-write conflict exists. If they are different, the conflict does exist
and remedial action must be taken. (Note that this only works if every
application that can change the object also changes the timestamp.)
Another technique, instead of re-reading the object, is to add the
timestamp field along with the primary key to the WHERE clause that is
being used in the UPDATE statement. If no rows are updated, then the row
no longer matches the timestamp (or that primary key was deleted). This is
more efficient than two I/Os to the database.
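The WHERE-clause technique can be sketched without a database. The following is a minimal in-memory stand-in, assuming hypothetical table and column names (account, balance_cents, last_modified) and a simple increasing counter in place of a real clock. With JDBC, the row count would be the value returned by executeUpdate.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for a table with columns (id, balance_cents, last_modified).
final class AccountTable {
    private static final class Row {
        long balanceCents;
        long lastModified;
    }

    private final Map<String, Row> rows = new HashMap<>();
    private long clock = 0; // hypothetical timestamp source; strictly increasing

    void insert(String id, long balanceCents) {
        Row r = new Row();
        r.balanceCents = balanceCents;
        r.lastModified = ++clock;
        rows.put(id, r);
    }

    long readBalance(String id) { return rows.get(id).balanceCents; }
    long readTimestamp(String id) { return rows.get(id).lastModified; }

    // Mimics: UPDATE account SET balance_cents = ?, last_modified = ?
    //         WHERE id = ? AND last_modified = ?
    // Returns the number of rows updated; 0 signals a conflict or a deleted row.
    int updateIfUnchanged(String id, long newBalanceCents, long expectedTimestamp) {
        Row r = rows.get(id);
        if (r == null || r.lastModified != expectedTimestamp) {
            return 0;
        }
        r.balanceCents = newBalanceCents;
        r.lastModified = ++clock; // every successful writer must refresh the timestamp
        return 1;
    }
}
```

If Joe and Susan both read the row and Susan commits first, Joe's call returns 0 because the timestamp he holds no longer matches, and the application can report the conflict instead of overwriting her deposit.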
The
counter approach is quite similar. The object contains a field for the
counter. Each time the object is successfully committed, the count is
incremented. When preparing for the commit, the application re-reads the
objects and compares the current counter value with the one in its
original copy of the object. Again, you must be sure that all applications
which can change the object will increment the counter.
The final
technique is the most complex. The object is re-read and then the state of
the newly re-read object is compared with the original copy. This might
involve comparing a number of fields and is therefore more expensive than
the previous techniques. Additionally, for this to work, the application
must keep its original copy of the object unchanged. This further
complicates the application since it must now accumulate changes to the
object in some other "container" and apply them at commit time. For these
reasons, we recommend timestamps or counters for most situations. A
variation on this technique is to compare individual attributes (columns) for
change; this adds overhead but provides the richness of
field-level detection rather than coarser-grained object-level detection.
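Field-level comparison might look like the sketch below. It assumes, purely for illustration, that an object's state is available as a map from field name to value and that both copies carry the same field names; StateComparison and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class StateComparison {
    // Returns the names of fields whose values differ between the copy read
    // at the start of the use case and the copy re-read just before commit.
    static List<String> changedFields(Map<String, Object> original, Map<String, Object> reread) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, Object> e : original.entrySet()) {
            if (!Objects.equals(e.getValue(), reread.get(e.getKey()))) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }

    // A pending change conflicts only if another writer touched the same field.
    static boolean conflicts(List<String> changedByOthers, Map<String, Object> pendingChanges) {
        for (String field : pendingChanges.keySet()) {
            if (changedByOthers.contains(field)) {
                return true;
            }
        }
        return false;
    }
}
```

With this approach, Susan changing only the phone number does not block Joe's balance update, whereas object-level detection with a timestamp or counter would flag the whole object as changed.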
Object State Distribution
At first glance, object
distribution doesn't seem like it should be much of a problem. You got
objects in the client, you got objects in the server. No big deal. But in
actuality, it's a very big deal. To a large part, it determines the
viability of a distributed multi-user system. Why? Let's explore some
of the issues.
For an
object to take up residence in a client it must be serialized (its state
converted to a network-transportable stream of bytes) and sent
across the network. This involves three resources in the system. First,
CPU and memory in the server are used to serialize the object and
packetize it for travel across the network. Second, network bandwidth is
used to actually send the packets to the client. Third, CPU and memory in
the client are required to unpacketize, deserialize, and reconstruct the
object graph. (We know, there's probably no such word as "unpacketize."
You know what we mean.) Hence, the movement of an object from server to
client comes at a fairly high cost of system resources.
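One way to get a feel for this cost is to measure how many bytes an object graph occupies once serialized. The sketch below uses standard Java serialization; SerializedSize, Cart, and CartItem are hypothetical classes invented for the measurement, and actual sizes depend on the classes involved.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class SerializedSize {
    // Serializes the object (and everything it references) and counts the bytes.
    static int of(Serializable object) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size(); // stream is closed and flushed by this point
    }
}

// Hypothetical domain classes used only for the measurement.
final class CartItem implements Serializable {
    private static final long serialVersionUID = 1L;
    final String title;
    final int quantity;
    CartItem(String title, int quantity) { this.title = title; this.quantity = quantity; }
}

final class Cart implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<CartItem> items = new ArrayList<>();
}
```

Calling SerializedSize.of on a cart with one item and again on a cart with fifty items shows the byte count growing with the size of the graph, which is the quantity that the server CPU, the network, and the client CPU each have to pay for.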
Let's step
back and look at the problem in the large picture. Suppose a large
e-commerce application is servicing 10,000 concurrent users. Objects must
be serialized and sent to each user. If the average object graph sent to a
client were 3 KB per Web hit, then to service all the clients for one hit,
the system would have to send 30 MB of information. Because large-scale
systems serve many users, relatively small increases in the amount of data
sent to any single client are magnified thousands of times. These seemingly
small increases can have a significant impact on total system throughput.
So, to
minimize serialization overhead, we should always invoke methods from the
client and have them execute using objects on the server, right? Sorry.
It's not quite that simple. The EJB short transaction model basically
precludes the use of remote domain objects (that is, objects that remain
strictly in the server and have their methods invoked remotely from the
client). The problem is that in the short transaction model, objects can
be created and used in a transaction, but once the transaction commits,
the objects are released. The only way to pass a remote reference to an
object is to wrap it in an entity bean. If you do that, you potentially
open yourself up to a long transaction model with all the scalability
problems that that creates.
The
following sections discuss the tradeoffs between serialization and remote
object references and other issues with state distribution, such as
security and schema hiding. Finally, we will present some general
guidelines for object distribution.
Serialization vs. Remote Object References
When an object is serialized
in Java, by default (see note 3), it is the transitive closure of the object that is
serialized. (The transitive closure is the complete set of all the objects
to which an object holds references, and all the objects referenced by
those objects, ad infinitum, ad nauseam.) This means that if you're not
careful, you might wind up serializing a lot more data than you intended.
Consider a Person object that has fields for Address and Name. The Address
object might contain a State object. The State object itself might be a
complex object that contains demographic information, and it might hold a
reference to a collection of all states to which it belongs, and so on. If
you attempt to serialize the Person object using the default mechanism, you
will wind up serializing this entire network of objects,
including the collection of all states and all the objects they reference
(see Figure 8). Clearly, this is undesirable in some cases and must be
managed.

Figure 8: Transitive Closure
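The effect of transitive closure on serialized size can be made concrete with a small sketch. The classes below are hypothetical stand-ins for the Address and State objects in the example (they are not taken from FoodSmart), and the helper simply counts the bytes produced by default Java serialization. Marking a reference `transient` is one standard way to prune the graph; it is shown here only as an illustration, not as the recommended policy for every case.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class modeled on the State object described in the text.
class State implements Serializable {
    final String name;
    final List<State> allStates; // back-reference to the collection of all states

    State(String name, List<State> allStates) {
        this.name = name;
        this.allStates = allStates;
    }
}

// Default serialization: the State and everything it references travels too.
class Address implements Serializable {
    String street;
    State state;

    Address(String street, State state) {
        this.street = street;
        this.state = state;
    }
}

// Same data, but the State reference is transient and is skipped on the wire.
class PrunedAddress implements Serializable {
    String street;
    transient State state;

    PrunedAddress(String street, State state) {
        this.street = street;
        this.state = state;
    }
}

class ClosureDemo {
    // Returns the number of bytes produced by default Java serialization.
    static int serializedSize(Serializable object) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Builds 50 states that all share, and point back to, one collection.
    static State sampleState() {
        List<State> all = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            all.add(new State("State-" + i, all));
        }
        return all.get(0);
    }
}
```

Serializing an `Address` built on `ClosureDemo.sampleState()` drags all 50 states along, while the `PrunedAddress` carries only its street. Comparing the two sizes with `serializedSize` is a cheap way to find out what you are really shipping.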
Now,
instead, let's consider the case of using remote objects via entity beans.
Suppose clients can gather data for display by calling services that
return references to entity beans representing Person and Address objects. For each user interface
widget or HTML tag that displays a single field of information, a remote
invocation on the entity bean would be required to get that field's value.
For the simple case of a person's name and address, six remote invocations
might be required to produce the entire display.
Each
remote invocation comes at some cost: there is a minimum response time to
access even the simplest type of data, namely the cost of a network round trip.
When these response times are aggregated for an entire screen, the result
is a very sluggish, poorly performing user interface. This kind of
"chatty" architecture also places a heavy burden on the application server
and impedes scalability.
Every
application is different, so you must weigh the costs of serialization
versus remote object invocations. If you serialize, make sure you consider
transitive closure and know what you're serializing. If you use remote
invocation, consider the impact of "chattiness" on the network.
Security Issues in Object Distribution
Security also enters into
object distribution considerations. Should some objects never be
distributed out to a client? Should parts of an object graph be pruned of
sensitive data before an object is distributed? Should certain types of
objects only be distributed to certain qualified users? Consider, for
example, a hospital system. Perhaps you want to allow a clerk from
accounting to examine a patient's financial records but disallow any
access to medical records. And even if medical personnel were able to
examine medical records, you might not want to replicate them into the
client for security reasons. All these issues need to be considered and
factored into decisions made about where objects should exist.
Object Distribution and Schema Hiding
Another determinant of
object distribution is the issue of schema hiding. When objects are
serialized and sent to the client, the schema is being exposed. This may
not be appropriate. If the client is not directly dependent on the schema,
then the schema can change without client code impact. This is a
particular advantage when you have an object model that is used by
multiple applications. Changes can be rolled into the schema to support
one application and be completely transparent to another. (Although some
of us think that to achieve true schema hiding, you need to go to a
key/value paradigm such as XML. A topic for another day perhaps.)
One Good Approach to Object Distribution
In a service-based
architecture, the services encapsulate the domain model. This leads to a
very flexible and performant mechanism to handle object distribution: domain object state holders. Here's how they
work.
When a
client invokes a service, the service manipulates the real domain objects
on behalf of the client. What it returns to the client is not necessarily
a domain object. The service can define "state holders" (see note 4), basically containers into
which relevant information is placed for shipment to the client. These
containers can hold either direct state information or information derived
from domain objects. State holders can be defined for different service
methods whenever and wherever it is inappropriate to serialize and return
domain objects. Once a client is finished with the state holder, it can be
discarded.
Figure 9: Domain Object State Holders
In
assembling information in a state holder, a service can prune large
objects into smaller subgraphs or remove sensitive information, addressing
scalability and security issues and hiding the schema from the client
application. (Alternatively, state holders can be implemented as nested
key/value pair representations à la XML. This
gives you the ultimate schema hiding and decoupling of the server object
model from the client representation.)
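As a minimal sketch of the state holder idea, assume a hospital system like the one in the security discussion. The names `Patient`, `PatientBillingState`, and `BillingService` are hypothetical and exist only for this illustration. The service reads the real domain object on the server and hands back a small, flat container holding just what an accounting client is allowed to see.

```java
import java.io.Serializable;

// Hypothetical server-side domain object. It is deliberately not
// Serializable, so it cannot be shipped to a client by accident.
class Patient {
    final long oid;
    final String name;
    final String billingAddress;
    final String medicalHistory; // sensitive: must stay on the server

    Patient(long oid, String name, String billingAddress, String medicalHistory) {
        this.oid = oid;
        this.name = name;
        this.billingAddress = billingAddress;
        this.medicalHistory = medicalHistory;
    }
}

// State holder: a flat, serializable container with only the fields
// the accounting client needs. No medical data, no object graph.
final class PatientBillingState implements Serializable {
    final long oid;
    final String name;
    final String billingAddress;

    PatientBillingState(long oid, String name, String billingAddress) {
        this.oid = oid;
        this.name = name;
        this.billingAddress = billingAddress;
    }
}

// The service decides what leaves the server; the client never sees Patient.
class BillingService {
    PatientBillingState billingStateFor(Patient patient) {
        return new PatientBillingState(patient.oid, patient.name, patient.billingAddress);
    }
}
```

Because the client compiles only against `PatientBillingState`, the `Patient` class can change shape without touching client code, which is the schema-hiding benefit described above.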
General Good Practices for Object Distribution
In a service-based
architecture, the service is the natural mapping point for all object
distribution policies. So the various services in a system not only vend
business behavior, they also become the controllers of object
distribution.
We offer
the following general guidelines for object distribution:
- Distribute as few objects as possible. Be
parsimonious. Each object passed to the client is a cost to the
application. This cost is magnified by the number of concurrent users of
the system.
- Distribute as few remote references (that
is, entity beans) as possible. Each remote reference passed to the
client risks a response time cost.
- Don't serialize large domain object graphs
unless you really need all the state on the client. (And when you do
serialize, be sure you're aware of everything you're serializing.)
- Centralize your object distribution policies
in your services.
- Prune large object graphs and only send
relevant information to the client. For many operations, the client
needs only some basic identity information about the object for
selection purposes. The full-blown state of the object is only needed
for display or editing purposes.
- Do not distribute large collections of
objects. They need to remain on the server while some lightweight
representations or subsets are made available to the client (see
discussion in next section).
- We have found that in most cases, state
holders are the best choice for object distribution. But state holders add
complexity to an application: they are one more set of classes to
maintain. So whenever possible and appropriate, serialize domain objects
rather than using state holders.
Using Object Identifiers
The object identifier (oid) is an extremely useful
concept in an object system, because oids can be a relatively cheap and
performant way to look up objects, and they can help cut down on the
overhead of object distribution. An oid is a simple, unique key, typically
a Java int or long, assigned to a domain object when it is
brought into existence. The oid remains with the object throughout its
lifetime. If the application is built on an RDBMS, the oid can be the
primary key for a table representing an object. Oids can also be used as
foreign keys in tables representing objects containing other objects, and
oids can be sent down to the client as a placeholder for the real object.
(In dealing with a legacy database, the primary key of the table could be
used as an oid.)
The bottom
line is that using oids in an application is a very good idea. Consider
the following problem. You are writing a banking application. As part of
the application, you need to display a list of checking account numbers
in a listbox, have the user select one, and then open a window with all
the relevant information about the account that has been selected. How do
you go about this?
Just to
make it interesting, suppose there are 5,000 accounts being managed by
this application. It is immediately clear that it is not desirable to
replicate 5,000 complete account objects onto the client. Instead, here is
one strategy (and, we admit it, there are other strategies). Create a small
class, called an identity holder, that holds two fields: accountNumberString
and an oid for that
account object. You then replicate these small, lightweight objects to the
client and have it populate the listbox using the strings. When the user
selects an account number, the corresponding oid is sent to the server in
the invocation of a method that looks up and returns the account object or
its state holder. The application can then paint the screen with the
account information.
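The listbox scenario above might be sketched as follows. `AccountIdentity`, `Account`, and `AccountService` are hypothetical names, and an in-memory map stands in for whatever persistence mechanism the real service would use.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Identity holder: just enough to fill a listbox and to ask for more later.
final class AccountIdentity implements Serializable {
    final long oid;
    final String accountNumberString;

    AccountIdentity(long oid, String accountNumberString) {
        this.oid = oid;
        this.accountNumberString = accountNumberString;
    }
}

// Hypothetical full domain object; it stays on the server until requested.
class Account {
    final long oid;
    final String accountNumber;
    final String owner;
    final long balanceInCents;

    Account(long oid, String accountNumber, String owner, long balanceInCents) {
        this.oid = oid;
        this.accountNumber = accountNumber;
        this.owner = owner;
        this.balanceInCents = balanceInCents;
    }
}

class AccountService {
    // Assumption: a map keyed by oid stands in for the real data store.
    private final Map<Long, Account> accountsByOid = new LinkedHashMap<>();

    void add(Account account) {
        accountsByOid.put(account.oid, account);
    }

    // First call: send only lightweight identity holders to the client.
    List<AccountIdentity> listIdentities() {
        List<AccountIdentity> identities = new ArrayList<>();
        for (Account account : accountsByOid.values()) {
            identities.add(new AccountIdentity(account.oid, account.accountNumber));
        }
        return identities;
    }

    // Second call: the client sends back the oid of the selected entry.
    // Returns null if no account has that oid.
    Account findByOid(long oid) {
        return accountsByOid.get(oid);
    }
}
```

In a real system `findByOid` would more likely return a state holder than the domain object itself, as the text notes.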
An
application needs two resources to use oids effectively: a service that
vends unique oids and a domain object factory. The oid vending service
must guarantee that each oid it hands out is unique, so it may need to
keep some persistent data indicating what oids have already been used. It
is a good idea to use long (64-bit) variables
for oids, because this virtually guarantees that you will never run out of
them. (Remember Y2K!) An int might be
exhausted over the lifetime of a system unless the application took on
the added complexity of recycling oids.
The domain
object factory must be used to manufacture all domain objects. Its
responsibility is to create an instance of the needed type of object and
to assign each one an oid. If all domain objects are created by one
factory, you can guarantee that each will have a unique oid.
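A minimal sketch of the two resources, under stated assumptions: `OidVendor`, `DomainObjectFactory`, and `Invoice` are hypothetical names, the vendor keeps its counter in memory only, and the persistent record of the last oid issued (which a real vendor needs to survive restarts) is reduced to a constructor argument.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongFunction;

// Vends unique 64-bit oids. A real vendor would persist the last value
// issued; here the caller supplies it at startup (an assumption).
class OidVendor {
    private final AtomicLong lastIssued;

    OidVendor(long lastIssuedBeforeRestart) {
        this.lastIssued = new AtomicLong(lastIssuedBeforeRestart);
    }

    // Thread-safe: each caller receives a distinct value.
    long nextOid() {
        return lastIssued.incrementAndGet();
    }
}

// The single place where domain objects are manufactured, so every
// one of them is guaranteed to receive an oid from the same vendor.
class DomainObjectFactory {
    private final OidVendor vendor;

    DomainObjectFactory(OidVendor vendor) {
        this.vendor = vendor;
    }

    // The constructor reference receives the freshly vended oid.
    <T> T create(LongFunction<T> constructor) {
        return constructor.apply(vendor.nextOid());
    }
}

// Hypothetical domain class used only to show the factory at work.
class Invoice {
    final long oid;

    Invoice(long oid) {
        this.oid = oid;
    }
}
```

Usage is a one-liner: `Invoice invoice = factory.create(Invoice::new);`. In a clustered deployment, each server would typically reserve a block of oids from the persistent counter rather than hit it on every call; that refinement is left out here.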
Scaling Techniques
Earlier in this paper we talked about the transaction
model and how it affects scalability. This section will cover several more
topics related to scalability. Scalability permeates virtually every
design decision made in a system. A thousand small implementation details
done well can add significantly to the overall scalability of a system,
and a few details overlooked can be deadly to system performance.
Therefore, O Reader, be ye vigilant.
Stateful versus Stateless Session Beans
EJB offers two forms
of session beans: stateful and stateless. The choice between these can
significantly affect system scalability. Each has strengths and
weaknesses. Stateful session beans provide a convenient stash for
conversational state in the application. Consequently, they make an
application easier to program. But stateful beans have their dark side.
Because they hold state, they must be mapped one to one with application
clients. Thus, if an application has 10,000 concurrent clients it must
manage 10,000 stateful session beans. How many beans can a container
manage and maintain acceptable performance? Well, more than you might
think, but not too many. To help with this problem, stateful beans have a
complex lifecycle that includes the possibility of passivation and
activation by the container. But passivation and activation are not
without cost. The bean's state information must be committed into and
recovered from the database, and the bean must be reinstantiated and
populated upon activation.
By
contrast, stateless session beans are more complex to program: if an
application must save conversational state, this information must be
accommodated somewhere else in the application architecture. For example,
the application might commit state information into the database or stash
it in some other artifact in the system (HTTP session state, etc.). The
important advantage of stateless beans is that they can be shared among
many clients. The basic approach is to create a pool of the stateless bean
references. Clients grab a bean reference out of the pool, use it for a
method invocation, and return it to the pool for another client to use.
Using this scheme, 10,000 concurrent users might be served by a few
hundred beans (see note 5).
Systems
using stateless or stateful beans will ultimately differ in their ability
to scale. Because stateless beans can be multiplexed across many clients,
systems built on stateless beans can scale to a larger number of users.
However, stateful beans are a very legitimate choice for small
applications (those serving a few hundred concurrent users). By contrast,
large-scale applications (those serving many thousands of concurrent
users) clearly need to use stateless beans in order to scale. So what of
the gray area we run into with applications serving a few hundred to a few
thousand clients? At what point do you say that stateful beans are too
costly and move on to stateless? Unfortunately, there is not wide industry
experience with this technology and so the answer is unknown. Over time,
metrics will arise to help guide this decision. In the meantime, perhaps
the best rule of thumb is that if the number of users could possibly
(ever, in your wildest dreams) grow beyond a few thousand, go stateless.
Entity Beans versus Java Classes
A similar decision arises
around how to represent domain objects, with entity beans or simple Java
classes. An entity bean, like a session bean, is a heavyweight, remotable
object that requires the EJB container to manage its lifecycle and
maintain extra artifacts related to its distribution. If an application
has thousands of concurrent users and each user interacts with many tens
of entity beans in typical usage, the application server's Java VMs could
become saturated with entity beans. The server could spend many cycles
activating and passivating these beans to share available memory
resources. Entity beans are very helpful in synchronizing the state of a
domain object representation with persistent storage at well-defined
lifecycle and transaction points. However, they do so at some cost in
server resources.
JSPs and JSP Beans
JSPs and JSP beans should be seriously considered
as an alternative to stateful session beans. JSPs nicely partition the
presentation (HTML) and application layers in a system. The JSP contains
the HTML necessary to lay out a page, and makes invocations on JSP beans
to return dynamic content. The JSP beans live in the application layer.
They are a good place to hold onto conversational state outside the
transaction semantics of the EJB container. (HTTP session state is another
good place to do this.)
Connection Pooling
Database connections are expensive resources in
any system. Like any expensive resource, you want to get the most out of
them. One consequence of the EJB short transaction model is that database
connections are used only for very brief intervals. This allows you to use
connections very efficiently by multiplexing each connection across many
users. You do this by creating a pool of database connections at system
initialization. When a client needs to access the database, it pulls a
connection from the pool, goes into transaction, and commits the result.
The connection is then returned to the pool for other clients to use.
Connection pooling allows applications to scale to support large numbers
of concurrent users. EJB server/containers such as GemStone/J typically
provide JDBC connection pooling as a matter of course.
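If you ever need to reason about what the container is doing for you, the pooling pattern itself is small. The sketch below is a generic, hypothetical `ResourcePool`, not GemStone/J's implementation and not a replacement for a container-managed pool. The pooled resource would be a JDBC connection in practice; the type is left generic so the example runs without a database.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// A fixed-size pool: resources are created once at initialization and
// then multiplexed across many short units of work.
class ResourcePool<R> {
    private final BlockingQueue<R> idle;

    ResourcePool(int size, Supplier<R> factory) {
        this.idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Borrows a resource, runs one short unit of work, and always
    // returns the resource to the pool, even if the work throws.
    <T> T withResource(Function<R, T> work) {
        R resource;
        try {
            resource = idle.take(); // blocks while every resource is in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a pooled resource", e);
        }
        try {
            return work.apply(resource);
        } finally {
            idle.add(resource);
        }
    }

    // Number of resources currently available to borrow.
    int idleCount() {
        return idle.size();
    }
}
```

The same shape applies to the pool of stateless bean references discussed earlier: grab one, make a single invocation, and put it back. Note how the short transaction model is what makes the pool effective, because no client holds a resource between invocations.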
Large Query Results
Management of query results
presents yet another scalability issue in distributed applications. When a
query is issued (particularly if it is built from user-supplied requests)
there may be no way to anticipate how much data will be returned. The
result set might include a huge amount of data, so you don't want to
blindly replicate the query result down to the client. The best
approach is to divide and conquer through a technique we affectionately
call "chunking". With this technique, the results are wrapped in a
cursored enumeration object, and the client communicates with this object
to obtain what it needs. The client is initially sent a "chunk" of results
of a preset size, then it requests more chunks as appropriate.
However,
remember that the EJB short transaction model does not allow the cursored
enumeration object to be remote from the client. Therefore, we need to
wrap the enumeration in a bean and pass back a reference to this bean for
the client to interrogate. What kind of bean should you use? Well, it
depends. If the underlying persistence mechanism is an RDBMS, then the
results will be accessible through a JDBC result set. This is best wrapped
in a stateful session bean. (Note that this requires the associated JDBC
connection to be dedicated to the user throughout the manipulation of the
result set.) If the persistence mechanism is an object database, then the
results would be held in a collection object and could easily be wrapped
in an entity bean (see note 6).
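Whichever bean ends up wrapping it, the chunking logic itself can be sketched in plain Java. `ChunkedResults` is a hypothetical class; the `Iterator` stands in for the underlying cursor (a JDBC result set or an object database collection), and the bean wrapper and remote interface are omitted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Wraps a server-side cursor and hands results out in fixed-size chunks.
class ChunkedResults<T> {
    private final Iterator<T> cursor;
    private final int chunkSize;

    ChunkedResults(Iterator<T> cursor, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.cursor = cursor;
        this.chunkSize = chunkSize;
    }

    // Returns up to chunkSize further results; an empty list means the
    // cursor is exhausted.
    List<T> nextChunk() {
        List<T> chunk = new ArrayList<>(chunkSize);
        while (chunk.size() < chunkSize && cursor.hasNext()) {
            chunk.add(cursor.next());
        }
        return chunk;
    }

    // Lets the client decide whether to offer a "more results" control.
    boolean hasMore() {
        return cursor.hasNext();
    }
}
```

The client pays for one round trip per chunk rather than one per row, and never receives more rows than the user actually pages through.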
Object-to-Relational Mapping Issues
If the persistence mechanism
used for an application is an RDBMS, then you must face the issue of
object-to-relational (O/R) mapping. Objects inherently exist as
interconnected networks. Relational databases must reduce everything to
rows and columns in a table. This is the quintessential problem of O/R
mapping. Conceptually, the problem is clear. But finding a solution is more
difficult and subtle than you might think.
There are
two basic approaches to O/R mapping. Fortunately, the choice between them
is obvious, as one is pretty good and the other is very bad. The first
approach is to embed SQL queries and population of object state directly
into business methods. This is a bad thing
because mapping semantics are scattered throughout the code base, so
maintenance becomes a nightmare when the schema rolls. We've seen systems
built this way where the developers were reduced to searching the entire
code base for SQL constructs to figure out what needed to be changed. So
don't go there!
The second
approach is to build (or buy) a framework that provides the services
necessary to implement O/R mapping. The framework partitions the mapping
logic and machinery away from the business logic, and centralizes it in
one place so that it is easier to maintain. This is a good thing. There are several other major issues
to be addressed in O/R mapping:
- The creation and maintenance of metadata
- The detection and management of dirty
objects
- The management of object identity
Managing Metadata: Mapping Objects to Tables
Metadata is the information that describes how a
particular field in a class is mapped into a table and a column in the
database. Metadata can be managed either implicitly or explicitly.
Implicit
mapping is generally implemented using matching naming conventions for the
object and its corresponding table in the database. Suppose you have a
class Person with fields firstName,
lastName, age, sex, and address. This class could be implicitly mapped
into an RDBMS in a table named Person with
columns named firstName, lastName, age, sex, and address.
The advantage of implicit mapping is that there is no external data to
maintain and it can be automated. (See the FoodSmart application on the
"Developer's Guide" CD for an example of this.) There are also several
disadvantages:
- The class and RDBMS schema can become
skewed. This problem can be addressed through automation, using the Java
reflection interface and JDBC's metadata capability to detect and report
skews.
- It is not always possible to have a
one-to-one mapping between classes and tables, particularly if legacy
systems are involved. Also, if you are O/R mapping against a legacy
database, the metadata may be very complex. When you design a database
for a new application, you are likely to design database tables to map
closely to objects. With legacy systems, since the tables were not
originally designed with the object model in mind, state information for
any given class may be scattered across multiple tables.
- Objects that contain other non-primitive
objects and/or collections (especially those representing many-to-many
relationships) complicate the needed mapping. It becomes necessary to
recursively populate such object graphs. (For example, when populating a
Person object, you need to get its Address object, its Employment object, etc.) Furthermore the
issue arises of when to populate sub-objects. Are the child objects
populated at the same time as the parent, or do we defer this until the
children are actually needed?
Explicit
mapping requires that the metadata be declared and stored somewhere (most
likely as more database tables or as static data in your Java classes).
The metadata specifies what column in what table is mapped to what field
in what class. This approach simplifies the mapping task when legacy
systems are involved, because it does not require a direct correlation between
classes and tables. However, the metadata becomes another system artifact
that must be maintained. Further, you then have to maintain a three-way
synchronization between classes, tables, and metadata.
In
general, we recommend that you use implicit mapping when there's little
likelihood of change and no legacy system, and use explicit mapping
elsewhere. A combination of techniques may also provide a nice compromise.
For example you could have a static method, getMappings(), that defaults to using
introspection/reflection, and can be overridden with an explicit mapping
as required.
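One way to sketch this combination is shown below. Because Java static methods cannot be overridden in the polymorphic sense, the sketch uses a hypothetical `MappingRegistry` object instead of a static `getMappings()` on each class: it returns an explicit mapping if one has been registered and otherwise derives an implicit one by reflection. The `Table.column` string format and the legacy names in the usage note are assumptions made for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class MappingRegistry {
    // Explicit metadata: class -> (field name -> "Table.column").
    private final Map<Class<?>, Map<String, String>> explicit = new HashMap<>();

    // Declares an explicit mapping that overrides the implicit default.
    void registerExplicit(Class<?> type, Map<String, String> fieldToColumn) {
        explicit.put(type, new LinkedHashMap<>(fieldToColumn));
    }

    // Returns the explicit mapping if one was registered; otherwise falls
    // back to the naming convention: class name is the table name, and
    // each field name is a column name.
    Map<String, String> getMappings(Class<?> type) {
        Map<String, String> declared = explicit.get(type);
        if (declared != null) {
            return declared;
        }
        Map<String, String> implicit = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            if (Modifier.isStatic(modifiers) || Modifier.isTransient(modifiers) || field.isSynthetic()) {
                continue; // not part of the persistent state
            }
            implicit.put(field.getName(), type.getSimpleName() + "." + field.getName());
        }
        return implicit;
    }
}

// Hypothetical domain class from the example in the text (abbreviated).
class Person {
    String firstName;
    String lastName;
    int age;
}
```

With no registration, `getMappings(Person.class)` maps `firstName` to `Person.firstName`. After registering an explicit entry such as `firstName` to `PERSON_TBL.FNAME`, the same call returns the legacy name instead. The same reflection walk can be compared against JDBC metadata to report skew between class and table.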
Object Dirtiness
A second issue that must be
addressed in an O/R mapping framework is object dirtiness. An object is
said to be dirty when a change has been made to the application's copy
since it was first accessed or loaded. (In other words, the application's
view of the object's state is different than the original view taken from
the database.) At commit time, all dirty objects need to have their state
flushed back into the database. (This is the value added by entity beans.)
The
simplistic approach is to flush the state of all persistent objects back
into the database, but this hurts scalability because you are doing
useless rewrites of non-dirty objects. For example, say you have a
collection of 500 products that has been instantiated in the server. A
user changes one product. If you flush back all 500 products, you could be
taking a substantial performance hit. The right solution is to write back
only the dirty object. But how do you manage this?
First,
objects must be able to detect that they are dirty. Second, the
application must be able to track the set of objects dirtied in a
particular user's transaction. Here's one way to do this:
- Go to the parent class of all the domain
objects (oh, by the way, you should have one of these guys and its name
should be something like DomainObject).
- Add a boolean field called isDirty and set its default to false.
- Define a method on DomainObject called markDirty().
- In the method, set isDirty to true and include logic to add
"this" to the dirty pool (we're coming to that).
- Create a class called DirtyPool with a field containing a HashSet. Implement methods on it to add an
object to or remove one or all from the HashSet. (We have put dirty pools in thread
local variables, which seems to work out well.)
- Now, for each domain object, write getter
and setter methods for its public fields. In the setter method, call
markDirty(). The object is then marked and
added to the DirtyPool.
Now, at
any point during execution, all the currently dirty objects are contained
in the DirtyPool. At commit time, the application can
remove the objects from the DirtyPool and flush their state back into the
database. Et voilà, we have solved the
problem! (Well, okay, we've given you the basic approach. Admittedly, the
devil is in the details.)
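The steps above can be sketched as follows. `DomainObject`, `DirtyPool`, `isDirty`, and `markDirty()` follow the names in the text; `Product`, `drain()`, and `markClean()` are hypothetical additions, and the actual write to the database is left to the caller.

```java
import java.util.HashSet;
import java.util.Set;

// One pool per thread, on the assumption that a transaction runs on a
// single thread from begin to commit.
class DirtyPool {
    private static final ThreadLocal<DirtyPool> CURRENT = ThreadLocal.withInitial(DirtyPool::new);
    private final Set<DomainObject> dirty = new HashSet<>();

    static DirtyPool current() {
        return CURRENT.get();
    }

    void add(DomainObject object) {
        dirty.add(object);
    }

    void remove(DomainObject object) {
        dirty.remove(object);
    }

    // At commit time: hands back everything dirtied in this transaction,
    // resets the objects' flags, and empties the pool.
    Set<DomainObject> drain() {
        Set<DomainObject> toFlush = new HashSet<>(dirty);
        for (DomainObject object : toFlush) {
            object.markClean();
        }
        dirty.clear();
        return toFlush;
    }
}

// Parent class of all domain objects.
abstract class DomainObject {
    private boolean isDirty = false;

    // Called from every setter; registers the object once per transaction.
    protected void markDirty() {
        if (!isDirty) {
            isDirty = true;
            DirtyPool.current().add(this);
        }
    }

    void markClean() {
        isDirty = false;
    }

    boolean isDirty() {
        return isDirty;
    }
}

// Hypothetical domain class showing the setter convention.
class Product extends DomainObject {
    private String name;

    String getName() {
        return name;
    }

    void setName(String name) {
        this.name = name;
        markDirty();
    }
}
```

At commit, the application iterates over `DirtyPool.current().drain()` and writes only those objects. One of the details the devil lives in: if the server reuses threads across requests, the pool must also be drained or cleared on rollback so that one user's dirty objects never leak into the next transaction.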
Object Identity
Object identity becomes
important in O/R mapping. To understand why, consider the following
example. Suppose you have three objects in the system: A, B, and C. A
and B both have references to C. The application maps in A, and because A
contains C, C is also instantiated. Next, the application maps in B, and
because B also contains C, the application instantiates a new copy of
C. Now, there are two copies of C in the server in two different object
networks (A-C and B-C). This is inefficient, and the identity of
C has been compromised: which is the "real" C? Depending on the
application's design, it could lead to failed transactions or inconsistent
results. We want a single copy of C with both A and B referencing it.
We can
create a shared copy of an object by maintaining a registry of persistent
objects that are currently in the server. When an object is brought in by
the O/R mapping machinery, the machinery registers the object in the
registry using its oid. To return to our example above, before loading in
A, we would look up A's oid in the registry to see if A is already present.
If it's not present, we map in A and register it. But now we recognize that
we need C. It's also not in the registry, so we map
and register it. We do the same for B, but in mapping B we check the
registry, which recognizes that C is already present and passes back a
reference to it for B. Using this scheme, we have only a single copy of C
with A and B correctly pointing to it. Gosh, isn't object identity handy?
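A minimal sketch of such a registry, assuming oids are Java longs. `IdentityRegistry` and `Node` are hypothetical names, and the loader function stands in for the O/R mapping machinery that actually reads the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Registry of persistent objects currently in the server, keyed by oid.
class IdentityRegistry<T> {
    private final Map<Long, T> loaded = new HashMap<>();
    private int loadCount = 0;

    // Returns the registered instance for this oid, mapping it in through
    // the loader only if it is not already present.
    T resolve(long oid, LongFunction<T> loader) {
        T existing = loaded.get(oid);
        if (existing != null) {
            return existing;
        }
        T fresh = loader.apply(oid); // the loader may itself call resolve()
        loadCount++;
        loaded.put(oid, fresh);
        return fresh;
    }

    // How many times the database was actually consulted.
    int loadCount() {
        return loadCount;
    }
}

// Hypothetical persistent object holding a reference to a shared object.
class Node {
    final long oid;
    Node shared;

    Node(long oid) {
        this.oid = oid;
    }
}
```

Mapping in A and then B, each resolving C through the same registry, yields one C instance and one load for C. Two caveats for a real framework: object graphs with cycles need the object registered before its children are populated, and the registry's scope (per transaction or per server) decides how stale its entries may become. Neither is handled in this sketch.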
A Shameless Commercial Moment
Hopefully the discussions
above have given you some feeling for the complexities of building an O/R
mapping framework, and some workable solutions. As object bigots, we
cannot resist noting that if you use an object storage capability, such as
GemStone/J's persistence cache, you totally bypass all these issues.
Where Should You Go from Here?
This paper has touched
briefly on some of the most important design issues in building scalable
distributed object systems. Remember, most of your design challenges come
down to this: if you have a distributed application, things have to
inter-operate. Stimulus comes from one place and response from another. There
are two ways to handle this. One is to remotely manipulate objects in the
server. The other is to distribute data and put more responsibility on the
client. Remote invocations are not cheap, but there is also a cost to
passing state out to clients. The trick is to find the balance. So, go
sparingly amid the noise and haste, and make each process do its logical
part. Be parsimonious about handing out remote references. Be
parsimonious about handing out state.
The design
issues we've covered are the tip of the iceberg. So, where do you go from
here? We suggest you use the topics in this paper as a checklist for your
project. Start by reviewing your intended architecture:
- Partition the system into the five layers
we've suggested and consider which objects should live where. Identify
needed services and the service managers they should live in.
- Define the services API.
- Review the nature of your transactions and
decide how (or if) you need to detect write-write conflicts and how you
will do that.
- Examine your domain model and determine what
objects need to be distributed and how. Should you distribute the entire
object? A state holder? A distribution-only object that derives state
from several different domain objects?
Determine the scaling needs of your system:
- How many concurrent users must you support?
- Can you go with stateful beans?
- If you use stateless beans, will you need to
pool them?
- Do you need to pool database connections?
- How will you handle large query results?
This is
all infrastructure needed to scale the system. Identify what you need and
get it built and tested early on.
Next,
consider what will be your object persistence mechanism:
- An RDBMS? An object database?
- If it's an RDBMS, your entire approach to
O/R mapping needs to be thought through. Will you build it? Will you buy
it?
- If you're building on top of legacy
databases, how will you map the needed object state from the existing
tables?
- If you are minting a new database, design
the RDBMS schema to smoothly support the object model.
In Conclusion
If you take the time to
think through the answers to these questions, you will be well on the way
to designing a scalable, maintainable J2EE e-commerce system. These
systems are powerful and complex, your business results ride on them, and
you will live with the results of your design decisions for many years to
come. Therefore, there's no design decision so small it's not worthy of
careful consideration. So start thinking....
And if you
need help, just give GemStone Professional Services a call.
Happy
trails!
Glossary
- Dirty object: An
object held by an application, whose state has changed since it was
first loaded from the database. (In other words, the application's view
of the object's state is different than the original view taken from the
database.)
- Domain: The
real-world environment modeled by a business application.
- Domain object:
Objects representing things in the business domain, such as people,
products, bank accounts, production lines, vehicles, etc.
- HTML: HyperText
Markup Language. A markup language used to control the presentation of
content in a Web page. HTML is an SGML (Standard Generalized Markup
Language) DTD (Document Type Definition).
- HTTP: HyperText
Transfer Protocol. A protocol used to exchange information between a Web
browser and a Web server.
- JavaScript: An
interpreted scripting language which can be embedded in HTML pages to
provide various kinds of behavior. It has a Java-like syntax, hence the
name.
- JSP: Java Server
Page. A specialized form of HTML page that supports tags to call out to
JSP beans to generate dynamic page content. JSPs effectively separate
HTML from the business logic needed to generate dynamic content.
- JSP bean: A
specialized Java bean invoked by a JSP page.
- Metadata:
Information describing how a particular field in a class is mapped
into a table and column in a database.
- Object graph: A
set of connected objects. An object graph starts with a selected object
and contains objects to which that object holds direct or indirect
references.
- OODB:
Object-oriented database. A database that persists objects in their
native form, avoiding the need for object-to-relational mapping.
- RDBMS:
Relational database management system. A database that stores its
information in tables of rows and columns.
- Semantic
validation: Validation that an input value makes sense in the
context of the application. For example, the presentation layer of a
system may check that the input to a field is of type string (syntactic
validation). The domain layer does semantic validation to confirm that
the string is a valid account number or product code.
- Serialization:
Conversion of an object to a serial stream of data for distribution over
a network.
- Servlet: A Java
program that implements the Java servlet interface. Servlets are run in
servlet engines to produce pages with dynamic content. In contrast to
JSPs, servlets are raw Java and generate their HTML output through
manipulation and concatenation of strings.
- Transitive
closure: The complete set of all the objects to which an object
holds references, either directly or indirectly. (For example, a
Person object may hold references to several Child objects, each of which
holds references to School objects, which hold Address objects, etc. The
transitive closure of the Person object includes all of these.)
- Use case: A task
that is part of the business process, for example, renewing insurance
policies in an insurance application.
- Workstep: Work
that is done within the context of a single database transaction. In the
long transaction model, workstep and use case tend to coincide. In the
short transaction model, the use case must be broken into two or more
worksteps.
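To make the "object graph" and "transitive closure" entries above concrete, here is a minimal Java sketch that walks the references from a starting object and collects everything reachable from it. The Node and ObjectGraphs classes are hypothetical stand-ins invented for this illustration (they are not part of FoodSmart or any GemStone API). A real domain model would follow typed fields rather than a generic list of references.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical domain object: a named node holding references to other nodes. */
class Node {
    final String name;
    final List<Node> references = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }

    /** Adds a direct reference and returns this node so calls can be chained. */
    Node referTo(Node other) {
        references.add(other);
        return this;
    }
}

class ObjectGraphs {
    /**
     * Returns every object that root references directly or indirectly.
     * Identity (not equals) decides whether an object was already reached,
     * so cycles in the graph cannot cause an endless loop.
     */
    static Set<Node> transitiveClosure(Node root) {
        Set<Node> reached = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(root.references);
        while (!pending.isEmpty()) {
            Node next = pending.pop();
            if (reached.add(next)) {
                // First visit: queue the objects this one refers to.
                pending.addAll(next.references);
            }
        }
        return reached;
    }

    public static void main(String[] args) {
        Node address = new Node("Address");
        Node school = new Node("School").referTo(address);
        Node firstChild = new Node("Child 1").referTo(school);
        Node secondChild = new Node("Child 2").referTo(school);
        Node person = new Node("Person").referTo(firstChild).referTo(secondChild);

        System.out.println("Objects in the closure of Person: "
                + transitiveClosure(person).size());
    }
}
```

The size of this set is what matters when an object is serialized or faulted in from a database: asking for one Person can drag the children, the school, and the address along with it.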
Roll Credits....
The Advanced Application Architecture Team will be bringing
more good stuff your way soon. Visit the GemStone Web site and order the
Developer's Guide.
We will be publishing more design information and other developer goodies
on a continual basis. Soon our entire pattern language will be available
on-line.
One last
note about our team: we're consulting road warriors, and we like to help
people build good systems. So call us any time. Maybe we'll discover some
new best design practices together.
Core Team: Dave Muirhead, Anita Osterhaug, Colleen Roe, Randy Stafford, Alan Strait
Contributing Team Members: Sergio Gonik, Diane Levin, Ermine Todd
External Contributors: Martin Fowler, Alan McKean, Brian Wilkerson, Rebecca Wirfs-Brock
Valued Kibitzers: Bruce Ochanderena, Chris Raber
Team Mascot: BooBoo (a GemStone family dog)
For More Information
For more information about GemStone/J, please call 800-243-9369 or e-mail
info@gemstone.com.
Footnotes
1. You can get a copy of "A
Developer's Guide to iCommerce Success with J2EE" at GemStone's
Web site.
2. In the database world,
there is another meaning sometimes associated with the phrase "long
transaction". In that case, it refers to a mode of operation where data is
essentially checked out of the database to be worked on for an extended
period of time, perhaps days or weeks. An example of this might be a CAD
system wherein users check out circuit board designs.
3. The programmer can
change the serialization scheme on a class-by-class basis by redefining
readObject() and writeObject(). However, each
object has one and only one serialization behavior.
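As a rough illustration of footnote 3, the sketch below shows one class redefining writeObject() and readObject() so that it ships only a key instead of the object it refers to. The Order, Customer, and SerializationDemo classes are hypothetical names invented for this example. The java.io calls are standard Java serialization. Whatever form the class chooses here is the only form it has, which is the limitation the footnote points out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical domain class; deliberately not sent over the wire. */
class Customer {
    final String id;

    Customer(String id) {
        this.id = id;
    }
}

/** Hypothetical domain class that customizes its own serialized form. */
class Order implements Serializable {
    private static final long serialVersionUID = 1L;

    final String orderId;
    transient Customer customer;  // skipped by default serialization
    transient String customerId;  // written and restored by hand below

    Order(String orderId, Customer customer) {
        this.orderId = orderId;
        this.customer = customer;
        this.customerId = customer.id;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();   // writes the non-transient fields (orderId)
        out.writeUTF(customerId);   // writes only the customer's key
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();     // restores orderId
        customerId = in.readUTF();  // customer stays null until looked up by key
    }
}

class SerializationDemo {
    /** Serializes an order to bytes and reads it back, as a remote call would. */
    static Order roundTrip(Order order) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(order);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Order) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Order copy = roundTrip(new Order("o-1", new Customer("c-42")));
        System.out.println(copy.orderId + " belongs to customer " + copy.customerId);
    }
}
```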
4. Other names commonly
associated with such objects are "data transfer objects" or "replicates".
Monson-Haefel calls them "bulk accessors".
5. There are many choices
as to where to save state, each with varying degrees of scalability.
Perhaps the most scalable solution is to push state out to the client and
pass it back in to the server when you need it. (This is also the hardest
to implement.) The worst case is that the saved state has to be
persistent. (For example, Amazon.com saves the contents of your shopping
cart for 30 days.) What's important is not to save state if you don't have
to. If you do have to save state, don't make the saving transactional, if
you can avoid it.
6. An alternative approach is to re-issue the query,
picking up from the last item streamed to the client on the previous call.
This is a more stateless approach. However, queries can also be very
expensive, especially if you have tens of thousands of clients doing
repeated queries.
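The stateless alternative in footnote 6 can be sketched as follows. The client hands back the last item it received, and the server re-issues the query starting just after that item, so no cursor is held on the server between calls. ProductQuery and nextPage() are hypothetical names, and the sorted in-memory set is only a stand-in for an ordered, indexed database query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical query object; the sorted set stands in for an indexed table. */
class ProductQuery {
    private final NavigableSet<String> productCodes;

    ProductQuery(List<String> codes) {
        this.productCodes = new TreeSet<>(codes);
    }

    /**
     * Re-issues the query and returns up to pageSize codes that sort after
     * lastSeen. Pass null for lastSeen to start from the beginning.
     */
    List<String> nextPage(String lastSeen, int pageSize) {
        NavigableSet<String> remaining = (lastSeen == null)
                ? productCodes
                : productCodes.tailSet(lastSeen, false); // exclusive of lastSeen
        List<String> page = new ArrayList<>();
        for (String code : remaining) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(code);
        }
        return page;
    }

    public static void main(String[] args) {
        ProductQuery query = new ProductQuery(Arrays.asList("C", "A", "E", "B", "D"));
        String lastSeen = null;
        List<String> page;
        // Each call is independent: the only state is the key the client passes back.
        while (!(page = query.nextPage(lastSeen, 2)).isEmpty()) {
            System.out.println(page);
            lastSeen = page.get(page.size() - 1);
        }
    }
}
```

The cost the footnote warns about shows up here as one full query per page. That is cheap for a sorted set in memory, but potentially expensive against a real database with tens of thousands of clients paging at once.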
Copyright 2000, GemStone Systems, Inc.
|