News: Opinion: Martin Fowler's First Law of Distribution

  1. Dan Creswell agrees and disagrees with Martin Fowler's first law of distribution (don't). He notes that network round-trips aren't always the performance bottleneck. Dan gives us his thoughts on distributed systems, performance and scalability.

    Excerpt
    I've implemented database systems and written many a network application and here's an interesting fact:

    Network round-trips are often considerably less costly than the time taken for a transactional database operation, because transactional operations must be forcibly logged, which is very costly in terms of disk performance. I.e., network round-trips aren't always the performance bottleneck.
    Read Dan Creswell in Martin Fowler's First Law of Distribution
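
    (To make the excerpt concrete, a minimal micro-benchmark sketch - hypothetical names, and absolute numbers will vary wildly with disk and network; the point is only that a forced log write and a loopback round-trip are in the same league.)

    import java.io.*;
    import java.net.*;
    import java.nio.channels.FileChannel;

    public class ForceVsRoundTrip {
        public static void main(String[] args) throws Exception {
            // 1. Forced log writes: roughly what a transactional store pays per commit.
            RandomAccessFile raf = new RandomAccessFile("tx.log", "rw");
            FileChannel log = raf.getChannel();
            byte[] record = new byte[256];
            long t0 = System.nanoTime();
            for (int i = 0; i < 100; i++) {
                raf.write(record);
                log.force(true); // block until the bytes are really on disk
            }
            System.out.println("forced log write: " + ((System.nanoTime() - t0) / 100000) + " us each");
            raf.close();

            // 2. Loopback TCP round-trips, for comparison.
            final ServerSocket server = new ServerSocket(0);
            new Thread() {
                public void run() {
                    try {
                        Socket s = server.accept();
                        int b;
                        while ((b = s.getInputStream().read()) != -1)
                            s.getOutputStream().write(b); // echo each byte straight back
                    } catch (IOException ignored) {}
                }
            }.start();
            Socket client = new Socket("localhost", server.getLocalPort());
            client.setTcpNoDelay(true); // don't let Nagle batch our tiny messages
            OutputStream out = client.getOutputStream();
            InputStream in = client.getInputStream();
            t0 = System.nanoTime();
            for (int i = 0; i < 100; i++) {
                out.write(42);
                in.read(); // wait for the echo: one full round-trip
            }
            System.out.println("loopback round-trip: " + ((System.nanoTime() - t0) / 100000) + " us each");
            client.close();
        }
    }
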

    Threaded Messages (77)

  2. Mr Fowler demonstrates a further lack of understanding by stating that his preferred solution is to write all code as local code with nice interfaces to run on one JVM and then use clustering to get scaling
    and
    And that means we're back to doing network round-trips for co-ordination purposes. So how much performance did we gain?
    Dude,
    1. In most applications, the coordination you are talking about can be avoided. For example, in a call center application it is very unlikely that two agents are working on the same customer (and if they are, you basically restrict them). Use something like a Network Dispatcher to send all the calls for a customer to the same instance (see the routing sketch at the end of this post).
    2. We do not need Distributed Objects. We need Distributed Applications. What is the point of distributing a Billing Amount Calculating Component (which could be very chatty in nature)? You would rather distribute the Billing Application, which constitutes more of a business process than just calculating a particular amount.

    2a) My personal experience is as follows:
        a) I started off thinking of the Billing Amount Calculator Component as a distributed component and proceeded with the design.
        b) Over the iterations I realized that I was adding more and more work to the Billing Amount Calculator just to save on performance. This work was not necessarily 100% the job of the Calculator.
        c) It turned out that the Billing Amount Calculator interface was now more of a Billing Application interface (the component was now taking care of the business process as well).
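
    (A sketch of the affinity routing in point 1, with hypothetical names. Real dispatchers do this at the network level, but the idea is just a stable mapping from customer to instance.)

    import java.util.List;

    public class CustomerAffinityDispatcher {
        private final List<String> instances; // e.g. URLs of identical app instances

        public CustomerAffinityDispatcher(List<String> instances) {
            this.instances = instances;
        }

        // Every call for a given customer lands on the same instance, so no
        // cross-instance coordination is needed for that customer's state.
        public String instanceFor(String customerId) {
            int bucket = Math.floorMod(customerId.hashCode(), instances.size());
            return instances.get(bucket);
        }
    }
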
  3. "In most of the applications this coordination that you are talking about can be avoided . For example in a call center applciation. It is very unlikely that two agents are working on the same customer. (and if they are you basically restrict them). Use something like a Network Dispatcher to make all the calls to the same instance."

    What you're describing here is a partitioning of the data, which is what I described with respect to splitting across databases. This isn't clustering, because you're not trying to create the image of a single large machine running an application. You've built into your architecture a recognition of separate machines.

    "We do not need Distributed Objects. We need Distributed applications. What is the point of distributing a Billing Amount Calculating Component(which could be very chatty in nature). You would rather distribute the Billing Application which constitutes more of a business proceess than just calculating a particular amount."

    I wouldn't begin to suggest that you distribute objects if they're going to be chatty (nor did I in my original posting). And sure, you'd like to "distribute your application", but now you're back to relying on the underlying platform to figure out how to get the best performance, and this isn't always a good thing.

    The real point of what I was saying is not that "zero distribution is bad" nor was I saying "full distribution is good". I was saying that there's a middle ground and I believe that's where the best solution lies.
  4. The real point of what I was saying is not that "zero distribution is bad" nor was I saying "full distribution is good". I was saying that there's a middle ground and I believe that's where the best solution lies.
    Then what is the disconnect between what you are saying and what Martin is saying? I think Martin is saying the same thing.

    Martin, am I right???
  5. The real point of what I was saying is not that "zero distribution is bad" nor was I saying "full distribution is good". I was saying that there's a middle ground and I believe that's where the best solution lies.
    Then what is the disconnect between what you are saying and what Martin is saying? I think Martin is saying the same thing. Martin, am I right???
    Quite simply, I felt that Martin's reasoning on the subject was a little too focused on network round-trips as the core design centre. To be fair, he may not have presented all his opinions on the subject (and I'd love to hear them).
  6. Then what is the disconnect between what you are saying and what Martin is saying? I think Martin is saying the same thing. Martin, am I right???
    Sorry, reposting cos the formatting didn't work out too nice:

    Quite simply, I felt that Martin's reasoning on the subject was a little too focused on network round-trips as the core design centre. To be fair, he may not have presented all his opinions on the subject (and I'd love to hear them).
  7. The real point of what I was saying is not that "zero distribution is bad" nor was I saying "full distribution is good". I was saying that there's a middle ground and I believe that's where the best solution lies.

    That is what Martin Fowler is advocating too. He explicitly says: distribute only when you must, for whatever the reason. And always measure performance before you say something is fast or slow. And make your app testable (which is certainly at least a bit harder if you distribute).
    And you will not need to distribute for about 95% of applications.

    Mileta
  8. That is what Martin Fowler is advocating too. He explicitly says: distribute only when you must, for whatever the reason. And always measure performance before you say something is fast or slow. And make your app testable (which is certainly at least a bit harder if you distribute). And you will not need to distribute for about 95% of applications. -Mileta
    Sure, what Martin doesn't really make mention of though is if you make the decision to do things locally and code in that style, what do you do if you find you have to go distributed?

    As I said, you don't make the whole app distributed - maybe none of it to start with but you probably want to put in some abstraction where you think it might happen so as to make things easier later on.
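
    (One cheap form of that abstraction, sketched with hypothetical names: callers depend on an interface whose first implementation is purely local, so a remote one can be slotted in later without touching calling code.)

    import java.math.BigDecimal;

    public interface BillingService {            // the seam: callers see only this
        BigDecimal amountDue(String accountId);
    }

    class LocalBillingService implements BillingService {
        public BigDecimal amountDue(String accountId) {
            return BigDecimal.ZERO;              // plain in-VM computation today
        }
    }

    // If distribution becomes necessary, a RemoteBillingService (RMI, web
    // service, whatever) replaces LocalBillingService at wiring time and no
    // caller changes.
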
  9. Sure, what Martin doesn't really make mention of though is if you make the decision to do things locally and code in that style, what do you do if you find you have to go distributed?As I said, you don't make the whole app distributed - maybe none of it to start with but you probably want to put in some abstraction where you think it might happen so as to make things easier later on.
    You can refer to one of Fowler's other mantras where he says not to write code you may never use. He would most likely treat a distributed solution as a separate project and re-design accordingly.
  10. You can refer to one of Fowler's other mantras where he says not to write code you may never use. He would most likely treat a distributed solution as a separate project and re-design accordingly.
    And that mantra is one I agree with however, rather than risk putting (more?) words in his mouth, I'd like to know from Martin himself whether he would apply it in this situation or do something else?
  11. Performance is not as simple as just deciding whether to distribute or not. Distributing processing components depends largely on the domain and the nature of interactions between the various components that are NOT co-located. While a single network round trip can be measured in milliseconds, and may not seem much when considering a 2-second user response time, the scaling cost can be very high, especially when there are multiple communication hops. Every hop will result in context switches, latencies and concurrency aggravation, resulting in disproportionate non-linear degradation in performance.

    So the key consideration for distribution is whether the application domain contains clearly independent processing (with data access local to each domain, even if in a dedicated DB server) with minimal interaction across distributed components.

    [Some more thoughts here.. ]

    Cheers,
    Ramesh
  12. Performance is not as simple as just deciding whether to distribute or not. Distributing processing components depends largely on the domain and the nature of interactions between the various components that are NOT co-located. While a single network round trip can be measured in milliseconds, and may not seem much when considering a 2-second user response time, the scaling cost can be very high, especially when there are multiple communication hops. Every hop will result in context switches, latencies and concurrency aggravation, resulting in disproportionate non-linear degradation in performance.
    Ooops, I didn't intend for those thoughts (2-second user response time, etc.) to be interpreted as absolutes. They were starting points for thinking beyond basic reasoning about distributed vs. non-distributed based purely on round-trips. I'd also say that whilst the factors you list are important, they too are situation dependent. There are no guarantees or rules that work in all situations.
  13. You can refer to one of Fowler's other mantras where he says not to write code you may never use. He would most likely treat a distributed solution as a separate project and re-design accordingly.
    Yep! Write the same project twice and get double the consulting dollars!!!

    Want it distributed? Oh, that's another 6 month project. Oh, now you want to support two different database vendors? We told you we were YAGNI people, so that's another 6 month project to undo all that vendor-specific code. What's that, you want templates for the web site? Oh damn, we do println() to the HTTP request sockets for all of your look and feel, and that's all hard coded classes - but in 6 months we can refactor that too!

    Yes, the above is over the top and exaggerating things a bit, but this is the mindset that developers are being sold on. The problem is that simplistic solutions are sometimes horrendously costly to change after the fact if you ultimately need something more complex. But it is a nice salesmanship technique - analyze it a bit, and you realize what Fowler and others are doing is taking failures and making them look like wins. This makes developers feel good and "empowered". Too bad it's most often done by screwing the business.

        -Mike
  14. Yep! Write the same project twice and get double the consulting dollars!!! Want it distributed? Oh, that's another 6 month project. Oh, now you want to support two different database vendors? We told you we were YAGNI people, so that's another 6 month project to undo all that vendor-specific code. What's that, you want templates for the web site? Oh damn, we do println() to the HTTP request sockets for all of your look and feel, and that's all hard coded classes - but in 6 months we can refactor that too! Yes, the above is over the top and exaggerating things a bit, but this is the mindset that developers are being sold on. The problem is that simplistic solutions are sometimes horrendously costly to change after the fact if you ultimately need something more complex. But it is a nice salesmanship technique - analyze it a bit, and you realize what Fowler and others are doing is taking failures and making them look like wins. This makes developers feel good and "empowered". Too bad it's most often done by screwing the business.
    -Mike
    Not that Fowler needs to be defended in any way, but I think he's been quoted a bit out of context. If you read the book where he introduces this "law" (or some articles where he quotes his book almost verbatim), he basically defines this "law" in response to a real scenario of an obnoxiously over-distributed system. So the perceived simplicity of this "law" is intended as a sarcastic sort of counterbalance to stupid over-distribution.

    Granted, for most of us on TSS these cases are rather obvious. But I doubt that Fowler "just doesn't get it" or engages in smelly sales techniques. His books, obviously, are on the popularizing side of the equation, but they have their own massive readership.

    In general, I find many of Fowler's ideas interesting and worthy.

    Regards,
    Nikita Ivanov
    xTier - Service Oriented Technology
  15. balanced response

    After reading the article, I find it well balanced. Avoiding distributed objects because it's "hard" to design or build is not a good excuse not to use it. Using a piece of technology without a clear understanding of the technical and business requirements is often the cause of bad designs and poor performance. Computer Science programs really should teach students a few basic skills about how to interpret business requirements and how they may translate to technical/architecture designs. I realize schools aren't trade schools, but I'm not talking about learning a trade. More precisely, I mean the critical-thinking skills to deconstruct two requirements which may appear contradictory and come to a clearer understanding. After all, the scientific method prescribes a method for analyzing and understanding a problem. The same techniques can be applied to development problems resulting from human issues. One could argue that, without the human factor, science isn't really all that useful.
  16. balanced response

    After reading the article, I find it well balanced. Avoiding distributed objects because it's "hard" to design or build is not a good excuse not to use it. Using a piece of technology without a clear understanding of the technical and business requirements is often the cause of bad designs and poor performance.
    Fowler's argument (which I agree with) is just the other way around: why bother with a potentially hard-to-design, trendy distributed system if a colocated, clustered system would do the same job for you?

    Note that Fowler is not saying "don't distribute!". He's just asking you to think twice before deciding that your application will be distributed. Weigh the pros and cons, understand the technical and business requirements, and only then decide if you really need a distributed application. Not doing that will surely be the cause of a bad design and poor performance, as you say.
  17. Dan:
    What happens when one server has a lock which we're blocking on and the server holding the lock dies? We'll be left waiting until something resets that lock (is it done automatically or manually, and how soon will it be done?).
    in a distributed cache/db/object-store, the locks are generally replicated to at least one other node, if not all. so if the server (node) holding the lock dies, the backup node will hold it. moreover, it is a very common method for a lock to have a timeout: you acquire a lock for a certain time, and if you need more time you will have to renew the lock before it expires. this is the same as leasing in the Jini world. nothing special about it. common practice, or at least it should be so.
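
    (A toy, single-VM sketch of that leasing idea, hypothetical and just to show the shape; a real implementation would replicate the lease table to a backup node.)

    import java.util.HashMap;
    import java.util.Map;

    public class LeasedLocks {
        private static class Lease { String owner; long expiresAt; }
        private final Map<String, Lease> leases = new HashMap<String, Lease>();

        // Grant or renew a lock for 'ttlMillis'; returns false if someone else
        // holds an unexpired lease. A crashed owner simply stops renewing, so
        // the lock frees itself when the lease runs out.
        public synchronized boolean acquire(String lock, String owner, long ttlMillis) {
            long now = System.currentTimeMillis();
            Lease l = leases.get(lock);
            if (l != null && l.expiresAt > now && !l.owner.equals(owner)) return false;
            if (l == null) { l = new Lease(); leases.put(lock, l); }
            l.owner = owner;
            l.expiresAt = now + ttlMillis;
            return true;
        }

        public synchronized void release(String lock, String owner) {
            Lease l = leases.get(lock);
            if (l != null && l.owner.equals(owner)) leases.remove(lock);
        }
    }
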

    it all depends as usual. my first law of distribution is "don't distribute, if you don't and won't have to".

    Talip Ozturk
  18. my first law of distribution is "don't distribute, if you don't and won't have to".Talip Ozturk
    Agreed, but you can't always make that prediction ("won't have to").
  19. my first law of distribution is "don't distribute, if you don't and won't have to".Talip Ozturk
    Agreed, but you can't always make that prediction ("won't have to").
    If I remember the words in PEAA - Fowler suggests that you should sell your favourite grandma before you distribute objects. (Note: his patterns for when you do need to distribute are about coalescing either function or data into facades or DTOs - so essentially your objects become coarser grained and start to look like distributed _application_ interfaces.) However, in reality we don't always have that luxury (hell, my grandparents aren't even alive! ;) )

    I think you should design and implement locally if your application is fairly simplistic, with rigid process boundaries - but after that you should at least design remotely, even if the initial implementation is local.

    At the start of any given project I could say it can all be done locally, but can I say the same thing in 6 months' time, with a stack of enhancement requests on my desk?

    One thing I look at is that in system integration, you can have all your objects on a single box, but the systems they ultimately talk to are not. Plus many projects, especially those that start off being designed for a few people, are often victims of their own success - what happens when your application now needs to handle the whole department, multiple departments or even multiple sites?

    I don't believe that there really is a choice (or a fork in the road, if you will) between going local and going distributed - often the needs of a local application increase through maintenance or higher usage, which in turn forces an evolutionary response to make the application distributed.

    Yes, understanding and designing for distribution is more complex than local, but your local application will _always_ be bounded by the vertical scaling of the host - if you have the option to make an application distributed without a huge amount of re-design then you have to at least think about it, and dismiss it on better grounds than latency, serialization and performance.

    I wrote a similar piece at Artima - http://www.artima.com/weblogs/viewpost.jsp?thread=44530

    Cheers

    Calum
  20. Dan, I happen to agree with your point of view and thought it was a well-written piece. Many of Fowler's rules, advice, and examples tend to be simplistic and focus on just one or two pieces of a complex problem.

    On distributed systems - Fowler most often talks about simple problems and solution spaces, i.e. the bread and butter of what XP is all about. In that domain, what he says is true - if your application is as simple as he describes, then you gain nothing going distributed and buy a lot of headaches.

    The problem is that Fowler gives no useful advice or analysis if your application happens to be a bit more complex than XP is comfortable with. You might say he's a bit of a one trick pony on this subject "Uh, network calls are slow and can lead to exceptions". Yeah - that's really insightful.

    And as someone else on this thread noticed - if you take a Fowler bias and say "When in doubt, don't distribute", you're basically screwed if you were wrong. For larger applications, it's often better to design and code as if you were distributed - even if you happen to be running on one JVM right now.

    In a nutshell, Dan's showing what happens when reality collides with simple advice that has no comprehensive analysis/justification behind it. Little blurbs that draw web traffic, book deals, and seminars don't do all that well in the harsh light of getting a job done.

    People would be much better off learning basic analysis and design skills, and then applying those skills to their specific environments and applications than listening to lovely slogans and generalities.

         -Mike

    P.S. Fowler's advice "always measure" and similar statements aren't common sense wisdom imparted to others - they're a literary escape hatch. It's a convenient way to look wise with little effort. "Here are my Laws of Computing - and here's my escape hatch which says that if my advice doesn't work, you were too dumb to adapt it to your needs" :-)
  21. We have a little tweak to the whole distributed object notion that avoids many of the discussed problems. Let me explain with a simple example, which would apply to our CMS product.

    Let's say you have a Site object, and it has the following methods:
    setHostname(String)
    getHostname()
    setTitle(String)
    getTitle()
    search(String)
    --
    In this case the first four methods are related to state, and if you have a Site object in a client (assuming a client/server setup) then you wouldn't want it to be a distributed object, since it would cause too many distributed calls. At the same time you want search() to be a remote call, because the search engine is only available on the server.

    The solution to this apparent problem is to use AOP to split the object into pieces, each of which is either distributed or not distributed. In this case you'd put the getters and setters (=state related) in one introduction and the search method (=behaviour/service related) in one introduction. You could then use a Site object like so:
    Site site = server.getSite(); // Get reference to site object
    String title = site.getTitle(); // Get state from server and invoke getter
    site.setHostname("www.somename.com"); // Set state locally
    site.setTitle(title+" - "+site.getHostname()); // Get and update state locally
    Set pages = site.search("help"); // Do remote call and find "help" pages
    commit(); // Send locally modified state to server in order to sync
    ---
    This is a VERY simplified example, but I hope the idea is clear. When considering whether to distribute or not distribute, you simply do both: distribute services and don't distribute state, even if one single object logically contains both. By using AOP to do the underlying magic the client can be fairly oblivious to this.

    We are using this model for our entire product, and it is working very well. There's obviously lots more detail with regard to how the state is handled, loaded and synchronized, but the gist of it is as above. One interesting detail is that this model maps *very* well to the SDO ideas put forth by BEA/IBM.
  22. Separate state and service

    You bring out a good point... when developing a potentially distributed app, separate state and service. That is very good advice, especially when you decide that you need to distribute state too! Why would anyone ever decide to do this?? Dynamic and unattended service failover and/or load balancing.
  23. AOP & Distribution

    The solution to this apparent problem is to use AOP to split the object into pieces, each of which is either distributed or not distributed. In this case you'd put the getters and setters (=state related) in one introduction and the search method (=behaviour/service related) in one introduction.

    When considering whether to distribute or not distribute, you simply do both: distribute services and don't distribute state, even if one single object logically contains both. By using AOP to do the underlying magic the client can be fairly oblivious to this.
    Rickard, I agree with you that AOP introductions are a great way of solving this. Without AOP, though, do you think we can create the same effect by using smart proxies, where the proxy makes some of the calls locally and some remotely? in your example, the Site object could be implemented as a smart proxy object where setters and getters make local calls to manipulate the state, whereas search makes a remote call. you are using the same Site object, which is a smart proxy, for both remote and local calls.

    Talip Ozturk
  24. AOP & Distribution

    Rickard, I agree with you that AOP introductions are a great way of solving this. Without AOP, though, do you think we can create the same effect by using smart proxies, where the proxy makes some of the calls locally and some remotely? in your example, the Site object could be implemented as a smart proxy object where setters and getters make local calls to manipulate the state, whereas search makes a remote call. you are using the same Site object, which is a smart proxy, for both remote and local calls. -Talip Ozturk
    In effect our AOP implementation is a souped up proxy system, so yes, absolutely.

    However, a normal proxy system typically deals with only one interface at a time, and with AOP/introductions a single object can have multiple interfaces. Our objects typically implement *AT LEAST* 5 interfaces, each backed by a separate introduction/POJO. In the given example, the search() method would be extracted into a separate SearchService interface that we can apply to Page, Folder, Site, Server, Cluster, and Matrix, in order to scope the search accordingly. Our experience with designing this way is that clients very, very rarely care about object type (i.e. if it's a Site, Page, etc.), and are more interested in the implemented interfaces (i.e. if it's a SearchService I can use it to search). So, once you get to the point where one single interface doesn't work, you're almost in AOP land anyway.
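
    (For the curious: the smart-proxy variant can be mocked up with java.lang.reflect.Proxy - a sketch with hypothetical target objects; the AOP version generalizes this to many introductions per object.)

    import java.lang.reflect.*;

    public class SmartProxy implements InvocationHandler {
        private final Object localState;    // backs the getters/setters
        private final Object remoteService; // backs search() and friends

        private SmartProxy(Object localState, Object remoteService) {
            this.localState = localState;
            this.remoteService = remoteService;
        }

        public Object invoke(Object proxy, Method m, Object[] args) throws Throwable {
            if (m.getName().startsWith("get") || m.getName().startsWith("set")) {
                return m.invoke(localState, args);  // cheap, in-VM
            }
            return m.invoke(remoteService, args);   // one network round-trip
        }

        // Both targets must implement the relevant parts of 'iface'.
        public static Object wrap(Class iface, Object state, Object remote) {
            return Proxy.newProxyInstance(iface.getClassLoader(),
                    new Class[] { iface }, new SmartProxy(state, remote));
        }
    }
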
  25. This is a VERY simplified example, but I hope the idea is clear. When considering whether to distribute or not distribute, you simply do both: distribute services and don't distribute state, even if one single object logically contains both.
    Not sure if Rickard meant this: when distributing a service, the service may use its own state and persisted data. In other words, the application's "state" is also 'distributed' - ALL the state is NOT in one place; just the processing of any given 'state' should be 'local' to where the state is maintained. The cost of accessing a remote service is lower than the cost of managing replicated state.

    Cheers,
    <href="http://www.jroller.com/page/rameshl"> Ramesh </a>
  26. Not sure if Rickard meant this: when distributing a service, the service may use its own state and persisted data. In other words, the application's "state" is also 'distributed' - ALL the state is NOT in one place; just the processing of any given 'state' should be 'local' to where the state is maintained. The cost of accessing a remote service is lower than the cost of managing replicated state. Cheers, Ramesh
    Yes, processing should be as local as possible, and sometimes that means calling a distributed service (e.g. site.search()) and sometimes it means invoking methods on local objects (e.g. site.setTitle()).

    To go one step further, we have the option of making local state seem remote. If local state has been modified, those modifications are seen by remote services even if they have not yet been committed to the remote server. This doesn't work for all remote services, but for some.

    We use it, for example, when some portlet configurations on a page have been changed and need to be re-rendered on the server to show these changes. We do a service invocation to the server to render the portlet, and pass along the changed state so that the rendering reflects the local changes.
  27. You can also use the Data Transfer Object pattern to manipulate state locally and then synchronize the local state with the global one in one remote call.
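
    (Sketched with hypothetical names - the point is that edits happen locally and there is exactly one coarse call in each direction.)

    import java.io.Serializable;

    public class SiteDTO implements Serializable {
        public String hostname;
        public String title;

        // Client-side usage against a hypothetical remote facade:
        //   SiteDTO dto = remoteSite.getSiteData();  // round-trip 1: fetch state
        //   dto.hostname = "www.somename.com";       // local edits, no network
        //   dto.title = dto.title + " - " + dto.hostname;
        //   remoteSite.updateSiteData(dto);          // round-trip 2: write back
    }
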
  28. Yes, processing should be as local as possible, and sometimes that means calling a distributed service (e.g. site.search()) and sometimes it means invoking methods on local objects (e.g. site.setTitle()). To go one step further, we have the option of making local state seem remote. If local state has been modified, those modifications are seen by remote services even if they have not yet been committed to the remote server.
    As I wrote earlier today, when the application contains very clearly partitioned domains (each with self-contained processing and data), there is a straightforward case for distribution. In most applications, though, there may not be such a natural case for distribution. Then vanilla clustering may be the only scaling option available. I am curious to know whether there are widespread cases out there where distribution of components was possible?

    Cheers,
    Ramesh
  29. As I wrote earlier today, when the application contains very clearly partitioned domains (each with self-contained processing and data), there is a straightforward case for distribution. In most applications, though, there may not be such a natural case for distribution. Then vanilla clustering may be the only scaling option available. I am curious to know whether there are widespread cases out there where distribution of components was possible? Cheers, Ramesh
    One of the things Martin alludes to is that it's always possible to distribute components regardless of whether it's good or bad. And I agree wholeheartedly with this. The difference of opinion lies in the factors we'd consider when making related design decisions and, possibly, our overall mindset on how/when to distribute.

    Now, whilst I think your question is interesting, I think there's another one we should ask at the same time which is: Are there widespread cases where you felt you had to adopt some measure of distributed implementation and if so, why?
  30. What if you have to Switch?

    My thoughts on this topic:

    1) It is much easier for a distributed system to be deployed on just one server than for a non-distributed system to become distributed.
    2) Most non-consulting projects (apps, services) will eventually need to be distributed. There is always that "one" customer with huge data requirements or a complicated process. Obviously, this does not hold true for consulting projects that solve a specific issue for a specific customer, where the IP is owned by that customer.

    ergo

    3) You save a lot more time and money by thinking about distribution up front. Not necessarily program in all the infrastructure to do so, but to make sure that there is a good strategy to do so. Otherwise you're just asking for it.

    Jeff
  31. What if you have to Switch?

    Not necessarily program in all the infrastructure to do so, but to make sure that there is a good strategy to do so.
    You should apply YAGNI but make sure you have a matching exit strategy for any of the YAGNI decisions you make.
    will eventually need to be distributed
    Distribution is a pain - so keep it to a minimum. Latency and Serialization cost can often undo any "scalability" gain that you think you might get.

    -Nick
  32. What if you have to Switch?

    Distribution is a pain - so keep it to a minimum. Latency and Serialization cost can often undo any "scalability" gain that you think you might get.-Nick
    As an aside, there's a great discussion with respect to serialization and optimization here
  33. What if you have to Switch?

    Distribution is a pain - so keep it to a minimum. Latency and Serialization cost can often undo any "scalability" gain that you think you might get.
    For people who have been doing distributed computing for a long time it's not a pain - it's a natural way of working. It's not all that difficult if you've been there before and you know how to design that way.

    And I might point out that you seem to be confusing scalability with performance - they are two very different beasties. Introducing distributed components or services in one form or another can lower your per-request performance but boost your overall scalability. Yeah, serialization may increase processing times per request, and network latency can do the same. But that's per-request performance and response times. Scalability means something entirely different.

    Scalability means that you can deploy the app on a small box and support X users with a given average response time. And then you can "scale up" to a bigger box, or multiple boxes, and support X*Y users with the same average response time.

    Yes, you can sometimes get scalability by doing a pure horizontal clustering and relying on vendors to do the distribution for you. But that can also be a very expensive approach in terms of hardware and software licenses. And even so, developers still have to develop knowing that they're in a distributed environment - evil little singletons will often kill you even though they don't directly have anything to do with network connections :-/

    And of course, there's always the title of this sub-thread: what if you have to switch? I can tell you it's a bitch to do.

        -Mike
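
    (To make the singleton remark concrete, a sketch of the classic trap: this looks like a global counter, but it is scoped to one JVM, so each cluster node quietly keeps its own count.)

    public class HitCounter {
        private static final HitCounter INSTANCE = new HitCounter();
        private int hits;

        private HitCounter() {}

        public static HitCounter getInstance() { return INSTANCE; }

        // "Global" in a single JVM only: on a two-node cluster each node
        // counts just the requests it happened to serve.
        public synchronized int increment() { return ++hits; }
    }
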
  34. I think his statements are getting taken out of context. I think the root of it is using things like Remote EJBs inside a Web Application - use Local Interfaces instead - even though he is trying to make this statement as product-agnostic as possible.

    Also, distributed data caching and session caching usually use different techniques. Session caches are usually replicated across clusters, thus we are often told to keep Session Data small. Distributed data caches work differently if done correctly. Data caches should be refreshed from the source, asynchronously and lazily. So if cluster member A gets some cached data updated, a notification marking the data as dirty in the other cluster members will trigger a reload of the data.

    For data that gets updated frequently, you have no choice but to go to the database. Data caches are usually not used.
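
    (A minimal shape for the notification-driven cache described above - hypothetical; real products layer this on JMS or a proprietary transport.)

    import java.util.*;

    public class LazyCache<K, V> {
        public interface Loader<K, V> { V load(K key); }

        private final Map<K, V> entries = new HashMap<K, V>();
        private final Set<K> dirty = new HashSet<K>();

        // Called when another cluster member broadcasts an update: we only
        // mark the entry dirty; the new value is not shipped around.
        public synchronized void markDirty(K key) { dirty.add(key); }

        // Reload from the source of record lazily, only when someone asks.
        public synchronized V get(K key, Loader<K, V> source) {
            boolean stale = dirty.remove(key);
            V value = entries.get(key);
            if (value == null || stale) {
                value = source.load(key);
                entries.put(key, value);
            }
            return value;
        }
    }
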
  35. Understand the Need Completely

    It looks like every time somebody thinks of something, they want to re-invent the solution.
    The need for local interfacing code is different from the need for distributed or remote interfacing (refer to the EJB 2.0 specs).
    If I am right, Fowler is thinking of something in between. Before going that far, the need is simple: distributed applications have to work over remote interfaces; they have no other choice. Network overhead is obvious, but there are creative ways to deal with it.

    I have a requirement: we are building a web-based retail product that has to work within the stores as well as at central HQ.
    If you look at the need, it needs local interfacing as well as remote.
    There are no two ways about it but to build with EJB 2.0, which allows both.

    Martin, stop working on Microsoft .NET stuff. I have seen you talking about these things even in the Architecture forum that was in Bangalore, India.

    True technologists talk neutrally, as people know technology is a tool to implement business ideas.

    -Hari.
  36. Despite you being a bit arrogant, I believe that Martin Fowler has gathered in his book a great bunch of experience, which is often good to follow in enterprise application projects.
    - Martin Fowler is saying "Don't distribute objects unless you need to." Actually, who needs to deploy objects on different physical machines? To me, few applications need to.
    - Regarding the network round trip, please think of distributed transactions. If a transaction in an application server spans across objects on different physical computers, you need to handle XA transactions between the objects. It is always better to handle "local" transactions in J2EE or .NET than distributed transactions. This is where the network round trip comes from!
  37. Actually, who needs to deploy objects on different physical machines? To me, few applications need to.
    Anybody involved in systems or legacy integration, and people who run on commodity Wintel servers rather than large x-way Sun/IBM/HP (et al.) boxes.

    When you use a Stored Procedure you are distributing your processing, yet in a large percentage of cases, Stored Procedures can be re-written in Java and executed locally. So if I can run them locally, by the first law, I should - so why don't I?
    Regarding the network round trip, please think of distributed transactions.
    But in some cases you can't get away from them. Think of SOA, Grids and all the other things that are buzzing around at the moment. If you want a transaction in SOA, you had best design it to be a distributed transaction, even if you're 99% sure it's going to be local.

    You are correct that Distributed Transactions do indeed increase the number of network handshakes required in a method call - but the benefit they give (disparate systems not being out of sync) is much more important to me than adding 100ms to my response time because of extra network activity. (The human eye takes between 300 and 400ms to blink, if I remember correctly.)

    Calum
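
    (For reference, the J2EE shape of such a transaction - a sketch assuming a container-provided UserTransaction and two hypothetical enlisted resources.)

    import javax.naming.InitialContext;
    import javax.transaction.UserTransaction;

    public class CrossSystemUpdate {
        public void transfer() throws Exception {
            // Standard JNDI name for the container's transaction manager.
            UserTransaction tx = (UserTransaction)
                    new InitialContext().lookup("java:comp/UserTransaction");
            tx.begin();
            try {
                debitLocalDatabase();  // XA resource 1 (hypothetical)
                creditRemoteSystem();  // XA resource 2 (hypothetical)
                tx.commit();           // two-phase commit: extra round-trips,
                                       // but both systems stay in sync
            } catch (Exception e) {
                tx.rollback();
                throw e;
            }
        }

        private void debitLocalDatabase() { /* enlisted via its XA driver */ }
        private void creditRemoteSystem() { /* enlisted via its XA adapter */ }
    }
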
  38. Despite you being a bit arrogant, I believe that Martin Fowler has gathered in his book a great bunch of experience, which is often good to follow in enterprise application projects.
    relax... there is too much stress in life already.
    Actually, who needs to deploy objects on different physical machines? To me, few applications need to.
    to you, yes.. to me, none of the clients i work with deploys an app to one machine. some also deploy to over 50 machines.
    It is always better to handle "local" transactions in J2EE or .NET than distributed transactions. This is where the network round trip comes from!
    first of all, xa is not only for physically distributed applications. secondly, if an application needs to synchronize its state with other nodes, then state synchronization is the biggest network consumer, because it involves distributed locks, recovery, and state replication. sending prepare, commit, rollback tokens doesn't hurt the network as much, but i agree that it is a more blocking operation, so it is time consuming. no one is saying distribution performs better than local (although parallel processing performs much better than local.. but that is another subject) and no one is claiming that network round trips are good, but it is OK if the return is higher. if distribution cannot be avoided and it is all you need, then you will have to pay the price of course... i think this is quite reasonable.
  39. Despite you being a bit arrogant, I believe that Martin Fowler has gathered in his book a great bunch of experience, which is often good to follow in enterprise application projects.
    I find that arrogance can be easily tolerated when someone is correct - as I think is the case here with Dan. Fowler has some good ideas and is a good writer, but he also says some really stupid things. And the _reality_ is that his background, like that of many authors, is primarily rooted in quick get-in-get-out consulting gigs. Do you honestly think Fowler lives with a single application or set of applications year in and year out? No, sorry, he doesn't. He gets to leave after his 1/3/6 months is up.

    In this particular context, I think Fowler's experience is negligible and people with a lot more experience with real-life full-cycle distributed computing are making a lot more sense. Get a grip - Fowler wrote a book, he's not God incarnate.
    - Martin Fowler is saying "Don't distribute objects unless you need to." Actually, who needs to deploy objects on different physical machines? To me, few applications need to.
    Distributed computing in general is a reality. Are your clients on the same machine as your server? Is your app server linked into your database? No, they're not. The reality of software development today is that you have many processes intercommunicating. Since it is the reality, developers shouldn't run away from it screaming and feeling unworthy. Rather than fear the technology they should strive to understand it.

    You could make a very convincing case that what Fowler's really saying is that most developers are too stupid to write distributed code.
    - Regarding the network round trip, please think of distributed transactions. If a transaction in an application server spans across objects on different physical computers, you need to handle XA transactions between the objects. It is always better to handle "local" transactions in J2EE or .NET than distributed transactions. This is where the network round trip comes from!
    I don't think you understand XA very well - network round trips are the least of your problems there.

    And I'd add in "never say never and always avoid saying always". When you say "It is always better to handle 'local' transaction"....well, you've just nuked a number of applications that use XA because they have to.

        -Mike
  40. Distributed computing in general is a reality. Are your clients on the same machine as your server? Is your app server linked into your database? No, they're not. The reality of software development today is that you have many processes intercommunicating. Since it is the reality, developers shouldn't run away from it screaming and feeling unworthy. Rather than fear the technology they should strive to understand it.
    "First Law of Distributed Object Design: Don’t distribute your OBJECTS"

    I understand that this refers to distributed objects (RPC, CORBA, RMI, EJB), not distributed services or layers (TCP/IP, Databases, browsers). They are very different things.

    Do you really think that we should design everything as distributed objects (coarse-grained calls, DTOs, WebServices, remote exceptions, whatever) and **** with all the object-oriented design principles?
    You could make a very convincing case that what Fowler's really saying is that most developers are too stupid to write distributed code.
    Well, in my opinion, he would be right if that was what he is trying to say.
    Most developers don't know how to write distributed code, most developers don't know how to write multithreaded systems, most developers don't know how to design decoupled systems, most developers don't even know how to properly handle exceptions.

    "When in doubt, don't distribute (objects)" RIGHT! If you can't see a reason to make your objects distributed, why waste the effort? Why use RemoteEJBs if the software will run on a single machine (and no perspective to do it in a clustered environment)? Designing and coding distributed objects makes the job muuuuuuch more complex. Even if it is needed, the distribuition should be made through well defined interfaces (layers, subsystems, remote services), designed to be distributed, not the whole system (internal objects).

    About Dan, he misses the point when he says "This difference in granularity is introduced because of a need to avoid network round trips". That is not the only reason, and (I think) not even the most important one for avoiding distribution of objects. Designing distributed systems involves DISTRIBUTED STATE, which can cause a hell of a lot of problems and an explosion of complexity.
    It's not about "saving some bytes" of bandwidth; it is about consistency, simplicity (simplicity as 'avoiding unnecessary complexity'), and productivity.
  41. About Dan, he misses the point when he says "This difference in granularity is introduced because of a need to avoid network round trips".
    No, that was one of Martin Fowler's reasons for why we end up with coarse versus fine granularity - not mine.
  42. About Dan, he misses the point when he says "This difference in granularity is introduced because of a need to avoid network round trips".
    No, that was one of Martin Fowler's reasons for why we end up with coarse versus fine granularity - not mine.
    I know these are his words, but you build your argument around this statement alone, not considering the others.
  43. I know these are his words, but you build your argument around this statement alone, not considering the others.
    And my argument is that considering network round-trips as the only influence on interface design is wrong: "However, IMHO, Mr Fowler has missed the point. Designing an interface isn't solely about reducing roundtrips."
  44. I know these are his words, but you build your argument around this statement alone, not considering the others.
    And my argument is that considering network round-trips as the only influence on interface design is wrong: "However, IMHO, Mr Fowler has missed the point. Designing an interface isn't solely about reducing roundtrips."
    Ooops, dumb me, pushed the reply button too soon.....

    I'll let Martin speak for himself on the subject here. (you'll have to register to see it, sorry 'bout that).

    Would you say that the above is a balanced article with equal coverage of all issues including performance, complexity and so on?
  45. Thanks for the link Dan. If I may summarize the article:

      - Distributed computing needs coarse interfaces
      - Fine grained interfaces are too slow for distributed solutions
      - Coarse interfaces are "hard to program"
      - Distributed solutions are slow
      - You should fight like a cornered rat to eliminate as much process distribution as possible
      - Always favor in-process models over multiple-process models
      - It is assumed without additional information that a distributed design sucks.

    To summarize it further: every point Dan is making is right on the money, and Fowler is giving an extraordinarily simplistic opinion which is basically obsessing over a few points. The first two points above can be further encapsulated as "duh". The third point merely shows Fowler's bias for simplicity being more important than serving the business' needs. The rest is a lot of generality.

    Frankly, Fowler is pining for the old Smalltalk days when _everything_ ran in the image - including the database. He's sad and depressed that the world has moved past good old Smalltalk images and object databases, woken up and discovered that you need distributed solutions to succeed in today's world. He even goes so far as to suggest running everything in the database, showing how disappointed he is that this doesn't work very well in practice.

    Really, Fowler's whole point is that he dislikes the effort that goes into designing distributed systems and will fight tooth and nail to make _his own life easier_. And if he's wrong and the client does need a distributed system? Hell - once again, he'll either get more money or some damned consultant after him will pick it up.

        -Mike
  46. To summarize it further: every point Dan is making is right on the money, and Fowler is giving an extraordinarily simplistic opinion which is basically obsessing over a few points.
    ...
    Really, Fowler's whole point is that he dislikes the effort that goes into designing distributed systems and will fight tooth and nail to make _his own life easier_.
    I see Dan's and Mike's views as much of an extreme ("always distribute") as Martin's view. Again, as much as Martin might be biased against distribution, he does give the considerations needed when designing distribution. In distributed applications, if there are NOT coarse-grained distributed components, then the costs (the multiple network round trips for each user request and the resulting latency and concurrency escalations) are extremely high. This is a fact!

    A simple question: assume well-designed components - ready for distribution - and surely easier to maintain and extend. For any such application, from a performance standpoint (with no $cost considerations), isn't it always best to co-locate ALL pieces (even if on an HA cluster)? Including the DB?

    If the answer to the above is YES (even if not practical), then that is the crux of Martin's article. As a design goal, care needs to be taken to minimize the costs that may arise from distribution - and not to distribute just for the sake of distribution.

    Cheers,
    Ramesh
  47. I see Dan's and Mike's views as much of an extreme ("always distribute") as Martin's view.
    I don't think I have, at any point, said you must always distribute (straight out of my weblog entry):
    Distributed systems, like single-machine systems, are not the global solution to all problems. You need to choose the right thing for the right problem. In most systems I build, I end up with a hybrid solution with various pieces distributed and various pieces not distributed. I don't "fight like a cornered rat" to avoid distribution; equally, I don't use it for all parts of a system. Master craftsmen use many tools, their skill being that they know which tool to use when and they can use them all equally well. Distributed systems are just another tool in the box, to be used appropriately - which is neither never nor always.
    if there are NOT coarse-grained distributed components, then the costs (the multiple network round trips for each user request and the resulting latency and concurrency escalations) are extremely high. This is a fact!
    It's a fact that is meaningless without a context:

    (1) Costs extremely high in comparison to what? The point I was trying to make in my original posting is that the cost incurred is not always relevant because it's completely overshadowed by other factors.
    (2) I agree there may be a subset of interactions as required by a subset of use-cases where it's prohibitive to do this kind of thing. However, there are other things outside of those subsets where this is not the case.
    A simple question: assume well-designed components - ready for distribution - and surely easier to maintain and extend. For any such application, from a performance standpoint (with no $cost considerations), isn't it always best to co-locate ALL pieces (even if on an HA cluster)? Including the DB?
    (1) No $cost considerations is okay for a theoretical discussion but that doesn't work in the real world as has been pointed out by others on this thread.

    (2) Please define the kind of cluster you had in mind when you made this statement as there are several different ways to build such a thing and they will give different performance behaviour/characteristics for the same workload.
  48. It's a fact that is meaningless without a context:(1) Costs extremely high in comparison to what? The point I was trying to make in my original posting is that the cost incurred is not always relevant because it's completely overshadowed by other factors.(2) I agree there may be a subset of interactions as required by a subset of use-cases where it's prohibitive to do this kind of thing. However, there are other things outside of those subsets where this is not the case.
    The cost is purely in terms of performance. Consider an app:
    - accessing component A.foo()
    - which accesses component B.foo1() in a loop.
    And B happens to be on another server. This is where the network round trips become extremely relevant. And they get further aggravated if A.foo() holds locks on other resources, thus degrading other operations as well via the ensuing lock waits. This is where coarse-grained components (with minimal interactions across components) are key for effective distribution.
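
    (The standard remedy, sketched as a hypothetical interface: replace the remote call inside the loop with one coarse call that carries the whole batch, so B iterates locally on its own server.)

    import java.util.List;

    public interface ComponentB {
        // Fine-grained: a caller looping over N items pays N network
        // round-trips, all while A.foo() is holding its locks.
        int foo1(String item);

        // Coarse-grained: the same work in a single round-trip.
        int fooAll(List<String> items);
    }
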
    (1) No $cost considerations is okay for a theoretical discussion but that doesn't work in the real world as has been pointed out by others on this thread.(2) Please define the kind of cluster you had in mind when you made this statement as there are several different ways to build such a thing and they will give different performance behaviour/characteristics for the same workload.
    From a cost standpoint, a cluster of low-cost servers running one application, or a set of the same machines each running one part of a distributed app, would be the same. Now if a "single" application (on a cluster), wherein all calls across beans are local in-VM, achieves scaling by the cluster serving a large number of requests, this would be the best case for performance. In this process-co-location mode, the resulting cost penalty may be the distributed cache. If the application can manage this cost (say, using capabilities provided by the App Server), then this cluster solution without distributed components would perform best.

    (If the distributed cache is key to the performance, then distributing the components across the machines, each with local caches, would perform better. But this is the class of applications - with extensive process interaction across components and still needing distributed components - that I still don't have a feel for. What kind of apps fit this bill?)

    Cheers,
    Ramesh
  49. The cost is purely in terms of performance. Consider an app:
    - accessing component A.foo()
    - which accesses component B.foo1() in a loop.
    And B happens to be on another server. This is where the network round trips become extremely relevant. And they get further aggravated if A.foo() holds locks on other resources, thus degrading other operations as well via the ensuing lock waits. This is where coarse-grained components (with minimal interactions across components) are key for effective distribution.
    OK - you've set one boundary to the problem set by demonstrating that super-fine-grained access over a network can lead to problems. This is the first boundary, and the first step, in a problem that requires about 50 more steps at minimum before you should start making pervasive design decisions. Fowler gives us a huge boost by defining step #1, and leaves steps #2-#50 as an exercise for the reader :-/

        -Mike
  51. I see Dan's and Mike's views as much of an extreme ("always distribute") as Martin's view. Again, as much as Martin might be biased against distribution, he does give the considerations needed when designing distribution. In distributed applications, if there are NOT coarse-grained distributed components, then the costs (the multiple network round trips for each user request and the resulting latency and concurrency escalations) are extremely high. This is a fact!
    Two problems here - first, you're saying things differently than Fowler himself does. Read through his writings on this sort of subject and, as I mentioned before, any talk about when you should distribute isn't a detailed analysis, it's a literary escape hatch. The plain fact is that Fowler says "Don't distribute; when someone forces you to do so at gunpoint, then use coarse-grained interfaces". That is the complete net sum of his advice. Wow, thanks Martin!

    Secondly, no one is saying "always distribute". It would be more accurate to say "keep distribution possibilities at the front of your brain when you're designing a system. 'Cuz if you don't need to distribute processing today you will need to tomorrow".

    You go on to say "the costs are extremely high". Define "extremely". Please, I'll wait :-) One of my points, and I believe one of Dan's as well, is that the "costs" you're citing here are often so low that they appear as noise in the overall system. A dumb example: say you have an RDBMS query which takes 200 milliseconds. This is _not_ unreasonable at all. Guess how long the round trip for such a query is on a 100MBit network? Probably on the order of 5-10 milliseconds at worst.
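    (If you doubt numbers like these, they're easy to measure in your own environment with a crude sketch like the one below. The host name, port, JDBC URL and query are all hypothetical stand-ins - it assumes some echo-style service is listening and a JDBC driver for your database is on the classpath:)

    import java.net.Socket;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class LatencyCheck {
        public static void main(String[] args) throws Exception {
            // One TCP round trip: send a byte, wait for a byte back.
            try (Socket s = new Socket("app-host", 7000)) {
                long t0 = System.nanoTime();
                s.getOutputStream().write(1);
                s.getOutputStream().flush();
                s.getInputStream().read();
                System.out.printf("round trip: %.2f ms%n", (System.nanoTime() - t0) / 1e6);
            }
            // One representative query against the RDBMS.
            try (Connection c = DriverManager.getConnection("jdbc:yourdb://db-host/app", "user", "pw");
                 Statement st = c.createStatement()) {
                long t0 = System.nanoTime();
                st.executeQuery("SELECT COUNT(*) FROM orders").close();
                System.out.printf("query: %.2f ms%n", (System.nanoTime() - t0) / 1e6);
            }
        }
    }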

    The lesson is that "coarseness" is distributed computing 101 - the intro course you give to freshmen. What Dan is talking about is more like distributed computing 472, an advanced senior-level course.
    A simple question: Assume well designed components - that are ready for distribution - and surely easier to maintain and extend. For any such application, from a performance standpoint (with no $cost considerations), isn't it always best to co-locate (even if on a HA/cluster) ALL pieces? Including the DB?
    How CS 101 of you. :-)

    The answer of course is "it's always best to do what's right for your environment - given the machines you have available, political infighting in the company, legacy software you have to deal with, timelines, latency minimums, max expected simultaneous users, projected burst traffic, future scaling needs, failover/recovery requirements, operations monitoring requirements, and oh yes the business requirements as well that intersect with the design".

    You could sum it up by saying that Fowler probably makes hundreds of thousands of bucks a year stating the obvious and assuming optimal conditions; the rest of us make a lot less and have to make much more difficult decisions in far less homogeneous, far less simple environments.

        -Mike
  52. Distribution of objects is a Niche

    I've done both -- and a need for distributed objects is *very* rare.

    One fella here said all his work requires distributed objects. Dan seems to feel all his work does, too (altho forgive me if I'm misunderstanding you). I think being immersed in that small market niche has clouded some minds, the classic 'can't see the forest for the trees' problem.

    In my experience, something like 2% of the apps out there truly require distributed objects. In those cases, it's the only real choice (altho I've seen some amazing backflips tried to avoid it!) But for the other 98% of the real world work done, distributing objects would be a giant mistake you will regret.

    I'd also like to say that designing for distributed objects means you're crippling your OO architecture/design (are we agreed on that?). If so, then to me, that cost is too high, unless you're absolutely forced to it. And, I believe, that is exactly what Fowler is saying.

    In most apps I've seen, the use of distributed objects was the wrong choice, and done for fashion, not need. It was the 'hot buzzword of the month', the sexiest new style, and was used without regard to the costs involved.
  53. Distribution of objects is a Niche

    One fella here said all his work requires distributed objects. Dan seems to feel all his work does, too (altho forgive me if I'm misunderstanding you). I think being immersed in that small market niche has clouded some minds, the classic 'can't see the forest for the trees' problem.
    I don't think you read Fowler's article, Joe - you're not saying what Fowler was. Fowler was saying that all distribution across processes is bad and should be avoided at all costs - said avoidance including selling your grandparents and fighting like a cornered rat (Fowler's words, not mine).

    Generally most applications don't need distributed objects in the sense of fine-grained RMI objects or Entity bean access. Yuck. But that's not what I'm talking about or others here either. What we're saying is that most apps do end up being distributed in one way or another. Whether that's through web services, or JMS, or JCA connectors, or some other mechanism, distribution is a fact of life today. In the article Fowler basically says you should avoid any distributed computing mechanism at almost any cost. Overly fine grained "distributed objects" is one example he uses - but he goes on and says that any sort of distribution, even a well designed one, is bad unless you're forced into it (and he says "forced" specifically and repeatedly).

    As I've mentioned a couple of times now - Fowler is arguing for the Smalltalk God-image (or a cluster of such God images). That's his ideal runtime, and that's what he's always going on about.

         -Mike
  54. I don't think you read Fowler's article, Joe - you're not saying what Fowler was. Fowler was saying that all distribution across processes is bad and should be avoided at all costs - said avoidance including selling your grandparents and fighting like a cornered rat (Fowler's words, not mine).
    I don't know, reading the article Dan linked to in Message #119060, I think ya'll are mis-representing Fowler's points. Read that article, he lays it all out very clearly. "When we let objects wander, we all pay the performance price."

    He's talking only about distributed objects.
    Generally most applications don't need distributed objects in the sense of fine-grained RMI objects or Entity bean access. Yuck. But that's not what I'm talking about or others here either.
    Um, I don't mean to be coy, but -- did you actually read the article? Not just Dan's straw-man rephrasing, but the actual article? From that article, again, "The primary reason that the distribution by class model doesn't work has to do with a fundamental fact of computers. A procedure call within a process is extremely fast. A procedure call between two separate processes is orders of magnitude slower. Make that a process running on another machine, and you can add another order of magnitude or two, depending on the network topography involved. As a result, the interface for an object to be used remotely must be different from that for an object used locally within the same process. A local interface is best as a fine-grained interface."

    He goes on to explain. "Where You Have to Distribute -- So you want to minimize distribution boundaries and utilize your nodes through clustering as much as possible. The rub is that there are limits to that approach—that is, places where you need to separate the processes. If you’re sensible, you’ll fight like a cornered rat to eliminate as many of them as you can, but you won’t eradicate them all."

    Then he lists the various distributed architectures, favoring, as do I, scaling horizontally, so to speak, clustering out. He specifically mentions 3-tiered, separating the web, client and data tiers.

    I think there's some serious misunderstanding going on here.
    Overly fine grained "distributed objects" is one example he uses - but he goes on and says that any sort of distribution, even a well designed one, is bad unless you're forced into it (and he says "forced" specifically and repeatedly).
    Exactly what I just said. Maybe 2% of the time, in my experience, the developer is literally forced into using distributed objects. By far the majority of the time -- 98% in my opinion -- using distributed objects is actually a hindrance, and in several projects I've heard of, the complexity of EJBs caused the project to fail.

    I'm curious, do you use distributed objects in pretty much every app you build?
    Um, I don't mean to be coy, but -- did you actually read the article? Not just Dan's straw-man rephrasing, but the actual article? From that article, again, "The primary reason that the distribution by class model doesn't work has to do with a fundamental fact of computers. A procedure call within a process is extremely fast. A procedure call between two separate processes is orders of magnitude slower. Make that a process running on another machine, and you can add another order of magnitude or two, depending on the network topography involved. As a result, the interface for an object to be used remotely must be different from that for an object used locally within the same process. A local interface is best as a fine-grained interface."
    Yes, I read the article :-). He says "fine grain good, coarse grain bad, distributed forces coarse grain, distributed slow. Oogah OOgah".

    Leaving out the "ooga ooga" that's the net sum of his advice.

    He goes on to say it'd be really nice to run everything within the database, but "often that's not practical". If you read closely you can see he really wants your code running inside the database, but "often that's not practical".

    He goes on to say your web serving should be in the same process as the app server. No reasons, just that they should be in the same process "all things being equal".

    His final thought, from the bullet list near the beginning, is that when you do have to distribute, "Then you just have to hold your nose and divide your software into remote, coarse-grained components".

    If you read Fowler's article closely, you can see that it's not distribution he hates so much, _it's coarse-grained interfaces_. He despises them - as the above indicates.
    Exactly what I just said. Maybe 2% of the time, in my experience, the developer is literally forced into using distributed objects. By far the majority of the time -- 98% in my opinion -- using distributed objects is actually a hindrance, and in several projects I've heard of, the complexity of EJBs caused the project to fail.

    I'm curious, do you use distributed objects in pretty much every app you build?
    Joe, you've got distributed objects on the brain. Read what I wrote:

    "Overly fine grained 'distributed objects' is one example he uses - but he goes on and says that any sort of distribution, even a well designed one, is bad unless you're forced into it (and he says 'forced' specifically and repeatedly)."

    See where I say "...and says that _ANY_ sort of distribution, even a well designed one, is bad" (with emphasis added on "any")? Fowler's thesis is that any form of distribution is bad, and you should fight it tooth and nail - his examples cascade over various levels - and at every single level he fights distribution tooth and nail. This has NOTHING to do with distributed objects. Invert Fowler's bullet points and it rolls up as a recommended architecture of "database, web server, app server, and application code should all run in the same process".

    To spin this a different way - I see JMS, JCA, and other distributed technologies as fundamental tools of my job. Fowler casts these as necessary evils that must be tolerated - but should be eradicated wherever you can get away with it.

    Just to make it clear - Fowler's article says at every point in a design you should do whatever you can to remove multiple processes. "Distributed objects" are a red herring here - I don't consider JMS a "distributed object technology", but I'll bet ya Fowler would because "it's remote and forces a coarse-grained interface". I don't consider separate web server and app server processes "distributed objects" - but Fowler recommends against it. In a perfect world, Fowler's app code would run in a database but he barely concedes that this isn't practical. You're focusing on the common sense interpretation of "distributed objects" but Fowler considers any two processes talking together "distributed objects" of one stripe or another.

    Ultimately Fowler just hates distributed-anything. I'm sure he would be happy as a pig in its slop if he could force users to all use a single executable image, and anything less than that ideal is something that a cruel world is forcing on him. Note how he repeatedly laments how he's forced to do something, or how his ideal "is not practical". Whenever he talks about distribution (not objects - just distribution) it's got a load of negative adjectives attached to it.

         -Mike
  56. I think not

    He says "fine grain good, coarse grain bad, distributed forces coarse grain, distributed slow. Oogah OOgah".

    Leaving out the "ooga ooga" that's the net sum of his advice.
    Um, I'm sorry, but no, I don't agree, not at all.

    He's specifically talking about distributed objects, and he says so again and again and again. And he gives specific reasons you seem not to have read at all. From the article:

    -- "Objects have been around for a while, and sometimes it seems that ever since they were created, folks have wanted to distribute them. However, distribution of objects, or indeed of anything else, has a lot more pitfalls than many people realize, especially when they’re under the influence of vendors’ cozy brochures. This article is about some of these hard lessons—lessons I’ve seen many of my clients learn the hard way."

    -- "A fine-grained interface is good because it follows the general OO principle of lots of little pieces that can be combined and overridden in various ways to extend the design into the future."

    -- " Hence, we get to my First Law of Distributed Object Design: Don’t distribute your objects!"

    -- "The overriding theme, in OO expert Colleen Roe’s memorable phrase, is to be “parsimonious with object distribution.” Sell your favorite grandma first if you possibly can."

    Objects, objects, objects. "Don't distribute your objects". "Be parsimonious (what a word!) with object distribution." I'm sorry, but certain folks here seem to have so missed the boat it's hard to believe. It's almost as if Fowler has hit a raw nerve, and gored someone's sacred cow.

    Seriously, I'm curious, what % of the apps you build have distributed objects in them somewhere? I've only really come across one app that really required them in my work -- an insurance processing app which had to crank interest calculations and the like for 3 million policies every night. I've seen them used (or attempted) in about a 1/2 dozen apps, tho, and only one of those was a 'success', and even that one probably could have been done quicker and cheaper without them.

    Do you fit distributed objects into all your work?
  57. Re: I think not

    However, distribution of objects, or indeed of anything else, has a lot more pitfalls than many people realize, especially when they’re under the influence of vendors’ cozy brochures. This article is about some of these hard lessons—lessons I’ve seen many of my clients learn the hard way."
    Note the words "...or indeed anything else...", so it ain't just about objects.

    In Java, everything is an object, whether it's coarse-grained, fine-grained or whatever... so it actually comes down to "Don't distribute" - not "Don't distribute unless...", nor cogent examples of what "unless you have to" actually may mean, nor the flipside of why you may want to distribute.

    The point that many people are trying to make, is that distribution and a potential view on whether you will need distribution in the future is needed during the design phase. Designing 98% of applications to be local means that 98% of your applications will be constrained by the resources of the single system and/or JVM that they run on.
  58. We'll just disagree, then.

    Note the words "...or indeed anything else....", so it ain't just about objects
    " . . . or indeed anything else, has a lot more pitfalls than many people realize . . ."

    I'm sure you'll agree that's a 100% true statement, yes? And that's the point of the entire article. There are costs ("pitfalls") associated with distributing, so only do it when necessary, when 'forced' by user requirements.

    In fact, that rule goes for just about everything in coding. Even simply putting a button on a screen. Don't put a button on the screen unless you are forced to, by a user requirement.

    Forgive me, but you seem to have an agenda, you seem to be a distributed object advocate, and Fowler has touched a nerve with you. I radically disagree with your "interpretation" of his comments.

    I'm afraid I've got a full day ahead, and I think we've pretty much gone as far as we're likely to go with this. We just read this differently.

    Have a nice day.
  59. We'll just disagree, then.

    There are costs ("pitfalls") associated with distributing
    But there are also benefits. What I disagree with is painting a picture of just negatives and not providing some form of Devil's Advocate that states the benefits of using distribution. If Fowler were so correct that there are only pitfalls to distribution, there wouldn't be an Internet.
    Forgive me, but you seem to have an agenda, you seem to be a distributed object advocate, and Fowler has touched a nerve with you. I radically disagree with your "interpretation" of his comments.
    There is no interpretation as far as I'm concerned, but there is also no "balanced" discussion from Fowler that touches not only on the "costs" of distribution but also on the benefits of it.
    I'm afraid I've got a full day ahead, and I think we've pretty much gone as far as we're likely to go with this. We just read this differently. Have a nice day.
    Please understand I and others on this thread aren't trying to offend, but for someone to blindly make a law of "don't distribute" and not give exceptions is wrong. Rules and laws are very draconian unless they have exceptions.

    Calum
  60. Distributing databases

    The one thing about his article that was interesting is the idea of distributing different customers to different databases. Sometimes that may be ok, but with the app I'm working on now, there is a "common" schema with data belonging to the system and then there are sub schemas that are different applications. We constantly join with the common schema, so I'm not sure that would work in this case.
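    For what it's worth, the per-customer split looks roughly like this (a sketch with hypothetical names; all of a customer's rows live on one shard, which is exactly why joins against a shared "common" schema don't fit the scheme):

    import javax.sql.DataSource;
    import java.util.List;

    public class CustomerShardRouter {
        private final List<DataSource> shards;  // one DataSource per customer database

        public CustomerShardRouter(List<DataSource> shards) {
            this.shards = shards;
        }

        // Single-customer work never crosses databases; cross-shard joins
        // (e.g. against a shared "common" schema) would have to be done in code.
        public DataSource forCustomer(long customerId) {
            return shards.get((int) (customerId % shards.size()));
        }
    }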
  61. Fowler says,
    A second divide often occurs between server-based application software (the application server) and the database. Of course, you can run all your application software in the database process itself, using such things as stored procedures. But often that’s not practical, so you must have separate processes. They may run on the same machine, but once you have separate processes, you immediately have to pay most of the costs in remote calls. Fortunately, SQL is designed as a remote interface, so you can usually arrange things to minimize that cost.
    Dan's take?
    He then goes on to state that one should prefer stored procedures over anything else because one can then execute an application within the same process as the database giving better performance. He states that when you can't do this you incur most of the costs associated with remoteness. Wrong again, there are various tools for avoiding this such as shared memory and pipes which can be used as part of your remote infrastructure in such a manner as to optimize communication between processes on the same machine so as to avoid most of the cost of remote communication.
    This is called a 'straw man'.
  62. Distribution of objects is a Niche

    <mike>
    Fowler was saying that all distribution across processes is bad and should be avoided at all costs
    </mike>
    I think you are taking Fowler too literally here. If he meant never to use distributed objects, he would not have to talk about facades or data transfer objects. He is using the “Grandma” analogy to stress the point: consider whether there is a real need for distributed objects. I’m quite surprised you didn’t get it.

    Pratheep
  63. Distribution of objects is a Niche

    You say I'm taking Fowler too literally; I'd say that you're assigning too much common sense and interpretation to his words. Whenever Fowler talks about remote-anything it's _loaded_ with negative adjectives. Here's a sample: "In the end, even with coarse-grained interfaces on every remotable class, you'll still end up with too many remote calls and a system that's awkward to modify as a bonus". He shows you how you can write a system that (in his opinion barely) works by using something like DTOs or facades - and he'll go on and tell you that it's slow and awkward.
    He is using the "Grandma" analogy to stress the point to consider whether there is a real need for distributed objects. I'm quite surprised you didn't get it.
    Having read not just the books but many of his on-line ramblings, and having looked at lots of code snippets - he means what he says (well, not literally selling ancestors, but he's just about that serious). Perhaps, Pratheep, you're not familiar with the Smalltalk "image" concept. I am, and so is Fowler - and he _loves_ having everything in a single image. A smalltalker's personal heaven is having one image running as a web server, your app code, container stuff, and using something like Gemstone/S so that as much code as possible (and even your configuration!) is all running as a single binary image. That's his ideal and that's how he measures any design.

    He concedes - with regret! - that real life sometimes intrudes and you can't run as a single image. He then bitches and moans about how horrible it is that he's forced to distribute in some situations, and gives ideas on how to code in such a situation - and bitches again at how awkward and slow it all is compared to his beloved ideal.

         -Mike
  64. The plain fact is that Fowler says "Don't distribute; when someone forces you to do so at gunpoint, then use coarse-grained interfaces". That is the complete net sum of his advice. Wow, thanks Martin! Secondly, no one is saying "always distribute". It would be more accurate to say "keep distribution possibilities at the front of your brain when you're designing a system. 'Cuz if you don't need to distribute processing today you will need to tomorrow".
    The crux here is that coarse-grained objects are the objects that you distribute. Don't believe there is any disagreement on this front. (Arguable only if Martin says don't consider distribution possibilities when designing!)
    You go on to say "the costs are extremely high". Define "extremely". Please, I'll wait :-) One of my points, and I believe one of Dan's as well, is that the "costs" you're citing here are often so low that they appear as noise in the overall system. A dumb example: say you have an RDBMS query which takes 200 milliseconds. This is _not_ unreasonable at all. Guess how long the round trip for such a query is on a 100MBit network? Probably on the order of 5-10 milliseconds at worst.
    This suggests that you have hardly experienced performance intensive situations in any of your projects :-)
    Let alone 200ms, even a 20ms round trip to the DB from the middle tier has a cascading cost when the system is loaded. The cost is typically a multiple of the base additional latency (in this case, the 20 ms). These could be caused by any number of factors that at the least include context switches (and the ensuing increased CPU stress) and cascading concurrency aggravation.

    (More CS101 at http://www.jroller.com/page/rameshl/20040420#to_d_istribute_or_not1 )

    Cheers,
    Ramesh
  65. This suggests that you have hardly experienced performance intensive situations in any of your projects :-)
    ROTFLMAO! Perhaps you should read my blog sometime to see what sort of projects I've been involved in and what the performance aspects were.
    Let alone 200ms, even a 20ms round trip to the DB from the middle tier has a cascading cost when the system is loaded. The cost is typically a multiple of the base additional latency (in this case, the 20 ms). These could be caused by any number of factors that at the least include context switches (and the ensuing increased CPU stress) and cascading concurrency aggravation.
    So you're saying that latency adds up serially? Sorry - that isn't necessarily the case at all.

    As you increase load on a system, latency tends to only go up slightly for quite a while, where your latency levels per individual thread exist in a relatively tight band. For example, in a pure comms system w/ no persistent store access, for 1-100 threads the latency might stay in a band of 1-10 milliseconds, and latency could be indistinguishable between, say, 40 threads and 60. This happy situation exists because you haven't saturated any resources yet. Latency starts to pile up only when a resource gets saturated and queueing starts happening somewhere in the system - be it network, CPU, or something else.
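    (A toy way to see this "flat band, then a wall" shape: for an M/M/1-style queue the mean response time is roughly service time / (1 - utilization), which barely moves until utilization gets close to 1. The 5 ms service time below is just an assumed figure:)

    public class QueueingSketch {
        public static void main(String[] args) {
            double serviceMs = 5.0;  // assumed per-request service time
            for (int i = 1; i <= 9; i++) {
                double u = i / 10.0;  // utilization of the saturating resource
                // M/M/1 mean response time: S / (1 - U)
                System.out.printf("utilization %2.0f%% -> ~%5.1f ms%n", u * 100, serviceMs / (1 - u));
            }
        }
    }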

    If you throw in something with complex locking, like a database, then you've added another factor into the equation - lock contention. A well designed RDBMS and well designed app code will be coded to avoid lock contention as much as possible, and will resemble the comms example above. If there's excessive locking, then you _can_ start getting latency pile ups due to queueing - in this case, you're queueing up due to lock contention.

    In XA situations, it's possible to code the system in pure Java with a fast disk so that you can support >5000 transactions/second from a large number of threads (say, 800 or so) and hold latency down to <100 milliseconds.

    In comms situations, with small messages (1K or less) it's possible to do >15,000 msgs per second with similar latencies in a suitably fast network.

    All of the above of course relies on parallelism, and in many cases piggybacking multiple requests together in one form or another - in XA, you might have people sharing disk forces, in networking you might combine multiple requests into a single network packet or otherwise coalesce results somehow.

    And when a given item gets saturated, you get around it by adding "more" of the saturated resource - more machines, more memory, faster disks, more disks, a faster network. Hell, even multiple networks. Perhaps you've heard the term "scaling up" software? Well, this is what it's all about - software that lets you add "more" of what you need hardware-wise, and that can take advantage of that and distribute costs over multiple components.

    Now from the perspective of a single thread, things are a bit different and closer to what you describe. Traditionally in Java land, people do access resources serially and in a synchronous manner. So if they access 10 "external" resources, they access them one at a time, and the total "request" time becomes the sum of all of those accesses. This goes hand in hand with people talking about not using fine-grained network interfaces - to avoid summing up the cost of many network round trips. In this situation you will hit a wall at some point.

    And this is the point where people learn what people in the financial services arena learned decades ago - asynchronicity and non-blocking I/O and parallelism are your friends :-) Some aspects of a workflow may be serial in nature and force you to code them that way, but a surprising amount of work in a "request" coming into a system may be done asynchronously in a non-blocking manner. The model here is often:

       - Do some up front work
       - Fire off external requests asynchronously to "N" resources
       - Reap results of async requests as they come in
       - Process & return results to caller

    Such an approach can take 200 milliseconds of work and accomplish it in 40 or 50 milliseconds - and adding a few extra asynchronous firings doesn't measurably change this perceived time to the user.
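    A minimal sketch of that model in Java (the three "external resources" are hypothetical stand-ins; the point is that total wall time tracks the slowest call, not the sum):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.CompletionService;
    import java.util.concurrent.ExecutorCompletionService;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class AsyncFanOut {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(8);
            CompletionService<String> reaper = new ExecutorCompletionService<>(pool);

            // Do some up-front work, then fire off N requests asynchronously.
            List<Callable<String>> externalCalls = Arrays.asList(
                () -> callPricingService(),
                () -> callInventoryService(),
                () -> callCreditService());
            for (Callable<String> call : externalCalls) reaper.submit(call);

            // Reap results as they complete (completion order, not submit order).
            List<String> results = new ArrayList<>();
            for (int i = 0; i < externalCalls.size(); i++) {
                results.add(reaper.take().get());
            }

            // Process & return results to caller.
            System.out.println(results);
            pool.shutdown();
        }

        // Hypothetical remote calls; imagine each blocking on the network.
        static String callPricingService()   { return "price"; }
        static String callInventoryService() { return "stock"; }
        static String callCreditService()    { return "credit"; }
    }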

    This sort of design tends to be very fast and very resilient in the face of errors. Its primary drawback is that it is a very different way of doing things, and one that some people find more "complex". To Fowler's credit, he mentions a growing liking for asynchronous protocols and approaches to this sort of problem. He recognizes that asynchronicity and non-blocking semantics solve a number of his performance critiques of distributed computing (he's a couple of decades late in that observation, but at least he finally got there).

         -Mike
  66. ROTFLMAO! Perhaps you should read my blog sometime to see what sort of projects I've been involved in and what the performance aspects were.
    I do read your blog, Mike. And surely it is pretty heavy stuff. (But that still doesn't take away the lightness of your perception that distributed components do not degrade performance when designed well. :-) )
    So you're saying that latency adds up serially? Sorry - that isn't necessarily the case at all. As you increase load on a system, latency tends to only go up slightly for quite a while, where your latency levels per individual thread exist in a relatively tight band.
    Again, no argument when there are coarse grained components with minimal round trips over the course of processing a request. But if there are fine grained components, and in processing one end user request there are multiple network round trips, the latencies will increase manifold.
    For example, in a pure comms system w/ no persistent store access, for 1-100 threads the latency might stay in a band of 1-10 milliseconds, and latency could be indistinguishable between, say, 40 threads and 60. This happy situation exists because you haven't saturated any resources yet. Latency starts to pile up only when a resource gets saturated and queueing starts happening somewhere in the system - be it network, CPU, or something else. If you throw in something with complex locking, like a database, then you've added another factor into the equation - lock contention. A well designed RDBMS and well designed app code will be coded to avoid lock contention as much as possible, and will resemble the comms example above. If there's excessive locking, then you _can_ start getting latency pile ups due to queueing - in this case, you're queueing up due to lock contention.
    This lock contention is the challenge. Surely no excuse for poorly designed apps and databases. But all said and done, in OLTP biz apps, lock contention is inevitable (when loaded). And when contended, and some operations serialized, additional latencies (like network) have a multiplying effect.
    In XA situations, it's possible to code the system in pure Java with a fast disk so that you can support >5000 transactions/second from a large number of threads (say, 800 or so) and hold latency down to <100 milliseconds. In comms situations, with small messages (1K or less) it's possible to do >15,000 msgs per second with similar latencies in a suitably fast network. All of the above of course relies on parallelism, and in many cases piggybacking multiple requests together in one form or another
    This is the key issue. Parallelism in biz apps (OLTP) is not easy. (Nor is the app's control of communication (to piggyback), nor coding in pure Java with fast disks - especially when comms are managed by, say, the J2EE App Server and disk access by the DBMS server.)

    Cheers,
    Ramesh
  67. Again, no argument when there are coarse grained components with minimal round trips over the course of processing a request. But if there are fine grained components, and in processing one end user request there are multiple network round trips, the latencies will increase manifold.
    The whole coarseness argument has been hashed to death. Yeah, the more network requests the more you're stressing a resource, and the more latency.

    But you're also sort of dilly-dallying between the single-request perspective and the resource utilization perspective. Your arguments are, forgive me, rather conveniently jumping between the two (or so it seems to me). Be clear when you're talking about latency as seen by one request as opposed to aggregated latencies across many requests.

    Also keep in mind the advances in compute hardware, Ethernet boards, and networks. Much of what Fowler spouts on about is rooted in 10MBit networks with slow processors and fairly dumb Ethernet boards. You can reasonably do more round trips today on modern hardware with a 100MBit network than you could only five years ago. Fowler is borrowing on old knowledge based on old systems, and he's rather slow to adapt to changes in the underlying hardware. An order of magnitude increase in network capacity from 10MBit to 100MBit changes a number of old "common sense" equations.
    This lock contention is the challenge. Surely no excuse for poorly designed apps and databases. But all said and done, in OLTP biz apps, lock contention is inevitable (when loaded). And when contended, and some operations serialized, additional latencies (like network) have a multiplying effect.
    What you're speaking of is worst-casing several different bad applications of distributed computing in a manner so that they cascade - one bad decision cascades upon another to produce really bad numbers.

    You're basically saying "lock contention is inevitable" and "additional latencies (like network) have a multiplying effect" - and you're _wrong_. These are problems when people approach a high-throughput distributed problem with the same mindset as doing single-server solutions. They're not problems with professionals who've been dealing with this problem for years. Back in the mid-90's the brokerage houses were doing transaction rates that would curl your hair, and that of most web-oriented people - because they understood how to get fast distributed processing to work.

    The answers are simple - avoid locks, batch where possible, distribute to gain scalability (the latter is hugely scalable when you avoid locks). Throw in network speed increases and you've got another huge helping hand there.

    Lock contention isn't "inevitable" if you design to avoid it. Favor inserts over updates, aggressively pursue optimistic locking in conjunction with that, avoid global locks in general - and your problem largely goes away. When you need locks, keep them as local as possible. People figured out how to do this in the 80's in the financial arena - and that's how the various stock exchanges handled a staggering transaction load on pathetic hardware. Use the same approaches today with 100 Mbit (or gigabit :-) networks, crazy fast CPUs and 3 or 4 orders of magnitude more memory, and now you're talking about billions upon billions of transactions in a 7 hour period.
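    In plain JDBC terms, the optimistic approach is just a version column (the table and column names here are hypothetical):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class OptimisticUpdate {
        // Returns true if our update won; false means another transaction
        // changed the row since we read it, and the caller should re-read
        // and retry instead of holding a pessimistic lock the whole time.
        static boolean updateBalance(Connection c, long accountId,
                                     long expectedVersion, double newBalance)
                throws SQLException {
            String sql = "UPDATE account SET balance = ?, version = version + 1"
                       + " WHERE id = ? AND version = ?";
            try (PreparedStatement ps = c.prepareStatement(sql)) {
                ps.setDouble(1, newBalance);
                ps.setLong(2, accountId);
                ps.setLong(3, expectedVersion);
                return ps.executeUpdate() == 1;  // 0 rows means we lost the race
            }
        }
    }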
    This is the key issue. Parallelism in biz apps (OLTP) is not easy. (Nor is the app's control of communication (to piggyback), nor coding in pure Java with fast disks - especially when comms are managed by, say, the J2EE App Server and disk access by the DBMS server.)
    I agree that J2EE is stupid when it comes to this - J2EE makes parallelism much harder than it should be. Parallelism in and of itself is not all that hard, though. I've been pushing for more asynchronous and parallel features to be built into J2EE for some time now. For now, you have to roll your own - and guess what? That's exactly what places like brokerage houses and exchanges do right now. Some people at IBM are pushing the envelope here as well, like Websphere and its Async Beans concept (hi Billy!).

    Fowler is fundamentally wrong because he tries so desperately to apply a single-process mentality to problems that just don't work in that model, even on today's modern hardware. He's bitter because reality is cheating him of a life of ease and contentment :-). In the picture he paints of our world, distributed computing is a curse that should be avoided at all costs, and barely tolerated when forced to under extreme duress. Sensible people see distributed computing as a reality of today's hardware, and work pragmatically with it, not against it. This is precisely what people like Billy Newport are doing - they're not avoiding the problem or railing against it, they're giving people tools to work _with_ it.

         -Mike
    But you're also sort of dilly-dallying between the single-request perspective and the resource utilization perspective. Your arguments are, forgive me, rather conveniently jumping between the two (or so it seems to me). Be clear when you're talking about latency as seen by one request as opposed to aggregated latencies across many requests.
    I'm referring to OLTP operations. Here again, the primary operation of interest is an end user request. Assume a component model wherein there is an entry-point service method that is invoked (say, in a SLSB) that provides the service that the user needs (and requested via the presentation tier). And this service method, in the course of its processing, accesses other distributed components. (Even in batch processing (say, in J2EE again), there will be a similar service entry point that would perform/trigger the batch processing.)

    In these cases, the latencies I am referring to are those that affect the overall response and scaling of the method of the user's interest, which is the SLSB method.
    What you're speaking of is worst-casing several different bad applications of distributed computing in a manner so that they cascade - one bad decision cascades upon another to produce really bad numbers.
    I'm referring to a conventional biz app that has high concurrent requests. And in such cases, when implemented in J2EE (OK, stupid maybe - but only when you need parallel processing. Where, again, MDBs do offer a decent framework.), there is bound to be contention. And latencies will be aggravated.

    Cheers,
    Ramesh
  69. I'm referring to OLTP operations. Here again, the primary operation of interest is an end user request. Assume a component model wherein there is an entry-point service method that is invoked (say, in a SLSB) that provides the service that the user needs (and requested via the presentation tier). And this service method, in the course of its processing, accesses other distributed components. (Even in batch processing (say, in J2EE again), there will be a similar service entry point that would perform/trigger the batch processing.)

    In these cases, the latencies I am referring to are those that affect the overall response and scaling of the method of the user's interest, which is the SLSB method.
    OK, you've now constrained us to one very narrow approach. You do realize that this is not the only way to do this, right?

    As I said - you're painting a worst-case scenario. In particular, your latencies will pile up, and most of those latencies will be in the RDBMS (or other remote database-like resource). As other people have pointed out here network latency doesn't make much difference - assuming about 2 milliseconds network latency (which is reasonable on 100MBit with request-response), 2 such round trips might take 4 milliseconds network time, 50 such requests might take 100 milliseconds. Either way the user won't really notice.

    The real problem here will be pessimistic locking of resources.

    So don't do that!
    I'm referring to a conventional biz app that has high concurrent requests. And in such cases, when implemented in J2EE (OK, stupid maybe - but only when you need parallel processing. Where, again, MDBs do offer a decent framework.), there is bound to be contention. And latencies will be aggravated.
    Switch your RDBMS locking from pessimistic locking to optimistic locking and you'll find that many of your problems simply go away. On a 100MBit network, you'll additionally find that end-users will not notice if you're doing 1 giant super-coarse-grained request or 20 smaller requests. Likewise - in these scenarios you are probably _not_ saturating any resources, so simultaneous requests from multiple threads will not be unduly impacting each other and piling up latency.

    As I said - the traditional J2EE approach is about the worst way to do distributed computing. People are learning this (in spades) and are learning that the answer isn't to avoid distribution - the answer is to design your app _for_ distribution, to use techniques like optimistic locking, and to embrace asynchronous models.

        -Mike
  70. I'm referring to OLTP operations. Here again, the primary operation of interest is an end user request. Assume a component model wherein there is an entry-point service method that is invoked (say, in a SLSB) that provides the service that the user needs (and requested via the presentation tier). And this service method, in the course of its processing, accesses other distributed components. (Even in batch processing (say, in J2EE again), there will be a similar service entry point that would perform/trigger the batch processing.) In these cases, the latencies I am referring to are those that affect the overall response and scaling of the method of the user's interest, which is the SLSB method.
    OK, you've now constrained us to one very narrow approach. You do realize that this is not the only way to do this, right? As I said - you're painting a worst-case scenario. In particular, your latencies will pile up, and most of those latencies will be in the RDBMS (or other remote database-like resource). As other people have pointed out here network latency doesn't make much difference - assuming about 2 milliseconds network latency (which is reasonable on 100MBit with request-response), 2 such round trips might take 4 milliseconds network time, 50 such requests might take 100 milliseconds. Either way the user won't really notice. The real problem here will be pessimistic locking of resources. So don't do that!
    Two issues here:
    1. The latency is not just the network round trip. It also includes the basic cost of invoking a distributed component. (Even if this still doesn't explain latencies in seconds.)
    Again, these are not necessarily deterrents to distribution. But they surely are costs of distribution - making it difficult to treat this as a no-brainer decision. Especially when having fine-grained components with excessive interactions for a single end user request.
    2. Avoiding pessimistic locking is easier said than done. The effectiveness of using optimistic locking very much depends on the concurrency dynamics. In cases with high concurrent updates, using optimistic locking will result in excessive rollbacks. (I know this very well as we were one of the earlier J2EE vendors to implement optimistic concurrency control. And we also have this adopted in well designed customer apps.)

    Cheers,
    Ramesh
  71. I've posted a blog entry related to this message: http://www.artima.com/weblogs/viewpost.jsp?thread=45621
  72. I've implemented database systems and written many a network application and here's an interesting fact: Network round-trips are often considerably less costly than the time taken for a transactional database operation due to the need to forcibly log transactional operations, which is very costly in terms of disk performance; i.e. network round-trips aren't always the performance bottleneck.
    Dan, I'm not sure how to interpret your phrase "forcibly log transactional operations". Of course there will be disk IO when the transaction is committed, otherwise the whole concept of ACID transactions wouldn't make much sense. But during the course of a transaction, single operations are not normally logged to disk. At least in Oracle they are first written to an in memory buffer which is only forcibly flushed to disk at the end of the transaction.

    As for IPC vs. stored procedures, my experience is, that if you need to process more than one or two SQL statements (that may depend on each other's results) as part of a single transaction, then SPs are always going to perform a lot better than any form of IPC. There are many reasons for this beyond those connected to network roundtrips, like character set conversions, etc.
     
    But this whole distribution debate should, in my view, be seen more in the light of organisational necessities than just technical ones. E.g. you have one situation when you do an in-house app with a single centralised development team that can change distribution aspects directly in the code as need arises. And you have a completely different situation when you are an ISV that has to make sure that your clients' sysadmins can reconfigure your app to optimize for the particular usage patterns they observe.
  73. Dan, I'm not sure how to interpret your phrase "forcibly log transactional operations". Of course there will be disk IO when the transaction is committed, otherwise the whole concept of ACID transactions wouldn't make much sense. But during the course of a transaction, single operations are not normally logged to disk. At least in Oracle they are first written to an in memory buffer which is only forcibly flushed to disk at the end of the transaction.
    Indeed, in the case of Oracle, single operations aren't logged to disk just as you say (not all components are that optimal, however). I wasn't really trying to draw focus to this fact specifically. My intent was to introduce a different perspective to demonstrate that focus on round-trips is only a small piece of the puzzle.
    As for IPC vs. stored procedures, my experience is, that if you need to process more than one or two SQL statements (that may depend on each other's results) as part of a single transaction, then SPs are always going to perform a lot better than any form of IPC. There are many reasons for this beyond those connected to network roundtrips, like character set conversions, etc.
    Agreed, there are many reasons why in-process stuff is faster than IPC. Character-set conversions are certainly one of them but not always necessary (we're not always talking to a database). However, use of IPC can still save you a lot over full-on remote communication. The purpose behind my making of these points was to draw attention to the myriad of technical variables that factor into making decisions about remoteness/distributed systems other than network round-trips. It was an exercise in highlighting ambiguity and false assumptions.
    But this whole distribution debate should, in my view, be seen more in the light of organisational necessities than just technical ones. E.g. you have one situation when you do an in-house app with a single centralised development team that can change distribution aspects directly in the code as need arises. And you have a completely different situation when you are an ISV that has to make sure that your clients' sysadmins can reconfigure your app to optimize for the particular usage patterns they observe.
    Quite right - I didn't give much consideration to this part of the "should we distribute or not" puzzle - in hindsight, that was a mistake - thanks for resetting my perspective.
  74. Indeed, in the case of Oracle, single operations aren't logged to disk just as you say (not all components are that optimal, however). I wasn't really trying to draw focus to this fact specifically. My intent was to introduce a different perspective to demonstrate that focus on round-trips is only a small piece of the puzzle.
    Urgh, that needs a clarification - when I said "other components", I meant things other than Oracle - sorry 'bout that.
  75. Indeed, in the case of Oracle, single operations aren't logged to disk just as you say (not all components are that optimal, however). I wasn't really trying to draw focus to this fact specifically. My intent was to introduce a different perspective to demonstrate that focus on round-trips is only a small piece of the puzzle.
    Urgh, that needs a clarification - when I said "other components", I meant things other than Oracle - sorry 'bout that.
    I don't believe any DBMS worth its salt (for commercial use) logs on every single request. (I know the costs of not doing this all too well - I worked in senior roles for two DBMS companies in the past!)

    Cheers,
    Ramesh
  76. I don't believe any DBMS worth its salt (for commercial use) logs on every single request. (I know the costs of not doing this all too well - I worked in senior roles for two DBMS companies in the past!) Cheers, Ramesh
    Agreed but I'm not talking just about databases - sorry, should've made that clearer.
  77. As for IPC vs. stored procedures, my experience is, that if you need to process more than one or two SQL statements (that may depend on each other's results) as part of a single transaction, then SPs are always going to perform a lot better than any form of IPC. There are many reasons for this beyond those connected to network roundtrips, like character set conversions, etc.
    You get better performance... but at a cost. Especially if you are using stored procedures to do business logic as well (and not just as a convenient place to do multiple SQLs efficiently). Eventually it all adds up and more and more work gets pushed to the database. I know of a former client of mine who now has over 300 stored procedures, resulting in the database being a major bottleneck. In my experience, it is relatively easier to distribute code modules, application servers etc. than it is to distribute databases.
  78. This is actually a prime indicator of ignoring distribution in a design (not the stored procedures, but the fact of overloading a single system). If I deploy 10 'workers' (be they Jini Services, EJBs, Agents, Webapps, whatever) to a single server and I think 'the performance is reasonable and meets my requirements', I can't just assume that as my requirements, and connections, scale, my server will automatically support them - law of diminishing returns (as a side note, it really is frightening how many principles of economics you can apply to computing). Very soon the cost of distributing my 100 workers across, say, 4 machines is less than running all of them on one machine.

    The fact is that mankind has had to deal with distribution and its associated costs for millennia - yeah, we bitch about it, but we wouldn't do it unless it gave us some form of tangible benefit. So why do we, as developers, constantly strive to fight against distribution? Then we hit that vertical ceiling of our server, or the CFO won't sign off another upgrade. When that happens, you want to make sure that you have taken distribution into account, because when it bites you in the backside, it won't let go until the app is redeveloped.

    Most of us work in multi-server environments with different systems integrating together (Payroll, accounts, database, appserver, mainframe, workflow...) and thinking "oh, I can just run everything in my appserver" and acting like nothing is distributed is a fallacy - think about JCA: those are the connections from your appserver (hub) out to your other systems (spokes).

    Calum