News: Learning about soft and weak references in Java 5

  1. Learning about soft and weak references in Java 5 (21 messages)

    Back in 2001, Heinz wrote a newsletter about a SoftReference based HashMap, where the soft references are on the values, rather than the keys. During the research, Heinz noticed that the SoftReferences behaved almost identically to the WeakReferences.

    Now, Heinz compares the behaviour difference in JDK 1.5 between the soft and the weak reference.

    Then, Heinz presents a new edition of the SoftHashMap using generics and relying on the new behaviour of the SoftReference.

    Lastly, Heinz shows an application of the PhantomReference by defining a new type of reference which he calls the GhostReference, with some additional features.

    Have a look at the 98th edition of The Java Specialists' Newsletter on http://www.javaspecialists.co.za.

    Threaded Messages (21)

  2. Am I too stupid?

    Sorry Heinz, but I still didn't get the purpose of a PhantomReference. Except for your hack, when would I use a PhantomReference instead of a WeakReference except for the enqueuing order (weak -> finalizer -> phantom).

    Does anyone have real world experience with SoftReferences out there?

    There's documentation for Sun VMs' behaviour (soft ref is cleared if not accessed since (free heap times constant)) but how does this translate into observable behaviour? For example, an application that hasn't been used overnight, will it have its caches flushed on the next Full GC? Is wall-clock time a good measurement?

    I'm still kind of hesitant to replace our hand-coded eviction mechanism by SoftRef's although they look sooo handy.
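    To make the "free heap times constant" rule concrete, here is a minimal sketch of the arithmetic. It assumes HotSpot's LRU policy, in which the constant is the `-XX:SoftRefLRUPolicyMSPerMB` flag (default 1000 ms per free megabyte). The class and method names are hypothetical, and other VMs may use a different policy entirely.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: models the documented HotSpot rule, it does not query the VM's real policy.
final class SoftRefPolicySketch {
    // Assumption: default value of -XX:SoftRefLRUPolicyMSPerMB on HotSpot.
    static final long DEFAULT_MS_PER_MB = 1000L;

    // How long an untouched soft reference is retained for a given amount of free heap.
    static long retentionMillis(long freeHeapBytes, long msPerMb) {
        long freeMb = freeHeapBytes / (1024L * 1024L);
        return freeMb * msPerMb;
    }

    // True if a referent idle for idleMillis would be eligible for clearing at the next collection.
    static boolean wouldBeCleared(long idleMillis, long freeHeapBytes, long msPerMb) {
        return idleMillis > retentionMillis(freeHeapBytes, msPerMb);
    }

    public static void main(String[] args) {
        long free = Runtime.getRuntime().freeMemory();
        long keep = retentionMillis(free, DEFAULT_MS_PER_MB);
        System.out.println("Free heap: " + free / (1024 * 1024) + " MB");
        System.out.println("Idle soft refs kept for roughly "
                + TimeUnit.MILLISECONDS.toSeconds(keep) + " s");
        long overnight = TimeUnit.HOURS.toMillis(12);
        System.out.println("Eligible after 12 idle hours? "
                + wouldBeCleared(overnight, free, DEFAULT_MS_PER_MB));
    }
}
```

    Under these assumptions, 64 MB of free heap gives a retention window of about 64 seconds, so a cache left idle overnight would be eligible for clearing at the next collection that processes soft references. As far as I know, HotSpot only advances its internal clock during collections, so the idle time is not strictly wall-clock time.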

    Matthias
  3. Am I too stupid?

    Sorry Heinz, but I still didn't get the purpose of a PhantomReference. Except for your hack, when would I use a PhantomReference instead of a WeakReference except for the enqueuing order (weak -> finalizer -> phantom).

    I guess the purpose is to add the finalizing ability to a class "after the fact", i.e. without overriding finalize(). Similar to writing a Comparator instead of just implementing Comparable - you might want to know when an object has been GCed, but without changing the object.

    I would personally prefer using the GhostReference rather than the WeakReference because I can then match up the object that is being released (even though it is finalized) without having a separate map for it. However a straight PhantomReference - hmmm. Dangerous as well - you must remember to let go of your PhantomReference, plus call "clear()" on it, otherwise you develop a memory leak. In addition, if your PhantomReference processing code takes too long, you might end up with the classic producer-consumer problem, where the objects are allocated faster than they are being deallocated.
    Does anyone have real world experience with SoftReferences out there? There's documentation for Sun VMs' behaviour (soft ref is cleared if not accessed since (free heap times constant)) but how does this translate into observable behaviour? For example, an application that hasn't been used overnight, will it have its caches flushed on the next Full GC? Is wall-clock time a good measurement? I'm still kind of hesitant to replace our hand-coded eviction mechanism by SoftRef's although they look sooo handy.

    We tried them in JDK 1.3.x but found them to be too unreliable, so we also wrote our own eviction mechanism. However, we have gone quite far since JDK 1.4.x and I believe that we can now use soft references for caches, as observed by my experiments. However, using them is still a bit non-deterministic, which can be a pain to have to tune when it gets to performance analysis.

    Heinz
  4. Am I too stupid?

    I would personally prefer using the GhostReference rather than the WeakReference because I can then match up the object that is being released (even though it is finalized) without having a separate map for it.

    I prefer having the object out of the way as fast as possible, especially if it's big. The actual data needed for cleanup is often much smaller. I'd rather create a subclass of WeakReference to hold that data and perform the cleanup.
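    A minimal sketch of that idea, assuming the cleanup data is just a temp file name. `CleanupWeakReference` and its members are hypothetical names; only `WeakReference` and `ReferenceQueue` come from the JDK.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Weak reference that carries the (small) data needed for cleanup,
// so the (big) referent itself can be collected as early as possible.
final class CleanupWeakReference extends WeakReference<Object> {
    private final String tempFileName; // held strongly; must not point back to the referent
    private boolean cleaned;

    CleanupWeakReference(Object bigReferent, String tempFileName, ReferenceQueue<Object> queue) {
        super(bigReferent, queue);
        this.tempFileName = tempFileName;
    }

    // Stand-in for the real cleanup work; returns a description of what was released.
    String cleanup() {
        cleaned = true;
        return "released " + tempFileName;
    }

    boolean isCleaned() {
        return cleaned;
    }
}

final class CleanupWeakReferenceDemo {
    public static void main(String[] args) {
        ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
        byte[] big = new byte[1024 * 1024];
        CleanupWeakReference ref = new CleanupWeakReference(big, "cache-0001.tmp", queue);
        big = null; // from here on only the weak reference points at the array

        // A real application would take references from the queue (queue.remove() or
        // queue.poll()) and call cleanup() on each; here the call is made directly.
        System.out.println(ref.cleanup());
    }
}
```

    The reference object has to stay strongly reachable somewhere (for example in a set) until it has been processed, otherwise it may be collected before it is ever enqueued.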

    Thanks
    Matthias
  5. Phantom Reference clearing

    However a straight PhantomReference - hmmm. Dangerous as well - you must remember to let go of your PhantomReference, plus call "clear()" on it, otherwise you develop a memory leak.

    Are you sure about that? The documentation suggests otherwise:

    "An object that is reachable via phantom references will remain so until all such references are cleared or themselves become unreachable"

    When I read that it suggests to me that either is necessary and sufficient to allow the referent to be GC'd.
  6. Am I too stupid?

    Does anyone have real world experience with SoftReferences out there?

    +1 - If you've built/worked on an application that makes use of this kind of caching, or weak references in general, please share. I've always been too nervous about the unpredictable nature of weakish references (at the system level) to try them myself on any projects. Questions I'm curious about:

    1. What VM are you using? Do you get the same results on other VMs (e.g. from people other than Sun)?
    2. Any insights into how long your objects are living, how much space is being used by the cache, etc.?
    3. What impact does use of this feature have on garbage collection in the system at large? Does the cost of creating and destroying an object (in general, or specifically one with weak references) increase due to the use of these features?
    4. How well does it work with the different GC schemes available within the VM, such as incremental or concurrent collectors?
    5. Does the VM seem to prioritize which soft references are released first? Are the objects freed in order of creation, recent usage, or for all intents and purposes unpredictably?
    6. What does the throughput curve of your application look like? Is it a nice one that increases linearly with load and then flattens out, or is it one of those ugly ones that heads south once it reaches peak sustainable load?
    7. What impact does use of this have on other processes in the system, and virtual memory management in general? Would the JVM use virtual memory rather than releasing soft references?

    Just curious. Thanks in advance to anyone with answers to any of this.
  7. Am I too stupid?

    1. What VM are you using? Do you get the same results on other VMs (e.g. from people other than Sun)?

    A lot of the early ref implementations on other JVMs were based on Sun's code too, so the issues in the Sun 1.3 JVMs also showed up in almost every other implementation, and may take longer to resolve as a result.

    2. Any insights into how long your objects are living, how much space is being used by the cache, etc.?

    In the Sun 1.3 JVM, the soft refs were all collected at the beginning of the next GC cycle. In a sufficiently busy application even with a big "new" heap, that could be a few seconds or less.

    3. What impact does use of this feature have on garbage collection in the system at large? Does the cost of creating and destroying an object (in general, or specifically one with weak references) increase due to the use of these features?

    My tests were on 1.3, but soft refs did seem to impact GC times pretty significantly. I can't remember numbers, though. I'd also like to re-test with 1.4 and 1.5 to compare.
     
    4. How well does it work with the different GC schemes available within the VM, such as incremental or concurrent collectors?

    Most of these weren't very good at all on the Sun JVMs until 1.4 ..

    5. Does the VM seem to prioritize which soft references are released first? Are the objects freed in order of creation, recent usage, or for all intents and purposes unpredictably?

    In 1.3 they were all released, always.

    7. What impact does use of this have on other processes in the system, and virtual memory management in general? Would the JVM use virtual memory rather than releasing soft references?

    This is a very good question. You really really really want to keep your Java heap out of virtual space (i.e. off the disk) or your performance will degrade drastically. That's because of the access patterns exhibited by processes such as GC -- they're freakin' all over the place, and thus will be forcing non-stop page swaps. It's not like in C/C++ where you were (for the most part, if you were a good programmer) managing your own big memory blocks, and most of the hard-hit stuff was on the thread stack itself.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  8. Am I too stupid?

    Yeah, unfortunately my answer to most of these questions is "it depends". Having worked with many a TopLink cache over the years, it's amazing how varied the behaviors are with respect to how, when, and if Soft and Weak references are handled. For example, I find JVMs on multi-CPU machines are very aggressive. It's almost like one CPU has nothing better to do but clear up Soft references. Although the "spirit" [sic] of SoftReferences is to reclaim them in some logical order, there's no requirement for it to happen. As the author notes, it's best to have some tests handy to see if your JVM behaves like you expect.

     - Don
  9. Am I too stupid?

    As the author notes, it's best to have some tests handy to see if your JVM behaves like you expect. - Don

    Yes, and even then it is rather risky. It is a pity that the contract for how they are supposed to behave was not tighter :(
  10. Am I too stupid?

    As the author notes, it's best to have some tests handy to see if your JVM behaves like you expect. - Don

    Yes, and even then it is rather risky. It is a pity that the contract for how they are supposed to behave was not tighter :(

    Or at least more tunable ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  11. Use for Phantom Reference

    The way you can make use of PhantomReferences is to build a Map associating the references to other Objects. When the PhantomReference shows up in the queue you use it to look up its associated Object and do whatever needs to be done. One example would be to create a PhantomReference to a Connection Object in JDBC. When the PhantomReference shows up in the queue, you look up its resources and clean up.

    I believe it's meant to be a superior alternative to the finalize() method.
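    A minimal sketch of the Map approach, under the assumption that the cleanup work can be expressed as a `Runnable`. `PhantomCleanupRegistry` and its methods are hypothetical names. The demo calls `Reference.enqueue()` by hand to stand in for the collector, since a real collection cannot be forced.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class PhantomCleanupRegistry {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
    // The map keeps each PhantomReference strongly reachable until it has been processed.
    private final Map<Reference<?>, Runnable> actions =
            new ConcurrentHashMap<Reference<?>, Runnable>();

    // The cleanup action must not hold a reference to the referent,
    // or the referent would never become phantom reachable.
    Reference<?> register(Object referent, Runnable cleanup) {
        Reference<?> ref = new PhantomReference<Object>(referent, queue);
        actions.put(ref, cleanup);
        return ref;
    }

    // Runs the cleanup for every reference currently in the queue; returns how many ran.
    int drain() {
        int processed = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Runnable cleanup = actions.remove(ref);
            if (cleanup != null) {
                cleanup.run();
                processed++;
            }
            ref.clear();
        }
        return processed;
    }

    int pending() {
        return actions.size();
    }

    public static void main(String[] args) {
        PhantomCleanupRegistry registry = new PhantomCleanupRegistry();
        Object connection = new Object(); // stand-in for a JDBC Connection wrapper
        Reference<?> ref = registry.register(connection, new Runnable() {
            public void run() {
                System.out.println("closing underlying resources");
            }
        });
        connection = null;
        ref.enqueue(); // simulates the collector; normally the GC enqueues the reference
        System.out.println("cleanups run: " + registry.drain());
    }
}
```

    In a long-running application, `drain()` would be called periodically or replaced by a thread blocking on `queue.remove()`.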
  12. Use for Phantom Reference

    The way you can make use of PhantomReferences is to build a Map associating the references to other Objects. When the PhantomReference shows up in the queue you use it to look up its associated Object and do whatever needs to be done...I believe it's meant to be a superior alternative to the finalize() method.

    Again - sorry for beating this to death - this is exactly how a weak reference works, too. A phantom reference actually keeps the object alive longer than weak reference does. What on earth is it good for?
  13. The way you can make use of PhantomReferences is to build a Map associating the references to other Objects. When the PhantomReference shows up in the queue you use it to look up its associated Object and do whatever needs to be done...I believe it's meant to be a superior alternative to the finalize() method.
    Again - sorry for beating this to death - this is exactly how a weak reference works, too. A phantom reference actually keeps the object alive longer than weak reference does. What on earth is it good for?

    I was going to dispute this, but actually it's a good question.

    Cleared PRs would be the same as WRs; the real question is why PRs aren't cleared before they are enqueued, since the object is unreachable anyway. My best guess is the .equals(), although the source suggests that equals for ref types is not overridden, so this is not an issue (is this true?). There are better ways to track identity which could use WeakRef just as well (below), and free the memory sooner.

    I'm not sure that GhostReference is a good idea since the unreachability is an important part of the PR logic (plus, unless the docs are wrong, the object has already been finalized!) - better to override finalize for those actions that require the object, or if only object identity is needed simply subclass PR/WR to add the identity hash code.

    Clive
  14. the object has already been finalized

    Maybe that's the key. With WR, I may run into concurrency issues with the finalizer; with PR the finalizer is definitely through.
  15. I read in the article:

    "The WeakHashMap has the keys embedded in weak references, so that would not be useful for building a cache."

    In javadoc for WeakHashMap, i read:

    "When a key has been discarded its entry is effectively removed from the map, so this class behaves somewhat differently than other Map implementations."

    Who's telling the truth?
  16. WeakHashMap not useful for cache?

    I read in the article: "The WeakHashMap has the keys embedded in weak references, so that would not be useful for building a cache." In javadoc for WeakHashMap, i read: "When a key has been discarded its entry is effectively removed from the map, so this class behaves somewhat differently than other Map implementations." Who's telling the truth?

    I don't see any discrepancy here. When an object is only weakly reachable, the GC will normally reclaim it. When an object is softly reachable, it's up to the JVM to decide whether to reclaim it. Normally, this will be when memory usage is high. If the VM determines it is running low on memory it can reclaim the softly-reachable Objects.

    The point is, as soon as a key in a WeakHashMap is not strongly referenced, it can disappear. That's not what you want for a cache. The point of a memory sensitive cache is to keep as many Objects in memory as possible until there is a memory crunch. A cache that doesn't cache anything is pretty useless.
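    A small sketch of the effect, with hypothetical names. `System.gc()` is only a hint, so the result for the freshly created key is not guaranteed and may differ between VMs and runs. The string literal key, by contrast, stays strongly reachable through the loaded class, so its entry remains.

```java
import java.util.Map;
import java.util.WeakHashMap;

final class WeakKeyCacheDemo {
    // Puts one entry in a WeakHashMap, drops the local reference to the key,
    // requests a collection and reports how many entries are left.
    static int entriesAfterGc(Object key) {
        Map<Object, String> cache = new WeakHashMap<Object, String>();
        cache.put(key, "expensive result");
        key = null;  // drop this method's strong reference to the key
        System.gc(); // a request only; the VM is free to ignore it
        return cache.size();
    }

    public static void main(String[] args) {
        // Literal key: interned and still referenced by this class, so the entry survives.
        System.out.println("literal key entries: " + entriesAfterGc("report-42"));
        // Fresh key: nothing else references it, so the entry may vanish at any collection.
        System.out.println("fresh key entries:   " + entriesAfterGc(new Object()));
    }
}
```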
  17. soft-reference key or value ?

    This is from the article:

    In newsletter 15, I described a SoftReference based HashMap, where the values were embedded in soft references. (The WeakHashMap has the keys embedded in weak references, so that would not be useful for building a cache.)

    What i understand from this: we need a map whose values, rather than its keys, may disappear when the jvm needs memory.

    What you understand is: use soft rather than weak references, for implementing a cache.

    I get your point, i don't get the first point. After reading http://www.javaspecialists.co.za/archive/Issue015.html, it's still not clear why he's talking about referencing key <-> value.

    iow: what's the advantage (for cache implementation) of implementing the put(k,v) like this

       return hash.put(key, new SoftValue(value, key, queue));

    over implementing it like this (WeakHashMap)

       return hash.put(new SoftValue(key, key, queue), value);
  18. soft-reference key or value ?

    I get your point, i don't get the first point.
    ...
    iow: what's the advantage (for cache implementation) of implementing the put(k,v) like this

       return hash.put(key, new SoftValue(value, key, queue));

    over implementing it like this (WeakHashMap)

       return hash.put(new SoftValue(key, key, queue), value);

    OK, got it. The point is that in a WeakHashMap the keys are weak, not the values. What this means is that in order for a key-value pair in the Map to be retained, something else must reference the key.

    It's not obvious at first but when you are creating a memory sensitive cache, you care about whether there are references to the resources, not the keys. The keys generally only have strong references for as long as it takes to look up the resource.

    Examples: Imagine you are using a WHM with Strings as keys where the Strings are in the String pool. Nothing will be automatically removed from the Map because the Strings will remain strongly referenced. Now imagine a Map keyed off of Objects that are dynamic. Every time I go to look up an item, I create an Object, grab the resource and go. I hold no reference to the key because I no longer need it. When the GC runs, it will clear all key-value pairs where the key is not being referenced from the stack. This isn't helpful because that has very little relation to which resources are still in use. The GC'ing will essentially be random.

    As a side note, usually these caches will hold a number of strong references to a subset of resources in the map. These references are updated regularly, usually to reflect the most recently used resources.
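    For comparison, here is a minimal sketch of a cache that puts the soft references on the values, in the spirit of the `SoftValue(value, key, queue)` call quoted earlier in this thread. It is not the newsletter's SoftHashMap: the class is a simplified, non-thread-safe illustration with hypothetical names, and it leaves out the strong references to recently used entries mentioned above.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

final class SoftValueCache<K, V> {
    // Soft reference to the value that also remembers its key,
    // so the stale map entry can be found once the value is gone.
    private static final class SoftValue<K, V> extends SoftReference<V> {
        final K key;

        SoftValue(V value, K key, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, SoftValue<K, V>> hash = new HashMap<K, SoftValue<K, V>>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<V>();

    // Stores the value softly; returns the previous value if it is still available.
    V put(K key, V value) {
        expungeStaleEntries();
        SoftValue<K, V> old = hash.put(key, new SoftValue<K, V>(value, key, queue));
        return old == null ? null : old.get();
    }

    // Returns the cached value, or null if absent or already reclaimed.
    V get(K key) {
        expungeStaleEntries();
        SoftValue<K, V> ref = hash.get(key);
        return ref == null ? null : ref.get();
    }

    int size() {
        expungeStaleEntries();
        return hash.size();
    }

    // Removes entries whose values the collector has cleared and enqueued.
    @SuppressWarnings("unchecked")
    private void expungeStaleEntries() {
        Object ref;
        while ((ref = queue.poll()) != null) {
            SoftValue<K, V> stale = (SoftValue<K, V>) ref;
            // Only remove the mapping if it still points at the stale reference.
            if (hash.get(stale.key) == stale) {
                hash.remove(stale.key);
            }
        }
    }
}
```

    Because the keys are held strongly by the map, an entry survives even when nobody outside the cache holds the key, and the value only goes away when the VM decides to clear its soft reference.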
  19. Thanks Heinz. Regardless of References that was a good demonstration of using generics.
  20. GhostReference

    The GhostReference class breaks the intent and contract of the PhantomReference. The reason that the referent is not reachable is (from the API):

    "In order to ensure that a reclaimable object remains so, the referent of a phantom reference may not be retrieved: The get method of a phantom reference always returns null."
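    The quoted rule is easy to check. The sketch below (hypothetical class name) compares `get()` on a phantom and a weak reference while the referent is still strongly reachable.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

final class PhantomGetDemo {
    // PhantomReference.get() returns null even while the referent is strongly reachable.
    static boolean phantomGetIsNull(Object referent) {
        PhantomReference<Object> phantom =
                new PhantomReference<Object>(referent, new ReferenceQueue<Object>());
        return phantom.get() == null;
    }

    // WeakReference.get() returns the referent as long as it has not been cleared.
    static boolean weakGetReturnsReferent(Object referent) {
        WeakReference<Object> weak = new WeakReference<Object>(referent);
        return weak.get() == referent;
    }

    public static void main(String[] args) {
        Object referent = new Object();
        System.out.println("phantom.get() == null:  " + phantomGetIsNull(referent));       // true
        System.out.println("weak.get() == referent: " + weakGetReturnsReferent(referent)); // true
    }
}
```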
  21. Code quality remark

    Thanks for the article.

    Just make sure you use "Collections.synchronizedSet (new HashSet())" instead of "new HashSet()" in the GhostReference class. ;)

    Vitaliy Shevchuk
  22. GhostReference does look like it's dangerously broken with respect to the PhantomReference contract. The standard idiom seems to be to subclass PhantomReference and keep a strong reference to the data required for cleanup. For example, you might have objects that need to maintain opaque handles to their associated native resources:

    import java.lang.ref.*;

    public class NativeResourceReference extends PhantomReference {

        private final long handle;

        public NativeResourceReference(Object referent, long handle, ReferenceQueue q) {
            super(referent, q);
            this.handle = handle;
        }

        public void destroy() {
            try {
                // use handle to do actual native resource reclamation
                System.out.println("destroy " + this);
            } finally {
                clear();
            }
        }

        public void clear() {
            try {
                System.out.println("clear " + this);
            } finally {
                super.clear();
            }
        }

        public String toString() {
            return (super.toString() + " [" + this.handle + "]");
        }
    }

    You could then set up some sort of registration and reclamation process:

    import java.lang.ref.*;
    import java.util.*;

    public class NativeResourceReclaimer implements Runnable {

        private final ReferenceQueue q;
        private final Collection refs;

        public NativeResourceReclaimer() {
            super();
            this.q = new ReferenceQueue();
            this.refs = Collections.synchronizedSet(new HashSet());
        }

        public Reference register(Object referent, long handle) {
            Reference ref = new NativeResourceReference(referent, handle, this.q);
            this.refs.add(ref);
            return ref;
        }

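        public void run() {
            while (true) {
                try {
                    // Blocks until the collector enqueues a phantom reference
                    NativeResourceReference ref = (NativeResourceReference)this.q.remove();
                    ref.destroy();
                    this.refs.remove(ref);
                    System.out.println("remove " + ref);
                } catch (InterruptedException exc) {
                    // Restore the interrupt flag and stop; looping again would make
                    // q.remove() throw immediately and spin forever
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }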
    }

    Here's an example test app:

    import java.util.*;

    public class NativeResourceTest {

        public static void main(String[] args) {
            NativeResourceReclaimer reclaimer = new NativeResourceReclaimer();
            Thread reclaimerThread = new Thread(reclaimer, "Reclaimer");
            reclaimerThread.setDaemon(true);
            reclaimerThread.start();

            Timer memoryHog = new Timer(true);
            memoryHog.schedule(new MemoryHog(), 100, 100);

            for (int i = 0; i < 10; i++) {
                final long handle = i;
                Object referent = new Object() {
                        private final long h = handle;

                        public String toString() {
                            return (super.toString() + " [" + this.h + "]");
                        }

                        protected void finalize() throws Throwable {
                            try {
                                System.out.println("finalize " + this);
                            } finally {
                                super.finalize();
                            }
                        }
                    };
                reclaimer.register(referent, handle);
            }

            // don't exit main thread
            try {
                Thread.currentThread().join();
            } catch (InterruptedException exc) {
                exc.printStackTrace();
                System.exit(1);
            }
        }
    }

    For completeness, here's MemoryHog:

    import java.util.*;

    public class MemoryHog extends TimerTask {

        private static final int DEFAULT_CHUNK_SIZE = (1024 * 1024);

        private final int chunkSize;
        private final Collection chunks;

        public MemoryHog() {
            this(DEFAULT_CHUNK_SIZE);
        }

        public MemoryHog(int chunkSize) {
            super();
            this.chunkSize = chunkSize;
            this.chunks = new HashSet();
        }

        public void run() {
            this.chunks.add(new byte[this.chunkSize]);
        }
    }

    One of the key distinctions between using a finalizer and a phantom reference is that with phantom references you can control which thread (or set of threads) performs reclamation. In the case of the example, all reclamation occurs on the "Reclaimer" thread. As an alternative, you could dedicate a pool of threads to performing reclamation.
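    The thread pool alternative could be sketched roughly as follows. `PooledReclaimer` and `HandleRef` are hypothetical names modelled on the classes above, the actual native cleanup is left as a comment, and the pool size is an arbitrary choice.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PooledReclaimer {
    // Phantom reference that carries the handle needed for cleanup.
    static final class HandleRef extends PhantomReference<Object> {
        final long handle;

        HandleRef(Object referent, long handle, ReferenceQueue<Object> q) {
            super(referent, q);
            this.handle = handle;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
    // Keeps the references themselves reachable until they have been processed.
    private final Set<HandleRef> refs = Collections.synchronizedSet(new HashSet<HandleRef>());
    private final ExecutorService pool;
    private final AtomicInteger reclaimed = new AtomicInteger();

    PooledReclaimer(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    HandleRef register(Object referent, long handle) {
        HandleRef ref = new HandleRef(referent, handle, queue);
        refs.add(ref);
        return ref;
    }

    // Hands every reference currently in the queue to the pool; returns how many were dispatched.
    int dispatchPending() {
        int dispatched = 0;
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            final HandleRef ref = (HandleRef) r;
            pool.execute(new Runnable() {
                public void run() {
                    // Real code would release the native resource behind ref.handle here.
                    ref.clear();
                    refs.remove(ref);
                    reclaimed.incrementAndGet();
                }
            });
            dispatched++;
        }
        return dispatched;
    }

    // Stops the pool, waits briefly for queued cleanups and returns the number completed.
    int shutdownAndCount() {
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return reclaimed.get();
    }
}
```

    A dedicated thread blocking on `queue.remove()` could feed the pool instead of the polling `dispatchPending()` shown here; polling just keeps the sketch short.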

    Speaking of finalizers, their existence implies a reachability state that's not directly represented by a reference object. Here's a simplified view of reachability states (decreasing order of strength):

    Strongly reachable
    Softly reachable
    Weakly reachable
    Finalizer reachable (not weakly reachable and finalize not yet invoked)
    Phantom reachable
    Unreachable

    One of the interesting differences between soft/weak and phantom references is how they behave in the face of resurrection. Assume you have a referent that resurrects itself within its finalize method. Also assume you have a soft/weak and phantom reference to the same referent. Prior to resurrection, it's perfectly legal for the soft/weak reference to be cleared and enqueued (if it's registered with a reference queue). On the other hand, since resurrection will make the referent more than phantom reachable, the phantom reference will not be enqueued.