Sunday, October 16, 2011

EHCache - Write behind example

What is Write-Behind?

Write-behind is the asynchronous writing of data to the underlying database. When data is written to the cache, instead of writing to the database at the same time, the cache saves the data into a queue and lets a background thread write it to the database later.

This is a transformative capability because now you can:
  1. Delay writes to the database to a configurable later time
  2. Use write coalescing, which means if there are multiple updates on the same key in the queue, only the latest one is considered
  3. Batch multiple write operations
  4. Specify the number of retry attempts in case of write failure
Here is an introductory video.
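
These options map to attributes on the cacheWriter element in ehcache.xml: the write delay via minWriteDelay/maxWriteDelay (1), writeCoalescing (2), writeBatching with writeBatchSize (3), and retryAttempts with retryAttemptDelaySeconds (4). A minimal sketch of the element, with illustrative values only:

<cacheWriter writeMode="write-behind"
             minWriteDelay="1" maxWriteDelay="5"
             writeCoalescing="true"
             writeBatching="true" writeBatchSize="100"
             retryAttempts="3" retryAttemptDelaySeconds="2"/>

The cacheWriterFactory that creates your CacheWriter implementation is nested inside this element; the full registration is shown further down.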

To use write-behind, you first need to implement the CacheWriter interface:
/*
This class handles writing to the database or your backend persistence store
*/
public class EhcacheWriteBehindClass implements CacheWriter {

 @Override
 public CacheWriter clone(Ehcache cache) throws CloneNotSupportedException {
  throw new CloneNotSupportedException("EhcacheWriteBehindClass cannot be cloned!");
 }

 @Override
 public void delete(CacheEntry entry) throws CacheException {
  // Remove the corresponding row from the database here
 }

 @Override
 public void deleteAll(Collection<CacheEntry> entries) throws CacheException {
  // Remove the corresponding rows from the database in one batch here
 }

 @Override
 public void dispose() throws CacheException {
  // You can close database connections here
 }

 @Override
 public void init() {
  // You can initialize the database connection here
 }

 @Override
 public void write(Element element) throws CacheException {
  // Typically you would write to your database here
  System.out.println("Write : Key is " + element.getKey());
  System.out.println("Write : Value is " + element.getValue());
 }

 @Override
 public void writeAll(Collection<Element> elements) throws CacheException {
  // Called when writeBatching is enabled; write the whole batch to the database here
  System.out.println("Write All");
 }

 @Override
 public void throwAway(Element element, SingleOperationType operationType,
   RuntimeException e) {
  // Called when a write has failed after all retry attempts; log or recover here
 }
}

This class is instantiated by the CacheWriterFactory:

public class WriteBehindClassFactory extends CacheWriterFactory {

 // Ehcache calls this with the cache being constructed and any properties set on the factory in ehcache.xml
 public CacheWriter createCacheWriter(Ehcache cache, Properties properties) {
  return new EhcacheWriteBehindClass();
 }
}

Now register the factory in ehcache.xml. A minimal configuration would look something like the following (the cache name "writeBehindCache" matches what the test class below looks up; the sizing and write-behind attribute values are illustrative):
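
<ehcache>
    <cache name="writeBehindCache"
           maxElementsInMemory="1000"
           eternal="false"
           timeToIdleSeconds="300"
           timeToLiveSeconds="600">
        <cacheWriter writeMode="write-behind"
                     writeCoalescing="true"
                     writeBatching="true" writeBatchSize="10"
                     retryAttempts="2">
            <cacheWriterFactory class="WriteBehindClassFactory"/>
        </cacheWriter>
    </cache>
</ehcache>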
In order to use this write behind functionality, your class would look like this:
 
public class EhcacheWriteBehindTest {

 public static void main(String[] args) throws Exception {
  // pass in the number of objects you want to generate; the default is 100
  int numberOfObjects = Integer.parseInt(args.length == 0 ? "100" : args[0]);
  System.out.println(numberOfObjects);
  // create the CacheManager
  CacheManager cacheManager = CacheManager.getInstance();
  // get a handle on the cache - "writeBehindCache" is the name of a cache in the ehcache.xml file
  Cache myCache = cacheManager.getCache("writeBehindCache");

  // iterate through numberOfObjects and use the loop index as the key; the value does not matter at this time
  for (int i = 0; i < numberOfObjects; i++) {
   String key = Integer.toString(i);
   if (!checkInCache(key, myCache)) {
    // entries go into the cache as Elements; the key and the value must be serializable
    myCache.putWithWriter(new Element(key, "Value"));
    System.out.println(key + " NOT in cache!!!");
   } else {
    System.out.println("Put with writer ... value1");
    // note, we use the putWithWriter method and not the put method
    myCache.putWithWriter(new Element(key, "Value1"));
   }
  }

  // keep the JVM alive so the background write-behind thread gets a chance to drain the queue
  while (true) {
   Thread.sleep(1000);
  }
 }

 // check to see if the key is in the cache
 private static boolean checkInCache(String key, Cache myCache) throws Exception {
  Element element = myCache.get(key);
  boolean returnValue = false;
  if (element != null) {
   System.out.println(key + " is in the cache!!!");
   returnValue = true;
  }
  return returnValue;
 }
}

That's it! For a detailed explanation of the configuration options involved, have a look at this.

The limitation here is that if your JVM goes down, your write-behind queue is lost. To avoid this you can use Ehcache clustered with Terracotta, which uses the Terracotta Server Array. In this case the queue is maintained on the Terracotta Server Array, which provides HA features. If one client JVM goes down, any changes it put into the write-behind queue can still be picked up by threads in the other clustered JVMs and will therefore be applied to the database without any data loss.


The Terracotta Server Array is an enterprise feature and is straightforward to configure. You can download a trial version from here.


The only change you need to make to this app to make it clustered is in the ehcache.xml. Your ehcache.xml would now look something like this (the same cache and writer configuration as before, plus the Terracotta elements):
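
<ehcache>
    <terracottaConfig url="localhost:9510"/>

    <cache name="writeBehindCache"
           maxElementsInMemory="1000"
           eternal="false"
           timeToIdleSeconds="300"
           timeToLiveSeconds="600">
        <cacheWriter writeMode="write-behind"
                     writeCoalescing="true"
                     writeBatching="true" writeBatchSize="10"
                     retryAttempts="2">
            <cacheWriterFactory class="WriteBehindClassFactory"/>
        </cacheWriter>
        <terracotta/>
    </cache>
</ehcache>
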
The terracottaConfig url="localhost:9510" element points to where your Terracotta Server Array is running.

Wednesday, August 17, 2011

How to keep the Database in sync with your cache?


There are a few different ways to achieve this. You can put the onus on the cache to fetch the data periodically or when it determines that the data is stale, or you can put the onus on the underlying database to "push" updates periodically or whenever the data changes.

Read Heavy use cases

Cache -> DB


1.     The most straightforward way is to set the Time To Live (TTL) and Time To Idle (TTI) on the cache so that the data expires periodically. The next request results in a cache miss, and your application pulls the current value from the underlying database and puts it into the cache.
A few things to note here:
a.     There may be a window during which the data in the cache is not in sync with the underlying database.
b.     Every cache miss is a performance hit, since that read has to go all the way to the database.

This is called read-through caching.
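
As a rough sketch of what the application code looks like with this approach (loadFromDatabase is a hypothetical DAO call, and the cache is assumed to have TTL/TTI configured in ehcache.xml):

public String getValue(String key, Cache cache) {
 Element element = cache.get(key);           // expired entries come back as null
 if (element == null) {                      // cache miss: never loaded, or TTL/TTI expired it
  String fromDb = loadFromDatabase(key);     // hypothetical DAO call to the underlying database
  cache.put(new Element(key, fromDb));       // repopulate the cache with the current value
  return fromDb;
 }
 return (String) element.getObjectValue();   // cache hit: serve the cached value
}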



2.     An alternative approach is to update or invalidate the cache periodically - use a batch process (which could be scheduled using the open source Quartz scheduler) running at periodic intervals to either invalidate or update the cache. You could do this using Ehcache's SelfPopulatingCache, as in the sketch below.
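
A rough sketch of that approach, assuming a cache named "customerCache" and a hypothetical loadFromDatabase DAO call; the refresh would typically be triggered from a scheduled (for example Quartz) job:

// uses net.sf.ehcache.constructs.blocking.SelfPopulatingCache and CacheEntryFactory
Ehcache underlying = cacheManager.getEhcache("customerCache");
SelfPopulatingCache selfPopulating = new SelfPopulatingCache(underlying, new CacheEntryFactory() {
 public Object createEntry(Object key) throws Exception {
  return loadFromDatabase((String) key);     // hypothetical DAO call
 }
});

// called from the periodic job: re-loads every element currently in the cache
selfPopulating.refresh();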


DB->Cache


1.     You could also transfer the synching onus to the underlying database itself. For example, Oracle AQ provides a way to register a callback when database updates happen. This can be leveraged to either invalidate or update the cache store.

2.     Alternatively, you could use middleware technologies like GoldenGate or JMS to capture DB changes as they occur and "push" notifications into the memory store, as in the sketch below.
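
A minimal sketch of the JMS variant (the message format and the DbChangeListener class are illustrative; GoldenGate, a database trigger, or the application performing the update would be what publishes the change events):

// uses javax.jms; register this listener on the destination that receives DB change events
public class DbChangeListener implements MessageListener {

 private final Cache cache;

 public DbChangeListener(Cache cache) {
  this.cache = cache;
 }

 public void onMessage(Message message) {
  try {
   MapMessage change = (MapMessage) message;   // assumes the publisher sends key/value pairs
   String key = change.getString("key");
   String newValue = change.getString("value");
   cache.put(new Element(key, newValue));      // or cache.remove(key) to simply invalidate
  } catch (JMSException e) {
   throw new RuntimeException(e);
  }
 }
}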

 

Write Heavy use cases


1.     There are scenarios that require frequent updates to the stored data. With write-through caching, which Ehcache provides, every update to the cached data invokes an update to the database at the same time. However, updates to the database are almost always slower, so this slows the effective update rate to the cache and thus performance in general. When many write requests come in at the same time, the database can easily become a bottleneck or, even worse, be killed by heavy writes in a short period of time.

The write-behind feature provided by Ehcache allows quick cache writes while keeping the cache and the database eventually consistent. The idea is that when writing data into the cache, instead of writing to the database at the same time, the write-behind cache saves the changed data into a queue and lets a background thread do the writing later. The cache-write can therefore proceed without waiting for the database-write and finishes much faster. Any data that has been changed is eventually persisted to the database, and in the meantime any read from the cache still returns the latest data.

In the case of Terracotta, the Terracotta Server Array maintains the write-behind queue. A thread on each JVM checks the shared queue and saves each data change left in the queue.

2.     Finally, you could make your application update the cache and the DB simultaneously. It is advisable to use transactions and to perform the update in the following manner:
a.     Start a transaction
b.     Update the database
c.     Update the cache
d.     Commit the transaction
Some points to remember: your update code is now directly aware of the cache, and there is a performance impact since your update latency reflects both the DB update time and the cache update time.
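
A rough sketch of this pattern using plain JDBC (the table, the SQL, and the updateOrder method are illustrative; with JTA or Ehcache's transactional caches the cache write could participate in the same transaction):

public void updateOrder(String orderId, String newStatus, Connection connection, Cache cache)
  throws SQLException {
 try {
  connection.setAutoCommit(false);                        // a. start a transaction
  PreparedStatement stmt = connection.prepareStatement(
    "UPDATE orders SET status = ? WHERE id = ?");         // b. update the database
  stmt.setString(1, newStatus);
  stmt.setString(2, orderId);
  stmt.executeUpdate();
  stmt.close();

  cache.put(new Element(orderId, newStatus));             // c. update the cache

  connection.commit();                                    // d. commit the transaction
 } catch (SQLException e) {
  connection.rollback();                                  // undo the database change ...
  cache.remove(orderId);                                  // ... and evict the possibly stale cache entry
  throw e;
 }
}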