Wednesday, August 17, 2011

How to keep the Database in sync with your cache?


There are a few different ways to achieve this. You can put the onus on the cache to fetch the data periodically or whenever it determines that the data is stale, or you can put the onus on the underlying database to “push” updates periodically or whenever the data changes.

Read-heavy use cases

Cache -> DB


1.     The most straightforward way is to set a Time To Live (TTL) and Time To Idle (TTI) on the cache so that the data expires periodically. The next request then results in a cache miss, and your application pulls the current value from the underlying database and puts it back into the cache, as in the sketch below.
A few things to note here:
a.     There is a window during which the data in the cache is not in sync with the underlying database.
b.     A cache miss incurs a performance hit.

This is called read-through caching.
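To make this concrete, here is a minimal cache-aside sketch assuming the Ehcache 2.x API; the cache name "products", the timeout values, and the loadFromDatabase helper are illustrative, not part of any real schema.

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;

public class TtlCacheExample {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create();

        // Entries expire 300s after creation (TTL) or 120s after the last access (TTI),
        // whichever comes first; both values are illustrative.
        Cache cache = new Cache(new CacheConfiguration("products", 10000)
                .timeToLiveSeconds(300)
                .timeToIdleSeconds(120));
        manager.addCache(cache);

        // A miss (first access or after expiry) falls back to the database and
        // repopulates the cache so later reads are served from memory again.
        String key = "product-42";
        Element element = cache.get(key);
        if (element == null) {
            element = new Element(key, loadFromDatabase(key));
            cache.put(element);
        }

        manager.shutdown();
    }

    // Placeholder for the real JDBC/ORM lookup.
    private static Object loadFromDatabase(String key) {
        return "value-for-" + key;
    }
}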



2.     An alternative approach is to perform cache updates or invalidation periodically: run a batch process (which could be scheduled with the open source Quartz scheduler) at regular intervals to either invalidate or refresh the cache. You could do this with Ehcache's SelfPopulatingCache, as sketched below.
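A minimal sketch of this approach, assuming the Ehcache 2.x SelfPopulatingCache decorator and a cache named "products" declared in ehcache.xml; the loadFromDatabase helper is hypothetical, and a Quartz trigger would simply call refresh() on a schedule.

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.constructs.blocking.CacheEntryFactory;
import net.sf.ehcache.constructs.blocking.SelfPopulatingCache;

public class PeriodicRefreshExample {
    public static void main(String[] args) throws Exception {
        CacheManager manager = CacheManager.create();
        Cache underlying = manager.getCache("products");   // assumes "products" is declared in ehcache.xml

        // The decorator loads missing entries through the factory, so readers
        // never have to query the database directly.
        SelfPopulatingCache cache = new SelfPopulatingCache(underlying, new CacheEntryFactory() {
            public Object createEntry(Object key) throws Exception {
                return loadFromDatabase(key);               // hypothetical DAO call
            }
        });

        // A scheduled job (for example a Quartz trigger firing every few minutes)
        // would call refresh() to re-load every cached key from the database.
        cache.refresh();

        manager.shutdown();
    }

    // Placeholder for the real JDBC/ORM lookup.
    private static Object loadFromDatabase(Object key) {
        return "value-for-" + key;
    }
}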


DB -> Cache


1.     You could also shift the syncing onus to the underlying database itself. For example, Oracle AQ provides a way to register a callback that fires when database updates happen. This can be leveraged to either invalidate or update the cache store.

2.     Alternatively, you could use middleware technologies such as GoldenGate or JMS to capture database changes as they occur and "push" notifications into the memory store; a listener sketch follows this list.
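A minimal sketch of the push model, assuming the change events arrive as JMS text messages carrying the key of the row that changed. The destination wiring, the message format, and the "products" cache name are assumptions; the same listener pattern works whether the publisher is Oracle AQ via its JMS interface, GoldenGate, or a database trigger.

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Destination;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.MessageListener;
import javax.jms.Session;
import javax.jms.TextMessage;
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;

public class DbChangeListener implements MessageListener {

    private final Cache cache;

    public DbChangeListener(Cache cache) {
        this.cache = cache;
    }

    // Each message is assumed to carry the key of the row that changed; the
    // listener evicts that entry so the next read reloads it from the database.
    public void onMessage(Message message) {
        try {
            String changedKey = ((TextMessage) message).getText();
            cache.remove(changedKey);
        } catch (Exception e) {
            e.printStackTrace();   // in production, log and decide whether to re-queue
        }
    }

    public static void start(ConnectionFactory factory, Destination dbChanges,
                             CacheManager manager) throws Exception {
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(dbChanges);
        consumer.setMessageListener(new DbChangeListener(manager.getCache("products")));
        connection.start();   // begin receiving change notifications
    }
}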

 

Write-heavy use cases


1.     Some scenarios require frequent updates to the stored data, and every update to the cached data must trigger a corresponding update to the database. This is the write-through feature provided by Ehcache. However, database updates are almost always slower than cache updates, so write-through lowers the effective update rate of the cache and hence overall performance. When many write requests arrive at the same time, the database can easily become a bottleneck or, even worse, be overwhelmed by heavy writes in a short period of time.

The write-behind feature provided by Ehcache allows fast cache writes while keeping the cache and the database eventually consistent. The idea is that when data is written into the cache, instead of writing to the database at the same time, the write-behind cache places the change on a queue and lets a background thread perform the database write later. The cache write can therefore proceed without waiting for the database write and completes much faster. Any changed data is eventually persisted to the database, and in the meantime any read from the cache still returns the latest value.

With Terracotta, the Terracotta Server Array maintains the write-behind queue; a thread on each JVM checks the shared queue and persists each data change left in it. A sketch of a write-behind CacheWriter follows.
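A minimal sketch of a write-behind CacheWriter, assuming the Ehcache 2.x writer SPI; depending on the exact version, the interface may declare additional callbacks (such as throwAway) that you would also need to implement, and the SQL comments are placeholders.

import java.util.Collection;
import net.sf.ehcache.CacheEntry;
import net.sf.ehcache.Ehcache;
import net.sf.ehcache.Element;
import net.sf.ehcache.writer.CacheWriter;

// With writeMode="write-behind" configured on the cache, Ehcache queues each
// change and invokes these methods from a background thread.
public class ProductCacheWriter implements CacheWriter {

    public void write(Element element) {
        // INSERT/UPDATE the row for element.getObjectKey() with element.getObjectValue()
    }

    public void writeAll(Collection<Element> elements) {
        for (Element element : elements) {
            write(element);   // a real implementation would use a single JDBC batch
        }
    }

    public void delete(CacheEntry entry) {
        // DELETE the row for entry.getKey()
    }

    public void deleteAll(Collection<CacheEntry> entries) {
        for (CacheEntry entry : entries) {
            delete(entry);
        }
    }

    public void init() { /* open the DataSource / connection pool */ }

    public void dispose() { /* release resources */ }

    public CacheWriter clone(Ehcache cache) throws CloneNotSupportedException {
        return new ProductCacheWriter();
    }
}

The application would register the writer with cache.registerCacheWriter(new ProductCacheWriter()) and then call cache.putWithWriter(new Element(key, value)); the queued change is persisted later by the write-behind thread.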

2.     Finally, you could have your application update the cache and the database simultaneously. It is advisable to use transactions and perform the update in the following order:
a.     Start a transaction
b.     Update the database
c.      Update the cache
d.     Commit the transaction
Points to remember: your update code is now directly aware of the cache, and there is a performance impact because your update latency includes both the database and the cache update time. A sketch of this pattern follows.
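A minimal sketch of that sequence, assuming a JTA UserTransaction is available and the cache is configured with transactionalMode="xa" so the put enlists in the same transaction; the ProductDao interface is hypothetical and is included only to keep the sketch self-contained.

import javax.transaction.UserTransaction;
import net.sf.ehcache.Cache;
import net.sf.ehcache.Element;

public class SynchronousUpdateExample {

    // Hypothetical DAO used only for illustration.
    public interface ProductDao {
        void update(String key, Object newValue) throws Exception;
    }

    public void updateProduct(UserTransaction tx, ProductDao dao, Cache cache,
                              String key, Object newValue) throws Exception {
        tx.begin();                                   // a. start the transaction
        try {
            dao.update(key, newValue);                // b. update the database
            cache.put(new Element(key, newValue));    // c. update the cache
            tx.commit();                              // d. commit both together
        } catch (Exception e) {
            tx.rollback();                            // neither store keeps the change on failure
            throw e;
        }
    }
}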







5 comments:

  1. As I mentioned in my other comment, we were evaluating ordered writing in a clustered environment. However, I do not understand how we can use transactional write-through in a write-heavy use case. Do you not believe that it will impact performance so heavily in a write-heavy use case that my application will actually suffer by using Terracotta, or any other cache?

  2. Hi Ankita,

    Depending on your write tps, your performance will definitely be impacted if you do synchronous replication with Terracotta and a database. If you use transactions, performance will take a hit, not because of Terracotta but because of the nature of 2PC and commit speed in databases. This can be alleviated by adding more stripes to your Terracotta Server Array layer, or by using our asynchronous write-behind methodology (which is what most of our customers use).

    Sourabh

  3. Why will I have lower write tps in a write-heavy use case? With the proposed architecture, I am forced to use 2PC because of the introduction of another layer, not because my business benefits from it. How much of a performance drop have you observed when using transactional updates to the DB in write-heavy use cases (involving more than 100k updates per hour)?

  4. As I said, if you use transactional replication with BigMemory and a database, then it will affect your performance and tps due to the nature of transactions and databases themselves.

    It is hard to quantify how much of a performance drop there will be, since it is use-case dependent. If you are very concerned about performance, I would suggest having a look at write-behind: http://ehcache.org/documentation/recipes/writebehind

    BigMemory write-behind guarantees your data will make it to the database at least once and is never lost.

  5. Just re-read your post; you mentioned 100K updates/hour and I had read it as 100K updates/sec. That works out to about 30 tps, which is not much. If you would like some technical help testing BigMemory for your use case, please use the contact link on the Terracotta website and someone will reach out.
