I’m working on a Redis-like extension to HTML5 Storage objects (localStorage and sessionStorage). When thinking about implementing key expiry, there are two primary solutions: timer and data based implementations.
Timer-based Expiration
As John Resig brilliantly explains, when you set a timeout using setTimeout()
, there is no guarantee that the callback (the function to perform after the timeout is fired) gets executed immediately. I won’t rehash his explanation, but async events (like the timeout) are processed in an event loop. When a timeout fires, the callback gets queued up for processing. This delay is at least 4ms on most browsers, but could be much longer.
It’s not difficult to set up the expiry of data. For HTML5 Storage, in particular, we have a method for deleting data, removeItem(key)
. To expire data after delay milliseconds, just do:
var expire = function (key, delay) { setTimeout(function () { localStorage.removeItem(key); }, delay);};
This basically adds the removeItem
call to the execution stack, after delay milliseconds have elapsed. This does not mean that the callback is executed. Hence, the removal of that key could happen some milliseconds after the timer has fired. This creates the opportunity for race conditions; i.e., another process can access data that should be expired mainly due to the delay between the timeout firing and the processing of the callback.
It should be stated that HTML5 Storage is rather safe from race-conditions, in that, the only way for multiple processes to share data is for multiple tabs/windows of a single browser to be pointed to the same domain. This means that if you have multiple Google Chrome tabs open to www.google.com
, then all of those tabs share a single (file-based) storage object. If all of the processes are writing to that shared storage at the same time, then we can have unpredictable behavior.
Data-based Expiration
This solution was brought to my attention in a conversation with Josh Levine, VP of Engineering at Shapeways. He mentioned that on top of timers being inaccurate (the aforementioned argument), a single process can hog the CPU and prevent other processes from setting or firing their timers.
To overcome this limitation, we can achieve expiration by forcing processes to check the time to live (TTL) of a key before trying to use its value. Instead of relying on an async removal of the key (timer-based method), each process is responsible for expiring keys
To be a little bit more precise, if a process wanted to expire a key, that process would attach the current time (in ms) and a delay (in ms) to the value of that key. We’ll see why that information is helpful.
With the current time (the time at which the expiration is being set) and the delay, we can then compute the time to live (TTL) of that key. The time to live is basically a count of how many milliseconds the key has remaining in its lifetime:
ttl = (creation_time + delay) - now
If a process grabs the value of a key that has a set expiry, it needs to check if the TTL is greater than 0; i.e., whether or not the key should be expired. If the TTL is less than zero, then the key has expired, so the process should remove it from storage
Conclusion
Whereas a CPU-hogging process would break our timer-based expiration, this data-based solution works since the CPU-hog is forced to check the TTL of the desired keys and expire/remove them accordingly.
The data-based solution works better at the expense of a few extra reads and seems to eliminate the majority of the problems inherent in the timer-based method.