“Thought Experiment” – High Performance Distributed Cache

December 18, 2009

I have been thinking a lot about the current structure of data and storage locations across an enterprise and have been feeling like something was wrong with the picture I had formed.

First let me explain that I have come to love simplicity. There is nothing more beautiful than writing 10 lines of code that does something so simple and so native that you have to look at it twice just because it looks too simple to be true.

There is, however, an area that fells like a small battle is being waged – distributed cache. This problem arises after 90% of your read operations are humming along and you want your cache latency to be very small – say within 5 seconds. You have an issue because your legacy systems run on mainframes, you have older VB applications writing and reading data directly from a database and mainframe. There is no governance around information and any application it seems can go in an update sources of data at any time.

The above constraints mean that you have no means of receiving notifications of events or data actions which mean that you can’t wire into a BUS/MSMQ drop to get notified of informational changes. You also can not update these applications because nobody really knows how they work and the risk is seen to be too high by management (and they are right).

The “Perfect” solution

In a perfect would we would be running cache as a first level tier. Your SOA (REST) interfaces would be communicating with some enterprise bus (BizTalk, ApacheMQ, etc) and that bus would exist to broker to any legacy tier and database tier that existed.

Lets trace a call in this architecture…

  1. A request is made from the UI/APP to update a contact’s address
  2. The request is forwarded to a REST interface which delegates the call to a ESB
  3. The ESB picks up the request and forwards it to the primary implementation (see parallel branch)
  4. The ESB returns back to the REST/UI layer success

[Parallel Branch]

  • Thread 1: The ESB identifies a simple object update and saves the data to distributed cache
  • Thread 2: The ESB in a separate instance/thread drops a message into a transactional MSMQ with reliable messaging and ordered delivery

The reason this is so perfect is there is barely an IO hit that was incurred because of this read/write operation. All read/write operations were primarily done in memory (at the moment fastest). We get fail over in case our cache goes down because of the MSMQ drop.

The primary hit from web/app layer would hit the service broker in thread 2 (transaction MSMQ hits disk on target machine) but this would be executing well after the result was actually received. Updating data would primarily be as fast as network/CPU/memory allowed and that data would immediately be updated in cache and then a MSMQ message dropped into some further back end for a physical read/write to disk(s).

This implementation is also scalable as distributed cache scales extremely well and the ESB would likely be network/CPU bound which could be solved by horizontal scaling (to no known limit).

In a worst case scenario where your data center goes down all that would happen is your ESB would continue to write ordered messages to MSMQ and the cache would still be running. You’ll just have to hope when you bring the system back up that the back end can keep up :) . The business side wouldn’t even notice (provided 99.9% of data is in cache and written/read from as primary). It’s an amazing architecture – one that is used by facebook, youtube, amazon, etc. Very high performance and very scalable. However, I don’t exist in a perfect world.

The “Not-So-Perfect” Solution

With the constraints in place our primary issue is that of notification of data updates. The primary store is still treated as the database by 99% of the applications housed in your enterprise and updates are made in no structured manner. The closest it comes to structured is the database tables themselves. And to be honest, that is good enough for us.

We can keep the ESB like we have before at the cost of complexity (see 99% of other applications wont use them but they each have a lifespan of about 5 years). The core issue is that of cache integrity. If data has changed since last cache than augmenting data at the cache level would result in potential issues when writing back to the store.

This can be avoided by introduction of a “Cache Invalidation Service”. The primary goal of this service is to receive REST/SOAP based messages that are subscribed to from an ESB. Since 99% of applications write/read from SQL instances we can work with SQL Server 2005/2008 query notifications to flush our cache of specific items when they are updated.

Notice I said FLUSH not UPDATE! We wouldn’t want to perform a read “just because”. We can delay the read until somebody actually needs it which allows for higher-throughput.

At the end of the day

So this will work but there is something bothering me… This all seems far too much like a well defined pattern and this should already be solved by a vendor. The hardest part is identifying the objects throughout the enterprise and then exposing them via REST while leveraging distributed caching. The actual infrastructure after that is easy.

I can imagine in time somebody might come up with a tool/app/service that allows mapping between physical layers to objects and then it would do all the heavy lifting. However, caching is still something that is hand-crafted, which I don’t like handcrafted solutions.

Useful Links for Query Notification

Understanding When Query Notifications Occur

Planning for Notifications

Using SqlNotificationRequest to Subscribe to Query Notifications

ASP.NET Sql Connections over VPN w/ Trust

September 29, 2009

Overview

In a follow up to my previous post on VPN connections using SQL management studio I should say that the previous workaround still presented issues if you were doing home development.

Suffice to say, if you were trying to use trusted sql connections and were running from your home PC you likely got an error about trusted connections when connected through VPN.

I had thought there was no real work around for this but a couple moments of sheer frustration have helped put more importance on me getting this resolved. So I present the simple (although frankly idiotic) steps to get trusted connections with a .NET application working over VPN.

Steps To Follow

  1. Change your computer name to match the domain name you are connecting to; for example if your domain at work is somecompany\yourusername – rename your computer somecompany.
  2. Create a user account on your home PC that matches the username you have at work, following the above example we create a username called “yourusername”. Also use the exact same password you use at work.
  3. After restarting your computer sign into your local PC using the newly created username.
  4. Connect to VPN as per normal

You should now be able to just open SQL management studio and/or visual studio and have it “just work” like it does at work.

SQL Server 2008/2005 Partition Table

September 9, 2009

A quick video demo of using SQL partitioning for the product_price_fact table we created in the previous article. Below is the video – I hope you enjoy it!


video