Distributed .NET Part IV – Sql Data Services (SDS) + Memcached
January 20, 2009SDS is a nice system. With all of its bells and whistles we do get some pretty high performance numbers coming from the system. There are times though where we want to implement a caching policy to avoid hitting SDS with searches that are spawned from the UI. In the following post we’ll cover how to get memcached and SDS working together to provide an already fast query engine the facility to perform even faster look ups.
First lets cover the basics, SDS queries do not have a like operator for textual queries. However, this doesn’t stop us from faking it. If we take an input string, for example: “Apple” and we wanted to find any name in our SDS container that started with “Apple” we could create an artificial ceiling for searching in SDS by mutating the last character into the next logical character. In our “Apple” example this would create an artificial search ceiling at “Applf” as “Applf” is one more up than we care. The SDS query would then look something like:
from e in entities where e["LegalName"] >= "Apple" && e["LegalName"] < "Applf" orderby e["LegalName"] select e
The whole goal of this is to get an intelligent predictive auto-text when searching for persons, places, or things (ala google finance). So you can imagine as a user is typing "a", "ap", "appl" we are updating a auto-fill text box (using prototype.js + scriptaculous) with the closest matches.
C# Extension Method for Search Boundary
Below is a very, very basic search boundary creator that extends string. So in our code we could simply use this on any string we wanted to create the search boundary on.
// Assert.AreEqual("applf", "apple".ToSearchBoundary());
const string letters = " 0123456789abcdefghijklmnopqrstuvwxyz";
/// <summary>
/// Finds the search boundary for the specified input by
/// moving the last character to the next logical character in the string.
/// </summary>
/// <param name="input">The input.</param>
/// <returns></returns>
/// <remarks>
/// <code>
/// 'apple' => 'applef'
/// 'apple ' => 'apple z'
/// 'z' => 'z '
/// '0' => '1'
/// '9' => 'a'
/// '9a' => '9b'
/// </code>
/// </remarks>
public static string ToSearchBoundary(this string input)
{
if (String.IsNullOrEmpty(input)) return "";
if (input.Trim().Length <= 0) return "";
// make lowercase
input = input.ToLower();
// grab last letter (appl'e')
var letter = input.Substring(input.Length - 1);
// store root of search
var root = input.Substring(0, input.Length - 1);
// if last letter is space preserve it and just go with z
if (letter.Equals(" ")) return input + "z";
// if last letter is unknown (maybe foreign language?) return 'appl?'
if (false == letters.Contains(letter)) return root + "z";
// make (appl'e' => appl'f')
var idx = letters.IndexOf(letter);
// if last character update and add '0' so applez => 'applez '
if (idx >= (letters.Length - 1))
return input + " ";
return root + letters.Substring(idx+1, 1);
}
Distributed Caching
Of course we don't want to keep asking SDS for this information! I'm sure hot news and hot stocks will likely get searched on frequently by our users. So to save time (both the server's and user's time) we want to cache our results in memory for a fixed amount of time.
Extension method for implementing a SDS cache-able query.
/// <summary>
/// Provides a means to perform a cache-able query that will cache the query
/// using memcached.
/// </summary>
/// <param name="service">The service.</param>
/// <param name="scope">The scope.</param>
/// <param name="query">The query.</param>
/// <returns></returns>
public static List<entity> QueryCachable(this ISitkaSoapService sds, Scope scope, string query)
{
// create key for our query
var key = query.MD5Hash();
// check memcached for the specified client
var cache = BeIT.MemCached.MemcachedClient.GetInstance();
// return from cache
var o = cache.Get(key);
if (null != o)
return o as List<entity>;
var results = sds.Query(scope, query);
cache.Set(key, results, new TimeSpan(0, 1, 0));
return results;
}
/// <summary>
/// Performs an MD5 cache on the specified string.
/// </summary>
/// <param name="input">The input.</param>
/// <returns>Returns the MD5 of the specified string.</returns>
public static string MD5Hash(this string input)
{
System.Security.Cryptography.MD5CryptoServiceProvider x = new System.Security.Cryptography.MD5CryptoServiceProvider();
byte[] bs = System.Text.Encoding.UTF8.GetBytes(input);
bs = x.ComputeHash(bs);
System.Text.StringBuilder s = new System.Text.StringBuilder();
foreach (byte b in bs)
{
s.Append(b.ToString("x2").ToLower());
}
string password = s.ToString();
return password;
}
Example
var list = this._service.QueryCachable(new Scope
{
AuthorityId = _authorityId,
ContainerId = _containerId
}, query);
Of course there are limitations, SDS only returns the first 500 rows of a query. So be careful with this. I would argue that 500 rows is more data than a user typically needs to see, but YMMV.
Extra Credit - Weighted Searching
For extra credit we can apply ratings to those stocks that begin with certain characters based on how frequently users "discover" that stock. If we can extrapolate that users who type in "A" are 10% of the time looking for Apple stocks we can give apple higher weight when a person initially types in "A" and show it sooner in the predictive text; even though it comes alphabetically after stocks no one is interested in.
For this we would need to track a users session, and then report when a user found a stock and map that to the starting character. If you really wanted to get fancy you could have new users start off with a global (or geo-locational) based weighting of stocks.
So, if you have, on average people in Florida trying to find "Apple", and people in New York attempting to find "Applied Dynamics" we could tailor the predictive text to the person searching and their location.
Of course this gets into search theory and predictive reasoning systems which is beyond the scope of this post.