Distributed .NET Part IV – Sql Data Services (SDS) + Memcached

January 20, 2009
This article is outdated as the new Azure RTP is in play. For the latest and greatest on working with Azure Table Data Storage or Azure SQL Services go to the Azure category.

SDS is a nice system. With all of its bells and whistles we do get some pretty high performance numbers coming from the system. There are times, though, when we want to implement a caching policy to avoid hitting SDS with searches that are spawned from the UI. In the following post we’ll cover how to get memcached and SDS working together so that an already fast query engine can perform even faster lookups.

First, let’s cover the basics: SDS queries do not have a LIKE operator for textual queries. However, this doesn’t stop us from faking it. If we take an input string, for example “Apple”, and we want to find any name in our SDS container that starts with “Apple”, we can create an artificial ceiling for the search by mutating the last character into the next logical character. In our “Apple” example this creates an artificial search ceiling at “Applf”, as “Applf” is the first value past everything we care about. The SDS query would then look something like:

from e in entities where e["LegalName"] >= "Apple" && e["LegalName"] < "Applf" orderby e["LegalName"] select e
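The range trick above can be sketched as a small query builder. This is only an illustration: `BuildPrefixQuery` is a hypothetical helper that is not part of SDS or of the original code, it assumes SDS compares strings character by character, and it does not escape quotes in the user's input.

```csharp
using System;

public static class PrefixQuerySketch
{
    // Hypothetical helper: builds the ">= prefix && < ceiling" query for one property.
    public static string BuildPrefixQuery(string property, string prefix)
    {
        if (string.IsNullOrEmpty(prefix))
            throw new ArgumentException("prefix must not be empty", "prefix");

        // "Apple" -> "Applf": keep the root and bump the last character by one code point.
        // Assumption: the last character is not char.MaxValue (not handled in this sketch).
        char last = prefix[prefix.Length - 1];
        string ceiling = prefix.Substring(0, prefix.Length - 1) + (char)(last + 1);

        return string.Format(
            "from e in entities where e[\"{0}\"] >= \"{1}\" && e[\"{0}\"] < \"{2}\" orderby e[\"{0}\"] select e",
            property, prefix, ceiling);
    }
}
```

Calling `PrefixQuerySketch.BuildPrefixQuery("LegalName", "Apple")` produces the same query text as the one shown above.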

The whole goal of this is to get an intelligent predictive auto-text when searching for persons, places, or things (à la Google Finance). So you can imagine that as a user is typing "a", "ap", "appl" we are updating an auto-fill text box (using prototype.js + scriptaculous) with the closest matches.

C# Extension Method for Search Boundary

Below is a very, very basic search boundary creator that extends string. So in our code we could simply use this on any string we wanted to create the search boundary on.

using System;

public static class SearchBoundaryExtensions
{
    // Assert.AreEqual("applf", "apple".ToSearchBoundary());
    const string letters = " 0123456789abcdefghijklmnopqrstuvwxyz";

    /// <summary>
    /// Finds the search boundary for the specified input by
    /// moving the last character to the next logical character in the string.
    /// </summary>
    /// <param name="input">The input.</param>
    /// <returns>The lower-cased search ceiling, or an empty string for blank input.</returns>
    /// <remarks>
    /// <code>
    /// 'apple' => 'applf'
    /// 'apple ' => 'apple z'
    /// 'z' => 'z '
    /// '0' => '1'
    /// '9' => 'a'
    /// '9a' => '9b'
    /// </code>
    /// </remarks>
    public static string ToSearchBoundary(this string input)
    {
        if (String.IsNullOrEmpty(input)) return "";
        if (input.Trim().Length <= 0) return "";

        // make lowercase (culture-independent)
        input = input.ToLowerInvariant();

        // grab last letter (appl'e')
        var letter = input.Substring(input.Length - 1);
        // store root of search
        var root = input.Substring(0, input.Length - 1);

        // if last letter is a space preserve it and just go with z ('apple ' => 'apple z')
        if (letter.Equals(" ")) return input + "z";

        // if last letter is unknown (maybe foreign language?) swap it for 'z' ('appl?' => 'applz')
        if (false == letters.Contains(letter)) return root + "z";

        // find where the last letter sits in our alphabet
        var idx = letters.IndexOf(letter);

        // if it is already the last character ('z') append a space, so 'applez' => 'applez '
        // NOTE: under an ordinal comparison 'applez ' sorts before 'appleza', so this ceiling
        // covers little more than the exact string; a known limitation of this basic version
        if (idx >= (letters.Length - 1))
            return input + " ";

        // make (appl'e' => appl'f')
        return root + letters.Substring(idx + 1, 1);
    }
}

Distributed Caching

Of course we don't want to keep asking SDS for this information! I'm sure hot news and hot stocks will likely get searched on frequently by our users. So to save time (both the server's and user's time) we want to cache our results in memory for a fixed amount of time.

Extension method for implementing an SDS cache-able query.

using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
// ISitkaSoapService, Scope and Entity come from the generated SDS service reference

public static class SdsCacheExtensions
{
	/// <summary>
	/// Provides a means to perform a cache-able query that will cache the query
	/// results using memcached.
	/// </summary>
	/// <param name="sds">The service.</param>
	/// <param name="scope">The scope.</param>
	/// <param name="query">The query.</param>
	/// <returns>The matching entities, served from memcached when available.</returns>
	public static List<Entity> QueryCachable(this ISitkaSoapService sds, Scope scope, string query)
	{
		// create key for our query; hashing keeps the key short and free of
		// whitespace, which memcached keys must be (250 bytes at most).
		// The scope is part of the key so the same query text against
		// another container does not return the wrong cached results.
		var key = (scope.AuthorityId + "/" + scope.ContainerId + "/" + query).MD5Hash();

		// get the memcached client
		var cache = BeIT.MemCached.MemcachedClient.GetInstance();

		// return from cache
		var cached = cache.Get(key) as List<Entity>;
		if (null != cached)
			return cached;

		// cache miss: ask SDS and keep the results for one minute
		var results = sds.Query(scope, query);
		cache.Set(key, results, new TimeSpan(0, 1, 0));
		return results;
	}

	/// <summary>
	/// Performs an MD5 hash on the specified string.
	/// </summary>
	/// <param name="input">The input.</param>
	/// <returns>Returns the MD5 of the specified string as lower-case hex.</returns>
	public static string MD5Hash(this string input)
	{
		using (var md5 = MD5.Create())
		{
			byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(input));
			var hex = new StringBuilder(hash.Length * 2);
			foreach (byte b in hash)
			{
				// "x2" already yields two lower-case hex digits
				hex.Append(b.ToString("x2"));
			}
			return hex.ToString();
		}
	}
}

Example

var list = this._service.QueryCachable(new Scope
{
	AuthorityId = _authorityId,
	ContainerId = _containerId
}, query);

Of course there are limitations: SDS only returns the first 500 rows of a query, so be careful with this. I would argue that 500 rows is more data than a user typically needs to see, but YMMV.
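If more than 500 rows are ever needed, one way to page is to re-issue the query starting after the last Id already seen. The sketch below is an assumption-heavy illustration, not the SDS API: `runQuery` is a hypothetical stand-in for a call such as sds.Query(scope, query) that returns only entity Ids, and it assumes results come back ordered by Id and that `e.Id > "..."` comparisons are supported.

```csharp
using System;
using System.Collections.Generic;

public static class PagingSketch
{
    // Hypothetical helper: collects every Id by asking for one page at a time.
    public static List<string> FetchAllIds(Func<string, IList<string>> runQuery, int pageSize)
    {
        var all = new List<string>();
        string lastId = null;

        while (true)
        {
            // Assumption: results are ordered by Id, so "Id > lastId" resumes after the previous page.
            string query = lastId == null
                ? "from e in entities select e"
                : "from e in entities where e.Id > \"" + lastId + "\" select e";

            IList<string> page = runQuery(query);
            all.AddRange(page);

            // a short page means there is nothing left to fetch
            if (page.Count < pageSize) break;
            lastId = page[page.Count - 1];
        }
        return all;
    }
}
```

With SDS the page size would be 500. Paged results are a poor fit for a type-ahead box, so this is more useful for batch jobs than for the cached search above.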

Extra Credit - Weighted Searching

For extra credit we can apply ratings to those stocks that begin with certain characters based on how frequently users "discover" that stock. If we can extrapolate that users who type in "A" are looking for Apple stock 10% of the time, we can give Apple a higher weight when a person initially types in "A" and show it sooner in the predictive text, even though it comes alphabetically after stocks no one is interested in.

For this we would need to track a user's session, then report when a user found a stock and map that to the starting character. If you really wanted to get fancy you could have new users start off with a global (or geo-location based) weighting of stocks.

So if, on average, people in Florida are trying to find "Apple" and people in New York are trying to find "Applied Dynamics", we could tailor the predictive text to the person searching and their location.
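A minimal sketch of the weighting idea, assuming we simply count how often each name was picked after a given prefix was typed. The class and method names are hypothetical, the counts live in memory only, and geo-location is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class WeightedSuggestionsSketch
{
    // prefix -> (name -> number of times users who typed that prefix picked the name)
    private readonly Dictionary<string, Dictionary<string, int>> _hits =
        new Dictionary<string, Dictionary<string, int>>(StringComparer.OrdinalIgnoreCase);

    // Call this when a user who typed 'typedPrefix' ends up choosing 'chosenName'.
    public void RecordDiscovery(string typedPrefix, string chosenName)
    {
        Dictionary<string, int> counts;
        if (!_hits.TryGetValue(typedPrefix, out counts))
        {
            counts = new Dictionary<string, int>();
            _hits[typedPrefix] = counts;
        }
        int current;
        counts.TryGetValue(chosenName, out current);
        counts[chosenName] = current + 1;
    }

    // Re-orders the alphabetical matches so frequently chosen names come first.
    // OrderByDescending is a stable sort, so ties keep their alphabetical order.
    public List<string> Rank(string typedPrefix, IEnumerable<string> alphabeticalMatches)
    {
        Dictionary<string, int> counts;
        _hits.TryGetValue(typedPrefix, out counts);

        return alphabeticalMatches
            .OrderByDescending(name =>
            {
                int c = 0;
                if (counts != null) counts.TryGetValue(name, out c);
                return c;
            })
            .ToList();
    }
}
```

The ranking would be applied to the list that comes back from the cached query, so the weights never have to be stored in SDS itself.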

Of course this gets into search theory and predictive reasoning systems which is beyond the scope of this post.

Distributed .NET Part III – Intro to Sql Data Services (SDS)

January 19, 2009

While I watch an anime, let’s go over what it takes to get functioning with SDS. I’m still a newbie when it comes to SDS, but here are a couple of things we should know before we start:

  1. SDS is new and as such doesn’t have a complete feature set
  2. SDS is implemented using Key/Value (Hashtable) objects which may lend itself well to memcached
  3. SDS queries return only the first 500 records and allow you to do paging to get any other records you need
  4. SDS does not include transaction support

Transactional support seems prudent, but for the moment, since we are in rapid prototype mode, we can set these gaps aside. I have the general idea that we keep SDS as a store and use the properties of entities for two purposes; more on this later.

What is SDS?

SDS stands for SQL Data Services and is basically cloud computing 101. When you do cloud computing you need a place to store stuff, and you really don’t care what runs on the back end: which operating systems, what hardware, who is employed to run it, what patch level it is on. The whole point is to get your storage environment out of your local branch and into Microsoft’s hands. There are some major cost savings for you at this point because you no longer have to buy machines, bandwidth, server operating systems, implement SDS, configuration management, auditing, security.

Don’t get me wrong, this won’t be free. I don’t think the price points have been announced, and they will be based on usage (think hosting providers). However, where I work we could really save some money by removing the overhead of running our own database department + server operations department.

What does SDS do?

SDS provides developers the ability to define a flexible schema for data storage and retrieval. Basically this means that we (developers) need not concern ourselves with relational modeling techniques. The only things we worry about are authorities, containers, and entities.

How you work with SDS

First, this article assumes you have requested and received access to SDS. SDS is a limited CTP, so you have to request access to it. I’d suggest going with Windows Azure and getting access to it that way. Once you get your registration code you can go to the portal set up for Azure, and once you have completed the steps below you can create a new project.

Step 1 – Enter Registration Code
Step 2 – Create Solution
Step 3 – Access Credentials
Step 4 – Update Credentials

Entity
An entity is the smallest data object recognized by SDS. An SDS entity is a hashtable-like object which allows you to define a flexible model to work with. This is best explained in code.

var ety = new Entity
{
	Id = "0001",
	Kind = "customer",
	Properties =
	{
		{ "FirstName", "Terrance" },
		{ "LastName", "Snyder" },
		{ "DateOfBirth", new DateTime(1984, 06, 28) }
	}
};

In our example above we have an entity that can be stored in SDS and queried. SDS supports a limited number of .NET types (only simple scalar types are supported on properties).

  1. string
  2. binary
  3. Boolean
  4. decimal
  5. dateTime

This also means entities don’t support collections of objects either which is a pain, and makes transactions an even more painful sore point; but I’ll show you how to work around that later.

Containers
A container serves to group our entities together. A container is also the largest domain for search and update. Because each authority is in one geo-location, all containers in an authority are located in the same data center. Containers house our entities and serve to distribute them across a surface (also called fabric).

Authority
An authority is a means of defining a domain in which to work. In a typical sense I believe this roughly equates to an organization. So if you have a business called “Tokyo Industrial Design” you would create an authority called tokyoindustrialdesign once you have completed your SDS enrollment. When you create your authority you get a Microsoft DNS entry used to connect to your authority. So in our example we get:
tokyoindustrialdesign.data.database.windows.net

SDS WCF Service Guide

To access your SDS via WCF you can add a service reference in Visual Studio 2008 to the following URI:

https://data.database.windows.net/v1/

After you have added a reference you’ll need to make sure you can authenticate to it. So if you followed the steps before you have a username and password to use when connecting via BasicHttp. And if you read my post on how to store basic credentials in WCF configuration by using a WCF endpoint extension you can do that now too.

Create an Authority

var client = new SitkaSoapServiceClient();
client.Create(new Scope { }, new Authority { Id = "<authority_id>" });

If you get an error it is because the authority has already been created and you’ll have to pick another one, or you goofed on your credentials.

Once you have an authority created you can now query it using the SSDS Explorer or using your browser, in our example:

https://<authority_id>.data.database.windows.net/v1/?q='from e in entities select e'

What’s really nice is that you now have a DNS name mapped to a windows.net domain; it makes you look that much more professional.

Create A Container

Now that we have created our authority, let’s go ahead and create a container for our first entities.

var client = new SitkaSoapServiceClient();
client.Create(new Scope { AuthorityId = "<authority_id>" }, new Container { Id = "<container_id>" });

After you have created your container you can see it in your browser using the template below:

https://<authority_id>.data.database.windows.net/v1/<container_id>

Create an Entity

Now that we have an authority and a container we can add, remove, update, and search for entities in SDS!

var ety = new Entity
{
	Id = "1",
	Kind = "person",
	Properties = new Dictionary<string,object>
	{
		{ "FirstName", "Terrance" },
		{ "LastName", "Snyder" },
		{ "DateOfBirth", new DateTime(1984, 06, 28) }
	}
};
client.Create(new Scope { AuthorityId = "<authority_id>", ContainerId = "<container_id>" }, ety);

And through the magic of SDS we can now query for this information, either through code or through SSDS Explorer.

var client = new SitkaSoapServiceClient();
var results = client.Query(new Scope { AuthorityId = "<authority_id>", ContainerId = "<container_id>" },
	@"from e in entities where e[""FirstName""] == ""Terrance"" select e");
Assert.AreEqual("Terrance", results[0].Properties["FirstName"]);
Assert.AreEqual("Snyder", results[0].Properties["LastName"]);
Assert.AreEqual(new DateTime(1984, 6, 28), results[0].Properties["DateOfBirth"]);

LIKE operator support in SDS

When I originally played with SDS it didn’t seem like I could do LIKE operations on the nested data for my objects. This turns out not to be true. We can simulate LIKE operations by performing them via >= and <. So if I wanted to find any person with a legal name starting with "B", the query below would return it.

from e in entities where e["LegalName"] >= "B" && e["LegalName"] < "C" select e

So when I get around to implementing a type-ahead feature for stocks, equities, mutual funds, etc., I’ll likely end up using a query that takes the user’s original input and creates a range by taking the last character typed and bumping it up one.

from e in entities where e["LegalName"] >= "Apple" && e["LegalName"] < "Applf" select e

So in the above example I get my nice range by taking the last character 'e' and creating the bounded range by shifting it to an 'f'. It should be a simple operation to take the input someone enters into a UI and then translate that last char to the next logical char. Numbers and letters should both be included; for example, searching for 3com may need to do the following:

User types: '3'
from e in entities where e["LegalName"] >= "3" && e["LegalName"] < "4" select e
User types: '3c'
from e in entities where e["LegalName"] >= "3c" && e["LegalName"] < "3d" select e
User types: '3co'
from e in entities where e["LegalName"] >= "3co" && e["LegalName"] < "3cp" select e
User types: 'Apple'
from e in entities where e["LegalName"] >= "Apple" && e["LegalName"] < "Applf" orderby e["LegalName"] select e

Transactional Support

While SDS doesn't support transactions (yet) there are ways around this issue. The one that we could use is one in which we treat entities a bit differently than what someone would do at first glance. Traditionally we see entities as being "table" based. And, indeed, entities don't support much functionality other than being a kind of active record pattern. However, if entities could somehow be treated more like objects modeling our domain, we could remove the need for transaction support in SDS.

Take for example a customer. In a relational model we would have a core customer table, then an emails table, and a reference table to combine customers and emails. We could accomplish this in SDS using three containers and simulate a table-based structure.

However, in the above situation we would have to keep track of transactions and do compensating transactions manually. If we removed a customer we would also have to ensure that we removed the customer's emails and the email associations. The big issue is: what happens if I remove a customer, the network drops, and I cannot remove the emails associated with that customer?

Transactions traditionally solved this problem, and they are really an offshoot of relational modeling. But what would happen if we moved customer into a single logical model? A customer object could be serialized to a single SDS store and a single SDS record. Thereby, if I deleted a customer, I would also delete the corresponding emails associated with that customer. If the issued command failed, I would just retry deleting the customer and would no longer need to worry about compensating transactions or even the need for transaction support in SDS.

This can be accomplished, and there are drawbacks (reporting, querying) but if we took each object we have and created a JSON property in our entities that allowed an entire object to be serialized we could manage. Or we can wait until SDS supports either transactions or nested entities inside other entities (both seem logical).
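The single-record idea can be sketched as follows. This is an illustration under assumptions, not a tested SDS integration: the `Customer` type, its members, and the `AggregateSketch` helper are hypothetical, and DataContractJsonSerializer is just one serializer that ships with .NET.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

// Hypothetical domain object: the emails live inside the customer
// instead of in separate email and association containers.
[DataContract]
public class Customer
{
    [DataMember] public string Id { get; set; }
    [DataMember] public string Name { get; set; }
    [DataMember] public List<string> Emails { get; set; }
}

public static class AggregateSketch
{
    // Serializes the whole customer (emails included) to one JSON string.
    public static string ToJson(Customer customer)
    {
        var serializer = new DataContractJsonSerializer(typeof(Customer));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, customer);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    // Rebuilds the customer from the JSON string read back from storage.
    public static Customer FromJson(string json)
    {
        var serializer = new DataContractJsonSerializer(typeof(Customer));
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
        {
            return (Customer)serializer.ReadObject(stream);
        }
    }
}
```

The resulting string would go into one string property of a single entity (for example a property named "Json", which is an assumed name), so deleting that one entity removes the customer and the emails in one operation.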

Distributed .NET Part II – Memcached in .NET

As a continuation of our distributed environment setup we’ll do a quick check to make sure our memcached instances are working. This requires a memcached client, and since you’re here because you’re a .NET developer, let’s show some of the options and go over some pros and cons of each of them.

Rating Notes

Before I get hammered for rating something badly, let me say the first thing I look for in code is whether it’s got too much code for what it is trying to do. That’s not to say I don’t love patterns and implementing singleton, chain-of-responsibility, command, MVP, MVC, and on and on. I just don’t like it when code introduces cruft. For example, most of these memcached clients include things like logging libraries, compression libraries, common libraries, and on and on it goes.

.NET Memcached

Let me start out by saying something positive… this does the job, is open source, and is hosted on SourceForge. All major benefits. However, the drawback is that it includes three extra DLLs. This, at least to me, kills this project. We have far too many moving parts for memcached. Why do we need three extra DLLs for something that supports telnet?

A while back I thought I had to pump in abstractions over abstractions. Those days are over; maybe if someone killed all of this fluff I’d use it. The final point is that there is a DLL in there called common.dll that seems to be missing the source. For shame. I don’t like having this conversation with my clients: “So what is common.dll?” “No idea. Could be a keylogger, I don’t know.”

Enyim Memcached Client

Even more bloated than the above, and it still managed to include Log4Net. While it has some technical implementations that go closer to the metal, the actual API is a mess. Something that should be cache.Put(key, object, [expiry]) becomes cache.Store(Store.Put, key, object, blah). Readability went out the window and it seems very VB-like. When I read the code I get the impression the author(s) don’t like methods.

beitmemcached

Now here we go. This is a simple download and an even simpler implementation. It doesn’t have unit tests, but that can be forgiven as we can write our own against this library (which also gives us a good bit of training too).

On top of this, the authors thought it would be nice to let people use their own logging implementation. In the authors’ own comments: “The problem with logging on the .Net platform is that there is no common logging framework, and everyone seems to have their own favorite. We wanted this project to compile straight away without external dependencies, and we want you to be able to embed it directly into your code, without having to add references to some other logging framework.”

Now that is great, because it does allow people to get up and running really quickly, but one question: what exactly does System.Diagnostics.Trace do? It’s in .NET and has been included with .NET since 1.1.

The source code has a decent amount of comments and could do with some refactoring and some performance improvements (the authors’ own words). But overall it is nice and clean.

Final Verdict

So for our examples we will use the last of these clients. Over time I may contribute to this open-source project, as I really think that something like memcached should be as simple as possible. After all, memcached is simply Put, Get, Set, and Delete; there is no need to create a crazy library or framework around it.

After you download the BeITMemCached you can create the following unit test after you have your server up and running.

using System;
using System.Text;
using System.Collections.Generic;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;

using BeIT.MemCached;

namespace UnitTests
{
	/// <summary>
	/// Basic tests against a running memcached server
	/// </summary>
	[TestClass]
	public class MemcachedTestFixture
	{
		[TestMethod]
		public void Test1_MemCached()
		{
			// create an instance of our cache from configuration
			var cache = MemcachedClient.GetInstance("development");

			// add an item to the cache
			Assert.IsTrue(cache.Set("test", "My Test!"));
		}

		[TestMethod]
		public void Test2_MemCached()
		{
			// create an instance of our cache from configuration
			var cache = MemcachedClient.GetInstance("development");

			// get an item from our cache (relies on Test1 having stored it)
			var obj = cache.Get("test");

			// make sure it is equal
			Assert.AreEqual("My Test!", obj);
		}

		[TestMethod]
		public void Test3_MemCached_Expry()
		{
			// create an instance of our cache from configuration
			var cache = MemcachedClient.GetInstance("development");

			// add an item to the cache that expires in 3 seconds
			Assert.IsTrue(cache.Set("expire-test", "product1", new TimeSpan(0, 0, 3)));

			// get the item from the cache to make sure it is there before 3 seconds
			Assert.AreEqual("product1", cache.Get("expire-test"));

			// wait 4 seconds and check that it is no longer there
			System.Threading.Thread.Sleep(4000);

			// the below should not work as it has expired from the cache
			Assert.AreNotEqual("product1", cache.Get("expire-test"));
		}
	}
}

The above assumes you have created a corresponding configuration entry in your UnitTest’s App.Config. Note: I changed the default sectionName used by the original author (the section name was beitmemcached).

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
	<configSections>
		<section name="memcached" type="System.Configuration.NameValueSectionHandler" />
	</configSections>
	<memcached>
		<add key="development" value="localhost" />
		<add key="production" value="server1.example.com:11211, server2.example.com:11211, server3.example.com:11211" />
	</memcached>
</configuration>

Later on we’ll get to integrating memcached and SDS (Sql Data Services). But for now have fun playing with memcached. I strongly recommend you watch the rockstar memcached video for ideas and general concepts around memcached.