About Peter Larsson

I'm a .NET developer and architect, working as a consultant at Connecta in Stockholm, Sweden. My current focus is on cloud computing and Windows Azure.

Implementing cache retries with Enterprise Library to solve intermittent DataCacheException

I have been using Windows Azure Cache for my Facebook application Am I Interesting for quite some time now and have kept getting intermittent exceptions like this one:

Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.)

Most advice I found online suggests that this exception is caused by cache objects that are too large, but in my case I have gotten the exception primarily when reading objects that are very small (a couple of hundred bytes). Continuing to read about the Enterprise Library application block on MSDN blogs and on Patterns & Practices, I followed the examples, installed the Transient Fault Handling Application Block using NuGet and updated my CacheHelper class to implement retries. Unfortunately – since I'm already using StructureMap – the install also pulled in the DI framework Unity, which Enterprise Library depends on, but so be it…

Below is the cache helper code before implementing retries. IDataCacheWrapper is a wrapper interface around the Windows Azure Cache methods; I use it to inject an in-memory cache when I'm working on my development machine or when deploying to a single-instance environment. The code for the concrete class is included further below.

    public static class CacheHelper
    {
        private static readonly IConfigurationHelper ConfigurationHelper;
        private static readonly IDataCacheWrapper Cache;

        static CacheHelper()
        {
            ConfigurationHelper = ObjectFactory.GetInstance<IConfigurationHelper>();
            Cache = ObjectFactory.GetInstance<IDataCacheWrapper>();
        }

        public static T GetFromCache<T>(string cacheKey)
        {
            var cacheItem = Cache.Get(cacheKey);
            if (cacheItem == null)
            {
                return default(T);
            }
            return (T) cacheItem;
        }

        public static void AddToCacheWithNoExpiration(string cacheKey, object value)
        {
            AddObjectToCache(cacheKey, value, TimeSpan.MaxValue);
        }

        public static void AddToCacheWithDefaultExpiration(string cacheKey, object value)
        {
            AddObjectToCache(cacheKey, value, TimeSpan.FromMinutes(ConfigurationHelper.DefaultCacheMinutes));
        }

        public static void AddToCacheWithSpecifiedExpiration(string cacheKey, object value, TimeSpan timeout)
        {
            AddObjectToCache(cacheKey, value, timeout);
        }

        private static void AddObjectToCache(string cacheKey, object value, TimeSpan timeout)
        {
            Cache.AddOrReplace(cacheKey, value, timeout);
        }

        public static void RemoveFromCache(string cacheKey)
        {
            Cache.Remove(cacheKey);
        }
    }
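
The IDataCacheWrapper interface itself isn't listed in this post, but judging from the calls above (and the concrete Azure class further down) it is essentially this minimal contract:

    public interface IDataCacheWrapper
    {
        object Get(string key);
        void AddOrReplace(string key, object value, TimeSpan timeout);
        void Remove(string key);
    }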

Code changes to implement retries (all other code remains intact):

    public static class CacheHelper
    {
        ...
        private static readonly RetryPolicy RetryPolicy;

        static CacheHelper()
        {
            ...
            RetryPolicy = CreateRetryPolicy();
        }

        private static RetryPolicy CreateRetryPolicy()
        {
            var retryStrategy = new Incremental(3, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2)) { FastFirstRetry = true };
            var retryPolicy = new RetryPolicy<CacheTransientErrorDetectionStrategy>(retryStrategy);
            retryPolicy.Retrying += (sender, args) =>
                                        {
                                            var logger = LogManager.GetLogger(typeof(CacheHelper));
                                            logger.WarnFormat("Retrying cache access - Count: {0}, Delay: {1}, Exception {2}",
                                                args.CurrentRetryCount, args.Delay, args.LastException);
                                        };
            return retryPolicy;
        }

        public static T GetFromCache<T>(string cacheKey)
        {
            Object cacheItem = null;
            RetryPolicy.ExecuteAction(() => cacheItem = Cache.Get(cacheKey));
            ...
        }

        private static void AddObjectToCache(string cacheKey, object value, TimeSpan timeout)
        {
            RetryPolicy.ExecuteAction(() => Cache.AddOrReplace(cacheKey, value, timeout));
        }

        public static void RemoveFromCache(string cacheKey)
        {
            RetryPolicy.ExecuteAction(() => Cache.Remove(cacheKey));
        }
    }
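
Callers are unaffected by the change; a typical read-through usage still looks like this (the key, type and fallback call here are just illustrative, not code from the app):

    // Illustrative usage only - UserProfile and LoadUserFromFacebook are hypothetical
    var cacheKey = "profile-" + facebookUserId;
    var profile = CacheHelper.GetFromCache<UserProfile>(cacheKey);
    if (profile == null)
    {
        profile = LoadUserFromFacebook(facebookUserId); // fall back to the original data source
        CacheHelper.AddToCacheWithDefaultExpiration(cacheKey, profile);
    }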

And finally, here is the concrete Windows Azure Cache class that implements IDataCacheWrapper, in case anyone needs it:

    public class AzureDataCacheWrapper : IDataCacheWrapper
    {
        private readonly DataCache _cache;
        private readonly string _cacheKeyPrefix; // Unique cache objects for each environment

        public AzureDataCacheWrapper(IConfigurationHelper configurationHelper)
        {

            // Cache servers
            var cacheServers = new List<DataCacheServerEndpoint>
                                   {
                                       new DataCacheServerEndpoint(
                                           configurationHelper.CacheHostName, 
                                           configurationHelper.CachePort)
                                   };

            // Cache security
            var secureString = CreateSecureString(configurationHelper.CacheAuthorization);
            var cacheSecurity = new DataCacheSecurity(secureString);

            // Setup configuration
            var cacheConfiguration = new DataCacheFactoryConfiguration("default")
                                         {
                                             SecurityProperties = cacheSecurity,
                                             Servers = cacheServers
                                         };

            var cacheFactory = new DataCacheFactory(cacheConfiguration);
            _cache = cacheFactory.GetDefaultCache();
            _cacheKeyPrefix = configurationHelper.FacebookAppId;
        }

        public object Get(string key)
        {
            return _cache.Get(FullKeyName(key));
        }

        public void AddOrReplace(string key, object value, TimeSpan timeout)
        {
            _cache.Put(FullKeyName(key), value, timeout);
        }

        public void Remove(string key)
        {
            _cache.Remove(FullKeyName(key));
        }

        private string FullKeyName(string key)
        {
            return string.Format("{0}-{1}", _cacheKeyPrefix, key);
        }

        private static SecureString CreateSecureString(string unsecureString)
        {
            var secureString = new SecureString();
            foreach (var character in unsecureString.ToCharArray())
            {
                secureString.AppendChar(character);
            }
            return secureString;
        }
    }

I deployed this code to production earlier today, so if you don't see an update to this post in the next few days reporting a problem, you can assume the solution worked and that my problem is gone ;-).


Getting federation with ACS & ADFS to work with multiple instances – certificate issue

I recently built a solution that federates ADFS login through Azure Access Control Service (ACS). Everything worked fine for me using the standard “Add STS reference” functionality, but when I increased the instance count to 2 in Azure (no problem in the development environment) I started getting strange errors similar to this:

[Error screenshot]

Now, being a security newbie, I won't even try to explain the underlying issue beyond this: the encrypted cookies generated after login are somehow tied to the server instance that created them and cannot be used by the other servers. With sticky sessions this might not have been a problem, but it is with Azure's round-robin load balancing.

The solution requires both code to replace the cookie encryption and a self-generated certificate that is installed on all instances, uploaded to Azure and referenced from web.config. Not very hard, but I struggled to find the solution online.
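
As far as I understand it, the coding part boils down to replacing the default (machine-bound) cookie protection with RSA encryption and signing based on the shared certificate. A rough sketch of that hook in global.asax, using the Microsoft.IdentityModel (WIF) API, could look like this (treat it as a sketch; the guide linked below has the authoritative version):

        // Sketch only - requires the Microsoft.IdentityModel.Web and
        // Microsoft.IdentityModel.Tokens namespaces from WIF
        protected void Application_Start(object sender, EventArgs e)
        {
            FederatedAuthentication.ServiceConfigurationCreated += (s, args) =>
            {
                // Protect the session cookie with the shared certificate instead of
                // the default machine-specific protection, so any instance can read it
                var certificate = args.ServiceConfiguration.ServiceCertificate;
                var transforms = new List<CookieTransform>
                {
                    new DeflateCookieTransform(),
                    new RsaEncryptionCookieTransform(certificate),
                    new RsaSignatureCookieTransform(certificate)
                };
                var sessionHandler = new SessionSecurityTokenHandler(transforms.AsReadOnly());
                args.ServiceConfiguration.SecurityTokenHandlers.AddOrReplace(sessionHandler);
            };
        }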

If you run into similar problems, I can highly recommend going through this guide in detail:
http://msdn.microsoft.com/en-us/gg557891

Don't forget the step where you need to give Network Service access to your certificate for it to work in the development emulator.

Creating SSL certificate for Azure (wasted time on the wrong OS)

This is a shorter post than usual, hoping to help someone who is trying to add an SSL certificate to a Hosted Service in Azure and is a certificate newbie like myself. I just wasted almost an hour going about it in the wrong way, or actually on the wrong OS (as far as I can tell).

Short summary: use a Windows Server machine, not Windows 7, to create your Certificate Signing Request (CSR), complete the certificate creation and export the needed .PFX file.

I bought a certificate through GoDaddy and went through the certificate signing process from my Windows 7 machine, but was never able to export a .PFX file (the option was greyed out) to upload to the Azure Management Portal. I tried and tried for over an hour before I found some signs on the internet that the OS might make a difference. So I connected with Remote Desktop to one of my web role instances and went through the process there instead, and everything worked fine! I created my Certificate Signing Request using IIS Manager, uploaded it to GoDaddy, downloaded the created certificate, installed the intermediate certificate through MMC and then completed the certificate process in IIS by importing the .CRT file from GoDaddy. I could thereafter export the certificate from IIS as a .PFX file and upload it to Windows Azure.

I followed the step-by-step instructions here to generate the CSR and install the certificate in IIS 7:
http://support.godaddy.com/help/article/4801

Good luck!

Moving Azure storage (Tables & Queues) to another data center

As a follow-up to my last post Moving a SQL Azure database to another data center, I have one final step to gather all components in the same data center – move Azure Storage (tables & queues).

This article starts with the following geographical infrastructure, all in Azure but still in two separate data centers (North Central US and North Europe):

I haven't found a way to sync Tables and Queues (blobs can be synced, but I don't use them), so for this migration I will need an outage so that data is no longer written to the old storage account while I migrate to the new one.

My plan of attack for this migration is:

  1. Implement a way to put my application (web and API) in offline mode with a friendly user message
  2. Migrate all static and historic data that doesn’t change during operations
  3. Put my application in offline mode
  4. Let my background worker process empty all queues
  5. Migrate the rest of the data
  6. Change the storage connection string
  7. Put the application back online
  8. Delete all tables and queues from the old storage account after smoke testing the app

Let’s do this and hope that my plan works! (documenting as I go…)

Step 1 – Implement a way to put my application offline

I implemented this step by adding two new configuration settings in ServiceConfiguration and a handler in global.asax that uses the settings to determine whether to redirect to a separate page showing that the application is offline. By placing the setting in ServiceConfiguration, instead of web.config, I can update the setting without re-deploying the app.

      <Setting name="Offline" value="false" />
      <Setting name="OfflineMessage" value="Am I Interesting is down for maintanence. Please try again in a few minutes!" />

And some code in global.asax to handle the setting:

        protected void Application_BeginRequest(object sender, EventArgs e)
        {
            ...
            // Handle offline mode

            if (Request.Url.AbsolutePath.ToLower().Contains("/offline"))
            {
                return;
            }

            if (!_configurationHelper.Offline)
            {
                return;
            }

            Response.Redirect("/Offline/Index");
        }
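
The post doesn't show how _configurationHelper reads these values, but since the settings live in ServiceConfiguration, the helper is presumably just a thin wrapper around RoleEnvironment. Something along these lines (a sketch; only the setting names are taken from above):

        // Sketch of reading the offline settings from ServiceConfiguration
        // (requires Microsoft.WindowsAzure.ServiceRuntime)
        public bool Offline
        {
            get { return bool.Parse(RoleEnvironment.GetConfigurationSettingValue("Offline")); }
        }

        public string OfflineMessage
        {
            get { return RoleEnvironment.GetConfigurationSettingValue("OfflineMessage"); }
        }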

I added a simple offline page and then deployed the new version of the app (still with Offline=false).

Step 2 – Migrate all static and historic data that doesn’t change during operations

For the data migration I used Cloud Storage Studio from Cerebrata, now part of Red Gate. First I created the same tables in the new storage account in North Central US. I then downloaded all static data, and all “old” data (modified earlier than yesterday), to one XML file per table on my machine:

Uploading the XML files was just as easy using the “Upload Table Data” function in Cloud Storage Studio. This was quite a bit more time consuming (about 3 minutes per 1000 rows) since the entities were uploaded individually, but that was still ok since my application remained online throughout the operation.

Now the remaining steps need to go quite fast to minimize downtime, even though the users are now at least getting a message stating that we're performing maintenance and that they should retry in a few minutes.

Step 3 – Put the application offline

I changed my new Offline setting to True in Azure Management Portal and clicked OK to recycle the instances and put the application offline.

Here I encountered an unexpected behavior! I expected the setting to be applied to one instance at a time (I'm running two instances of my hosted service) with no downtime, but for about one minute I was unable to reach the application while the setting was applied. Actually, I never received an error, but the response time from the app was very long. After that minute I started seeing the offline page served from the instance that had been updated, while the other instance was still recycling (nice behavior). Still not too bad, with a slightly unresponsive app for one minute…

Step 4 – Let my background worker process empty all queues

After putting the application offline and the instances recycled to apply the setting, I checked the queue lengths and made sure they were empty before continuing.
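
If you want to verify the queue lengths in code rather than in a tool, a few lines against the StorageClient library will do it. A quick sketch (the connection string values are placeholders):

    // Sketch: check that every queue in the old storage account is empty
    var account = CloudStorageAccount.Parse(
        "DefaultEndpointsProtocol=https;AccountName=oldaccount;AccountKey=..."); // placeholder values
    var queueClient = account.CreateCloudQueueClient();

    foreach (var queue in queueClient.ListQueues())
    {
        // RetrieveApproximateMessageCount returns the current (approximate) queue length
        Console.WriteLine("{0}: {1} messages", queue.Name, queue.RetrieveApproximateMessageCount());
    }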

Step 5 – Migrate the rest of the data

I did a new download of fresh data from Cloud Storage Studio by changing, for example, the query:
(UpdatedDateUtc le datetime'2012-02-18')
to:
(UpdatedDateUtc gt datetime'2012-02-18')

For those of you who are not familiar with querying the storage services:
le = Less than or Equal to
gt = Greater than
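
The same comparison can also be expressed from code; a LINQ query against a TableServiceContext is translated into exactly these operators. A small illustrative sketch (the entity class and the context creation are hypothetical):

    // The LINQ comparison below translates to an OData filter using "gt" on UpdatedDateUtc
    // (MyEntity is a hypothetical class deriving from TableServiceEntity)
    var tableContext = storageAccount.CreateCloudTableClient().GetDataServiceContext();
    var cutoff = new DateTime(2012, 2, 18);
    var freshEntities = tableContext
        .CreateQuery<MyEntity>("MyTable")
        .Where(e => e.UpdatedDateUtc > cutoff)
        .AsTableServiceQuery() // handles continuation tokens while enumerating
        .ToList();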

I then appended the data by uploading it to the new tables and verified that the count of entities in the new tables matched the old tables.

Step 6 – Change the storage connection string

I changed the connection string in ServiceConfiguration via the Azure Management portal…

Step 7 – Put the application back online

…and at the same time changed to Offline=false. The instances recycled once again and came back online, without interruption, now working against the new storage account in a completely different part of the world!

Step 8 – Delete all tables and queues from the old storage account after smoke testing the app

For smoke testing, I checked that new data was written to tables in the new storage account. I also observed that new queues were created – as expected (I have code for this on startup) – when the instances recycled. I then deleted all tables and queues from the old storage account!

Done!

I only had about 10-15 minutes of app downtime during this operation, but it was “pretty” downtime as I started by implementing offline handling in the app.

Ahh…finally all my services live in the same data center – mission accomplished!

Moving a SQL Azure database to another data center

I recently decided to move Am I Interesting to the North Central US data center to improve performance for my Facebook queries. I’ve had relatively poor latency and some packet loss when executing queries over the Atlantic when the app was placed in the North Europe data center.

When I went to move the web role as a first step, I also found – to my surprise – that I had accidentally “misplaced” my database server in South Central US when I originally created it! :-|. Now that’s a new experience coming from traditional hosting – misplacing a server in an entirely different part of the world! 🙂

So this was my starting point:

After moving the hosted service (web role) to North Central US, which this post doesn’t cover, my infrastructure layout now looked like this:

I now prioritized moving the database, as the app uses SQL Azure more intensely than the other storage services. The move goes from South Central US to North Central US, and my plan of attack to avoid downtime (even though data could be inconsistent for a few minutes) was to:

  1. Create a new database server in North Central US
  2. Create a copy of my application database in North Central US
  3. Turn on one-way Data Sync from my old database server to new server
  4. Modify my ServiceConfiguration for my hosted service to use the new database instead
  5. Shut down the old database server (after successful sync)

Step 1 – Create a new database server in North Central US

This was simply done in the Azure Management Portal.

Step 2 – Create a copy of my application database in North Central US

For this step I first tried to follow the MSDN articles at http://msdn.microsoft.com/en-us/library/ff951629.aspx and http://msdn.microsoft.com/en-us/library/ff951624.aspx, but came up short when trying to execute the command CREATE DATABASE xxx AS COPY OF server.database, which gave me a "transport-level error".

So instead I created a new, empty database on the destination server with the same schema as my source database (using the Generate Scripts feature in SQL Server Management Studio), hoping that Data Sync could transfer all the data for me (see the next step; I'm documenting as I go, as you can probably tell :)).

Step 3 – Turn on Data Sync from old database server to new server

I provisioned a new Data Sync server through the Management Portal and placed it in North Central US. I then selected "Sync between SQL Azure databases" and proceeded through the wizard, defining my new database as the Hub and my old database with one-way sync to the Hub. After a few minutes all data was synchronized!

Step 4 – Modify ServiceConfiguration to use the new database instead

I modified the ServiceConfiguration and published a new version of the app.

Step 5 – Shut down the old database server (after successful sync)

After all web role instances were online again after the changed connection string, I verified that new data was written to the new database, executed one final data sync and then deleted the old database.

Done!

I now have the following geographical infrastructure:

Next thing to tackle is moving Azure Storage (Queues & Tables) from North Europe to North Central US, but that challenge is for another day and another post…

Accessing a web role in development fabric from another machine

I have recently bought a MacBook Air and am running Windows virtualized (Parallels Desktop) within Mac OS X for development. As part of developing Am I Interesting I want to check browser compatibility with Safari, and therefore want to access the development fabric from Mac OS X instead of web deploying to a QA environment in Azure.

As it turns out, this is not trivial, as the development fabric binds to 127.0.0.1. I finally found a solution here, though, which I am paraphrasing in this blog entry.

Here are the steps I went through to get it working. I assume this also works in a non-virtualized environment when you want to access the development fabric from another physical machine, even though I haven't tried that myself.

On the (virtual) Windows machine:

  1. Download and unzip rinetd to some folder
  2. Delete all files except the .exe file (rinetd.exe)
  3. Create a new text file in the same directory and name it rinetd.conf
  4. Add a line to the text file that maps the machine's IP address to 127.0.0.1 for the port that the development fabric is running on. In my case, the development fabric runs on port 81 and the machine's IP address is 10.211.55.3, so my configuration file contains the following line:
    10.211.55.3 81 127.0.0.1 81
  5. Run rinetd from the command prompt with the following command line:
    rinetd -c rinetd.conf 
  6. Now navigate from another machine, in my case the Mac host, to e.g.
    http://10.211.55.3:81

Done! 🙂

You may want to use a tool to install rinetd as a Windows service so that it is always running. In my case, it's good enough to start it when I need it.

Using the Table Storage Upsert (Insert or Replace)

I just implemented the new Upsert functionality introduced in the Azure SDK 1.6 and wanted to document the solution as I missed one detail that kept me struggling for a while – hopefully this post can help someone else avoid it!

My intent with this change was to replace another update method that first checked to see if the entity already existed, and then either added the new entity or updated the existing one. Upsert will cut down my storage transactions for updates in half as it only executes one request per update. It will also boost performance for the same reason.
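
For comparison, the check-then-write pattern being replaced looks roughly like this (a sketch rather than my exact code; it assumes T derives from TableServiceEntity so that PartitionKey and RowKey are available):

        public void AddOrUpdate(T obj)
        {
            // Request #1: check whether the entity already exists
            var existing = _tableServiceContext.CreateQuery<T>(_tableName)
                .Where(e => e.PartitionKey == obj.PartitionKey && e.RowKey == obj.RowKey)
                .FirstOrDefault();

            if (existing != null)
            {
                _tableServiceContext.Detach(existing); // stop tracking the fetched copy
                _tableServiceContext.AttachTo(_tableName, obj, "*");
                _tableServiceContext.UpdateObject(obj);
            }
            else
            {
                _tableServiceContext.AddObject(_tableName, obj);
            }

            // Request #2: the actual write - Upsert removes the need for request #1
            _tableServiceContext.SaveChangesWithRetries();
        }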

Step 1:
Make sure that a specific version header is sent in the REST requests to the storage service. I have a static method that creates a TableServiceContext, which I updated with one new line (the SendingRequest handler below):

        public static TableServiceContext CreateContext()
        {
            var context = new TableServiceContext(BaseAddress, Credentials)
            {
                IgnoreResourceNotFoundException = true // FirstOrDefault() will return null if no match, instead of 404 exception
            };

            // Add specific header to support Upsert
            // TODO: Remove when Azure supports this by default
            context.SendingRequest += (sender, args) => (args.Request).Headers["x-ms-version"] = "2011-08-18";

            return context;
        }

Step 2:
Use the AttachTo method combined with UpdateObject and the ReplaceOnUpdate option for the SaveChanges method to execute the upsert (I have a generic class with the table storage methods):

        public void AddOrReplace(T obj)
        {
            _tableServiceContext.AttachTo(_tableName, obj);
            _tableServiceContext.UpdateObject(obj);
            _tableServiceContext.SaveChangesWithRetries(SaveChangesOptions.ReplaceOnUpdate);
        }

Note: I already had an Update method that used the same code but with the following slightly different AttachTo call (note the last parameter, an ETag of "*"). Using that overload for the upsert caused a ResourceNotFound error when saving changes for entities that did not yet exist, and it took me a while to find and fix this.

    _tableServiceContext.AttachTo(_tableName, obj, "*");

That’s it!