Saturday, March 24, 2012

My programming tidbits – Clear Code

 

Hi All,

I know I didn’t write a full post for quite a while. I was quite busy with work and other personal issues but I assure you that I am working on something interesting for the next full post. Meanwhile, I will start a series of short posts where I describe my programming tidbits. I will try to write something different than the regular “Write tests! Don’t write long methods! Don’t use the mouse!” that you can read anywhere online. My number 1 tip is this (and if you are going to skip all the other tips, please still consider this one):

Write code that is clear for all the team members.

I will start with an example. I was sent a piece of code for code review which is similar to this one:

  1. public string Foo()
  2. {
  3.   string fileName = Path.GetTempFileName();
  4.   Func<string> func = () => { return Path.Combine(Directory.GetCurrentDirectory(), fileName); };
  5.  
  6.   string fullFileName = func();
  7.   while (!File.Exists(fullFileName))
  8.   {
  9.     fileName = Path.GetTempFileName();
  10.     fullFileName = func();
  11.   }
  12.  
  13.   return fullFileName;
  14. }

This code generates an inexistent temporary file name in the current directory. I kindly asked to change the code to the following:

  1. public string Foo2()
  2. {
  3.   string fullFileName;
  4.   do
  5.   {
  6.     fullFileName = Path.Combine(Directory.GetCurrentDirectory(), Path.GetTempFileName());
  7.   }
  8.   while (!File.Exists(fullFileName));
  9.  
  10.   return fullFileName;
  11. }

After a short argument it was agreed that my code is slightly better in terms of readability (lets leave the performance considerations aside for this example). The main reason that initially this was not the code is because the engineer who wrote the code doesn’t like the do-while loop. To tell the truth, I don’t like do-while loop either. There is something not natural about its flow but in this case it saves a lambda expression, generation and allocation of a new class, and some readability.

When writing code please remember that not all your team members know everything you know. Some might not know lambda expressions well while others might not be familiar with the syntax of LINQ, some don’t like using “var” everywhere while others don’t want to clear unused “using”s because then they miss fundamental system namespaces by default. I am not saying you should not use these advanced framework features. I am saying you should not abuse these features (as in the above example). Think of the poor programmer that has to understand the entire logic of your algorithm and then he sees this method. He may not be best friends with lambdas but now he has to understand what this method does and why the hell did you not use a simple do-while loop instead.

My conclusion: Use advanced language features when they bring benefit to the code and not in order to make your code look cool.

 

Thanks for reading,

Boris.

 

P.S.

If the person who sent this code to me is reading this. Please don’t take this personally, this is just a good example of my tip.

Saturday, March 3, 2012

Implementing CacheDictionary–A Dictionary with a limited number of items that can be used as cache.

 

Hi All,

Today we consider the following scenario: There is a process that needs to generate many resources. The generation process takes a long time and each generated resource is taking a considerable amount of memory (you don’t want to store more than a few such resources in the memory at a time). After analysis it is evident that when the process runs it often generates the same resource over and over again such that while the total number of generated resources is quite large, all these resources are the same. The solution would be to generate each resource only once and store it in some dictionary. Each subsequent request to generate this resource would simply retrieve it from this dictionary. To solve the issue I have implemented a CacheDictionary which holds only a specified amount of items and has a versatile removal strategy (when trying to add an item over the allowed limit).

It is possible to solve this problem using WeakReferenced values using a regular dictionary. In this solution a regular dictionary holds the resources as WeakReferences and therefore the GC may collect them at any time. The main advantage here is that you allow the CLR to manage the memory itself but you lose the ability to tailor the memory allocation and freeing behavior to your exact needs. At some cases this solution is probably very good but for other cases we need the CacheDictionary.

The CacheDictionary

My implementation encapsulates (and not inherits from) a regular dictionary. I have implemented the IDictionary<TKey,TValue> interface delegating the methods to an internal instance of the dictionary. The constructor requires two arguments: The maximum number of items allowed in the CacheDictionary and the RemovalStrategy. The RemovalStrategy is a class which controls which key is removed from the CacheDictionary when the Add method is called but a full capacity is reached. This class implements the following interface:

  1. public interface ICacheDictionaryRemovalStrategy<TKey>
  2. {
  3.   /// <summary>
  4.   /// Initialize the strategy and pass the maximum number of allowed items
  5.   /// </summary>
  6.   /// <param name="maxSize">The maximum number of allowed items</param>
  7.   void Initialize(int maxSize);
  8.  
  9.   /// <summary>
  10.   /// Notify the strategy that a key was added to the base collection
  11.   /// </summary>
  12.   /// <param name="key">The key that was added</param>
  13.   void KeyAdded(TKey key);
  14.  
  15.   /// <summary>
  16.   /// Notify the strategy that a key was removed from the base collection
  17.   /// </summary>
  18.   /// <param name="key">The key that was removed</param>
  19.   void KeyRemoved(TKey key);
  20.  
  21.   /// <summary>
  22.   /// Notify the strategy that a key was accessed (retrieved by the user) in the base collection
  23.   /// </summary>
  24.   /// <param name="key">The key that was retrieved</param>
  25.   void KeyAccessed(TKey key);
  26.  
  27.   /// <summary>
  28.   /// Notify the strategy that the base collection was cleared
  29.   /// </summary>
  30.   void Clear();
  31.  
  32.   /// <summary>
  33.   /// Get the most appropriate key to remove, this is called when the base collection runs out of space
  34.   /// </summary>
  35.   /// <returns>The key that the base collection will remove</returns>
  36.   TKey GetKeyToRemove();
  37. }

Most of the methods are used to notify the strategy about changes in the underlying dictionary with two exceptions. Initialize method is called in the constructor of the CacheDictionary to allow the strategy run its initialization. The GetKeyToRemove is called when the user tries to add an item to the CacheDictionary but the items count is equal to the maxSize passed in the constructor. The CacheDictionary expects to get an existing key or an exception will be thrown. A noteworthy method is KeyAccessed. In my implementation this method is called whenever the user retrieves some value by using a key (see next section for details).

Tracking Access

As mentioned above, I define access to the CacheDictionary as retrieving some value by its key. The KeyAccessed method is therefore called in the following locations:

  • TryGetValue method if the underlying call to TryGetValue was successful.
  • The indexer property.
  • The Current property of the custom enumerator.

The third point is worth elaborating upon. I am implementing the IDictionary interface and therefore I must provide a way to enumerate over the Key-Value pairs inside the dictionary. If I would just delegate these calls to the regular enumerator then I would miss an access done by the enumerator itself. For this reason I have added a wrapper class (CacheDictionaryEnumerator) which wraps the calls to the underlying enumerator but calls KeyAccessed method when needed.

The full implementation is therefore:

  1. public class CacheDictionary<TKey, TValue> : IDictionary<TKey, TValue>
  2. {
  3.   private Dictionary<TKey, TValue> _data;
  4.   private int _maxSize;
  5.   private ICacheDictionaryRemovalStrategy<TKey> _removalStrategy;
  6.  
  7.   public CacheDictionary(int maxSize, ICacheDictionaryRemovalStrategy<TKey> removalStrategy)
  8.   {
  9.     if (removalStrategy == null)
  10.       throw new ArgumentNullException("removalStrategy");
  11.     if (maxSize == 0)
  12.       throw new ArgumentException("maxSize must be a positive integer value");
  13.     _maxSize = maxSize;
  14.     _removalStrategy = removalStrategy;
  15.     _data = new Dictionary<TKey, TValue>();
  16.  
  17.     _removalStrategy.Initialize(maxSize);
  18.   }
  19.  
  20.   #region IDictionaty Implementation
  21.  
  22.   public void Add(TKey key, TValue value)
  23.   {
  24.     if (_data.ContainsKey(key))
  25.       _data.Add(key, value); //I want to throw the same exception as the internal dictionary for this case.
  26.  
  27.     if (_data.Count == _maxSize)
  28.     {
  29.       TKey keyToRemove = _removalStrategy.GetKeyToRemove();
  30.       if (_data.ContainsKey(keyToRemove))
  31.         _data.Remove(keyToRemove);
  32.       else
  33.         throw new Exception(String.Format("Could not find a valid key to remove from cache, key = {0}", key == null ? "null" : key.ToString()));
  34.     }
  35.     _data.Add(key, value);
  36.     _removalStrategy.KeyAdded(key);
  37.   }
  38.  
  39.   public bool ContainsKey(TKey key)
  40.   {
  41.     return _data.ContainsKey(key);
  42.   }
  43.  
  44.   public ICollection<TKey> Keys
  45.   {
  46.     get {return _data.Keys;}
  47.   }
  48.  
  49.   public bool Remove(TKey key)
  50.   {
  51.     bool result = _data.Remove(key);
  52.     if (result)
  53.       _removalStrategy.KeyRemoved(key);
  54.     return result;
  55.   }
  56.  
  57.   public bool TryGetValue(TKey key, out TValue value)
  58.   {
  59.     bool result = _data.TryGetValue(key, out value);
  60.     if (result)
  61.       _removalStrategy.KeyAccessed(key);
  62.     return result;
  63.   }
  64.  
  65.   public ICollection<TValue> Values
  66.   {
  67.     get { return _data.Values; }
  68.   }
  69.  
  70.   public TValue this[TKey key]
  71.   {
  72.     get
  73.     {
  74.       TValue result = _data[key];
  75.       _removalStrategy.KeyAccessed(key);
  76.       return result;
  77.     }
  78.     set
  79.     {
  80.       _data[key] = value;
  81.       _removalStrategy.KeyAccessed(key);
  82.     }
  83.   }
  84.  
  85.   public void Add(KeyValuePair<TKey, TValue> item)
  86.   {
  87.     Add(item.Key, item.Value);
  88.   }
  89.  
  90.   public void Clear()
  91.   {
  92.     _data.Clear();
  93.     _removalStrategy.Clear();
  94.   }
  95.  
  96.   public bool Contains(KeyValuePair<TKey, TValue> item)
  97.   {
  98.     return _data.Contains(item);
  99.   }
  100.  
  101.   public void CopyTo(KeyValuePair<TKey, TValue>[] array, int arrayIndex)
  102.   {
  103.     ((IDictionary<TKey, TValue>)_data).CopyTo(array, arrayIndex);
  104.   }
  105.  
  106.   public int Count
  107.   {
  108.     get { return _data.Count; }
  109.   }
  110.  
  111.   public bool IsReadOnly
  112.   {
  113.     get { return ((IDictionary<TKey, TValue>)_data).IsReadOnly; }
  114.   }
  115.  
  116.   public bool Remove(KeyValuePair<TKey, TValue> item)
  117.   {
  118.     bool result = ((IDictionary<TKey, TValue>)_data).Remove(item);
  119.     if (result)
  120.       _removalStrategy.KeyRemoved(item.Key);
  121.     return result;
  122.   }
  123.  
  124.   public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator()
  125.   {
  126.     return new CacheDictionaryEnumerator(_data.GetEnumerator(), _removalStrategy);
  127.   }
  128.  
  129.   System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
  130.   {
  131.     return new CacheDictionaryEnumerator(_data.GetEnumerator(), _removalStrategy);
  132.   }
  133.   #endregion
  134.  
  135.   public class CacheDictionaryEnumerator : IEnumerator<KeyValuePair<TKey, TValue>>
  136.   {
  137.     private IEnumerator<KeyValuePair<TKey, TValue>> _innerEnumerator;
  138.     private ICacheDictionaryRemovalStrategy<TKey> _removalStrategy;
  139.  
  140.     internal CacheDictionaryEnumerator(IEnumerator<KeyValuePair<TKey, TValue>> innerEnumerator, ICacheDictionaryRemovalStrategy<TKey> removalStrategy)
  141.     {
  142.       if (innerEnumerator == null)
  143.         throw new ArgumentNullException("innerEnumerator");
  144.       if (removalStrategy == null)
  145.         throw new ArgumentNullException("removalStrategy");
  146.  
  147.       _innerEnumerator = innerEnumerator;
  148.       _removalStrategy = removalStrategy;
  149.     }
  150.  
  151.     public KeyValuePair<TKey, TValue> Current
  152.     {
  153.       get
  154.       {
  155.         KeyValuePair<TKey, TValue> result = _innerEnumerator.Current;
  156.         _removalStrategy.KeyAccessed(result.Key);
  157.         return result;
  158.       }
  159.     }
  160.  
  161.     public void Dispose()
  162.     {
  163.       _innerEnumerator.Dispose();
  164.       _innerEnumerator = null;
  165.     }
  166.  
  167.     object System.Collections.IEnumerator.Current
  168.     {
  169.       get {return this.Current;}
  170.     }
  171.  
  172.     public bool MoveNext()
  173.     {
  174.       return _innerEnumerator.MoveNext();
  175.     }
  176.  
  177.     public void Reset()
  178.     {
  179.       _innerEnumerator.Reset();
  180.     }
  181.   }
  182. }

Strategy Implementations

For the solution to be useable out of the box (or out of the downloaded zip file) I have added three implementation of the ICacheDictionaryRemovalStrategy<TKey> interface. The three implementations are:

  1. EmptyRemovalStrategy<TKey> – Removes the first item in it’s internal collection. Does not track access in any way.
  2. MruRemovalStrategy<TKey> – Removes the most recently used (most accessed) item in the CacheDictionary.
  3. LruRemovalStrategy<TKey> – Removes the least recently used (least accessed) item in the CacheDictionary.

These three implementation should allow you a basic use of the CacheDictionary right after download.

As usual you can download all the needed data from my skydrive (link below). The 7zip contains a small demo program on how to use the provided class. If you find bugs/improvements to my implementations I would be more than happy if you share those in the comments.

Thank you for reading,

Boris.