Saturday, December 15, 2012

A short update

Hi All,

It has been a long time since I posted on my blog. In the past weeks I was quite busy with various small projects and simply didn’t have the time to write a new post. Here is a quick review of the main things that I was working on.

  1. I fixed that annoying bug in npm where it could not parse a package.json file that had a BOM header. It was particularly annoying to Visual Studio users since it creates the BOM marker by default on any file you create. The main issue was that npm didn’t give a nice warning about it but instead failed to parse the package.json file. The fix was very small as you can see here.
  2. I developed an npm package called hashes which is an implementation of a HashTable and a HashSet in JavaScript. I tried to make the package as high quality as I could therefore it includes detailed documentation, tests, and extensibility points.
  3. I wrote a small application to help me catch the person who constantly turns off my air conditioner at work. (Just to clarify things, the aforementioned AC covers only two cubes, mine and my team mate’s. Needless to say that my team mate doesn’t turn the AC off.) The application uses a webcam to capture a short film when it detects changes in pre-defined areas of the frame. The code is written really badly so don’t take it as an example for anything.
  4. I uploaded some of my older projects to my GitHub. Feel free to fork them.
  5. Last and not least, I am involved in the Eclipse – Orion project currently mostly solving bugs but in the future I am planning to contribute features.

That’s it for now. Thank you for reading.

Boris

Saturday, October 20, 2012

Simply Explained: Multithreaded vs. Async as the hamburger restaurant

 

Hi,

Recently I have been lurking around the Node.js IRC channel and I saw that many people start using Node.js without realizing the main differences between the multithreaded approach and the asynchronous approach. This post will explain this difference through a simple (non computer oriented) example.

Suppose you are starting a hamburger fast food restaurant. In your restaurant you have only one dish: A hamburger. It is made by strictly following these steps:

  1. When a customer comes in the clerk takes his order (10 sec).
  2. The clerk puts a meatball on the grill and roasts it (30 sec).
  3. The clerk puts the buns in the oven and warms them (30 sec).
  4. The clerk assembles the hamburger and serves it to the customer (10 sec).

This recipe may not result in the best of hamburgers but it’s a start. In our example we have two critical resources (marked bold), the grill and the oven. When each of these devices is in use, no one else can use them. The clerk represents our thread and the recipe our code. The clerk follows the recipe to the letter in the same way our code is executing on a CPU.

--- What happens in a single threaded environment?

We have one clerk. When he puts the meatball in the grill he has to wait near the grill (doing nothing) while the meatball is fully roasted, then he takes the meatball out and can proceed to the oven. He puts the buns in the oven and again… waits… Until the buns are warm. The entire hamburger making process takes 80 seconds (10 + 30 + 30 + 10).

This is not bad for a single customer. In fact, the fastest any customer can be served a hamburger is 80 seconds so in the case when only one customer comes to the restaurant we are doing a perfect job serving him fast. The problem starts when there are several customers coming in. Consider the case when two customers are coming in together. The clerk has to follow the recipe for both customers, the first customer  gets a hamburger after 80 seconds while the second customer after 160 seconds (since the clerk can start making the second hamburger only after he is done with the first). Each additional customer would have to wait additional 80 seconds for any other customer standing before him in the line. The next diagram depicts serving one customer by a single clerk.

Drawing1

To solve the aforementioned problem we usually use a multithreaded model. In our example we simply add another clerk. In this case when two customers come into the restaurant each customer will be served by a different clerk. On first glance you might think that now each customer gets a hamburger after 80 seconds but this is not the case. Remember the critical resources? Both clerks take the order at the same time but only one of the clerks can use the grill (while the other waits for the grill to be available). The diagram for two clerks serving two customers looks like this:

Drawing2

The first customer still gets the hamburger after 80 seconds but the second customer gets his after 110 seconds, 50 seconds faster than in the single threaded case. From this example we can deduce two things:

  1. If more than two customers come we need more clerks. Clearly, there is a limit to the number of clerks we can have (and that limit is probably not very high). More clerks mean a crowded kitchen and at some point they start bumping into each other and even taking the wrong meatballs from the grill. This happens in software too. The more threads you have the more time you computer spends on switching between them and the more time you spend on debugging various race conditions .
  2. No matter how many clerks we hire we still have only one grill and one oven so at some point adding clerks will have no added value (in fact it will just cause more problems, see item 1). This happens in software too. So what do we do? We buy more grills… errr… RAM, CPUs, storage, bandwidth etc... Our handling capacity is directly linked to the hardware upgrades we are making.

--- Where does the asynchronous model comes in?

In the asynchronous model we make a small change to our kitchen equipment. Instead of hiring more clerks we have only the one clerk but he doesn’t wait at the critical resources. Instead, the grill and the oven have an input box and an output box each. When the clerk wants to use the grill, for example, he puts the meatball in the input box and goes away to do anything else. When the meatball is ready (after 30 seconds) it drops into the output box and the grill rings a bell. The clerk can pickup the meatball and continue the recipe. The diagram for serving two customers in this model looks like this:

Drawing3 (3)

Note the colors: White – the clerk is actually doing some work, Grey – the clerk is not doing anything, Yellow – the machines are doing the actual work.

We can see that the second customer still gets his hamburger after 110 seconds with only one clerk. We can also see that during these 110 seconds the clerk was occupied for 40 seconds and the rest of the time he was waiting for additional customers to come in (we didn’t have that slack time in the multithreaded model).

--- Wait! What? Making the algorithm async didn’t make it faster?

No! Async will not make your algorithm run faster, if anything it will make it run slower.

--- Then why people use Node.js which works in the async model?

The answer is in those Grey areas in the last diagram. Think about the second example (with the multithreaded model), if another customer comes in and wants to order ice-cream (which takes 10 seconds to make and requires the ice-cream machine). The customer would have to wait 120 seconds to get the ice-cream simply because there is no clerk to take his order. In the async model it would take 30 seconds because the single clerk can take the order 20 seconds into handling the first two customers.

The main point here is that you can’t take the clerk from the second example and put it to work as the clerk in the third example.  He simply would not understand why the machines are making all those sounds. In the software world, you can’t take code written for a multithreaded server, translate it to JavaScript and let it run on Node.js because it doesn’t work in the async model. Doing so means that you are back to the single threaded model of the first example which would certainly cause severe degradation in the response times of your server.

So what should you do to make your Node.js server as responsive as possible? (Not as fast as possible!)

  1. Use the async framework that Node.js provides for you and avoid using the sync methods.
  2. Make your code async by implementing your own async calls or using the calls of Node.js APIs.
  3. Make the non async code section as short as possible. Look at the ice-cream example. If we could reduce the “Take order” step into 5 steps that take 2 second each then the third customer could get the ice-cream faster (14 seconds).

Conclusion, Node.js with its async nature is a great tool but as with any other tool it requires understanding of how and when it should be used. Node.js is not a magic solution that makes any server run faster but a framework that allows you to build a server with improved response times for high number of users without considerably increasing the hardware capacity.

Thank you for reading. Please write your thoughts in the comments.

Boris.

Saturday, September 1, 2012

Async ForEach–The basic three levels of async loops

 

Hi All,

Today I am going to continue my series about async. This time I will talk about loops or more specifically about for each loops over a collection. I present async for each loops by through three examples where each example adds the asynchronous part to a different section of the loop.

Non-Async loop

This is the most basic type of loop and is the same loop that is provided by all sorts of frameworks (for example underscoreJS has a forEach method). In this case it is up to the user to make the code asynchronus and the loop function itself does not provide any facilities for it.

Non async loop
Code Snippet
  1. App.eachSync = function (collection, iterator, context) {
  2.     var token = {
  3.         canceled: false
  4.     };
  5.     for (var i = 0, l = collection.length; i < l; i++) {
  6.         iterator.call(context, collection[i], token);
  7.         if (token.canceled) {
  8.             break;
  9.         }
  10.  
  11.     }
  12. };
(for brevity I have omitted all the error handling code)
The loop function takes three arguments:

  • collection – The collection to iterate over.
  • iterator – The function that will be called for each iteration.
  • context – The pointer to “this” operator within the iterator function.

The iterator takes two arguments


  • The item of the current iteration
  • A token which can be used to transfer data between the iteration (in this example used for breaking out of the loop).

If the user wants to run something asynchronously then she must add the asynchronous code calls within the iterator method.


Async-Itarators


The next obvious step from the previous version is that the loop calls the iterator asynchronously. In the following code the loop runs synchronously but schedules each call to the iterator to run asynchronously. In this case, if the collection contains many items, the loop itself may make your thread stuck.


A loop that calls the iterator asynchronously
Code Snippet



  1. App.eachAsync1 = function (collection, iterator, context) {
  2.     var token = {
  3.         canceled: false
  4.     };
  5.     for (var i = 0, l = collection.length; i < l; i++) {
  6.         setTimeout(function (index) {
  7.             if (!token.canceled) {
  8.                 iterator.call(context, collection[index], token);
  9.             }
  10.         }, 0, i);
  11.     }
  12. };

Async-Loop


Another approach to the async loop problem would be to make the loop itself asynchronous. To do this we need to save the current iteration nu,ber within the scope and call an intermediate method iteratively. The intermediate method will schedule itself and call the iterator. The code would look something like this: 


Both the loop and the iterator calls are async
Code Snippet



  1. App.eachAsync2 = function (collection, iterator, context) {
  2.     var token = {
  3.         canceled: false
  4.     },
  5.     totalLength = collection.length,
  6.     innerFunction = function (index) {
  7.         setTimeout(function (innerIndex) {
  8.             if (!token.canceled) {
  9.                 iterator.call(context, collection[innerIndex], token);
  10.             }
  11.         }, 0, index);
  12.         index++;
  13.         if (index < totalLength) {
  14.             setTimeout(innerFunction, 0, index);
  15.         }
  16.     }
  17.  
  18.     innerFunction(0);
  19. };

At first, the innerFunction is called with the index 0. Next, it schedules the iterator call and itself until we reach the end of the collection. Note that it is possible to run the iterator directly and not schedule it.


Conclusion


As always with async we gain non blocking code on the expense of time. The code runs slower but in chunks, each chunk block the thread for a short while but allows other things to run in the thread in between the chunks. I ran the three loops on a code that prints the numbers 1 to 1000 to the console. As expected the performance degradated noticeably from one method to the next.


To quote Isaac Z. Schlueter (and I hope I am quoting right) – Async is something that you do not when you want to make things go faster but when you can’t make things go fast enough.


 


Thank you for reading,


Boris.

Friday, August 10, 2012

Async Mutex – Control flow in asynchronous functionality

 

Hi All,

In this post I am going to continue talking about asynchronous execution. Having read about async one might think that mutual exclusion and flow control structures are no longer required but this is not the case. For async execution you need similar structures with the async twist to them. In this post I am going to present a simple case where an async mutex is needed and used.

The scenario is quite simple. Through async calls you acquire some data structure, calculate some value and add it to the data structure (which affects all subsequent calculations). In my example I would like to calculate the first 7 elements of the Fibonacci series. For the sake of the argument I will assume that each operation must be performed asynchronously.

(Sorry C# guys, this code is in JavaScript).

 var data = [0, 1];

function retrieveData(callback) {
var x = data[data.length - 1];
var y = data[data.length - 2];
setTimeout(function() {
callback(x, y);
}, 0);
}

function calculateSum(x, y, callback) {
var sum = x + y;
setTimeout(function() {
callback(sum);
}, 0);
}

function addSum(sum, callback) {
data.push(sum);
setTimeout(callback, 0);
}

//Adds count fibonnacci elementd to data
function fibonacci(count) {
var i;
for ( i = 0; i < count; i++) {
retrieveData(function(x, y) {
calculateSum(x, y, function(sum) {
addSum(sum, function() {
console.log(data);
});
});
})
}
}

So what do we have here…  The next Fibonacci number is calculated through three async calls. The first call to retrieveData takes the last two numbers on the array and returns them through an async call to a callback. Think of it as retrieving a record from a remote repository or database. Next we have an async call that does the “logic”. It calculates the sum of the two numbers through the calculateSum function. The sum itself is then passed to a callback which adds it back to the original array, again through an async call. When I run this code on V8 I get the following result:


fibonacci(5);


5 x [0, 1, 1, 1, 1, 1, 1]


5 times the array you see above. But why? Lets look at the loop and try to trace what happens. The loop runs 5 times without interruptions (we are not multi-threaded). Each run it schedules the call to retrieveData on the event loop. Next, the function fibonacci exits and retrieveData is run five times in a row. At this point all five execution scenarios run on the same two values of the array – 0 and 1. Next, calculateSum and addSum run (each runs five times in a row). The end result is now clear. What we need here is to tell the event loop that it can run the functions asynchronously but only one execution may run within the loop. To this end I have implemented an async mutex:

   App.AsyncMutex = function() {
this.queue = [];
this.ownerToken = null;
this.token = 0;
};

App.AsyncMutex.prototype.enter = function(callback) {
if (this.ownerToken !== null) {
this.queue.push(callback);
} else {
this.token++;
this.ownerToken = this.token;
callback(this.token);
}
}

App.AsyncMutex.prototype.leave = function(token) {
if (this.ownerToken === null || this.ownerToken !== token) {
throw new Error("Owner token mismatch. Expected " + this.ownerToken + " but received " + token);
};

var callback;

if (this.queue.length > 0) {
this.token++;
this.ownerToken = this.token;
callback = this.queue.shift();
callback(this.token);
} else {
this.ownerToken = null;
}
}




The async mutex implementation is very simple. To protect some code simply wrap it with a call to mutex.enter. In the callback you receive a token which you use when you are done with the protected code and you call mutex.leave(token). The mutex assures that only one executer is within the protected code and adds all the others to a queue. Once the executer is done it runs the next executer in the list.


To use the mutex we need only a tiny code change:

      function fibonacci(count) {
var i, mux = new App.AsyncMutex();
for ( i = 0; i < count; i++) {
mux.enter(function(token) {
retrieveData(function(x, y) {
calculateSum(x, y, function(sum) {
addSum(sum, function() {
console.log(data);
mux.leave(token);
});
});
});
});
}
}
Run the code again and the result is:
[0, 1, 1] 

[0, 1, 1, 2]

[0, 1, 1, 2, 3]

[0, 1, 1, 2, 3, 5]

[0, 1, 1, 2, 3, 5, 8]

Exactly as we want it to be.

 

 

Thank you for reading.

Boris.

 


Friday, August 3, 2012

Simply Explained – Async

 

Hi All,

The blog is back after a somewhat long break. The break was due to two major changes in my life, the first is that I moved to a new apartment and had to deal with lots of new apartment stuff. The second is that I made a career change of 180 degrees (in terms of technological stack) and moved from Windows-.NET-WPF to Linux-Javascript-NodeJS. This means that you will probably see a change in content in future posts.

Anyway…

Modern applications are becoming more demanding in CPU resources. The CPU manufacturers used to deal with this demand by increasing the clock speed of the processor and by improving the internal architecture. In recent years (~10) this trend has changed to building multi-code processors and the dependence on multi threaded designs to speed up the application. In the world of .NET dealing with threads was usually done by experts as there are many pitfalls such as race conditions and mutual exclusion to be considered. In the last two versions of .NET framework this entire area was dramatically improved by introducing many new classes and the TPL framework and integration into the language in C# 5. You can read more about it in Pavel’s blog here and in this nice post that summarizes some of the key features of multi threaded libraries in .NET 4.

Parallel to the trend of multi-threaded design another trend evolved (or rather forced). The event loop design. It is not new by any means and in fact Windows is using this concept from the very first versions. The basic idea is that there is a queue of events and a process that de-queues the events one by one and handles them. The events may come from external sources like IO or internally from the process. In this post I will demonstrate how to implement a simple factorial method in the three paradigms:

Simple Code

In this paradigm we don’t do anything special and just write the code ignoring the implications of a long running method. The code would look something like that:

The simple way.
Code Snippet
  1. private BigInteger Factorial(int n)
  2. {
  3.   BigInteger result = 1;
  4.   for (int i = 2; i <= n; i++)
  5.   {
  6.     result = result * i;
  7.   }
  8.  
  9.   return result;
  10. }

 

This is how you would use it.
Code Snippet
  1. myLabel.Content = Factorial(50000).ToString();

The main problem here is that when this code is run, everything else doesn’t. The UI will appear to be stuck since it is not being refreshed and no interactions are possible with the process.

What you would probably do in this case is jump straight to the multi-threaded paradigm.

Multi-Threaded

In this paradigm we want to call our calculation on a background thread and once it is done let it report the result on the UI thread (updating the label). The code would be something like this:

Running the calculation in the background.
Code Snippet
  1. Thread thread = new Thread(new ParameterizedThreadStart(ThreadedFactorial));
  2. thread.IsBackground = true;
  3. thread.Start(50000);

 

The code that runs on the background thread.
Code Snippet
  1. public void ThreadedFactorial(object n)
  2. {
  3.   BigInteger result = Factorial(Convert.ToInt32(n));
  4.   this.Dispatcher.Invoke(new Action<string>(PrintResult), new object[] { result.ToString() });
  5. }

 

The result is reported on the UI thread.
Code Snippet
  1. public void PrintResult(string result)
  2. {
  3.   myLabel.Content = result;
  4. }

This implementation solves the issues with the simple implementation and is quite simple. The problems begin when there are resources shared between threads. You have to use locks, semaphores, and other thread synchronization constructs to make things work right. When the code gets larger you find yourself thinking “but what if the context switch is between the “if” statement and the assignment” and end up adding a lock on the entire method. We all been there. (And I will not start with how hard it is to debug concurrency issues.)

One more thing to take into account is that some environments don’t have threads (not .NET). Your browser running JavaScript probably does it on a single thread (the JavaScript not the browser).

The next paragraph shows how to write the same functionality on a single threaded system.

Event-Loop

The dispatcher is a “magical” object. It allows you not only to sync calls from different threads but also to schedule calls on the same thread (the UI thread in our case). Note the second parameter of the method Dispatcher.BeginInvoke is DispatcherPriority which allows you to add your method execution to a specific priority within the event loop of the UI thread. If you are able to break the calculation (or any other task) to small enough pieces then you can schedule those pieces one after the other on the event loop. This will insure that other events such as repaints and IO inputs are still handled by the process since they are interlaced within your calculation. The code may look something like this:

 

Calling the factorial method. Note the last parameter is the callback that should be run once the method is finished. Passing the callback as the last parameter is a common practice in this paradigm.
Code Snippet
  1. PartialFactorial(1, 1, 50000, new Action<string>(PrintResult));

 

The partial factorial method.
Code Snippet
  1. public void PartialFactorial(BigInteger result, int current, int end, Action<string> callback)
  2. {
  3.   BigInteger tempResult = result;
  4.   int i;
  5.   for (i = current; i <= current + 100 && i <= end; i++)
  6.   {
  7.     tempResult = tempResult * i;
  8.   }
  9.  
  10.   if (i <= end)
  11.   {
  12.     this.Dispatcher.BeginInvoke(new Action<BigInteger, int, int, Action<string>>(PartialFactorial), System.Windows.Threading.DispatcherPriority.Background, new object[] { tempResult, i, end, callback });
  13.   }
  14.   else
  15.   { callback(tempResult.ToString()); }
  16. }

There are a couple things to notice here. First, I perform only 100 calculations and then schedule the rest by invoking the same method with the new parameters. The scheduling is done with the Background priority which is after all the IO and rendering is already finished. If the calculation is complete I call the callback provided for me by the original caller. I could schedule the callback call as well.

The main advantage here is that you don’t need to deal with any threads as everything is done on a single thread. There are several disadvantages that I should mention:

  1. The code is a bit awkward. One can get used to this but it will always seem a bit clogged compared to the clean code of the multi-threaded approach. This is mainly due to the specific example I give here. Many processes can be broken into very small work units without making the flow so hard to understand.
  2. There is a loss of call stacks. Every time you schedule a method to the event-loop you are basically starting a new call stack. This is for better and for worse (no stack overflow as in recursion).
  3. The framework has to support this approach. In the example above one of the most time consuming methods was actually converting the BigInt to string (if you talk binary you know why it takes so long). There was no simple way to break this method to smaller bits so the UI is stuck while in this method. Frameworks like NodeJS which are built on the principle of event loop will make sure that no method blocks for that long.

 

I ran a small performance test on all three methods. And the event loop approach was a little slower than the other two but not by a factor. For a factorial of 50000 it took the first two methods ~7.5 seconds to complete while the event loop method took ~8 seconds. I tried different number of iterations within the PartialFactorial methods. Values above 300 made the UI cough a little bit. With a value of 100 there were no visible penalties.

Thank you for reading,

Boris

Friday, April 27, 2012

LongString: When things become unequal (comparing two long strings) – Part 2

 

Hi All,

In Part 1 of the LongString post I have shown how we can tackle the problem of running out of memory when manipulating long string using a class similar to the LongString. In Part 2 I will discuss a problem which arises from my implementation of LongString class when implementing the IEquatable<LongString> interface. As you may recall, the LongString is an aggregation of regular strings which are less or equal than a given value. The direct result of this definition is that two equal long strings may have a completely different representation (even if the given value is the same). I will discuss editing operations in Part 3 of this series but you may consider the case where for the given value of 3 the long string (“AB”,”CD”) is equal to the long string (“ABC”,”D”) but they have a different internal representation.

Implementing Equality

The first thing I did was to check how the framework string implements the equality method. You can see the code below but essentially since the strings are represented as a continues memory block the code simply goes over all the bytes and bitwise compares them (with a small optimization which I will not discuss).

  1. private unsafe static bool EqualsHelper(string strA, string strB)
  2.     {
  3.       int i = strA.Length;
  4.       if (i != strB.Length)
  5.       {
  6.         return false;
  7.       }
  8.       char* ptr = &strA.m_firstChar;
  9.       char* ptr2 = &strB.m_firstChar;
  10.       while (i >= 12)
  11.       {
  12.         if (*(long*)ptr != *(long*)ptr2 || *(long*)(ptr + (IntPtr)8 / 2) != *(long*)(ptr2 + (IntPtr)8 / 2) || *(long*)(ptr + (IntPtr)16 / 2) != *(long*)(ptr2 + (IntPtr)16 / 2))
  13.         {
  14.         IL_7D:
  15.           while (i > 0 && *(int*)ptr == *(int*)ptr2)
  16.           {
  17.             ptr += (IntPtr)4 / 2;
  18.             ptr2 += (IntPtr)4 / 2;
  19.             i -= 2;
  20.           }
  21.           return i <= 0;
  22.         }
  23.         ptr += (IntPtr)24 / 2;
  24.         ptr2 += (IntPtr)24 / 2;
  25.         i -= 12;
  26.       }
  27.       goto IL_7D;
  28.     }

I wanted to implement something similar and therefore my first implementation was like this:

  1. public bool EqualsVerySlow(LongString other)
  2. {
  3.   if (Length != other.Length) return false;
  4.   for (int i = 0; i < Length; i++)
  5.   {
  6.     if (this[i] != other[i])
  7.     { return false; }
  8.   }
  9.   return true;
  10. }

But then I discovered that I didn’t implement the indexer yet so I simply use the method from Part 1 which returns the internal indices (within the array of strings and in a string) GetItemIndexForCharIndex which resulted in this simple code:

  1. public char this[int index]
  2. {
  3.   get
  4.   {
  5.     Tuple<int, int> innerIndex = GetItemIndexForCharIndex(index);
  6.     return _items[innerIndex.Item1][innerIndex.Item2];
  7.   }
  8. }

The main problem here is that the implementation of GetItemIndexForCharIndex calculates the indices by going over the entire array of strings. This is OK for the indexer but it is not very good when used in a loop therefore it is time to implement the enumerator. A simple enumerator which “remembers” the current position and moves forward should be enough for this case.

  1. class LongStringEnumerator : IEnumerator<char>
  2. {
  3.   LongString _innerString;
  4.   int _currentIndex;
  5.   int _currentCharIndex;
  6.  
  7.   public LongStringEnumerator(LongString target)
  8.   {
  9.     _innerString = target;
  10.     _currentIndex = 0;
  11.     _currentCharIndex = 0;
  12.   }
  13.  
  14.   public char Current
  15.   {
  16.     get
  17.     { return _innerString._items[_currentIndex][_currentCharIndex]; }
  18.   }
  19.  
  20.   public void Dispose(){}
  21.  
  22.   object System.Collections.IEnumerator.Current
  23.   {
  24.     get
  25.     { return Current; }
  26.   }
  27.  
  28.   public bool MoveNext()
  29.   {
  30.     _currentCharIndex++;
  31.     if (_currentCharIndex >= _innerString._items[_currentIndex].Length)
  32.     {
  33.       _currentCharIndex = 0;
  34.       _currentIndex++;
  35.       if (_currentIndex >= _innerString._items.Count)
  36.       {
  37.         return false;
  38.       }
  39.     }
  40.     return true;
  41.   }
  42.  
  43.   public void Reset()
  44.   {
  45.     _currentCharIndex = 0;
  46.     _currentIndex = 0;
  47.   }
  48. }

Now I can fix the Equal code to use the enumerator instead of the indexer.

  1. public bool Equals(LongString other)
  2. {
  3.   if (Length != other.Length)
  4.     return false;
  5.  
  6.   LongStringEnumerator enumerator = new LongStringEnumerator(this);
  7.   LongStringEnumerator enumerator2 = new LongStringEnumerator(other);
  8.  
  9.   do
  10.   {
  11.     if (enumerator.Current != enumerator2.Current)
  12.       return false;
  13.     enumerator2.MoveNext();
  14.   }
  15.   while (enumerator.MoveNext());
  16.  
  17.   return true;
  18. }

One question remains… How bad is this solution? I have run some performance tests and it turns out that this equals method is about 30 to 50 times slower than the equals method of a regular string. In absolute terms, this equals method takes up to 20ms when comparing two 1,000,000 char long strings with only one different character (in a random position). The original method is so fast that it cannot be measured in ms Smile.

Conclusion

The main problem with the long string is going to be performance (in terms of time) compared to the regular string. We see that if an algorithm required mainly comparisons then the LongString is not a good choice. Still, remember that the LongString was designed for a very specific scenario in which it should perform better. In the next part I will implement some editing methods.

A small note about concurrency. Just out of curiosity I have implemented the slow method using the TPL:

  1. public bool EqualsSlow(LongString other)
  2. {
  3.   if (Length != other.Length)
  4.     return false;
  5.  
  6.   bool isEqual = true;
  7.   Parallel.For(0, Length, (i, loopState) =>
  8.   {
  9.     if (this[i] != other[i])
  10.     {
  11.       isEqual = false;
  12.       loopState.Break();
  13.     }
  14.  
  15.   });
  16.  
  17.   return isEqual;
  18. }

This was indeed faster in some cases over the very slow implementation but I cannot compete with the concurrency done in the CPU (which will be used to run the original Equals method of string). This implementation may give you some ideas on how to improve the Equals implementation of LongString (e.g. run a forward and a backward enumerators in separate tasks).

Thank you for reading,

Boris

Saturday, April 14, 2012

LongString: A long string for the small heap – Part 1

 

Hi All,

Today I am going to write about strings. I think that the most famous feature of the .Net string is its immutability and this property of the string is a source of comfort in many cases but also a source of pain in others. In this post I will describe a scenario where the immutability of a string may cause a serious degradation of performance and suggest a LongString class (with a partial implementation) which may somehow help in this case. To understand the problem in the scenario I am about to describe we have to throw in another .Net player – The large object heap (LOH for short). If you are unsure of how the LOH works you can read this Microsoft article but I will give a brief description of the relevant parts. The LOH is an additional heap used by the CLR to store large objects (and sometimes other object). How large? Currently (.Net 4) the magic number is 85,000 bytes and above. Every object allocated by the CLR which takes more than this number of byes will not be allocated on the regular heap but on the LOH. The problem is that the LOH has two undesirable properties:

  1. The objects on the LOH are always Generation 2 (i.e. you need a full GC run to remove them from memory).
  2. The memory in the LOH is not compacted because the CLR doesn’t want to copy such large objects around the memory. This may cause fragmentation in the LOH and sometimes to an OutOfMemory exception (even when there is plenty of memory remaining).

Now for the problematic scenario. Consider the case where you have a string which can be edited by the user and an internal mechanism that stores that string. Lets assume that for every change done by the user you must update some other data and display that data to the user immediately. In the general case this will not be a big problem because the strings will be pretty small. For every change you will be able to “edit” the internal string (by replacing it with a new instance) and the old instance will be garbage collected, probably as Generation 0. The problem begins when your string exceeds the magic number or actually half of the magic number since string characters are in Unicode. Then every generation of a new string will create a new instance on the LOH and the old instances will not be removed until full GC is run. If you are not careful enough and the string undergoes several manipulations then each such cycle may cause several objects to be allocated on the LOH and create fragmentation.

At this point some of you would stop me and say “Then use a StringBuilder, it is editable”. This is true, the StringBuilder is indeed a sort of editable version of the string class but internally it represents the data as a character array so the same problem remains (although not as severely as with String since the array will be reallocated much less).

I am going to propose an initial implementation of a LongString class which will be only rarely allocated on the LOH. The idea is simple, the LongString will keep an array of strings, each string is of length equal or less than a given number. It looks something like this:

  1. public class LongString
  2. {
  3.   private int _runSize;
  4.   private List<string> _items;
  5.   private int _length;
  6.  
  7.   public LongString(string data, int runSize)
  8.   {
  9.     _runSize = runSize;
  10.     _items = SplitString(data, runSize);
  11.     _length = GetTotalLength();
  12.   }
  13.  
  14.   private int GetTotalLength()
  15.   {
  16.     int count = 0;
  17.     foreach (string item in _items)
  18.     {
  19.       count += item.Length;
  20.     }
  21.  
  22.     return count;
  23.   }

Note that for brevity reasons I have omitted all the range/null checks from the code. Those checks are very important and mandatory in any code!

So, what are the pros and what are the cons of this implementation? The main advantage is that with a high enough runSize we can avoid the LOH altogether. The only part of the object that may grow too large is the items list but for run size of 100 we would be able to represent a string 100 times larger than we could with the regular string class without it being allocated on the LOH. I doubt that in any practical use you will have strings which are over 400Mb big (and you can always increase the runSize even more). The main disadvantage is that we don’t use the regular string class which means compatibility issue with any other existing code and some time and space penalties for all the extra calculations we do and data we store for the functionality of the regular string.

I will start by implementing the Substring method.

  1. private LongString(List<string> items, int runSize)
  2. {
  3.   _runSize = runSize;
  4.   _items = items;
  5.   _length = GetTotalLength();
  6. }
  7.  
  8.  
  9. public string Substring(int startIndex, int length)
  10. {
  11.   Tuple<int, int> innerStratIndex = GetItemIndexForCharIndex(startIndex);
  12.   Tuple<int, int> innerEndIndex = GetItemIndexForCharIndex(startIndex + length - 1);
  13.  
  14.  
  15.   if (innerStratIndex.Item1 == innerEndIndex.Item1) //This is the case when the substring is contained within one item
  16.   {
  17.     return _items[innerStratIndex.Item1].Substring(innerStratIndex.Item2, innerEndIndex.Item2 - innerStratIndex.Item2 + 1);
  18.   }
  19.  
  20.   StringBuilder result = new StringBuilder(); //Can be optimized by calculating the exact length
  21.  
  22.   result.Append(_items[innerStratIndex.Item1].Substring(innerStratIndex.Item2));
  23.  
  24.   for (int i = innerStratIndex.Item1 + 1; i < innerEndIndex.Item1; i++)
  25.   {
  26.     result.Append(_items[i]);
  27.   }
  28.  
  29.   result.Append(_items[innerEndIndex.Item1].Substring(0, innerEndIndex.Item2 + 1));
  30.   return result.ToString();
  31. }
  32.  
  33. public LongString LongSubstring(int startIndex, int length)
  34. {
  35.   Tuple<int, int> innerStratIndex = GetItemIndexForCharIndex(startIndex);
  36.   Tuple<int, int> innerEndIndex = GetItemIndexForCharIndex(startIndex + length - 1);
  37.  
  38.   List<string> result = new List<string>();
  39.   if (innerStratIndex.Item1 == innerEndIndex.Item1) //This is the case when the substring is contained within one item
  40.   {
  41.     result.Add(_items[innerStratIndex.Item1].Substring(innerStratIndex.Item2, innerEndIndex.Item2 - innerStratIndex.Item2 + 1));
  42.   }
  43.   else
  44.   {
  45.     result.Add(_items[innerStratIndex.Item1].Substring(innerStratIndex.Item2));
  46.  
  47.     for (int i = innerStratIndex.Item1 + 1; i < innerEndIndex.Item1; i++)
  48.     {
  49.       result.Add(_items[i]);
  50.     }
  51.  
  52.     result.Add(_items[innerEndIndex.Item1].Substring(0, innerEndIndex.Item2 + 1));
  53.   }
  54.   return new LongString(result, _runSize);
  55. }
  56.  
  57.  
  58. /// <summary>
  59. /// Returns a tuple where the first value is the index of the string that contains startIndex within the _items array
  60. /// and the second value is the offset within that item (string)
  61. /// </summary>
  62. /// <param name="index">The index of the character within the full string</param>
  63. /// <returns></returns>
  64. private Tuple<int, int> GetItemIndexForCharIndex(int index)
  65. {
  66.   int indexRemaining = index;
  67.   int itemIndex = 0;
  68.   while (indexRemaining >= 0)
  69.   {
  70.     if (_items[itemIndex].Length <= indexRemaining)
  71.     {
  72.       indexRemaining -= _items[itemIndex].Length;
  73.       itemIndex++;
  74.       continue;
  75.     }
  76.     return new Tuple<int, int>(itemIndex, indexRemaining);
  77.   }
  78.   return null;
  79. }

The code is quite simple. I have added a special private constructor that allows me to create a LongString from a list of strings. This allows me to speed up the internal cloning process without going through a regular string again. The substring operation is straight forward, cut away the start and the beginning of the target string from the saved string parts and copy all the parts in the middle as they are.

In the next post I will show more implementations of various methods of the LongString class.

Thank you for reading, Boris.

Saturday, March 24, 2012

My programming tidbits – Clear Code

 

Hi All,

I know I didn’t write a full post for quite a while. I was quite busy with work and other personal issues but I assure you that I am working on something interesting for the next full post. Meanwhile, I will start a series of short posts where I describe my programming tidbits. I will try to write something different than the regular “Write tests! Don’t write long methods! Don’t use the mouse!” that you can read anywhere online. My number 1 tip is this (and if you are going to skip all the other tips, please still consider this one):

Write code that is clear for all the team members.

I will start with an example. I was sent a piece of code for code review which is similar to this one:

  1. public string Foo()
  2. {
  3.   string fileName = Path.GetTempFileName();
  4.   Func<string> func = () => { return Path.Combine(Directory.GetCurrentDirectory(), fileName); };
  5.  
  6.   string fullFileName = func();
  7.   while (!File.Exists(fullFileName))
  8.   {
  9.     fileName = Path.GetTempFileName();
  10.     fullFileName = func();
  11.   }
  12.  
  13.   return fullFileName;
  14. }

This code generates an inexistent temporary file name in the current directory. I kindly asked to change the code to the following:

  1. public string Foo2()
  2. {
  3.   string fullFileName;
  4.   do
  5.   {
  6.     fullFileName = Path.Combine(Directory.GetCurrentDirectory(), Path.GetTempFileName());
  7.   }
  8.   while (!File.Exists(fullFileName));
  9.  
  10.   return fullFileName;
  11. }

After a short argument it was agreed that my code is slightly better in terms of readability (lets leave the performance considerations aside for this example). The main reason that initially this was not the code is because the engineer who wrote the code doesn’t like the do-while loop. To tell the truth, I don’t like do-while loop either. There is something not natural about its flow but in this case it saves a lambda expression, generation and allocation of a new class, and some readability.

When writing code please remember that not all your team members know everything you know. Some might not know lambda expressions well while others might not be familiar with the syntax of LINQ, some don’t like using “var” everywhere while others don’t want to clear unused “using”s because then they miss fundamental system namespaces by default. I am not saying you should not use these advanced framework features. I am saying you should not abuse these features (as in the above example). Think of the poor programmer that has to understand the entire logic of your algorithm and then he sees this method. He may not be best friends with lambdas but now he has to understand what this method does and why the hell did you not use a simple do-while loop instead.

My conclusion: Use advanced language features when they bring benefit to the code and not in order to make your code look cool.

 

Thanks for reading,

Boris.

 

P.S.

If the person who sent this code to me is reading this. Please don’t take this personally, this is just a good example of my tip.

Saturday, March 3, 2012

Implementing CacheDictionary–A Dictionary with a limited number of items that can be used as cache.

 

Hi All,

Today we consider the following scenario: There is a process that needs to generate many resources. The generation process takes a long time and each generated resource is taking a considerable amount of memory (you don’t want to store more than a few such resources in the memory at a time). After analysis it is evident that when the process runs it often generates the same resource over and over again such that while the total number of generated resources is quite large, all these resources are the same. The solution would be to generate each resource only once and store it in some dictionary. Each subsequent request to generate this resource would simply retrieve it from this dictionary. To solve the issue I have implemented a CacheDictionary which holds only a specified amount of items and has a versatile removal strategy (when trying to add an item over the allowed limit).

It is possible to solve this problem using WeakReferenced values using a regular dictionary. In this solution a regular dictionary holds the resources as WeakReferences and therefore the GC may collect them at any time. The main advantage here is that you allow the CLR to manage the memory itself but you lose the ability to tailor the memory allocation and freeing behavior to your exact needs. At some cases this solution is probably very good but for other cases we need the CacheDictionary.

The CacheDictionary

My implementation encapsulates (and not inherits from) a regular dictionary. I have implemented the IDictionary<TKey,TValue> interface delegating the methods to an internal instance of the dictionary. The constructor requires two arguments: The maximum number of items allowed in the CacheDictionary and the RemovalStrategy. The RemovalStrategy is a class which controls which key is removed from the CacheDictionary when the Add method is called but a full capacity is reached. This class implements the following interface:

  1. public interface ICacheDictionaryRemovalStrategy<TKey>
  2. {
  3.   /// <summary>
  4.   /// Initialize the strategy and pass the maximum number of allowed items
  5.   /// </summary>
  6.   /// <param name="maxSize">The maximum number of allowed items</param>
  7.   void Initialize(int maxSize);
  8.  
  9.   /// <summary>
  10.   /// Notify the strategy that a key was added to the base collection
  11.   /// </summary>
  12.   /// <param name="key">The key that was added</param>
  13.   void KeyAdded(TKey key);
  14.  
  15.   /// <summary>
  16.   /// Notify the strategy that a key was removed from the base collection
  17.   /// </summary>
  18.   /// <param name="key">The key that was removed</param>
  19.   void KeyRemoved(TKey key);
  20.  
  21.   /// <summary>
  22.   /// Notify the strategy that a key was accessed (retrieved by the user) in the base collection
  23.   /// </summary>
  24.   /// <param name="key">The key that was retrieved</param>
  25.   void KeyAccessed(TKey key);
  26.  
  27.   /// <summary>
  28.   /// Notify the strategy that the base collection was cleared
  29.   /// </summary>
  30.   void Clear();
  31.  
  32.   /// <summary>
  33.   /// Get the most appropriate key to remove, this is called when the base collection runs out of space
  34.   /// </summary>
  35.   /// <returns>The key that the base collection will remove</returns>
  36.   TKey GetKeyToRemove();
  37. }

Most of the methods are used to notify the strategy about changes in the underlying dictionary with two exceptions. Initialize method is called in the constructor of the CacheDictionary to allow the strategy run its initialization. The GetKeyToRemove is called when the user tries to add an item to the CacheDictionary but the items count is equal to the maxSize passed in the constructor. The CacheDictionary expects to get an existing key or an exception will be thrown. A noteworthy method is KeyAccessed. In my implementation this method is called whenever the user retrieves some value by using a key (see next section for details).

Tracking Access

As mentioned above, I define access to the CacheDictionary as retrieving some value by its key. The KeyAccessed method is therefore called in the following locations:

  • TryGetValue method if the underlying call to TryGetValue was successful.
  • The indexer property.
  • The Current property of the custom enumerator.

The third point is worth elaborating upon. I am implementing the IDictionary interface and therefore I must provide a way to enumerate over the Key-Value pairs inside the dictionary. If I would just delegate these calls to the regular enumerator then I would miss an access done by the enumerator itself. For this reason I have added a wrapper class (CacheDictionaryEnumerator) which wraps the calls to the underlying enumerator but calls KeyAccessed method when needed.

The full implementation is therefore:

  1. public class CacheDictionary<TKey, TValue> : IDictionary<TKey, TValue>
  2. {
  3.   private Dictionary<TKey, TValue> _data;
  4.   private int _maxSize;
  5.   private ICacheDictionaryRemovalStrategy<TKey> _removalStrategy;
  6.  
  7.   public CacheDictionary(int maxSize, ICacheDictionaryRemovalStrategy<TKey> removalStrategy)
  8.   {
  9.     if (removalStrategy == null)
  10.       throw new ArgumentNullException("removalStrategy");
  11.     if (maxSize == 0)
  12.       throw new ArgumentException("maxSize must be a positive integer value");
  13.     _maxSize = maxSize;
  14.     _removalStrategy = removalStrategy;
  15.     _data = new Dictionary<TKey, TValue>();
  16.  
  17.     _removalStrategy.Initialize(maxSize);
  18.   }
  19.  
  20.   #region IDictionaty Implementation
  21.  
  22.   public void Add(TKey key, TValue value)
  23.   {
  24.     if (_data.ContainsKey(key))
  25.       _data.Add(key, value); //I want to throw the same exception as the internal dictionary for this case.
  26.  
  27.     if (_data.Count == _maxSize)
  28.     {
  29.       TKey keyToRemove = _removalStrategy.GetKeyToRemove();
  30.       if (_data.ContainsKey(keyToRemove))
  31.         _data.Remove(keyToRemove);
  32.       else
  33.         throw new Exception(String.Format("Could not find a valid key to remove from cache, key = {0}", key == null ? "null" : key.ToString()));
  34.     }
  35.     _data.Add(key, value);
  36.     _removalStrategy.KeyAdded(key);
  37.   }
  38.  
  39.   public bool ContainsKey(TKey key)
  40.   {
  41.     return _data.ContainsKey(key);
  42.   }
  43.  
  44.   public ICollection<TKey> Keys
  45.   {
  46.     get {return _data.Keys;}
  47.   }
  48.  
  49.   public bool Remove(TKey key)
  50.   {
  51.     bool result = _data.Remove(key);
  52.     if (result)
  53.       _removalStrategy.KeyRemoved(key);
  54.     return result;
  55.   }
  56.  
  57.   public bool TryGetValue(TKey key, out TValue value)
  58.   {
  59.     bool result = _data.TryGetValue(key, out value);
  60.     if (result)
  61.       _removalStrategy.KeyAccessed(key);
  62.     return result;
  63.   }
  64.  
  65.   public ICollection<TValue> Values
  66.   {
  67.     get { return _data.Values; }
  68.   }
  69.  
  70.   public TValue this[TKey key]
  71.   {
  72.     get
  73.     {
  74.       TValue result = _data[key];
  75.       _removalStrategy.KeyAccessed(key);
  76.       return result;
  77.     }
  78.     set
  79.     {
  80.       _data[key] = value;
  81.       _removalStrategy.KeyAccessed(key);
  82.     }
  83.   }
  84.  
  85.   public void Add(KeyValuePair<TKey, TValue> item)
  86.   {
  87.     Add(item.Key, item.Value);
  88.   }
  89.  
  90.   public void Clear()
  91.   {
  92.     _data.Clear();
  93.     _removalStrategy.Clear();
  94.   }
  95.  
  96.   public bool Contains(KeyValuePair<TKey, TValue> item)
  97.   {
  98.     return _data.Contains(item);
  99.   }
  100.  
  101.   public void CopyTo(KeyValuePair<TKey, TValue>[] array, int arrayIndex)
  102.   {
  103.     ((IDictionary<TKey, TValue>)_data).CopyTo(array, arrayIndex);
  104.   }
  105.  
  106.   public int Count
  107.   {
  108.     get { return _data.Count; }
  109.   }
  110.  
  111.   public bool IsReadOnly
  112.   {
  113.     get { return ((IDictionary<TKey, TValue>)_data).IsReadOnly; }
  114.   }
  115.  
  116.   public bool Remove(KeyValuePair<TKey, TValue> item)
  117.   {
  118.     bool result = ((IDictionary<TKey, TValue>)_data).Remove(item);
  119.     if (result)
  120.       _removalStrategy.KeyRemoved(item.Key);
  121.     return result;
  122.   }
  123.  
  124.   public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator()
  125.   {
  126.     return new CacheDictionaryEnumerator(_data.GetEnumerator(), _removalStrategy);
  127.   }
  128.  
  129.   System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
  130.   {
  131.     return new CacheDictionaryEnumerator(_data.GetEnumerator(), _removalStrategy);
  132.   }
  133.   #endregion
  134.  
  135.   public class CacheDictionaryEnumerator : IEnumerator<KeyValuePair<TKey, TValue>>
  136.   {
  137.     private IEnumerator<KeyValuePair<TKey, TValue>> _innerEnumerator;
  138.     private ICacheDictionaryRemovalStrategy<TKey> _removalStrategy;
  139.  
  140.     internal CacheDictionaryEnumerator(IEnumerator<KeyValuePair<TKey, TValue>> innerEnumerator, ICacheDictionaryRemovalStrategy<TKey> removalStrategy)
  141.     {
  142.       if (innerEnumerator == null)
  143.         throw new ArgumentNullException("innerEnumerator");
  144.       if (removalStrategy == null)
  145.         throw new ArgumentNullException("removalStrategy");
  146.  
  147.       _innerEnumerator = innerEnumerator;
  148.       _removalStrategy = removalStrategy;
  149.     }
  150.  
  151.     public KeyValuePair<TKey, TValue> Current
  152.     {
  153.       get
  154.       {
  155.         KeyValuePair<TKey, TValue> result = _innerEnumerator.Current;
  156.         _removalStrategy.KeyAccessed(result.Key);
  157.         return result;
  158.       }
  159.     }
  160.  
  161.     public void Dispose()
  162.     {
  163.       _innerEnumerator.Dispose();
  164.       _innerEnumerator = null;
  165.     }
  166.  
  167.     object System.Collections.IEnumerator.Current
  168.     {
  169.       get {return this.Current;}
  170.     }
  171.  
  172.     public bool MoveNext()
  173.     {
  174.       return _innerEnumerator.MoveNext();
  175.     }
  176.  
  177.     public void Reset()
  178.     {
  179.       _innerEnumerator.Reset();
  180.     }
  181.   }
  182. }

Strategy Implementations

For the solution to be useable out of the box (or out of the downloaded zip file) I have added three implementation of the ICacheDictionaryRemovalStrategy<TKey> interface. The three implementations are:

  1. EmptyRemovalStrategy<TKey> – Removes the first item in it’s internal collection. Does not track access in any way.
  2. MruRemovalStrategy<TKey> – Removes the most recently used (most accessed) item in the CacheDictionary.
  3. LruRemovalStrategy<TKey> – Removes the least recently used (least accessed) item in the CacheDictionary.

These three implementation should allow you a basic use of the CacheDictionary right after download.

As usual you can download all the needed data from my skydrive (link below). The 7zip contains a small demo program on how to use the provided class. If you find bugs/improvements to my implementations I would be more than happy if you share those in the comments.

Thank you for reading,

Boris.