Having been challenged with sorting a linked list, and trying to brush up on my C++ at the same time, I set out this morning to best the challenge. I decided I would use the STL list type so that I could avoid the boilerplate of writing a linked list class and also have a built-in sort() algorithm against which to test the vailidity and performance of my version.
So, the first question is which sorting algorithm to use? Quicksort is the canonical answer from arrays, because it requires constant auxillary space in which to operate (typically just one element). However, quicksort also relies upon constant time access to any element in the array in order to maintain its O(N log(N)) performance. So, since quicksort won't work, the other obvious solution from arrays is mergesort. When operating on arrays, mergesort requires O(N) additional space to perform the merging step. (Technically this isn't true, but without the additional space mergesort takes longer to run.)
The key fact about mergesort on linked lists is that it no longer requires O(N) additional space. In fact, it can be done with zero additional space! This is because of the advantage linked lists have over arrays in insertion speed. Arbitrary insertion in a linked list (assuming you already have a pointer to the location you're inserting to) is constant time, whereas arrays require O(N) time to shift back all the items following the insertion by one space. Now instead of interleaving two sorted arrays into a "merged" array, we can simply insert items from the second sorted list into the first sorted list where they belong.
Typically, mergesort works by splitting the list in half and then recursing. Once you get down to zero or one element lists, you can guarantee that you are sorted and can start going back up the call stack. I want to say something like "this is impractical for linked lists", but the reality is that you have to walk through the entire list log(N) times, one way or another. I wanted to write a version that didn't recurse, so I did my mergesort backwards. On the first pass, we simply sort pairs of items. After that, each pair is sorted, so we can merge pairs. Next step is to merge adjacent 4-element pairs, and so on and so forth. Another way of looking at it is that instead of splitting the list into 2 parts, then 4, then 8, and so on, I'm splitting it into N/2 parts, then N/4, then N/8, and so on. This lets me re-use the right boundary pointer from the previous merge as the left boundary pointer for the next merge.
My first pass was as follows:
#include <list>
using namespace std;
template <typename T>
void mySort(list<T> &l)
{
int i, stepsize = 1, listsize = l.size();
typename list<T>::iterator left, middle, right, end=l.end();
for (stepsize = 1; stepsize < listsize; stepsize <<= 1) {
right = l.begin();
while ( right != end ) {
left = middle = right;
for (i = 0; i < stepsize && middle != end; i++) middle++;
right = middle;
for (i = 0; i < stepsize && right != end; i++) right++;
while ( left != middle && middle != right ) {
while ( left != middle && *left <= *middle )
left++;
while ( middle != right && *middle < *left ) {
l.insert(left, *middle);
l.erase(middle++);
}
}
}
}
} A few notes about the above code: I use a local variable to store a reference to l.end() for the entire function. This saves me about half a second on a 10,000 element list versus calling l.end() each time. I would do the same thing for l.begin(), but because I might change the first element of the list I can't. Luckily, I only need to call l.begin() log(N) times.
I would use the advance() function to replace the for loops stepping through the list, but advance doesn't check against l.end(), and (at least my version of) the STL list is circular, in that ++l.end() == l.begin()! So, I have to write my own.
Finally, I wasn't sure if l.erase(middle++) was going to work like I wanted. lists are much more flexible when it comes to keeping iterators valid through insertions and deletions, but the one thing my STL book was sure to point out was that pointers to deleted elements would become invalidated. An hour or so after I wrote this and saw that it did indeed work, I found that on the next page it pointed out this exact line in another example and said that it was ok, but the seemingly similar l.erase(middle); middle++ was not. The difference is that in the correct version, middle already points to the next element before the previous element is deleted.
So, I wrote some boilerplate testing code and compared the result of this function against list.sort and found that it worked just fine, but it took approximately 3 times as long to run. I loaded up my profiler and tried to see where all the time was going, but all I could see was that a lot of time was spent freeing memory. Taking a glance at how the STL had done it, I noticed they were making use of a function named splice(). I looked at it and realized that it did exactly what I had meant when I wrote the insert / erase pair. Once I replaced that line with:
l.splice(left, l, middle++); My version was only 10% slower than the STL's version. Not half bad; in two hours I had come very close to the reference implementation. I spent some more time trying to understand the STL's version, and found some very cool tricks with pointers (and an array of 64 lists, which empty only take up 8 bytes apiece), but felt that my version was more immediately readable. Of course, I had just written mine, y'all can be the judge of readability.
Over breakfast I had the idea that instead of using a set stepsize I could simply increment the middle and right pointers as long as the previous element was less than it, effectively taking advantage of any natural sorting the list might have. Unfortunately, with all the additional variables and assignments (plus a need to move onto other things) I was unable to get it to perform better than my original implementation. As it stands here, it runs about 15% slower than the STL's sort, or about 2% slower than my original.
template <typename T>
void mySort2(list<T> &l)
{
typename list<T>::iterator left, middle, right, end=l.end(),
lastleft=end, temp;
while ( lastleft != l.begin() ) {
lastleft = right = l.begin();
while ( right != end ) {
left = temp = middle = right;
while ( ++middle != end && *middle >= *temp) { temp = middle; }
if ( middle != end ) {
temp = right = middle;
while ( ++right != end && *right >= *temp) { temp = right; }
lastleft = left;
} else if ( left == lastleft ) {
break;
} else { // middle = end, so merge [lastleft, left) w/ [left, end)
right = end;
middle = left;
left = lastleft;
}
while ( left != middle && middle != right) {
while ( left != middle && *left <= *middle )
left ++;
while ( middle != right && *middle < *left )
l.splice(left, l, middle++);
}
}
}
} As a final note, replacing the merging section with temporary lists and calls to list.splice and list.merge doesn't seem to improve performance by an appreciable degree. What they would offer is an easier way to encapsulate a variety of comparison operators instead of just using less than. Oh well, enough duplication of library code for now.
