I ran into an interesting performance problem today. I was working on an algorithm to process text data files. My first stab at the design went through my data file starting from the first character and on, and output a string built based on some rules applied to each character. I ended up with a O(n2) performance, which really didn’t seem that good for the kind of processing I was doing.
So I redesigned the algorithm to build strings backward and ended up with an O(n) performance, which is much better.
Happy with the design, I set out to implement it in Java (the rest of the system is in Java). In order to build my strings backwards, I used StringBuffer.insert(0, c), which inserted my character at the beginning of my output string. I ran my test on a 5Mb file and it took over 20 min. Actually I don’t really exactly know how long because I aborted it. What?!
As it turns out, StringBuffer.insert(0, c) is terribly inefficient. Instead of calling this method, I changed it to build my string reversed using StringBuffer.append(c), then reversing it using StringBuffer.reverse() right before returning. Now this same test runs in 719 milliseconds… Ridiculous difference!

Need proof? Here’s some code:

int size = 100000;

StringBuffer str1 = new StringBuffer();
long t1 = System.currentTimeMillis();
for(int i=0; i<size; i++) {
  char c = (char)('a'+(i % 26));
  str1.insert(0, c);
}
System.out.println("StringBuffer.insert(): "
    +(System.currentTimeMillis()-t1)+" ms");

StringBuffer str2 = new StringBuffer();
long t2 = System.currentTimeMillis();
for(int i=0; i<size; i++) {
  char c = (char)('a'+(i % 26));
  str2.append(c);
}
str2 = str2.reverse();
System.out.println("StringBuffer.append(): "
    +(System.currentTimeMillis()-t2)+" ms");

System.out.println("Equal? "+str1.toString().equals(str2.toString()));

The results speak clearly:

insert() append() + reverse()
size=10000 19ms 3ms
size=100000 1642ms 16ms
size=1000000 162542ms 41ms
size=10000000 RIP 85ms

I didn’t have the heart to run the insert() method with size=10000000, that’s just cruel. But looking at the little data I collected and interpolating, it looks like it would have taken about 4 hours on my machine. Feel free to test it yourself… But the point is proven. The insert() implementation is exponential and the append()+reverse() implementation is linear. Ridiculous…