2008-02-29

those moments

i very much enjoyed work yesterday. my colleague and i improved the runtime of a tool used mostly in-house by a factor of 100. fun. much fun. it now runs so fast the other guys will think it doesn't work.

i have read a book or two about optimizations and it is really interesting how much of applies to the real world. main conclusion (it is so obvious it really hurts)


never think about optimization until you know whats going on!


the program we put on steroids was know to be slow. it worked quite well for relatively small inputs but goes nuts when hitting bigger (10mb) chunks of 3D data. i did a huge improvement some time ago by using trivial caching. i picked up the cache-idea because the profiler i use told me exactly where 90% of the work was done: looking up strings. which was surprising since i suspected the algorithm itself but, profiler says "looking up strings" so i optimized "looking up strings". bingo.

the reason to look again on the tool in question was an actual bug where the algorithm was wrong a little bit. the bugreport mentioned a 150mb file as the input .. hehe .. i couldn't even RUN this in reasonable time to check out the bugfix. so my colleague and i discussed the whole algorithm in detail and pros and cons and "lets write it from scratch" and so on. next day i actually was closing the bug (since it fixed the problem which could be shown for much smaller files) and wanted to take a screenshot of what the profiler would tell me, so i would have a starting point when coming back to the speed issue later. i ran the profiler and ...

wow, what the heck? sscanf sucks big time!

as it turned out the actual problem was reading in LOTS of numbers (indices, vertices, texture coordinates) from a huge string. the program spent more than 40 minutes on just reading the numbers from the "big" 150mb file and was not even finished when we got tired and stopped it.

so, i have actually never done this before but the compiler said "sscanf is bad" so be it. i had even seen the sscanf popping up in the profiler when i did the first speedup but dropped further investigation by telling myself "well, we have to parse the numbers, face it".

but that slow? we started to write our own little StringToNumber function. as dan bernstein pointed out at 3.3:

The essence of user interfaces is parsing: converting an unstructured sequence of commands, in a format usually determined more by psychology than by solid engineering, into structured data.


and

The parser often has bugs: it fails to handle some inputs according to the documented interface. The quoter often has bugs: it produces outputs that do not have the right meaning.



the key to success now was the fact that we already do a preprocessing step to transform bad formated (valid, but bad for parsing) input into good (technically not valid anymore, but good to parse) one. but sscanf is prepared to be ultra flexible and ready for bad input, it throws away unneeded whitespaces, endlines etc. but all this flexibility comes at a price: speed. and since we eliminated the bad, bad formated input already we could skip exactly that: the error checking.

the resulting code looked somewhat similar like this but after the change our own function did not even stick its nose out of the crowd. in fact it ran so fast, we first really thought it was not really working. we took a "medium" sized input file (10mb) and timed the improvements: 4 minutes and 43 seconds down to 2 seconds. crazy.

so, conclusions:


  1. do not draw any assumptions without backing them up

  2. use a profiler (i use oprofile at home but at work i am a vtune fan boy

  3. do not drop the idea of replacing a standard function by your own if the profiler says so :)



btw, the function that the profiler complains about now is called 'fprintf("%f")' ... hehe.


happy coding!