Friday, February 13, 2009

Optimizations

When I set out to optimize my raytracer, gprof turned up something unexpected: most of the time was being spent in vec3f constructors. I had expected most of it to go to checking for intersections. It was time to turn on gcc optimizations.

The result: the vec3f constructors stop dominating the profile as soon as either -O or -O1 is passed, although gprof still shows more constructor calls than expected until -O3. The best performance came at -O3.
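To illustrate why constructors can dominate an unoptimized profile, here is a minimal sketch. The post does not show the real vec3f class, so this vec3f, dot, and sphere_discriminant are hypothetical stand-ins. Every vector operator returns a new vec3f by value, and without optimization gcc does not inline these small functions, so each temporary shows up as a real constructor call in gprof. That is a likely explanation for the profile above, not a confirmed one.

```cpp
// Hypothetical sketch: the real vec3f from the raytracer is not shown in the post.
struct vec3f {
    float x, y, z;
    vec3f() : x(0.0f), y(0.0f), z(0.0f) {}
    vec3f(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
};

// Returns a new vec3f by value, so every subtraction constructs a temporary.
inline vec3f operator-(const vec3f& a, const vec3f& b) {
    return vec3f(a.x - b.x, a.y - b.y, a.z - b.z);
}

inline float dot(const vec3f& a, const vec3f& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Discriminant of a ray-sphere intersection test, assuming dir is normalized.
// A negative value means the ray misses the sphere.
inline float sphere_discriminant(const vec3f& origin, const vec3f& dir,
                                 const vec3f& center, float radius) {
    vec3f oc = origin - center;  // one constructor call per test when not inlined
    float b = dot(oc, dir);
    float c = dot(oc, oc) - radius * radius;
    return b * b - c;
}
```

With a 512x512 image and several objects, a function like sphere_discriminant runs at least once per pixel per sphere, so even one temporary per call adds up to a large number of constructor calls when nothing is inlined.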

This raytracer is being compiled on a MacBook Pro 17", 2.4GHz Intel Core 2 Duo with 4GB of RAM. gcc:

gcc -v
Using built-in specs.
Target: i686-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5465~16/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --with-arch=apple --with-tune=generic --host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5465)

Image size: 512x512. Scene: 1 plane, 2 spheres, 1 triangle. Average run times for 5 runs each:

Test  Average run time (s)  Optimization  Debugging and profiling
1     8.1604598             None          yes
2     2.0264776             -O            yes
3     2.036941              -O1           yes
4     2.2078138             -O2           yes
5     1.2405112             -O3           yes
6     1.2331454             -O4           yes
7     1.1143894             -O3           no
8     1.1208828             -O6           no
9     1.1019208             -O4           no

Notes (numbered to match the tests above):

  1. gprof shows that most of the time is spent in vec3f constructors.

  2. gprof shows that most of the time is spent in sphere and plane intersections,
    but I still see more calls than expected to vec3f constructors.

  3. -O1 generates a different binary than -O but runs at approximately
    the same speed. Still more vec3f constructor calls than expected.

  4. -O2 yields no real change from -O1.

  5. gprof yields the expected results, and there is a big speedup over -O1 and -O2.

  6. -O4 seems to be the same as -O3.

  7. Gained roughly 1/10th of a second by dropping debugging and profiling.

  8. -O6 shows no gain over -O3.

  9. -O4 without debugging and profiling is about 1/100th of a second faster than -O3, but that difference is too small to conclude that anything actually changed. There could simply have been less load on the system.
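The averages and comparisons above can be reproduced with a few small helpers. This is only a sketch: the names mean, sample_stddev, and speedup are my own, and the post reports only the averages, not the individual run times, so the spread of the five runs cannot be recomputed from the table.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Arithmetic mean of the measured run times, in seconds.
// Assumes at least one sample.
inline double mean(const std::vector<double>& samples) {
    double sum = 0.0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        sum += samples[i];
    }
    return sum / samples.size();
}

// Sample standard deviation (divides by n - 1). Assumes at least two samples.
// A difference between two averages that is smaller than this spread is
// hard to distinguish from system noise.
inline double sample_stddev(const std::vector<double>& samples) {
    const double m = mean(samples);
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const double d = samples[i] - m;
        sum_sq += d * d;
    }
    return std::sqrt(sum_sq / (samples.size() - 1));
}

// How many times faster the optimized build is than the baseline.
inline double speedup(double baseline_seconds, double optimized_seconds) {
    return baseline_seconds / optimized_seconds;
}
```

Applied to the table, speedup(8.1604598, 2.0264776) is about 4.0x for -O, and speedup(8.1604598, 1.2405112) is about 6.6x for -O3 with debugging and profiling still enabled. Recording the standard deviation of the five runs alongside each average would make it easier to judge whether a gap like the 1/100th of a second in note 9 is real.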
