The result: the vec3f constructor calls largely disappear once -O or -O1 is passed, and the best performance comes at -O3.
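The post does not show its vec3f class, so the sketch below is hypothetical. It illustrates why constructor calls can dominate an unoptimized profile. Each value-returning operator, such as `origin - center`, constructs a temporary vec3f. At -O0, gcc emits each of these as a real out-of-line call, and gprof counts every one. With optimization enabled, gcc can inline the constructor and operator bodies, so those calls can drop out of the profile. `vec3f`'s members and the `intersect_sphere` helper (a standard quadratic ray/sphere test) are assumptions, not the author's code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type; the post's real vec3f is not shown.
struct vec3f {
    float x, y, z;
    vec3f(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
    // Returns a new vec3f by value: one constructor call per use.
    vec3f operator-(const vec3f& o) const { return vec3f(x - o.x, y - o.y, z - o.z); }
    float dot(const vec3f& o) const { return x * o.x + y * o.y + z * o.z; }
};

// Hypothetical ray/sphere test: returns the nearest positive hit distance
// along `dir`, or -1 if the ray misses the sphere.
float intersect_sphere(const vec3f& origin, const vec3f& dir,
                       const vec3f& center, float radius) {
    vec3f oc = origin - center;                 // temporary vec3f
    float a = dir.dot(dir);
    assert(a > 0.0f && "ray direction must be non-zero");
    float b = 2.0f * oc.dot(dir);
    float c = oc.dot(oc) - radius * radius;
    float disc = b * b - 4.0f * a * c;          // discriminant of a*t^2 + b*t + c
    if (disc < 0.0f) return -1.0f;              // no real roots: miss
    float s = std::sqrt(disc);
    float t = (-b - s) / (2.0f * a);            // nearer root first
    if (t > 0.0f) return t;
    t = (-b + s) / (2.0f * a);                  // origin may be inside the sphere
    return t > 0.0f ? t : -1.0f;
}
```

For example, a ray from the origin along +z hits a unit sphere centered at (0, 0, 5) at distance 4.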
The raytracer was compiled on a 17" MacBook Pro (2.4 GHz Intel Core 2 Duo, 4 GB of RAM) with this gcc:
gcc -v
Using built-in specs.
Target: i686-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5465~16/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --with-arch=apple --with-tune=generic --host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5465)
Scene: a 512x512 image with one plane, two spheres, and one triangle. Average run times in seconds over 5 runs each:
Run | Average run time (s) | Optimization | Debugging and profiling
---|---|---|---
1 | 8.1604598 | None | yes
2 | 2.0264776 | -O | yes
3 | 2.036941 | -O1 | yes
4 | 2.2078138 | -O2 | yes
5 | 1.2405112 | -O3 | yes
6 | 1.2331454 | -O4 | yes
7 | 1.1143894 | -O3 | no
8 | 1.1208828 | -O6 | no
9 | 1.1019208 | -O4 | no
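The post does not say how the five runs per configuration were timed. One way to reproduce the averaging is sketched below. It assumes in-process wall-clock timing with C++11 `std::chrono`, which the gcc 4.0.1 above predates. `average_run_time` is a hypothetical helper.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `work` `runs` times and returns the mean wall-clock time in seconds.
// Assumption: in-process timing with std::chrono; the post's method is unknown.
double average_run_time(const std::function<void()>& work, int runs = 5) {
    assert(runs > 0);
    using Clock = std::chrono::steady_clock;    // monotonic, unaffected by clock changes
    double total = 0.0;
    for (int i = 0; i < runs; ++i) {
        auto start = Clock::now();
        work();
        // Implicit conversion to floating-point seconds.
        std::chrono::duration<double> elapsed = Clock::now() - start;
        total += elapsed.count();
    }
    return total / runs;
}
```

Usage would look like `average_run_time([] { render_scene(); })`, where `render_scene` is a hypothetical stand-in for the raytracer's entry point. Because this measures wall-clock time, other load on the machine shows up in the result.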
Notes:
- 1. gprof shows that most of the time is spent in vec3f constructors.
- 2. gprof shows that most of the time is spent in sphere and plane intersections, but there are still more calls to vec3f constructors than expected.
- 3. This generates a different binary than the previous test but runs at approximately the same speed. There are still more vec3f constructor calls than expected.
- 4. -O2 yields no real improvement over -O1; it is actually slightly slower.
- 5. gprof shows the expected results, and there is a big speedup over -O1 and -O2.
- 6. Runs about the same as -O3.
- 7. Dropping debugging and profiling saved about a tenth of a second compared with row 5.
- 8. -O6 shows no gain over -O3. That is expected, since gcc enables the same optimizations for any level above 3 as for -O3.
- 9. -O4 without debugging and profiling is about 1/100th of a second faster than -O3. That difference is too small to show that anything changed; it may simply reflect less load on the system during those runs.
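Note 9's judgment can be made somewhat more concrete if the individual timings are kept instead of only the averages. The sketch below uses hypothetical helpers `run_stats` and `clearly_different`. It computes the mean and sample standard deviation of a set of runs. It treats two configurations as different only when their means are more than about two standard errors apart, which is a rough heuristic and not a formal significance test. The post reports only averages, so this cannot be applied to the table as given.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct RunStats {
    double mean;
    double stddev;  // sample standard deviation (n - 1 denominator)
};

// Mean and sample standard deviation of run times in seconds.
RunStats run_stats(const std::vector<double>& times) {
    assert(times.size() >= 2);
    double sum = 0.0;
    for (double t : times) sum += t;
    double mean = sum / times.size();
    double sq = 0.0;
    for (double t : times) sq += (t - mean) * (t - mean);
    return {mean, std::sqrt(sq / (times.size() - 1))};
}

// Heuristic: the means differ by more than k combined standard errors,
// assuming both configurations were timed with n runs each.
bool clearly_different(const RunStats& a, const RunStats& b,
                       std::size_t n, double k = 2.0) {
    double se = std::sqrt((a.stddev * a.stddev + b.stddev * b.stddev) / n);
    return std::fabs(a.mean - b.mean) > k * se;
}
```

With five runs per configuration, as in the table, pass `n = 5`. A 0.01 s gap counts as "clearly different" only if the run-to-run spread is well under a hundredth of a second.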