Movies:
480p (12.1 MB), 720p (18.2 MB).
Download source code here.
To build:
wget http://riskybacon.com/classes/cs513/assign4.2/assign4.2.tar.bz2
tar jxf assign4.2.tar.bz2
cd assign4.2/ps
make
./ps
Press 'q' to quit. Click and drag the mouse to move the camera.
This was built under Mac OS X 10.5.8 and requires Cg and a Cg-capable graphics card to run the shaders. Machine: MacBook Pro 17", 2.4 GHz, 4 GB of RAM, GeForce 8600M GT with 256 MB of memory.
For this assignment, the particle system from the previous assignment has been changed to run on the GPU. Four source textures are sent to the fragment shader for the update pass: initial position, initial velocity, current position and current velocity. The fragment shader outputs two colors, the new position and the new velocity, to multiple render targets, so that both textures are written in a single pass.
There was a lot of trickiness regarding OpenGL state. I had never really appreciated just how much state is kept. Reading the OpenGL spec cleared up a lot of problems, as it is fairly clear about what state is kept with things like framebuffer objects and what will cause that state to become invalid.
The final stumbling point for me was that the update pass changed the texture bound to unit 0 without my realizing it, and that texture was then used when rendering the particle system. This usually resulted in seeing nothing. At the end of the update pass, it was critical to restore as much state as possible. Alternatively, the render pass could have been more thorough about setting the state it requires.
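One way to make a pass restore what it touches is a small scope guard that records a piece of state on entry and puts it back on exit. The sketch below uses a plain struct as a stand-in for the per-unit texture binding so that it runs without a GL context. `FakeGLState`, `bindTexture`, `ScopedTextureBinding` and `runUpdatePass` are hypothetical names; in real code the save and restore would go through `glGetIntegerv(GL_TEXTURE_BINDING_2D, ...)` and `glBindTexture`.

```cpp
#include <array>
#include <cstddef>

// Stand-in for the texture binding that OpenGL tracks for each texture unit.
struct FakeGLState {
    std::array<unsigned, 4> boundTexture{};  // texture id bound to each unit
};

void bindTexture(FakeGLState& gl, std::size_t unit, unsigned texture) {
    gl.boundTexture[unit] = texture;
}

// Binds a texture to one unit and rebinds the previous one when the scope ends.
class ScopedTextureBinding {
public:
    ScopedTextureBinding(FakeGLState& gl, std::size_t unit, unsigned texture)
        : gl_(gl), unit_(unit), previous_(gl.boundTexture[unit]) {
        bindTexture(gl_, unit_, texture);
    }
    ~ScopedTextureBinding() { bindTexture(gl_, unit_, previous_); }

    ScopedTextureBinding(const ScopedTextureBinding&) = delete;
    ScopedTextureBinding& operator=(const ScopedTextureBinding&) = delete;

private:
    FakeGLState& gl_;
    std::size_t unit_;
    unsigned previous_;
};

// Hypothetical update pass: uses its own texture on unit 0 and returns the
// id it saw bound while the pass was running.
unsigned runUpdatePass(FakeGLState& gl, unsigned positionTexture) {
    ScopedTextureBinding guard(gl, 0, positionTexture);
    return gl.boundTexture[0];
}  // guard goes out of scope here and restores the previous binding
```

With this pattern the render pass finds unit 0 as it left it, even if the update pass returns early.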
In the first step, I made the particles look much nicer. In the previous assignment, the particles had black boxes around them, so blending was clearly not working correctly. To fix this, depth writes were disabled (the depth mask was turned off) while rendering the particle system. This was a very simple fix and made the particle system look much better.
The next step was to back out vertex buffer objects (VBOs) and compare timings between plain vertex arrays and VBOs. Both tests used 5000 particles:
Feature added | Frames per second |
vertex arrays | 61.25 |
vertex buffer objects | 61.5 |
This made no measurable difference. I suspect this is because the vertex data in the VBO is still being uploaded to the GPU every frame.
The next step was to move the simulation from the CPU to the GPU. First, I altered the CPU-based program to use a 4th-order Runge-Kutta method to find the positions of the particles, and timed it.
Next, the incredibly simple step of moving the simulation to the GPU was undertaken. It turns out that this was very, very hard for me. I'm not too happy with the resulting simulation. It looks different than the one on the CPU and doesn't feel quite right. It was nice to see a very large speedup.
Particles | CPU (FPS) | GPU (FPS) | Speedup |
4096 | 85.2014 | 397.667 | 4.7 |
16384 | 21.8503 | 228.354 | 10.5 |
65536 | 5.47826 | 79.6116 | 14.5 |
262144 | 1.36898 | 14.6308 | 10.7 |
589824 | 0.609053 | 5.7222 | 9.4 |
1048576 | 0.345382 | 3.07792 | 8.9 |
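The speedup column is the GPU frame rate divided by the CPU frame rate at the same particle count. A small sketch for recomputing it from the two measured columns (the function names are hypothetical):

```cpp
#include <cmath>

// Speedup of the GPU run over the CPU run, given frames per second for each.
double speedup(double cpuFps, double gpuFps) {
    return gpuFps / cpuFps;
}

// Round to one decimal place, matching the precision used in the table.
double roundToTenth(double value) {
    return std::round(value * 10.0) / 10.0;
}
```

For the first row, `roundToTenth(speedup(85.2014, 397.667))` gives 4.7.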
The next step was to add an opaque object to the scene and make sure that it rendered properly. As you can see from the screenshot and movie, it worked out.
I'm not very happy with the quality of the simulation, so I will be continuing to work on it.