Sunday, October 25, 2009

Rendering to a depth texture doesn't work

Annoying problem: I want to capture the depth for each fragment and use it for some computation. So, I set up the FBO with a depth texture attached to the depth attachment point. Then I bind the FBO and render the scene. Now the moment of truth: draw a quad that is textured with the depth info.

The quad is completely white. Ok, maybe I'm setting up the texture wrong. Check some examples, tweak things a bit, check some of my old code, everything looks good for another try.

Now the depth texture looks like it's full of random stuff. It never gets updated, but the actual scene is updating quite nicely.

What am I doing wrong? *bangs head against wall for an hour or two*

I just needed to call:

glClearColor(1.0f, 1.0f, 1.0f, 1.0f);

before rendering the depth. That's all I was missing.

Friday, October 23, 2009

Assignment 4.3

Download, compile, and run

This worked on both Ubuntu 9.04 and OS X 10.5.8

tar jxf assign4.3.tar.bz2
cd assign4.3/cpusort
cd ../peel

For both binaries, 'q' will quit the program. Click and drag the mouse to
change the camera position.


In all cases, 4096 particles were used, screen size was 1280x720:
8-layer depth peel   Particle system   Depth sort on CPU   Frames per second
Yes                  GPU               No                  63
No                   GPU               No                  348
No                   CPU               Yes                 377

So, something has happened that has slowed down the GPU version. Previously I was seeing large speedups; for 4096 particles it was at least 3 times faster. Running traces on the code shows a lot of time being spent in the driver. At first I suspected that the branching in the point sprite fragment shader for the depth peeling (which also happens in the non-depth-peeling version) might have been slowing things down, but the actual impact was about 14 frames per second, which doesn't explain the large discrepancy in speed compared to the previous assignment.

Screen shots

Depth sorted on the CPU:

The depth peeled version looks darker than the non-depth peeled and CPU sorted versions:

In all cases, notice how the hotter particles, the redder particles, are closer to the camera and occlude the white particles which are farther away.

Part one of the assignment is to depth sort the particles on the CPU and render them in the correct order. This is fairly simple since C++ has a sort() function and all that is required of the programmer is to implement operator<. I computed the depth a single time using a dot product to project the vector from the camera to the particle onto the vector from the camera to the depth plane.
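A minimal sketch of that approach (the names here are illustrative, not the assignment's actual code): compute each particle's depth once by projecting the camera-to-particle vector onto the view direction, then let std::sort and operator< do the rest.

```cpp
#include <algorithm>
#include <vector>

struct Vec3 { float x, y, z; };

static float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

struct Particle {
    Vec3 pos;
    float depth;  // cached view-space depth, computed once per frame

    bool operator<(const Particle& other) const {
        // Largest depth first => back-to-front render order
        return depth > other.depth;
    }
};

// Project the camera-to-particle vector onto the (normalized) view direction.
void computeDepths(std::vector<Particle>& ps, const Vec3& camera, const Vec3& viewDir) {
    for (std::size_t i = 0; i < ps.size(); ++i) {
        Vec3 toParticle = { ps[i].pos.x - camera.x,
                            ps[i].pos.y - camera.y,
                            ps[i].pos.z - camera.z };
        ps[i].depth = dot(toParticle, viewDir);
    }
}
```

Sorting with this operator< puts the farthest particles first, which is what back-to-front blending needs; flipping the comparison reproduces the backwards order from the first attempt.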

Then that list was sorted and the particles were rendered in the new order:

Notice how the "hot" particles, which are closer to the camera, are always in front of the white particles. There are occasional roundoff problems where the render order comes out wrong. Here's an example of the render order being backwards:

This was the result of the first attempt. The depth order was smallest to largest, which means that closer particles were rendered first. Since I'm using back-to-front rendering, this didn't work out so well.

Tuesday, October 20, 2009


I'd like a plane in my next assignment, but they're so boring. So I added a grid:

This would be a bit of work if the approach were to draw lines on the plane, and I also want those lines to be lit. Instead, the plane is shaded with Blinn-Phong as usual, and the grid is created by taking the sine of the x and z positions of each fragment. If the sine falls between -0.05 and 0.05, the shaded color is lerped with black to create the grid, i.e.:

// Blinn-Phong shading comes before this; result ends up in "color"
float4 black = float4(0.0f, 0.0f, 0.0f, 1.0f);

// Renamed from min/max to avoid shadowing the Cg builtins
float gridMin = -0.05f;
float gridMax = 0.05f;

float x = sin(position.x * 0.33f);
float z = sin(position.z * 0.33f);

if((x >= gridMin && x <= gridMax) || (z >= gridMin && z <= gridMax))
    color = lerp(color, black, 0.2f);

Assignment #4.2 new movie

After a lot of work, the particle system is working correctly now. I've verified it against my CPU implementation: the two stay within 0.001 of each other at every step for 10000 steps, after which I stopped checking.
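The verification amounts to stepping both implementations in lockstep and comparing every component against a tolerance, roughly:

```cpp
#include <cmath>
#include <cstddef>

// Returns true if every corresponding component of the CPU and GPU state
// arrays differs by less than tol. The names are illustrative only.
bool withinTolerance(const float* cpu, const float* gpu, std::size_t n, float tol) {
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fabs(cpu[i] - gpu[i]) >= tol)
            return false;
    }
    return true;
}
```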

Here's a movie (52MB) of the particle system.

Tuesday, October 6, 2009

Assignment #4.2

Movies: 480p (12.1 MB) 720p (18.2MB).

Download source code here

To build:
tar jxf assign4.2.tar.bz2
cd assign4.2/ps

Press 'q' to quit, click and drag the mouse to move the camera.

This was built under Mac OS X 10.5.8 and requires Cg and a Cg-capable card to run the shaders. Machine: MacBook Pro 17", 2.4GHz, 4GB of RAM, GeForce 8600M GT with 256MB of VRAM.

For this assignment, the particle system from the previous assignment has been changed to run on the GPU. Four source textures are sent to the fragment shader for the update pass: initial position, initial velocity, current position, and current velocity. The fragment shader outputs two colors to multiple render targets: the new position and the new velocity. Multiple render targets are used so that both textures are written in a single pass.
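The data flow of that update pass, sketched on the CPU with a plain Euler step (illustrative only; the real update uses a higher-order integrator and also reads the initial position/velocity textures): current position and velocity come in, and the new position and velocity go out to two separate outputs in one pass, mirroring the two render targets.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// One "update pass": reads current position/velocity and writes new position
// and new velocity to two separate outputs, just as the fragment shader
// writes to two render targets in a single pass.
void updatePass(const std::vector<Vec3>& curPos, const std::vector<Vec3>& curVel,
                std::vector<Vec3>& newPos, std::vector<Vec3>& newVel,
                const Vec3& accel, float dt) {
    for (std::size_t i = 0; i < curPos.size(); ++i) {
        newVel[i].x = curVel[i].x + accel.x * dt;
        newVel[i].y = curVel[i].y + accel.y * dt;
        newVel[i].z = curVel[i].z + accel.z * dt;
        newPos[i].x = curPos[i].x + newVel[i].x * dt;
        newPos[i].y = curPos[i].y + newVel[i].y * dt;
        newPos[i].z = curPos[i].z + newVel[i].z * dt;
    }
}
```

On the GPU the output buffers become the input textures of the next frame, so the two texture pairs are simply ping-ponged between passes.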

There was a lot of trickiness regarding OpenGL state. I never really appreciated just how much state is kept. Looking at the OpenGL spec cleared up a lot of problems, as it is fairly clear about what state is kept with things like framebuffer objects and what will cause their state to become invalid.

The final stumbling point for me was changing the bound texture for unit 0 in the update pass without realizing it, so that texture was then used for rendering the particle system. This usually resulted in seeing nothing. At the end of the update pass, it was critical to restore as much state as possible. Alternatively, the render pass could have been more thorough about setting its required state.

As a first step, I made the particles look nicer. In the previous assignment the particles had black boxes around them; the blending was clearly wrong. To fix this, the depth mask was turned off when rendering the particle system. This was a very simple fix and made the particle system look much better.

The next step was to back out vertex buffer objects and compare the timings between plain vertex arrays and VBOs:

The following two tests used 5000 particles:
Feature added            Frames per second
vertex arrays            61.25
vertex buffer objects    61.5

This didn't make any difference. I suspect that this is because the VBO is being pushed out to the GPU for every frame.

The next step was to move the simulation from the CPU to the GPU. First, I altered the CPU-based program to use a 4th-order Runge-Kutta method to find the positions of the particles, and timed it.

Next, the incredibly simple step of moving the simulation to the GPU was undertaken. It turns out that this was very, very hard for me. I'm not too happy with the resulting simulation. It looks different than the one on the CPU and doesn't feel quite right. It was nice to see a very large speedup.

Particles   CPU (FPS)   GPU (FPS)   Speedup
4096        85.2014     397.667     4.7
16384       21.8503     228.354     10.5
65536       5.47826     79.6116     14.5
262144      1.36898     14.6308     12.1
589824      0.609053    5.7222      9.4
1048576     0.345382    3.07792     8.9

The next step was to add an opaque object to the scene and make sure that it rendered properly. As you can see from the screen shot and movie, it worked out.

I'm not very happy with the quality of the simulation, so I will be continuing to work on it.

Monday, October 5, 2009

Assignment #4

Simple particle system:

Movie of particle system

Program built on a MacBook Pro running OS X Leopard with an NVidia card:

tar jxf assign4-jbowles.tar.bz2
cd assign4-jbowles/ps

You can move the view around by holding down the left button and dragging. Shaders can be reloaded by pressing 'r'. You can quit with 'Q' or escape.

This particle system models particles being launched out of a gravity well at just over escape velocity. The particles make a couple of passes past the gravity well and then escape.

Once a particle has reached a reasonable distance away, it is reinitialized and the process starts over.
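The relevant physics, for an inverse-square well of strength GM (the constants here are illustrative, not the program's actual values): a particle launched at just over v_esc = sqrt(2·GM/r) has positive specific energy and will eventually escape.

```cpp
#include <cmath>

// Escape velocity at distance r from a well of strength GM (illustrative units).
double escapeVelocity(double GM, double r) {
    return std::sqrt(2.0 * GM / r);
}

// Specific orbital energy; positive means the particle will escape the well.
double specificEnergy(double GM, double r, double speed) {
    return 0.5 * speed * speed - GM / r;
}
```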

The particles are modeled as point sprites and passed to the GPU in a vertex buffer object.

The vertex shader does the usual: it transforms the model coordinates into world space and leaves the texture coordinates alone. It also calculates a luminance value based on the distance of the particle from the gravity well; this value is passed to the pipeline on TEXCOORD1.

The fragment shader uses an exponential function to determine the alpha component of the fragment. The input to the exponential is the distance of the fragment from the center of the point sprite.

The color (not the alpha) of the fragment is determined by the distance of the point sprite from the gravity well. As the sprite gets farther from the well, it "cools off" and the red component goes to zero.
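Both fragment computations can be sketched as plain functions (the falloff shape and constants here are guesses, not the shader's actual values):

```cpp
#include <algorithm>
#include <cmath>

// Alpha falls off exponentially with distance from the sprite center.
// 'sharpness' is an illustrative constant.
float spriteAlpha(float distFromCenter, float sharpness) {
    return std::exp(-sharpness * distFromCenter * distFromCenter);
}

// Heat factor used to tint the sprite red: 1 at the gravity well,
// cooling off to 0 once the sprite is maxDist away or farther.
float heat(float distFromWell, float maxDist) {
    float t = std::min(distFromWell / maxDist, 1.0f);
    return 1.0f - t;
}
```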

It is interesting to see how ordering has an effect on the alpha blending of the particles. Since these particles are not ordered, the blending is not always correct. I'm looking forward to putting the particles onto the GPU and sorting them there.

On my MacBook Pro, this ran at about 60 FPS when using 1, 20, 2000, or 100000 particles. I suspect that this is because I'm using synchronized swap buffers. When I remove this constraint, FPS goes up to 600 with 200 particles. If I use 20000 particles, the frame rate is about 100 without synchronized buffers.