A couple months ago, I started work on a project of optimizing a ray tracing program that was in an awful state. It started off at taking about 180,000ms to complete. It had no optimization at all, and would bounce 7 times, even if a surface was not really reflective.

I originally planned to do things like multi-threading, assumption of pixels, pixel skipping, replacing of more expensive mathematic functions with faster ones, replacing maths libraries and even using the graphics card to render.
Below are the steps that I took to reduce that render time down to ~5 seconds.

step 0: added omp parallel optimization to main loop in main thread

step 1: lowered resolution, reasonable difference

step 2: changed RENDERABLES to SPHERE, to avoid using the virtual call. minimal change (113721ms)

current benchmark: (single line updates) 149097ms (cores @ 90%, no shadows, resolution 512 )) *THIS RESOLUTION WAS USED FOR ALMOST WHOLE OPTIMIZATION PROCESS)

step 3: set OMP parallelization to primary loop, instead of parralel loop. this removed ability to hit ESC though. CPU cores can finish early and are not handed out new tasks yet. (113535ms)

step 4: set primary loop to split main task into 8 (for 8 cores) and set OMP parallelization to split those 8 tasks. This should Load Balance the tasks. currently stops rendering after the first 8/16 lines though (copy paste issue)
with full picture rendered: (>11000ms)
may be having issues with constantly creating and deleting new threads

This did not work properly because I am not using this correctly. Each of the threads are set to work up until a barrier (their end point in the loop) and what I want is for them to start getting work from the incomplete threads. there needs to be a task pool for them to work in.

step 5: properly set up dynamic task pool, (101518ms) & (103121ms)

step 6: set ray bounces to 1 instead of 4 (~79000ms)

step 7: project settings optimization (71766ms)

step 8: recursive limit set to 0 (36350ms)

step 9: removed ambient from final scene calc ( 36434ms)

step 10: removed reflection calculation from scene (36533ms)

step 11: set up scene octree (15991ms)

step 12: set octree to depth 5, max 10 (612ms)

step 13: shadows on, full resolution (12350ms)

step 14: max depth 10, max spheres 50 (5959ms)

step 15: set progressive to 20 (instead of 1) ( 1 is 8365ms, 20 is 6387ms)

step 16: skip every 2nd row of pixels (looks awful) (3317ms)

The final render test completed at about 5 seconds. The main optimization techniques that were useful were multi-threading, removing un-needed steps, like reflection bounces in a scene wi=thout reflective surfaces, setting up an octree and a dynamic task pool.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s