1. Introduction
  2. Modus operandi
  3. Files
Forenote: feel free to mail the idiot responsible for that mess, namely Thierry Berger-Perrin <tbptbp@gmail.com>.
Introduction

sphere flake 9, 48427561 spheresA post on comp.graphics.rendering.raytracing, pointing to this page, caught my attention.
The purpose is to, more or less, pack a barebone raytracer and scene description within 100 LOCs.
And it struck me that the C++ version wasn't spending those LOCs very wisely.

After an initial re-implementation that was much faster, i figured out that eventually a real sphere flake could be generated, in place, and the traversal could done iteratively.

And this is where the current version stands.

Somewhere on an Opteron 146 (2ghz) running debian64, with g++-4120050515...
# time ./sphereflake100 8 >pix.ppm
5380840 spheres,claiming 369.473 MB.

real 0m7.642s
user 0m7.329s
sys 0m0.295s
Current features:
Modus operandi

Packing that many features (scene generation, shadows, anti-aliasing, image output) within 100 lines while remaining efficient requires to pay close attention to details.
It's a two prong attack.

The crime scene.

The goal is to generate the classic SPD sphere flake, a recursive figure where each level spawns 9, nicely distributed, new spheres.
It can be described in a lot of ways, but a sphere BVH is what comes to mind first.
An immediate corrolary, is that such a tree would be complete if you put one sphere per node.

I have previously tinkered with BVH and the most efficient way to handle them on current architectures is to turn such a tree into an array[1] because it helps tremendously with memory access patterns and opens the way for a cheap iterative traversal without any explicit stack.
Turning a binary tree into an array+skip requires 2 passes in the general case to resolve forward references.
That's where the complete nature of this particular tree helps, because forward references can be computed as we build the tree.
Following that track, the tree can be directly written in situ in its final form (via placement new in the current implementation).
The machinery to handle rotation and so on was a bit troublesome to implement in a compact way, that's why we're left with some transcendentals; it happens that an efficient SSE libm implementation makes that point a non issue, at least on x86-64 linux :)

Shooting rays.

With each node holding 2 spheres, one bounding the sub-tree  and the other being the visual content, and a skip pointer (or more precisely the distance between the current node and the next tree part to inspect in case of a miss), the traversal becomes:

template<bool shadow> static void intersect(const ray_t &ray,hit_t &hit){
const node_t*p=pool;
while(p < end) {
if(p->bound.intersect(ray)>=hit.t) /*missed bound*/
p+=p->diff; /*skip subtree*/
else{
const double t=p->leaf.intersect(ray);
if(t < hit.t) { /*if hit, update, then break for shadows*/
hit.t=t;
if(shadow) break;
hit.n=p->leaf.get_normal(ray.o+ray.d*t);
}
++p; /*next!*/
}
}
}
After that point, there's nothing really fancy going on; primary & shadow rays are shot, gathered & anti-aliased through a 2x2 rotated grid and finally ouput to a monochrome PPM.
References
Files

Last modified: 2005/05/20