A post on
comp.graphics.rendering.raytracing, pointing to this
page,
caught my attention.
The purpose is to, more or less, pack a barebone raytracer and scene
description within 100 LOCs.
And it struck me that the C++ version wasn't spending those LOCs very
wisely.
After an initial re-implementation that was much faster, i figured out that eventually a real sphere flake could be generated, in place, and the traversal could done iteratively.
And this is where the current version stands.
Somewhere on an Opteron 146 (2ghz) running debian64, with g++-4120050515...# time ./sphereflake100 8 >pix.ppm
5380840 spheres,claiming 369.473 MB.
real 0m7.642s
user 0m7.329s
sys 0m0.295s
Packing that many features (scene generation, shadows,
anti-aliasing, image output) within 100 lines while remaining efficient
requires to pay close attention to details.
It's a two prong attack.
The goal is to generate the classic SPD sphere flake, a
recursive figure where each level spawns 9, nicely distributed, new
spheres.
It can be described in a lot of ways, but a sphere BVH is what comes to
mind first.
An immediate corrolary, is that such a tree would be complete if you
put one sphere per node.
I have previously tinkered with BVH and the most efficient way to
handle them on current architectures is to turn such a tree into an
array[1] because it helps tremendously
with memory access
patterns and opens the way for a cheap iterative traversal without any
explicit stack.
Turning a binary tree into an array+skip requires 2 passes in the
general case to resolve forward references.
That's where the complete nature of this particular tree helps, because
forward references can be computed as we build the tree.
Following that track, the tree can be directly written in situ in its
final form (via placement new in the current implementation).
The machinery to handle rotation and so on was a bit troublesome to
implement in a compact way, that's why we're left with some
transcendentals; it happens that an efficient SSE libm implementation
makes that point a non issue, at least on x86-64 linux :)
With each node holding 2 spheres, one bounding the sub-tree and the other being the visual content, and a skip pointer (or more precisely the distance between the current node and the next tree part to inspect in case of a miss), the traversal becomes:
template<bool shadow> static void intersect(const ray_t &ray,hit_t &hit){
const node_t*p=pool;
while(p < end) {
if(p->bound.intersect(ray)>=hit.t) /*missed bound*/
p+=p->diff; /*skip subtree*/
else{
const double t=p->leaf.intersect(ray);
if(t < hit.t) { /*if hit, update, then break for shadows*/
hit.t=t;
if(shadow) break;
hit.n=p->leaf.get_normal(ray.o+ray.d*t);
}
++p; /*next!*/
}
}
}
After that point, there's nothing really fancy going on; primary &
shadow rays are shot, gathered & anti-aliased through a 2x2 rotated
grid and finally ouput to a monochrome PPM.