My initial implementation of the BIH is a bit dissapointing, half the speed of my SAH kdtree at best (primary ray only rendering time). But the build time _is_ dramatically faster (that was the whole point wasn't it
At this point, most of my questions revolve around the build routine. toxie, feel free to enlighten us if possible.
* What are the termination criteria for the recursive build? All my scenes seem to have at least a handfull of leaves that go all the way down to max-depth, even though they only have like 2 or 3 prims in them (which I assume are not seperable - haven't debugged graphically yet). How do you detect these cases efficiently? The limit case is N copies of the same triangle - right now the build routine I have can't do anything good with that case. Should the code just try to backtrack these useless "splits"?
* What do you set the clip value to when the corresponding side is empty ? Right now I use +/- Inf which seems to work. Is that robust during traversal though? (NaN safe traversal round 2)
* Not storing empty leaves. This assumes your traversal is perfectly robust (see point above). I assume you just trick the children pointer when the empty node ends up on the left side? ie: store index-3 so you end up indexing into the right place without storing one of the children?
* Storing compressed leaves (2x4 bytes vs. 3x4 bytes for inner nodes). Same questions as above really. When there is a mismatch of types (a leaf being sibling of a node for example) doesn't this pose a problem for indexing during traversal ? Or do you only compress leaf nodes when they won't impact their sibling ?
* Any suggestions for correctly multithreading the on-demand build? Did you use OpenMP ? How close can you stay to 100% cpu usage when all procs are competing for a write lock to such a central data structure?
And a more open question:
The BKD paper is indeed the same thing as BIH with full intervals (mentioned in the BIH paper). They do an SAH build though and get dynamic speed from incremental updates. Hardware considerations aside, what are you guys' thoughts on the two methods ? Surely doing SAH for these schemes is going to be much faster than the clipping kdtree compilers we've done. If it can be optimized enough, could SAH become worthwhile ? or is the heuristic close enough to optimal that the difference will be minor ?
Well thats all for now. Chances are I'll be able to answer some of these myself after spending a bit more time thinking about them, but I figured we could start a discussion anyway.
These papers have opened up some great new possibilities =)
