bih / bkd implementations

Practical and theoretical implementation discussion.

bih / bkd implementations

Postby fpsunflower » 23 Jun 2006, 07:23

Starting a new thread for those of us trying out these new hybrid schemes.


My initial implementation of the BIH is a bit dissapointing, half the speed of my SAH kdtree at best (primary ray only rendering time). But the build time _is_ dramatically faster (that was the whole point wasn't it ;)). Only spent an evening on it so far though so plenty of room for improvement.


At this point, most of my questions revolve around the build routine. toxie, feel free to enlighten us if possible.

* What are the termination criteria for the recursive build? All my scenes seem to have at least a handfull of leaves that go all the way down to max-depth, even though they only have like 2 or 3 prims in them (which I assume are not seperable - haven't debugged graphically yet). How do you detect these cases efficiently? The limit case is N copies of the same triangle - right now the build routine I have can't do anything good with that case. Should the code just try to backtrack these useless "splits"?

* What do you set the clip value to when the corresponding side is empty ? Right now I use +/- Inf which seems to work. Is that robust during traversal though? (NaN safe traversal round 2)

* Not storing empty leaves. This assumes your traversal is perfectly robust (see point above). I assume you just trick the children pointer when the empty node ends up on the left side? ie: store index-3 so you end up indexing into the right place without storing one of the children?

* Storing compressed leaves (2x4 bytes vs. 3x4 bytes for inner nodes). Same questions as above really. When there is a mismatch of types (a leaf being sibling of a node for example) doesn't this pose a problem for indexing during traversal ? Or do you only compress leaf nodes when they won't impact their sibling ?

* Any suggestions for correctly multithreading the on-demand build? Did you use OpenMP ? How close can you stay to 100% cpu usage when all procs are competing for a write lock to such a central data structure?


And a more open question:

The BKD paper is indeed the same thing as BIH with full intervals (mentioned in the BIH paper). They do an SAH build though and get dynamic speed from incremental updates. Hardware considerations aside, what are you guys' thoughts on the two methods ? Surely doing SAH for these schemes is going to be much faster than the clipping kdtree compilers we've done. If it can be optimized enough, could SAH become worthwhile ? or is the heuristic close enough to optimal that the difference will be minor ?


Well thats all for now. Chances are I'll be able to answer some of these myself after spending a bit more time thinking about them, but I figured we could start a discussion anyway.

These papers have opened up some great new possibilities =)
fpsunflower
 
Posts: 193

Postby Phantom » 23 Jun 2006, 09:23

Are we allowed to discuss the BIH paper? I thought it was private and secret.
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby tbp » 23 Jun 2006, 10:12

Please stay calm and no harm will happen.
I've plonked that thread under the carpet and blessed fpsunflower's account until NDA issues are resolved.

I really don't like to have to do that. Toxie, what's the current status?
User avatar
tbp
Overlord
 
Posts: 1445
Location: France

Postby toxie » 23 Jun 2006, 11:27

From monday on it's okay to talk about it in public. ;)
But unfortunately it's still not okay to put the paper on a website. :(

Actually i don't know why the other guys are putting their stuff out
(http://myweb.hinet.net/home7/hks/Papers ... Papers.htm)
but we had to sign a paper that basically says "your not allowed to put that paper
on your website for a year, only EG is allowed to".
Spreading it via mail to "colleagues, etc." is allowed though.

Nevertheless we (Alex and me) just discussed it and the paper will be available on our site
(http://graphics.uni-ulm.de) from monday on (or maybe tuesday).

btw: The closed discussion here is of course okay for now, as i've sent the paper to all you guys
to have fun with. ;)
Better you leave here with your head still full of kitty cats and puppy dogs.
User avatar
toxie
Overlord
 
Posts: 1403
Location: Germany

Postby toxie » 23 Jun 2006, 11:47

Now for the more fun stuff:

What are the termination criteria for the recursive build? All my scenes seem to have at least a handfull of leaves that go all the way down to max-depth, even though they only have like 2 or 3 prims in them (which I assume are not seperable - haven't debugged graphically yet). How do you detect these cases efficiently? The limit case is N copies of the same triangle - right now the build routine I have can't do anything good with that case. Should the code just try to backtrack these useless "splits"?


You just need to track all cases where an empty leaf is cut off AND the other child-bbox is the same size as the father-bbox.
Or what do you mean by N copies of the same triangle??!

What do you set the clip value to when the corresponding side is empty ? Right now I use +/- Inf which seems to work. Is that robust during traversal though? (NaN safe traversal round 2)


Absolutely robust as the intersection with this splitplane will be ALWAYS outside the current ray-segment!
You have to be careful though (same as with the kD traversal) that non-active rays in a ray bundle
not traverse the empty leaf "by accident" (=false masking). This can be tested easily by checking if an empty leaf is really
NEVER visited by a ray bundle.

Not storing empty leaves. This assumes your traversal is perfectly robust (see point above). I assume you just trick the children pointer when the empty node ends up on the left side? ie: store index-3 so you end up indexing into the right place without storing one of the children?


Bingo!

Storing compressed leaves (2x4 bytes vs. 3x4 bytes for inner nodes). Same questions as above really. When there is a mismatch of types (a leaf being sibling of a node for example) doesn't this pose a problem for indexing during traversal ? Or do you only compress leaf nodes when they won't impact their sibling ?


Actually it only means trouble for the tree construction (at least in my case).

Any suggestions for correctly multithreading the on-demand build? Did you use OpenMP ? How close can you stay to 100% cpu usage when all procs are competing for a write lock to such a central data structure?


I did a OpenMP version, but never got it to work altough it should be ridiculously easy. :(
My own stupidity or the compilers fault? I don't know.

The BKD paper is indeed the same thing as BIH with full intervals (mentioned in the BIH paper). They do an SAH build though and get dynamic speed from incremental updates. Hardware considerations aside, what are you guys' thoughts on the two methods ? Surely doing SAH for these schemes is going to be much faster than the clipping kdtree compilers we've done. If it can be optimized enough, could SAH become worthwhile ? or is the heuristic close enough to optimal that the difference will be minor ?


SAH should be producing faster -static- trees. But as they mentioned themselves in the paper it's still too complicated to do in hardware, whereas "our" simple heuristic shouldn't pose any problems to do in hardware.
If this can be achieved for real, then all the hassle using updates on the trees -may- be worthless as i doubt that the -updated- SAH trees are as optimal as the per-frame built "custom" trees using "our" own heuristic.

Apart from that we also have a four-splitplane-variation in our code (see last section in the BIH-paper) and this was actually almost as fast as the two-splitplane-thingie for most scenes.
Better you leave here with your head still full of kitty cats and puppy dogs.
User avatar
toxie
Overlord
 
Posts: 1403
Location: Germany

Postby Phantom » 23 Jun 2006, 11:47

Ok then I'll address one of fpsunflower's questions:

You need to add a check to see if a potential split is valid. This test reads something like:

if ((left_maximum != right_minimum) || (left_count && right_count)) ALL_OK; else SKIP_CANDIDATE;

This got me from 50% performance (bih vs. kd/sah) to 60% performance.

Maybe you are running into the same problems as I did: Can you verify what kinds of node counts and other stats you get for Sponza and Buddha? Sponza is problematic for me, as it has many axis-aligned triangles. Buddha is better, but even that one doesn't come close to my kd/sah implementation. Although the insane build times, even with a naive hierarchy builder, are certainly nice. ;)

Also, I had some problems grasping the concept of keeping track of two bbox'es during construction. Brick state. Took some explanation from Toxie to sort that out.
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby toxie » 23 Jun 2006, 11:57

Oh and another thing that just popped to my mind:

I've got an additional explanation why you guys don't reach our performance (yet)!
Could it be that the triangle test is slower than ours? Cause i know that most of
you don't use Badouel's/Wald's test anymore, but Pluecker (or something).
This may be okay for the kD, as the number of triangles to test with is rather small.
With the BIH (or any kind of BVH) the triangle testing gets more dominant due to overlaps
and larger node-volumes. My triangle-test is optimized to the max (even quiet some
bit faster then Badouel's/Wald's original version) so maybe this is also a "problem"?!
Better you leave here with your head still full of kitty cats and puppy dogs.
User avatar
toxie
Overlord
 
Posts: 1403
Location: Germany

Postby Phantom » 23 Jun 2006, 12:09

Not in my case. I simply get (far) more nodes than you report, or, when I increase the mimimum prims per leaf count, I get (far) more ray/tri intersections. The reason I use pluecker at the moment is that this test is cheaper than Wald's when the ray misses a triangle. Besides, it's more stable, numerically. You can check out the details in Carsten Benthin's thesis on the OpenRT ray tracer:
http://graphics.cs.uni-sb.de/~benthin/final.pdf
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby tbp » 23 Jun 2006, 12:09

Another administrative note:
Hmmk, toxie, it would be simpler for me if you provided a list of people that already have da paper so i can grant them access to that section thusly so i won't have to move threads around :)
User avatar
tbp
Overlord
 
Posts: 1445
Location: France

Postby toxie » 23 Jun 2006, 12:16

tbp: greenhybrid, Ho Ho, playmesumch00ns, fpsunflower, Shadow007 all mailed me after someone posted about the BIH paper and they all received it by now.
Better you leave here with your head still full of kitty cats and puppy dogs.
User avatar
toxie
Overlord
 
Posts: 1403
Location: Germany

Postby toxie » 23 Jun 2006, 12:23

phantom: Of course i've read Benthin's thesis, but all my experiments with different tri-intersection-algo's
and my own variant ALWAYS were slower with the kD. I even wonder why one would want to include Pluecker
into the kD as most of the time a ray HITS the triangle in a leaf. In a BVH it's more the opposite. Chances are
good that you'll miss the triangle or the hitpoint is behind an already found one!
So all i want to say: Maybe give the old barycentric test a try if it is faster for a BVH?! :)
Better you leave here with your head still full of kitty cats and puppy dogs.
User avatar
toxie
Overlord
 
Posts: 1403
Location: Germany

Postby toxie » 23 Jun 2006, 12:43

sorry.. yet another post..

tbp: and lycium also.
Better you leave here with your head still full of kitty cats and puppy dogs.
User avatar
toxie
Overlord
 
Posts: 1403
Location: Germany

Postby Phantom » 23 Jun 2006, 12:58

Here's some pseudo code, fpsunflower can you compare, Toxie can you verify?

Code: Select all
Recurse( Node n, int left, right, aabb nodebox, splitbox )
{
   // termination
   if ((right - left) < 2)
   {
      turn n into a leaf
      return
   }
   // init ranges, axis and split plane candidate
   leftmax = -inf, rightmin = inf
   axis = splitbox.LongestSide()
   candidate = splitbox.centre[axis]
   pivotidx = left
   // partition
   for all prims in range left..right
   {
      if (prim.centre < candidate)
      {
         if (prim.bbox.max[axis] > leftmax)
            leftmax = prim.bbox.max[axis]
         swap( primlist[pivotidx], primlist[curidx] )
         pivotidx++
      }
      else
      {
         if (prim.bbox.min[axis] < rightmin)
            rightmin = prim.bbox.min[axis]
      }
   }
   // subdivide
   Node lnode, rnode = NewNodePair()
   n->clip = { leftmax, rightmin }
   n->axis = axis
   aabb leftnodebox = nodebox, leftnodebox.max[axis] = leftmax
   aabb rightnodebox = nodebox, rightnodebox.min[axis] = rightmin
   aabb leftsplitbox = rightsplitbox = splitbox
   leftsplitbox.max[axis] = rightsplitbox.min[axis] = candidate
   // recurse
   Recurse( lnode, left, pivotidx - 1, leftnodebox, leftsplitbox )
   Recurse( rnode, pivotidx, right, rightnodebox, rightsplitbox )
}


To catch bad candidates, the lines

Code: Select all
   axis = splitbox.LongestSide()
   candidate = splitbox.centre[axis]


are actually in a while loop; if the candidate plane is 'greater than' the nodebox (for the determined axis), I split the splitbox in half along 'axis' and retry with the lef half; otherwise I split it in half and retry with the right half.

And finally there is 'Toxie's test' that looks like this:

if ((leftmax != rightmin) || (leftcount && rightcount))

If this test fails, the splitbox is cut in half and the side that has all the primitives is processed using another call to Recurse.

I believe this is exactly what you describe in the paper. Please let me know if it's not. :)

EDIT: splitbox and nodebox are both initialized with the full scene bbox. left and right are initialized with 0 and the index of the last primitive.
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby toxie » 23 Jun 2006, 14:01

Looks wonderful!
(Actually looks far better than mine ;)
So i don't get it why this produces much more nodes than my code. :(
Better you leave here with your head still full of kitty cats and puppy dogs.
User avatar
toxie
Overlord
 
Posts: 1403
Location: Germany

Postby Phantom » 23 Jun 2006, 14:30

O great.
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby tbp » 23 Jun 2006, 14:37

In my eyes the winning argument for Pluecker is behaviour wrt aliasing.

PS: graced them all.
User avatar
tbp
Overlord
 
Posts: 1445
Location: France

Postby fpsunflower » 23 Jun 2006, 15:09

toxie wrote:You just need to track all cases where an empty leaf is cut off AND the other child-bbox is the same size as the father-bbox.
Or what do you mean by N copies of the same triangle??!


The case I was trying to explain is a bit contrived, but I don't really see how to deal with it correctly. Lets say you accidentally load your model twice. Now each triangle has an exact duplicate of itself. At some point your routine will get down to a node where you are trying to split these two copies. No matter how you go through the list, your routine will always put the two prims on the same side. Splitting more and more will never help. So how do you bail out from that case?

Storing compressed leaves (2x4 bytes vs. 3x4 bytes for inner nodes). [....]

Actually it only means trouble for the tree construction (at least in my case).

Trouble indeed :P So I am right to assume that in some cases you have to keep a leaf's extra word just to keep its sibling aligned?


Phantom: I think we're pretty close, though I took out the while loop you describe because I couldn't get it to stop properly (see case above). I also don't see how you avoid going one step too far. It makes sense to keep looping while you are cutting away empty space (to avoid creating extra levels of the same empty space) but you need to be able to break before you do the next valid cut - or worse, before you reach some unsplittable prims. The "toxie test" doesn't seem to help with the cases I'm thinking of.


And regarding triangle tests - I'm still using wald's right now, but I was planning on moving to pluecker for robustness reasons.

Glad to see other people have started already :)
fpsunflower
 
Posts: 193

Postby toxie » 23 Jun 2006, 15:47

No matter how you go through the list, your routine will always put the two prims on the same side.


For that case you could have a maximum retry variable. As the split is omitted you'll try to split the same triangles over and over again. If you tried to split a node on one axis X times and it always failed due to "toxies test" (sounds stupid, btw. ;) , stop. (X = very small)

Trouble indeed So I am right to assume that in some cases you have to keep a leaf's extra word just to keep its sibling aligned?


No. (Would love to go into details, but can't. :(


Oh, and btw.: Don't know if i can check ompf too often this weekend/next week, so please don't be "angry" if my replies are delayed by some days. ;(
Better you leave here with your head still full of kitty cats and puppy dogs.
User avatar
toxie
Overlord
 
Posts: 1403
Location: Germany

Postby Phantom » 23 Jun 2006, 19:30

I got a small fix that improves matters:

Add these tests:

If (leftcount == 0) && (minright == nodebox.min) retry with right half of scenebox
If (rightcount == 0) && (maxleft == nodebox.max) retry with left half of scenebox

Or in other words: Using these tests, one could discard a split plane candidate after partitioning.

Here's my current full code:

Code: Select all
void BIHierarchy::Recurse( BIHNode* a_Root, int a_Left, int a_Right, aabb a_Box, aabb a_SBox, int a_Depth )
{
restart:
   if (((a_Right - a_Left) < 2) || (a_Depth > 60))
   {
      // turn 'a_Root' into a leaf
doleaf:   a_Root->data = (a_Left << 2) + 3;
      a_Root->items = (a_Right - a_Left) + 1;
      return;
   }
   // partition
   float maxa = -INFINITY, minb = INFINITY, candidate;
   int i, axis, pivotidx = a_Left;
   for ( i = 0; i < 3; i++ )
   {
      axis = a_SBox.LongestSide();
      candidate = a_SBox.Centre( axis );
      if (candidate >= a_Box.Max( axis )) a_SBox.Max( axis ) = candidate;
      else if (candidate <= a_Box.Min( axis )) a_SBox.Min( axis ) = candidate;
      else break;
   }
   for ( i = a_Left; i <= a_Right; i++ )
   {
      aabb primbox = m_GOIA[i]->CalcBBox();
      if (primbox.Centre( axis ) < candidate)
      {
         if (primbox.Max( axis ) > maxa) maxa = primbox.Max( axis );
         SWAPPRIM( pivotidx, i );
         pivotidx++;
      }
      else if (primbox.Min( axis ) < minb) minb = primbox.Min( axis );
   }
   // extra test
   aabb lsbox = a_SBox, rsbox = a_SBox;
   lsbox.Max( axis ) = rsbox.Min( axis ) = candidate;
   if ((pivotidx == a_Left) && (minb == a_Box.Min( axis )))
   {
      // left side is empty
      a_SBox = rsbox;
      a_Depth++;
      goto restart;
   }
   if ((pivotidx > a_Right) && (maxa == a_Box.Max( axis )))
   {
      // right side is empty
      a_SBox = lsbox;
      a_Depth++;
      goto restart;
   }
   // Toxie's bug catcher
   if ((maxa != minb) || ((pivotidx > a_Left) && (pivotidx <= a_Right)))
   {
      // update tree node box
      aabb lbox = a_Box, rbox = a_Box;
      lbox.Max( axis ) = maxa;
      rbox.Min( axis ) = minb;
      // recurse
      BIHNode* lnode = MManager::NewBIHNodePair();
      a_Root->Init( (int)lnode + axis, maxa, minb );
      // recurse
      if (a_Depth < 8) DrawSplitPlane( lsbox, axis, candidate );
      Recurse( lnode, a_Left, pivotidx - 1, lbox, lsbox, a_Depth + 1 );
      Recurse( lnode + 1, pivotidx, a_Right, rbox, rsbox, a_Depth + 1 );
   }
   else goto doleaf;
}
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby Phantom » 23 Jun 2006, 19:38

fpsunflower:

If you can compile Sponza, could you give me some stats to compare?

I'm subdividing if the prim count is 3 or more; max depth is 100 (although with the just proposed tests I never reach that anymore, thankfully).
For Sponza this gives me a very high number of ray/tri intersections (like 45 or more). Node counts seem reasonable now: 129665 total nodes, 64833 leafs, 64832 interior nodes, 26480 empty nodes (that's for 76155 prims).

By the way, I'm starting to suspect that this whole issue has something to do with the fact that Toxie is holding on to 'volume elements' up to a certain depth (i.e., as long as a tree node contains more than a single grid voxel). By treating voxel elements as a whole, splitting grid elements is discouraged. This might be good, as the scene box matches the grid due to the way it is split in two each time. I skipped implementing this part of the paper, as a per-primitive approach seemed simpler to start with. But as far as I can see, this is the only major difference with Toxie's approach right now. I fail to see how this could explain my 45 ints/tri in Sponza vs. Toxie's 12-13... Int/tri for Buddha seems OK, so it's gotta have something to do with all those axis aligned polies.
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby fpsunflower » 23 Jun 2006, 22:38

Phantom wrote:I got a small fix that improves matters:

Add these tests:

If (leftcount == 0) && (minright == nodebox.min) retry with right half of scenebox
If (rightcount == 0) && (maxleft == nodebox.max) retry with left half of scenebox


That looks very close to what I have at the moment. I did have those tests in as well but I think they are wrong. Basically you will be going along, looping while you are clipping away empty space, but the stopping point is when you find a new thing to split. Maybe I need to think through this more, but what happens if that last "valid" split is on a different axis from the one you were cutting empty space out of? Now you've created a node for something which is surrounded by empty space (presumbly mainly along one axis) but you are making a split around a totally different axis, which partitions objects into each side. In ascii art, it might go something like:

Code: Select all

scene at initial call to Recurse:
+-----------------------*-----------------------------+
|                       *                             |
| (obj)                 *                             |
|                       *                             |
|                       *                             |
| (obj)                 *                             |
|                       *                             |
|                       *                             |
+-----------------------*-----------------------------+

split plane moves left in a loop:

+-----------*-----------------------------------------+
|           *                                         |
| (obj)     *                                         |
|           *                                         |
|           *                                         |
| (obj)     *                                         |
|           *                                         |
|           *                                         |
+-----------*-----------------------------------------+

keep moving left (pretend like thats half):
+-------*---------------------------------------------+
|       *                                             |
| (obj) *                                             |
|       *                                             |
|       *                                             |
| (obj) *                                             |
|       *                                             |
|       *                                             |
+-------*---------------------------------------------+
woops, now longest axis is different - try it:
+-----------------------------------------------------+
|                                                     |
| (obj)                                               |
*******************************************************
|                                                     |
| (obj)                                               |
|                                                     |
|                                                     |
+-----------------------------------------------------+
looks good - make the node:
+-----------------------------------------------------+
|                                                     |
| (obj)                                               |
[[---------------------------------------------------]]
[[---------------------------------------------------]]
| (obj)                                               |
|                                                     |
|                                                     |
+-----------------------------------------------------+




obviously that last thing is not what we wanted. We really wanted to stop at the one before and create that nice big chunck of empty space

Does that make sense?


Is sponza sitting in ra2 form somewhere? I have a copy but I dont know how many conversions its been through so coords/tri count might be different.
fpsunflower
 
Posts: 193

Postby fpsunflower » 23 Jun 2006, 22:54

toxie wrote:For that case you could have a maximum retry variable. As the split is omitted you'll try to split the same triangles over and over again. If you tried to split a node on one axis X times and it always failed due to "toxies test" (sounds stupid, btw. ;) , stop. (X = very small)

Thats kind of what I was thinking of. It really is the same as the max-depth test, just inside that empty space culling/split retry loop. I just was wondering if there was a smarter way to do it.

I was also thinking that since the loop is triggered by two situations: culling empty space or unsplittable objects, you need to be able to differentiate between them easily to know if you should create a leaf or a node when you finally break out of the loop (related to post above - I think remembering the result of the previous split will be necessary).


toxie wrote:No. (Would love to go into details, but can't. :(

No worries - we'll figure it out ;) I was merely trying to think ahead at some of the more advanced optimizations you mention in the paper. Phantom - any thoughts of those?
fpsunflower
 
Posts: 193

Postby Phantom » 24 Jun 2006, 06:54

obviously that last thing is not what we wanted. We really wanted to stop at the one before and create that nice big chunck of empty space


Yeah that's what I thought also. Omitting split candidates from the split box does not really omit them, it just splits the same current node box by a different plane. Which is not always better, like you just illustrated.
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby fpsunflower » 24 Jun 2006, 07:04

Phantom wrote:fpsunflower: If you can compile Sponza, could you give me some stats to compare?


My sponza doesn't have the same triangle count, instead here are some stats for the ra2 pack we should all have access to. I haven't put rendertime stats in yet. I hope these are somehow helpfull.

Code: Select all
clown:
[SCN] Scene stats:
[SCN]   * Primitives:          44154
[BIH] Storing bounding boxes ...
[BIH] Creating tree ...
[BIH] Trimming tree ...
[BIH] Tree stats:
[BIH]   * Nodes:          65628
[BIH]   * Leaves:         65629
[BIH]   * Objects: min    0
[BIH]              avg    0.67
[BIH]            avg(n>0) 1.00
[BIH]              max    2
[BIH]   * Depth:   min    6
[BIH]              avg    21.79
[BIH]              max    7
[BIH]   * Leaves w/: N=0   32%
[BIH]                N=1   67%
[BIH]                N=2    0%
[BIH]                N=3    0%
[BIH]                N=4    0%
[BIH]                N>4    0%


FairyForestF160:
[SCN] Scene stats:
[SCN]   * Primitives:          174117
[BIH] Storing bounding boxes ...
[BIH] Creating tree ...
[BIH] Trimming tree ...
[BIH] Tree stats:
[BIH]   * Nodes:          261972
[BIH]   * Leaves:         261973
[BIH]   * Objects: min    0
[BIH]              avg    0.66
[BIH]            avg(n>0) 1.03
[BIH]              max    4
[BIH]   * Depth:   min    6
[BIH]              avg    31.55
[BIH]              max    6
[BIH]   * Leaves w/: N=0   35%
[BIH]                N=1   62%
[BIH]                N=2    2%
[BIH]                N=3    0%
[BIH]                N=4    0%
[BIH]                N>4    0%


happy_buddha_vrip:
[SCN] Scene stats:
[SCN]   * Primitives:          1087716
[BIH] Storing bounding boxes ...
[BIH] Creating tree ...
[BIH] Trimming tree ...
[BIH] Tree stats:
[BIH]   * Nodes:          1636897
[BIH]   * Leaves:         1636898
[BIH]   * Objects: min    0
[BIH]              avg    0.66
[BIH]            avg(n>0) 1.01
[BIH]              max    4
[BIH]   * Depth:   min    6
[BIH]              avg    27.05
[BIH]              max    7
[BIH]   * Leaves w/: N=0   34%
[BIH]                N=1   64%
[BIH]                N=2    0%
[BIH]                N=3    0%
[BIH]                N=4    0%
[BIH]                N>4    0%


kitchenTransformed:
[SCN] Scene stats:
[SCN]   * Primitives:          110561
[BIH] Storing bounding boxes ...
[BIH] Creating tree ...
[BIH] Trimming tree ...
[BIH] Tree stats:
[BIH]   * Nodes:          174288
[BIH]   * Leaves:         174289
[BIH]   * Objects: min    0
[BIH]              avg    0.63
[BIH]            avg(n>0) 1.00
[BIH]              max    2
[BIH]   * Depth:   min    4
[BIH]              avg    31.93
[BIH]              max    5
[BIH]   * Leaves w/: N=0   36%
[BIH]                N=1   63%
[BIH]                N=2    0%
[BIH]                N=3    0%
[BIH]                N=4    0%
[BIH]                N>4    0%
[BIH]   * Creation time:  146ms


MengerSponge_scaled_cubes -- 1 ...
[SCN] Scene stats:
[SCN]   * Primitives:          1920000
[BIH] Storing bounding boxes ...
[BIH] Creating tree ...
[BIH] Trimming tree ...
[BIH] Tree stats:
[BIH]   * Nodes:          1306771
[BIH]   * Leaves:         1306772
[BIH]   * Objects: min    0
[BIH]              avg    1.47
[BIH]            avg(n>0) 2.00
[BIH]              max    2
[BIH]   * Depth:   min    9
[BIH]              avg    22.89
[BIH]              max    9
[BIH]   * Leaves w/: N=0   26%
[BIH]                N=1    0%
[BIH]                N=2   73%
[BIH]                N=3    0%
[BIH]                N=4    0%
[BIH]                N>4    0%


scene6:
[SCN] Scene stats:
[SCN]   * Primitives:          804
[BIH] Storing bounding boxes ...
[BIH] Creating tree ...
[BIH] Trimming tree ...
[BIH] Tree stats:
[BIH]   * Nodes:          867
[BIH]   * Leaves:         868
[BIH]   * Objects: min    0
[BIH]              avg    0.93
[BIH]            avg(n>0) 1.57
[BIH]              max    4
[BIH]   * Depth:   min    4
[BIH]              avg    16.08
[BIH]              max    4
[BIH]   * Leaves w/: N=0   40%
[BIH]                N=1   25%
[BIH]                N=2   33%
[BIH]                N=3    0%
[BIH]                N=4    0%
[BIH]                N>4    0%


q2:
[SCN] Scene stats:
[SCN]   * Primitives:          26366
[BIH] Storing bounding boxes ...
[BIH] Creating tree ...
[BIH] Trimming tree ...
[BIH] Tree stats:
[BIH]   * Nodes:          29379
[BIH]   * Leaves:         29380
[BIH]   * Objects: min    0
[BIH]              avg    0.90
[BIH]            avg(n>0) 1.35
[BIH]              max    3
[BIH]   * Depth:   min    4
[BIH]              avg    23.64
[BIH]              max    5
[BIH]   * Leaves w/: N=0   33%
[BIH]                N=1   42%
[BIH]                N=2   23%
[BIH]                N=3    0%
[BIH]                N=4    0%
[BIH]                N>4    0%


I've implemented what I described in my previous posts and things seem to be relatively robust now (not going down to max depth). I've compiled up to the xyzrgb dragon (7M tris) without a problem. Performance is still lagging pretty far behind the kdtree. I think my trees are still not quite optimal. There's also some optimizing to be done on the traversal front (adding early termination helps alot).

I added the "invisible" empty nodes (they aren't stored -- only implicitly defined by a parent with one infinite clip). This didn't help too much except on memory usage. But bandwidth/cache savings are most likely drowned in Java overhead in my case.
fpsunflower
 
Posts: 193

Postby Phantom » 24 Jun 2006, 07:58

If I remove the test for split candidates that are outside the node bbox, Buddha runs at 75% of kd/sah... So for me, that's an improvement. Also, tree size doesn't grow tremendously that way. Sponza is still pure crap though.
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby phresnel » 24 Jun 2006, 17:21

What's this all about? Did I sleep to long? Someone teared out the time?
Tell me tell me tell me please!
~/www/ | picogen.org | metatrace
- --- / .-.. . - / -- -.-- / ... .. --. -. .- - ..- .-. . / .- .--. .--. . .- .-. / --. . . -.- -.--
User avatar
phresnel
formely known as 'el verde híbrido'
 
Posts: 948
Location: Deutschland.

Postby Phantom » 24 Jun 2006, 17:36

If you're here, you must have read the papers. Well so did we, and we have working implementations. They just need some... tuning. :)
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby fpsunflower » 26 Jun 2006, 03:08

Just got my subversion repository going. The relevant file is:

http://svn.sourceforge.net/viewcvs.cgi/ ... a?view=log

Its a bit obfuscated by the use of Java, but I tried to put comments around the important phases.

I decided to set maxprims/leaf = 2 as this gave me numbers closest to the paper memory usage wise (taking into account non-compressed leaves).

I don't bother storing the object bounding boxes ahead of time. The build is so fast, the memory savings are well worth the extra max3/min3 operations inside the splitting loop. This allows me to build the thai statue quite comfortably.


Phatom, toxie, I'm curious to hear from you if I got all the traversal cases right. It appears to be robust, but its also quite slow (still around half the speed of the kd-tree one). The build code doesn't seem to get stuck on any scene anymore and the output stats seem halfway decent.


Regarding the compression of leaves I was asking about earlier, the only ways I can think of to do this right now are to either store an extra bit on each leaf (is the left child a leaf?). Or to prefetch the left side during the traversal. Both cases require some additional bit-trickery in the traversal to make it go fast. Anyone have any better ideas? Does this even matter that much? Lets say 30% of the leaves are empty (what I typically see), then you are left with a bout a 10% memory savings from compressing the leaves. 16% at best. So unless there is a really clever way of doing the indexing, I would say its probably not really worthwile. toxie, am I missing something?
fpsunflower
 
Posts: 193

Postby toxie » 26 Jun 2006, 12:45

Hey dudes. Have to check your posts in detail later.
But just wanted to say that you can now open a public discussion on the BIH!
Better you leave here with your head still full of kitty cats and puppy dogs.
User avatar
toxie
Overlord
 
Posts: 1403
Location: Germany

Postby Phantom » 27 Jun 2006, 12:14

(moved this tread since it's not secret anymore; I think it contains some good info)
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby beason » 28 Jun 2006, 01:07

Toxie, are there unpublished tricks which you specifically know about but are not allowed to discuss, which will prevent us from matching your published results? Sorry, I just don't understand your comment about not being able to discuss details.

Phantom and fpsunflower, are the slowdowns you are seeing in scenes which are reportedly faster (according to the paper)? I thought there was one or two cases where BIH was slower so I'm just checking.
User avatar
beason
SNR police
 
Posts: 434
Location: Los Angeles, CA

Postby toxie » 28 Jun 2006, 07:07

If you take a look at the people/companies we thank, than this should become clear.
So i cannot deliver any suggestions or help that goes beyond the released paper (unfortunately).
Better you leave here with your head still full of kitty cats and puppy dogs.
User avatar
toxie
Overlord
 
Posts: 1403
Location: Germany

Postby Phantom » 28 Jun 2006, 10:38

beason wrote:Phantom and fpsunflower, are the slowdowns you are seeing in scenes which are reportedly faster (according to the paper)? I thought there was one or two cases where BIH was slower so I'm just checking.


I checked on my legocar, I got it to run at 70% of kd/sah now. I made some fixes to the ray/tri intersection code to get from 65% to 70%, so perhaps I should indeed reinsert Wald's code to see if that helps.

I also checked the transformed kitchen: It runs at 1/6th of the speed of kd/sah... It does compile in (far) less than a second though.
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby toxie » 28 Jun 2006, 11:03

Phatom, toxie, I'm curious to hear from you if I got all the traversal cases right.


It's right, but in the loop where you loop over the nodes you can actually insert a shortening of the ray interval in two cases to avoid extra work!
Maybe this is also the "problem" in your code, phantom?!
Better you leave here with your head still full of kitty cats and puppy dogs.
User avatar
toxie
Overlord
 
Posts: 1403
Location: Germany

Postby Phantom » 28 Jun 2006, 11:05

I'll check, thanks! Quite obscure hint btw. ;)
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby toxie » 28 Jun 2006, 11:07

Lets say 30% of the leaves are empty (what I typically see), then you are left with a bout a 10% memory savings from compressing the leaves. 16% at best. So unless there is a really clever way of doing the indexing, I would say its probably not really worthwile. toxie, am I missing something?


Now i get the point why this is so hard to achieve in your implementation! Cause in Java you cannot use as much "dirty" tricks as in C(++).
But as you pointed out this is more or less a cosmetic change to the code and was also not included in the numbers released in the paper (so leafs=12bytes in the statistics).
Better you leave here with your head still full of kitty cats and puppy dogs.
User avatar
toxie
Overlord
 
Posts: 1403
Location: Germany

Postby toxie » 28 Jun 2006, 11:08

@phantom: real sorry, but......! (see above)
Better you leave here with your head still full of kitty cats and puppy dogs.
User avatar
toxie
Overlord
 
Posts: 1403
Location: Germany

Postby Phantom » 28 Jun 2006, 11:27

OK it helps, but only marginally. Just to make sure:

You where hinting at shortening the ray segment in case you only traverse the near node or only the far node, right? If you skip the near node, tnear should become the intersection point of the ray with the 'far' split plane, and if you skip the far node, tfar should be the intersection point of the ray with the 'near' split plane.

You know, I once accused Wald of deliberately keeping away certain details. And look what has become of him. ;)

EDIT: Just checked Buddha again, looks like it is *very* close to kd/sah now, all of a sudden. Legocar still at 70%, Sponza & transformed kitchen suck completely, but Buddha rockz. Yay. :)
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby toxie » 28 Jun 2006, 12:03

Okay. For the sake of not being compared to Wald: ;)
In the case you traverse the near one: far = MIN(far,clip[0]);
In the case you traverse the far one: near = MAX(near,clip[1]);

Of course clip[0]/[1] are reversed when the ray direction on that axis is negative.
Also you have to keep the original far value on the stack if you need to go into both.
Better you leave here with your head still full of kitty cats and puppy dogs.
User avatar
toxie
Overlord
 
Posts: 1403
Location: Germany

Postby Phantom » 28 Jun 2006, 13:09

Thanks, yes that's what I added. Seems to help quite a lot for Buddha.

Buddha is now at ~90% (that's within 'bad implementation tolerance' limits);
Fairy Forest at ~75%;
Legocar at 70%.

I think a more careful implementation would improve that further (we need tbp to the rescue here). I don't compress leafs yet (not sure yet how to do that efficiently, just added code to skip empty nodes, but even that takes conditional code as far as I can see) and the traversal is condition hell at the moment. I'm quite sure I should be able to improve that.

Looks to me like it's time to ditch kd/sah. BIH is more fault-tolerant, and a lot simpler to implement / maintain. Traversal is spagetti though. Traversing 4x4 rays doesn't help that of course. :)
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby toxie » 28 Jun 2006, 13:22

yay! great implementation then!
looking forward when you reach the 100%! =)

btw: some guys here at the EGSR 06 still believe that the SAH is the holy grail for everything!
and i'm not talking about newbies, actually these guys were all into the GFX business for 15+ years!
(but a pixar guy actually liked the BIH paper, so i don't need to care 'bout the others ;))

and something for all guys that will be on a conference sometime soon:
if you participate in some public discussion, don't forget to start your critics
with one of these buzzwords: "great paper", "nice talk" or "i really liked this work".
quality of the paper doesn't matter, btw. ;)
Better you leave here with your head still full of kitty cats and puppy dogs.
User avatar
toxie
Overlord
 
Posts: 1403
Location: Germany

Postby Phantom » 28 Jun 2006, 13:54

Well I can imagine that people don't like to drop their kd-tree compilers for a 100-lines-in-C BIH compiler just that easy. :) It feels too easy. I'm not too sure that kd/sah will be gone by the way; the best implementations of kd/sah roughly take 10 times as much time as BIH (perhaps that's a bit optimistic, but still): That means that if you start rendering very complex scenes, the extra quality of kd/sah might actually become worthwhile again. Just imagine 8x8 samples per pixel, area lights with 64 samples, tons of recursive reflections and so on. If you toss enough rays at it, 9 seconds extra tree build time might not be a problem.

But for real-time rt in games, kd/sah should be a thing of the past for a while. I'm not so sure about full tree rebuilds though. Still feels like a waste.

Great paper btw. ;)
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby Phantom » 28 Jun 2006, 14:10

Buddha: ~92. :)

Need...to...focus on......REAL LIFE... or I'll get fired. :)

Ow btw some raw figures: 92% means I'm shooting 2336k rays/s at the Buddha (1024x1024, reference camera view), on a single core of a 1.66 Core Duo. That's 2.2fps.

Stats for other scenes:
Fairy forest: 1652k rays/s, ~1.6fps;
Scene 6: 3003k rays/s (!), 3fps;
Transformed kitchen: 422k rays/s (!), 0.4fps.

I obviously have a problem with those axis aligned scenes. In those scenes, it doesn't matter if I stop splitting at 2, 3 or 4 polygons, and it doesn't matter wether or not I refine split candidates that are outside the node box. So obviously I have a sub-optimal hierarchy builder.
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby toxi » 29 Jun 2006, 09:20

I agree that the kD is of course by no means dead. It can still amortize (as you said).
But we already work on speeding up scenes where the BIH fails, so.... :)

What i felt pissed off at (see above) that these guys actually didn't believe a word that other heuristics
could deliver reasonable performance. Just like i told: SAH = holy grail for -anything-.

And the problem with the axis aligned scenes could be a problem because of -large- axis aligned triangles?!
It's also a problem within my implementation (as you can tell from some of the statistics actually :(
Cause without the largeness "factor" it wouldn't make any sense as these triangles should/can be enclosed in flat voxels without any problem.

For the update of the tree:
Nobody did say that updates are not possible. In fact it should be fairly easy?!
(identify the leafs where stuff happens and update the splits recursively until you reach the nearest father of all these leafs (or nothing changes anymore))
toxi
 

Postby Phantom » 29 Jun 2006, 11:16

toxi wrote:But we already work on speeding up scenes where the BIH fails, so.... :)


I assume you can't elaborate on that. :(

And the problem with the axis aligned scenes could be a problem because of -large- axis aligned triangles?!
It's also a problem within my implementation (as you can tell from some of the statistics actually :(
Cause without the largeness "factor" it wouldn't make any sense as these triangles should/can be enclosed in flat voxels without any problem.


Perhaps it's an idea to clip large triangles to a coarse grid, doing the initial bucketing stage? That would filter out the really huge ones. I'm not sure if a test for 'really huge' would be sufficiently fast though.
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby toxie » 29 Jun 2006, 12:47

I assume you can't elaborate on that.


Unfortunately i don't even know if this will work, so.... (it's just an idea i got during one of the talks here at EGSR)
Better you leave here with your head still full of kitty cats and puppy dogs.
User avatar
toxie
Overlord
 
Posts: 1403
Location: Germany

Postby toxie » 07 Jul 2006, 11:18

If I remove the test for split candidates that are outside the node bbox, Buddha runs at 75% of kd/sah... So for me, that's an improvement. Also, tree size doesn't grow tremendously that way. Sponza is still pure crap though.


This actually only means that you cutoff empty volume on one side.
I myself added a simple test if there is a lot of empty volume in the current node (bbox of the included triangles vs. "real" node bbox as it is stored implicitly in the tree) and then insert an empty volume cutoff on the "best" side (=most volume or bbox surface area cut away). This also results in some percent speedup (but increases setup-time of course).

EDIT: In cases where there is no stuff to cut away (node bbox = tri bbox for example) "your version" (=removal of the test if split inside node bbox) could result in useless nodes created and thus harm some scenes?!
Better you leave here with your head still full of kitty cats and puppy dogs.
User avatar
toxie
Overlord
 
Posts: 1403
Location: Germany

Postby Phantom » 08 Jul 2006, 19:02

I took the plunge and decided to modify my tracer so that it builds a tree for each frame.

Results, on a meagre 1.4 Ghz Pentium-M, single core:

Legocar (10k polies), running at 10fps @ 800x500 pixels (less than 100ms for TTI);
Fairy Forest (174k polies), 800ms for TTI (full rebuild + render, same res).

That's pretty amazing. Raytracing 10k triangle scenes, completely dynamic, at 20fps or more on a decent dual core machine is just awesome. I'll try to spent some time to build a basic ray traced game, with plenty of scene changes.

By the way, my data structures are very easy on the input side right now: The only thing that needs to be 'precalculated' per triangle is the normal. Toxie, did you account for that? I mean, Wald's triaccel structure is not easy to fill, and could add considerably to the frame time if you have to update it for 1M triangles.

EDIT: For the record, the timing for the legocar means that I'm shooting 4.7M rays/s through a (potentially) completely random and dynamic 10k triangle soup. I still can't believe it.
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Postby macdonag » 09 Jul 2006, 09:00

Hi all,
I'm another new guy here, been following the devmaster/flipcode raytracing tutorial off & on while working on my own (basic so far) ray tracer. I've got to say this is an nice interesting, firendly forum. Very impressed with what I've seen, especially lately. Now the mention of 10fps for a dynamic scene of a 1.4 pentium is fantastic news! I'd love to see what could be done on a PS3.... ;-) Some day!

Anyway, I'l maybe stick a WIP image of my own once I've got something less derivative, but in the meantime, I'd love to hear what your thoughts are for a little raytraced game.

Cheers,
good luck,
Graham
macdonag
 
Posts: 4
Location: London, UK

Postby Phantom » 09 Jul 2006, 09:29

Hey MacDonag,

Welcome to the forum! Looking forward to those WIP images.

About ray tracing & games: Since the tutorials on flipcode/devmaster some of us focussed on fast ray tracing of static scenes, using kd-trees and the sah heuristic. Recently, Toxie wrote a paper on 'bounding interval hierarchies', which seem to be a bit less optimal for static scenes, but they can be build incredibly fast (less than a second for the 1 million triangle happy buddha model). This makes it possible to build a tree in 100ms or less for scenes consisting of 10k-100k triangles, which is easily sufficient for a nice looking ray traced game. So in short, there has been quite a breakthrough which took ray tracing from static scenes to dynamic scenes.

About this game: I have some small ideas, but it doesn't really matter; perhaps a 'Pang' clone, since I could use a ton of reflecting spheres for that. Most of the scenery would be dynamic, which is also something I want to show. But in case someone has a good idea (game must be simple, this is not really a game project) let me know.
--------------------------------------------------------------
Arauna - Game-oriented real-time ray tracing
http://igad.nhtv.nl/~bikker
Phantom
Overlord
 
Posts: 1188
Location: Houten, Netherlands

Next

Return to General Development

Who is online

Users browsing this forum: Yahoo [Bot] and 1 guest