The Dark Mod Forums
cabalistic

I'm working on a VR version - early alpha

There is no extra overhead: you have to transfer the model matrices anyway, whether by glUniform or by buffer. In fact, a buffer upload is most likely faster, because you only have a single upload for the whole batch instead of repeatedly calling glUniform. It takes a little more GPU memory, but not dramatically so.

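Note that UBOs have a size limit that caps an array of matrices at a few hundred entries (I think around 500 or so; it depends on the driver's GL_MAX_UNIFORM_BLOCK_SIZE). For larger collections, you want to use SSBOs, which is what I did in my experiments.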
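A minimal C++ sketch of the single-upload idea, assuming column-major mat4 matrices (64 bytes each under std140/std430) and that the block size limit is queried at runtime. For a UBO that is `GL_MAX_UNIFORM_BLOCK_SIZE`, which the GL spec only guarantees to be 16384 bytes, i.e. 256 matrices. The helper names are hypothetical, and the GL call that would actually upload the packed data (e.g. `glBufferSubData`) is left out so the logic stands on its own.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major 4x4 matrix; matches a GLSL mat4 in a std140/std430 array.
using Mat4 = std::array<float, 16>;
constexpr std::size_t kMat4Bytes = 16 * sizeof(float);  // 64 bytes

// How many mat4 entries fit in one buffer block of the given size.
constexpr std::size_t maxMatricesPerBlock(std::size_t blockSizeBytes) {
    return blockSizeBytes / kMat4Bytes;
}

// Pack the model matrices of a whole batch into one contiguous array, so the
// batch needs a single buffer upload instead of one glUniform call per draw.
std::vector<float> packMatrices(const std::vector<Mat4>& models) {
    std::vector<float> packed;
    packed.reserve(models.size() * 16);
    for (const Mat4& m : models) {
        packed.insert(packed.end(), m.begin(), m.end());
    }
    return packed;
}

// Split a batch of draws into chunks that each fit into one block.
std::vector<std::size_t> chunkSizes(std::size_t drawCount, std::size_t blockSizeBytes) {
    const std::size_t cap = maxMatricesPerBlock(blockSizeBytes);
    assert(cap > 0 && "block too small for a single mat4");
    std::vector<std::size_t> chunks;
    if (cap == 0) {
        return chunks;  // avoid an endless loop when asserts are disabled
    }
    while (drawCount > 0) {
        const std::size_t n = drawCount < cap ? drawCount : cap;
        chunks.push_back(n);
        drawCount -= n;
    }
    return chunks;
}
```

With the guaranteed 16 KB limit, a batch of 600 draws would need three UBO-sized chunks (256, 256 and 88); an SSBO avoids the chunking entirely.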

10 minutes ago, cabalistic said:

There is no overhead - you have to transfer the model matrices, anyway. Whether by glUniform or buffer. In fact, buffer upload is most likely faster, because you only have a single upload for the batch instead of repeatedly setting the glUniform. It takes a little more GPU memory, but it's not dramatic.

Note that UBOs have an array size limit of I think around 500 or so. For larger collections, you want to use SSBOs, which is what I did in my experiments.

I think even a cap of 500 draws per call is not a problem. It would be awesome sauce to achieve even that.

But won't it also require a separate depth shader for uniform blocks? Or should we unify the depth pass so that it always uses a UBO, for both single and multi draws?

Now that we're using GLSL includes, the matrix code is shared across all shaders. I suppose we'll need some kind of #ifdef to switch between UBO model/view matrices and regular uniforms? On the other hand, since it's already centralized like that, we might as well switch them all to UBOs.
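A minimal C++ sketch of the #ifdef route, assuming the engine assembles shader source as strings before compiling it. The define name `USE_UBO_MATRICES`, the block name `ModelMatrices` and the uniform names are hypothetical. Because GLSL requires `#version` to come before any other directive, the define is inserted after that line rather than prepended; the sketch also assumes `#version` sits at the very start of the source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Shared include that both shader variants could use (GLSL, kept as a string).
// 256 mat4 = 16 KB, the minimum GL_MAX_UNIFORM_BLOCK_SIZE. Names are hypothetical.
const char* const kMatrixInclude = R"(
#ifdef USE_UBO_MATRICES
layout(std140) uniform ModelMatrices {
    mat4 u_modelMatrices[256];
};
#else
uniform mat4 u_modelMatrix;
#endif
)";

// Insert "#define <name>" right after the #version line of a GLSL source.
std::string injectDefine(const std::string& source, const std::string& name) {
    assert(!name.empty());
    const std::string defineLine = "#define " + name + "\n";
    if (source.rfind("#version", 0) != 0) {
        return defineLine + source;  // no leading #version line: just prepend
    }
    const std::size_t eol = source.find('\n');
    if (eol == std::string::npos) {
        return source + "\n" + defineLine;  // source is only the #version line
    }
    return source.substr(0, eol + 1) + defineLine + source.substr(eol + 1);
}
```

If everything is switched to UBOs instead, the non-UBO branch and the injected define simply go away.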


Amnesty for Bikerdude!

Yes, you need a new set of shaders; there's no way around that. I'm not sure whether you can make the non-multipass path use that same set on a GL3 feature base. Anyway, I used a single depth multi-draw command for all non-transparent objects and then rendered the transparent ones classically on top. Since those use actual textures for drawing, it's not as easy to get them into a multi-draw call. But even if you do, their fragment shader is much more costly, so it still makes sense to keep them separate from the solid ones and use a specialized fragment shader.
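A minimal C++ sketch of that split, assuming GL 4.3-style `glMultiDrawElementsIndirect` for the solid surfaces. The command struct follows the layout that call reads from the indirect buffer; the `Surface` record and its `alphaTested` flag are hypothetical. Setting `baseInstance` to the draw's slot in the matrix SSBO is one common way to let the vertex shader find its model matrix (e.g. through `gl_BaseInstanceARB` from ARB_shader_draw_parameters; `gl_DrawIDARB` from the same extension would also work).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Layout read by glMultiDrawElementsIndirect from the indirect buffer.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices to draw
    std::uint32_t instanceCount;  // 1 for an ordinary draw
    std::uint32_t firstIndex;     // offset into the index buffer, in indices
    std::int32_t  baseVertex;     // added to every index
    std::uint32_t baseInstance;   // used here as the slot in the matrix SSBO
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "must be tightly packed");

// Hypothetical per-surface record for the depth pass.
struct Surface {
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    bool alphaTested;  // needs its texture in the depth pass
};

struct DepthBatches {
    std::vector<DrawElementsIndirectCommand> multiDraw;  // issued with one call
    std::vector<std::size_t> classic;                    // drawn one by one afterwards
};

// Solid surfaces go into the multi-draw list; textured ones stay classic.
DepthBatches buildDepthBatches(const std::vector<Surface>& surfaces) {
    DepthBatches out;
    for (std::size_t i = 0; i < surfaces.size(); ++i) {
        const Surface& s = surfaces[i];
        assert(s.indexCount > 0);
        if (s.alphaTested) {
            out.classic.push_back(i);
            continue;
        }
        const auto slot = static_cast<std::uint32_t>(out.multiDraw.size());
        out.multiDraw.push_back({s.indexCount, 1u, s.firstIndex, s.baseVertex, slot});
    }
    return out;
}
```

The `multiDraw` array would be uploaded to a `GL_DRAW_INDIRECT_BUFFER` and issued with one call (the model matrices must be packed into the SSBO in the same order), while the `classic` surfaces are drawn afterwards with the costlier fragment shader.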

1 minute ago, cabalistic said:

Yes, you need a new set of shaders, no way around that. I'm not sure if you can make non-multipass use that same set on a GL3 feature base. Anyway, I used a single depth multi-draw command for all non-transparent objects, and then rendered the transparent ones classically on top. Since they use actual textures for drawing, it's not as easy to get them into a multi-draw call. But even if you do, since their fragment shader is much more costly, it still makes sense to have them separated from the solid ones and use a specialized fragment shader.

What about per-material (and potentially per-surface) polygon offset?


Anything that applied an offset, I rendered separately (classically). Eventually, I think we need to migrate away from the GL functions and do our own offsetting in the vertex shaders, based on parameters. For one, that would let us use them in multi draws; for another, they would actually be predictable. The GL polygon offsets are not: their effects can vary among GPUs and drivers.
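A minimal sketch of a vertex-shader offset, emulated in C++ so the arithmetic can be checked. In GLSL it would be a single line such as `gl_Position.z -= u_depthBias * gl_Position.w;`, where `u_depthBias` is a hypothetical per-material uniform. Because the perspective divide computes z/w, subtracting bias * w from z shifts the NDC depth by exactly the bias, whatever the GPU or driver. Unlike `glPolygonOffset`, this sketch has no slope-dependent term; a constant per-material bias is assumed to be enough.

```cpp
#include <cassert>

// Clip-space position as written to gl_Position by the vertex shader.
struct ClipPos { float x, y, z, w; };

// Offset the depth in clip space: after the divide, z/w becomes z/w - ndcBias.
// A positive bias moves the vertex toward the viewer (smaller depth) with the
// default depth convention.
ClipPos applyDepthBias(ClipPos p, float ndcBias) {
    p.z -= ndcBias * p.w;
    return p;
}

// NDC depth after the perspective divide.
float ndcDepth(const ClipPos& p) {
    assert(p.w != 0.0f);
    return p.z / p.w;
}
```

Since x, y and w are untouched, only depth changes; the screen position of the vertex stays the same.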

I finally fixed the speed bug and did a quick test with MultiDraw.

It seems the NVIDIA driver does a good job of batching draw calls internally. It took 50K draws per frame (and the resolution down to 540p) to start seeing a difference in FPS. You might get a slightly different result on a 1070, but for now it looks like a micro-optimization.


It's not about batching; it's about avoiding driver call overhead. To be clear, what it achieves is saving CPU time in the backend, which in turn gets the data to render to the GPU faster and keeps the GPU from idling needlessly. To profit from this, the GPU needs to be starving in the first place. If you have a weak GPU and run it at demanding settings, then yes, it's possible you won't see much of an effect, because the CPU side wasn't the bottleneck. (It's also possible you're doing something wrong; you'd have to measure carefully with nSight to catch what's going on.)

But in my experiments, where I replaced depth and stencil shadow draws with multi draws, the effects were pretty clear. They cut down CPU time for those areas significantly and as a result allowed the GPU to render more quickly. Of course, stencil (and particularly depth) are often not the most significant parts of the scene, so the overall effect on FPS is not gigantic (but it was still measurable).
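A deliberately idealized C++ model of that argument, assuming the CPU backend and the GPU overlap perfectly, so a frame takes as long as the slower of the two. Real frames also have sync points and driver threads, so treat this as a sketch of the reasoning rather than a predictor; the millisecond values below are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Idealized pipelined renderer: CPU backend and GPU work overlap, so the
// frame takes as long as the slower of the two.
double frameTimeMs(double cpuBackendMs, double gpuMs) {
    assert(cpuBackendMs >= 0.0 && gpuMs >= 0.0);
    return std::max(cpuBackendMs, gpuMs);
}

// Frame-time saving from cutting backend CPU time by cpuSavedMs, e.g. by
// replacing many individual draw calls with one multi-draw.
double savingMs(double cpuBackendMs, double gpuMs, double cpuSavedMs) {
    const double newCpu = std::max(0.0, cpuBackendMs - cpuSavedMs);
    return frameTimeMs(cpuBackendMs, gpuMs) - frameTimeMs(newCpu, gpuMs);
}
```

With a GPU-bound frame (8 ms CPU, 12 ms GPU), saving 3 ms of backend time changes nothing; with a starving GPU (10 ms CPU, 4 ms GPU), the whole 3 ms shows up in the frame time.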

How do you use nSight to profile? I tried 'Capture frame', but it just gets stuck at 30%, eating CPU forever. I tried leaving it for a day, but found the laptop had rebooted by the evening.


Capture frame should just work. If it doesn't, I'm afraid I don't know how to help you with that. You could try the standalone version of nSight; it worked better for me than the one integrated into Visual Studio.

15 hours ago, duzenko said:

How do you use nSight to profile? I tried 'Capture frame' but it's just stuck on 30% eating cpu forever. I tried leaving it for a day but found the laptop rebooted in the evening.

You can also try RenderDoc; it has the benefit of being GPU- and OS-agnostic.

Handmade Hero coder's experience with RenderDoc

Edited by HMart