The Dark Mod Forums
cabalistic

Testers and reviewers wanted: BFG-style vertex cache


Good to know, but I'd rather understand the actual problem and fix that :)

Well, I think I sort of mentioned this before, but here goes...

r_postprocess used to be controlled primarily via game code and was more or less part of the Doom 3 mod API components.

Duzenko moved it to the renderer (where it belongs) but left a stub function on the game side to check the cvar state.

Because of this, unlike other native render components, r_postprocess has no startup and shutdown conditions in RenderSystem_Init. Likewise, r_useFBO seems to be missing these startup and shutdown definitions there. So my guess is that the true fix is to make both of them start and stop there like native render features.

I was hesitant to broach that since you already have some sort of wrapper approach to FBO management, and the above changes may duplicate your work in some ways.
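For illustration only, here is a minimal C++ sketch of what "start and stop like native render features" could mean: each feature registers explicit startup and shutdown hooks, the renderer's init starts the enabled ones, a per-frame check reacts to cvar changes, and shutdown stops everything in reverse order. Every name here (BoolCvar, RenderFeature, RenderFeatures, Init, CheckCvars, Shutdown) is hypothetical; this is not TDM's idCVar system or the actual RenderSystem_Init code.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an engine cvar: a value plus a "modified" flag
// that is set whenever the value actually changes.
struct BoolCvar {
    std::string name;
    bool value = false;
    bool modified = false;
    void Set(bool v) {
        if (v != value) { value = v; modified = true; }
    }
};

// Hypothetical render feature with explicit startup/shutdown hooks
// (e.g. what r_postprocess or r_useFBO could be wired to).
struct RenderFeature {
    BoolCvar* cvar;
    std::function<void()> startup;
    std::function<void()> shutdown;
    bool running = false;
};

class RenderFeatures {
public:
    void Register(RenderFeature f) { features_.push_back(std::move(f)); }

    // Called once from renderer init: start every feature whose cvar is on.
    void Init() {
        for (auto& f : features_) {
            f.cvar->modified = false;
            if (f.cvar->value) Start(f);
        }
    }

    // Called once per frame before rendering: a changed cvar always goes
    // through the same Start/Stop pair instead of flipping state ad hoc.
    void CheckCvars() {
        for (auto& f : features_) {
            if (!f.cvar->modified) continue;
            f.cvar->modified = false;
            if (f.cvar->value && !f.running) Start(f);
            else if (!f.cvar->value && f.running) Stop(f);
        }
    }

    // Called from renderer shutdown, in reverse registration order.
    void Shutdown() {
        for (auto it = features_.rbegin(); it != features_.rend(); ++it)
            if (it->running) Stop(*it);
    }

private:
    static void Start(RenderFeature& f) { f.startup(); f.running = true; }
    static void Stop(RenderFeature& f) {
        assert(f.running);
        f.shutdown();
        f.running = false;
    }
    std::vector<RenderFeature> features_;
};
```

The point of the design is that a feature's resources are only ever created in its startup hook and released in its shutdown hook, so toggling the cvar at runtime and restarting the renderer exercise the same code path.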


Please visit TDM's IndieDB site and help promote the mod:

http://www.indiedb.com/mods/the-dark-mod

(Yeah, shameless promotion... but traffic is traffic, folks...)


As I said before, don't wait for my FBO refactor. I'm not currently making progress on it (due to little time and having to hunt the remaining bugs in the vertex cache). And even when I complete it, it is entirely possible that we postpone it after 2.07...


What is our current renderer benchmark? I want to play with GPU-side interaction culling (potentially shadows as well), but on a humble Radeon 560 I am hitting the max fps cap everywhere in Rightful Property.

I'm talking about the driver/draw call/frontend limits, not the fillrate limit (although I have some ideas for that as well).


Amnesty for Bikerdude!


Don't waste your time. My GL4 branch has an experimental GPU occlusion culling feature. But even though it's currently culling too aggressively, there is no performance benefit at all. If we were strongly CPU-bound by the frontend, it could replace some of the manual culling. But since we are not, it doesn't look like anything that would help us.

Also, keep in mind that such a feature really doesn't belong in 2.07.


It might help on IGPs.

But I also want to know how we test renderer efficiency these days.



I can compile what I've done, but it doesn't work and I am out of ideas about why.

(Roughly, I feel I'm missing something in the order in which things are stopped and started, but I cannot identify the step that's out of order.)

I'll leave this to higher powers unless it's OK to use the toggle hack for now.




Well, it doesn't. I tried :)

Anyway, it's another feature that really needs GL4 shaders (because the older techniques for doing it are horribly inefficient in comparison). So now is really not the right time for that.


That, again, prompts the question of how you tested it.



By selecting a few spots in certain maps that bring down the framerate and comparing before and after. Ideally, by enabling r_logSmpTimings for short stretches at those spots to gain more insight into the effects on frontend and backend.

To see whether occlusion culling has positive effects, you'd obviously want a spot where the backend dominates (which is almost everywhere on my Intel HD device).

And no, this isn't the most scientific way to do it. But given the effort and scope of the changes needed to get occlusion culling working, you'd expect a significant return on investment. And there was nothing. (Again, in its current state it even culls too much, so there are a few black spots here and there. Still no benefit.)
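A small C++ helper in the spirit of this before/after method might look like the sketch below. It assumes you have already extracted per-frame frontend or backend times in milliseconds for one spot (for example from an r_logSmpTimings capture); it assumes nothing about that log's format. Summarize, RelativeChange and TimingSummary are hypothetical names, not TDM code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Summary of per-frame timings (milliseconds) captured at one test spot.
struct TimingSummary {
    double meanMs;
    double p95Ms;  // 95th percentile, nearest-rank method
};

TimingSummary Summarize(std::vector<double> samplesMs) {
    assert(!samplesMs.empty());
    std::sort(samplesMs.begin(), samplesMs.end());
    double sum = std::accumulate(samplesMs.begin(), samplesMs.end(), 0.0);
    std::size_t n = samplesMs.size();
    // Nearest-rank percentile: the ceil(0.95 * n)-th smallest sample.
    std::size_t rank = (95 * n + 99) / 100;
    return {sum / static_cast<double>(n), samplesMs[rank - 1]};
}

// Relative change of the mean; negative means the "after" build is faster.
double RelativeChange(const TimingSummary& before, const TimingSummary& after) {
    return (after.meanMs - before.meanMs) / before.meanMs;
}
```

Summarizing frontend and backend samples separately shows which stage a change actually moved, and the 95th-percentile value helps catch hitches that an average hides.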


Sorry, this build has no VR support at all. The latest VR build can be found here: https://github.com/fholger/thedarkmodvr/releases

However, it is still based on version 2.05 and won't work with 2.06 assets...

 

Ah, thank you for the clarification. So I should basically get hold of a copy of TDM 2.05 somehow (maybe also available on GitHub?) and just paste the contents of the VR build into that?


My Fan Missions:

Series:
Chronicles of Skulduggery 1: Pearls and Swine
Chronicles of Skulduggery 2: A Precarious Position
Chronicles of Skulduggery 3: Sacricide [WIP]

Standalone:
The Night of Reluctant Benefaction
Langhorne Lodge


The easiest way to get 2.05 is:

1) Download TDM 2.0 Full from ModDB:

https://www.moddb.com/mods/the-dark-mod/downloads/the-dark-mod-20-standalone-full-installer

2) Run tdm_updater and watch it as it applies each 2.0x-to-2.0y package.

3) Interrupt the updater after it has completed the 2.04-to-2.05 step.

You can also download the 2.0x-to-2.0y packages manually there:

https://www.moddb.com/mods/the-dark-mod/downloads


Hm, why do we have a r_useFenceSync variable now? What would be a good reason not to use fence sync if they are available? If you disable it, you'll have to go through a glFinish call, and you really, really, really don't want to do that, ever?!


This is on by default. I added it for testing purposes only.



I'm just concerned that we already have way too many cvars affecting rendering behaviour, and this could be confusing for some players experimenting with them. At the very least, you might want to add to the description that this should not be messed with.

 

May I ask what exactly you want to test without fence syncs?


Which OpenGL call is blocking the CPU, and whether some useful work could be done while TDM is waiting for the fence sync.
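One way to frame the "useful work while waiting" idea is to poll the fence without blocking and run queued CPU jobs until it signals, falling back to a blocking wait only when the queue runs dry. The C++ sketch below is hypothetical (Fence, WaitWithWork and the job queue are illustrative names, not TDM code); in real OpenGL code the poll could wrap glClientWaitSync with a zero timeout.

```cpp
#include <cassert>
#include <deque>
#include <functional>

// Hypothetical fence interface. In OpenGL, poll() might wrap
// glClientWaitSync(sync, 0, 0) and treat GL_ALREADY_SIGNALED or
// GL_CONDITION_SATISFIED as done. The commands must have been flushed
// (e.g. glFlush, or GL_SYNC_FLUSH_COMMANDS_BIT on an earlier wait),
// otherwise polling may never see the fence signal.
struct Fence {
    std::function<bool()> poll;  // non-blocking: has the GPU passed the fence?
    std::function<void()> wait;  // blocking wait until it has
};

// Runs queued CPU jobs one at a time while the fence is still pending.
// If the queue empties first, blocks on the fence. Returns jobs executed.
int WaitWithWork(const Fence& fence, std::deque<std::function<void()>>& jobs) {
    assert(fence.poll && fence.wait);
    int ran = 0;
    while (!fence.poll()) {
        if (jobs.empty()) {
            fence.wait();
            return ran;
        }
        jobs.front()();
        jobs.pop_front();
        ++ran;
    }
    return ran;
}
```

Whether this helps at all depends on there being independent CPU work to fill the gap; if the fence blocks because the GPU is simply behind, the frame rate is still set by the GPU.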



Basically my concern is this.

I have a scene rendering at 40 fps on my IGP setup.

On my setup the backend stage itself is rather short. Intel in general is good at processing OpenGL commands quickly.

So much of the time the CPU is sitting in the frontend. But not most of it.

Every other frame the fence sync pauses and waits.

So the frame time is mostly split between waiting for the frontend and waiting for the GPU sync, which the driver or OS might detect as a low-load situation.

What's bugging me, though, is that the scene is rather simple: just a staircase. Why can't it do 60 fps? Is it too many draw calls? Critical overdraw in the depth stage? Too much waiting, causing the CPU and/or GPU to go into power saving?

Where am I going with this? Probably nowhere. But maybe if I think about it long enough I can figure out a trick and get the simple scene to render at 60 fps.



The situation has a very simple answer: your GPU can't keep up.

 

What happens is that the backend submits OpenGL calls for one frame, then finishes that frame. The OpenGL driver, being asynchronous, accepts those requests and sends them to the GPU, which will try its best to render the commands. But the asynchronous nature means that, in principle, the CPU is free to do other stuff. In particular, you can start submitting draw calls for the second frame.

 

However, at the end of the second frame, the triple-buffered vertex cache needs to switch back to the first buffer. And it can only do that safely if the GPU is finished rendering from it. This is what the fence sync enforces. If the fence sync blocks, then it means that the GPU is lagging more than a full frame behind. The fence sync itself is innocent, it's just that there are too many draw commands in the driver's queue and the GPU wasn't fast enough to process them.

 

Imho, in this situation the only sensible thing to do is really to wait for the GPU to catch up. You don't want to run farther away from the GPU because that would just increase the discrepancy further and further. If you want to improve framerates in these situations, you should not look at the CPU, but at shaders (or rather, at the number and complexity of draw calls issued).
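To make the argument concrete, here is a toy C++ model (not TDM code) of a CPU filling a ring of buffers and a GPU consuming them. It assumes constant per-frame CPU and GPU costs and ignores driver queue limits. Frame f reuses slot f % numBuffers, so before writing it the CPU must wait on the fence of frame f - numBuffers. SimulateRing and SimResult are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct SimResult {
    double totalCpuStallMs;    // time the CPU spent blocked on buffer fences
    double lastGpuFrameEndMs;  // when the GPU finished the final frame
};

// The CPU writes frame f into ring slot f % numBuffers; it may only start
// once the GPU has finished frame f - numBuffers (the fence wait).
// The GPU renders frames in order, each as soon as it has been submitted.
SimResult SimulateRing(double cpuMs, double gpuMs, int numBuffers, int frames) {
    assert(numBuffers >= 1 && frames >= 1);
    std::vector<double> gpuEnd(frames);
    double cpuEnd = 0.0;
    double stall = 0.0;
    for (int f = 0; f < frames; ++f) {
        double cpuStart = cpuEnd;
        if (f >= numBuffers)
            cpuStart = std::max(cpuStart, gpuEnd[f - numBuffers]);  // fence wait
        stall += cpuStart - cpuEnd;
        cpuEnd = cpuStart + cpuMs;
        double gpuStart = (f == 0) ? cpuEnd : std::max(cpuEnd, gpuEnd[f - 1]);
        gpuEnd[f] = gpuStart + gpuMs;
    }
    return {stall, gpuEnd[frames - 1]};
}
```

With cpuMs = 5, gpuMs = 10 and two buffers, the CPU blocks for 5 ms every frame once the ring has filled, which is exactly the gap between GPU and CPU cost (20 ms over 6 frames). A third buffer only delays when the stalls start (10 ms over the same 6 frames, then 5 ms per frame again); the GPU finishes the sixth frame at 65 ms either way. Only a cheaper GPU workload removes the stall.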


By the way, this is a perfect situation for a graphical profiler and debugger. You can't use Nsight with an Intel GPU, obviously, but there is an Intel equivalent: https://software.intel.com/en-us/gpa

I have no experience with it, but you might want to give it a try to gain insight into why that scene is slow on your GPU.


Well, one thing you could do with this cvar is see whether fence sync behaves poorly on some drivers, or test the strange case where the system has multiple cores and modern GL support but no fence sync extension, and see whether com_smp offers any improvement in spite of the glFinish operations.

Mostly out of morbid curiosity, I guess.


com_smp no longer relies on the fence sync, because the frontend no longer accesses OpenGL functions. Still, fence syncs are core in GL 3.2, and even the old-generation Intel cards that only go up to GL 3.1 support them via extension. So the case you describe just doesn't exist :)

And there's no way the fence sync could be implemented so poorly that it is worse than glFinish :D
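The availability argument boils down to a small check: sync objects (glFenceSync, glClientWaitSync) are core since OpenGL 3.2 and are otherwise exposed through the GL_ARB_sync extension. The C++ sketch below is hypothetical (HasFenceSync is not a TDM function) and takes the version numbers and a space-separated extension list as plain inputs; in a core profile that list would have to be assembled with glGetStringi rather than glGetString(GL_EXTENSIONS).

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if sync objects are available: either the context is
// OpenGL 3.2 or newer (core), or the driver advertises GL_ARB_sync.
// `extensions` is a space-separated list of extension names.
bool HasFenceSync(int major, int minor, const std::string& extensions) {
    assert(major >= 1 && minor >= 0);
    if (major > 3 || (major == 3 && minor >= 2)) return true;
    std::istringstream in(extensions);
    std::string ext;
    while (in >> ext)
        if (ext == "GL_ARB_sync") return true;  // exact name match only
    return false;
}
```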


That was the first thing I tried two years ago.

It only profiles DirectX on Windows!

In the case above it turned out to be low stencil fillrate, which is an actual problem on Intel.

I can't help but want to add an option to render shadows at half the main resolution.
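Assuming "half of main resolution" means half the width and half the height, a full-screen stencil pass would touch a quarter of the pixels. A trivial C++ helper (hypothetical, not from TDM) makes the arithmetic explicit; it ignores the extra cost of upsampling or compositing the half-resolution result.

```cpp
#include <cassert>
#include <cstdint>

// Pixels covered by one full-screen stencil pass when the shadow buffer is
// scaled by scaleNum/scaleDen along each axis (sizes rounded up).
std::int64_t StencilPixels(int width, int height, int scaleNum, int scaleDen) {
    assert(width > 0 && height > 0 && scaleNum > 0 && scaleDen > 0);
    std::int64_t w = (static_cast<std::int64_t>(width) * scaleNum + scaleDen - 1) / scaleDen;
    std::int64_t h = (static_cast<std::int64_t>(height) * scaleNum + scaleDen - 1) / scaleDen;
    return w * h;
}
```

At 1920x1080 that is 2,073,600 pixels per full-screen pass at full resolution versus 518,400 at half resolution.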



I've now tested the latest vertex cache beta with a shiny new Intel i3 8300 + Intel HD 630 iGPU + 2x4 GB Kingston Fury DDR4 2400 (a low-to-medium config use case).

It's working really well at 1920x1080 with maxed-out quality (no FSAA), with FBO and soft shadows disabled!

 

[Screenshot: The_Dark_Modx64_2018_07_27_12_05_15_950]

Edited by lowenz

Task is not so much to see what no one has yet seen but to think what nobody has yet thought about that which everybody see. - E.S.


Let me know if you need some Intel iGPU testing.


