cabalistic
Posts: 1579
Days Won: 47
Posts posted by cabalistic
-
Ah, I can reproduce the sky flickering, at least. Hm, I'll look into it. Thanks for spotting it...
-
@HMart: Thanks. Unfortunately, I still can't reproduce it.
In any case, I've prepared a new release which should fix the lightgem; at least it looks good to me. I also slightly changed the way I update the buffers; perhaps that helps with the flickering. Please give it a try and let me know.
https://github.com/fholger/thedarkmod/releases/download/cache_v2/TheDarkMod_VertexCache_v2.zip
-
No, my Intel is fine in this build.
Unless this is coupled to some specific settings. Perhaps you could attach your cfg?
-
That's super strange. Flickering is usually a sign that the buffers are written to while they are still rendering. But this really shouldn't be happening, and I've already made them more restrictive than Doom3 BFG is... I have a couple more ideas I can try, I will probably need you to test two specialized builds when I'm ready.
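To illustrate the invariant I'm talking about: dynamic buffers are rotated across a few frames so the frontend never writes into a set the GPU may still be reading from. This is only a rough CPU-side sketch of the idea (all names, the set count, and the fence simulation are made up, not TDM's actual code):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: NUM_FRAME_DATA buffer sets are cycled so the frontend
// never writes into a buffer the GPU may still be reading. In real code a GL
// fence per set would guard reuse; here a bool simulates it.
constexpr int NUM_FRAME_DATA = 3;

struct FrameBufferSet {
    size_t bytesWritten = 0;   // stand-in for the mapped vertex/index buffers
    bool   inFlightOnGpu = false;
};

class VertexCacheSketch {
public:
    // Called once per frame before the frontend starts writing.
    void BeginFrame() {
        current = (current + 1) % NUM_FRAME_DATA;
        // Real code would wait on this set's fence here; we assert the invariant.
        assert(!sets[current].inFlightOnGpu);
        sets[current].bytesWritten = 0;
    }
    // The frontend appends data into the current set only.
    void Write(size_t bytes) { sets[current].bytesWritten += bytes; }
    // The backend submits the current set; it stays busy until the GPU signals
    // its fence roughly NUM_FRAME_DATA - 1 frames later (simulated here).
    void Submit() {
        sets[current].inFlightOnGpu = true;
        sets[(current + 1) % NUM_FRAME_DATA].inFlightOnGpu = false;
    }
    int CurrentIndex() const { return current; }
private:
    FrameBufferSet sets[NUM_FRAME_DATA];
    int current = NUM_FRAME_DATA - 1;  // so the first BeginFrame lands on 0
};
```

Flickering would correspond to `Write` touching a set whose `inFlightOnGpu` flag is still true, which is exactly what the rotation is supposed to prevent.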
-
See no reason why it shouldn't. If the 2.06 beta works...
-
@nbohr1more: What kind of graphic artifacts? Are they restricted to the lightgem? Do they go away if you disable lightgem rendering with tdm_lg_weak 1?
After thinking about it, the lightgems are probably broken. I'll try to refactor them to fit in the backend/frontend split.
-
Thanks HMart. Is the menu flickering reproducible, i.e. does it happen on every start and go away after mission load?
-
Thanks for the report lowenz! I'll take a closer look into the lightgem. The lightgem is a little awkward because it renders between each frame and thus "violates" the frontend/backend split. So that might have some unintended consequences here.
Shame that the performance dropped, even if just a little. Just for the record, can you tell me your PC specs, and also roughly what graphics settings you are using?
-
@grayman: Fair enough. This is certainly aimed at post-2.06.
@lowenz: Probably none. I included it just to be safe.
-
- Popular Post
As part of my work on a VR port, I have been spending some time looking at the Doom3 BFG edition code for opportunities to improve performance. A core concept in BFG is a vastly different vertex cache, which I've been working on porting to TDM for a while. I believe it is a worthwhile change and will make porting additional improvements from BFG much easier in the future.
I finally think my port is ready and would like to merge it, but given the scope of the change, I'm looking for:
- testers to rule out any unforeseen problems with the change and
- reviewers from the team to actually approve or disapprove the change.
Help with testing
For the testers, I've prepared a 64bit Windows test build which can be downloaded here: https://github.com/fholger/thedarkmod/releases/download/cache_v3/TheDarkMod_VertexCache_v3.zip
It requires 2.06 beta (or trunk) assets to run and should be compared to the current beta build. I'm specifically looking for:
- problems of any kind (particularly rendering artefacts or issues with custom maps) that are not present on the current beta build
- performance changes. From my own testing on a couple of different machines and a few select scenes, I found that performance stays roughly the same with a minor framerate improvement here or there. However, duzenko reported a loss of performance, so I'm looking for a wider assessment of whether this change hurts or helps.
Help with reviewing the changes
You can review the proposed changes here: https://github.com/fholger/thedarkmod/pull/3/files
The change is based on the BFG source code, but I refactored it because the original implementation had a lot of code duplication and functionality that we don't need.
My motivations for changing the vertex cache are as follows:
- It allows us to get rid of the secondary shared GL context for the frontend. Less syncing overhead that way.
- It allows for further parallelization of the frontend. I've already experimented with this a bit, and I can indeed cut off more frontend drawing time, although the effects are currently limited in most cases because the backend then becomes the bottleneck.
- My findings suggest a minor performance improvement from the new implementation
- Although not a strict requirement, this change should make it easier to port further improvements from BFG, in particular GPU skinning, which would reduce the amount of vertex cache needed and thus further reduce unnecessary memory syncing between CPU and GPU.
There is, however, a potential drawback:
- The new approach always allocates fixed-size buffers on the GPU, which have to be sufficiently large to accommodate all maps and scenes. This means that, on average, GPU memory consumption is higher. At the same time, it also places a hard upper limit on the amount of static vertex data that can be used in maps, whereas the current implementation is more flexible. Changing the upper limit requires a code change.
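To make the drawback concrete, the fixed-capacity per-frame allocation works roughly like this (a hedged sketch: the struct, capacity constant, and alignment are illustrative, not the actual BFG/TDM identifiers). One large buffer is sub-allocated with an atomic bump pointer, which is what makes lock-free allocation from multiple frontend jobs possible, but also what makes the capacity a hard compile-time limit:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed capacity; raising it requires recompiling.
constexpr size_t FRAME_VERTEX_MEMORY = 31 * 1024 * 1024;

struct FrameAlloc {
    std::atomic<size_t> used{0};

    // Returns the byte offset into the fixed GPU buffer,
    // or SIZE_MAX if the fixed capacity is exceeded.
    size_t Alloc(size_t bytes) {
        size_t alignedBytes = (bytes + 15) & ~size_t(15);  // 16-byte align
        size_t offset = used.fetch_add(alignedBytes);
        if (offset + alignedBytes > FRAME_VERTEX_MEMORY) {
            return SIZE_MAX;  // out of fixed-size buffer space
        }
        return offset;
    }
};
```

A map whose static data exceeds `FRAME_VERTEX_MEMORY` simply fails to allocate, whereas the current growable implementation would just take more memory.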
Finally, two open points in the review:
- I ported a function idInteraction::CreateStaticInteractions, which is supposed to offload static interactions into the static buffers so they don't have to be (re-)generated every frame when they are needed. It works in principle; however, when loading a savegame, these interactions stop working for some inexplicable reason. Therefore, this function is not currently called; performance-wise it didn't really make a difference, but I'm still investigating how I can get it to work with savegames.
- There is also some ported code that is supposed to create static shadow caches for models. This is commented out right now, because it just doesn't work.
In any case, I'm looking for team member opinions on whether we want this change or not.
Thanks to everyone for helping out!
-
I believe we could implement MSAA for the FBO. It would be cheaper than supersampling and could respect the AA setting.
-
- Popular Post
In the meantime, I finally took the time to port my actual VR code from the 2.05 codebase to 2.06. Performance is about the same, as far as I can tell. Not great, because it will cause reprojection at one point or another in most maps, but at least some lighter maps might be very playable.
I also experimented with changing the vertexcache buffers to persistent buffers, i.e. buffers you don't have to unmap to draw from, which eliminates driver syncing overhead. It doesn't improve performance per se, but with the syncing overhead of glUnmapBuffer gone, I can now actually reap some benefits from parallelizing the frontend. Nothing spectacular, but for VR, every bit counts.
Of course, persistent buffers are an OpenGL 4.4 feature, so this probably won't find its way back into trunk.
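For the curious, the benefit of persistent mapping can be sketched like this: with glBufferStorage and GL_MAP_PERSISTENT_BIT the mapped pointer stays valid across frames, so there is no per-frame glMapBufferRange/glUnmapBuffer pair for the driver to synchronize on. In this CPU-only stand-in (all names illustrative, not my actual code), a plain vector plays the role of the permanently mapped pointer, and the comment marks where a GL fence would guard reuse:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Simplified ring allocator over a "persistently mapped" region.
class PersistentRingBuffer {
public:
    explicit PersistentRingBuffer(size_t size) : storage(size) {}

    // Copy data into the mapped memory; wraps when the end is reached.
    // Returns the byte offset a draw call would use.
    size_t Append(const void* data, size_t bytes) {
        if (head + bytes > storage.size()) {
            head = 0;  // wrap; real code would wait on a fence here
        }
        size_t offset = head;
        std::memcpy(storage.data() + offset, data, bytes);
        head += bytes;
        return offset;
    }

private:
    std::vector<unsigned char> storage;  // stand-in for the mapped pointer
    size_t head = 0;
};
```

Since the pointer is never unmapped, the frontend threads can write into it directly, which is where the parallelization gains come from.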
-
Afaik, uncapped fps sets com_fixedTic, correct? From my understanding, com_fixedTic does indeed result in one game tic per rendered frame, which is undesirable. When I originally wrote the multi-threading patch, I did the fps uncapping in a different way, ensuring that game tics run at a constant 60 Hz and only rendering is uncapped. This is now also the behaviour you get when you enable both com_smp and com_fixedTic, and with both enabled I have not had any trouble at higher fps. Without com_smp, though, com_fixedTic is, imho, broken.
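The decoupling I described is essentially a standard fixed-timestep loop (this is only an illustrative sketch, not the actual patch): game tics advance at a fixed 60 Hz no matter how many frames are rendered, so uncapping the framerate never changes simulation speed.

```cpp
// Game tics run at a fixed rate; rendering is uncapped.
constexpr double TIC_RATE_HZ = 60.0;
constexpr double TIC_DT = 1.0 / TIC_RATE_HZ;

struct GameLoopSketch {
    double accumulator = 0.0;
    long   ticsRun = 0;

    // Called once per rendered frame with the real elapsed time.
    // Returns how many fixed tics were executed for this frame.
    int Frame(double frameSeconds) {
        accumulator += frameSeconds;
        int tics = 0;
        while (accumulator >= TIC_DT) {
            accumulator -= TIC_DT;
            ++ticsRun;  // RunGameTic() would go here
            ++tics;
        }
        return tics;  // rendering then shows the latest game state
    }
};
```

At 240 fps, for instance, only every fourth frame runs a tic; at 30 fps, each frame runs two. Plain com_fixedTic instead runs one tic per frame, which ties game speed to the framerate.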
-
So I just tested this with the two devices I have with Intel graphics. One is my work notebook (technically an Optimus hybrid of an onboard Intel 530 and a GTX 960M), the other is my GPD Win (Intel Atom with onboard graphics).
First of all, I cannot reproduce any framerate drops in Rightful Property. The GPD Win goes from 19-20 fps to 21-23 fps. The notebook stays just under 50 fps in both cases. Looking at the logs (r_logSmpTimings), I can see a fairly significant reduction in frontend drawing duration on the notebook, which does not increase the frame rate because the backend stays the same and limits it.
Unfortunately, the notebook also shows some serious graphical artifacts with the vertexcache version. I tried different versions of the Intel driver to no avail. The GPD Win is fine. The Nvidia card on the notebook is also fine.
You did not see any visual artifacts, right? They were fairly noticeable, so you'd probably have mentioned them. As for the performance, can you give me more information about the graphics settings you are using? Resolution, AA etc., as well as shadow maps or stencil... And which build are you using, release x86 or x64?
I'm not sure where to go from here. If the changes cause performance drops or even visual artifacts on some machines, then I'm not sure we want to apply them. But I don't really have a clue how to fix either.
-
Interesting - and disappointing. Thanks for testing. Was this on Rightful Property? What settings did you use for shadows etc.? There's also a commented line in RenderWorld.cpp:1498 that you might try to enable. It pregenerates some of the static interactions and might take some load off the system, but it causes problems with savegames that I haven't yet tracked down.
As for the error, I'll have to look into that. Might be some missing cleanups, or just an oversight with the new interactionTable.
-
Well, here's my attempt at parallelizing R_AddModelSurfaces. It works fine, as far as I can tell. Just doesn't help, because the backend (or GL) will still block.
https://github.com/fholger/thedarkmod/pull/2/commits/87880984c894fd2dd68ba87e8bf34785ff5eaf1c
-
Some more careful timing analysis shows that the frontend time does indeed go down in the multithreading scenario. However, the time between two frames (i.e. after backend and frontend finish and until they start again) suddenly increases and eats up any time savings.
I investigated more closely, and what suddenly takes a lot more time is unmapping the vertex caches. If I move that inside the backend, then some other operations still seem to block. My current hypothesis is that even though the backend seemingly finishes early, the GL backend is actually still working. And the next (blocking) GL call thus has to wait. So my seemingly amazing savings after commenting out glGetError might just have been hiding the true backend costs, because that way the backend was waiting for frontend instead of GL...
At this point it's still an open question whether I can turn the multithreading savings into something tangible.
-
Actually, never mind. The scenes I was just testing, which were strongly frontend-limited yesterday, are suddenly backend-limited, so that's why I see no improvements. Something strange is going on with those backend timings, I don't get it.
Edit: So it seems Afterburner is another thing that somehow eats time on the backend. Very weird...
-
So, I've got a puzzle for you. I experimented some more with parallelization and made some experimental changes to call AddActiveInteraction in R_AddModelSurfaces in parallel or serially depending on a cvar switch. In each case I timed how long it takes. I can clearly see the times go down with the parallel version, yet the time the frontend takes in total hardly changes at all. Goes down slightly in some maps, actually seems to rise in others. Really don't get it.
-
Thanks Diego. I'm afraid a truly playable version is still far away, but I'll do my best. If you still want to try and get it to work, remember you'll need SteamVR running (no native Oculus support atm) and pay attention to the readme and the autoexec.cfg entry mentioned there.
@duzenko: I tested briefly with the interaction tri culling, but for me it does not seem to make a discernible difference. It didn't show up in my profiling before, either. It's curious that there's such a difference in what requires processing time...
Also, I encountered a couple of crashes that seem to be related to shadow mapping. I guess I'll need to double-check that the vertexcache changes don't break shadow mapping.
-
Which cvar is interaction tri culling? Shadow maps don't seem to make much of a difference to the frame rate.
I'm currently testing with Rightful Property, Briarwood Mansion and A New Job.
-
In any case, we are now back to the frontend being the bottleneck, which is good. I did some experiments today trying to parallelize R_AddModelSurfaces (which in my profiling is the largest time consumer). However, all my attempts have led to exactly zero performance improvements. I don't know why; might be cache congestion or generally memory access. I read that one of BFG's design changes was to store less in memory and (re-)compute it when needed. I guess we have a lot of work ahead of us...
-
HOLY SHIT!
I just found the most ridiculous bottleneck in the backend, thanks to nSight!
It's glGetError. I'm serious. All those GL_CheckErrors() calls are incredibly costly. I just commented out the entire implementation of GL_CheckErrors, and in one of my go-to bottleneck scenes, backend rendering dropped from 8 ms down to 2 ms, as reported by r_logSmpTimings!
Didn't increase the framerate, because the frontend is still blocking, but now parallelizing the frontend is actually going to be worthwhile.
We should probably hide the implementation of GL_CheckErrors behind either a cvar, or a compiler flag...
This is absurd. I wonder if this is an Nvidia issue? Would be interested to hear if it has a similar impact with your Intel chip.
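One possible shape for the gating (purely a sketch; the cvar name, the stub, and the compile flag are all made up for illustration): a runtime switch skips the glGetError round-trip entirely, and a compile-time flag can remove the check from release builds altogether.

```cpp
#include <cstdio>

// Runtime switch; in TDM this would be a cvar rather than a plain bool.
bool r_checkGLErrors = false;
// Instrumentation for the sketch: counts driver queries actually made.
int checkCalls = 0;

// Stand-in for the real glGetError (which needs a GL context).
unsigned glGetErrorStub() {
    ++checkCalls;
    return 0;  // GL_NO_ERROR
}

void GL_CheckErrors() {
#if !defined(NO_GL_ERROR_CHECKS)  // compile-time opt-out for release builds
    if (!r_checkGLErrors) {
        return;  // runtime opt-out: the driver is never queried
    }
    // Drain all pending errors, as the real implementation does.
    while (unsigned err = glGetErrorStub()) {
        std::fprintf(stderr, "GL error: 0x%x\n", err);
    }
#endif
}
```

The key point is that the early return happens before any GL call, so the expensive driver synchronization never occurs unless explicitly requested.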
-
So, Afterburner says ~50% (or less) GPU usage in Rightful Property. To be fair, though, I have a reasonably powerful GPU (GTX 1070) and fairly low settings. Profiling does not reveal any obvious optimization potential in the backend path. The significant portion of CPU time is already taken by the Nvidia GL driver, which means that we would have to optimize/reduce the number of GL calls to get any significant gain.
I'm not entirely certain I understand what you are suggesting, but I think BFG does indeed do something like that, so it might be worth a try.
Testers and reviewers wanted: BFG-style vertex cache (in The Dark Mod)
@lowenz: Great! Btw, am I mistaken, or is the performance now a bit better with VC?
@HMart: You were right on the money, it has something to do with fog lights. Setting r_skipFogLights to 1 "resolves" the problem. I'm looking into why this isn't working.
By the way, do you have the latest graphics drivers installed?