cabalistic
Posts: 1579
Days Won: 47
Posts posted by cabalistic
-
Ah, I can reproduce the sky flickering, at least. Hm, I'll look into it. Thanks for spotting it...
-
@HMart: Thanks. Unfortunately, I still can't reproduce it.
In any case, I've prepared a new release which should fix the lightgem; at least it looks good to me. I also slightly changed the way I update the buffers; perhaps that helps with the flickering. Please give it a try and let me know.
https://github.com/fholger/thedarkmod/releases/download/cache_v2/TheDarkMod_VertexCache_v2.zip
-
No, my Intel is fine in this build.
Unless this is coupled to some specific settings. Perhaps you could attach your cfg?
-
That's super strange. Flickering is usually a sign that the buffers are written to while they are still rendering. But this really shouldn't be happening, and I've already made them more restrictive than Doom3 BFG is... I have a couple more ideas I can try, I will probably need you to test two specialized builds when I'm ready.
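To illustrate the invariant I'm talking about: dynamic buffers are rotated across a few frames so the frontend never writes into a set the GPU may still be reading from. This is only a rough CPU-side sketch of the idea (all names, the set count, and the fence simulation are made up, not TDM's actual code):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: NUM_FRAME_DATA buffer sets are cycled so the frontend
// never writes into a buffer the GPU may still be reading. In real code a GL
// fence per set would guard reuse; here a bool simulates it.
constexpr int NUM_FRAME_DATA = 3;

struct FrameBufferSet {
    size_t bytesWritten = 0;   // stand-in for the mapped vertex/index buffers
    bool   inFlightOnGpu = false;
};

class VertexCacheSketch {
public:
    // Called once per frame before the frontend starts writing.
    void BeginFrame() {
        current = (current + 1) % NUM_FRAME_DATA;
        // Real code would wait on this set's fence here; we assert the invariant.
        assert(!sets[current].inFlightOnGpu);
        sets[current].bytesWritten = 0;
    }
    // The frontend appends data into the current set only.
    void Write(size_t bytes) { sets[current].bytesWritten += bytes; }
    // The backend submits the current set; it stays busy until the GPU signals
    // its fence roughly NUM_FRAME_DATA - 1 frames later (simulated here).
    void Submit() {
        sets[current].inFlightOnGpu = true;
        sets[(current + 1) % NUM_FRAME_DATA].inFlightOnGpu = false;
    }
    int CurrentIndex() const { return current; }
private:
    FrameBufferSet sets[NUM_FRAME_DATA];
    int current = NUM_FRAME_DATA - 1;  // so the first BeginFrame lands on 0
};
```

Flickering would correspond to `Write` touching a set whose `inFlightOnGpu` flag is still true, which is exactly what the rotation is supposed to prevent.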
-
See no reason why it shouldn't. If the 2.06 beta works...
-
@nbohr1more: What kind of graphic artifacts? Are they restricted to the lightgem? Do they go away if you disable lightgem rendering with tdm_lg_weak 1?
After thinking about it, the lightgems are probably broken. I'll try to refactor them to fit in the backend/frontend split.
-
Thanks HMart. Is the menu flickering reproducible, i.e. does it happen on every start and go away after mission load?
-
Thanks for the report lowenz! I'll take a closer look into the lightgem. The lightgem is a little awkward because it renders between each frame and thus "violates" the frontend/backend split. So that might have some unintended consequences here.
Shame that the performance dropped, even if just a little. Just for the record, can you tell me your PC specs, and also roughly what graphics settings you are using?
-
@grayman: Fair enough. This is certainly aimed at post-2.06.
@lowenz: Probably none. I included it just to be safe.
-
- Popular Post
As part of my work on a VR port, I have been spending some time looking at the Doom3 BFG edition code for opportunities to improve performance. A core concept in BFG is a vastly different vertex cache, which I've been working on porting to TDM for a while. I believe it is a worthwhile change and will make porting additional improvements from BFG much easier in the future.
I finally think my port is ready and would like to merge it, but given the scope of the change, I'm looking for:
- testers to rule out any unforeseen problems with the change and
- reviewers from the team to actually approve or disapprove the change.
Help with testing
For the testers, I've prepared a 64bit Windows test build which can be downloaded here: https://github.com/fholger/thedarkmod/releases/download/cache_v3/TheDarkMod_VertexCache_v3.zip
It requires 2.06 beta (or trunk) assets to run and should be compared to the current beta build. I'm specifically looking for:
- problems of any kind (particularly rendering artefacts or issues with custom maps) that are not present on the current beta build
- performance changes. From my own testing on a couple of different machines and a few select scenes, I found that performance stays roughly the same with a minor framerate improvement here or there. However, duzenko reported a loss of performance, so I'm looking for a wider assessment of whether this change hurts or helps.
Help with reviewing the changes
You can review the proposed changes here: https://github.com/fholger/thedarkmod/pull/3/files
The change is based on the BFG source code, but I refactored it because the original implementation had a lot of code duplication and functionality that we don't need.
My motivations for changing the vertex cache are as follows:
- It allows us to get rid of the secondary shared GL context for the frontend. Less syncing overhead that way.
- It allows for further parallelization of the frontend. I've already experimented with this a bit, and I can indeed cut off more frontend drawing time, although the effects are currently limited in most cases because the backend then becomes the bottleneck.
- My findings suggest a minor performance improvement from the new implementation
- Although not a strict requirement, this change should make it easier to port further improvements from BFG, in particular GPU skinning, which would reduce the amount of vertex cache needed and thus further reduce unnecessary memory syncing between CPU and GPU.
There is, however, a potential drawback:
- The new approach always allocates fixed-size buffers on the GPU, which have to be sufficiently large to accommodate all maps and scenes. This means that, on average, GPU memory consumption is higher. At the same time, it also places a hard upper limit on the amount of static vertex data that can be used in maps, whereas the current implementation is more flexible. Changing the upper limit requires a code change.
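To make the drawback concrete, the fixed-capacity per-frame allocation works roughly like this (a hedged sketch: the struct, capacity constant, and alignment are illustrative, not the actual BFG/TDM identifiers). One large buffer is sub-allocated with an atomic bump pointer, which is what makes lock-free allocation from multiple frontend jobs possible, but also what makes the capacity a hard compile-time limit:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed capacity; raising it requires recompiling.
constexpr size_t FRAME_VERTEX_MEMORY = 31 * 1024 * 1024;

struct FrameAlloc {
    std::atomic<size_t> used{0};

    // Returns the byte offset into the fixed GPU buffer,
    // or SIZE_MAX if the fixed capacity is exceeded.
    size_t Alloc(size_t bytes) {
        size_t alignedBytes = (bytes + 15) & ~size_t(15);  // 16-byte align
        size_t offset = used.fetch_add(alignedBytes);
        if (offset + alignedBytes > FRAME_VERTEX_MEMORY) {
            return SIZE_MAX;  // out of fixed-size buffer space
        }
        return offset;
    }
};
```

A map whose static data exceeds `FRAME_VERTEX_MEMORY` simply fails to allocate, whereas the current growable implementation would just take more memory.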
Finally, two open points in the review:
- I ported a function idInteraction::CreateStaticInteractions, which is supposed to offload static interactions into the static buffers so they don't have to be (re-)generated every frame when they are needed. It works in principle; however, when loading a savegame, these interactions stop working for some inexplicable reason. Therefore, this function is not currently called; performance-wise it didn't really make a difference, but I'm still investigating how I can get it to work with savegames.
- There is also some ported code that is supposed to create static shadow caches for models. This is commented out right now, because it just doesn't work.
In any case, I'm looking for team member opinions on whether we want this change or not.
Thanks to everyone for helping out!
-
I believe we could implement MSAA for the FBO. It would be cheaper than supersampling and could respect the AA setting.
-
- Popular Post
In the meantime, I finally took the time to port my actual VR code from the 2.05 codebase to 2.06. Performance is about the same, as far as I can tell. Not great, because it will cause reprojection at one point or another in most maps, but at least some lighter maps might be very playable.
I also experimented with changing the vertexcache buffers to persistent buffers, i.e. buffers you don't have to unmap to draw from, which eliminates driver syncing overhead. It doesn't improve performance per se, but with the syncing overhead of glUnmapBuffer gone, I can now actually reap some benefits from parallelizing the frontend. Nothing spectacular, but for VR, every bit counts.
Of course, persistent buffers are an OpenGL 4.4 feature, so this probably won't find its way back into trunk.
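For the curious, the benefit of persistent mapping can be sketched like this: with glBufferStorage and GL_MAP_PERSISTENT_BIT the mapped pointer stays valid across frames, so there is no per-frame glMapBufferRange/glUnmapBuffer pair for the driver to synchronize on. In this CPU-only stand-in (all names illustrative, not my actual code), a plain vector plays the role of the permanently mapped pointer, and the comment marks where a GL fence would guard reuse:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Simplified ring allocator over a "persistently mapped" region.
class PersistentRingBuffer {
public:
    explicit PersistentRingBuffer(size_t size) : storage(size) {}

    // Copy data into the mapped memory; wraps when the end is reached.
    // Returns the byte offset a draw call would use.
    size_t Append(const void* data, size_t bytes) {
        if (head + bytes > storage.size()) {
            head = 0;  // wrap; real code would wait on a fence here
        }
        size_t offset = head;
        std::memcpy(storage.data() + offset, data, bytes);
        head += bytes;
        return offset;
    }

private:
    std::vector<unsigned char> storage;  // stand-in for the mapped pointer
    size_t head = 0;
};
```

Since the pointer is never unmapped, the frontend threads can write into it directly, which is where the parallelization gains come from.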
-
Afaik, uncapped fps sets com_fixedTic, correct? From my understanding, com_fixedTic does indeed result in one game tic per rendered frame, which is undesirable. When I originally wrote the multi-threading patch, I did the fps uncapping in a different way, ensuring that game tics run at a constant 60 Hz and only rendering is uncapped. This is now also the behaviour you get when you enable both com_smp and com_fixedTic, and with both enabled I have not had any trouble at higher fps. Without com_smp, though, com_fixedTic is, imho, broken.
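The decoupling I described is essentially a standard fixed-timestep loop (this is only an illustrative sketch, not the actual patch): game tics advance at a fixed 60 Hz no matter how many frames are rendered, so uncapping the framerate never changes simulation speed.

```cpp
// Game tics run at a fixed rate; rendering is uncapped.
constexpr double TIC_RATE_HZ = 60.0;
constexpr double TIC_DT = 1.0 / TIC_RATE_HZ;

struct GameLoopSketch {
    double accumulator = 0.0;
    long   ticsRun = 0;

    // Called once per rendered frame with the real elapsed time.
    // Returns how many fixed tics were executed for this frame.
    int Frame(double frameSeconds) {
        accumulator += frameSeconds;
        int tics = 0;
        while (accumulator >= TIC_DT) {
            accumulator -= TIC_DT;
            ++ticsRun;  // RunGameTic() would go here
            ++tics;
        }
        return tics;  // rendering then shows the latest game state
    }
};
```

At 240 fps, for instance, only every fourth frame runs a tic; at 30 fps, each frame runs two. Plain com_fixedTic instead runs one tic per frame, which ties game speed to the framerate.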
-
So I just tested this with the two devices I have with Intel graphics. One is my work notebook (technically an Optimus hybrid of an onboard Intel 530 and a GTX 960M), the other is my GPD Win (Intel Atom with onboard graphics).
First of all, I cannot reproduce any framerate drops in Rightful Property. The GPD Win goes from 19-20 fps to 21-23 fps. The notebook stays just under 50 fps in both cases. Looking at the logs (r_logSmpTimings), I can see a fairly significant reduction in frontend drawing duration on the notebook, which does not increase the frame rate because the backend stays the same and limits it.
Unfortunately, the notebook also shows some serious graphical artifacts with the vertexcache version. I tried different versions of the Intel driver to no avail. The GPD Win is fine. The Nvidia card on the notebook is also fine.
You did not see any visual artifacts, right? They were fairly noticeable, so you'd probably have mentioned them. As for the performance, can you give me more information about the graphics settings you are using? Resolution, AA etc., as well as shadow maps or stencil... And which build are you using, release x86 or x64?
I'm not sure where to go from here. If the changes cause performance drops or even visual artifacts on some machines, then I'm not sure we want to apply them. But I don't really have a clue how to fix either.
-
Interesting - and disappointing. Thanks for testing. Was this on Rightful Property? What settings did you use for shadows etc.? There's also a commented line in RenderWorld.cpp:1498 that you might try to enable. It pregenerates some of the static interactions and might take some load off the system, but it causes problems with savegames that I haven't yet tracked down.
As for the error, I'll have to look into that. Might be some missing cleanups, or just an oversight with the new interactionTable.
-
Well, here's my attempt at parallelizing R_AddModelSurfaces. It works fine, as far as I can tell. Just doesn't help, because the backend (or GL) will still block.
https://github.com/fholger/thedarkmod/pull/2/commits/87880984c894fd2dd68ba87e8bf34785ff5eaf1c
-
Some more careful timing analysis shows that the frontend time does indeed go down in the multithreading scenario. However, the time between two frames (i.e. after backend and frontend finish and until they start again) suddenly increases and eats up any time savings.
I investigated more closely, and what suddenly takes a lot more time is unmapping the vertex caches. If I move that inside the backend, then some other operations still seem to block. My current hypothesis is that even though the backend seemingly finishes early, the GL backend is actually still working. And the next (blocking) GL call thus has to wait. So my seemingly amazing savings after commenting out glGetError might just have been hiding the true backend costs, because that way the backend was waiting for frontend instead of GL...
At this point it's still an open question whether I can turn the multithreading savings into something tangible.
-
Actually, never mind. The scenes I was just testing, which were strongly frontend-limited yesterday, are suddenly backend-limited, so that's why I see no improvements. Something strange is going on with those backend timings, I don't get it.
Edit: So it seems Afterburner is another thing that somehow eats time on the backend. Very weird...
-
So, I've got a puzzle for you. I experimented some more with parallelization and made some experimental changes to call AddActiveInteraction in R_AddModelSurfaces in parallel or serially depending on a cvar switch. In each case I timed how long it takes. I can clearly see the times go down with the parallel version, yet the time the frontend takes in total hardly changes at all. Goes down slightly in some maps, actually seems to rise in others. Really don't get it.
-
Thanks Diego. I'm afraid a truly playable version is still far away, but I'll do my best. If you still want to try and get it to work, remember you'll need SteamVR running (no native Oculus support atm) and pay attention to the readme and the autoexec.cfg entry mentioned there.
@duzenko: I tested briefly with the interaction tri culling, but for me it does not seem to make a discernible difference. It didn't show up in my profiling before, either. It's curious that there's such a difference in what requires processing time...
Also, I encountered a couple of crashes that seem to be related to shadow mapping. I guess I'll need to double-check that the vertexcache changes don't break shadow mapping.
-
Which cvar is interaction tri culling? Shadow maps don't seem to make much of a difference to the frame rate.
I'm currently testing with Rightful Property, Briarwood Mansion and A New Job.
-
In any case, we are now back to the frontend being the bottleneck, which is good. I did some experiments today trying to parallelize R_AddModelSurfaces (which in my profiling is the largest time consumer). However, all my attempts have led to exactly zero performance improvements. I don't know why; might be cache congestion or generally memory access. I read that one of BFG's design changes was to store less in memory and (re-)compute it when needed. I guess we have a lot of work ahead of us...
-
HOLY SHIT!
I just found the most ridiculous bottleneck in the backend, thanks to nSight!
It's glGetError. I'm serious. All those GL_CheckErrors() calls are incredibly costly. I just commented out the entire implementation of GL_CheckErrors, and in one of my go-to bottleneck scenes, backend rendering dropped from 8 ms down to 2 ms, as reported by r_logSmpTimings!
Didn't increase the framerate, because the frontend is still blocking, but now parallelizing the frontend is actually going to be worthwhile.
We should probably hide the implementation of GL_CheckErrors behind either a cvar, or a compiler flag...
This is absurd. I wonder if this is an Nvidia issue? Would be interested to hear if it has a similar impact with your Intel chip.
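One possible shape for the gating (purely a sketch; the cvar name, the stub, and the compile flag are all made up for illustration): a runtime switch skips the glGetError round-trip entirely, and a compile-time flag can remove the check from release builds altogether.

```cpp
#include <cstdio>

// Runtime switch; in TDM this would be a cvar rather than a plain bool.
bool r_checkGLErrors = false;
// Instrumentation for the sketch: counts driver queries actually made.
int checkCalls = 0;

// Stand-in for the real glGetError (which needs a GL context).
unsigned glGetErrorStub() {
    ++checkCalls;
    return 0;  // GL_NO_ERROR
}

void GL_CheckErrors() {
#if !defined(NO_GL_ERROR_CHECKS)  // compile-time opt-out for release builds
    if (!r_checkGLErrors) {
        return;  // runtime opt-out: the driver is never queried
    }
    // Drain all pending errors, as the real implementation does.
    while (unsigned err = glGetErrorStub()) {
        std::fprintf(stderr, "GL error: 0x%x\n", err);
    }
#endif
}
```

The key point is that the early return happens before any GL call, so the expensive driver synchronization never occurs unless explicitly requested.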
-
So, Afterburner says ~50% (or less) GPU usage in Rightful Property. To be fair, though, I have a reasonably powerful GPU (GTX 1070) and fairly low settings. Profiling does not reveal any obvious optimization potential in the backend path. The significant portion of CPU time is already taken by the Nvidia GL driver, which means that we would have to optimize/reduce the number of GL calls to get any significant gain.
I'm not entirely certain I understand what you are suggesting, but I think BFG does indeed do something like that, so it might be worth a try.
Testers and reviewers wanted: BFG-style vertex cache (in The Dark Mod)
@lowenz: Great! Btw, am I mistaken, or is the performance now a bit better with VC?
@HMart: You were right on the money, it has something to do with fog lights. Setting r_skipFogLights to 1 "resolves" the problem. I'm looking into why this isn't working.
By the way, do you have the latest graphics drivers installed?