The Dark Mod Forums

Posts posted by cabalistic

  1. Thanks for the report, lowenz! I'll take a closer look at the lightgem. The lightgem is a little awkward because it renders between frames and thus "violates" the frontend/backend split, so that might have some unintended consequences here.

    Shame that the performance dropped, even if just a little. Just for the record, can you tell me your PC specs, and also roughly what graphics settings you are using?

    • Like 1
  2. Afaik, uncapped fps sets com_fixedTic, correct? From my understanding, com_fixedTic does indeed result in one game tic being run per frame, which is undesirable. When I originally wrote the multi-threading patch, I did the fps uncapping in a different way, ensuring that game tics run at a constant 60 Hz and only rendering is uncapped (a minimal sketch of that loop structure follows below). This is now also the behaviour you get when you enable both com_smp and com_fixedTic, and with both enabled I have not had any trouble at higher fps. Without com_smp, though, com_fixedTic is, imho, broken.

    • Like 2
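
    A minimal sketch of that loop structure, using placeholder functions rather than the engine's actual code: game tics advance at a fixed 60 Hz via an accumulator, while rendering happens once per loop iteration, i.e. uncapped.

        // Sketch only: GameIsRunning, RunGameTic and RenderFrame are placeholders.
        #include <chrono>

        static bool GameIsRunning() { return true; }  // placeholder: quit condition
        static void RunGameTic()    {}                // placeholder: one simulation step
        static void RenderFrame()   {}                // placeholder: frontend + backend

        void RunLoopSketch() {
            using clock = std::chrono::steady_clock;
            const std::chrono::duration<double> ticInterval(1.0 / 60.0);  // fixed 60 Hz tic rate

            auto previous = clock::now();
            std::chrono::duration<double> accumulator(0.0);

            while (GameIsRunning()) {
                auto now = clock::now();
                accumulator += now - previous;
                previous = now;

                // Advance the simulation in fixed 60 Hz steps, independent of frame rate.
                while (accumulator >= ticInterval) {
                    RunGameTic();
                    accumulator -= ticInterval;
                }

                // Render every iteration, i.e. uncapped fps.
                RenderFrame();
            }
        }
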
  3. So I just tested this with the two devices I have with Intel graphics. One is my work notebook (technically an Optimus hybrid of an onboard Intel 530 and a GTX 960M), the other is my GPD Win (Intel Atom with onboard graphics).

     

    First of all, I cannot reproduce any framerate drops in Rightful Property. The GPD Win goes from 19-20 fps to 21-23 fps. The notebook stays just under 50 fps in both cases. Looking at the logs (r_logSmpTimings), I can see a fairly significant reduction in frontend drawing duration on the notebook, which does not increase the frame rate because the backend stays the same and limits it.

     

    Unfortunately, the notebook also shows some serious graphical artifacts with the vertexcache version. I tried different versions of the Intel driver to no avail. The GPD Win is fine. The Nvidia card on the notebook is also fine.

     

    You did not see any visual artifacts, right? They were fairly noticeable, so you'd probably have mentioned them. As for the performance, can you give me more information about the graphics settings you are using? Resolution, AA, etc., as well as shadow maps or stencil shadows. And what build are you using, release x86 or x64?

     

    I'm not sure where to go from here. If the changes cause performance drops or even visual artifacts on some machines, then I'm not sure we want to apply them. But I don't really have a clue how to fix either.

    • Like 1
  4. Interesting - and disappointing. Thanks for testing. Was this on Rightful Property? What settings did you use for shadows etc.? There's also a commented-out line at RenderWorld.cpp:1498 that you might try to enable. It pregenerates some of the static interactions and might take some load off the system, but it causes problems with savegames that I haven't yet tracked down.

     

    As for the error, I'll have to look into that. Might be some missing cleanups, or just an oversight with the new interactionTable.

    • Like 1
  5. Some more careful timing analysis shows that the frontend time does indeed go down in the multithreading scenario. However, the time between two frames (i.e. after backend and frontend finish and until they start again) suddenly increases and eats up any time savings.

     

    I investigated more closely, and what suddenly takes a lot more time is unmapping the vertex caches. If I move that inside the backend, then some other operations still seem to block. My current hypothesis is that even though the backend seemingly finishes early, the GL driver is actually still working, and the next (blocking) GL call thus has to wait. So my seemingly amazing savings after commenting out glGetError might just have been hiding the true backend costs, because that way the backend was waiting on the frontend instead of on GL... (the sketch below illustrates this pitfall).

     

    At this point it's still an open question whether I can turn the multithreading savings into something tangible.

    • Like 1
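
    To illustrate that timing pitfall (a sketch, not the engine's instrumentation): CPU timers around GL calls only measure command submission, and the cost of the driver's asynchronous work surfaces at whatever blocking call comes next. glFinish destroys the asynchrony, so it is only useful for measurement, never in the real render path.

        // Sketch: IssueBackendDrawCalls is a placeholder for the backend's GL work.
        #include <chrono>
        #include <cstdio>
        #include <GL/gl.h>   // or the platform's GL header/loader

        static void IssueBackendDrawCalls() {}  // placeholder

        void TimeBackendSketch() {
            using clock = std::chrono::steady_clock;
            auto start = clock::now();

            IssueBackendDrawCalls();
            // This point only measures submission; the driver may still be processing,
            // and that cost shows up at the next blocking call (glGetError, unmap, swap...).
            auto submitted = clock::now();

            glFinish();  // force the driver to complete all pending work (measurement only!)
            auto finished = clock::now();

            std::printf("submit: %.2f ms, completion: %.2f ms\n",
                        std::chrono::duration<double, std::milli>(submitted - start).count(),
                        std::chrono::duration<double, std::milli>(finished - start).count());
        }
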
  6. Actually, never mind. The scenes I was testing, which were strongly frontend-limited yesterday, are suddenly backend-limited, so that's why I see no improvements. Something strange is going on with those backend timings; I don't get it.

     

    Edit: So it seems Afterburner is another thing that somehow eats time on the backend. Very weird...

    • Like 1
  7. So, I've got a puzzle for you. I experimented some more with parallelization and made some experimental changes to call AddActiveInteraction in R_AddModelSurfaces either in parallel or serially, depending on a cvar switch, and timed each case (a rough sketch of the setup follows below). I can clearly see the times go down with the parallel version, yet the total time the frontend takes hardly changes at all. It goes down slightly in some maps and actually seems to rise in others. Really don't get it. :(

    • Like 1
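
    A rough sketch of that experiment, using standard parallel algorithms and placeholder names instead of the engine's own job system and cvar API:

        // Sketch only: ModelEntry, AddModelSurfacesFor and r_useParallelFrontend are
        // illustrative stand-ins, not the engine's real types or cvars.
        #include <algorithm>
        #include <chrono>
        #include <cstdio>
        #include <execution>
        #include <vector>

        struct ModelEntry {};
        static void AddModelSurfacesFor(ModelEntry&) {}  // placeholder per-model work

        static bool r_useParallelFrontend = true;        // stands in for the cvar switch

        void AddModelSurfacesSketch(std::vector<ModelEntry>& models) {
            auto start = std::chrono::steady_clock::now();

            if (r_useParallelFrontend)
                std::for_each(std::execution::par, models.begin(), models.end(), AddModelSurfacesFor);
            else
                std::for_each(std::execution::seq, models.begin(), models.end(), AddModelSurfacesFor);

            auto ms = std::chrono::duration<double, std::milli>(
                          std::chrono::steady_clock::now() - start).count();
            std::printf("AddModelSurfaces (%s): %.3f ms\n",
                        r_useParallelFrontend ? "parallel" : "serial", ms);
        }
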
  8. Thanks Diego. I'm afraid a truly playable version is still far away, but I'll do my best :) If you still want to try and get it to work, remember you'll need SteamVR running (no native Oculus support atm) and pay attention to the readme and the autoexec.cfg entry mentioned there.

     

    @duzenko: I tested briefly with the interaction tri culling, but for me it does not seem to make a discernible difference. It didn't show up in my profiling before, either. It's curious that there's such a difference in what requires processing time...

    Also, I encountered a couple of crashes that seem to be related to shadow mapping. I guess I'll need to double-check that the vertexcache changes don't break shadow mapping.

  9. In any case, we are now back to the frontend being the bottleneck, which is good. I did some experiments today trying to parallelize R_AddModelSurfaces (which in my profiling is the largest time consumer). However, all my attempts have led to exactly zero performance improvements. I don't know why; it might be cache congestion or memory access in general (the toy demo below illustrates the latter). I read that one of BFG's design changes was to store less in memory and (re-)compute it when needed. I guess we have a lot of work ahead of us...
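
    A toy demonstration of the memory-access suspicion, entirely unrelated to the engine code: a loop that mostly streams memory barely speeds up with more threads, because it is limited by memory bandwidth rather than arithmetic.

        // Toy illustration only; numbers will vary between machines.
        #include <chrono>
        #include <cstddef>
        #include <cstdio>
        #include <numeric>
        #include <thread>
        #include <vector>

        static void SumRange(const double* data, std::size_t count, double* out) {
            *out = std::accumulate(data, data + count, 0.0);  // touches every cache line once
        }

        void MemoryBoundScalingDemo() {
            std::vector<double> data(1 << 26, 1.0);  // ~512 MB, far larger than any cache
            for (unsigned threads : {1u, 4u}) {
                std::vector<double> results(threads);
                const std::size_t chunk = data.size() / threads;
                auto start = std::chrono::steady_clock::now();

                std::vector<std::thread> pool;
                for (unsigned t = 0; t < threads; ++t)
                    pool.emplace_back(SumRange, data.data() + t * chunk, chunk, &results[t]);
                for (auto& th : pool)
                    th.join();

                auto ms = std::chrono::duration<double, std::milli>(
                              std::chrono::steady_clock::now() - start).count();
                std::printf("%u thread(s): %.1f ms\n", threads, ms);
            }
        }
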

  10. HOLY SHIT!

     

    I just found the most ridiculous bottleneck in the backend, thanks to nSight!

     

    It's glGetError. I'm serious. All those GL_CheckErrors() calls are incredibly costly. I just commented out the entire implementation of GL_CheckErrors, and in one of my go-to bottleneck scenes, the backend rendering dropped from 8 ms down to 2 ms, as reported by r_logSmpTimings!!

     

    Didn't increase the framerate, because the frontend is still blocking, but now parallelizing the frontend is actually going to be worthwhile.

     

    We should probably hide the implementation of GL_CheckErrors behind either a cvar or a compiler flag (one possible shape for that gate is sketched below)...

     

    This is absurd. I wonder if this is an Nvidia issue? Would be interested to hear if it has a similar impact with your Intel chip.

    • Like 3
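
    One possible shape for that gate, as a sketch only: the cvar name, build flag and output call below are illustrative, not the engine's actual identifiers.

        // Sketch: r_checkGLErrors stands in for a cvar, DEBUG_GL_ERRORS for a build flag.
        #include <GL/gl.h>   // or the platform's GL header/loader
        #include <cstdio>

        static bool r_checkGLErrors = false;  // off by default for normal play

        void GL_CheckErrors_Sketch(const char* where) {
        #if defined(DEBUG_GL_ERRORS)          // compile-time gate for debug builds
            const bool enabled = true;
        #else
            const bool enabled = r_checkGLErrors;
        #endif
            if (!enabled)
                return;  // skip the expensive glGetError round trips entirely

            // glGetError can force the driver to synchronize, which is why these
            // calls showed up so prominently in the backend timings.
            for (GLenum err = glGetError(); err != GL_NO_ERROR; err = glGetError())
                std::printf("GL error 0x%04x at %s\n", err, where);
        }
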
  11. So, Afterburner says ~50% (or less) GPU usage in Rightful Property. To be fair, though, I have a reasonably powerful GPU (GTX 1070) and fairly low settings. Profiling does not reveal any obvious optimization potential in the backend path. A significant portion of the CPU time is already spent inside the Nvidia GL driver, which means that we would have to optimize/reduce the number of GL calls to get any significant gain (a common technique for that is sketched below).

     

    I'm not entirely certain I understand what you are suggesting, but I think BFG does indeed do something like that, so it might be worth a try.
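
    For reference, one common way to cut down GL call counts (a sketch, and only assuming this is roughly the kind of change meant above): cache the last bound state on the CPU and skip redundant calls, so the driver sees fewer commands per frame.

        // Sketch: a minimal redundant-state filter, not the engine's actual backend.
        #include <GL/gl.h>   // or the platform's GL header/loader

        struct GLStateCache {
            GLuint boundTexture2D = 0;
            bool   blendEnabled   = false;

            void BindTexture2D(GLuint tex) {
                if (tex == boundTexture2D)
                    return;                          // redundant bind, skip the driver call
                glBindTexture(GL_TEXTURE_2D, tex);
                boundTexture2D = tex;
            }

            void SetBlend(bool enable) {
                if (enable == blendEnabled)
                    return;                          // state unchanged, nothing to do
                if (enable)
                    glEnable(GL_BLEND);
                else
                    glDisable(GL_BLEND);
                blendEnabled = enable;
            }
        };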
