The Dark Mod Forums


Development Role

cabalistic last won the day on September 1 2018

cabalistic had the most liked content!

Community Reputation

368 Legendary

1 Follower

About cabalistic

  • Rank
    Advanced Member

  1. Capture frame should just work. If it doesn't, I'm afraid I don't know how to help you with that. You could try the standalone version of Nsight; it worked better for me than the one integrated into Visual Studio.
  2. It's not about batching; it's about avoiding driver call overhead. To be clear, what it achieves is saving CPU time in the backend. That in turn gets the data to render to the GPU faster and keeps the GPU from idling needlessly. To profit from this, the GPU needs to be starved in the first place. If you have a weak GPU and run it at demanding settings, then yes, you may not see much of an effect, because the driver overhead wasn't the bottleneck. (It's also possible you're doing something wrong; you'd have to measure carefully with Nsight to catch what's going on.) But in my experiments, where I replaced depth and stencil shadow draws with multi-draws, the effects were pretty clear. CPU time for those passes dropped significantly, and as a result the GPU could render more quickly. Of course, stencil (and particularly depth) passes are often not the most significant parts of the scene, so the overall effect on FPS is not gigantic, but it was still measurable.
  3. Anything that applied a polygon offset, I rendered separately (classically). Eventually, I think we need to migrate away from the GL polygon offset functions and do our own offsetting in vertex shaders, driven by parameters. For one, that lets those surfaces take part in multi-draws. For another, the offsets would actually be predictable. GL polygon offsets are not: their effect can vary between GPUs and drivers.
  4. Yes, you need a new set of shaders; there's no way around that. I'm not sure whether you can make the non-multipass path use the same set on a GL3 feature base. Anyway, I used a single depth multi-draw command for all non-transparent objects and then rendered the transparent ones classically on top. Since they need actual textures to draw, it's not as easy to get them into a multi-draw call. But even if you do, their fragment shader is much more costly. So it still makes sense to keep them separate from the solid ones and use a specialized fragment shader.
  5. There is no overhead: you have to transfer the model matrices anyway, whether by glUniform or by buffer. In fact, a buffer upload is most likely faster, because you have a single upload for the whole batch instead of repeated glUniform calls. It takes a little more GPU memory, but nothing dramatic. Note that UBOs have an array size limit of, I think, around 500 or so (the exact limit follows from GL_MAX_UNIFORM_BLOCK_SIZE). For larger collections, you want to use SSBOs, which is what I did in my experiments.
  6. I have no clue. As far as I know, SCons does not directly support it, so you'd have to hack it in somehow. I had to do the same in CMake (its native support for precompiled headers will only be available in an upcoming version). But I'm not knowledgeable enough about SCons to repeat the feat. Nor would I want to; I'd rather replace SCons with CMake entirely.
  7. IMHO, your code should not rely on the precompiled header to function correctly; it's just a compile-speed improvement. In fact, not every file currently even includes the precompiled header. On MSVC it is auto-included in every file, but with SCons precompiled headers are neither used nor auto-included. That's one of the reasons why these kinds of missing headers regularly crop up in the Linux build. (My CMake build does use precompiled headers even with GCC, which significantly speeds up compile times and brings it more on par with the MSVC build.)
  8. The include needs to go right in the file where it's needed.
  9. Sorry, yes, I'm working on a CMake build system. It was just the first AppVeyor result I found; that wasn't intentional. Are you saying idlib/Thread.cpp should not be compiled on Linux? If so, that's my mistake. Otherwise it has nothing to do with CMake...
  10. According to AppVeyor:

          c:\projects\thedarkmod\typeinfo\main.cpp(203): error C2143: syntax error: missing ';' before '*' [C:\projects\thedarkmod\typeinfo.vcxproj]
          c:\projects\thedarkmod\typeinfo\main.cpp(203): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int [C:\projects\thedarkmod\typeinfo.vcxproj]
          c:\projects\thedarkmod\typeinfo\main.cpp(203): error C2065: 'MAX_THREADS': undeclared identifier [C:\projects\thedarkmod\typeinfo.vcxproj]
          c:\projects\thedarkmod\typeinfo\main.cpp(206): error C2061: syntax error: identifier 'xthreadInfo' [C:\projects\thedarkmod\typeinfo.vcxproj]
          c:\projects\thedarkmod\typeinfo\main.cpp(207): error C2065: 'info': undeclared identifier [C:\projects\thedarkmod\typeinfo.vcxproj]
          c:\projects\thedarkmod\typeinfo\main.cpp(207): error C2182: 'Sys_DestroyThread': illegal use of type 'void' [C:\projects\thedarkmod\typeinfo.vcxproj]
          c:\projects\thedarkmod\typeinfo\main.cpp(207): error C2365: 'Sys_DestroyThread': redefinition; previous definition was 'function' [C:\projects\thedarkmod\typeinfo.vcxproj]
          c:\projects\thedarkmod\idlib\sys\sys_threading.h(208): note: see declaration of 'Sys_DestroyThread'
          c:\projects\thedarkmod\typeinfo\main.cpp(207): error C2448: 'Sys_DestroyThread': function-style initializer appears to be a function definition [C:\projects\thedarkmod\typeinfo.vcxproj]

      I think there were additional errors, but this is the first one AppVeyor stumbles over. Also, on Linux:

          [ 27%] Building CXX object CMakeFiles/TheDarkMod.dir/idlib/Thread.cpp.o
          /home/appveyor/projects/thedarkmod/idlib/Thread.cpp: In static member function ‘static int idSysThread::ThreadProc(idSysThread*)’:
          /home/appveyor/projects/thedarkmod/idlib/Thread.cpp:218:3: error: ‘_exit’ was not declared in this scope
             _exit( 0 );
             ^~~~~
          /home/appveyor/projects/thedarkmod/idlib/Thread.cpp:218:3: note: suggested alternative: ‘_Exit’
             _exit( 0 );
             ^~~~~
             _Exit
  11. At least as of yesterday evening, trunk did not even compile for me...
  12. Yeah, but I have a feeling the challenges will be similar in some respects. Also, light interaction is one of the highest-scoring frontend functions in the profiler.
  13. Well, here's a relevant commit that shows what I did in my last attempt: https://github.com/fholger/thedarkmod/commit/87880984c894fd2dd68ba87e8bf34785ff5eaf1c Note that this is almost two years old, so things may have changed. I also used TBB at the time instead of porting the BFG job queues, for easier development. You'll see that I had to replace a number of the allocators because they are inherently not thread-safe, and use atomics in several places. These are not light changes. Even though they worked fine in my tests, they may not be safe. That's another reason why I abandoned the effort.
  14. I never tried; the frontend changes from BFG were too drastic. I did manage to parallelize some parts of the frontend, but it was a challenge to make the relevant data structures thread-safe. In the end, while it increased frontend performance a bit, it made the backend run worse. Either the additional threads were stealing CPU time from the backend (which is potentially fixable), or the changes to the data structures were detrimental to the backend. I abandoned the effort, since at the time I didn't have any scenes where the frontend was a significant bottleneck compared to the backend.
  15. Unmapping is unnecessary with persistent buffers. That's their whole point: they stay mapped. As for flushing: if you created and mapped the buffer with GL_MAP_COHERENT_BIT, you don't need to flush explicitly, because that's taken care of automatically. If you didn't, you have to flush so that OpenGL knows which parts of the buffer were touched. Those parts then get synchronized before the next GL read from that area. See here: https://www.khronos.org/opengl/wiki/Buffer_Object#Persistent_mapping
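To make the multi-draw idea from item 2 more concrete, here is a minimal sketch of how per-surface draws could be turned into the command array that `glMultiDrawElementsIndirect` consumes. The struct layout follows the OpenGL specification. `SurfaceRange` and `BuildIndirectCommands` are hypothetical names, not taken from the TDM code. Using `baseInstance` as a per-draw index is an assumption about how the shader finds its per-draw data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Record layout that glMultiDrawElementsIndirect reads from the
// GL_DRAW_INDIRECT_BUFFER (as defined by the OpenGL specification).
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices in this draw
    std::uint32_t instanceCount;  // 1 for an ordinary, non-instanced draw
    std::uint32_t firstIndex;     // offset into the shared index buffer, in indices
    std::int32_t  baseVertex;     // added to every index of this draw
    std::uint32_t baseInstance;   // reused here as a per-draw ID (assumption)
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "must match the GL layout");

// Hypothetical description of one surface inside shared vertex/index buffers.
struct SurfaceRange {
    std::uint32_t numIndexes;
    std::uint32_t firstIndex;
    std::int32_t  firstVertex;
};

// One command per surface. The array is uploaded once and submitted with a
// single glMultiDrawElementsIndirect call, replacing one glDrawElements call
// (plus its per-draw uniform updates) per surface.
std::vector<DrawElementsIndirectCommand> BuildIndirectCommands(
        const std::vector<SurfaceRange>& surfaces) {
    std::vector<DrawElementsIndirectCommand> cmds;
    cmds.reserve(surfaces.size());
    for (std::size_t i = 0; i < surfaces.size(); ++i) {
        const SurfaceRange& s = surfaces[i];
        assert(s.numIndexes > 0);
        cmds.push_back({s.numIndexes, 1u, s.firstIndex, s.firstVertex,
                        static_cast<std::uint32_t>(i)});
    }
    return cmds;
}
```

With the array uploaded to a buffer bound as `GL_DRAW_INDIRECT_BUFFER`, the submission would be `glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr, drawCount, 0)`. That is one driver call instead of `drawCount` calls, which is where the CPU savings described above come from.
<test>
std::vector<SurfaceRange> s = {{36u, 0u, 0}, {6u, 36u, 24}};
std::vector<DrawElementsIndirectCommand> c = BuildIndirectCommands(s);
assert(c.size() == 2);
assert(c[0].count == 36 && c[0].instanceCount == 1 && c[0].baseInstance == 0);
assert(c[1].count == 6 && c[1].firstIndex == 36 && c[1].baseVertex == 24);
assert(c[1].baseInstance == 1);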
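Item 3 suggests doing depth offsetting in the vertex shader instead of via the GL polygon offset functions. The sketch below emulates one possible approach on the CPU so it can be checked: a constant offset applied to clip-space z. The parameter name `depthOffsetNdc` is hypothetical. This covers only a constant bias. Unlike `glPolygonOffset`, it has no slope-dependent factor, and its unit does not depend on the implementation.

```cpp
#include <cassert>
#include <cmath>

struct Vec4 { float x, y, z, w; };

// Constant depth offset applied in clip space, as a vertex shader could do.
// GLSL equivalent (depthOffsetNdc being a hypothetical uniform or per-draw
// SSBO value):  gl_Position.z -= depthOffsetNdc * gl_Position.w;
// Because the GPU later divides by w, this shifts normalized device depth by
// depthOffsetNdc (up to rounding), the same way on every GPU and driver.
Vec4 ApplyDepthOffset(Vec4 clip, float depthOffsetNdc) {
    clip.z -= depthOffsetNdc * clip.w;
    return clip;
}

// Perspective divide for the z component, as done after the vertex stage.
float NdcDepth(const Vec4& clip) {
    assert(clip.w != 0.0f);
    return clip.z / clip.w;
}
```

Because the offset is just a per-draw number, it can live in the same per-draw buffer as the model matrix. That is what would let offset surfaces join a multi-draw.
<test>
Vec4 p{1.0f, 2.0f, 3.0f, 4.0f};
Vec4 q = ApplyDepthOffset(p, 0.01f);
assert(q.x == p.x && q.y == p.y && q.w == p.w);
assert(std::fabs(NdcDepth(q) - (NdcDepth(p) - 0.01f)) < 1e-6f);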
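The depth-pass split in item 4 can be sketched as a partition step before submission. All solid surfaces go first and can be batched into one multi-draw with a trivial fragment shader. The transparent ones follow and are drawn classically with a textured shader. `DepthSurface` and its `transparent` flag are hypothetical stand-ins for whatever the renderer actually stores.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical surface record for the depth pass.
struct DepthSurface {
    int id;
    bool transparent; // e.g. alpha-tested: needs a texture lookup in the fragment shader
};

// Reorders surfaces so all solid ones come first (keeping their relative order)
// and returns an iterator to the first transparent surface.
// [begin, split)  -> one depth multi-draw with a cheap fragment shader
// [split, end)    -> drawn individually afterwards with a specialized shader
std::vector<DepthSurface>::iterator SplitDepthSurfaces(std::vector<DepthSurface>& surfs) {
    return std::stable_partition(surfs.begin(), surfs.end(),
                                 [](const DepthSurface& s) { return !s.transparent; });
}
```

`std::stable_partition` keeps the original order within each group. This matters if the surfaces were pre-sorted, for example front to back.
<test>
std::vector<DepthSurface> v = {{0, false}, {1, true}, {2, false}, {3, true}};
auto split = SplitDepthSurfaces(v);
assert(split - v.begin() == 2);
assert(v[0].id == 0 && v[1].id == 2 && v[2].id == 1 && v[3].id == 3);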
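On item 5's UBO limit: a `mat4` occupies 64 bytes in both std140 and std430 layouts. The number of matrices a uniform block can hold is therefore `GL_MAX_UNIFORM_BLOCK_SIZE / 64`. The spec only guarantees 16384 bytes (256 matrices), so the real limit is driver-dependent. The helper names below are hypothetical. The sketch also shows the "single upload per batch" idea: all model matrices are packed contiguously so one buffer write covers the whole batch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>; // column-major 4x4 matrix
static_assert(sizeof(Mat4) == 64, "a mat4 occupies 64 bytes in std140/std430");

// How many mat4 fit into one uniform block of the given size in bytes
// (pass the value queried for GL_MAX_UNIFORM_BLOCK_SIZE).
constexpr std::size_t MaxMatricesPerUbo(std::size_t maxUniformBlockSize) {
    return maxUniformBlockSize / sizeof(Mat4);
}

// Packs all model matrices of a batch into one contiguous array, ready for a
// single glBufferSubData call (or a memcpy into a persistently mapped SSBO)
// instead of one glUniformMatrix4fv call per draw.
std::vector<float> PackModelMatrices(const std::vector<Mat4>& models) {
    std::vector<float> out;
    out.reserve(models.size() * 16);
    for (const Mat4& m : models) {
        out.insert(out.end(), m.begin(), m.end());
    }
    return out;
}
```

With the 16 KB minimum that gives 256 matrices, and with 64 KB it gives 1024. An SSBO avoids this ceiling, which fits the advice to switch to SSBOs for larger collections.
<test>
assert(MaxMatricesPerUbo(16384) == 256);
assert(MaxMatricesPerUbo(65536) == 1024);
Mat4 a{};
a[0] = 1.0f;
Mat4 b{};
b[15] = 2.0f;
std::vector<float> packed = PackModelMatrices({a, b});
assert(packed.size() == 32);
assert(packed[0] == 1.0f && packed[31] == 2.0f);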
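Items 7, 8 and 10 come down to one rule: a file that calls a function includes the header declaring it, rather than relying on a precompiled header. For the Linux error in item 10, `_exit` is a POSIX function declared in `<unistd.h>`. glibc's `<stdlib.h>` only provides the C99 `_Exit`, which is what GCC suggests. MSVC declares `_exit` in `<stdlib.h>`. The sketch below shows the include pattern only. `GetHardExit` is a hypothetical helper, and this is not the actual fix that went into Thread.cpp.

```cpp
#include <cassert>
#if defined(_WIN32)
#include <stdlib.h>  // MSVC: declares _exit (also available via <process.h>)
#else
#include <unistd.h>  // POSIX: declares _exit; glibc's <stdlib.h> only has _Exit
#endif

// Code that needs _exit (e.g. a thread procedure terminating the process
// without running atexit handlers) compiles here on its own, without
// depending on a precompiled header to pull the declaration in.
using ExitFn = void (*)(int);

ExitFn GetHardExit() {
    return &_exit;
}
```
<test>
ExitFn fn = GetHardExit();
assert(fn != nullptr);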
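Item 13 mentions replacing allocators that are not thread-safe and using atomics. The linked commit is the real reference. As a generic illustration only, here is a lock-free per-frame "bump" allocator in which each caller reserves a disjoint range with one `fetch_add`. `FrameArena` is a hypothetical name. It assumes the arena is reset only once per frame, when no thread is allocating from it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-frame arena that several frontend threads may allocate
// from concurrently. Memory is never freed individually; Reset() reclaims
// everything at once at the start of the next frame.
class FrameArena {
public:
    explicit FrameArena(std::size_t bytes) : storage_(bytes), used_(0) {}

    // Returns nullptr once the arena is exhausted. fetch_add hands every
    // caller a disjoint [offset, offset + size) range without taking a lock.
    void* Alloc(std::size_t bytes) {
        const std::size_t size = (bytes + 15) & ~std::size_t(15); // round up to 16 bytes
        const std::size_t offset = used_.fetch_add(size, std::memory_order_relaxed);
        if (offset + size > storage_.size()) {
            return nullptr;
        }
        return storage_.data() + offset;
    }

    // Must only be called when no other thread is inside Alloc().
    void Reset() { used_.store(0, std::memory_order_relaxed); }

private:
    std::vector<unsigned char> storage_;
    std::atomic<std::size_t> used_;
};
```

The caveat from item 13 still applies. An allocator like this is only safe if everything that touches the returned memory is also synchronized, and those surrounding changes are the hard part.
<test>
FrameArena arena(4 * 1000 * 16);
std::vector<std::vector<void*>> got(4);
std::vector<std::thread> workers;
for (int t = 0; t < 4; ++t) {
    workers.emplace_back([&arena, &got, t] {
        for (int i = 0; i < 1000; ++i) got[t].push_back(arena.Alloc(16));
    });
}
for (auto& w : workers) w.join();
for (int t = 0; t < 4; ++t) {
    for (int i = 0; i < 1000; ++i) {
        unsigned char* p = static_cast<unsigned char*>(got[t][i]);
        assert(p != nullptr);
        p[0] = static_cast<unsigned char>(t);
        p[1] = static_cast<unsigned char>(i % 256);
        p[2] = static_cast<unsigned char>(i / 256);
    }
}
for (int t = 0; t < 4; ++t) {
    for (int i = 0; i < 1000; ++i) {
        unsigned char* p = static_cast<unsigned char*>(got[t][i]);
        assert(p[0] == t && p[1] == i % 256 && p[2] == i / 256);
    }
}
assert(arena.Alloc(1) == nullptr);
arena.Reset();
assert(arena.Alloc(16) != nullptr);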
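For the non-coherent case in item 15: `glFlushMappedBufferRange` may only be used if the buffer was mapped with `GL_MAP_FLUSH_EXPLICIT_BIT`. Its offset is relative to the start of the mapped range. One simple way to know what to flush is to track the written region, as in this minimal sketch. `DirtyRange` is a hypothetical helper. It merges all writes into one conservative range, trading some extra flushed bytes for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Tracks the part of a persistently mapped buffer written since the last flush.
// Only needed for mappings created with GL_MAP_FLUSH_EXPLICIT_BIT and without
// GL_MAP_COHERENT_BIT; coherent mappings need no flush call at all.
class DirtyRange {
public:
    void MarkWritten(std::size_t offset, std::size_t length) {
        if (length == 0) return;
        if (empty_) {
            begin_ = offset;
            end_ = offset + length;
            empty_ = false;
        } else {
            begin_ = std::min(begin_, offset);
            end_ = std::max(end_, offset + length);
        }
    }

    bool Empty() const { return empty_; }
    std::size_t Offset() const { return begin_; }
    std::size_t Length() const { return end_ - begin_; }

    // Call after issuing the real flush, e.g.
    //   glFlushMappedBufferRange(target, range.Offset(), range.Length());
    void Clear() {
        empty_ = true;
        begin_ = end_ = 0;
    }

private:
    std::size_t begin_ = 0;
    std::size_t end_ = 0;
    bool empty_ = true;
};
```

The buffer itself stays mapped the whole time. Only the flush is repeated, once per batch of writes and before the GPU reads that area.
<test>
DirtyRange r;
assert(r.Empty());
r.MarkWritten(256, 64);
r.MarkWritten(64, 32);
assert(!r.Empty());
assert(r.Offset() == 64 && r.Length() == 256);
r.Clear();
assert(r.Empty());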