The Dark Mod Forums

TDM svn Profiling Results



So I did it. TDM died when loading the training mission with the profiler attached to the engine, so I just ran the profiler once the mission was already loaded. That was CPU profiling with Very Sleepy. It wasn't as bad as I thought at first.

 

Next step was running gDebugger. And that was horrible :( See results here:

 

https://drive.google...Rjg&usp=sharing

 

Oh, and that was with the same setup I used before for the AMD fix thread:

  • System -- Nvidia GeForce GTX 670 2GB, AMD Phenom X3 2.2GHz, 8GB DDR, XFX 750a, Sound: Asus Xonar D1, Win 7 Pro 64-bit
  • Driver -- Nvidia's latest release
  • Screen res -- 1920x1200
  • Settings -- AAx8 ASx8, VSync off

Edited by motorsep

Ha, I am not a coder either :) But there are a lot of deprecated calls, and the ARB shaders (or whatever part of the renderer deals with ARB) are where the issue is.

 

glProgramEnvParameter4fvARB gets called a crazy number of times.

 

Instead of speculating, I'd rather have SteveL, revelator and serpentine analyze the stats and do whatever needs to be done, if anything.

Edited by motorsep

Aye, loads of deprecation in vanilla. The worst offenders in my own profiling were the old OpenGL 1.4 calls like glColor4f (literally thousands of calls each frame, yuck), but other parts need attention too: thread handling and some of the AI code also seem to be a huge drag on resources.

 

To get a reliable profile, make a debug build and run that in gDebugger. Remember to set it to render windowed in gDebugger's params for Doom3.exe. You can also hook up MSVC's debug symbols for a little more in-depth analysis.

 

You can get rid of quite a load of them by removing the fallbacks and their cvars (modern cards have no problem with what these cvars originally turned off and can use the hardware-based functions instead). Still, even with those parts gone there are some functions left that badly need updating, like the fog function and a few other parts.


Interesting topic. I'll contribute some profiling results too today.

 

First a suggestion on producing meaningful results: Don't mix up map loading, main menu use, running tdm minimized, and ordinary gameplay in the same profiling report. The engine is doing very different stuff in those situations. For example, those 5k calls to glTexImage2D in some of the gdebugger reports are the engine generating downsized versions of textures while loading a map. That process takes a minute or two of solid gpu thrashing, but it happens during map load, not during gameplay.

 

Deprecation reports are pointless, quite frankly. They tell us only that our code pre-dates openGL3 and that openGL3 changed a lot of stuff. We knew that already. They're useful for new code, to identify coders who're not using the new way of doing things, or for code migration projects where everything needs upgrading. For older code, they're not indicative of problems and they don't point to solutions. A lot of calls to a deprecated function doesn't mean that there's an opportunity to speed things up. The new coding model might be no faster than the old one for a particular function. As for glProgramEnvParameter4fvARB and glColor4f, they are two of the basic switches on the openGL "control panel" that get set for every draw call, so we might have a problem if they didn't get called thousands of times per frame.

 

That said, there are more calls to glProgramEnvParameter4fvARB than I'd expect (it sets the "uniform" constants for shader programs), and the state change report is interesting. The majority of calls to glColor4fv don't change the state -- they just set the switch to what it was already. We could doubtless save some of the calls by ordering the draws in such a way that we know we don't need to set the switch for a particular draw. But we need to know how much time is spent in those calls before we think about complicating the code to avoid them. The current method -- simply confirming the setting every draw -- might well be the fastest, most efficient way to handle it.
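To make that trade-off concrete, here is a minimal C++ sketch of a redundant-state filter for the colour switch. It is not TDM's renderer code: `FakeDriver`, `ColorStateCache` and their members are hypothetical names, and `FakeDriver::setColor` only stands in for a real glColor4fv call so the example can run without a GL context.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the driver; a real renderer would call glColor4fv here.
struct FakeDriver {
    std::size_t colorCalls = 0;
    void setColor(const std::array<float, 4>&) { ++colorCalls; }
};

// Remembers the last colour sent and skips the driver call when nothing changes.
class ColorStateCache {
public:
    explicit ColorStateCache(FakeDriver& driver) : driver_(driver) {}

    // Returns true if the call was forwarded to the driver, false if it was skipped.
    bool setColor(const std::array<float, 4>& rgba) {
        if (valid_ && rgba == last_) {
            ++skipped_;
            return false;
        }
        driver_.setColor(rgba);
        last_ = rgba;
        valid_ = true;
        return true;
    }

    // Call this whenever other code may have changed the colour behind the cache's back.
    void invalidate() { valid_ = false; }

    std::size_t skipped() const { return skipped_; }

private:
    FakeDriver& driver_;
    std::array<float, 4> last_{};
    bool valid_ = false;
    std::size_t skipped_ = 0;
};
```

Setting white, white, red through the cache forwards two calls and skips one. The filter itself adds a compare and a branch to every call, so whether it beats simply confirming the setting on every draw is something only timing the real calls can answer.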

 

Right, I'm off to do a spot of profiling of my own to see if I can find anything interesting :). I'll do what I know how to use for now: CPU profiling, which will also provide time spent in the openGL functions. I'm not familiar with gdebugger. Can we use that to find bottlenecks in the rendering pipeline? Obviously, we don't want to fix stuff that isn't causing a bottleneck as we'd end up complicating the code for zero gain. I'd like to learn how to identify those spots if anyone has any advice. The concept of bottlenecks won't apply to the CPU part, I think. The critical path is single-threaded so speeding up any part of it would speed up the whole.


I can only second what Steve wrote. "Outdated" calls are not necessarily a problem, and rewriting things might not make it faster in a way that is user-visible. Care must be taken.

 

However, I applaud the topic, because taking a profile is the first step to finding the bottlenecks, which can then be fixed later.

"The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man." -- George Bernard Shaw (1856 - 1950)

 

"Remember: If the game lets you do it, it's not cheating." -- Xarax


Correct, outdated functions don't necessarily mean bad performance :) but in a few cases it's worth taking a look.

For instance, to bring down the calls to glColor3fv, 4fv or whatever, I created classes in revelation that can handle most color calls as one function instead of 4 or more. Did it help, you might ask? Yes, but not as much as I had hoped for. Still, all things count in love and war. I also wrapped a few other functions in classes.

 

As for glProgramEnvParameter4fvARB (from the ARB_vertex_program / ARB_fragment_program extensions, around 2002), it's nowhere near the same age as the glColor calls (which go back to OpenGL 1.0), but it can probably be optimized some anyway :)


I've tried gDebugger now and I'm loving it :)

 

I didn't come up with anything majorly interesting from my cpu profiling. The lightgem render passes take up a surprising amount of time -- more than drawing the screen -- but they have a job to do that's just as complicated as drawing the screen, i.e. working out how much light falls on a 3d object (the player) taking into account all the (potentially moving) shadows in the vicinity. But we might be able to save some cpu time there now that we have access to the engine by allowing those passes to share some results with one another and the screen draw. Not necessarily though: the list of shadows that can be cast on the player isn't the same as the list of shadows that the player can see, and the engine already caches results for future render passes. I found that turning off that code makes 10fps difference in outdoor areas that are currently 30fps for me, so it's worth a look.

 

Also, our buffer->image captures are eating about 8% of cpu time. Not the new depth capture, that's only 0.4%, but the original ones to _currentRender. Worth checking whether FBOs can cut that down.

 

The redundant glColor calls were nowhere to be seen in my profile reports. I used a sampling method to start with, while running about in some highly detailed maps and capturing the function that was occupying the CPU every one millionth clock cycle. Evidently, at the CPU end, those functions are so fast that they were never in the frame, not even once, when the snapshot got taken. So CPU effect is negligible. I haven't tried to work out yet whether they're having a deleterious effect on the GPU. I tried to use VS2013 pro's instrumentation method instead to capture their timings (it injects logging code into every function call, so you never miss any call no matter how fast it returns), but I got 0.5fps after the code injections and gave up the attempt.
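Why a sampling profiler can miss very short functions entirely can be shown with a toy model. The C++ sketch below is only an illustration under made-up numbers: `Span`, `sampleTimeline` and the cycle counts are hypothetical and have nothing to do with how Very Sleepy or Visual Studio actually sample.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One stretch of CPU time spent inside a single function (hypothetical model).
struct Span {
    std::string name;
    std::uint64_t cycles;
};

// Replays `frame` back to back `frames` times and takes one sample every
// `interval` cycles, starting at cycle 0. Returns the sample count per function.
std::map<std::string, std::uint64_t> sampleTimeline(const std::vector<Span>& frame,
                                                    std::uint64_t frames,
                                                    std::uint64_t interval) {
    assert(interval > 0);  // a zero interval would never advance
    std::map<std::string, std::uint64_t> hits;
    std::uint64_t spanStart = 0;
    std::uint64_t nextSample = 0;
    for (std::uint64_t f = 0; f < frames; ++f) {
        for (const Span& s : frame) {
            const std::uint64_t spanEnd = spanStart + s.cycles;
            // Every sample point inside [spanStart, spanEnd) is charged to this function.
            while (nextSample < spanEnd) {
                ++hits[s.name];
                nextSample += interval;
            }
            spanStart = spanEnd;
        }
    }
    return hits;
}
```

With an invented frame of 3,299,000 cycles in `drawScene` followed by 1,000 cycles in `glColor4fv`, ten frames sampled every 1,000,000 cycles yield 33 samples, all charged to `drawScene`. A real profiler has timing jitter, so over a long run a function's share of samples approaches its share of time, but a function with a tiny share can still go unseen in a short capture.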

 

Using gDebugger for the first time was an illuminating experience. I set it to do one-draw-per-click and to draw into the front buffer instead of the back buffer so I got to see the map drawn one draw call at a time. There were a couple of surprising results just from that exercise: bits of objects that I'd expect to see drawn in a single call were taking multiple calls, and identical models with same texture, size, and alignment were not drawn in one call as I thought they'd be. Plenty of calls sent only about 6 verts to the GPU. The text on the wooden signposts in that water shader test map I committed (prefab entity GUIs) was drawn in several calls per sign, some of the calls painting only a single letter. Obviously we need to look at a lot more examples, but there could well be room for taking out some draw calls there. Exciting stuff.

 

I couldn't get it to show me buffers or textures from the GPU memory though: it hangs if I try. I have an AMD GPU so I went on to try AMD's version, "CodeXL", but although I could see the buffers and textures, its interface with VS2013 is so disruptive and horrible that I uninstalled it. It kills menus before you get a chance to click on them and I couldn't get it to display the performance counters at all. Has anyone got that working?

 

Likewise AMD's GPU PerfStudio. I can start up TDM in the server ok but the program hangs when I try to hook up the client. Can anyone with a bit of experience in these tools help me out? In the meantime I'll go back to the bits of gDebugger that are working for me.

 

NB: the % cpu time results above were from a debug build. The effect on the release build that people actually use to play the game would be around half that, because the debug build slows down the cpu stuff while leaving the gpu driver code running at full speed, so it skews the relative timings in the cpu direction.


Yeah, as an addendum to this: traditional profiling doesn't tell the whole story here since it generally doesn't cover bus bandwidth issues. Sending new skinned triangles over the bus all the time is not a behavior that is widely in use, so testing suites generally don't cover that use case.

 

As for draw counts, please keep in mind that each material, UV alignment region, and light will provoke a new draw. Anything beyond those attributes would be an area to study...
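One way to turn that rule into a number to compare against what gDebugger reports is to count the distinct (material, UV alignment region, light) combinations in a scene. The C++ sketch below does only that. `DrawItem`, `stateKey` and `minimumDraws` are hypothetical names, and the model deliberately ignores everything else that can force a draw (transforms, GUI surfaces, render passes), so treat it as a lower bound under those assumptions, not as the engine's logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical description of one submitted surface; the ids are arbitrary labels.
struct DrawItem {
    int material;
    int uvRegion;
    int light;
};

// The three attributes that, per the rule above, each force a separate draw.
inline std::tuple<int, int, int> stateKey(const DrawItem& d) {
    return std::make_tuple(d.material, d.uvRegion, d.light);
}

// Counts distinct state keys: the fewest draws possible if everything that
// shares a key could be merged into a single call.
std::size_t minimumDraws(std::vector<DrawItem> items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) { return stateKey(a) < stateKey(b); });
    auto last = std::unique(items.begin(), items.end(),
                            [](const DrawItem& a, const DrawItem& b) { return stateKey(a) == stateKey(b); });
    return static_cast<std::size_t>(last - items.begin());
}
```

If a scene submits six surfaces but only four distinct keys, the two extra draws come from something outside those three attributes, which is where further study would start.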

Please visit TDM's IndieDB site and help promote the mod:

 

http://www.indiedb.com/mods/the-dark-mod

 

(Yeah, shameless promotion... but traffic is traffic folks...)

"UV alignment region"

I didn't realise that bit. I'd expected the others to trigger a draw, but thought that vertexes would be passed along with UV coords. I guess that wouldn't cover discontinuous regions, as you'd need 2 different uv coords for one vertex. Is that the issue?

 

On sending skinned triangles over the bus, does *everyone* else use gpu skinning for animated models then?


I don't believe UV alignment region shifts are applicable to models, but it is well known in the black art of "brush carving" that changing the texture alignment slightly will cause dmap to split tris, and correspondingly split draws when rendered. This is a boon in cases where you don't want a surface to be hit by too many light passes, but obviously a problem where texture alignment is changed for artistic reasons and there's no rendering reason why new draws should be generated.

 

I'm still looking for more CPU skinning engines in the wild. From what I can see, there are a few more out there than I originally reckoned, but the examples thus far are mired in proprietary Intel SSE dependencies. :(
