The Dark Mod Forums

Redundant calculations in some code (Optimized away by compiler?)



New here. I'm a data analyst by trade (sql, etc), but dabble in game dev stuff in my spare time. So, I'm def not a developer. Hence my question may sound dumb to seasoned devs...

I dl'ed the Dark Mod source, and just scanned through some of it, mainly seeing if I could help tweak, optimize or help on the GLSL. (I worked on an HLSL mod for a game, so spent a lot of time learning shaders and tweaking them.)

I looked at some of the .h and .cpp files, though, and noticed some redundant calculations...

EG:

In /renderer/MegaTexture.cpp, there are several functions that have things like this (I copied this from the "GenerateMegaMipMaps" function):

// mip map the new pixels
for ( int yyy = 0 ; yyy < TILE_SIZE / 2 ; yyy++ ) {
	for ( int xxx = 0 ; xxx < TILE_SIZE / 2 ; xxx++ ) {
		byte *in = &oldBlock[ ( yyy * 2 * TILE_SIZE + xxx * 2 ) * 4 ];
		byte *out = &newBlock[ ( ( ( TILE_SIZE/2 * yy ) + yyy ) * TILE_SIZE + ( TILE_SIZE/2 * xx ) + xxx ) * 4 ];
		out[0] = ( in[0] + in[4] + in[0+TILE_SIZE*4] + in[4+TILE_SIZE*4] ) >> 2;
		out[1] = ( in[1] + in[5] + in[1+TILE_SIZE*4] + in[5+TILE_SIZE*4] ) >> 2;
		out[2] = ( in[2] + in[6] + in[2+TILE_SIZE*4] + in[6+TILE_SIZE*4] ) >> 2;
		out[3] = ( in[3] + in[7] + in[3+TILE_SIZE*4] + in[7+TILE_SIZE*4] ) >> 2;
	}
}

I'm a noob at understanding the optimizations compilers do. From working with Java and Python, I'm pretty sure that when a calculation appears in a loop condition, the compiler is smart enough to calculate it once. (So, I'm pretty sure a C compiler is smart enough to do that, too.)

IE: the TILE_SIZE / 2 in the "for" statements will get calculated once, not each time the loops loop and the statement is evaluated.

(I may be wrong, though. I'm projecting my experience from java / python after I ran tests when coding various projects. I've never done C++ development, so unsure how the C++ compiler would function.)

But, in the body of the code after the 2nd "for", TILE_SIZE/2 is calculated 2 more times for the byte *out pointer. So, it's pre-calculated once for the first "for" to use over and over, another time (?) for the next "for" to use over and over, then 2 more times in the body each time the 2nd for loop runs. And, I'm assuming each time that block is run, it's calculating those 2 calcs again. There's also TILE_SIZE * 4 being calculated 8 times in the body, which, if re-calculated each time that block of code runs, would be a lot of redundant calculation.

Does the compiler notice this, and optimize it down to a single one-time pre-calculation, or is it actually calculating this stuff over and over?

I guess I'm wondering if a variable should get declared, and the TILE_SIZE * 4 and the TILE_SIZE / 2 should get calculated once and used wherever those parts are.

Maybe it's written the way it is, b/c it's a case of losing precision due to floating point error? (IE: creating a var would "round" the result and create erroneous results, which is why the calculation is done each time it's needed?) I don't know. I'm a nub on that, too.. but from doing shader work, I learned that sometimes it's better to let a redundant calc run multiple times to avoid precision error.

I'm not even sure if the code is being used.. but I was just looking at various files, and noticed some had redundant calcs. I was wondering if this was something I could tweak and submit changes for?

Coming from the shader side of things, there's the push to reduce calculations as much as possible to prevent shaders from bogging down the FPS. IE: reduce instruction set use, use intrinsic functions, move as much stuff to vertex side instead of pixel/fragment. Having some redundant calculations in a global var the game engine calc's once before piping to shaders or something doesn't make any impact. But, having a shader-like function called many times to generate vertex / pixel could create performance impact with redundant calc's.

Link to post
Share on other sites

It sounds like you are on the right track.

That said, TDM has never used the megatexture feature so we never contributed code to it.

I think there was some interest in evaluating the Doom 3 mod that "completes" the feature, but nobody got around to it, and now the shaders from that mod would need to be converted to GLSL and made to conform to UBO.

If you are interested in this feature, I would suggest you check in the Doom 3 BFG sources and see if the code looks better there.

Edit:

Nope... no megatexture in BFG...

I guess Partially Resident Textures would be what you would replace this with:

https://www.khronos.org/registry/OpenGL/extensions/EXT/EXT_sparse_texture.txt

not sure how well this plays with bindless textures though...

Please visit TDM's IndieDB site and help promote the mod:

 

http://www.indiedb.com/mods/the-dark-mod

 

(Yeah, shameless promotion... but traffic is traffic folks...)

Link to post
Share on other sites
5 hours ago, totallytubular said:

New here. I'm a data analyst by trade (sql, etc), but dabble in game dev stuff in my spare time. So, I'm def not a developer. Hence my question may sound dumb to seasoned devs...

No problem. I guess most of our programmers don't do any gamedev in their day jobs 😃
Knowledge of the C++ language is more important, I would say.

Quote

I dl'ed the Dark Mod source, and just scanned through some of it, mainly seeing if I could help tweak, optimize or help
I looked at some of the .h and .cpp files, though, and noticed some redundant calculations...

In general, it is not a good idea to skim through the code and try to optimize everything you don't like.
Most of the code should not be optimized: you will not improve performance in any way, but you can introduce errors, make the code less readable, etc.

It is better to use a profiler to see which parts actually take time, then optimize those. Without a profiler you cannot even be sure that you made the code snippet faster (or at least not slower). It is well known that programmers are pretty bad at predicting low-level performance.
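To be clear, a stopwatch is not a profiler, but for checking a single snippet before and after a change, a crude timing helper like the sketch below can already catch a "faster" version that is actually slower. The helper name `timeNanoseconds` is made up for this example and is not part of the TDM source.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper (not from the TDM code base): runs `fn` a fixed number
// of times and returns the total elapsed wall-clock time in nanoseconds.
template <typename Fn>
std::int64_t timeNanoseconds(Fn&& fn, int iterations) {
    assert(iterations > 0);
    // steady_clock never jumps backwards, so the difference is non-negative.
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; i++) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Time both variants with the same iteration count, several times, in an optimized build. A real profiler is still needed to find out whether the snippet matters at all.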

Quote

IE: the TILE_SIZE / 2 in the "for" statements will get calculated once, not each time the loops loop and the statement is evaluated.

Yes, the compiler knows both operands are compile-time constants, so it computes this value during compilation.

Quote

(I may be wrong, though. I'm projecting my experience from java / python after I ran tests when coding various projects. I've never done C++ development, so unsure how the C++ compiler would function.)

While I have heard from many people that a Java compiler can optimize code more aggressively because of the high-level nature of the language, my experience is that C++ compilers are much better at optimizing things.

Quote

But, in the body of the code after the 2nd "for", the TILE_SIZE/2 is calculated 2 more times on the byte out. So, pre-calc once for the first "for" to use over and over. Another time (?) for the next "for" to use over and over, then 2 more times in the body each time the 2nd for loop runs. And, I'm assuming each time that block is ran, it's calculating those 2 calc's each time. Also have TILE_SIZE * 4 calculating 8 times in the body, which, if those are re-calculated each time that block of code runs, would be a lot of redundant calculation.

Does the compiler notice this, and optimize it down to a single one-time pre-calculation, or is it actually calculating this stuff over and over?

I guess I'm wondering if a variable should get declared, and the TILE_SIZE * 4 and the TILE_SIZE / 2 should get calculated once and used wherever those parts are.

The compiler does not compute any of these values at runtime. It computes them during compilation and inserts the final constants directly into the assembly code (as "immediate" values).

Quote

Maybe it's written the way it is, b/c it's a case of losing precision due to floating point error? (IE: creating a var would "round" the result and create erroneous results, which is why the calculation is done each time it's needed?) I don't know.

You should get at least some knowledge of C, I would say.
In C/C++, if you divide an integer by an integer, no floating point numbers are involved at all.
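A small sketch of that point. The value 128 for `TILE_SIZE` is an assumption for illustration only (the real value is defined in the engine source), and `average4` is a made-up helper that mirrors what each `out[n] = ... >> 2` line in the snippet does.

```cpp
#include <cassert>

// Assumed value for illustration; the real TILE_SIZE is defined in the engine source.
constexpr int TILE_SIZE = 128;

// int / int is integer division: the result is an int, truncated toward zero.
static_assert(TILE_SIZE / 2 == 64, "no rounding error for an even size");
static_assert(7 / 2 == 3, "the fractional part is simply dropped");
static_assert(7 / 2.0 == 3.5, "a floating point operand is what makes it floating point");

// Average of four byte values as in the mip-map snippet: the bytes are
// promoted to int, summed, and shifted right by 2 (divide by 4, rounding down).
int average4(unsigned char a, unsigned char b, unsigned char c, unsigned char d) {
    const int sum = a + b + c + d;
    assert(sum >= 0 && sum <= 4 * 255);
    return sum >> 2;
}
```

So there is no precision to lose by storing `TILE_SIZE / 2` in an int variable: it is exactly the same integer either way.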

Quote

I'm not even sure if the code is being used.. but I was just looking at various files, and noticed some had redundant calcs. I was wondering if this was something I could tweak and submit changes for?

As @nbohr1more says, this whole file is not used in TDM.
You can of course play with the source if you like, but if you don't see a function in the profiler, then all optimization efforts are useless (counter-productive, in fact).

Quote

Coming from the shader side of things, there's the push to reduce calculations as much as possible to prevent shaders from bogging down the FPS. IE: reduce instruction set use, use intrinsic functions, move as much stuff to vertex side instead of pixel/fragment. Having some redundant calculations in a global var the game engine calc's once before piping to shaders or something doesn't make any impact. But, having a shader-like function called many times to generate vertex / pixel could create performance impact with redundant calc's.

I'm not sure you will be able to get noticeable performance improvement this way.

I think the toughest parts of our shaders are bound by memory access (e.g. texture fetches in soft shadows, or raw bandwidth in drawing shadow volumes). Due to the massively parallel nature of the GPU, this means that even if you make all computations 2 times faster, the actual performance won't change at all.

At least that's my experience with soft shadows. There are a lot of computations there, but adding more will not worsen performance, because texture fetches are slower anyway. And given that fanatic optimization of computations often makes them much harder to read and understand, it sounds counter-productive again...

Link to post
Share on other sites
17 hours ago, totallytubular said:

I'm pretty sure when calculations are done in a loop call, the compiler is smart enough to calculate something once. (So, I'm pretty sure a C-compiler is smart enough to do that, too).

IE: the TILE_SIZE / 2 in the "for" statements will get calculated once, not each time the loops loop and the statement is evaluated.

But, in the body of the code after the 2nd "for", the TILE_SIZE/2 is calculated 2 more times on the byte out. So, pre-calc once for the first "for" to use over and over. Another time (?) for the next "for" to use over and over, then 2 more times in the body each time the 2nd for loop runs.

Your misconception here is that there is something special about loops which means that the compiler will optimise constant expressions used within the loop, but it will only perform this optimisation for the first usage of the expression within the loop body.

This is not how C or C++ compilers work. There is nothing special about loops, and no limit on the number of compile time optimisations that can be made. If you have a numeric expression which can be evaluated at compile time, e.g.

int x = 4 / 2;

then that compile time calculation will be performed every single time. So you can happily write constant expressions like "TILE_SIZE * 4" as many times as you like and the compiler will substitute the constant value result each and every time (provided that TILE_SIZE is indeed a constant, not a variable which can be modified elsewhere).
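Here is a rough sketch of the same idea, not the engine code. `TILE_SIZE = 8`, `rampTile`, `mipInline` and `mipHoisted` are all assumptions made up for this example. The `static_assert` lines only compile because the expressions are compile-time constants, and the two functions (constants repeated inline vs. named once) produce identical output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed value for illustration; the engine defines its own TILE_SIZE.
constexpr int TILE_SIZE = 8;
constexpr std::size_t TILE_BYTES = static_cast<std::size_t>(TILE_SIZE) * TILE_SIZE * 4;

// static_assert only accepts compile-time constants, so these lines compiling
// shows that the compiler evaluates both expressions itself.
static_assert(TILE_SIZE / 2 == 4, "TILE_SIZE / 2 is a constant expression");
static_assert(TILE_SIZE * 4 == 32, "TILE_SIZE * 4 is a constant expression");

using Bytes = std::vector<std::uint8_t>;

// Hypothetical test input: one RGBA tile filled with 0, 1, 2, ... (mod 256).
Bytes rampTile() {
    Bytes tile(TILE_BYTES);
    for (std::size_t i = 0; i < tile.size(); i++) {
        tile[i] = static_cast<std::uint8_t>(i & 0xFF);
    }
    return tile;
}

// Box-filter one tile down to half size, constant expressions repeated inline.
Bytes mipInline(const Bytes& in) {
    assert(in.size() == TILE_BYTES);
    Bytes out(TILE_BYTES / 4);
    for (int y = 0; y < TILE_SIZE / 2; y++) {
        for (int x = 0; x < TILE_SIZE / 2; x++) {
            const std::uint8_t* src = &in[(y * 2 * TILE_SIZE + x * 2) * 4];
            std::uint8_t* dst = &out[(y * (TILE_SIZE / 2) + x) * 4];
            for (int c = 0; c < 4; c++) {
                dst[c] = static_cast<std::uint8_t>(
                    (src[c] + src[c + 4] + src[c + TILE_SIZE * 4] + src[c + 4 + TILE_SIZE * 4]) >> 2);
            }
        }
    }
    return out;
}

// Same filter with the constants given names once. This can help readability,
// but the constants were already folded in the version above.
Bytes mipHoisted(const Bytes& in) {
    assert(in.size() == TILE_BYTES);
    constexpr int half = TILE_SIZE / 2;
    constexpr int rowBytes = TILE_SIZE * 4;
    Bytes out(TILE_BYTES / 4);
    for (int y = 0; y < half; y++) {
        for (int x = 0; x < half; x++) {
            const std::uint8_t* src = &in[(y * 2 * TILE_SIZE + x * 2) * 4];
            std::uint8_t* dst = &out[(y * half + x) * 4];
            for (int c = 0; c < 4; c++) {
                dst[c] = static_cast<std::uint8_t>(
                    (src[c] + src[c + 4] + src[c + rowBytes] + src[c + 4 + rowBytes]) >> 2);
            }
        }
    }
    return out;
}
```

If you want to see it for yourself, compile both functions with optimizations on and compare the generated assembly.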

Link to post
Share on other sites
8 hours ago, OrbWeaver said:

Your misconception here is that there is something special about loops which means that the compiler will optimise constant expressions used within the loop, but it will only perform this optimisation for the first usage of the expression within the loop body.

This is not how C or C++ compilers work. There is nothing special about loops, and no limit on the number of compile time optimisations that can be made. If you have a numeric expression which can be evaluated at compile time, e.g.


int x = 4 / 2;

then that compile time calculation will be performed every single time. So you can happily write constant expressions like "TILE_SIZE * 4" as many times as you like and the compiler will substitute the constant value result each and every time (provided that TILE_SIZE is indeed a constant, not a variable which can be modified elsewhere).

Yeah, where to shove the load on shaders is the aggravating part. The shaders I worked on, I experimented with euclidean vs. manhattan distance to cut-down partial derivative decompression calculation of normals. Looked fine on some textures, but others it did not. But, performance didn't change noticeably, b/c...like you said.. texture calls are expensive, still need to get called, and it comes down to memory amount and bandwidth mostly.

Link to post
Share on other sites

One thing that has been on the horizon for a while is exploring the possible performance impact of GPU skinning.

It's unclear now whether there will be much advantage to this since Shadow Maps mode significantly reduces the CPU load, but more CPU performance is always welcome.

Another area of opportunity is r_shadowMapCullFront. Currently it offers both better performance and a better match to Stencil Shadowing, but has a few deal-breaker light leak artifacts. Fixing these edge cases and making this mode usable would be a huge boon.


Link to post
Share on other sites

I'm not a very good coder, so I can't say if the guy is right or wrong, but that must be code written by John Carmack when he was testing the first version of the MegaTexture tech (virtual texturing, as it is called in other engines). It is hard for me to believe he did that unintentionally: he is a genius coder who doesn't do anything without a reason or out of sheer stupidity. So if it's true that the code is not well optimized, then it must be just quick test code that he threw together to make the MT feature work for early testing.

Btw, the Doom 3 mod that makes the feature work shows that this early version is not usable at all: first, because "terrain" meshes in this engine are not optimized at all (no dynamic tessellation); second, and more importantly, because .mega files in Doom 3 are not compressed at all and can take GBs of drive space, while in Quake Wars .mega files are a couple of MB.

Edited by HMart
Link to post
Share on other sites
On 2/10/2021 at 11:41 AM, HMart said:

I'm not a very good coder, so I can't say if the guy is right or wrong, but that must be code written by John Carmack when he was testing the first version of the MegaTexture tech (virtual texturing, as it is called in other engines). It is hard for me to believe he did that unintentionally: he is a genius coder who doesn't do anything without a reason or out of sheer stupidity. So if it's true that the code is not well optimized, then it must be just quick test code that he threw together to make the MT feature work for early testing.

Btw, the Doom 3 mod that makes the feature work shows that this early version is not usable at all: first, because "terrain" meshes in this engine are not optimized at all (no dynamic tessellation); second, and more importantly, because .mega files in Doom 3 are not compressed at all and can take GBs of drive space, while in Quake Wars .mega files are a couple of MB.

I just copy-pasted the mega code snippet as an example of what I thought might be redundant code calculations in the code base.

As others pointed out, the compiler is smart enough to evaluate that stuff at compile time. And, even if there are redundant calculations, chances are they're not making that big of a difference. And, as you pointed out, Carmack et al. coded this stuff up to begin with, so chances are it's probably already good.

I'm a nub when it comes to C++ programming, so I don't have the experience to know how the compiler optimizes, etc.

But, I stared at some shaders.

A lot of them just have complex stuff going on that has to get done, and no way around it. (EG shadows, SSAO, etc).

But, things like Tonemap.fs: it's taking a color sample and running each RGB component through a color mapping function separately, as individual floats. I plugged the code into Shader Playground (where I just mess with small shader code for quick testing) and turned it into a vec3 function to do it in one shot, and it seemed to cut the instruction count in half. I have no idea if that shader's used, though. And, as another said, the big issue with shaders is texture pulls and memory management these days. So, meh.

 

Link to post
Share on other sites

Aren't all modern GPUs scalar?
Unlike old AMD GPUs which actually had vector compute units.

If they are scalar, then replacing three operations on floats with one vec3 operation should do nothing performance-wise, even if the intermediate assembly shows fewer instructions due to using vector-based ones.

On the other hand, it may be that the same operation is executed three times (once per component) instead of once. But 1) I don't see many such computations in the tonemap shader, and 2) it should be very easy for the shader compiler to eliminate such duplicate computations. Given that shaders have no notion of pointers, no aliasing, and no recursion, the compiler most likely inlines the "mapColorComponent" function and does common subexpression elimination.

I have no objections to making this function work on vec3 instead of float. Just don't see the point.
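To illustrate on the CPU side (C++, not GLSL): in the sketch below the "vec3" version still performs one add and one divide per component, the same scalar work as calling the float function three times. The curve inside `mapColorComponent` here is a placeholder (a plain Reinhard curve), NOT the real function from the tonemap shader, and `Vec3`, `tonemapPerComponent` and `tonemapVec3` are made-up names.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float r, g, b; };

// Placeholder curve (Reinhard). The real mapColorComponent in the tonemap
// shader is different; only the call pattern matters for this sketch.
float mapColorComponent(float x) {
    assert(x >= 0.0f);
    return x / (1.0f + x);
}

// Per-component version: three separate scalar calls.
Vec3 tonemapPerComponent(Vec3 c) {
    return { mapColorComponent(c.r), mapColorComponent(c.g), mapColorComponent(c.b) };
}

// "vec3" version: written as one expression per component, but on scalar
// hardware this is still three adds and three divides.
Vec3 tonemapVec3(Vec3 c) {
    return { c.r / (1.0f + c.r), c.g / (1.0f + c.g), c.b / (1.0f + c.b) };
}

// Component-wise comparison with a small tolerance.
bool nearlyEqual(Vec3 a, Vec3 b, float eps = 1e-6f) {
    return std::fabs(a.r - b.r) < eps && std::fabs(a.g - b.g) < eps && std::fabs(a.b - b.b) < eps;
}
```

A lower instruction count in the intermediate listing does not by itself mean less work on the GPU.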

Link to post
Share on other sites

How about some fresnel highlights then?

Red-shaded areas show the highlight...

Pic below is a velour-style that highlights more as the surface becomes more parallel to the view angle.

6D4CPRH.jpg

Here's the effect applied to .rgb

Y9Z0t6F.jpg

 

Next is an ambient "spotlight / backlight" effect that slightly brightens the area around the player based on surfaces perpendicular to the viewing angle

WTYul8W.jpg

Here it is applied to .rgb

W1fIugh.jpg

It's a subtle effect. It simulates ambient light reflecting off the player back onto surfaces. When combined with velour fresnel, it helps add depth/body to surfaces as you move around them.

Some folks may not like the backlight effect. Thief 4 did a similar thing. When you'd crouch to go into stealth mode it lit up a hidden "lantern" light to light up the area around the player to help offset the darkening stealth haze. Some folks hated it and found ways to disable it. So, this might be something to add an option to enable/disable as the user wishes.

When combining the effects the ambient lighting has more depth.

~~~~~~~~~~~~~~~~~~

Issue with shadows is they tend to flatten out normals. So, adding in highlights and such helps bring some body back to them.

I applied the fresnel effects to the "tdm_interaction.fs.glsl" and "ambientinteraction.fs" files. It helps the most in the ambient lighting, b/c that's where it helps things stand out. Gets too washed out in the tdm_interaction lighting.

Might want to disable the fresnel effects from tdm_interaction.fs.glsl, though, b/c not very noticeable in lamp/torchlight, and the ambient fresnel already adds some. (I added them to tdm_interaction to see if they'd boost the highlights in lights. They do, but can over-bright the lights.)

~~~~~~~~~~~~~~~~~~~~~~~~

I messed around with specular in tdm_interaction.fs.glsl to try to make it pop more. I noticed you had a work-around for specular. I re-worked the algorithm to add specular later in the chain, and, while it pops more now, it gets a stark cut-off ... which I'm guessing you noticed and were trying to sort out with work-arounds (?)

9NsCvBw.jpg

(gamma room in Training Mission: floor is shiny & wet now from specular, but shadow is cutting specular off unnaturally)

It does it for both Blinn and Phong dots. Not sure what's going on there or how to fix it. The cut-off moves as the camera moves, so it feels like it has something to do with view direction. I don't know.

qSIId8s.jpg

I made tdm_interaction specular = 0.1 if there was no specular texture in order to highlight speculars more. Image above has torches to the left and right off-screen.. middle shows a cut-off of specular into the shadows. It's like the light blending is not working for specular.

7p1y29D.jpg

But when the cut-off isn't in the way, the specular looks better now. The wood is unnaturally shiny due to the 0.1 default I put in place. But, the metal rings on the barrels shine and reflect now.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I've attached the two files I messed with, and also a 3rd file I used for #include to chuck glsl cheat-sheet stuff for myself, and a normal map partial derivative equation function to single-source on things I was working on.

shaderedits.zip

Link to post
Share on other sites

If you plan to work on correcting speculars, don't base this on what you see in TDM. Non-pbr materials are lighting dependent, and TDM typically uses very strong lights and too contrasty and saturated diffuse textures. Instead, you might want to prepare a test scene of your own.

Link to post
Share on other sites
41 minutes ago, peter_spy said:

If you plan to work on correcting speculars, don't base this on what you see in TDM. Non-pbr materials are lighting dependent, and TDM typically uses very strong lights and too contrasty and saturated diffuse textures.

Yeah, and the problem is that if something "works for you" on one scene, it will most likely stop working in some other.

The existing code has one major benefit: it has been here for some time, and authors test their maps against it.
I recall @duzenko removed/simplified some parts of shader code when he migrated it from ARB to GLSL, but in the end we returned everything back to how things were in 2.05. And in most cases, this return was done because something was found by mappers to be broken during 2.06 beta.

By the way, existing shaders already have something called "rim" and "fresnel", most likely added by previous adventurers. But unfortunately people like to write more than read. Not that I understand myself why it works as it works... I'm afraid I'm too fond of mathematically correct methods.

Link to post
Share on other sites

That rim / fresnel kinda seems to have a life of its own ;) I see it on my models sometimes, even when I didn't do anything out of ordinary in terms of modeling or texturing.

Oh, and also, @totallytubular, sorry if you already know this, but there are a few more quirks about this system. First of all, it's not just greyscale values, it uses the whole RGB range. Greyscale values are good for surfaces like silver, water and glass, and that's basically it. The rest needs to use this workflow. So in essence: for conductors, use colors from the diffuse to tint the hotspot color; for dielectrics, use inverted colors from the diffuse so the hotspot still looks white.

Link to post
Share on other sites
4 hours ago, stgatilov said:

I recall @duzenko removed/simplified some parts of shader code when he migrated it from ARB to GLSL, but in the end we returned everything back to how things were in 2.05. And in most cases, this return was done because something was found by mappers to be broken during 2.06 beta.

Really? Even the "headlamp" ambient specular?

Spoiler

[Image: search result for "what meme"]

 

Link to post
Share on other sites
14 hours ago, nbohr1more said:

Hmm...

Those cutoff behaviors look like:

https://bugs.thedarkmod.com/view.php?id=4634

If you can solve that it would be fantastic!

I saw a similar problem working on the FUEL: Refueled shaders. It happened when trying to do a half lambert on the light dot. 

IE

sundot*0.5+0.5

This is Valve's half Lambert technique. It remaps the light dot from -1..1 to 0..1 (or from 0..1 to 0.5..1 if the dot has already been clamped) to keep the shadow side of objects from being pitch black. It sort of acts like pre-adding ambient light.
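For reference, a tiny C++ sketch of the two terms (CPU-side illustration only, not code from Fuel or TDM; both function names are made up). It takes the raw, unclamped dot product as input.

```cpp
#include <algorithm>
#include <cassert>

// Standard Lambert term: a negative N.L (surface facing away) is clamped to 0.
float lambert(float nDotL) {
    assert(nDotL >= -1.0f && nDotL <= 1.0f);
    return std::max(nDotL, 0.0f);
}

// Half Lambert as written in the post: scale by 0.5, then bias by 0.5.
// A surface facing fully away from the light gets 0, a grazing one gets 0.5.
float halfLambert(float nDotL) {
    assert(nDotL >= -1.0f && nDotL <= 1.0f);
    return nDotL * 0.5f + 0.5f;
}
```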

In Fuel, it caused a stark cut-off / seam on objects in the mid distance. Not sure why. They did some janky math to try to smooth it out.

I simply switched back to using the full light dot, and added the ambient light value as a final addition to the light.

So, they had this... 

Sun color * (sundot*0.5+0.5) * shadow + sky color (color of shadows) 

And i switched it to this... 

Sun color * sundot * shadow + sky color * sundot * ao + ambient color = light color

Got rid of seam, and gave me more control over shadow and ao impact. The sky color by sun dot also prevented normals from flattening out in shadows. 

Then the final color was... 

Diffuse * light color + specular * specdot * sun color * shadow * ao

Which prevented sundot from killing speculars. With shadow and ao as separate vars, I could use max(0.2, shadow) and such to soften shadows and let a bit of specular show up in shadows, simulating ambient light creating faint shines there.
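Roughly, the two formulas above look like this when written out (a C++ sketch for illustration, not the actual Fuel or TDM shader code). `Vec3`, `ShadingInputs`, `lightColor` and `finalColor` are made-up names, and applying the shadow floor only to the specular term is one interpretation of the max(0.2, shadow) trick.

```cpp
#include <algorithm>
#include <cassert>

struct Vec3 { float r, g, b; };

Vec3 operator+(Vec3 a, Vec3 b) { return { a.r + b.r, a.g + b.g, a.b + b.b }; }
Vec3 operator*(Vec3 a, Vec3 b) { return { a.r * b.r, a.g * b.g, a.b * b.b }; }
Vec3 operator*(Vec3 a, float s) { return { a.r * s, a.g * s, a.b * s }; }

// Hypothetical bundle of per-pixel inputs, named after the terms in the post.
struct ShadingInputs {
    Vec3 diffuse;
    Vec3 specular;
    Vec3 sunColor;
    Vec3 skyColor;
    Vec3 ambientColor;
    float sunDot;   // N.L, assumed already clamped to [0, 1]
    float specDot;  // specular dot, assumed already raised to its power
    float shadow;   // 0 = fully shadowed, 1 = fully lit
    float ao;       // ambient occlusion, 1 = unoccluded
};

// sun color * sundot * shadow + sky color * sundot * ao + ambient color
Vec3 lightColor(const ShadingInputs& in) {
    return in.sunColor * (in.sunDot * in.shadow)
         + in.skyColor * (in.sunDot * in.ao)
         + in.ambientColor;
}

// diffuse * light color + specular * specdot * sun color * shadow * ao,
// with the shadow factor floored so faint speculars survive in shadow.
Vec3 finalColor(const ShadingInputs& in, float minShadow) {
    assert(minShadow >= 0.0f && minShadow <= 1.0f);
    const float specShadow = std::max(minShadow, in.shadow);
    return in.diffuse * lightColor(in)
         + in.specular * in.sunColor * (in.specDot * specShadow * in.ao);
}
```

The point of the structure is that specular is added after the diffuse lighting is multiplied in, so sundot never scales it.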

Speculars have to be treated as a separate "light" that is added in, doing their own thing. If added to diffuse too early, they get severely toned down by other light processing that shouldn't impact them.

Fresnel, I've found, works best as a final modifier to the final light color. Otherwise it too gets washed out in the mix.

Ideally with Fresnel you'd use R0, but if you don't know or have an R0, you can just assume an R0 of 0 (what you get when both sides of the surface have an index of refraction of 1.0, like air), which cancels out the R0 material part of the equation, leaving you with the pow(1-x, power) part. Then you plug in various modifications of dots to see what results you get.
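For reference, here is Schlick's approximation written out as a small C++ sketch (illustration only, not taken from any of the shaders discussed; the function names are made up). With R0 set to 0 the expression reduces to the pow(1 - x, power) term.

```cpp
#include <cassert>
#include <cmath>

// Schlick's approximation: F = R0 + (1 - R0) * (1 - cosTheta)^power.
// The standard form uses power = 5; other powers are an artistic tweak.
float schlickFresnel(float cosTheta, float r0, float power = 5.0f) {
    assert(cosTheta >= 0.0f && cosTheta <= 1.0f);
    return r0 + (1.0f - r0) * std::pow(1.0f - cosTheta, power);
}

// R0 (reflectance at normal incidence) from the two indices of refraction.
// Equal indices give R0 = 0; air to glass (1.0 to 1.5) gives about 0.04.
float r0FromIor(float n1, float n2) {
    const float r = (n1 - n2) / (n1 + n2);
    return r * r;
}
```

Here cosTheta is whichever dot you plug in (for example N.V), clamped to [0, 1].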

Fresnel is also good for blending reflective cubemaps onto surfaces. Fuel did it with water and clear coats / chrome. But, I added rain sheen to things by just using a set value based on how much it was raining.

Anyways, power's out here in Texas, so it will be a while before I can explore more. Ideally I would like to solve that specular cut-off bug before trying to enhance anything else. I looked for some half Lambert stuff, but nothing stood out. And it's only speculars, which is odd. There's a lot of shader code to dig into and wrap my head around. Gonna be a while before I can stare at stuff more.

Link to post
Share on other sites
