The Dark Mod Forums

Bugs, patches, etc.


CodeMonkey


I'm just saying that without proof (profiling), optimizations can easily backfire. I wouldn't accept any 'performance optimization' that used parallelism without actual proof in the form of profiler logs before and after and testing the same thing.

 

Here, use gprof

http://stackoverflow...for-c-profilers

 

Also, I suggest you use a repeatable piece of code to test. A test case with no human input would help, not just loading a level and looking around willy-nilly.

 

Not to mention that parallelism is a (very large) source of horrible bugs itself. At least you're not using threads.

Edited by i30817

@Tels, I'll post the OpenMP code a bit later (a day or so?); it's not quite done yet. (I still need to run some tools that can detect possible bugs in the parallel code, etc.)

 

Did you see the code I posted for input? It's on the last page too. That should help with the bugs with the lantern\spyglass toggle.

 

@i30817, I'll check out gprof, and there are some tools made just for this that I'm looking into as well. (Right now, I use FPS and a standard CTime-based benchmark for testing. I also use a sandbox when possible, so I can really see the results.)

 

About the loading bottleneck: even John Carmack said it's due to using plain text formats and parsing them during level load. Short of changing to a binary format, there isn't much that can be done there. However, loading is more complex than that; parsing is just one step. I'm trying to catch the later steps and speed them up, since that doesn't require any huge change in formats.

 

---

 

Edit:

 

gprof is for GCC; I'm using Visual Studio 2010.

 

I found a few free profilers to test out: "Valgrind" and "Shiny"

 

I'm also looking for some free static analyzers for parallel code. I've found payware versions, but I'm not really looking to pay money for something I likely won't use that much. They do have trials, but I'm not sure how well those will work with something as big as TDM. (Usually trials are limited by size, etc.)

 

Anyways, I've been practicing in my sandbox with OpenMP, and I'm seeing impressive results.

 

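#include <cmath>   // pow

// Turn MSVC optimizations off so the benchmark loops are not optimized away.
#pragma optimize("", off)
void Test1(bool useOMP)
{
   // Print() and ToString() are sandbox helpers, not shown here.
   if (useOMP)
   {
       long double i = 0;

       // Each thread accumulates a private copy of i; OpenMP adds the copies together at the end.
       #pragma omp parallel for reduction(+:i)
       for(int t=1; t < 60000000; t++)
       {
           long double k = 0.7;
           for(int n=1; n < 16; n++)
           {
               i += pow(k,n);
           }
       }

       Print("Value: " + ToString(i));
   }
   else
   {
       // Same work on a single thread, for comparison.
       long double i = 0;

       for(int t=1; t < 60000000; t++)
       {
           long double k = 0.7;
           for(int n=1; n < 16; n++)
           {
               i += pow(k,n);
           }
       }

       Print("Value: " + ToString(i));
   }
}
#pragma optimize("", on)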

 

The OpenMP version of the code is winning by a large margin. (Edit: Revised results)

 

Normal: 9,000 (Rough average)

OpenMP: 2,500 (Rough average)

 

That's using CTime and difftime, which I think is reporting the difference in milliseconds? (As you can see, I'm also printing the output of the function to verify both are getting the same result.)
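
For anyone repeating this measurement, here is a minimal sketch of the same benchmark with an explicit millisecond timer. It is not the sandbox code: `RunLoop` and `TimeLoopMs` are names made up for illustration, and it needs a C++11 compiler for `<chrono>`. It uses `std::chrono::steady_clock` because the standard `difftime` returns its result in seconds, not milliseconds. Without OpenMP enabled in the compiler the pragma is ignored and both calls run serially.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Sums k^1 + ... + k^15 once per outer iteration; same arithmetic as Test1.
// With useOMP == false, the if() clause makes the region run on one thread.
double RunLoop(int outer, bool useOMP)
{
    assert(outer > 0);
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum) if(useOMP)
    for (int t = 1; t < outer; t++)
    {
        const double k = 0.7;
        for (int n = 1; n < 16; n++)
        {
            sum += std::pow(k, n);
        }
    }
    return sum;
}

// Wall-clock time of one RunLoop call in milliseconds; the sum is returned
// through `result` so both variants can be compared.
double TimeLoopMs(int outer, bool useOMP, double& result)
{
    const auto start = std::chrono::steady_clock::now();
    result = RunLoop(outer, useOMP);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `TimeLoopMs(60000000, false, a)` and `TimeLoopMs(60000000, true, b)` gives the two timings. The two sums should agree to within floating-point rounding, since the parallel version adds the terms in a different order.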

 

Here's another useful guide for OpenMP. (It's on the PayWare page for VivaMP, which is part of PVS-Studio now.)

 

http://www.viva64.com/en/a/0054/

Edited by CodeMonkey

That's not darkmod code.

 

Do ya think? I'm sure glad captain obvious is here to point out things like this.

 

Here I thought that I'd made the game run 4 times faster, and it turns out, it was just test code from my sandbox. (I'm so inept that I can't tell the difference between the codebases, thanks for the assist, guy.)

 

Also, I'd have never thought to test my code. I mean, I've only been programming for over 15 years, I'm a reverse engineer, I know ASM, C\C++\CLI\C#, etc., and about 20 scripting languages. But, thanks for that hot tip. (Gets notebook, scribbles, "test code"...)

 

Any other gems you wanna lay on me?

 

Besides being pretty stupid, since the outer cycle is useless (and probably the inner too, with some fancy mathmagic - it's a geometric progression).
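
For reference, the geometric-progression remark can be checked with a short sketch. The inner loop adds k^1 through k^15, which has the closed form k(1 - k^15)/(1 - k) for k != 1, and the outer loop only repeats that same sum 59,999,999 times. The function names below are illustrative and not from the posted code.

```cpp
#include <cassert>
#include <cmath>

// Inner loop of Test1: k^1 + k^2 + ... + k^terms, one pow() call per term.
double SumByLoop(double k, int terms)
{
    double sum = 0.0;
    for (int n = 1; n <= terms; n++)
    {
        sum += std::pow(k, n);
    }
    return sum;
}

// Closed form of the same geometric series; only valid for k != 1.
double SumClosedForm(double k, int terms)
{
    assert(k != 1.0);
    return k * (1.0 - std::pow(k, terms)) / (1.0 - k);
}
```

`SumByLoop(0.7, 15)` and `SumClosedForm(0.7, 15)` agree to within rounding error, which is why the loop is only useful as a way of burning a measurable amount of time.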

 

What it does isn't the point...

 

It's designed to take a measurable amount of time to complete, that's it. (As in, I can measure it, use OpenMP, measure it again, and see the different results.)

 

The code does exactly what it's designed to do...

 

Toy examples do not generalize to real code.

 

Running your mouth doesn't equate to helping either, but, that sure isn't stopping you... Don't you have some crappy Java code to be writing?


Codemonkey, I'm not able to respond for the next two days, but please do continue your testing. In the idDict code I already fixed a few gross cases (like an O(N*N) when you added one dict to another), so I know the code has inefficiencies.
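
To illustrate the kind of O(N*N) case described here, below is a generic sketch. It is not the idDict code and not the actual fix; `FlatDict` and `MergeHashed` are hypothetical names. The point is only that if every insert scans the existing entries, copying one dictionary into another costs roughly N*M comparisons, while a hashed lookup keeps the merge close to linear.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> KeyValue;

// Hypothetical flat dictionary: Set() scans every existing entry for the key.
struct FlatDict
{
    std::vector<KeyValue> items;

    void Set(const std::string& key, const std::string& value)
    {
        for (std::size_t i = 0; i < items.size(); i++)
        {
            if (items[i].first == key) { items[i].second = value; return; }
        }
        items.push_back(KeyValue(key, value));
    }

    // Copying M entries into a dict holding N entries costs O(N * M) comparisons.
    void CopyFrom(const FlatDict& other)
    {
        for (std::size_t i = 0; i < other.items.size(); i++)
        {
            Set(other.items[i].first, other.items[i].second);
        }
    }
};

// The same merge with hashed lookup: expected O(N + M).
void MergeHashed(std::unordered_map<std::string, std::string>& into,
                 const std::unordered_map<std::string, std::string>& from)
{
    for (const auto& kv : from)
    {
        into[kv.first] = kv.second;
    }
}
```

Both versions produce the same merged contents; only the cost of getting there differs, which is the sort of thing a profiler run on a large mission should expose.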

 

What would be really helpful here is to run a profile on a real mission (one of the bigger ones with lots of AI) and figure out which parts are taking a lot of CPU (preferably at 640x480 resolution or something similar to rule out rendering). Then we can look into improving those places in the code.

 

In (I believe) v1.06 we had a silly bug where every entity spawned generated lots of silly events. This slowed down map loading (all entities are spawned!) as well as dynamic entities a lot. It's now fixed, but I bet we have similar mistakes lurking in the code somewhere. So please continue!

"The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man." -- George Bernard Shaw (1856 - 1950)

 

"Remember: If the game lets you do it, it's not cheating." -- Xarax


Lol, if you want to play at rudeness, rude boy, go on - play by yourself. As far as I am concerned, this discussion is over. Good luck with your hacking, even if I don't think you will.

 

I think you should check yourself before getting a bit too high and mighty. You were being unnecessarily aggressive and bitchy in your comments. You have an opinion of how things should be done? Good for you, but at least be polite if you're 'offering' advice.


@Everyone, thanks for coming to my defense, I appreciate it. :)

 

Codemonkey, I'm not able to respond for the next two days, but please do continue your testing. In the idDict code I already fixed a few gross cases (like an O(N*N) when you added one dict to another), so I know the code has inefficiencies.

 

What would be really helpful here is to run a profile on a real mission (one of the bigger ones with lots of AI) and figure out which parts are taking a lot of CPU (preferably at 640x480 resolution or something similar to rule out rendering). Then we can look into improving those places in the code.

 

In (I believe) v1.06 we had a silly bug where every entity spawned generated lots of silly events. This slowed down map loading (all entities are spawned!) as well as dynamic entities a lot. It's now fixed, but I bet we have similar mistakes lurking in the code somewhere. So please continue!

 

Ah, so you've gone to work on stuff like that. Good to hear; base classes like that tend to get overlooked. :)

 

You might like this article, it details the creation of StarCraft. (Has a similar LinkedList implementation that's worth checking out.)

http://www.codeofhon...to-linked-lists

 

Check these out; they're static analyses of Doom 3. (Saves us a little trouble, though we probably still need to run our own.)

http://www.viva64.com/en/b/0120/

http://www.viva64.com/en/b/0151/

 

Another thing I want to mention: Carmack hints in another interview that the script interpreter is a large cause of issues, and I've already pointed out how he said it's related to slow loading times (lex\parse). I have experience with integrating .NET as a scripting language, and many modern games are doing this over Lua\Python. (That may be something to consider in the future; it's actually fairly easy to do, and I already have some code from another project.)

 

http://www.codingthe...sidered-harmful

 

Anyways, yeah, if you want me to focus on hunting down performance issues, I can do that. It would probably be logical to find existing issues and fix them, then, if necessary, go further with any code rewrites, OpenMP, etc.

 

Alright, I'll see what I can do, and I'll see you in a few days. :)

 

----

 

Edit: I've found some free static analyzers.

 

http://stackoverflow...s-are-available

 

I'm using "Cppcheck" and "CodeAnalyst" (from AMD). Both are definitely worth checking out. I'm running the first as we speak, and it's already pointing out some issues that might cause problems with portability, etc. (Also performance issues, logic errors, etc.)

 

The other tool, I believe, does live profiling, and should be more in line with what you wanted me to look into. :P

Edited by CodeMonkey

Last post is a bit full, so, I'm making a new one. (Sorry.)

 

I've noticed that the game seems to be FPS-limited. On the main menu it always floats around 60-63fps, which points to a frame limiter of some sort. (I see the same in game: if I look at the ground in a high-FPS zone, I hit the same limit.)

 

When combined with V-Sync, it really seems to hurt performance. So, does the game have a frame limiter, or is it just coincidence? (I've disabled V-Sync, because it dropped some areas from 40-60fps to the 20s when enabled.)

 

Something feels a bit odd here, though. I have seen much higher FPS for a split second during loading, in the 100s, so it seems odd that it feels limited elsewhere. (Does anyone get steady FPS above the limit I listed (60-63fps) during actual gameplay?)

 

--

 

Btw, some live profiling of missions has revealed the following. (Very basic profiling.)

 

CPU Usage: 25% (i.e., 25% of each of my four cores; in other words, it would be maxing out a single-core CPU.)

Mem Usage: 30% (I have 4 GB, and 1 GB is assigned to my onboard GPU (shared), so 30% of 3 GB is roughly 1 GB.)

 

So, there is definitely room for improvement; it's barely using my hardware. I'll try to run some GPU profilers to figure out what sort of usage it's seeing. (Load\VRAM, etc.)

 

For the sake of clarity, just because the game would be maxing out a single-core processor does not mean that it's using the power efficiently, or at all. If you loop any code repeatedly, no matter how simple\complex it is, you will see the same result. (i.e., that's normal, and doesn't mean there isn't any room for improvement on single-core CPUs. Didn't want to scare any single-core users out there. :P)

 

Further profiling will reveal how it spends its time, and tell us whether the power is being used efficiently or not. (It should also lead us to problem areas in the code.)

 

This result also tells me we could stand to take better advantage of modern high memory systems. (Maybe some sort of precache for common objects?)

Edited by CodeMonkey

CodeMonkey, yes, the frame limiter has a hard-coded limit of 60 fps. This is needed for internal synchronization. The limiter can be disabled, but that seriously screws up the game's mechanisms (for example, the AI). However, there is a benchmark demo map that you can use to measure performance. It has no AI, I think, and is only used to return raw fps. Maybe this will help you a bit.

My Eigenvalue is bigger than your Eigenvalue.


Thanks for the replies. :) I thought it felt frame limited.

 

That can be the cause of many issues, but changing it can be quite difficult depending on how they implemented it and the features that rely on it. (AI timing, physics timing, etc.)

 

The majority of games aren't using this method, and even if they have FPS limiters, those can be disabled and the game still functions correctly. (Ideally, you want to limit time-sensitive systems, but still keep most of the game updating unbounded, as well as the rendering.)

 

You can look into "fixed timestep" and "variable timestep" and find many implementations and discussions on the matter. (Including issues related to using VSync in conjunction with them, etc.)
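
As a rough idea of what the fixed-timestep approach looks like, here is a minimal accumulator-style sketch. It is not TDM's or Doom 3's loop; `Simulation` and `AdvanceFrame` are hypothetical names. Time-sensitive logic runs in fixed slices, while frames can be rendered as fast as the hardware allows.

```cpp
#include <cassert>

// Hypothetical game state; names are illustrative only.
struct Simulation
{
    int    ticks = 0;          // fixed-size logic updates performed so far
    double accumulator = 0.0;  // unsimulated time carried between frames, in seconds
};

// Consume one rendered frame lasting `frameSeconds` and run the logic in fixed
// `stepSeconds` slices. Returns the leftover fraction of a step in [0, 1),
// which a renderer can use to interpolate between two ticks.
double AdvanceFrame(Simulation& sim, double frameSeconds, double stepSeconds)
{
    assert(stepSeconds > 0.0 && frameSeconds >= 0.0);
    sim.accumulator += frameSeconds;
    while (sim.accumulator >= stepSeconds)
    {
        sim.ticks++;                     // AI, physics and other time-sensitive updates go here
        sim.accumulator -= stepSeconds;
    }
    return sim.accumulator / stepSeconds;
}
```

With a step of 1/60 s, a slow frame runs several ticks to catch up and a fast frame may run none, so the logic rate stays constant without capping the render rate.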

 

I'll have to see if I can locate the code and figure out exactly what's happening, because this could be a major cause of bad performance in many of the game's systems. (i.e., if level loading is tied to this limit, it could be artificially increasing load times, etc.)


Yeah, I don't doubt that using binary formats sped it up dramatically, and I've already suggested TDM do that at some point. :)

 

My concern here is that code is being limited when it shouldn't be.

 

You want your level loading code to execute as many times as it can per frame, and a frame limiter could prevent this: it can only execute x times per frame, and generally x will be the same across systems. (The same code that keeps animations in sync could be working against us by keeping loading in sync, when you really want it out of sync and running as fast as possible.)

 

I've tried finding the code, but can't; the source is massive. You said it can be disabled. Can you tell me how? (If it's a CVar, I can hunt down usage of the CVar to find the code more quickly, etc.)

Edited by CodeMonkey

com_preciseTic is the cvar.

 

Thanks. :)

 

-----------------

 

So, I got my profiler plugged in and have started running some tests. Here's my first result.

 

idDict::FindKey

Hits: 1,075,226

Average Time: 114 ms (Not bad across a million executions.)

Percentage: 41%

 

idDict::Clear

Hits: 1,252

Average Time: 95 ms (This is kind of slow considering FindKey had a million hits.)

Percentage: 34%

 

These were the most time-consuming functions that I tested in idDict. The percentages are relative to the combined time of all the functions tested, so, out of the following functions, the two above took up about 75% of the total execution time. (I hope that makes sense?)

 

idDict::Clear
idDict::Set
idDict::FindKeyIndex
idDict::FindKey
idDict::MatchPrefix
idDict::Delete
idDict::SetDefaults
idDict::operator =

 

The rest of the functions were mostly negligible. Set was the next highest at 16%, taking 44ms across 70,000 hits. (Basically, idDict looks to be fairly efficient. Note that I won't report everything I find in the future, only problem areas; this was more of a practice run for testing, reporting, formatting, etc.)
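
A short sketch for turning the figures above into per-call costs. It rests on one assumption: that the reported times are totals across all hits rather than per-call averages. The percentages are consistent with that reading, since 114 ms at 41% and 95 ms at 34% both imply a combined total of roughly 278 ms. `ProfileRow` and the helper names are made up for illustration.

```cpp
#include <cassert>

// One row of the report above.
struct ProfileRow
{
    const char* name;
    long        hits;
    double      totalMs;  // ASSUMPTION: a total across all hits, not a per-call average
};

// Cost of a single call, in microseconds.
double MicrosecondsPerCall(const ProfileRow& row)
{
    assert(row.hits > 0);
    return row.totalMs * 1000.0 / row.hits;
}

// Share of the combined measured time, in percent.
double PercentOfTotal(const ProfileRow& row, double combinedMs)
{
    assert(combinedMs > 0.0);
    return 100.0 * row.totalMs / combinedMs;
}
```

Under that assumption, FindKey costs about 0.11 microseconds per call and Clear about 76 microseconds per call, so Clear is the expensive one per call even though FindKey has the larger total.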

 

----

 

The profiler I'm using is "Shiny"; it's more of a targeted profiler than a global one. A global profiler would work a bit better for finding hotspots, but "Shiny" is quite good for quickly checking "suspect" code. (I'm still looking for a good global profiler that's free, easy to use, etc. That will speed up the process quite a bit; then "Shiny" can be used for more detailed work after locating any problem areas.)
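
To show what "targeted" profiling means in practice, here is a minimal sketch of a hand-rolled scoped timer that records hits and total time per named zone. This is not Shiny's API; `ZoneStats`, `Zones`, `ScopedZone` and `FindKeyStandIn` are all hypothetical, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical per-zone counters (hits and total time, as in the report above).
struct ZoneStats
{
    long   hits = 0;
    double totalMs = 0.0;
};

// Global table of zones, keyed by name.
std::map<std::string, ZoneStats>& Zones()
{
    static std::map<std::string, ZoneStats> zones;
    return zones;
}

// RAII timer: construct it at the top of the scope to be measured. The destructor
// adds the elapsed wall-clock time and one hit to the named zone.
class ScopedZone
{
public:
    explicit ScopedZone(const std::string& name)
        : name_(name), start_(std::chrono::steady_clock::now()) {}

    ~ScopedZone()
    {
        const auto stop = std::chrono::steady_clock::now();
        ZoneStats& stats = Zones()[name_];
        stats.hits++;
        stats.totalMs += std::chrono::duration<double, std::milli>(stop - start_).count();
    }

private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// Stand-in for a "suspect" function being instrumented.
int FindKeyStandIn(int value)
{
    ScopedZone zone("FindKeyStandIn");
    return value * 2;
}
```

Only functions that carry a `ScopedZone` show up in the table, which is why this style suits checking suspect code and a sampling ("global") profiler suits finding hotspots nobody suspected.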

Edited by CodeMonkey

Yeah, they're still plain text after they've been extracted from the binary .resources file, but I would have expected that. The .resources files being binary helps speed up load times. I've read that BFG has threading as well, so that's probably speeding things up too.

 

I would advise waiting until the BFG source code has been made public before starting any real work on stuff like this. If we're able to back port the new binary systems and threading into our code base, that will save a lot of work.


Yeah, Carmack mentions the mesh\animation files as being some of the biggest offenders in that interview I posted.

 

I'm not sure I'd count on BFG's source getting released. (Carmack wants to release it from my understanding, but, it sounded like he wasn't the only one involved in making that decision.)

 

Anyways, no decisions have been made, and certainly not by me. I've merely made suggestions for improving performance, and the team has been kind enough to humor me. :P (In the end, I imagine they'll compare all the options available, and make the best choices for TDM and its users.)


(Carmack wants to release it from my understanding, but, it sounded like he wasn't the only one involved in making that decision.)

 

He already announced last week that the release was given the greenlight and most of the preparation was finished. Word is that it will be released soon. That's why I was saying it would probably be a good idea not to get ahead of ourselves and reinvent too much stuff when it has already been done.

