The Dark Mod Forums



CodeMonkey last won the day on October 5 2012

CodeMonkey had the most liked content!


  1. Yeah, Carmack mentions the mesh\animation files as being some of the biggest offenders in that interview I posted. I'm not sure I'd count on BFG's source getting released. (Carmack wants to release it, from my understanding, but it sounded like he wasn't the only one involved in making that decision.) Anyway, no decisions have been made, and certainly not by me. I've merely made suggestions for improving performance, and the team has been kind enough to humor me. (In the end, I imagine they'll compare all the options available and make the best choices for TDM and its users.)
  2. Thanks.

     -----------------

     So, I got my profiler plugged in and have started running some tests. Here's my first result.

     idDict::FindKey
         Hits: 1,075,226
         Average Time: 114 ms (Not bad across a million executions.)
         Percentage: 41%

     idDict::Clear
         Hits: 1,252
         Average Time: 95 ms (This is kind of slow considering FindKey had a million hits.)
         Percentage: 34%

     These were the most time-consuming functions that I tested in idDict. The percentages are relative to the combined total of the functions tested, so, out of the following functions, the two above took up about 75% of the total execution time. (I hope that makes sense?)

         idDict::Clear
         idDict::Set
         idDict::FindKeyIndex
         idDict::FindKey
         idDict::MatchPrefix
         idDict::Delete
         idDict::SetDefaults
         idDict::operator =

     The rest of the functions were mostly negligible. Set was the next highest at 16%, taking 44 ms across 70,000 hits. (Basically, idDict looks to be fairly efficient. Note, I won't report everything I find in the future, only problem areas; this was more of a practice run for testing, reporting, formatting, etc.)

     ----

     The profiler I'm using is "Shiny". It's more of a targeted profiler than a global one. A global profiler would work a bit better for finding hotspots, but "Shiny" is quite good for quickly checking "suspect" code. (I'm still looking for a good global profiler that's free and easy to use. That will speed up the process quite a bit; then "Shiny" can be used for more detailed work after locating any problem areas.)
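For anyone unfamiliar with the "targeted" style of profiling described above, here is a minimal C++ sketch of the idea: a scoped timer that records a hit count and accumulated time per label. This is not Shiny's API; `ProfileStats`, `ProfileTable`, `ScopedProfile`, `PercentOfTotal` and `FindKeyStandIn` are all hypothetical names made up for illustration, and the instrumented function is a stand-in rather than TDM code.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-label statistics: call count and accumulated time.
struct ProfileStats {
    std::uint64_t hits = 0;
    std::chrono::nanoseconds total{0};
};

// Hypothetical global table, keyed by a label chosen at the call site.
std::map<std::string, ProfileStats>& ProfileTable() {
    static std::map<std::string, ProfileStats> table;
    return table;
}

// RAII timer: starts on construction, records one hit on destruction.
class ScopedProfile {
public:
    explicit ScopedProfile(const char* label)
        : label_(label), start_(std::chrono::steady_clock::now()) {
        assert(label != nullptr);
    }
    ~ScopedProfile() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        ProfileStats& stats = ProfileTable()[label_];
        stats.hits += 1;
        stats.total += std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
    }
    ScopedProfile(const ScopedProfile&) = delete;
    ScopedProfile& operator=(const ScopedProfile&) = delete;

private:
    const char* label_;
    std::chrono::steady_clock::time_point start_;
};

// Stand-in for an instrumented function (not the real idDict::FindKey).
int FindKeyStandIn(int x) {
    ScopedProfile profile("FindKeyStandIn");
    return x * 2;
}

// Share of all measured time spent under one label, in percent.
double PercentOfTotal(const std::string& label) {
    std::chrono::nanoseconds sum{0};
    for (const auto& entry : ProfileTable()) sum += entry.second.total;
    const auto it = ProfileTable().find(label);
    if (it == ProfileTable().end() || sum.count() == 0) return 0.0;
    return 100.0 * static_cast<double>(it->second.total.count()) /
           static_cast<double>(sum.count());
}
```

The percentages only describe the labels you instrumented, which is why a targeted profiler suits checking "suspect" code better than finding unknown hotspots.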
  3. Yeah, I don't doubt that using binary formats sped it up dramatically, and I've already suggested TDM do that at some point. My concern here is that code is being limited when it shouldn't be. You want your level loading code to execute as many times as it can per frame, and a frame limiter could prevent this: it can only execute x times per frame, and generally x will be the same across systems. (The same code that keeps animations in sync could be working against us and keeping loading in sync, when you really want it out of sync and running as fast as possible.) I've tried finding the code but can't; the source is massive. You said it can be disabled, can you tell me how? (If it's a CVar, I can hunt down usage of the CVar to find the code more quickly.)
  4. Thanks for the replies. I thought it felt frame limited. That can be the cause of many issues, but changing it can be quite difficult depending on how they implemented it and the features that rely on it. (AI timing, physics timing, etc.) The majority of games aren't using this method, and even if they have FPS limiters, they can be disabled and the game still functions correctly. (Ideally, you want to limit time-sensitive systems, but still keep most of the game updating unbound, as well as the rendering.) You can look into "fixed timestep" and "variable timestep" and find many implementations and discussions on the matter. (Including issues related to using VSync in conjunction with them.) I'll have to see if I can locate the code and figure out exactly what's happening, because this could be a major cause of bad performance in many of the game's systems. (ie: If level loading is tied to this limit, it could be artificially increasing load times.)
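To make the "fixed timestep" idea mentioned above concrete, here is a minimal C++ sketch of the usual accumulator pattern: time-sensitive systems advance in fixed-size steps, while everything else (rendering, loading) can run once per frame as fast as the machine allows. This is a generic illustration, not how the Doom 3 or TDM code does it; `FixedStepClock` and its members are hypothetical names.

```cpp
#include <cassert>

// Accumulates real frame time and hands it out in fixed-size simulation steps.
class FixedStepClock {
public:
    explicit FixedStepClock(double stepSeconds, int maxStepsPerFrame = 5)
        : step_(stepSeconds), maxSteps_(maxStepsPerFrame) {
        assert(stepSeconds > 0.0 && maxStepsPerFrame > 0);
    }

    // Call once per rendered frame with the real time that frame took.
    // Returns how many fixed updates (AI, physics, ...) should run now.
    int Advance(double frameSeconds) {
        accumulator_ += frameSeconds;
        int steps = 0;
        while (accumulator_ >= step_ && steps < maxSteps_) {
            accumulator_ -= step_;
            ++steps;
        }
        // Still behind after hitting the cap: drop the backlog instead of
        // trying to catch up forever on a slow machine.
        if (accumulator_ >= step_) accumulator_ = 0.0;
        return steps;
    }

    // Fraction of a step left over, usable for interpolating the render state.
    double Alpha() const { return accumulator_ / step_; }

private:
    double step_;
    int maxSteps_;
    double accumulator_ = 0.0;
};
```

A frame loop would call `Advance(frameTime)` and run that many fixed updates, then render unconditionally, so the frame rate is never tied to the simulation rate.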
  5. Last post is a bit full, so I'm making a new one. (Sorry.)

     I've noticed that the game seems to be limited FPS-wise. On the main menu it always floats around 60-63fps, which points to a frame limiter of some sort. (I see the same in game: if I look at the ground in a high FPS zone, I hit the same limit.) When combined with V-Sync, it really seems to hurt performance, so, does the game have a frame limiter? Or is it just coincidence? (I've disabled V-Sync, because it had dropped some areas from 40-60fps to the 20s when enabled.) Something feels a bit odd here, though. I have seen much higher FPS for a split second during loading, in the 100s, so it seems odd that it feels limited elsewhere. (Does anyone get steady FPS above the limit I listed (60-63fps) during actual gameplay?)

     --

     Btw, some live profiling of missions has revealed the following. (Very basic profiling.)

         CPU Usage: 25% (ie, 25% of each of my four cores; in other words, it would be maxing out a single core CPU.)
         Mem Usage: 30% (I have 4gb, 1gb is assigned to my onboard GPU (shared), so 30% of 3gb is roughly 1gb.)

     So, there is definitely room for improvement, it's barely using my hardware. I'll try to run some GPU profilers to figure out what sort of usage it's seeing. (Load\VRam, etc.)

     For the sake of clarity, just because the game would be maxing out a single core processor does not mean that it's using the power efficiently, or at all. If you loop any code repeatedly, no matter how simple\complex it is, you will see the same result. (ie, that's normal, and doesn't mean there isn't any room for improvement on single core CPUs. Didn't want to scare any single core users out there.) Further profiling will reveal how it spends its time and tell us whether the power is being used efficiently or not. (It should also lead us to problem areas in the code.)

     This result also tells me we could stand to take better advantage of modern high memory systems. (Maybe some sort of precache for common objects?)
  6. @Everyone, thanks for coming to my defense, I appreciate it.

     Ah, so you've gone to work on stuff like that, good to hear. Base classes like that tend to get overlooked. You might like this article, it details the creation of StarCraft. (Has a similar LinkedList implementation that's worth checking out.) http://www.codeofhon...to-linked-lists

     Check these out, it's static analysis of Doom3. (Saves us a little trouble, though we probably still need to run our own.) http://www.viva64.com/en/b/0120/ http://www.viva64.com/en/b/0151/

     Another thing I want to mention: Carmack hints at the script interpreter in another interview as being a large cause of issues, and I've already pointed out how he said it's related to slow loading times (lex\parse). I have experience with integrating .Net as a scripting language, and many modern games are doing this over Lua\Python. (That may be something to consider in the future. It's actually fairly easy to do, and I already have some code from another project.) http://www.codingthe...sidered-harmful

     Anyway, yeah, if you want me to focus on hunting down performance issues, I can do that. It would probably be logical to find existing issues and fix them first, then, if necessary, go further with any code rewrites, OpenMP, etc. Alright, I'll see what I can do, and I'll see you in a few days.

     ----

     Edit: I've found some free static analyzers. http://stackoverflow...s-are-available I'm using "Cppcheck" and "CodeAnalyst" (from AMD), definitely worth checking out. I'm running the first as we speak, and it's already pointing out some issues that might cause problems with portability. (Also performance and logic errors.) The other tool, I believe, does live profiling, and should be more in line with what you wanted me to look into.
  7. Do ya think? I'm sure glad captain obvious is here to point out things like this. Here I thought that I'd made the game run 4 times faster, and it turns out, it was just test code from my sandbox. (I'm so inept that I can't tell the difference between the codebases, thanks for the assist, guy.) Also, I'd have never thought to test my code, I mean, I've only been programming for over 15 years, I'm a reverse engineer, I know ASM, C\C++\CLI\C#\Etc, and about 20 scripting languages, etc,. But, thanks for that hot tip. (Gets notebook, scribbles, "test code"...) Any other gems you wanna lay on me? What it does isn't the point... It's designed to take a measurable amount of time to complete, that's it. (As in, I can measure it, use OpenMP, measure it again, and see the different results.) The code does exactly what it's designed to do... Running your mouth doesn't equate to helping either, but, that sure isn't stopping you... Don't you have some crappy Java code to be writing?
  8. @Tels, I'll post the OpenMP code a bit later (a day or so?), it's not quite done yet. (I still need to run some tools that can detect possible bugs in the parallel code.) Did you see the code I posted for input? It's on the last page too. That should help with the bugs with the lantern\spyglass toggle.

     @i30817, I'll check out gprof, and there are some tools made just for this I'm looking into as well. (Right now, I use FPS and a standard CTime based benchmark for testing. I also use a sandbox when possible, so I can really see the results.) About the loading bottleneck, even John Carmack said it's due to using plain text formats and parsing them during level load. Short of changing to a binary format, there isn't much that can be done there. However, loading is more complex than that, that's just one step. I'm trying to catch the later steps and speed them up, since that doesn't require any huge change in formats.

     ---

     Edit: GProf is for GCC, I'm using Visual Studio 2010. I found a few free profilers to test out: "Valgrind" and "Shiny". I'm also looking for some free static analyzers for parallel code. I've found PayWare versions, but I'm not really looking to pay money for something I likely won't use that much. They do have trials, but I'm not sure how well that will work with something as big as TDM. (Usually trials are limited by size, etc.)

     Anyway, I've been practicing in my sandbox with OpenMP, and I'm seeing impressive results.

         #include <cmath> // pow

         // Print() and ToString() are sandbox helpers (not shown).
         #pragma optimize("", off)
         void Test1(bool useOMP)
         {
             if (useOMP)
             {
                 long double i = 0;
                 // Each thread sums into its own copy of i; the copies are added at the end.
                 #pragma omp parallel for reduction(+:i)
                 for (int t = 1; t < 60000000; t++)
                 {
                     long double k = 0.7;
                     for (int n = 1; n < 16; n++)
                     {
                         i += pow(k, n);
                     }
                 }
                 Print("Value: " + ToString(i));
             }
             else
             {
                 long double i = 0;
                 for (int t = 1; t < 60000000; t++)
                 {
                     long double k = 0.7;
                     for (int n = 1; n < 16; n++)
                     {
                         i += pow(k, n);
                     }
                 }
                 Print("Value: " + ToString(i));
             }
         }
         #pragma optimize("", on)

     The OpenMP version of the code is winning by a large margin. (Edit: Revised results)

         Normal: 9,000 (Rough average)
         OpenMP: 2,500 (Rough average)

     That's using CTime and difftime, which I think is reporting the difference in milliseconds? (You can see, I'm also printing the output of the function to verify both are getting the same result.)

     Here's another useful guide for OpenMP. (It's on the PayWare page for VivaMP, which is part of PVS-Studio now.) http://www.viva64.com/en/a/0054/
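On the units question: `std::difftime` returns its result in seconds, so a finer-grained clock makes this kind of comparison less ambiguous. Below is a minimal sketch of the same serial-versus-OpenMP measurement using `std::chrono::steady_clock`. The function names (`PowerSumSerial`, `PowerSumParallel`, `MillisecondsFor`) are made up for illustration, it uses `double` rather than `long double`, and it assumes the compiler has OpenMP enabled (`/openmp` on MSVC, `-fopenmp` on GCC/Clang); without that flag the pragma is ignored and both versions run serially.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Serial reference: sums 0.7^n for n in [1, 16), repeated `iterations` times.
double PowerSumSerial(int iterations) {
    assert(iterations >= 0);
    double total = 0.0;
    for (int t = 0; t < iterations; ++t) {
        const double k = 0.7;
        for (int n = 1; n < 16; ++n) total += std::pow(k, n);
    }
    return total;
}

// Same work, with the outer loop split across threads. The reduction clause
// gives each thread a private `total` and adds the copies together at the end.
double PowerSumParallel(int iterations) {
    assert(iterations >= 0);
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int t = 0; t < iterations; ++t) {
        const double k = 0.7;
        for (int n = 1; n < 16; ++n) total += std::pow(k, n);
    }
    return total;
}

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <class Work>
double MillisecondsFor(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Because the parallel version adds the numbers in a different order, the two results can differ in the last few digits, so compare them with a small tolerance rather than for exact equality.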
  9. The most intensive programs in the world are scientific, and they rely on multiple CPUs, multiple cores, GPU stream processors, etc. What does any of that have to do with parallel processing?
  10. Don't be that guy....

      ------

      Parallelism has been used in many games and applications with great success. It's all about implementation and code design. Anyway, if you have any real knowledge of parallel programming, feel free to contribute that knowledge to the project in a more beneficial way than "meh, it probably won't work".
  11. Yeah, it's mainly a personal issue, nothing to do with the team. I normally only do Windows programming, and rarely in C++. Having to account for several builds across several OSes and compilers, not to mention working with such a large unknown codebase (Read: I'm NOT very familiar with the game engine), is making it hard to find a place to contribute directly. Anyway, the code above is pretty cool. It can be externally updated and initialized, so no changes are really needed to the class itself, and you can retrofit it to work with about any system since it uses a generic type T. Hopefully it can be useful in cutting down on the number of input related issues. As for getting on the team, we'll just have to see how it goes. Either way works for me, I'm just trying to help out if I can.
  12. I think I'll just go back to offering code that I think will be useful. You can use it at your discretion, modify it as needed, etc. Here is a SUPER generic class for handling input. It should work with about any code, if properly fitted. (You'll need to update it somehow, as shown in my last post.)

          #include <map>

          template <class T>
          class InputManager
          {
          public:
              // Key states from the previous update.
              std::map<T, bool> PreviousState;

              // Key states from the current update.
              std::map<T, bool> CurrentState;

              // Note: at() throws std::out_of_range for a key that was never set
              // in both maps, so initialize every key you intend to query.

              // Key Pressed: up in the previous state, down now.
              bool KeyPressed(T key)
              {
                  return !PreviousState.at(key) && CurrentState.at(key);
              }

              // Key Released: down in the previous state, up now.
              bool KeyReleased(T key)
              {
                  return PreviousState.at(key) && !CurrentState.at(key);
              }

              // Key Held: down in both states.
              bool KeyHeld(T key)
              {
                  return PreviousState.at(key) && CurrentState.at(key);
              }
          };

      Example Usage:

          #include <iostream>
          #include "InputManager.h"

          int main()
          {
              InputManager<int> input;
              input.PreviousState[1] = false;
              input.CurrentState[1] = true;

              std::cout << input.KeyHeld(1) << "\n";     // 0
              std::cout << input.KeyPressed(1) << "\n";  // 1
              std::cout << input.KeyReleased(1) << "\n"; // 0
              return 0;
          }
  13. MPI is another possibility. Boost, which is already being used by TDM, includes a wrapper for it (Boost.MPI); however, it looks a bit harder to use.

      ----

      Anyway, I've got v1.08 compiling now, and I'm taking some time to see how it all works. There were a LOT of changes since v1.07. As far as contributing goes, hopefully now that I have a newer version it will become a bit easier. (There weren't many issues related to v1.07 that I could just jump in on. Most were too complex (AI, etc.), or required input from the team about design choices.) I'd be better suited to fixing actual broken code logic, at least until I'm on the same page as everyone else. (Even then, it can be an issue.)

      Here's an example: There seem to be issues with controls. For example, holding the "toggle" button for the lantern causes it to repeatedly turn on\off. (That's just one example, there are many little control issues. I've gotten stuck in "crouch" mode, etc.) Part of the problem seems to stem from not having a fully integrated key state tracking mechanism. In C#, I would do something like this.

          Dictionary<Key, bool> PreviousState = new Dictionary<Key, bool>();
          Dictionary<Key, bool> CurrentState = new Dictionary<Key, bool>();

          void UpdateStates()
          {
              // Copy rather than assign: a plain assignment would make both
              // names refer to the same dictionary.
              PreviousState = new Dictionary<Key, bool>(CurrentState);

              // Check the current states by iterating and testing each bound key.
              foreach (Key k in BoundKeys)
                  CurrentState[k] = ReadRawState(k); // Key down, tested with input API. (GetAsyncKeyState\DX\GL\Whatever.)
          }

          // Treats a key that has never been recorded as "up".
          bool IsDown(Dictionary<Key, bool> state, Key k)
          {
              bool down;
              return state.TryGetValue(k, out down) && down;
          }

          // AKA: KeyDown
          bool KeyPressed(Key k)
          {
              // If the key wasn't pressed in the last state, but is pressed now, it was just "pressed".
              return !IsDown(PreviousState, k) && IsDown(CurrentState, k);
          }

          bool KeyHeld(Key k)
          {
              // If the key was down in both states, it's being held.
              return IsDown(PreviousState, k) && IsDown(CurrentState, k);
          }

      That's rough code off the top of my head, but it shows a mechanism for more easily tracking key states. As you can see, while fixing the control issue would be nice, my idea of fixing it would require a massive change to how keys are handled, and could have negative effects on the project. (ie, I don't feel comfortable doing it, since I don't fully understand the ramifications of altering the current key handling mechanism.) This would, however, make the bug with lantern\compass\spyglass toggles easy to fix.

          if (KeyPressed(toggleKey))
              // Do action. (ie, toggle lantern\compass\spyglass, etc.)

      This code wouldn't allow the bug to occur, since the state tracking code can tell the difference between pressed and held keys.

      Edit: Speaking of broken code, why are the code tags ruining the tabbing on my code samples? (It's fine in the editor view.)
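The press-versus-hold distinction described above can be sketched in C++ as well. This is only an illustration of the idea, not TDM's key handling: `KeyStateTracker`, `CountToggles` and the key id `kLanternKey` are hypothetical, and the raw key states are passed in as plain booleans instead of being read from an input API.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Remembers the previous and current "is down" state for each key.
class KeyStateTracker {
public:
    // Call at the start of each frame, before feeding in new raw states.
    void BeginFrame() { previous_ = current_; }

    // Record the raw state of one key for the current frame.
    void SetDown(int key, bool down) { current_[key] = down; }

    // Up last frame, down this frame.
    bool Pressed(int key) const { return !Lookup(previous_, key) && Lookup(current_, key); }
    // Down in both frames.
    bool Held(int key) const { return Lookup(previous_, key) && Lookup(current_, key); }
    // Down last frame, up this frame.
    bool Released(int key) const { return Lookup(previous_, key) && !Lookup(current_, key); }

private:
    // A key that was never recorded counts as "up".
    static bool Lookup(const std::map<int, bool>& states, int key) {
        const auto it = states.find(key);
        return it != states.end() && it->second;
    }

    std::map<int, bool> previous_;
    std::map<int, bool> current_;
};

// Feeds one raw state per frame for a single key and counts how often an
// action bound to "pressed" (for example a lantern toggle) would fire.
int CountToggles(const std::vector<bool>& rawDownPerFrame) {
    const int kLanternKey = 1; // hypothetical key id
    KeyStateTracker keys;
    int toggles = 0;
    for (bool down : rawDownPerFrame) {
        keys.BeginFrame();
        keys.SetDown(kLanternKey, down);
        if (keys.Pressed(kLanternKey)) ++toggles;
    }
    // A toggle can fire at most once per frame.
    assert(toggles <= static_cast<int>(rawDownPerFrame.size()));
    return toggles;
}
```

Holding the key for many frames yields a single toggle, because only the frame where the state changes from up to down counts as a press.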
  14. I haven't fully tested it yet, but at a glance it's a great update. The audio has improved greatly, the graphics quality seems better, and it runs much smoother now. One thing I've noted is that it doesn't seem possible to unbind a key via the settings menu. (I'm aware of alternate means to accomplish this, but I haven't been able to figure out how to do it from the menu. If it's already possible, let me know; if not, I suggest adding that in a future update.) Also, I did have an NPC rush into combat after catching a glimpse of me, which I thought was fixed? (I might have been too obvious, I'll have to test it some more.) Anyway, I'd like to say thanks to everyone involved with making it all happen. I really appreciate you doing this and sharing it with us.
  15. The best approach would be to carefully pick out the most used or most intensive code to implement it on, rather than trying to put it everywhere. idDict is a good example: it's used by many of the game's systems, so a few simple edits there can affect large portions of the game, yet it's easier to track changes here than it would be if you were making thousands of broad changes to random code. So, I don't think there will be too many problems, and the potential gains are amazing. (OpenMP lets you run loops across multiple cores on a CPU in parallel, as long as the iterations don't depend on each other.) Here's how easy it is to implement.

          for (int i = 0; i < 1000; i++)
          {
              // Do something..
          }

      That's an example of a normal "for" loop. (This code does not take advantage of multicore CPUs, it will run on a single core regardless of how many you have.)

          #pragma omp parallel for
          for (int i = 0; i < 1000; i++)
          {
              // Do something..
          }

      This is how you implement OpenMP support. This code can now run much faster, because it will be executed on multiple threads across multiple cores on your CPU. (ie, a quad core might run this code 3-4 times faster, since it's now being executed in parallel across all your cores.) OpenMP automatically figures out how many threads to make, how many cores you have, etc, so the code can get faster as more cores are added. (It doesn't hurt single core performance much either, it simply executes the loop on one thread in that case.)

      ----

      I should note, OpenMP has many directives for use in specific scenarios, but most are just as easy to use as shown above. Here's a guide I've been using. http://bisqwit.iki.fi/story/howto/openmp/
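One caveat worth a small sketch: `#pragma omp parallel for` is only safe as-is when the iterations don't touch the same data. The two functions below are hypothetical examples (not TDM code) showing the two common cases: a loop where every iteration writes its own element, and a loop with a shared running total, which needs the `reduction` clause. They assume OpenMP is enabled in the compiler (`/openmp` on MSVC, `-fopenmp` on GCC/Clang); without it the pragmas are ignored and the loops simply run serially.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Case 1: each iteration writes only out[i], so the iterations are
// independent and a plain "parallel for" is enough.
std::vector<float> ScaleAll(const std::vector<float>& in, float factor) {
    assert(in.size() <= static_cast<std::size_t>(INT_MAX));
    std::vector<float> out(in.size());
    const int n = static_cast<int>(in.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        out[i] = in[i] * factor;
    }
    return out;
}

// Case 2: every iteration updates the same variable. With a plain
// "parallel for" that would be a data race; the reduction clause gives each
// thread a private copy of `sum` and combines the copies after the loop.
long long SumAll(const std::vector<int>& in) {
    assert(in.size() <= static_cast<std::size_t>(INT_MAX));
    long long sum = 0;
    const int n = static_cast<int>(in.size());
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += in[i];
    }
    return sum;
}
```

Loops that fit neither pattern (for example ones that modify a shared container like a dictionary) need more care, such as critical sections or restructuring, before they can be parallelized.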