The Dark Mod Forums

Exceptions in DarkRadiant



I was trying to understand why the DarkRadiant camera is so damn slow even in release builds, and it turned out that exceptions are the problem.


The Visual Studio debugger breaks program execution on EVERY exception thrown, even if the exception is caught immediately. It then checks the Exception Settings: did the user ask to stop on this particular type of exception? If yes, execution stays broken; if not, the debugger resumes execution.
The debugger's "break on exceptions of type" feature can be quite useful. However, having lots of exceptions thrown all the time renders it useless.

Breaking and resuming the debuggee all the time is VERY slow.
So when something throws std::bad_cast every frame, rendering becomes 20 times slower.

Here is the pull request: https://github.com/codereader/DarkRadiant/pull/16


I think the same problem happens on map load: std::invalid_argument is thrown in huge quantities.
And it is easy to fix: just add a utility function "bool ParseFloat(const std::string& text, float& result);" implemented via std::strtof, and call it everywhere instead of calling std::stof and catching the exception.

I'm afraid C++ is not like Python, and exceptions should be used only in exceptional situations.
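A minimal sketch of that utility, assuming the call sites currently parse with a throwing function like std::stof (the name and signature are taken from the suggestion above, not from the actual pull request):

```cpp
#include <cstdlib>
#include <string>

// Non-throwing float parse: std::strtof reports failure via the
// end pointer instead of throwing, so the debugger never breaks.
bool ParseFloat(const std::string& text, float& result)
{
    if (text.empty())
        return false;

    const char* begin = text.c_str();
    char* end = nullptr;
    float value = std::strtof(begin, &end);

    if (end == begin)   // nothing was consumed: not a number
        return false;

    result = value;
    return true;
}
```

Callers then write `float f; if (ParseFloat(value, f)) { ... }` and the failure path never touches the exception machinery.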

 



Regarding actual rendering performance in Release build, the following is probably not a good idea:

std::map<Shader*, LitRenderables> _litRenderables;

Better to have one large std::vector and push_back records to it.
When collection is over, std::sort it by shader, then extract intervals with the same shader into something like std::vector<std::pair<int, int>>.
And do not shrink these vectors between frames, of course.

Most likely it will help... but I can't be sure, of course.
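As a rough illustration of that scheme (Shader and the record layout here are stand-ins, not DarkRadiant's real types):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct Shader {};       // stand-in for DarkRadiant's Shader
struct Renderable {};   // stand-in for a collected renderable

struct Record { Shader* shader; Renderable* renderable; };

// Persistent between frames; clear() keeps capacity, so no shrinking.
std::vector<Record> records;
std::vector<std::pair<int, int>> intervals;  // [begin, end) run per shader

void submitFrame()
{
    // Group records by shader with one sort over contiguous memory.
    std::sort(records.begin(), records.end(),
              [](const Record& a, const Record& b) { return a.shader < b.shader; });

    // Extract one interval per run of equal shaders.
    intervals.clear();
    for (int i = 0, n = (int)records.size(); i < n; )
    {
        int j = i;
        while (j < n && records[j].shader == records[i].shader) ++j;
        intervals.emplace_back(i, j);
        i = j;
    }
}
```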


I'm not a programmer of the caliber of any of you, having been coding in C++ on and off for only 3 years, but I've heard from expert programmers in whom I have total confidence (Jonathan Blow and Casey Muratori) that exceptions are bad and do more harm than good. John Carmack does seem to use them, as anyone can see by looking at the idTech 4 code, but apparently very sparingly, like you advise.

But your find reinforces the advice from Jonathan and Casey: don't use them at all. So even though John Carmack may have used them, and he is a master programmer, personally I haven't been using exceptions at all, only assert or even just a simple goto (the latter, I've read, is what some high-level languages like Java dress up as exceptions).

https://youtu.be/UNXHK8O-B_g?t=5630

http://xahlee.info/comp/why_i_hate_exceptions.html

Jonathan Blow on his Jai language:

Quote

Jai will not have:

  • Smart pointers
  • Garbage collection
  • Automatic memory management of any kind
  • Templates or template metaprogramming
  • RAII
  • Subtype polymorphism
  • Exceptions
  • References
  • A virtual machine (at least, not usually—see below)
  • A preprocessor (at least, not one resembling C’s—see below)
  • Header files

If it sounds odd to you that Jai is a modern high-level language but does not have some of the above features, then consider that Jai is not trying to be as high-level as Java or C#. It is better described as trying to be a better C. It wants to allow programmers to get as low-level as they desire. Features like garbage collection and exceptions stand as obstacles to low-level programming.

 

 

Link to comment
Share on other sites

42 minutes ago, HMart said:

personally I haven't been using exceptions at all, only assert or even just a simple goto

Careful, assertions and exceptions are meant for very different things, and you shouldn't mix those. Assertions are meant, for example, to check preconditions and contracts for your code, where a failing assertion almost certainly means a programmer error. Exceptions, on the other hand, are one way to perform error handling at runtime, where errors can occur due to user input, system failures, and other reasons.

And yeah, exceptions certainly have their problems and, outside of Python, should not be used for simple flow control. But the alternatives are not any less controversial, either.
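A tiny illustration of that split (the function names are made up for the example):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Programmer error: callers must never pass a negative index.
// A failing assert here means a bug in our own code.
char elementAt(const std::string& s, int i)
{
    assert(i >= 0 && "precondition: index must be non-negative");
    return s[i];
}

// Runtime error: a user-supplied path may legitimately be missing,
// so this is handled with an exception, not an assert.
void openConfig(const std::string& path)
{
    if (path.empty())
        throw std::runtime_error("no config file given");
    // ... open and parse the file ...
}
```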


1 hour ago, HMart said:

John Carmack does seem to use them, as anyone can see by looking at the idtech 4 code but is also apparently used very sparingly, like you advise.

Looking at TDM code, exceptions are only used when a hard ERROR happens.
At that moment all code execution is stopped regardless of where common::Error was called (stack unwinding is the main benefit of exceptions).

But if you look closer, Carmack never cared about freeing resources when an exception triggers 😁
Some memory that was allocated directly is leaked, and the same probably happens for some other resources.
For modern C++ programming that is considered completely unacceptable 😲 but I have never seen any real problem for a TDM user because of it 😀

Also, you should note that programmers do change.
At the moment of writing idTech4, Carmack started using C++ without any prior experience or training. idTech4 is "C with classes" partly because its authors were C programmers without enough C++ experience. I would be happy to see how his code style evolved as he gained experience, but unfortunately no later idTech engine is open source 😥

P.S. By the way, Google has banned exceptions in their code, as far as I remember.
And it seems gamedev companies sometimes even completely disable support for exceptions in build settings --- but that's too radical for normal C++ programming.


7 hours ago, cabalistic said:

Careful, assertions and exceptions are meant for very different things, and you shouldn't mix those. Assertions are meant, for example, to check preconditions and contracts for your code, where a failing assertion almost certainly means a programmer error. Exceptions, on the other hand, are one way to perform error handling at runtime, where errors can occur due to user input, system failures, and other reasons.

And yeah, exceptions certainly have their problems and, outside of Python, should not be used for simple flow control. But the alternatives are not any less controversial, either.

You are certainly right, but I was not saying that assert is a substitute for runtime user error handling. It can be used for such a thing (in debug builds), but I use it mostly to catch my own errors; for front-end user errors I mostly use goto.

I know goto is ill-advised and disliked by many programmers, but I use it and I've yet to see any problem with it. In no way am I saying it is the best way to program.

b32 somefunc(void)
{
    if (!some_check()) goto Error;

    ....

    return 1;

Error:
    {
        print("Error message!");
        return 0;
    }
}

I know it is not always the best option, but it works, and idTech 4, besides exceptions, also seems to use goto for error handling. Personally, I've yet to find the need to use exceptions.

Edited by HMart

I'm happy to merge that pull request, thanks for setting it up. While it doesn't affect end users, it's still worthwhile to make dev life easier.

I'm curious though how you found out which exception was slowing down the movement. Can you observe such slowdowns in the profiler? Or is it the Output window where Visual Studio logs the thrown exceptions?

edit: ah, nevermind, I just figured it out. Was not too hard to find it, after all. :) The output window is spammed with the std::bad_cast messages when you move around.


37 minutes ago, greebo said:

edit: ah, nevermind, I just figured it out. Was not too hard to find it, after all. :) The output window is spammed with the std::bad_cast messages when you move around.

Yes. Also, you can enable all exceptions in Exception Settings at some point and see where it breaks.


I'd prefer to have a look at that one myself, if you don't mind. But you can point me to any bad locations you stumble over.

I already had a look at the other exceptions myself when loading a smaller map, and those are usually thrown in RotationMatrix::setFromAngleString, where the code attempts to process an empty spawnarg value. I'm pretty sure we can circumvent most if not all of the exceptions by just adding a "value.empty()" check before trying to parse. If there are any more parsing failures causing exceptions, I'd like to see and count those occurrences first; maybe we don't need to change anything.
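For illustration, the guard could look something like this (sketch only: the real setFromAngleString is a void member function, and its parsing details are assumed here; a bool is returned just to make the behaviour visible):

```cpp
#include <stdexcept>
#include <string>

// Hypothetical sketch of the proposed early-out before parsing.
bool setFromAngleString(const std::string& value)
{
    if (value.empty())
        return false;  // empty spawnarg: skip the throwing parse entirely

    try {
        float angle = std::stof(value);
        (void)angle;   // ... would build the rotation matrix here ...
        return true;
    } catch (const std::invalid_argument&) {
        return false;  // genuinely malformed input, now rare
    }
}
```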


Let's not turn this into a long-winded "one true way to use C++" discussion. There is no way in hell we're going to start reverting to C89 idioms like using goto everywhere just because some programmer somewhere said "Exceptions are always bad and you don't need them, OK?". DarkRadiant may be a game editor but it is not itself a game, and we lean towards high-level C++ techniques not gamedev microoptimisations.

As Stgatilov observes, Carmack might be a graphics programming wizard but he is in no way an expert on C++ techniques (or at least wasn't, when the D3 code was written), and I certainly wouldn't be relying on the idTech 4 coding style as a guide to building a C++ application.

However in DR we should still only be using exceptions for signalling actual errors, so if there are cases where exceptions are being thrown repeatedly because of normal parsing issues then we should definitely fix that. This would also help with debugging on Linux because "break on exception" will actually catch real errors instead of hundreds of useless internal exceptions.

Aside: I'm actually dabbling with Rust, which I find a very well-designed and pleasant modern language, and Rust has no exceptions at all by design. The approach they use is for functions to return a Result<Type, Err> object which holds either a successful value or an error, and the language specifically requires you to handle the error condition, otherwise it's a compilation error. Something similar could be achieved in C++ for parsing methods (without the compile-time enforcement, of course) instead of using exceptions; personally I would prefer this to using "out parameters", which I find very cumbersome to use.
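In C++, std::optional can play the role of a poor man's Result for parsing (no error detail and no compile-time enforcement, as noted; the function name is made up for the example):

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Result-flavoured parse: the return value carries success or failure,
// no out parameter and no exception.
std::optional<float> parseFloat(const std::string& text)
{
    const char* begin = text.c_str();
    char* end = nullptr;
    float value = std::strtof(begin, &end);
    if (end == begin)
        return std::nullopt;   // not a number
    return value;
}
```

Usage reads close to the Rust style: `if (auto v = parseFloat(text)) { use(*v); } else { /* handle failure */ }`.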

On 8/7/2021 at 12:13 PM, stgatilov said:

Regarding actual rendering performance in Release build, the following is probably not a good idea:

std::map<Shader*, LitRenderables> _litRenderables;

Better have one large std::vector and push_back records to it.
When collection is over, std::sort it by shader, then extract intervals with same shader into something like std::vector<std::pair<int, int>>.
And do not shrink these vectors between frames, of course.

Most likely it will help... but can't be sure of course.

I'd want to see hard profiling data before potentially complexifying the code with multiple data structures and sorting passes. When I've looked at rendering performance, I don't recall ever seeing a significant chunk of time taken up by basic data structure traversal. Most of the time is spent in math operations like applying and calculating texture matrices, and the hopelessly inefficient use of OpenGL (no VBOs for storing brush face vertices, for example).


I don't think messing with OpenGL commands is a good idea.
You won't win much easily, and redoing all the rendering is too complicated (and the more advanced OpenGL features you use, the more driver bugs you suffer from).

Note: I'm not sure there is much to win here. Maybe 20%... not more.

Here is profiling data from Behind Closed Doors (worst scene):

[profiler screenshot]

It seems that CollectRenderablesInScene does not do any OpenGL calls.
It is a bit sad to see std::map destructor eating 5% here (out of total 88%).

[profiler screenshot]

Here some time is spent on searching in the map (1.3%), reserving 1024 elements (4.3%), and push_back (2.2%-??3.5%??) --- out of the 10% spent in addRenderable.

[profiler screenshot]

This is from within submitToShaders (6.1%) --- 5% is spent on various data structures.

Once again: changing this won't immediately turn 5 FPS into 20 FPS.


Right, that matches what I remember from my last adventure with perf. A couple of percent here or there by bit-twiddling with data structures or implementing custom vector allocators, vs 40% in the backend render or 36% traversing the scenegraph.

The OpenGL time can't be reduced to 0 of course, but I find it hard to believe we would gain absolutely nothing from using VBOs to store brush faces and model geometry, and re-use these if possible rather than submitting everything completely from scratch on every frame using ancient and slow glBegin/glEnd calls.


3 hours ago, OrbWeaver said:

Let's not turn this into a long-winded "one true way to use C++" discussion. There is no way in hell we're going to start reverting to C89 idioms like using goto everywhere just because some programmer somewhere said "Exceptions are always bad and you don't need them, OK?". DarkRadiant may be a game editor but it is not itself a game, and we lean towards high-level C++ techniques not gamedev microoptimisations.

R: That was not my intention at all, OrbWeaver! Like I said, I'm a noob programmer. I also don't code in pure C++; it's a mix of C99 and C++11 stuff. My code is far from pretty and well constructed, and I'm lucky if it even works at all 90% of the time... so far be it from me to try to give you guys a lesson! If you think exceptions help, who am I to say you are wrong!

I was just passing on the word of some very experienced programmers whom I personally admire. To me they are not simply "some programmer", and IMO we should take them more seriously than that; they have 30+ years of experience in high-performance professional software.

Casey worked for RAD Game Tools as the main developer of Granny and Bink 2, and helped with Telemetry. He also worked in the AAA gaming industry and helped develop the tech for Age of Empires, Ultima, Guild Wars, Destiny, Gears of War, The Witness and many others. He is also the developer of refterm, a "non-serious" terminal that he wrote in a few weeks after he filed a bug report to Microsoft about abysmal Windows Terminal performance and had a Microsoft employee tell him that "making a fast terminal would need a PhD and be very hard and extremely time consuming" -- so he did it...

Jonathan Blow IMO is a genius (Casey as well), at the level of John Carmack and Tim Sweeney. He may be far less known, but when you listen to him and watch him code, he is way, way above average; even being a noob programmer, I like to think I can detect when programmers are worth listening to. As for what he has done, this explains it better than I could.

You mentioned Rust; JB is also making his own low-level programming language to replace C++ (how many programmers do that?), which even in (closed) beta, according to those who have tried it, seems to be a fantastic language and way, way better than C/C++. But obviously I have not tried it; for now, JB only lets professional programmers into the Jai beta. If you care about it and want to try it, send him an email; I think he will let you in and you can see for yourself.

Just to put things in perspective, Jai can run full programs (like a complex 2D game)... while the language itself is being compiled!! Can Rust do that?

About goto, I totally gave the wrong idea! Sorry for that; I also don't use gotos everywhere! Especially not for simple errors that have no big impact on the program's functionality or stability! That would be very much overkill indeed!

As Stgatilov observes, Carmack might be a graphics programming wizard but he is no way an expert on C++ techniques (or at least wasn't, when the D3 code was written), and I certainly wouldn't be relying on the idTech 4 coding style as a guide to building a C++ application.

R: I don't think anyone is an expert at C++ :D the language is f complex, but I know what you mean.

I do rely on the idTech 4 code style more than I should, but I've been slowly growing to like the functional programming / "data-oriented design" style that Casey Muratori uses more than the OOP, or as stgatilov well said, "C with classes", style that idTech 4 uses. I don't even think I'm learning to be a C++ programmer, but a C one that uses some C++ features. Btw, in no way am I saying this is the best style to program in, or that you or anyone else should change how you program; I hope that is clear.

 

Edited by HMart

54 minutes ago, OrbWeaver said:

The OpenGL time can't be reduced to 0 of course, but I find it hard to believe we would gain absolutely nothing from using VBOs to store brush faces and model geometry, and re-use these if possible rather than submitting everything completely from scratch on every frame using ancient and slow glBegin/glEnd calls.

If you don't need dynamicity, then you can compile a display list and render it when the camera moves.
It is a very simple approach from base OpenGL, and it works wonders (I remember seeing it work even faster than VBOs on NVIDIA).

Of course, if you change some settings or filters, then you need to recompile the display list, which will take some time.
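The classic pattern, for reference (plain legacy GL; it needs an active context, so this is only a sketch, and the drawing body is elided):

```cpp
#include <GL/gl.h>

GLuint sceneList = 0;

// Rebuild whenever settings or filters change the static geometry.
void rebuildSceneList()
{
    if (sceneList == 0)
        sceneList = glGenLists(1);

    glNewList(sceneList, GL_COMPILE);
    // ... issue the usual glBegin/glEnd drawing calls here;
    //     the driver records and optimizes them once ...
    glEndList();
}

// Per frame: replay the recorded commands instead of re-submitting them.
void drawScene()
{
    glCallList(sceneList);
}
```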


3 hours ago, stgatilov said:

If you don't need dynamicity, then you can compile a display list and render it when camera moves.
It is a very simple approach from base OpenGL, and it makes wonders (I remember I saw it working even faster than VBOs on NVIDIA).

Of course, if you change some settings or filters, then you need to recompile display list, which will take some time.

I think I did experiment with display lists at one point. They can be fast, but these days they are considered part of "legacy" OpenGL and not really recommended, so I probably wouldn't choose to switch to them unless the performance advantage was spectacular.

This is my latest profile from perf (I think it's missing most of the time spent in OpenGL drivers, hence only 56% time in total appears in the profile). What this seems to be saying is that repeatedly calculating light AABBs is a big waste of time, and we could probably achieve a noticeable speedup by caching them in the light object rather than recalculating them on every frame.

[perf profiler screenshot]


7 hours ago, OrbWeaver said:

I think I did experiment with display lists at one point. They can be fast, but these days they are considered part of "legacy" OpenGL and not really recommended, so I probably wouldn't choose to switch to them unless the performance advantage was spectacular.

Legacy OpenGL? I guess you should ask @cabalistic about it, but to me it's total bullshit.

Of course, you need a "fat" driver with the deprecated stuff in order to use them. But I think that is only a problem if you want to use newer features (GL3 and later) at the same time. And I think even Linux drivers support the compatibility profile today. As far as I understand, you use GL 2.1 and no more than that, and you rely on glBegin and lots of other stuff anyway.
In my opinion it is not wise to try to clean everything up to the Core profile. It would take a lot of time, result in complicated code, and most likely be slower than it is now.
Did you ever try to reimplement immediate-mode rendering (glBegin/glEnd) in Core GL3? It is very hard to achieve the same speed as the driver shows.

If you manage to put everything into one display list, the performance improvement should be crazy, at least on the NVIDIA driver. Of course, the results are heavily dependent on the driver. The reason they are rarely used today is that they are not flexible: you cannot modify parameters or enable/disable anything. A display list only fits if you have a static sequence of OpenGL calls. Perhaps it won't help with lit preview, but the basic "draw things where they are" view is OK.

Quote

This is my latest profile from perf (I think it's missing most of the time spent in OpenGL drivers, hence only 56% time in total appears in the profile). What this seems to be saying is that repeatedly calculating light AABBs is a big waste of time, and we could probably achieve a noticeable speedup by caching them in the light object rather than recalculating them on every frame.

Oh, I did not enable lit mode and profiled the basic one.

Yes, calculateLightIntersections is very slow here.
With L lights and E entities, you have O(L E) intersection tests in the loop. That's a hot loop which should contain only a bare minimum of calculations.
The problem is that you compute each light's bbox O(E) times per frame, and each entity's bbox O(L) times per frame. Compute the boxes once per frame for each object, and the problem should go away.
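In other words, hoist the bounds computation out of the quadratic loop. A self-contained sketch, with a made-up minimal AABB type standing in for DarkRadiant's (the caller fills the two vectors once per frame from the lights and entities):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Minimal axis-aligned bounding box stand-in.
struct AABB { float min[3], max[3]; };

bool intersects(const AABB& a, const AABB& b)
{
    for (int k = 0; k < 3; ++k)
        if (a.max[k] < b.min[k] || b.max[k] < a.min[k])
            return false;
    return true;
}

// O(L) + O(E) bbox computations happen before this call; the O(L*E)
// loop then touches only the cached boxes.
std::vector<std::pair<std::size_t, std::size_t>>
lightEntityPairs(const std::vector<AABB>& lightBoxes,
                 const std::vector<AABB>& entityBoxes)
{
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < lightBoxes.size(); ++i)
        for (std::size_t j = 0; j < entityBoxes.size(); ++j)
            if (intersects(lightBoxes[i], entityBoxes[j]))
                pairs.emplace_back(i, j);  // record (light, entity) hit
    return pairs;
}
```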

After that you will have another problem:

Quote

    // All lights we have received from the scene
    std::list<const RendererLight*> _sceneLights;

I understand that you want stable pointers when you put objects into a container by value.
But the truth is, linked lists are very slow to traverse ("The Myth of RAM" should be read by every programmer who cares about performance). Unfortunately, DarkRadiant relies on linked lists in many places, and the same problem happens e.g. during scene traversal. Going through entities/brushes is very slow when they are in a linked list. Maybe use std::vector? Or at least std::deque, if addresses must not change?

In this particular case, you can copy _sceneLights into a temporary std::vector and use the vector in the loop (in calculateLightIntersections).
The same applies to bounding boxes: precompute them into std::vectors before the quadratic loop.
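For example (RendererLight here is just a placeholder type):

```cpp
#include <list>
#include <vector>

struct RendererLight {};  // stand-in for the real interface

// Persistent scratch buffer: reused every frame, never shrunk,
// so the copy costs one contiguous fill, not repeated allocation.
std::vector<const RendererLight*> lightScratch;

void gatherLights(const std::list<const RendererLight*>& sceneLights)
{
    lightScratch.clear();  // keeps capacity between frames
    lightScratch.assign(sceneLights.begin(), sceneLights.end());
    // ... the hot loop now iterates lightScratch, contiguous in memory ...
}
```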


3 hours ago, stgatilov said:

If you manage to put everything into one display list, performance improvement should be crazy, at least on NVIDIA driver. Of course, the results are heavily dependent on the driver. The reason they are rarely used today is that they are not flexible: you cannot modify parameters and enable/disable anything. Display list only fits if you have static sequence of OpenGL calls. Perhaps it won't help with lit preview, but basic "draw things where they are" view is OK.

Well I'm not completely averse to using display lists if they would help. In fact I think we might already be using them for static models, which would explain why I never see static model geometry rendering appear in profiles. The advantage of display lists is that they are very easy to use — just "wrap" your drawing commands in a begin/end list command, and the whole thing is stored for later use.

We might not see a huge benefit from using one display list per brush winding though, because there would be lots of lists with only a handful of vertices. More useful would be wrapping the entire set of geometry for a single shader in a list, but that requires more complicated management.

3 hours ago, stgatilov said:

The problem is that you compute light's bbox O(E) times per frame, and entity's bbox O(L) times per frame. Compute boxes once per frame for each thing, and the problem should go away.

I didn't notice that it was multiple times per frame, not just once per frame. That would explain why calculating light AABBs takes up nearly 30% of my profile time.

3 hours ago, stgatilov said:

Going through entities/brushes is very slow when they are in a linked list. Maybe use std::vector? Or at least std::deque if addresses must not be change?

No problem in theory, that's a very simple change. But again, it's not appearing in the profile right now so I doubt it will make a noticeable difference. Perhaps once other things are optimised the structure traversal will start appearing as a larger performance cost.

