The Dark Mod Forums


Posted

Fullbright Approach

The front-end render pass is reduced to a single call: onPreRender(const VolumeTest&). When invoked, each node can check for any updates that have happened since the last frame, such as material changes, changed texture coordinates or new target lines.
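As a rough sketch of what that hook could look like on the node side (the class and member names below are purely illustrative, not the actual DarkRadiant types):

```cpp
class VolumeTest; // provided by the render system

// Illustrative node-side hook: nothing is drawn in onPreRender(), the
// node only pushes pending changes to the shader it is attached to.
class ExampleNode
{
    bool _materialChanged = false; // flipped by setters during the frame
    bool _geometryChanged = false;

public:
    void onPreRender(const VolumeTest&)
    {
        if (_materialChanged)
        {
            reacquireShader();        // grab the new Shader from the RenderSystem
            _materialChanged = false;
        }

        if (_geometryChanged)
        {
            updateAttachedGeometry(); // re-submit windings/vertices/target lines
            _geometryChanged = false;
        }
    }

private:
    void reacquireShader() {}        // sketch: look up the Shader by material name
    void updateAttachedGeometry() {} // sketch: refresh the attached geometry slots
};
```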

Nodes no longer submit renderables to the collector. Instead, they grab a reference to the Shader from the RenderSystem (as before) and attach their geometry to it. The geometry stays attached to the shader until it is updated or removed by the node during a future onPreRender() call, or until the node is removed from the scene.

Shaders provide a specialised API for the most common use cases: an API for brush windings (IWindingRenderer), an API for general-purpose geometry like path boxes, target lines, vertices and quads (IGeometryRenderer), and an API for triangulated, oriented surfaces, i.e. models (ISurfaceRenderer). The nodes don't know how the shader deals with their data, but they receive a numeric slot handle that allows them to update or remove their geometry later. The above IWhateverRenderer implementations are designed to internally combine as many objects as possible.
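A minimal sketch of the slot-handle pattern (Slot, Vertex and the class below are made-up stand-ins for illustration, not the verbatim interfaces):

```cpp
#include <cstdint>
#include <vector>

// Stand-in vertex type for the sketch.
struct Vertex { float x, y, z, u, v; };

// A numeric slot handle, as described above: the node keeps it around
// to update or remove its geometry later.
using Slot = std::uint64_t;

// Simplified mock of the winding-oriented renderer interface.
class IWindingRendererSketch
{
public:
    virtual ~IWindingRendererSketch() = default;

    // Attach a winding; all windings of the same vertex count can be
    // packed into one internal buffer. Returns the slot handle.
    virtual Slot addWinding(const std::vector<Vertex>& vertices) = 0;

    // Overwrite the slot's data (same vertex count as before).
    virtual void updateWinding(Slot slot, const std::vector<Vertex>& vertices) = 0;

    // Mark the slot as empty so a later winding of equal size can re-use it.
    virtual void removeWinding(Slot slot) = 0;
};
```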

There is no distinction between orthoview rendering and camera rendering anymore (renderWireframe and renderSolid are gone). It's all about the shaders: they know whether they are suitable for rendering in one of these view types, or both.

The Shader implementation provides a drawSurfaces() method that is invoked by a shader pass during the back-end rendering phase. This sets up the glEnableClientState() calls and submits the data through glDrawElements().
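In legacy-GL terms, such a batched submission might boil down to something like this (a simplified sketch; the struct and function names are invented for illustration):

```cpp
#include <GL/gl.h>
#include <vector>

// Interleaved vertex layout used by this sketch.
struct MeshVertex { float pos[3]; float texcoord[2]; };

// Sketch of a drawSurfaces()-style submission: one set of client-state
// pointers, then a single glDrawElements() call for the whole batch.
void drawBatch(const std::vector<MeshVertex>& vertices,
               const std::vector<unsigned int>& indices,
               GLenum primitiveType) // GL_LINES, GL_TRIANGLES, ...
{
    if (vertices.empty() || indices.empty()) return;

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);

    glVertexPointer(3, GL_FLOAT, sizeof(MeshVertex), vertices.data()->pos);
    glTexCoordPointer(2, GL_FLOAT, sizeof(MeshVertex), vertices.data()->texcoord);

    glDrawElements(primitiveType, static_cast<GLsizei>(indices.size()),
                   GL_UNSIGNED_INT, indices.data());

    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}
```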

Windings

To achieve fewer draw calls, all windings of a given size (more than 90% of the faces have 4 vertices) are packed together into a single CompactWindingVertexBuffer that stores all windings of that material in one large, indexed vertex array. Winding removal and re-addition is fast: the buffer keeps track of empty slots and can quickly re-fill them with a new winding of the same size. Index generation uses a templated WindingIndexer class that creates indices for GL_LINES, GL_POLYGON and GL_TRIANGLES. It is up to the Shader to decide which indexing method is used: orthoview shaders use GL_LINES, while the camera preview uses GL_TRIANGLES.
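For illustration, index generation for a single convex winding could look roughly like this (a simplified sketch of the indexing idea, not the actual WindingIndexer template):

```cpp
#include <cstdint>
#include <vector>

// Indices for one convex winding with 'size' vertices that starts at
// 'offset' inside the big combined vertex buffer.

// GL_LINES variant (orthoview): one line segment per edge.
std::vector<std::uint32_t> lineIndices(std::uint32_t offset, std::uint32_t size)
{
    std::vector<std::uint32_t> indices;
    for (std::uint32_t i = 0; i < size; ++i)
    {
        indices.push_back(offset + i);
        indices.push_back(offset + (i + 1) % size); // wrap to close the loop
    }
    return indices;
}

// GL_TRIANGLES variant (camera): a simple fan around vertex 0,
// valid because windings are convex.
std::vector<std::uint32_t> triangleIndices(std::uint32_t offset, std::uint32_t size)
{
    std::vector<std::uint32_t> indices;
    for (std::uint32_t i = 1; i + 1 < size; ++i)
    {
        indices.push_back(offset);
        indices.push_back(offset + i);
        indices.push_back(offset + i + 1);
    }
    return indices;
}
```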

Every winding is specified in world coordinates.

Geometry

This is the API used by patches, entity boxes, light volumes, vertices, etc. Objects can choose the GeometryType they are rendering: Lines, Points, Triangles or Quads. The Shader internally sorts the objects into separate buffers for each primitive type, to submit a single draw call for all objects sharing the same type. All geometry uses world coordinates.
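A minimal sketch of that bucketing idea (the types below are illustrative, not the actual implementation):

```cpp
#include <map>
#include <vector>

// All objects sharing a GeometryType are appended to the same buffer,
// so the back end can emit a single draw call per primitive type.
enum class GeometryType { Points, Lines, Triangles, Quads };

struct GeometryBuffer
{
    std::vector<float> vertices;        // interleaved data, world coordinates
    std::vector<unsigned int> indices;  // indices into 'vertices'
};

class GeometryBuckets
{
    std::map<GeometryType, GeometryBuffer> _buffers;

public:
    GeometryBuffer& bufferFor(GeometryType type)
    {
        return _buffers[type]; // created lazily on first use
    }
};
```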

Surfaces

This API is similar to the Geometry API, but here no vertex data is actually submitted to the shader. Instead, IRenderableSurface objects are attached to the shader; these provide a getSurfaceTransform() method that is used to set up the model matrix before the draw calls are submitted. Surface vertices are specified in local coordinates.
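Sketched out, the attachment contract might look like this (getSurfaceTransform() is from the description above; the Matrix4 stand-in and the class name are illustrative):

```cpp
// Stand-in for DarkRadiant's matrix type, column-major like OpenGL.
struct Matrix4 { double m[16]; };

// Simplified mock of the surface attachment idea: the shader stores a
// reference to the object and queries the transform at draw time,
// instead of copying transformed vertices around.
class IRenderableSurfaceSketch
{
public:
    virtual ~IRenderableSurfaceSketch() = default;

    // Local-to-world transform, loaded as the model matrix right
    // before this surface's triangles are drawn.
    virtual const Matrix4& getSurfaceTransform() const = 0;
};
```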

Highlighting

The shader API provides an entry point to render a single object when it is selected. This is going to be much slower than the usual draw calls, but the assumption is that only a small portion of all map objects is selected at the same time.

Vertex Storage

While the data is now stored in the shader, it still lives in main RAM. No VBOs are used yet; that would be a logical next optimisation step.

Results

With the above changes, the number of draw calls in a fair-sized map went from 80k down to a few hundred. While the first attempts at combining the brushes doubled the frame rate of my benchmark map (using the same position and view angles, drawing it 100 times), this later went down to a 30% speed improvement after migrating the model surfaces. It turns out that rendering the models using display lists is really fast, but it violated the principle of moving the calls to the back end. It has to be taken into account that after the changes, the vertex data is still stored in main memory, not in a VBO.

Posted

Overall these changes sound excellent. You have correctly (as far as I can tell) identified the major issues with the DR renderer and proposed sensible solutions that should improve performance considerably and leave room for future optimisations. In particular, trying to place as much as possible in a big chunk of contiguous RAM is exactly the sort of thing that GPUs should handle well.

Some general, high-level comments (since I probably haven't even fully understood the whole design yet, much less looked at the code).

Wireframe versus 3D

I always thought it was dumb that we had different methods to handle these: at most it should have been an enum/bool parameter. So it's good to see that you're getting rid of this distinction.

Unlit versus lit renders

As you correctly point out, these are different, particularly in terms of light intersections and entity-based render parameters (neither of which need to be handled in the unlit renderer), so it makes sense to separate them and not have a load of if/then statements in backend render methods which just slow things down.

However, if I'm understanding correctly, in the new implementation almost every aspect will be separate, including the backend data storage. Surely a lot of this is going to be the same in both cases — if a brush needs to submit a bunch of quads defined by their vertices, this operation would be the same regardless of whatever light intersection or GLSL setup calculations were performed first? Even if lighting mode needs extra operations to handle lighting-specific tasks, couldn't the actual low-level vertex sorting and submission code be shared? If double RAM buffers and glFenceSync improve performance in lit mode, wouldn't unlit mode also benefit from the same strategy?

I guess another way of looking at it is: could "unlit mode" actually be a form of lit mode where lighting intersections were skipped, submitted lights were ignored, and the shader was changed to return full RGB values for every fragment? Or does this introduce performance problems of its own?

Non-const shaders

I've never liked the fact that Shaders are global (non-threadsafe) modifiable state — it seems to me that a Shader should know how to render things but should not in itself track what is being rendered. Your changes did not introduce this problem and they don't make it any worse, so it's not a criticism of your design at all, but I wonder if there would be scope to move towards a setup whereby the Shaders themselves were const, and all of the state associating shaders with their rendered objects was held locally to the render operation (or maybe the window/view)?

This might enable features like a scrollable grid of model previews in the Model Selector, which I've seen used very effectively in other editors. But perhaps that is a problem for the future rather than today.

Winding/Geometry/Surface

Nothing wrong with the backend having more knowledge about what is being rendered if it helps optimisation, but I'm a little unclear on the precise division of responsibilities between these various geometry types.

A Winding is an arbitrary convex polygon which can be rendered with either GL_LINES or GL_POLYGON depending on whether this is a 2D or 3D view (I think), and most of these polygons are expected to be quads. But Geometry can also contain quads, and is used by patches which also need to switch between wireframe and solid rendering, so I guess I'm not clear on where the boundary lies between a Winding and Geometry. 

Surface, on the other hand, I think is used for models, but in this case the backend just delegates to the Model object for rendering, rather than collating the triangles itself? Is this because models can have a large variation in the number of vertices, and trying to allocate "slots" for them in a big buffer would be more trouble than it's worth? I've never had to write a memory allocator myself so I can certainly understand the problems that might arise with fragmentation etc, but I wonder if these same problems won't rear their heads even with relatively simple Windings.

Render light by light

Perfect. This is exactly what we need to be able to implement things like shadows, fog lights etc (if/when anybody wishes to work on this), so this is definitely a step in the right direction.

Overall, these seem like major improvements and the initial performance figures you quote are considerable, so I look forward to checking things out when it's ready.

Posted

First of all, thanks for taking the time to respond; this got wordier than I anticipated.

7 hours ago, OrbWeaver said:

Wireframe versus 3D

I always thought it was dumb that we had different methods to handle these: at most it should have been an enum/bool parameter. So it's good to see that you're getting rid of this distinction.

Yes, the distinction is in the shaders now. It's still possible to tell the two apart, since the VolumeTest reference provides the fill() check, so some onPreRender() methods react to this and prepare different renderables. That's still necessary at this point, since some wireframe renderables call for a different appearance. This doesn't mean it can't get any simpler, though.

The objects are still requesting a coloured line shader, like <0 0 1> for a blue one. In principle, now that the vertex colour is shipped along with the geometry data, the colour distinction in the shader itself may not even be necessary anymore. There could be a single line shader used to draw everything in the orthoview.

Posted
7 hours ago, OrbWeaver said:

Unlit versus lit renders

....

However, if I'm understanding correctly, in the new implementation almost every aspect will be separate, including the backend data storage. Surely a lot of this is going to be the same in both cases — if a brush needs to submit a bunch of quads defined by their vertices, this operation would be the same regardless of whatever light intersection or GLSL setup calculations were performed first? Even if lighting mode needs extra operations to handle lighting-specific tasks, couldn't the actual low-level vertex sorting and submission code be shared? If double RAM buffers and glFenceSync improve performance in lit mode, wouldn't unlit mode also benefit from the same strategy?

I guess another way of looking at it is: could "unlit mode" actually be a form of lit mode where lighting intersections were skipped, submitted lights were ignored, and the shader was changed to return full RGB values for every fragment? Or does this introduce performance problems of its own?

I suspect it's all about the draw calls. In fullbright mode DR now invokes far fewer GL calls than in lit render mode. There's not much difference when it comes to the oriented model surfaces, which are almost identical in both modes (and use the same vertex storage too); it's the brushes and patches that make the difference.

Lit mode groups by entity. To regain the advantage of submitting everything in one go, it would need to dissolve that grouping information, and that would have to happen every frame; I think this is going to be too taxing. Maybe once lit mode is more optimised, we can try to merge the two modes.

You've made a correct observation about the backend data storage though: the geometry and winding renderers are (at the moment) not sharing their vertex data between the two modes; memory is duplicated and copied around often. That's not good.

The main reason for this duplication is the chronological order in which I adjusted the renderer. I chewed through it starting with fullbright mode: first brushes, then patches, then models, and finally the visual aids like lines and points. After that I moved on to the research on lit mode, and all of that is reflected in the code. I admit that I took this approach on purpose: when starting out, I didn't have a full grasp of what would be necessary, so I had to learn along the way (and aim for not getting burnt out halfway through). Now that the full picture is available, the thing can be improved further, and the storage is probably among the first things that need to be optimised.

Posted
7 hours ago, OrbWeaver said:

Non-const shaders

I've never liked the fact that Shaders are global (non-threadsafe) modifiable state — it seems to me that a Shader should know how to render things but should not in itself track what is being rendered. Your changes did not introduce this problem and they don't make it any worse, so it's not a criticism of your design at all, but I wonder if there would be scope to move towards a setup whereby the Shaders themselves were const, and all of the state associating shaders with their rendered objects was held locally to the render operation (or maybe the window/view)?

This might enable features like a scrollable grid of model previews in the Model Selector, which I've seen used very effectively in other editors. But perhaps that is a problem for the future rather than today.

Yes, this is interesting. It's achievable, with some cost, of course. Right now, the Shaders themselves implement the IWindingRenderer, IGeometryRenderer and ISurfaceRenderer interfaces. A different authority could implement these interfaces, but it would need to map the objects to the Shaders somehow (likely by using a few std::maps). The renderer would then ask that authority to deliver the information; this way we could keep that state separate from the Shaders.

The fullbright back-end renderer needs that info when processing the sorted shader passes. Currently the passes ask their owning shader to draw its surfaces; this would have to be moved elsewhere.

The lighting mode renderer uses the objects as delivered by the render entities, which doesn't involve the Shader doing any housekeeping. So this renderer is already heading in that direction.

Posted
8 hours ago, OrbWeaver said:

Winding/Geometry/Surface

Nothing wrong with the backend having more knowledge about what is being rendered if it helps optimisation, but I'm a little unclear on the precise division of responsibilities between these various geometry types.

A Winding is an arbitrary convex polygon which can be rendered with either GL_LINES or GL_POLYGON depending on whether this is a 2D or 3D view (I think), and most of these polygons are expected to be quads. But Geometry can also contain quads, and is used by patches which also need to switch between wireframe and solid rendering, so I guess I'm not clear on where the boundary lies between a Winding and Geometry. 

It's the way they are internally stored to reduce draw calls, but they are indeed similar.

I implemented the IWindingRenderer first, since that was the most painful spot, and I tailored it exactly for that purpose. The CompactWindingVertexBuffer template is specialised to the needs of fixed-size windings, and the buffer is designed to support fast insertions, updates and (deferred) deletions. I guess it's not very useful for the other geometry types, but I admit that I didn't even try to merge the two use cases. I tackled one field after the other; it's possible that the CompactWindingVertexBuffer could now be replaced with some of the pieces I implemented for the lit render mode - there is another ContinuousBuffer<> template that might be suitable for the IWindingRenderer, for example.

It's quite possible that the optimisation I made for brush windings was premature and that parts of it could be handled by the less specialised structures without sacrificing much performance.

8 hours ago, OrbWeaver said:

Surface, on the other hand, I think is used for models, but in this case the backend just delegates to the Model object for rendering, rather than collating the triangles itself? Is this because models can have a large variation in the number of vertices, and trying to allocate "slots" for them in a big buffer would be more trouble than it's worth? I've never had to write a memory allocator myself so I can certainly understand the problems that might arise with fragmentation etc, but I wonder if these same problems won't rear their heads even with relatively simple Windings.

The model object is no longer involved in any rendering; it just creates and registers the IRenderableSurface object. The SurfaceRenderer then copies the model vertices into the large GeometryStore - memory duplication again (the model node needs to keep the data around for model scaling). The size of the memory doesn't seem to be a problem; the data is static and is not updated very often (except when scaling, but the number of vertices and indices stays the same). The thing that makes surfaces special is their orientation: they have to be rendered one after the other, separated by glMultMatrix() calls.
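In old-school GL terms, that per-surface loop might look roughly like this (an illustrative sketch; the OrientedSurface record and function are made up):

```cpp
#include <GL/gl.h>
#include <vector>

// Illustrative surface record; in reality the transform would come
// from IRenderableSurface::getSurfaceTransform().
struct OrientedSurface
{
    double transform[16];        // column-major model matrix
    GLsizei indexCount;
    const unsigned int* indices; // points into the shared GeometryStore
};

// Oriented surfaces cannot be merged into one draw call: each one is
// preceded by its own model matrix.
void drawOrientedSurfaces(const std::vector<OrientedSurface>& surfaces)
{
    for (const OrientedSurface& surface : surfaces)
    {
        glPushMatrix();
        glMultMatrixd(surface.transform);
        glDrawElements(GL_TRIANGLES, surface.indexCount,
                       GL_UNSIGNED_INT, surface.indices);
        glPopMatrix();
    }
}
```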

Speaking of writing the memory allocator: I was quite reluctant to write all that memory management code, but I saw no escape route. It must be the billionth time this has been done on this planet. I'm definitely not claiming I did a good job on any of it, but at least it doesn't show up in the profiler traces.

Posted
16 hours ago, greebo said:

The objects are still requesting a coloured line shader, like <0 0 1> for a blue one. In principle, now that the vertex colour is shipped along with the geometry data, the colour distinction in the shader itself may not even be necessary anymore. There could be a single line shader used to draw everything in the orthoview.

That would be something worth profiling, for sure. I actually have no idea what is better for performance: setting a single glColor and then rendering all vertices without colours, or passing each colour per-vertex even if they are all the same colour. Perhaps it varies based on the GPU hardware.

16 hours ago, greebo said:

The main reason for this duplication is the chronological order in which I adjusted the renderer. I chewed through it starting with fullbright mode: first brushes, then patches, then models, and finally the visual aids like lines and points. After that I moved on to the research on lit mode, and all of that is reflected in the code. I admit that I took this approach on purpose: when starting out, I didn't have a full grasp of what would be necessary, so I had to learn along the way (and aim for not getting burnt out halfway through). Now that the full picture is available, the thing can be improved further, and the storage is probably among the first things that need to be optimised.

That's perfectly reasonable of course. I probably would have approached things the same way. Minimising divergent code paths is good for future maintainability but it doesn't need to happen right away, and can be implemented piecemeal if necessary (e.g. the Brush class still has separate methods for lit vs unlit rendering, but they can delegate parts of their functionality to a common private method).

15 hours ago, greebo said:

Yes, this is interesting. It's achievable, with some cost, of course. Right now, the Shaders themselves implement the IWindingRenderer, IGeometryRenderer and ISurfaceRenderer interfaces. A different authority could implement these interfaces, but it would need to map the objects to the Shaders somehow (likely by using a few std::maps). The renderer would then ask that authority to deliver the information; this way we could keep that state separate from the Shaders.

Yes, that's what I would imagine to be the hurdle with const shaders — the mapping between Shader and objects has to happen somewhere, and if it isn't in the shader itself then some external map needs to be maintained, which might be a performance issue if relatively heavyweight structures like std::maps need to be modified thousands of times per frame.

15 hours ago, greebo said:

It's the way they are internally stored to reduce draw calls, but they are indeed similar.

I implemented the IWindingRenderer first, since that was the most painful spot, and I tailored it exactly for that purpose. The CompactWindingVertexBuffer template is specialised to the needs of fixed-size windings, and the buffer is designed to support fast insertions, updates and (deferred) deletions. I guess it's not very useful for the other geometry types, but I admit that I didn't even try to merge the two use cases. I tackled one field after the other; it's possible that the CompactWindingVertexBuffer could now be replaced with some of the pieces I implemented for the lit render mode - there is another ContinuousBuffer<> template that might be suitable for the IWindingRenderer, for example.

I would certainly give consideration to whether the windings and geometry could use the same implementation, because it does seem to me that their roles are more or less the same: a buffer of vertices in world space which can be tied together into various primitive types. This is something that VBOs will handle well — it should be possible to upload all the vertex data into a single buffer, then dispatch as many draw calls using whatever primitive types are desired, making reference to particular subsets of the vertices. This could make a huge difference to performance because once the data is in the VBO, you don't need to send it again until something changes (and even then you can map just a subset of the buffer and update that, rather than refreshing the whole thing).
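A minimal sketch of that approach with plain OpenGL buffer calls (assuming a GL 1.5+ loader such as GLEW; the function and variable names are illustrative):

```cpp
#include <GL/glew.h> // any loader exposing the GL 1.5+ buffer entry points
#include <cstddef>
#include <vector>

// Upload all vertex data into a single buffer once.
GLuint createVertexBuffer(const std::vector<float>& vertices)
{
    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER,
                 vertices.size() * sizeof(float),
                 vertices.data(), GL_DYNAMIC_DRAW);
    return vbo;
}

// Later, replace just the changed slice starting at 'firstFloat',
// instead of re-sending the whole buffer.
void updateSubset(GLuint vbo, std::size_t firstFloat,
                  const std::vector<float>& newData)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferSubData(GL_ARRAY_BUFFER,
                    firstFloat * sizeof(float),
                    newData.size() * sizeof(float),
                    newData.data());
}
```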

15 hours ago, greebo said:

The model object is no longer involved in any rendering; it just creates and registers the IRenderableSurface object. The SurfaceRenderer then copies the model vertices into the large GeometryStore - memory duplication again (the model node needs to keep the data around for model scaling). The size of the memory doesn't seem to be a problem; the data is static and is not updated very often (except when scaling, but the number of vertices and indices stays the same). The thing that makes surfaces special is their orientation: they have to be rendered one after the other, separated by glMultMatrix() calls.

Ah, I didn't spot the difference in coordinate spaces. That is one fundamental difference between models and other geometry which might merit keeping a separate implementation. So I guess we might end up with a TransformedMeshRenderer for models and a WorldSpacePrimitiveRenderer for everything else, or some distinction like that.

15 hours ago, greebo said:

Speaking of writing the memory allocator: I was quite reluctant to write all that memory management code, but I saw no escape route. It must be the billionth time this has been done on this planet. I'm definitely not claiming I did a good job on any of it, but at least it doesn't show up in the profiler traces.

Unfortunately this is one of the times when manual memory management really is necessary: if we want to (eventually) put things in a VBO, the buffer has to be managed C-style with byte pointers, offsets and the like. I certainly don't envy you having to deal with it, but the work should be valuable because it will transition very neatly into the sort of operations needed for managing VBO memory.

  • 4 weeks later...
Posted

This looks impressive. It's really going to make designing my lighting much easier.


Posted

An amazing leap forward for the DR renderer.

Although all of this manual synchronisation work makes me think that it would be really nice to have some of the common code split into a DLL which could be used from DR as well as the game engine, allowing both editor and game to behave the same without needing a whole bunch of duplicated code. But of course that introduces difficulties of its own, especially when the two projects are using entirely different source control systems.

Posted

It's not only the version control system; the data structures and coding paradigms are also like from two different planets. I could merely use the engine code as a rough blueprint, but it was immensely helpful for me; I learned a lot about more modern OpenGL.

Speaking of sharing code, what would be really nice would be a plugin containing the DMAP algorithm. But from what I remember when trying to port it over to DarkRadiant years ago, that code is also tied to the decl system, the materials and even image loading. Maybe @stgatilov, having worked on dmap recently, might share some insight on whether this piece of code would be feasible to isolate and move to a DLL with a nice interface.

Having leak detection and portalisation code available to DarkRadiant would be beneficial for renderer performance too. Right now, it's completely unportalised and slow as heck.

 
