The Dark Mod Forums

A New Renderer For Radiant (suggestion)



Ok, first things first. Here is a "small" UML diagram showing the overall structure that I imagined for the renderer. Please note that this diagram is incomplete and should be seen as a rough "sketch" of the final product. Most of the classes in this diagram don't have all the needed Get/Set methods, you have to imagine them :P. Furthermore, all objects are passed by reference; I omitted the & and * characters for readability.

CLICK ME

 

As you can see, it consists of 3 parts or modules:

Cull system: does culling and converts nodes to renderable objects

Frontend: sorting and batching

Backend: GL state management

 

Some rules to make everything more solid:

neither the frontend nor the cull system contains actual rendering commands; all rendering commands are issued by the backend

the backend is the only, I repeat ONLY, part where OpenGL state changes are allowed to happen

neither the frontend nor the backend has anything to do with classes like Brush, Patch etc.; front- and backend only work with data classes that store triangles or represent a renderable mesh, nothing more

all OpenGL resources are created by the backend

 

 

Ok, let's get started:

 

The Problem:

The current renderer uses an outdated codepath to send geometry to the card (vertex arrays). Vertex arrays create very high CPU load and lots of pipeline stalls. Every time you issue a glDraw call, the driver has to loop over the arrays to check whether the data has changed. Since we have to draw every object n+1 times when it gets hit by n lights, the workload adds up very quickly.

The second problem is batching. Every draw call, even without the slow vertex arrays, is CPU intensive. This is an extreme problem in DX9, where a switch between kernel and user space happens for each call, and a moderate problem in OpenGL, which stays in user space all the time.

The rule of thumb is that every draw call should send at least 200 triangles down the pipe, the more the better. This is a realistic value for games which deal with optimized data, but utterly unrealistic for Radiant.

 

The suggested solution:

Switch to vertex buffer objects and/or display lists for vertex storage. Vertex buffer objects are a state-of-the-art OpenGL extension and enable the application to store any data (not just vertices) on the graphics card. VBOs support multiple usage patterns, including "change once, draw many times", "change multiple times, but draw many times for each change", "one draw for each change" etc. Display lists are an alternative, kind of old-school way to store static vertex data. When dealing with static meshes, both solutions are equivalent.

I would suggest creating a static VBO or display list for each static mesh and a reference to it for each instance; no vertex data needs to be stored in system memory then.
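Something like this, just as a sketch (StaticMesh, MeshInstance and Vertex are invented names, and an extension loader such as GLEW is assumed to provide the GL 1.5 buffer entry points):

// Sketch only: one static VBO per mesh, shared by every instance.
#include <vector>
#include <GL/glew.h>   // assumption: loader provides glGenBuffers & friends

struct Vertex { float pos[3]; float st[2]; float normal[3]; };

class StaticMesh {
public:
    explicit StaticMesh(const std::vector<Vertex>& verts)
    : m_count(GLsizei(verts.size())) {
        glGenBuffers(1, &m_vbo);
        glBindBuffer(GL_ARRAY_BUFFER, m_vbo);
        // GL_STATIC_DRAW: uploaded once, drawn many times
        glBufferData(GL_ARRAY_BUFFER, verts.size() * sizeof(Vertex),
                     verts.data(), GL_STATIC_DRAW);
        glBindBuffer(GL_ARRAY_BUFFER, 0);
    }
    ~StaticMesh() { glDeleteBuffers(1, &m_vbo); }

    GLuint  vbo() const   { return m_vbo; }
    GLsizei count() const { return m_count; }

private:
    GLuint  m_vbo;
    GLsizei m_count;
};

// An instance only references the shared mesh plus its own transform;
// no per-instance copy of the vertex data in system memory.
struct MeshInstance {
    const StaticMesh* mesh;
    float             transform[16];
};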

 

Ok, that was easy. Time to deal with the real problem: the number of draw calls / batching.

This is the part where the frontend kicks in: the frontend does the overall optimizations, whereas the backend filters redundant state changes.

The first thing the frontend should do is sort all static objects (meshes) by render state. This reduces the overall cost of unnecessary shader and texture switches.

The next and more complicated job would be to fill a dynamic VBO with all brush and patch triangles needed for the current frame. The VBO should be filled in an "intelligent" way, meaning that faces which use the same shader and get hit by at least one shared light are stored next to each other in the buffer. With this method all brush/patch triangles still get sent to the card each frame, but they only get sent once, not once per draw call.
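To sketch the sorting side of this (FaceRef and its fields are invented for illustration; a real implementation would reuse Radiant's existing render-state handles):

// Sketch: order faces so that batchable ones end up adjacent in the
// dynamic VBO. FaceRef is a made-up helper, not an existing class.
#include <algorithm>
#include <vector>

struct FaceRef {
    unsigned    shaderId;   // sort primarily by render state
    unsigned    lightMask;  // bitmask of the lights hitting this face
    const void* face;       // opaque handle to the source face
};

inline bool batchOrder(const FaceRef& a, const FaceRef& b) {
    if (a.shaderId != b.shaderId) return a.shaderId < b.shaderId;
    return a.lightMask < b.lightMask;   // shared lights become neighbours
}

void sortForBatching(std::vector<FaceRef>& faces) {
    std::sort(faces.begin(), faces.end(), batchOrder);
    // now append each face's triangles into the dynamic VBO in this order
    // and record (first, count) ranges per shader for the draw calls
}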

 

Small example:

Let's say we have 3 brushes and we render in textured mode. All brushes have 2 faces with shader A, 2 with shader B and 2 with shader C.

A "per brush" draw mechanism would do this:

foreach brush
    Switch to shader A
    Draw 4 triangles
    Switch to shader B
    Draw 4 triangles
    Switch to shader C
    Draw 4 triangles

 

9 draw calls and state changes

 

My mechanism instead:

Switch to shader A

Draw 12 triangles

Switch to shader B

Draw 12 triangles

Switch to shader C

Draw 12 triangles

 

3 draw calls and state changes

 

For lighting mode things get a bit more ugly:

The depth pass can be done very quickly: since this shader does not require any textures, the renderer can send the complete dynamic VBO to the card in one call, very fast.

The problems start with the actual lighting of the scene. In order to do batching, faces with the same shader and the same light interaction (they get lit by this light) must lie next to each other in the dynamic VBO. That's close to impossible to achieve in real-life scenes for all faces. It could be done if the vertex data were stored multiple times in the buffer, but I doubt that this waste of memory is worth the effort.

The second way to avoid this would be a renderer that uses a hybrid approach of Doom3-style and deferred rendering. I won't go into details here since I haven't thought this through to the end, but I'm very optimistic. :)

 

Ok, I still have many more things to say (concerning shader handling, for example) but my fingers hurt :)


Some interesting ideas there. Couple of comments:

 

1. Personally I would vote for Display Lists rather than VBOs, since they are a tried-and-tested and widely-supported technology which does not depend on particular extensions (thus excluding older GFX cards). I have also seen a benchmark where DLs and VBOs were compared: although the VBOs were faster in some situations, they were dramatically slower in others, while the DLs had consistent performance in all situations.

2. I didn't examine your design in detail, but I didn't notice any mention of how light volumes would be sorted for shadow purposes. My understanding is that rendering light-by-light is necessary to render stencil shadows, but maybe you know a better way of doing this.

 

As regards putting static meshes into DLs/VBOs, this seems like a very good idea and a fairly easy win performance-wise.


1. Personally I would vote for Display Lists rather than VBOs, since they are a tried-and-tested and widely-supported technology which does not depend on particular extensions (thus excluding older GFX cards). I have also seen a benchmark where DLs and VBOs were compared: although the VBOs were faster in some situations, they were dramatically slower in others, while the DLs had consistent performance in all situations.

VBOs have nothing to do with the GFX card; it's an extension that is only driver-version dependent. In fact, even TNT2 cards support VBOs with modern drivers. Here's a list. VBOs have been implemented in drivers for over a year now and are very mature and stable; even Doom 3 uses them.

The big problem with display lists is that there is absolutely no way to change a created display list. If you want to change any data in it, you have to rebuild it from scratch. Modern drivers usually optimize the meshes for data locality to get better results from the TnL caches, and compute an AABB for the list for frustum culling (that's right, both NVIDIA and ATI do automatic frustum culling with display lists). This means that you will get better performance for static meshes with display lists if your source data isn't pre-optimized; however, it also means that creating such a display list is very expensive.

In my engine I exclusively use VBOs, since all my meshes get pre-optimized by my model converter, so display lists wouldn't give me any benefits, and I prefer to stay consistent in the way I use the API.

Don't get me wrong, display lists are cool ;) , just make sure to use them the right way. Dlists aren't as clear in their design as VBOs: a VBO is just a chunk of data, while dlists "seem" to be a compiled set of OpenGL calls. Sadly that's not the case; like VBOs, dlists are only data. I see many people who do state changes during dlist compilation. That's horrible and will force the driver to split your list into smaller parts, one for each state.
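To make the "right way" explicit, a tiny sketch (Vertex and verts are placeholders for whatever mesh data you have):

// Sketch: compile ONLY geometry into the list. Bind shaders/textures
// outside, before glCallList(); never change state inside the compile.
GLuint list = glGenLists(1);
glNewList(list, GL_COMPILE);
glBegin(GL_TRIANGLES);
for (size_t i = 0; i < verts.size(); ++i) {
    glNormal3f(verts[i].normal[0], verts[i].normal[1], verts[i].normal[2]);
    glTexCoord2f(verts[i].st[0], verts[i].st[1]);
    glVertex3f(verts[i].pos[0], verts[i].pos[1], verts[i].pos[2]);
}
glEnd();
glEndList();
// Bad: a glBindTexture()/glEnable() in here would force the driver to
// split the list into one chunk per state change.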

 

2. I didn't examine your design in detail, but I didn't notice any mention of how light volumes would be sorted for shadow purposes. My understanding is that rendering light-by-light is necessary to render stencil shadows, but maybe you know a better way of doing this.

Well, for a Doom3-style renderer you have to use a light-by-light approach to get shadows. I have never implemented stencil shadows (my engine uses shadow maps), so I can't help you that much with shadow volumes. The principle is that you render the shadow volumes for one light into the stencil buffer and then render everything that gets lit by that light with the stencil test enabled. The stencil test will discard all pixels that are set to a specific value in the stencil buffer and therefore create lit and unlit areas.

Shadow maps work differently, although the idea is much the same: to memorize where no lighting should be applied.

Both approaches use something like this:

foreach light
    prepare shadows (render shadow volume / shadow map)
    render stuff
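For stencil shadows, the loop body looks roughly like this in GL (take it with a grain of salt, since I have never implemented them myself; Light, lights and the draw* helpers are placeholders):

// Sketch of the per-light stencil pass (z-pass variant; a robust renderer
// switches to z-fail when the camera is inside a shadow volume).
// Assumes the depth pass has already been rendered.
glDepthMask(GL_FALSE);
glEnable(GL_STENCIL_TEST);
for (size_t i = 0; i < lights.size(); ++i) {
    glClear(GL_STENCIL_BUFFER_BIT);

    // 1. Rasterize the shadow volumes into the stencil buffer only.
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDisable(GL_CULL_FACE);           // both sides in one pass (GL 2.0)
    glStencilFunc(GL_ALWAYS, 0, ~0u);
    glStencilOpSeparate(GL_FRONT, GL_KEEP, GL_KEEP, GL_INCR_WRAP);
    glStencilOpSeparate(GL_BACK,  GL_KEEP, GL_KEEP, GL_DECR_WRAP);
    drawShadowVolumes(lights[i]);      // placeholder

    // 2. Additively light everything where the stencil stayed at zero.
    glEnable(GL_CULL_FACE);
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glStencilFunc(GL_EQUAL, 0, ~0u);
    glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);       // accumulate this light's contribution
    drawLitGeometry(lights[i]);        // placeholder
    glDisable(GL_BLEND);
}
glDisable(GL_STENCIL_TEST);
glDepthMask(GL_TRUE);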

 

The hard part for shadow volumes is not the actual rendering, but the generation of the shadow volumes, or let's say, the reasonably fast generation of these volumes out of arbitrary data ;)


  • 2 months later...
*bump*

 

Are you still there, namespace? Any news on this?

Yes, I'm alive... kind of.

 

Things that have happened lately:

- qeradiant.com has a new webpage without any content 'cause TTimo can't find any backups *sigh*. Anyway, it's a new page and I have write access, so it got a little better.

 

- Ported GtkR 1.5 to Vista. (Make sure you get GTK >= 2.8; anything below won't work correctly with Vista.)

 

- GPLed GtkR 1.4. It's not out in the public repository yet, but the license change is already done. I have some very serious problems with GtkR 1.4 and GCC 4: GtkR uses C++ in a way that only compiles with the "-fpermissive" flag. I spent a whole week on it and wasn't able to get a build that doesn't crash on loading. Guess I'll just release the source as is. GtkR 1.4 is still a very decent source of information. (Example: I ripped Select_Inside/Touching out of it to get it back into 1.5.)

 

Oh yeah, the latest and greatest GtkR 1.5 is over here:

http://zerowing.idsoftware.com/files/radia...8-GODFATHER.msi

 

It features a new GTK theme to get a consistent look-and-feel across all Windows platforms. I did this because the GTK "windows-native" theme generates draw errors on Vista, and getting the user to change the theme is way too complicated.

 

Anyway, the previous lines were totally unrelated to your question :laugh:

 

The sad thing is that my spare time has decreased even more with the new semester. I simply can't develop for DarkRadiant with the same effort I put into my studies, my own game and my job. It's better to be realistic now instead of having a subproject die a long death because of illusory ambitions. Don't get me wrong, I would love to put some work into this project, especially as a Thief fan, but it's close to impossible :( .

 

Fortunately, it's not all lost: I have already written a significant amount of code for a new renderer for my engine that resembles the suggested design up to 80% (shadow maps instead of stencil shadows, some design improvements here and there). All I can offer you is a complete snapshot of my renderer, including cull system, front- and backend, when it's done. My renderer uses the Doom3 lighting model with a small variation (better use of the Z-falloff map), so it could serve as a knowledge source for the GtkR renderer.

If you don't want to wait for that, you can start implementing the suggested design yourself. I have re-evaluated it a couple of times and I'm still very certain that it's the most performant solution. Try switching to display lists / VBOs as a first step; that alone should give you a nice perf boost. The batching is just a bonus and not a necessity right now.

Furthermore, I can help you or any other dev with OpenGL-related questions. Answering an email or posting about shaders, GL extensions or GL in general is no big deal.

Sadly, that's all I can do for you guys.


Well, it sounds like you've got a lot going on. Any knowledge or information you can send our way will be much appreciated -- the actual coding is not a problem, but neither of us has a great deal of 3D experience.

 

I was thinking of experimenting with VBOs/DLs as a first step in the model plugin -- potentially it could be a quick win if each model object held its own display list rather than submitting its geometry as a vertex array each frame.


the actual coding is not a problem
That's good to hear, I was afraid that I would slow down the project.

 

I was thinking of experimenting with VBOs/DLs as a first step in the model plugin -- potentially it could be a quick win if each model object held its own display list rather than submitting its geometry as a vertex array each frame.

Absolutely!

Just make sure you only put the geometry data into the DL and you should be fine. I searched through the code a bit and found this:

 

  glVertexPointer(3, GL_FLOAT, sizeof(WindingVertex), &winding.points.data()->vertex);

  if((state & RENDER_BUMP) != 0)
  {
    Vector3 normals[c_brush_maxFaces];
    typedef Vector3* Vector3Iter;
    for(Vector3Iter i = normals, end = normals + winding.numpoints; i != end; ++i)
    {
      *i = normal;
    }
    if(GlobalShaderCache().useShaderLanguage())
    {
      glNormalPointer(GL_FLOAT, sizeof(Vector3), normals);
      glVertexAttribPointerARB(c_attr_TexCoord0, 2, GL_FLOAT, 0, sizeof(WindingVertex), &winding.points.data()->texcoord);
      glVertexAttribPointerARB(c_attr_Tangent, 3, GL_FLOAT, 0, sizeof(WindingVertex), &winding.points.data()->tangent);
      glVertexAttribPointerARB(c_attr_Binormal, 3, GL_FLOAT, 0, sizeof(WindingVertex), &winding.points.data()->bitangent);
    }
    else
    {
      glVertexAttribPointerARB(11, 3, GL_FLOAT, 0, sizeof(Vector3), normals);
      glVertexAttribPointerARB(8, 2, GL_FLOAT, 0, sizeof(WindingVertex), &winding.points.data()->texcoord);
      glVertexAttribPointerARB(9, 3, GL_FLOAT, 0, sizeof(WindingVertex), &winding.points.data()->tangent);
      glVertexAttribPointerARB(10, 3, GL_FLOAT, 0, sizeof(WindingVertex), &winding.points.data()->bitangent);
    }
  }
  else
  {
    if (state & RENDER_LIGHTING)
    {
      Vector3 normals[c_brush_maxFaces];
      typedef Vector3* Vector3Iter;
      for(Vector3Iter i = normals, last = normals + winding.numpoints; i != last; ++i)
      {
        *i = normal;
      }
      glNormalPointer(GL_FLOAT, sizeof(Vector3), normals);
    }

    if (state & RENDER_TEXTURE)
    {
      glTexCoordPointer(2, GL_FLOAT, sizeof(WindingVertex), &winding.points.data()->texcoord);
    }
  }
  glDrawArrays(GL_POLYGON, 0, GLsizei(winding.numpoints));

This horrible chunk of code renders the windings, the polygon representation of brushes. I didn't find the code which deals with the real models (hint please), but I guess it's very similar to this.

Let's go through the code to see what must be done to make it fit into a display list and where performance wins can be achieved:

 

The first problem I see here is this:

    Vector3 normals[c_brush_maxFaces];
    typedef Vector3* Vector3Iter;
    for(Vector3Iter i = normals, end = normals + winding.numpoints; i != end; ++i)
    {
      *i = normal;
    }

This means that _every time_ a brush gets rendered in bump mode, an array of vectors is created on the stack (slow) and Radiant iterates over many of these vectors (sloooow). If a brush gets hit by n lights, it will be rendered n+1 times (the first one is the depth pass, the others are the lighting passes), so yeah, that's very inefficient.

This code is also very dangerous: the array will be destroyed when execution exits its scope. The reason why this code doesn't lead to a segmentation fault is that OpenGL actually copies the contents of the arrays into internal buffers during the glDraw call.

 

Ok, so why is this code there?

The problem is, once again, that GtkR uses vertex arrays. Without vertex arrays this whole problem would not arise: OpenGL is a state machine. That means that every time you set a state, it will stay in that state unless you change it (OpenGL does not change its state by itself; all state changes are explicit). So in theory, when rendering a tessellated plane, setting the normal once with glNormal3f() would be enough, since all vertices on the plane have the same normal. The reason why a whole array is built is that OpenGL expects, for all array inputs, an array of length n, where n is the number of vertices you render. That's why this one normal has to be expanded into an array.

 

Ok, moving along:

if(GlobalShaderCache().useShaderLanguage())

This statement is always false since the GLSL code got removed from GtkR, which means that the else part is executed:

    else
    {
      glVertexAttribPointerARB(11, 3, GL_FLOAT, 0, sizeof(Vector3), normals);
      glVertexAttribPointerARB(8, 2, GL_FLOAT, 0, sizeof(WindingVertex), &winding.points.data()->texcoord);
      glVertexAttribPointerARB(9, 3, GL_FLOAT, 0, sizeof(WindingVertex), &winding.points.data()->tangent);
      glVertexAttribPointerARB(10, 3, GL_FLOAT, 0, sizeof(WindingVertex), &winding.points.data()->bitangent);
    }

Ok, what's that?

With the introduction of shaders, OpenGL needed a way to pass additional data to the current shader. This special data is called a "vertex attribute". As the name implies, it's a per-vertex attribute which can consist of multiple variables. The code above binds arrays as input to various vertex attributes:

glVertexAttribPointerARB(11, 3, GL_FLOAT, 0, sizeof(Vector3), normals)

The parameters are: the attribute index or id, the number of variables for this attribute, the type of the variables, a boolean which indicates whether the data should be normalized, the stride and the pointer.

This small excerpt from the shader shows how this vertex attribute is later accessed:

MOV result.texcoord[6].xyz, vertex.attrib[11];

 

To sum it up: normal, texcoords, tangent and bitangent are sent as vertex attributes to the card when rendering in bump mode. The next call would then be

glDrawArrays(GL_POLYGON, 0, GLsizei(winding.numpoints));

 

Two things make this call slow:

- The use of GL_POLYGON

- No indexing

 

GL_POLYGON means that the input data can consist of a huge n-sided polygon. Since graphics cards can only render triangles, the driver has to triangulate the input data. Winding polygons are always convex, and the triangulation of a convex polygon is a very trivial task that can be done very fast, so it should be done by GtkR before passing the data to GL. The driver, in contrast, can't know that the input data is convex; it will use an algorithm which can triangulate non-convex polygons. This algorithm is not trivial and a lot slower.
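For the record, the convex case is literally a triangle fan (a sketch; numPoints corresponds to winding.numpoints in the code above):

// Sketch: fan-triangulate a convex winding; n vertices -> n-2 triangles.
#include <vector>

std::vector<unsigned int> fanIndices(unsigned int numPoints) {
    std::vector<unsigned int> indices;
    indices.reserve((numPoints - 2) * 3);
    for (unsigned int i = 1; i + 1 < numPoints; ++i) {
        indices.push_back(0);       // every triangle shares vertex 0
        indices.push_back(i);
        indices.push_back(i + 1);
    }
    return indices;
}
// Feeding this to glDrawElements(GL_TRIANGLES, ...) means the driver never
// has to triangulate a GL_POLYGON per frame again.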

 

So to summarize: when drawing just one winding in bump mode, every normal is touched n times, the complete buffers get copied in RAM at least once, and each vertex will be processed multiple times by the CPU for triangulation. Very, very bad.

 

Can it get worse? Yes it can!

My previous run through the code assumed that we are rendering in bump mode; let's see what happens in normal mode. At first, we generate a normal array as we did in bump mode. But instead of using vertex attributes, we now use the old common array inputs:

glNormalPointer(GL_FLOAT, sizeof(Vector3), normals);

The reason for that is that Radiant turns shaders off when rendering in the old-school lighting mode. Without shaders there are no vertex attributes (remember, they are part of the shader extension).

This means that Radiant actually uses two different vertex formats, one for bump mode and one for lighting mode. As a consequence, we have to generate _two_ display lists.

 

Here is some pseudo-code that would generate these lists:

glNewList(m_draw_bump_list, GL_COMPILE);
glBegin(GL_POLYGON);
glVertexAttrib3fARB(11, plane->normal.x, plane->normal.y, plane->normal.z);
foreach vertex v
{
	glVertexAttrib2fARB(8, v->texcoord.s, v->texcoord.t);
	glVertexAttrib3fARB(9, v->tangent.x, v->tangent.y, v->tangent.z);
	glVertexAttrib3fARB(10, v->bitangent.x, v->bitangent.y, v->bitangent.z);
	glVertex3f(v->x, v->y, v->z);
}
glEnd();
glEndList();


glNewList(m_draw_lighting_list, GL_COMPILE);
glBegin(GL_POLYGON);
glNormal3f(plane->normal.x, plane->normal.y, plane->normal.z);
foreach vertex v
{
	glTexCoord2f(v->texcoord.s, v->texcoord.t);
	glVertex3f(v->x, v->y, v->z);
}
glEnd();
glEndList();

I too use GL_POLYGON here, but it's not as bad as in the code above: the vertex data will be triangulated just once, during the build process of the list, not once per rendering. Furthermore, this code is just an example; it's not very clever to generate display lists for each winding, since every brush consists of multiple windings.

 

What a long post, anyway I hope it was of use. ^_^


Thanks, that's very illuminating.

 

The model rendering code in DarkRadiant is in RenderablePicoSurface in the model plugin (this is different from GtkR). Each model object consists of one or more such surfaces, each of which has its own material shader. In this case the data is already triangulated and is just sent to the GPU:

 

	// Use Vertex Arrays to submit data to the GL. We will assume that it is
// acceptable to perform pointer arithmetic over the elements of a 
// std::vector, starting from the address of element 0.

if(flags & RENDER_BUMP) {
	// Bump mode, we are using an ARB shader so set the correct parameters
	glVertexAttribPointerARB(
		ATTR_TEXCOORD, 2, GL_FLOAT, 0, 
		sizeof(ArbitraryMeshVertex), &_vertices[0].texcoord);
	glVertexAttribPointerARB(
		ATTR_TANGENT, 3, GL_FLOAT, 0, 
		sizeof(ArbitraryMeshVertex), &_vertices[0].tangent);
	glVertexAttribPointerARB(
		ATTR_BITANGENT, 3, GL_FLOAT, 0, 
		sizeof(ArbitraryMeshVertex), &_vertices[0].bitangent);
	glVertexAttribPointerARB(
		ATTR_NORMAL, 3, GL_FLOAT, 0, 
		sizeof(ArbitraryMeshVertex), &_vertices[0].normal);
}
else {
	// Standard GL calls
	glNormalPointer(
		GL_FLOAT, sizeof(ArbitraryMeshVertex), &_vertices[0].normal);
	glTexCoordPointer(
		2, GL_FLOAT, sizeof(ArbitraryMeshVertex), &_vertices[0].texcoord);
}

// Vertex pointer is invariant over bump/nobump render modes
glVertexPointer(
	3, GL_FLOAT, sizeof(ArbitraryMeshVertex), &_vertices[0].vertex);

// Draw the elements
glDrawElements(GL_TRIANGLES, _nIndices, GL_UNSIGNED_INT, &_indices[0]);

 

 

Again, as you say, two DLs will be needed for bump and non-bump modes, however this should be pretty easy to manage, particularly since every Instance of the same model uses the same RenderablePicoModel object, meaning that the surfaces themselves could maintain their own display lists which never need to change. I was thinking that when a RenderablePicoSurface is constructed, it compiles the display list after it has copied the model data into its internal arrays, and in future when its render() method is called, the DL will be called instead of the normal vertex array code.

 

The brush code I'm guessing will need slightly more managing code to handle the situation where the brush changes -- obviously the old DLs will need to be released and new ones calculated. It doesn't surprise me in the least that the existing brush code is that crap -- that fixed-size stack-allocated array is just ghastly.
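A lazy dirty-flag scheme might be enough for that (a sketch with invented names, not existing DarkRadiant code):

// Sketch: rebuild the brush's display list only when its geometry changed.
class BrushRenderable {
public:
    void geometryChanged() { m_dirty = true; }   // call from brush edits

    void render() {
        if (m_dirty) {
            if (m_list != 0)
                glDeleteLists(m_list, 1);        // release the stale list
            m_list = glGenLists(1);
            glNewList(m_list, GL_COMPILE);
            emitGeometry();                      // geometry-only immediate mode
            glEndList();
            m_dirty = false;
        }
        glCallList(m_list);   // camera moves / scrolling hit this cheap path
    }

private:
    void emitGeometry();      // glBegin/glNormal/glVertex/glEnd per winding
    GLuint m_list = 0;
    bool   m_dirty = true;
};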


I changed the RenderablePicoSurface code to create display lists, but it keeps crashing (actually on the second model creation -- the first one works but there is no material on the displayed model). Is there some problem with using glDrawElements() inside a Display List compile stage? Most of the examples seem to use standard glBegin()/glEnd() calls to create DLs.


I already took a look at the winding rendering code a while ago and could figure out the part with normals and binormals, but I really had no idea how to start to optimise this bit - so thanks for this explanation, it is appreciated!


All data must be passed to display lists in immediate mode, so you always need glBegin()/glEnd().

 

Ah, I guess that explains the crashes. I was hoping I could just wrap the existing render code in a conditional DL compilation stage, but apparently I will have to re-write this to use immediate mode instead.


Well I've now implemented Display Lists for model rendering, and it certainly does feel faster to me. Especially in the Bonehoard entrance where all of those rocks are -- there is no obvious slowdown even when all of the rocks are visible.

 

I haven't actually done any benchmarks though.


I really had no idea how to start to optimise this bit

Brushes are very hard to optimize in terms of rendering.

I would like to rule out display lists for them, since building these dlists is quite expensive and a bit of overkill for an object that consists of only ~12 triangles. If you try to be nice to the driver (i.e. only geometry in dlists), then you would have to build two dlists for each winding, where each winding holds just 2 triangles (at least for cubes). :wacko:

With VBOs, things would look a bit better: it would be possible to build just one VBO for each brush. That's still very silly, but a modern driver should be able to handle 2000+ VBOs. So it's at least realistic to give it a try and see what happens.

 

Quick introduction to VBOs:

What are they? Basically, VBOs are the fast version of vertex arrays (although you can do a lot more with them, but that's not of concern right now). The big difference from common vertex arrays is that VBOs are stored on the graphics card, not in system memory. This increases rendering speed but comes with the disadvantage that the application can't modify the buffer directly; it has to use OpenGL functions for that. Furthermore, there are two types of buffer objects: objects for vertices and objects for indices.

 

Let's create a buffer object for vertex data and bind it:

GLuint id;
glGenBuffers(1, &id);
glBindBuffer(GL_ARRAY_BUFFER, id);

GL_ARRAY_BUFFER indicates that it's a buffer for vertices; a buffer for indices would require the parameter GL_ELEMENT_ARRAY_BUFFER. Now let's fill it with data:

glBufferData(GL_ARRAY_BUFFER, size_of_data_in_bytes, pointer_to_data, GL_STATIC_DRAW);

The first parameters should be self-explanatory. The last parameter indicates the usage pattern OpenGL has to expect for this VBO. GL_STATIC_DRAW means that the data will be specified once, will never be read back by the CPU, and will be used many times for rendering. There are a lot of other usage patterns for things like streaming etc., but let's keep it simple for now.

Now we have a VBO which is filled with data and is bound. By binding a VBO we change the semantics of the gl*Pointer functions. With vertex arrays, these functions take a pointer into system memory; with VBOs they take offsets into the currently bound VBO. This implies that the current VBO will be the source for ALL our geometry data during a glDraw call. This is an important change compared to vertex arrays.

 

Let's assume I would like to render a VBO with this interleaved memory layout (interleaved formats are faster than separated formats since they increase cache hits):

[x,y,z][s,t][nx,ny,nz][x,y,z][s,t][nx,ny,nz]...

 

With this format the gl*Pointer calls would look like this:

#define BUFFER_OFFSET(i) ((char *)NULL + (i))
const uint vertex_size = sizeof(float) * (3+2+3);

glTexCoordPointer(2, GL_FLOAT, vertex_size, BUFFER_OFFSET(sizeof(float)*3) );
glNormalPointer(GL_FLOAT, vertex_size, BUFFER_OFFSET(sizeof(float)*5) );
glVertexPointer(3, GL_FLOAT, vertex_size, BUFFER_OFFSET(0) );

The BUFFER_OFFSET macro may be ugly, but it's part of the VBO specification, so I recommend using it. As you can see, we now specify offsets instead of pointers as input.

 

The glDrawArrays() call remains unchanged.

 

Two more things: if you want to switch back to normal vertex arrays, bind the VBO with id 0:

glBindBuffer(GL_ARRAY_BUFFER, 0);

This switches back to the old gl*Pointer semantics.

 

If a VBO isn't needed anymore, delete it with:

glDeleteBuffers(1, &id);

 

This is the simplest possible usage pattern for VBOs; the things I skipped are buffers for indices and changing part of an already existing VBO. Full coverage of the topic can be found here:

VBO specification

OpenGL specs tend to be huge, and most readers get lost in the uninteresting "Issues" section. Make sure to search for "New Tokens", "New Procedures" and "Usage Examples" to get to the interesting stuff.

 

Anyway, the key to having only one VBO for each brush is to use an interleaved memory layout:

[position0][texcoords0][normal0][tangent0][bitangent0]
[position1][texcoords1][normal1][tangent1][bitangent1]

A VBO that uses this layout could be used for bump rendering and lighting rendering.
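As a sketch (the struct and the attribute indices mirror the GtkR code quoted earlier; offsetof keeps the offsets honest):

// Sketch: one interleaved VBO per brush, usable by both vertex formats.
#include <cstddef>   // offsetof

struct BrushVertex {          // invented; matches the layout above
    float pos[3];
    float st[2];
    float normal[3];
    float tangent[3];
    float bitangent[3];
};

#define BUFFER_OFFSET(i) ((char *)NULL + (i))
const GLsizei stride = sizeof(BrushVertex);

glBindBuffer(GL_ARRAY_BUFFER, brushVbo);   // brushVbo: created as shown above

// invariant over both modes:
glVertexPointer(3, GL_FLOAT, stride, BUFFER_OFFSET(offsetof(BrushVertex, pos)));

// bump mode: bind the extra channels as vertex attributes:
glVertexAttribPointerARB(8,  2, GL_FLOAT, 0, stride, BUFFER_OFFSET(offsetof(BrushVertex, st)));
glVertexAttribPointerARB(9,  3, GL_FLOAT, 0, stride, BUFFER_OFFSET(offsetof(BrushVertex, tangent)));
glVertexAttribPointerARB(10, 3, GL_FLOAT, 0, stride, BUFFER_OFFSET(offsetof(BrushVertex, bitangent)));
glVertexAttribPointerARB(11, 3, GL_FLOAT, 0, stride, BUFFER_OFFSET(offsetof(BrushVertex, normal)));

// lighting mode instead: glNormalPointer/glTexCoordPointer with the same
// stride and the same offsets.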


That's very interesting, thanks for that. I had no idea that VBOs worked by changing the existing semantics of ordinary calls; I assumed they had their own separate API.

 

Regarding the optimisability of brushes, I was thinking that it might be possible, instead of creating a VBO/DL for each brush, to create one for the entire set of geometry using a given shader. This ought to be possible because of the way the current back-end groups geometry by shader -- instead of iterating through all of the Renderable objects in each "material bucket", the bucket could create its own DL and call that instead.

 

The difficulty would be in detecting when this DL needs to be re-created, since at the moment there is no distinction between "render the same scene from a different viewpoint" and "the content of the scene has changed". A new flag would need to be added to control this, so that changing geometry would cause the DLs to be re-generated but simply moving the camera or scrolling (which is what the user notices as being slow) would not.


Please don't consider dlists as something that can be recreated frequently. That's not their intended usage pattern; use dynamic/streaming VBOs for that.

The "global buffer" solution for all brushes / for each material is possible, but it is a lot of work to get right. These buffers must be resized when running out of space, or defragmented when a brush gets deleted. When changing a material, the faces have to be deleted from one buffer and created in the other, etc.

I would favor the simple and a bit stupid solution as a first step.


I would favor the simple and a bit stupid solution as a first step.

 

It might be a good idea to start with patches in that case -- these are likely to have a few more triangles than brushes, so might benefit more from optimisation.


The Bonehoard map is a good stress-test, as well as Dram's mansion map. I would also like to add some basic numerical profiling to the render code, so it is obvious where the bottlenecks are (I am certain there are many important optimisations that can be made that don't necessarily relate to GL calls).


There is some sort of benchmark method in the CamWnd code (CamWnd::benchmark()), but it is currently unreferenced. It basically measures the time of a 360° turn of the camera. Not the best benchmark, but at least it's something.


I probably should have said "profile" rather than "benchmark" -- what I mean is, we need some code to provide millisecond printouts of how long each stage of the render takes, e.g. to traverse the scenegraph, add renderable objects, create the batches, do the GL calls etc. This makes it possible to concentrate on the worst offender rather than guessing and optimising the wrong thing.


I probably should have said "profile" rather than "benchmark" -- what I mean is, we need some code to provide millisecond printouts of how long each stage of the render takes, e.g. to traverse the scenegraph, add renderable objects, create the batches, do the GL calls etc. This makes it possible to concentrate on the worst offender rather than guessing and optimising the wrong thing.

Profiling a renderer is tricky. The problem is that objects aren't drawn when the application issues a glDraw command; the driver caches all your rendering commands and state changes as long as possible. There are a couple of operations which force OpenGL to flush this cache and execute your renderings:

glFlush() (non-blocking)

glFinish() (blocking)

backbuffer swapping (blocking)

reading texels from the screen surface (should be non-blocking when reading into GPU RAM, blocking otherwise)

There may be some more which I have forgotten.

Anyway, my point is that by measuring the time of all your draw commands, you measure the time spent processing and sending these calls to the driver, not the actual rendering time.

Measuring your GL commands and doing a separate measurement for the backbuffer flip should give more detailed information.
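A crude but honest way to do that is to bracket each stage with glFinish() (a sketch; the glFinish itself stalls the pipeline, so only do this in a profiling build):

// Sketch: measure how long a render stage really takes by forcing the
// driver to execute everything before and after it.
#include <chrono>

template <typename RenderStage>
double measureStageMs(RenderStage stage) {
    glFinish();   // drain everything queued so far
    const auto t0 = std::chrono::steady_clock::now();
    stage();      // issue the GL calls for this stage
    glFinish();   // block until they have actually been executed
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}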


That's precisely what I want -- to profile the front end processing as well. No doubt the GL calls can be optimised as well, but I would be very surprised if there isn't a whole load of unnecessary stuff going on in the first stages.


That's precisely what I want

Ok, I misunderstood your first post then :blush:

These two applications would come in very handy when trying to improve GtkR's OpenGL calls:

GLIntercept: http://glintercept.nutty.org/

gDebugger: http://www.gremedy.com/

gDebugger is way more advanced than GLIntercept; have a look at the screenshots. The downside is that it's a commercial, non-open-source app.

The most useful feature of both apps is the GL-call logfile: it can be generated very quickly, at no cost and without modifying the application. It's way faster to analyze GL rendering in these logfiles than by browsing code.

