I don't like that you are talking about rendering performance as a binary thing: the engine either renders an object or it doesn't.
It is much more complex. Aside from draw calls (which are very important), there are other things involved.
For instance, pixel fill rate (think about the pixels you have to fill with data) and other reads/writes to memory.
Also, the complexity of the per-pixel visual effects (which are done in pixel shaders, also known as fragment shaders).
So it is all more complicated: the engine is big, and it is very hard to fully understand all the performance details.
First, I'll try to correct a couple of misunderstandings here (well, at least I think they are misunderstandings):
- An entity cannot be culled out only because it is occluded by some set of entities. So if there is a candle behind a crate (or several crates) in a room, the candle will be rendered regardless of the crates' presence. I think so because, in my opinion, there is no efficient way to check that the candle is fully behind a crate or a barrel.
- Quake 1 really does have a software renderer with zero overdraw, but the Doom 3 renderer does not. If the engine renders several surfaces that land on the same place on the screen (and thus occlude each other), then even the occluded pixels take some time to render. However, Doom 3 performs an "early depth pass": at the very beginning of the frame it renders all the geometry without any lighting/coloring/texturing to produce a depth map. After that it renders everything again, but all the complex math for lighting/texturing (i.e. the fragment shader) is done only for the pixels which are actually visible and not occluded by anything. For occluded pixels, only the depth calculation and comparison are done (which is much cheaper than full rendering). So you pay for visible pixels, and pay much less for occluded pixels.
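To make the "pay less for occluded pixels" point concrete, here is a toy sketch in Python. It is not engine code: the 1D "screen", the span format `(x_start, x_end, depth)` and both function names are made up for illustration, and it assumes every surface has a distinct depth. It only counts how many times the expensive shading step would run with and without a depth-only first pass.

```python
INF = float("inf")

def shade_count_naive(spans, width):
    """Single pass with a depth test: shade every pixel that passes."""
    depth = [INF] * width
    shaded = 0
    for x0, x1, z in spans:
        for x in range(x0, x1):
            if z < depth[x]:      # closer than anything drawn so far
                depth[x] = z
                shaded += 1       # expensive shading, possibly overwritten later
    return shaded

def shade_count_with_prepass(spans, width):
    """Depth-only pass first, then shade only the pixels that end up visible."""
    depth = [INF] * width
    for x0, x1, z in spans:       # pass 1: depth only, no shading
        for x in range(x0, x1):
            depth[x] = min(depth[x], z)
    shaded = 0
    cheap_rejects = 0
    for x0, x1, z in spans:       # pass 2: full shading where depth matches
        for x in range(x0, x1):
            if z == depth[x]:
                shaded += 1
            else:
                cheap_rejects += 1  # only a depth comparison was paid
    return shaded, cheap_rejects

# Three overlapping surfaces submitted far-to-near (worst case for the naive pass).
spans = [(0, 10, 3.0), (0, 10, 2.0), (2, 8, 1.0)]
print(shade_count_naive(spans, 10))         # 26
print(shade_count_with_prepass(spans, 10))  # (10, 16)
```

In this toy case the naive pass shades 26 pixels for a 10-pixel screen, while the prepass version shades exactly 10 and rejects the other 16 with a depth comparison only. Note that all three surfaces are still submitted twice, which is the geometry and draw call cost that only culling can remove.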
As you see, the only way to avoid paying for triangles and draw calls is to cull the surface completely in the frontend.
This is what I'll try to explain now: let's go back to portal culling.
Let's call an arbitrary convex polygon on the screen a "winding". For instance, every portal that you see with r_showPortals is rectangular, and its winding (which is a 2D thing) is a 4-gon.
Every area can be seen by the player through a sequence of portals (let's call it a "portal path"). For the area where the player is located, there are zero portals in this sequence.
Given a portal path, you can intersect the windings of its portals, and you'll get a polygonal winding through which you see the area at the end of the path.
For instance, in this picture the outside area is visible through two portals, which together yield a 5-gonal winding (marked in orange):
Now the main rules are:
- If the windings of the portals on a portal path have an empty intersection, then the portal path is culled out.
- If one of the portals is sealed (e.g. by a closed door), then the portal path is culled out.
If all portal paths leading to an area are culled out, then the area is not rendered and you don't pay for it.
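The two rules above can be sketched in a few lines of Python. This is only an illustration of the final effect, not how the engine implements it: the function names, the dict layout with `"winding"` and `"sealed"` keys, and the use of the full screen rectangle as the starting winding are all my assumptions. Windings are convex polygons given as counter-clockwise lists of `(x, y)` screen points, and the intersection is done with standard Sutherland-Hodgman clipping.

```python
def clip_winding(subject, clip):
    """Intersect two convex CCW polygons (Sutherland-Hodgman clipping)."""
    def inside(p, a, b):
        # True if p lies to the left of (or on) the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def cross_point(p, q, a, b):
        # Intersection of line p-q with line a-b (the caller guarantees they cross).
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p[0] + t * dx1, p[1] + t * dy1)

    out = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        src, out = out, []
        for j in range(len(src)):
            p, q = src[j - 1], src[j]
            if inside(q, a, b):
                if not inside(p, a, b):
                    out.append(cross_point(p, q, a, b))
                out.append(q)
            elif inside(p, a, b):
                out.append(cross_point(p, q, a, b))
        if not out:
            break  # nothing left to clip
    return out

def winding_area(w):
    """Polygon area by the shoelace formula."""
    return 0.5 * abs(sum(w[i - 1][0] * w[i][1] - w[i][0] * w[i - 1][1]
                         for i in range(len(w))))

def portal_path_winding(path, screen):
    """Return the winding seen through a portal path, or None if it is culled."""
    winding = screen  # zero portals: the player's own area, whole screen visible
    for portal in path:
        if portal["sealed"]:
            return None  # rule 2: a sealed portal culls the whole path
        winding = clip_winding(winding, portal["winding"])
        if len(winding) < 3 or winding_area(winding) < 1e-9:
            return None  # rule 1: empty intersection
    return winding

def area_is_rendered(paths, screen):
    """An area is rendered if at least one portal path into it survives."""
    return any(portal_path_winding(p, screen) is not None for p in paths)
```

For example, with `screen = [(0, 0), (10, 0), (10, 10), (0, 10)]`, a path through two open portals whose windings are the squares from (1, 1) to (6, 6) and from (4, 4) to (9, 9) leaves the square from (4, 4) to (6, 6), while sealing either portal, or moving them apart so that they no longer overlap, culls the path.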
If there is a single remaining portal path into an area, then its winding (recall that it is the intersection of the path's portals) is used to cull out the entities in the area: if an entity's bounding box is not visible through the winding, then the entity is not rendered (so you do not pay for it). If there are several portal paths leading into an area, an entity is drawn only if its bbox falls into at least one winding (I think so). Unfortunately, I cannot say how the winding of the portal path affects the world geometry, but I can imagine that surfaces (sorry, I don't know if "surface" is a known concept/term in DR) which are surely not visible through the winding are culled out too.
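Here is a rough Python sketch of that entity test, under a simplifying assumption of mine: the entity's bounding box has already been projected to an axis-aligned screen rectangle `(xmin, ymin, xmax, ymax)`. The function names are hypothetical, and the real engine most likely works with 3D planes rather than 2D clipping. The idea is just that an entity survives if its rectangle overlaps at least one of the windings leading into its area.

```python
def clip_to_rect(winding, rect):
    """Clip a convex polygon (list of (x, y)) to an axis-aligned rectangle."""
    xmin, ymin, xmax, ymax = rect
    # Each entry: (coordinate index, bound, +1 to keep >= bound / -1 to keep <= bound)
    planes = [(0, xmin, 1), (0, xmax, -1), (1, ymin, 1), (1, ymax, -1)]
    out = list(winding)
    for axis, bound, sign in planes:
        src, out = out, []
        for j in range(len(src)):
            p, q = src[j - 1], src[j]
            dp = sign * (p[axis] - bound)  # >= 0 means p is on the kept side
            dq = sign * (q[axis] - bound)
            if (dp >= 0) != (dq >= 0):     # edge p-q crosses the boundary
                t = dp / (dp - dq)
                out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
            if dq >= 0:
                out.append(q)
    return out

def polygon_area(w):
    """Shoelace formula; returns 0.0 for an empty or degenerate polygon."""
    return 0.5 * abs(sum(w[i - 1][0] * w[i][1] - w[i][0] * w[i - 1][1]
                         for i in range(len(w))))

def entity_visible(bbox_rect, windings):
    """Draw the entity only if its screen rect overlaps at least one winding."""
    return any(polygon_area(clip_to_rect(w, bbox_rect)) > 1e-9 for w in windings)

# A 5-gonal winding and a few entity rectangles.
winding = [(2, 2), (6, 2), (7, 4), (5, 6), (2, 5)]
print(entity_visible((3, 3, 4, 4), [winding]))      # True: well inside
print(entity_visible((8, 0, 9, 1), [winding]))      # False: far outside
print(entity_visible((6.5, 5, 7.5, 6), [winding]))  # False: just past the slanted edge
```

The third rectangle overlaps the winding's bounding rectangle but not the winding itself, so testing against the actual polygon culls something that a plain rectangle test would let through.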
This is not over yet. There is also an OpenGL thing called the "scissor test", which lets the GPU skip rendering of pixels outside of a specified (axis-aligned) rectangle on the screen. Doom 3 uses it heavily: when rendering an area, it sets the scissor rectangle to the minimal rectangle bounding the windings of all the portal paths leading to the area (usually one portal path leads to an area, but not always).
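Computing such a rectangle is simple. A minimal Python sketch, with made-up function names and assuming the windings are lists of `(x, y)` points in pixel coordinates: take the bounding rectangle of all the windings, rounded outward to whole pixels. OpenGL's `glScissor` takes `(x, y, width, height)`, so the second helper converts to that form.

```python
import math

def scissor_rect(windings):
    """Minimal axis-aligned rect (xmin, ymin, xmax, ymax) around all windings.

    Returns None when there are no windings, i.e. no surviving portal path.
    """
    xs = [x for w in windings for x, _ in w]
    ys = [y for w in windings for _, y in w]
    if not xs:
        return None
    # Round outward so that no partially covered pixel is cut off.
    return (math.floor(min(xs)), math.floor(min(ys)),
            math.ceil(max(xs)), math.ceil(max(ys)))

def to_gl_scissor_args(rect):
    """Convert (xmin, ymin, xmax, ymax) to glScissor's (x, y, width, height)."""
    xmin, ymin, xmax, ymax = rect
    return (xmin, ymin, xmax - xmin, ymax - ymin)

# Two portal paths lead into the same area.
windings = [[(1.5, 2), (4, 2), (4, 5)], [(3, 1), (6.2, 1), (6, 3)]]
rect = scissor_rect(windings)
print(rect)                      # (1, 1, 7, 5)
print(to_gl_scissor_args(rect))  # (1, 1, 6, 4)
```

Since the scissor is a rectangle and the windings are arbitrary convex polygons, it is a coarser filter than the winding itself: it saves pixel work, but it does not remove draw calls.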
As you see, even portals which the player can see through (drawn in green when you enable r_showPortals) can help a lot with culling. Note that the description above explains the final effect of portal culling, but does not exactly describe how the implementation works internally. Also, I think that the overhead introduced by portals is pretty low: you should not be afraid that placing 2x more portals would waste CPU time (I suppose so).
Note that there are also shadows, which are culled by different code. For each shadow-casting light, a similar procedure is started from the light itself. Here it does not matter whether a portal is sealed or not. The algorithm for culling by portal path seems to be completely different in this case, but it also takes into account the fact that even visible portals limit the range through which the light travels.
Having said all that, I'd like to ask a question: has anyone seen a case where adding a portal (or several portals) seriously reduced performance?