Jump to content
The Dark Mod Forums

Some more profiling


Recommended Posts

I discovered a nice profiler called oprofile while investigating performance issues at work, and decided to do some more profiling of DR using this tool. It is much better than gprof because (1) it does not require the application to be recompiled, which is easier to use and also allows you to switch on and off profiling whenever you want during a session, and (2) it profiles everything including shared libraries, libc and kernel.

 

Right off the bat I found this pretty big whoops:

 

void Doom3Group::renderWireframe(Renderer& renderer, const VolumeTest& volume, 
const Matrix4& localToWorld, bool selected) const 
{
renderSolid(renderer, volume, localToWorld, selected);
#ifdef REALLY_SLOW_RENDER
if (GlobalRegistry().get("user/ui/xyview/showEntityNames") == "1") {
	renderer.addRenderable(m_renderName, localToWorld);
}
#endif
}

 

The #ifdef'd out portion is doing an XML registry findPath() every frame for every Doom3Group -- removing this code increased the perceived pan speed of the entirety of askave.map from about 2 fps to about 4 fps (200%).

 

I will post any more interesting findings here. oprofile is open source and available in major Linux distributions (Ubuntu and CentOS at least) if anybody want to try it out.

Link to comment
Share on other sites

Yes, cool find, we need to replace that for sure. :)

 

On a related note, I recently replaced all GlobalFilterSystem() queries from the Face/Patch render methods, which were rather slow. This increased the render time in the bonehoard with cubic clipping off from ~240 msec to 120 msec. :)

Link to comment
Share on other sites

The std::string copy constructor was showing up a lot in one of my profiles. Notice how in the second case the string is being returned unchanged, but the whole thing is copied at the start nonetheless:

 

std::string RegistryTree::prepareKey(const std::string& key) {
std::string returnValue = key;

if (returnValue.length() == 0) {
	// no string passed, return to sender
   return returnValue;
}
else if (returnValue[0]=='/') {
	// this is a path relative to root, don't alter it
	return returnValue;
}
else {
	// add the prefix <darkradiant> and return
	returnValue = std::string("/") + _topLevelNode + std::string("/") + key;		return returnValue;
}
}

 

I have fixed this by not copying the string except in the final return statement where a new string is constructed.

 

Regarding the XMLRegistry::get() called during render, there are several things that call it:

 

  4		 0.0275  entity.so				entity::SpeakerNode::renderWireframe(Renderer&, VolumeTest const&) const
 6		 0.0412  darkradiant			  RadiantModule::getLocalPixbuf(std::string const&)
 14		0.0961  entity.so				entity::EclassModelNode::renderWireframe(Renderer&, VolumeTest const&) const
 22		0.1510  entity.so				entity::GenericEntityNode::renderWireframe(Renderer&, VolumeTest const&) const
 26		0.1785  entity.so				entity::LightNode::renderWireframe(Renderer&, VolumeTest const&) const
 470	   3.2269  entity.so				entity::Speaker::renderWireframe(Renderer&, VolumeTest const&, Matrix4 const&) const
 2527	 17.3498  entity.so				entity::EclassModel::renderWireframe(Renderer&, VolumeTest const&, Matrix4 const&, bool) const
 2968	 20.3776  entity.so				entity::GenericEntity::renderWireframe(Renderer&, VolumeTest const&, Matrix4 const&) const
 3282	 22.5335  entity.so				entity::GenericEntity::renderArrow(Renderer&, VolumeTest const&, Matrix4 const&) const
 5246	 36.0179  entity.so				entity::Light::renderWireframe(Renderer&, VolumeTest const&, Matrix4 const&, bool) const
485	   0.0535  xmlregistry.so		   XMLRegistry::get(std::string const&)
 12129	52.7302  xmlregistry.so		   XMLRegistry::findXPath(std::string const&)
 4851	 21.0895  xmlregistry.so		   gtkutil::IConv::localeFromUTF8(std::string const&)
 2277	  9.8991  libstdc++.so.6.0.7	   (no symbols)

 

The unindented line is the profiled function, those above are the callers and below are the callees. I guess they should all be refactored into an internal boolean which is updated by a Registry key observer when the value is changed, but I'm not feeling in a refactoring mood right now -- shall I just #ifdef them out with an easy-to-find name so we can fix them up later?

Link to comment
Share on other sites

Regarding the XMLRegistry::get() called during render, there are several things that call it:

 

The unindented line is the profiled function, those above are the callers and below are the callees. I guess they should all be refactored into an internal boolean which is updated by a Registry key observer when the value is changed, but I'm not feeling in a refactoring mood right now -- shall I just #ifdef them out with an easy-to-find name so we can fix them up later?

You can report it on the tracker and I'll get it sooner or later - for your local profile builds it probably makes sense to just #IFDEF it out for the moment being so that you can find the other performance drains without being distracted.

Link to comment
Share on other sites

I have added it to the tracker, but assigned it to myself since I am happy to look into this and don't want to increase your workload unnecessarily.

 

Once of the limitations with the RegistryKeyObserver we discussed a long time ago was the fact that the observer doesn't necessarily know which key was changed.

 

class RegistryKeyObserver {
public:
virtual void keyChanged() = 0;
};

 

It would be more flexible to change it to this:

 

class RegistryKeyObserver {
public:
virtual void keyChanged(const std::string& changedKey, const std::string& newValue) = 0;
};

 

Do you see any problems with this approach (besides the obvious need to crawl through the code and update the method)?

Link to comment
Share on other sites

Well I have done the aforementioned refactoring and the speed improvement is noticeable. There is still a GlobalRegistry call in GenericEntity::renderArrow() but this represents a relatively small proportion of the time.

 

The new profile shows much more sensible results:

 

  948	   1.2124  md5model.so			  md5::MD5Surface::render(unsigned int) const
 998	   1.2763  entity.so				entity::light_draw(AABB const&, unsigned int)
 1005	  1.2853  darkradiant			  OpenGLModule::assertNoErrors()
 1545	  1.9759  entity.so				arrow_draw(BasicVector3<double> const&, BasicVector3<double> const&)
 1581	  2.0219  darkradiant			  XYWnd::drawGrid()
 1890	  2.4171  darkradiant			  OpenGLModule::drawString(std::string const&) const
 9485	 12.1302  darkradiant			  aabb_draw_wire(BasicVector3<double> const*)
 10703	13.6879  model.so				 model::RenderablePicoSurface::render(unsigned int) const
 10880	13.9143  darkradiant			  OpenGLStateBucket::flushRenderables(OpenGLState&, unsigned int, BasicVector3<double> const&)
 12879	16.4708  darkradiant			  RenderableWireframe::render(unsigned int) const
 22467	28.7328  darkradiant			  RenderablePatchWireframe::render(unsigned int) const
142294   14.2704  fglrx_dri.so			 (no symbols)
 142294   100.000  fglrx_dri.so			 (no symbols) [self]

 

Unsurprisingly it is patch rendering which is highest on the list. The method RenderablePatchWireframe::render() is making use of glDrawArrays() which is not particularly fast compared to Display Lists or VBOs, so I suspect that there is considerable scope for improvement here.

Link to comment
Share on other sites

  • 9 months later...

I'm having some difficulties getting oprofile to run. Do I have to compile a debug build for this?

 

I'm starting this shell script to gain profile information:

 

sudo opcontrol --start

sudo opcontrol --reset

./darkradiant

sudo opcontrol --dump

sudo opcontrol --stop

 

But this doesn't produce anything. When typing opreport -l I don't see any darkradiant in there (tried grepping that output).

 

Any hints what I'm missing here?

Link to comment
Share on other sites

It's a long time since I've used oprofile so I can't remember off the top of my head what is necessary to get it working, however

 

1. Yes you will need a debug build if you want useful symbol information, HOWEVER make sure you compile with at least -O1 as well so that you are profiling with useful optimisations in place, rather than wasting time on things which the compiler will optimise anyway.

2. I don't ever remember having to run opcontrol --dump; I just started DarkRadiant, got to where I wanted to start profiling, then in a separate terminal ran opcontrol --start, do the action then opcontrol --stop.

3. You don't get any output from oprofile automatically, it places the information in a database which you then access through other tools, I think opreport is the correct one here. There is also a config tool called something like oprof_config which sets up what oprofile will monitor, there are quite a few options in here and it may not be set up correctly to start with.

Link to comment
Share on other sites

Hm ok. In the meantime I went to VC++'s vsperfcmd command line profiler which is also quite useful (and rather easy to use). After two hours of messing around with oprofile I was losing interest :) - vsperfcmd took me only 10 minutes or so to figure it out. I'll try again sometime.

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

  • Recent Status Updates

    • OrbWeaver

      Does anyone actually use the Normalise button in the Surface inspector? Even after looking at the code I'm not quite sure what it's for.
      · 5 replies
    • Ansome

      Turns out my 15th anniversary mission idea has already been done once or twice before! I've been beaten to the punch once again, but I suppose that's to be expected when there's over 170 FMs out there, eh? I'm not complaining though, I love learning new tricks and taking inspiration from past FMs. Best of luck on your own fan missions!
      · 4 replies
    • The Black Arrow

      I wanna play Doom 3, but fhDoom has much better features than dhewm3, yet fhDoom is old, outdated and probably not supported. Damn!
      Makes me think that TDM engine for Doom 3 itself would actually be perfect.
      · 6 replies
    • Petike the Taffer

      Maybe a bit of advice ? In the FM series I'm preparing, the two main characters have the given names Toby and Agnes (it's the protagonist and deuteragonist, respectively), I've been toying with the idea of giving them family names as well, since many of the FM series have named protagonists who have surnames. Toby's from a family who were usually farriers, though he eventually wound up working as a cobbler (this serves as a daylight "front" for his night time thieving). Would it make sense if the man's popularly accepted family name was Farrier ? It's an existing, though less common English surname, and it directly refers to the profession practiced by his relatives. Your suggestions ?
      · 9 replies
    • nbohr1more

      Looks like the "Reverse April Fools" releases were too well hidden. Darkfate still hasn't acknowledge all the new releases. Did you play any of the new April Fools missions?
      · 5 replies
×
×
  • Create New...