The Dark Mod Forums


1 hour ago, stgatilov said:

UPDATE: Given that the backend spends most of its time generating machine code for the template instantiations, I bet the real slowdown could even be 2.5-3 times. If someone remembers when Eigen was integrated, this can be checked directly.

I think 356dfb4e05f1ff6ea7f570376e6a2b4692ad581a was the commit that didn't have Eigen yet. I merged it in the commit right after that.

Link to post
Share on other sites

I think some more context is needed here.

Is the build noticeably slow (in terms of wall clock time), and impacting development, or is this purely a theoretical concern based on profiling data?

Assuming that the data correctly identifies Eigen template compilation as being slow, does this affect all kinds of change, or does the slowdown only happen when you change something fundamental like the Matrix4 header?

I'd be perfectly happy to use pre-compiled headers to avoid compilation cost (especially with the maths classes which don't change very much), but I have no experience with this technique (I think it's largely a Windows thing).

Link to post
Share on other sites
20 minutes ago, OrbWeaver said:

Is the build noticeably slow (in terms of wall clock time), and impacting development, or is this purely a theoretical concern based on profiling data?

Given that I do not work with DR on a daily basis, for me all DR-related concerns are purely theoretical.
Compared to some bad projects at my day job (hint: better not to combine templates with automatic code generation), it is fast anyway.
Do not consider this post a complaint; better to treat it as an interesting piece of information. It is up to you what to do with it.

I will measure the wall time for the revisions just before and just after Eigen was integrated.

Quote

Assuming that the data correctly identifies Eigen template compilation as being slow, does this affect all kinds of change, or does the slowdown only happen when you change something fundamental like the Matrix4 header?

I measured the time of a full clean build.

Of course, more typical incremental builds should be much faster, but:

  1. Since vector math is almost everywhere, it would be correct to expect incremental compilation to become slower by about the same ratio.
  2. Linking time becomes a bigger problem for incremental builds. Many template instantiations make it slower too. One indirect way to estimate it is to look at the total size of the .obj files.
Quote

I'd be perfectly happy to use pre-compiled headers to avoid compilation cost (especially with the maths classes which don't change very much), but I have no experience with this technique (I think it's largely a Windows thing).

Precompiled headers will win back at most 10% of the time (at most 3% if only considering Eigen).
The real problem is template instantiations everywhere.

The only way to fix it is:

  1. Bring back simple vector/matrix classes with trivial implementations of simple arithmetic operations directly in the header (they should be inlineable).
  2. Include Eigen headers in only one cpp file and use it to implement whatever complicated operations you need. Expose such operations to headers as non-inlineable methods.

In other words, either don't use Eigen, or put a compilation firewall between it and the rest of the codebase.
Without a compilation firewall, you won't get rid of the slowdown.
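
A minimal single-file sketch of the compilation firewall described above. The names Vector3, Matrix3 and determinant are hypothetical, not DarkRadiant's actual classes, and the two sections stand in for a header and a single .cpp file. The heavy operation is hand-written here so the sketch compiles without Eigen; in a real split, that .cpp would be the only place that includes the Eigen headers.

```cpp
// ---- Vector3.h (hypothetical): cheap to include, no heavy library ----
#include <array>

struct Vector3 {
    double x, y, z;
};

// Trivial arithmetic stays inline in the header so callers can inline it.
inline Vector3 operator+(const Vector3& a, const Vector3& b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z};
}

inline double dot(const Vector3& a, const Vector3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Row-major 3x3 matrix (illustrative helper type).
using Matrix3 = std::array<double, 9>;

// "Complicated" operation: only declared here, so includers never see
// how it is implemented or which library it uses.
double determinant(const Matrix3& m);

// ---- MatrixOps.cpp (hypothetical): the single heavy translation unit ----
// Assumption: in the real split, `#include <Eigen/Dense>` would go here
// and nowhere else.
#include <cassert>

namespace {
// Bounds-checked element access, private to this file.
double at(const Matrix3& m, int row, int col) {
    assert(row >= 0 && row < 3 && col >= 0 && col < 3);
    return m[row * 3 + col];
}
}  // namespace

double determinant(const Matrix3& m) {
    return at(m, 0, 0) * (at(m, 1, 1) * at(m, 2, 2) - at(m, 1, 2) * at(m, 2, 1))
         - at(m, 0, 1) * (at(m, 1, 0) * at(m, 2, 2) - at(m, 1, 2) * at(m, 2, 0))
         + at(m, 0, 2) * (at(m, 1, 0) * at(m, 2, 1) - at(m, 1, 1) * at(m, 2, 0));
}
```

Callers of determinant pay for one real function call, but their translation units never parse or instantiate anything from the heavy library.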

Link to post
Share on other sites

Measured it with timewatch.
Full build took 2:41 before Eigen, and took 3:15 after Eigen.
I guess I should dig deeper into what the numbers mean 🥺

Link to post
Share on other sites
25 minutes ago, stgatilov said:

Given that I do not work with DR on a daily basis, for me all DR-related concerns are purely theoretical.
Compared to some bad projects at my day job (hint: better not to combine templates with automatic code generation), it is fast anyway.
Do not consider this post a complaint; better to treat it as an interesting piece of information. It is up to you what to do with it.

Fair enough. Thanks for going to the trouble of producing the analysis.

25 minutes ago, stgatilov said:

Precompiled headers will win back at most 10% of the time (at most 3% if only considering Eigen).

The real problem is template instantiations everywhere.

Presumably the problem is the complexity of the templates, rather than the mere existence of templates? After all our original Vector classes were already templated on the element type, although we only ever instantiated them with <double>. Eigen's templates are considerably more complicated with their MatrixBase, DenseBase and other helper parent classes.

25 minutes ago, stgatilov said:

The only way to fix it is:

  1. Bring back simple vector/matrix classes with trivial implementations of simple arithmetic operations directly in the header (they should be inlineable).

It's odd that there isn't a way to tell the compiler "instantiate this complex template once, then use it everywhere else as a simple inlined class". I'm pretty sure that once all the helper templates are processed, the Eigen code must reduce to the same sequence of basic multiplications; you'd think there would be a way to get to the end point without having to deal with the semantics of template parsing each and every time (which is what I assume causes slow compilation).

25 minutes ago, stgatilov said:
  1. Include Eigen headers in only one cpp file and use it to implement whatever complicated operations you need. Expose such operations to headers as non-inlineable methods.

I suppose this would trade compilation speed for application speed (since you'd need actual function calls instead of inlined code for even simple operations) so would probably be a pessimisation from the user perspective.

Link to post
Share on other sites
5 minutes ago, stgatilov said:

Measured it with timewatch.
Full build took 2:41 before Eigen, and took 3:15 after Eigen.
I guess I should dig deeper into what the numbers mean 🥺

That is actually impressively fast. I think it might even be faster than my Linux build.

My guess is that confusion probably arises from two things:

  • CPU time versus physical time: 600s could be 100s on 6 processor cores in parallel.
  • Overlapping parallel processes incorrectly interpreted as being summed together: a 600s process feeding data into a 500s process might result in a wall time of 600s rather than 1100s.
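
Both effects are simple arithmetic. The sketch below only restates the two bullets as functions; the function names are made up for illustration, and real builds are never perfectly parallel, so treat the results as idealised bounds.

```cpp
// Ideal wall time when `cpuSeconds` of work is spread evenly over `cores`.
constexpr double idealWallSeconds(double cpuSeconds, int cores) {
    return cpuSeconds / cores;
}

// Two stages that run concurrently (one feeding the other) finish roughly
// when the longer one finishes; their durations do not add up.
constexpr double overlappedWallSeconds(double producerSeconds,
                                       double consumerSeconds) {
    return producerSeconds > consumerSeconds ? producerSeconds
                                             : consumerSeconds;
}

// The numbers from the two bullets above.
static_assert(idealWallSeconds(600.0, 6) == 100.0, "600s CPU on 6 cores");
static_assert(overlappedWallSeconds(600.0, 500.0) == 600.0, "not 1100s");
```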
Link to post
Share on other sites
1 hour ago, OrbWeaver said:

Presumably the problem is the complexity of the templates, rather than the mere existence of templates? After all our original Vector classes were already templated on the element type, although we only ever instantiated them with <double>. Eigen's templates are considerably more complicated with their MatrixBase, DenseBase and other helper parent classes.

Yes, simple templates usually don't cause much trouble.
But Eigen is most likely designed for large matrices, and all the complexity is worth it when you deal with 500 x 500 matrices.

Quote

It's odd that there isn't a way to tell the compiler "instantiate this complex template once, then use it everywhere else as a simple inlined class". I'm pretty sure that once all the helper templates are processed, the Eigen code must reduce to the same sequence of basic multiplications; you'd think there would be a way to get to the end point without having to deal with the semantics of template parsing each and every time (which is what I assume causes slow compilation).

Instantiated templates are not reparsed today, although MSVC did it for many years.

However, every call site where template code is inlined has to be compiled again and again; there is nothing to reuse there.
Also, templates have to be recompiled in every translation unit, because that's what the separate compilation model requires.

It is possible to instantiate template code only once using extern template. However, you lose inlining this way too.
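
A minimal sketch of the extern template approach, shown as one file with the header and the single instantiating .cpp marked by comments. BasicVector3 and its members are hypothetical names, not the actual DarkRadiant or Eigen types.

```cpp
// ---- BasicVector3.h (hypothetical) ----
#include <cassert>

template <typename T>
struct BasicVector3 {
    T x, y, z;

    // Declared in the class, defined outside it, so they are not
    // implicitly inline.
    T dot(const BasicVector3& other) const;
    BasicVector3 dividedBy(T divisor) const;
};

template <typename T>
T BasicVector3<T>::dot(const BasicVector3& other) const {
    return x * other.x + y * other.y + z * other.z;
}

template <typename T>
BasicVector3<T> BasicVector3<T>::dividedBy(T divisor) const {
    assert(divisor != T(0));
    return {x / divisor, y / divisor, z / divisor};
}

// Explicit instantiation declaration: every file that includes this header
// is told not to instantiate the non-inline members for <double> itself.
extern template struct BasicVector3<double>;

// ---- BasicVector3.cpp (hypothetical) ----
// Explicit instantiation definition: the one place where the member
// functions for <double> are actually compiled.
template struct BasicVector3<double>;
```

Other translation units then emit calls to the members and rely on the linker to find the single copy, which is where the inlining goes away.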

Quote

I suppose this would trade compilation speed for application speed (since you'd need actual function calls instead of inlined code for even simple operations) so would probably be a pessimisation from the user perspective.

You don't need inlining for SVD or for decomposing a matrix into translate + scale + rotate. These operations are slow anyway, so the lack of inlining won't be noticeable.
Inlining is very important for trivial things like adding, multiplying, dot products, etc. Just write three additions in header and everything would be OK. Don't use Eigen for that.
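
For illustration, "three additions in a header" can be as small as the sketch below. Vec3 is a made-up name, and constexpr is my own choice here (it implies inline and lets the checks run at compile time); nothing in the thread prescribes it.

```cpp
// Hypothetical header-only vector: no templates, no heavy includes.
struct Vec3 {
    double x, y, z;
};

// Three additions, fully visible to every caller.
constexpr Vec3 operator+(const Vec3& a, const Vec3& b) {
    return Vec3{a.x + b.x, a.y + b.y, a.z + b.z};
}

// Scaling by a scalar: three multiplications.
constexpr Vec3 operator*(const Vec3& v, double s) {
    return Vec3{v.x * s, v.y * s, v.z * s};
}

// Dot product: three multiplications and two additions.
constexpr double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

static_assert(dot(Vec3{1.0, 2.0, 3.0}, Vec3{4.0, 5.0, 6.0}) == 32.0,
              "dot product evaluated at compile time");
```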

Link to post
Share on other sites
1 hour ago, OrbWeaver said:

That is actually impressively fast. I think it might even be faster than my Linux build.

VC++ is not a slow compiler, I think it's actually doing pretty good. When I switched from MinGW to VC++ my life became a lot easier. Build times went down even more (by almost one order of magnitude, iirc) when I added the precompiled headers to the heavier projects, like the DarkRadiant main binary, the S/R and Objectives plugins, and the Scripting plugin. It really pays off, you can literally see how it chews through the compilation units much faster than before.

Precompiled headers are possible in gcc too, and we should see the same difference when we manage to add it to the CMakeLists. The Linux compilations in my VMs are awfully slow compared to what I'm used to in Windows.

Link to post
Share on other sites
1 hour ago, greebo said:

Precompiled headers are possible in gcc too, and we should see the same difference when we manage to add it to the CMakeLists. The Linux compilations in my VMs are awfully slow compared to what I'm used to in Windows.

Perhaps PCH in GCC are simply worse.

A precompiled header in MSVC is implemented as a memory dump of the compiler state, taken at the end of processing the header.
When it is used, this saved state is simply loaded from disk (most likely memory-mapped) and processing continues from that point.

That's a rather barbaric approach, and it does not work for C++ modules (which act like modular PCH), but it should be perfect in terms of performance.

Link to post
Share on other sites

Getting this to work would pay off big time. I recall doing that once for the TDM source code in Linux, and it was a huge improvement there too. But we were using Scons back then which caused the build to always think it was out of date and one had to recompile everything even if just changing a non-header file - understandably annoying, even with PCH.

Link to post
Share on other sites

I have inspected the revision just before Eigen was added the same way I did with master.

As expected, parsing takes 128s instead of 172s, and template instantiation takes 121s instead of 605s.
However, exclusive duration for C1DLL goes down from 785s to 514s and CPU time goes down from 608s to 473s. The expected -500s difference is not here. Moreover, the version without Eigen has 5% difference between CPU time and exclusive duration, while the version with Eigen has 20% difference.

I suspect that some data in the story cannot be added together, e.g. time for template instantiations.
And now there is the question of what took half of the frontend time before Eigen, given that Parsing and Template Instantiations summed together only account for half of it.

Most likely I'm doing something wrong.
I already have an answer, now I need to find the question 😁

Link to post
Share on other sites
5 hours ago, greebo said:

Getting this to work would pay off big time. I recall doing that once for the TDM source code in Linux, and it was a huge improvement there too. But we were using Scons back then which caused the build to always think it was out of date and one had to recompile everything even if just changing a non-header file - understandably annoying, even with PCH.

Well, the good news is that this turned out to be really easy to set up. It's one line of CMake which Just Works, although I wrapped it in a CMAKE_VERSION check to make sure the build won't break for those who don't have CMake >= 3.16.

The not so good news is that this only seems to shave about 5 seconds off the compile time, from 3:44 down to 3:39. Perhaps it delivers more benefit when you're doing an incremental re-compilation after some code change.

Link to post
Share on other sites
On 6/9/2021 at 2:30 PM, greebo said:

I recall doing that once for the TDM source code in Linux, and it was a huge improvement there too. But we were using Scons back then which caused the build to always think it was out of date and one had to recompile everything even if just changing a non-header file - understandably annoying, even with PCH.

This reminded me about the time when I first got involved with TDM (Dec 2016), in the era of TDM 2.04/2.05-beta, building under Linux natively (i.e. no VM).

It should be noted that the incredibly annoying issue with SCons rebuilding everything all the time was easily fixed with essentially a 1-line tweak.

Reading my notes from back then, I see that when building "from scratch", use of PCH nicely lowered the build time (from 16m39s without PCH to 9m01s with PCH), but only for the 1st build.  Subsequent builds' speed improvement with PCH would obviously depend somewhat on what changed in the code, but the improvement was never anywhere near as dramatic, often showing no measurable improvement.

Fixing that SCons issue saved me about 9 minutes every single time I'd build TDM.  Using PCH saved me a little less than 8 minutes, but only on the very 1st build and never much again.

Link to post
Share on other sites

Some news about Eigen and build time.

I realized rather quickly why my original analysis was wrong.
The "Duration" shown for template instantiations in WPA is inclusive, so summing them up is a bad idea. It is especially bad when templates are deeply nested, which is the case for Eigen.
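
A toy illustration of why summing inclusive durations overcounts. The numbers are invented and the chain is a simplification: each entry is assumed to run entirely inside the previous one, the way a nested template instantiation runs inside its parent.

```cpp
#include <cassert>
#include <vector>

// Inclusive durations of a chain of nested instantiations, outermost first:
// element i+1 runs entirely inside element i.

// What naive summing does: every nested second is counted once per
// enclosing level.
double naiveSum(const std::vector<double>& inclusive) {
    double total = 0.0;
    for (double seconds : inclusive) {
        total += seconds;
    }
    return total;
}

// Time that actually elapsed: the outermost entry already covers the rest.
double actualTime(const std::vector<double>& inclusive) {
    assert(!inclusive.empty());
    return inclusive.front();
}
```

For a chain of 10s, 6s and 4s, naiveSum reports 20s although only 10s elapsed, and the gap grows with nesting depth.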

It took me quite some time to find a way of computing total impact of Eigen.
I had to implement a custom analyzer in my fork.
Of course, this is not very reliable, because who knows what I did wrong 😌


Anyway, here is what it reports for the latest rev:

Microsoft (R) Visual C++ (R) Performance Analyzer DEVELOPER VERSION
Total time for parsing files matching "*Eigen*":
  CPU Time:      79.565226 /  96.567721 / 110.676782
  Duration:     448.373768 / 1112.570639 / 1168.585713
Total time for template instantiations matching "*Eigen*":
  CPU Time:     169.516602 / 169.516602 / 170.677942
  Duration:     122.637069 / 188.706707 / 190.002096

This time I did not limit parallelization, so Duration should be ignored.
Every line shows 3 numbers. The first two show the total exclusive time spent on Eigen headers/templates. They are computed in slightly different ways, and for some stupid reason produce different results 😥 The last number shows the total inclusive time spent on the topmost Eigen headers/templates.
If Eigen template instantiation internally causes instantiation of non-Eigen template, then the last number includes the time for that child template, while the first two numbers don't. That's the difference.
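
The difference between the exclusive and the topmost-inclusive numbers can be sketched on a small tree. This is only my reading of the definitions above, not the code of the actual analyzer; Node, isEigen and both function names are hypothetical, and the substring match stands in for the "*Eigen*" wildcard.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Node {
    std::string name;
    double inclusive;            // seconds, including all children
    std::vector<Node> children;  // nested headers or instantiations
};

// Stand-in for the "*Eigen*" wildcard.
bool isEigen(const Node& n) {
    return n.name.find("Eigen") != std::string::npos;
}

// Exclusive time of matching nodes: children are always subtracted, so
// time spent in a non-Eigen child is left out.
double exclusiveMatching(const Node& n) {
    double total = 0.0;
    if (isEigen(n)) {
        total = n.inclusive;
        for (const Node& child : n.children) {
            total -= child.inclusive;
        }
        assert(total >= 0.0);  // children cannot outlast their parent
    }
    for (const Node& child : n.children) {
        total += exclusiveMatching(child);
    }
    return total;
}

// Inclusive time of the topmost matching nodes: stop descending at the
// first match, so everything below it is counted, Eigen or not.
double topmostInclusiveMatching(const Node& n) {
    if (isEigen(n)) {
        return n.inclusive;
    }
    double total = 0.0;
    for (const Node& child : n.children) {
        total += topmostInclusiveMatching(child);
    }
    return total;
}
```

With a 12s Eigen node that contains a 5s Eigen child and a 3s non-Eigen child, the exclusive total is 9s and the topmost-inclusive total is 12s; the 3s gap is exactly the non-Eigen child.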

Here we see that 110s is spent on parsing Eigen headers, and 170s is spent on instantiating its templates. That's 280s, or roughly 300s in total.


Here is comparison of overall stats between the latest revision and the pre-Eigen revision.

  • Full wall time: 164s -> 184s
  • Total CPU time: 840s -> 1270s
  • CPU time of C1DLL (frontend): 650s -> 1052s

So, the frontend now takes about 400s more CPU time, which increases the total CPU time by 50%.
The wall time increases only by 12%.
Perhaps that's because the build is not perfectly parallelized, and the old version has more idle time. Basically, the increased CPU load has filled some of the idle time.
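
The percentages follow directly from the three bullet points. A small sketch of the arithmetic, with percentIncrease as a made-up helper name:

```cpp
// Relative increase from `before` to `after`, in percent.
constexpr double percentIncrease(double before, double after) {
    return (after - before) / before * 100.0;
}

// Full wall time: 164s -> 184s, about 12%.
static_assert(percentIncrease(164.0, 184.0) > 12.0 &&
              percentIncrease(164.0, 184.0) < 12.5, "wall time");

// Total CPU time: 840s -> 1270s, about 51%.
static_assert(percentIncrease(840.0, 1270.0) > 51.0 &&
              percentIncrease(840.0, 1270.0) < 51.5, "total CPU time");

// Frontend (C1DLL) CPU time: 650s -> 1052s, i.e. 402s more.
static_assert(1052.0 - 650.0 == 402.0, "frontend delta");
```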

Here is what the CPU usage plot looks like:
[Image: DrBuildCpuLoad.png]

Maybe the wall time will catch up with CPU time in future, maybe not...

Also, since my tool reports 300s spent on Eigen in the new version (instead of 400s), it would be more correct to say that adding Eigen increased CPU time by 33% (instead of 50%).

Link to post
Share on other sites
