The Dark Mod Forums

SIMD anyone?


MoroseTroll

I'm working on my own implementation of the idSIMD library in Doom 3. The work itself is hard enough because it is 100% asm, and it is not finished yet (11K lines in total). My main goal is to accelerate the library on AMD CPUs, because on Intel CPUs it already works fast enough, I believe.

Anyway, in the attached PDF you can find statistics for almost all of the functions I've rewritten. Some of them were not accelerated at all (i.e. a speedup factor of ~1.0), and some were accelerated by a factor of 5 (but these are rare).

So, what do you say, guys? Is any of you already working on the idSIMD library? Do you need my work?

MyIdLib.pdf


Nice. :)

 

Have you been able to measure any significant FPS increase in the game so far? Because you're aware that the idlib library is linked statically, so the closed source engine (Doom3.exe) won't take notice of the changes you're linking into the gamex86.dll. Is the game code making strong use of SIMD processing? I don't recall many SIMD calls in the game DLL, except for some parts of the animation code, so I'd be curious whether the performance increase is measurable as long as the main part of the engine is closed.


AFAIK, the idSIMD library is tightly connected with the render and physics components of the engine, so I believe that my optimizations will improve overall performance. Jan Paul van Waveren, the author of the original idSIMD library, has stated somewhere (alas, I don't remember where exactly) that the BlendJoints function is very important for geometry transformations in the engine. I hope all the other functions are important as well :).

As for the calls to the idSIMD library: it contains highly optimized vector and matrix routines, a fast memcpy analog and many other useful things. Moreover, if you feel that some operation in the game code is too CPU-hungry, I can optimize it with SIMD as well. Of course, this kind of optimization is possible only if the piece of code performs relatively simple, repeated actions, as all the functions in the idSIMD library do.

As for tests: no, I haven't run any yet. But I plan to do so this summer, on my vacation, and to complete the work.


Well, I sure do hope that the improvement is measurable, otherwise it'd be a shame and you might be wasting a lot of time of yours, and wasting it real hard. But I guess you know that. :)


Yes, I know. But if we look at what the Raven programmers did with id Tech 4 in Quake 4 in general, and with the idSIMD library in particular, then there is a good chance, I hope, that my work will bring some good results. So, time will tell.

 

P.S. I must admit that I'm doing this work in order to get some grant from AMD :). But quiet - nobody should know my insidious plan :ph34r::laugh:


Bikerdude: Well, yes. The plan is to complete the library, polish it for specific CPU models (my version of the library has several code paths, one for every AMD CPU generation), and publish a custom gamex86.dll for Doom 3, Quake 4, Prey, and ET:QW. Of course, any noncommercial games based on the mentioned titles are welcome to use my work free of charge.


You see, Jan Paul van Waveren, the id programmer and the author of the original idSIMD library, optimized it for the Intel Pentium 4. My version uses AMD-specific instructions (E3DNow! & PREFETCH/W) and an AMD-specific mode (the so-called "Misaligned SSE"), so I'm not sure that my code will run fine on Intel CPUs. But don't fret about this, because all Intel CPUs after the Pentium 4/D (Core, Core 2, Core i3/i5/i7, Pentium E/G, Celeron M/E) have a very efficient instruction decoder that lets them run any code nice and fast. Eventually, if it turns out that my code is fast enough, I could port it to Intel CPUs as well by removing the AMD-specific stuff.


That's very nice to hear. My own work on the SIMD library (the code is in TDM) so far has only been a bit of code for Linux to detect the actual CPU and choose the right SIMD provider - but AFAIK, no SIMD code for Linux exists.

 

So I hope that your code paths will also be available for Linux - otherwise a lot of work would be wasted and not benefit Linux at all.
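The "detect the CPU, then choose a provider" step mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SimdTier`, `CpuFeatures`, `DetectCpuFeatures` and `ChooseTier` are hypothetical names, not the TDM or Doom 3 code, and the detection relies on the GCC/Clang x86 builtin `__builtin_cpu_supports`; on any other compiler or architecture the sketch simply reports no SIMD support.

```cpp
#include <cassert>

// Hypothetical provider tiers for illustration; not the engine's real class names.
enum class SimdTier { Generic, SSE, SSE2, SSE3 };

// Capability flags gathered from the CPU (or filled in by hand for testing).
struct CpuFeatures {
    bool sse  = false;
    bool sse2 = false;
    bool sse3 = false;
};

// Query the running CPU. Uses the GCC/Clang x86 builtins when available;
// elsewhere it returns all-false, which selects the generic code path.
inline CpuFeatures DetectCpuFeatures() {
    CpuFeatures f;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.sse  = __builtin_cpu_supports("sse")  != 0;
    f.sse2 = __builtin_cpu_supports("sse2") != 0;
    f.sse3 = __builtin_cpu_supports("sse3") != 0;
#endif
    return f;
}

// Pick the most capable tier whose prerequisites are all present.
inline SimdTier ChooseTier(const CpuFeatures& f) {
    if (f.sse && f.sse2 && f.sse3) return SimdTier::SSE3;
    if (f.sse && f.sse2)           return SimdTier::SSE2;
    if (f.sse)                     return SimdTier::SSE;
    return SimdTier::Generic;
}

// Human-readable name, e.g. for a startup log line.
inline const char* TierName(SimdTier t) {
    switch (t) {
        case SimdTier::Generic: return "generic";
        case SimdTier::SSE:     return "sse";
        case SimdTier::SSE2:    return "sse2";
        case SimdTier::SSE3:    return "sse3";
    }
    assert(false && "unknown tier");
    return "unknown";
}
```

Keeping the selection logic (`ChooseTier`) separate from the hardware query makes it possible to exercise every code path on one machine by filling in `CpuFeatures` by hand.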

 

In any event, the current "drawvert.h" code uses a memory layout that is not very SIMD friendly if you want to copy and modify a lot of verts. This is used, for instance, in the SEED code. Getting a SIMD function that can take a lot of drawverts, apply a scale/rotation/translation matrix and churn out a modified version would be very cool (and help until we can finally put such code on the GPU). (The same code would also benefit from computing the min/max verts, but maybe it would be even better if we had a function that did both steps in one go.)

 

So maybe you want to look into this?
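To make the "both steps in one go" idea concrete, here is a minimal scalar sketch that transforms positions and accumulates the min/max bounds in the same loop. The types `Point3`, `Affine3` and `Bounds3` are hypothetical stand-ins, not the engine's own vertex, matrix or bounds classes, and the code is plain C++ rather than SIMD; it only shows the data flow a SIMD version would follow.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

struct Point3  { float x, y, z; };
struct Bounds3 { Point3 mins, maxs; };

// Row-major 3x3 linear part (scale * rotation) plus a translation.
struct Affine3 { float m[3][3]; Point3 t; };

// Transforms `count` points from src into dst and returns the bounds of the
// transformed points, so no second pass over dst is needed.
// For count == 0 the result is "inverted" (mins = +inf, maxs = -inf).
inline Bounds3 TransformWithBounds(Point3* dst, const Point3* src,
                                   std::size_t count, const Affine3& a) {
    assert(count == 0 || (dst != nullptr && src != nullptr));
    const float inf = std::numeric_limits<float>::infinity();
    Bounds3 b = { { inf, inf, inf }, { -inf, -inf, -inf } };
    for (std::size_t i = 0; i < count; ++i) {
        const Point3 p = src[i];  // copy first, so dst may alias src
        const Point3 q = {
            a.m[0][0] * p.x + a.m[0][1] * p.y + a.m[0][2] * p.z + a.t.x,
            a.m[1][0] * p.x + a.m[1][1] * p.y + a.m[1][2] * p.z + a.t.y,
            a.m[2][0] * p.x + a.m[2][1] * p.y + a.m[2][2] * p.z + a.t.z
        };
        dst[i] = q;
        // Fold the bounds update into the same iteration.
        if (q.x < b.mins.x) b.mins.x = q.x;
        if (q.y < b.mins.y) b.mins.y = q.y;
        if (q.z < b.mins.z) b.mins.z = q.z;
        if (q.x > b.maxs.x) b.maxs.x = q.x;
        if (q.y > b.maxs.y) b.maxs.y = q.y;
        if (q.z > b.maxs.z) b.maxs.z = q.z;
    }
    return b;
}
```

Because the bounds come from the transformed points, they stay correct when the matrix scales or rotates the model, which a plain copy of the source bounds would not.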

 

It would be nice to see someone working on http://bugs.angua.at/view.php?id=2427

"The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man." -- George Bernard Shaw (1856 - 1950)

 

"Remember: If the game lets you do it, it's not cheating." -- Xarax


The only way I can see to implement SIMD in the Linux build of TDM is to use FASM or another multiplatform assembler, i.e. to ship all the idSIMD library functions as .asm files and assemble them separately.

As for drawvert.h - I will see what I can do :).


Interesting. I never knew such low-level coding was necessary in computer games, but after reading this thread, it makes sense of course.

 

I once worked with asm when implementing a transmission system in a C dialect. But I only looked at the generated asm code to see where I could optimize things for more speed, which is of course not nearly as hard as writing it from scratch. So respect goes out to you... :) I will follow this discussion, just out of interest.


The only way I can see to implement SIMD in the Linux build of TDM is to use FASM or another multiplatform assembler, i.e. to ship all the idSIMD library functions as .asm files and assemble them separately.

 

I am not sure, because I have coded some things in assembler and just have them inline in GCC, so I think the SIMD code must be possible, too. But I could be wrong; it has been a while since I looked into that.

 

As for drawvert.h - I will see what I can do :).

 

Thanx! The relevant code using it is in DarkMod/ModelGenerator.cpp.

AFAIK, the earlier versions of GCC have a different inline assembler syntax than MSVC, but I'm not sure about the newer versions.
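One way to sidestep the inline-assembler syntax question altogether is to use the SSE compiler intrinsics from `<xmmintrin.h>`, which both GCC and MSVC accept. Note that this is a different technique from the hand-written asm discussed in this thread, and `MulAddSketch` below is a made-up helper for illustration, not an idSIMD function. The sketch falls back to plain scalar code when SSE is not enabled at compile time.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Computes dst[i] = a[i] * b[i] + c for i in [0, count).
inline void MulAddSketch(float* dst, const float* a, const float* b,
                         float c, std::size_t count) {
    assert(count == 0 || (dst != nullptr && a != nullptr && b != nullptr));
    std::size_t i = 0;
#if SKETCH_HAS_SSE
    const __m128 vc = _mm_set1_ps(c);            // broadcast c to all four lanes
    for (; i + 4 <= count; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);   // unaligned loads: no alignment assumed
        const __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(_mm_mul_ps(va, vb), vc));
    }
#endif
    // Scalar tail (or the whole array when SSE is unavailable).
    for (; i < count; ++i) {
        dst[i] = a[i] * b[i] + c;
    }
}
```

The trade-off is less control over register allocation and instruction scheduling than hand-written asm gives, in exchange for one source file that builds with both compilers.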

About DrawVert.h: I've already optimized all the idSIMD library functions that use the idDrawVert primitive: a pair of Dot, a pair of MinMax, TransformVerts, TracePointCull, DecalPointCull, OverlayPointCull, DeriveTriPlanes, DeriveTangents, DeriveUnsmoothedTangents, NormalizeTangents, CreateTextureSpaceLightVectors, CreateSpecularTextureCoords, CreateShadowCache, CreateVertexProgramShadowCache. Are any of them useful for your needs? If not, I can easily create a new one for you.



I don't have the source code here atm, but I don't think any of these fit. The function I need does the following:

 

* takes a source list of drawverts and a destination list

* a scaling/rotation/translation matrix

* also takes an optional vertex color

* appends the source list to the dest list, applying the scale/rotation/translation/vertex color

 

The code in ModelGenerator.cpp would need to be taken as a guideline.
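As a rough reference for the requirements listed above, a plain C++ version of such a function might look like the sketch below. Everything here is an assumption for illustration: `VertSketch` is a cut-down stand-in for the real vertex type (which has more fields), `Transform` and `AppendTransformed` are hypothetical names, and the actual code in ModelGenerator.cpp may be organized differently. A SIMD version would keep the same interface and replace the loop body.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// Simplified stand-in for the engine's vertex type.
struct VertSketch {
    Vec3 xyz;
    Vec3 normal;
    std::uint8_t color[4];
};

// Row-major 3x3 linear part (scale * rotation) plus a translation.
struct Transform {
    float m[3][3];
    Vec3 t;
};

inline Vec3 MulLinear(const Transform& tr, const Vec3& v) {
    return {
        tr.m[0][0] * v.x + tr.m[0][1] * v.y + tr.m[0][2] * v.z,
        tr.m[1][0] * v.x + tr.m[1][1] * v.y + tr.m[1][2] * v.z,
        tr.m[2][0] * v.x + tr.m[2][1] * v.y + tr.m[2][2] * v.z
    };
}

// Appends transformed copies of src to dst. If `color` is non-null, it points
// to 4 bytes that override each copied vertex's color.
inline void AppendTransformed(std::vector<VertSketch>& dst,
                              const std::vector<VertSketch>& src,
                              const Transform& tr,
                              const std::uint8_t* color = nullptr) {
    assert(&dst != &src);  // appending a list to itself is not supported here
    dst.reserve(dst.size() + src.size());
    for (const VertSketch& in : src) {
        VertSketch out = in;
        const Vec3 p = MulLinear(tr, in.xyz);
        out.xyz = { p.x + tr.t.x, p.y + tr.t.y, p.z + tr.t.z };
        // Normals get the linear part only and are not renormalized here;
        // their direction is only correct under uniform scale.
        out.normal = MulLinear(tr, in.normal);
        if (color != nullptr) {
            for (int c = 0; c < 4; ++c) out.color[c] = color[c];
        }
        dst.push_back(out);
    }
}
```

Passing the color as an optional pointer keeps the common "no recolor" case free of extra work per vertex beyond a single null check.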

 

I won't have time to look into the source before sunday, tho.

I've already seen ModelGenerator.cpp. It seems to me that there will be no big deal to implement the function you need...

 

That would be very cool, because for very complex models that model-update is the bottleneck. You get bonus points if the function also calculates the max/min bounds at the same time (instead of another function that runs through the results afterwards) :)

I've seen two places in ModelGenerator.cpp that use the idDrawVert primitive. I can optimize both of them, but if you are talking about the min/max bounds calculation, then I need an updated version of this file with the updated source code of the DuplicateModel function (+ DrawVert.h, Matrix.h, Vector.h). You see, the latest version of the TDM source I have is just 1.04...



The v1.05 source is here:

 

http://www.bloodgate.com/mirrors/tdm/pub/thedarkmod.1.05.src.7z

 

(I wonder why our main page is no longer accessible via thedarkmod.com? Greebo, or Sparhawk, what's up?)

 

I am not sure if the SIMDProcessor->MinMax() call really needs to be folded in, tho. Might be simpler to keep it separate?

 

Anyway, there is also a bug: in the first case the bounds are always copied, but if we scale the model while copying it, the new bounds need to be calculated. This should be added to the tracker, at least.

This is probably already a known document but I thought I would link it here just in case:

 

http://software.intel.com/en-us/articles/optimizing-the-rendering-pipeline-of-animated-models-using-the-intel-streaming-simd-extensions/

 

If I'm not mistaken, all the animation code is in the SDK?

 

~~~~

 

Tangent:

 

This may be a little cheeky of me, but I brought up the concept of using animated normal maps for facial animation in place of bone-based animations. If I understand that flow-chart correctly, there are early-out branches when no polygons are changed. Therefore, my theory appears to be sound: it would offer a performance saving.

 

I must stress, if I hadn't already, that this is of purely academic interest. I have no demand for Springheel or any other modeler to build a bazillion high-poly head models and bake a bazillion normal maps to implement this.

Please visit TDM's IndieDB site and help promote the mod:

 

http://www.indiedb.com/mods/the-dark-mod

 

(Yeah, shameless promotion... but traffic is traffic folks...)


The idSIMD library charts shown on that Intel site are from Enemy Territory: Quake Wars, IIRC, another id Tech 4 game. Some functions have been added to the library since Doom 3, so I wonder whether the TDM team can use the new (post-Doom 3) parts or not.
