Jump to content
The Dark Mod Forums
Sign in to follow this  
greebo

Scenegraph Refactor?

Recommended Posts

A compiling darkrd on Ubuntu would be cool and would it make possible for me to

do some little changes (vbos etc) myself.

DarkRadiant can compile just fine under Ubuntu, OrbWeaver is doing that regularly, I reckon. ;)

 

@scons: I encountered that issue as well and could resolve it by adding a

module_env.Dir('libs/string')

at the module_env = g_env.Copy() statement to force scons somehow into this. If you're building on an AMD64 it's also necessary to add the -fPIC position-independent code switch as standard, otherwise the linker will throw up.

 

Regarding the problem: I was able to setup my build environment under Linux and to reproduce that bug, so I can start debugging this one now.

Share this post


Link to post
Share on other sites
Starts up nicely, but crashes when I try to create a bursh. I'll wait until the

transition from the ugly cast-system is complete.

 

The transition is complete, there is just an outstanding crash on Linux which Greebo is looking into currently.

Share this post


Link to post
Share on other sites

Yeah, and it's driving me completely nuts, I must admit. The crash is not happening consistently, sometimes it crashes, sometimes not. I have cluttered the codebase with std::couts and NULL pointer checks, but this doesn't seem to be the root here.

 

At the moment (this might change) it crashes in Doom3EntityCreator::getEntityForEClass, without doing anything suspicious. Just like that, right after the std::cout. The this pointer is valid and no member variables are accessed, it's a bit random.

 

Is the std::cout debug output reliable in Linux or is it write-buffered (so that some output may be lost when crashing)? The curious thing is that DarkRadiant is displaying an assertion failure that is originating from somewhere completely else in the codebase, not the code sequence I'm currently looking at. It's all very strange.

 

Maybe I've violated some fundamental rules in terms of language or dynamic_cast? I really can't say why the Linux build is behaving so insane, but the Windows build isn't affected at all.

 

edit: I will do some research about the dynamic_cast as I implemented it. Maybe the behaviour of dynamic_cast(&node) (casting with the pointer made out of a reference) is undefined but it happens to work in Windows.

 

edit2: Nope, the pointer-from-reference seems to work correctly, at least in a quick test.

Share this post


Link to post
Share on other sites

I'm taking a different direction now, trying to nail down the revision where the creation of objects stopped working. Currently, I'm at revision 1761, going upwards (1760 is still working).

 

(It's a bit cumbersome, because every second time I quit DarkRadiant my Linux freezes completely, so there might be something going on with the graphics drivers as well, forcing me to hit reset every here and then.)

 

edit:The error occurred somewhere in between 1761 and 1768

Share this post


Link to post
Share on other sites
edit: I will do some research about the dynamic_cast as I implemented it. Maybe the behaviour of dynamic_cast<Bla*>(&node) (casting with the pointer made out of a reference) is undefined but it happens to work in Windows.

 

That's what I was concerned about before -- I don't know for sure whether it is undefined or not, but it doesn't strike me as a particularly good idea. I strongly recommend not using this construct, although I can't say it will solve the problem.

 

edit2: Nope, the pointer-from-reference seems to work correctly, at least in a quick test.

 

Unfortunately the fact that it works in a test does not prove that it is not undefined behaviour.

Share this post


Link to post
Share on other sites

Is it really undefined? I couldn't find anything about this in google, maybe I searched for the wrong terms. I will eventually replace that interface with shared pointers, I just have to make sure that the pointers are stored somewhere, but they should be. One thing I haven't figured out yet is the NodeSmartReference stuff, I haven't looked into that. Same goes for the scene::Path.

 

I'm currently at revision 1763 which works, 1764 is compiling right now. It's all so painfully slow here, I don't know yet what I could do to speed my Ubuntu up a bit.

Share this post


Link to post
Share on other sites

Ok, it's evident now that the dynamic_cast is not working reliably under Linux when using pointers from references. I could produce the first reliable crash in revision 1766, which introduced the first dynamic_cast<:instantiable>. Instantiable is used all the time, that's why it's instantly crashing in this revision.

 

In previous revisions (1764 and 1765), the dynamic_cast has only been used for minor stuff like Nameable* pointers, which were only used for the EntityList. Taking a look at the EntityList in this revision reveals that the dynamic_cast is disfunctional here as well (interestingly it works for the BrushNodes, but not for Entities, maybe this is related to nodes in plugins, I don't know).

 

I'm planning to revert the changes in the trunk to revision 1761 (before the refactor), so that the trunk is functional again. Then I'll open a new branch and start to refactor this thing more carefully. This of course means that my work of nearly a week would be lost, but it's my fault after all.

 

How can I conveniently revert the changes in the trunk to a certain revision?

 

Alternatively, I could go ahead and try to change the Node_getInstantiable() interface to use (shared) pointers, which is a bit more risky, but I think the nut can be cracked. What do you think, OrbWeaver? What is best here and what should I do?

 

I apologise for causing such troubles with the trunk, I didn't know that it would be such a problem!

Share this post


Link to post
Share on other sites

Is it specifically the pointer-from-reference issue, or is dynamic_cast working unreliably altogether? If it is the former, how difficult would it be to convert the functions to accept pointers all the way through, in preparation for a future change to shared_ptr?

Share this post


Link to post
Share on other sites

Hm, I can't really say before I tried it. Dynamic_cast itself should work ok, I can't really imagine that this type of operator wouldn't work reliably in Linux, so I guess the interface is the culprit here.

 

I will do some quick tests (after installing Ubuntu again on my external HDD - the current installation is a bit crappy) and post back here.

 

For reference: revision 1761 in the trunk should still work ok, in case you need a working build to investigate something else. Apart from that I only did some refactorings in the entity plugin, but no functionality changes.

Share this post


Link to post
Share on other sites
Is it specifically the pointer-from-reference issue, or is dynamic_cast working unreliably altogether? If it is the former, how difficult would it be to convert the functions to accept pointers all the way through, in preparation for a future change to shared_ptr?

This seems to be one of the issues why C++-rtti was avoided in the first place.

I never experienced these problems my self, this is a surprise for me too.

 

I did some googleing and found, aside from many many bug reports,

some example code for gcc 3.3 that exposes a dynamic_cast bug:

 class cFoo { ... };
class cBar : public cFoo { ... };

cFoo* pfoo = CreateBarInDLLAndCastToFoo();
cBar* pbar = dynamic_cast<cBar*>(pfoo);
// pbar is zero

Since we work alot with shared libraries, a similar scenario would be possible...

 

In my opinion, a cast mechanism that "might" work is unacceptable, we should

try to implement our own cast-mechanism.

Share this post


Link to post
Share on other sites

So gcc is implementing dynamic_cast in an unreliable way? I can't believe it. For the records, I'm using gplusplus 4.1 in Ubuntu (at least I think I do, because I chose this package in Synaptic).

 

The code example above makes sense though - the BrushNodes were recognised correctly (they are in the core), whereas the EntityNodes didn't (as I first suspected, I still can't believe it).

 

Implementing another self-made type-cast system - isn't there another way around that?

Share this post


Link to post
Share on other sites

Right, this is GCC 4 then, not GCC 3.3 under MinGW. I find it hard to believe that the most recent version of GCC would have such a glaring bug, although I guess it is possible.

 

Implementing another cast system is a waste of effort in my view, if dynamic_cast cannot be used we should just stick with the existing system.

Share this post


Link to post
Share on other sites

There is a FAQ entry on this topic mentioning some compiler/linker flags. Is this useful information? I have hardly any experience with compiler/linker settings so far.

 

http://gcc.gnu.org/faq.html#dso

Share this post


Link to post
Share on other sites

Good find, that's exactly what we need.

 

Changing the call to dlopen() should be easy, but some care is needed with --export-dynamic because different libraries use the objects in the global namespace with the same name (e.g. "Model"). These will need to be properly namespaced to avoid conflicts.

Share this post


Link to post
Share on other sites

I really would like to see this working with just the C++ standard operations.

 

I'd hate to roll back my changes, but if we really have to do it, I'd at least leave only the NodeStaticCast/InstanceStaticCast alive, and ditch the ugly ContainedCasts which are just a symptom of the "clean" design.

Share this post


Link to post
Share on other sites

So gcc is implementing dynamic_cast<> in an unreliable way? I can't believe it.

Well I think they made it as reliable as possible, I have no idea how a compiler should store the required metainformation that type X inherits from type Y that is implemented elsewhere.

Share this post


Link to post
Share on other sites

Just wanted to drop a note here, that I probably won't be able to look into that till the weekend, so if you have time to play around with the Linux linker options, feel free. You can probably handle these things better anyway. :)

Share this post


Link to post
Share on other sites

Well I had a go, as expected it doesn't work, crashing instead at startup during the XYWnd::callbackExpose method. I suspect this is because the addition of --export-dynamic is causing incorrect symbol resolution, although it's not at all obvious why this would affect GL.

Share this post


Link to post
Share on other sites

What can be done to fix the symbol resolution? As far as I can see, there are a few calls to global functions like Map_Valid() and GlobalOpenGL_debugAssertNoErrors(), which are not in a namespace.

 

Did you both change the load library call to RTDL_GLOBAL and set the -Wl,-E flag? (I got this info here as well)

 

(Hm, I get the feeling that my knowledge is very insufficient here. Sucks.)

 

Should I re-implement the NodeStaticCast stuff?

Share this post


Link to post
Share on other sites
What can be done to fix the symbol resolution? As far as I can see, there are a few calls to global functions like Map_Valid() and GlobalOpenGL_debugAssertNoErrors(), which are not in a namespace.

 

Good question. I am not entirely sure what the problem is, it could be that some of the QGL stuff is overriding/redefining OpenGL functions which are clashing with the standard OpenGL calls in libGL.so, but I thought Crispy removed all of this stuff. It will probably be worth a check for this.

 

Did you both change the load library call to RTDL_GLOBAL and set the -Wl,-E flag? (I got this info here as well)

 

Yes.

 

Should I re-implement the NodeStaticCast stuff?

 

Hmm, it seems a pity to undo all of the work you did just because of a symbol resolution issue which in theory should be solvable. I guess you've still got the refactor branch, so you could temporarily revert the trunk and I can investigate the symbol issues in the separate branch.

Share this post


Link to post
Share on other sites
it could be that some of the QGL stuff is overriding/redefining OpenGL functions which are clashing with the standard OpenGL calls in libGL.so, but I thought Crispy removed all of this stuff.

So did I.


My games | Public Service Announcement: TDM is not set in the Thief universe. The city in which it takes place is not the City from Thief. The player character is not called Garrett. Any person who contradicts these facts will be subjected to disapproving stares.

Share this post


Link to post
Share on other sites

I finally bought an additional HDD to setup Ubuntu on my own machine, as a dual-boot solution with WinXP on a RAID0 doesn't seem to be supported properly (or would cause some serious research I didn't bother to go through).

 

It's an AMD64, that's why I had to add some default compiler flags to Sconstruct (the -fPIC switch), otherwise the linker would throw an error "could not read symbol" or something.

 

I'm now at revision 1766 in the scenegraph_refactor_test which previously crashed on the other system I was working on (a non-64-bit machine, where I didn't need the -fPIC). It somehow doesn't crash. The dynamic_cast and works fine and I can create brushes and Entities and such.

 

Maybe this is due to the -fPIC? There are some relations between -fPIC and dynamic_cast on google, but I haven't read through all of them. OrbWeaver, could you try to recompile the HEAD revision (rev1818), which already contains the -fPIC flags by default? Is there anything different as far as the crash is concerned?

 

(Of course there are some other problems like splash image not being loaded and a crash on shutdown, but I think this is due to my amateur-ish Ubuntu setup. And Eclipse is crashing constantly. And my keyboard settings are reset every time I reboot. And don't even ask about my ATI driver installation odyssey...)

Share this post


Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Sign in to follow this  

×
×
  • Create New...