Jump to content
The Dark Mod Forums
Sign in to follow this  
greebo

Scenegraph Refactor?

Recommended Posts

An easy check to see if the dynamic_cast is working, is to open the EntityList after creating a few lights, entities and brushes/patches. If all the names are showing up (light_1, Brush etc.), the cast is working fine. If only the Brushes/Patches are visible and everything else is "node", the cast fails.

 

I'm a bit puzzled regarding the splash image - I think it's using the same routines as the icons and everything else. Strange.

 

edit: found another thread discussing this dynamic_cast problem and it seems that -fPIC is indeed solving some issues: http://www.codecomments.com/archive286-2005-5-489612.html

Share this post


Link to post
Share on other sites

The dynamic_cast appears to be working fine.

 

Here is the backtrace for the double free, it's not very helpful but does suggest that there is a problem destroying a string.

 

#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb74169a1 in raise () from /lib/tls/i686/cmov/libc.so.6
#2  0xb74182b9 in abort () from /lib/tls/i686/cmov/libc.so.6
#3  0xb744a87a in __fsetlocking () from /lib/tls/i686/cmov/libc.so.6
#4  0xb7450fd4 in malloc_usable_size () from /lib/tls/i686/cmov/libc.so.6
#5  0xb745134a in free () from /lib/tls/i686/cmov/libc.so.6
#6  0xb75c58b1 in operator delete () from /usr/lib/libstdc++.so.6
#7  0xb75a48f3 in std::string::_Rep::_M_destroy () from /usr/lib/libstdc++.so.6
#8  0xb75a5884 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string () from /usr/lib/libstdc++.so.6
#9  0xb71a88e6 in __tcf_0 () at libs/gtkutil/image.cpp:34
#10 0xb74195d7 in exit () from /lib/tls/i686/cmov/libc.so.6
#11 0xb7402eac in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
#12 0x08078db1 in _start () at ../sysdeps/i386/elf/start.S:119

 

Incidentally, what is the purpose of this:

 

if radiant_env['PLATFORM'] == 'win32':
radiant_prog = radiant_env.Program(target='darkradiant', source=radiant_src + 'radiant/darkradiant.o'])
else:
radiant_prog = radiant_env.Program(target='darkradiant', source=radiant_src)

 

I can't imagine why you would need to explicitly add an object file to the source list, is there some problem on Windows with the linking?

Share this post


Link to post
Share on other sites

gtkutil/image is involved here, so it might even be related to the splash image not showing up, but that might be a wild guess as well.

 

How can I generate such a backtrace, btw? I have obviously no experience with gdb.

 

edit: The object file is the windows resource icon. It is the only way I could get it to work with my limited scons experience. :blush:

Share this post


Link to post
Share on other sites
How can I generate such a backtrace, btw? I have obviously no experience with gdb.

In a console, type (without the >s of course):

 

>gdb name_of_executable_to_be_debugged

>run

 

Trigger the crash, then type:

 

>bt


My games | Public Service Announcement: TDM is not set in the Thief universe. The city in which it takes place is not the City from Thief. The player character is not called Garrett. Any person who contradicts these facts will be subjected to disapproving stares.

Share this post


Link to post
Share on other sites
How can I generate such a backtrace, btw? I have obviously no experience with gdb.

 

GDB is fantastic (on Linux), it just takes some getting used to. Run

 

$ gdb install/darkradiant

 

to start GDB, then at the prompt type run to actually start the program. When the program stops (for a segfault, abort or breakpoint), go back to the prompt and type bt to get a backtrace.

 

If you want to examine individual stack frames, type frame <number> using the frame number displayed in the backtrace. You can then view the code lines with l (for "list") and inspect individual variables using an actual C++ expression, e.g.

 

(gdb) print *widget
(gdb) print self->getWidget()

 

etc.

 

edit: The object file is the windows resource icon. It is the only way I could get it to work with my limited scons experience. :blush:

 

Oh I see, in that case it should probably be called darkradiant_icon.o or something. At the moment it looks like you are trying to override the default linking behaviour of the object files generated by the build itself.

 

Can you not add the process of actually compiling the icon resource file as a part of the Scons script? Presumably this is done using GCC or other open-source tools as well?

Share this post


Link to post
Share on other sites

More useful information here:

 

http://tldp.org/HOWTO/Program-Library-HOWT...-libraries.html

 

It looks like both -fPIC and --export-dynamic would be needed in this situation, but curiously it seems to be working without --export-dynamic at the moment. The LD_DEBUG variable may generate some useful output as well.

Share this post


Link to post
Share on other sites
Can you not add the process of actually compiling the icon resource file as a part of the Scons script? Presumably this is done using GCC or other open-source tools as well?

It is compiled automatically, there is this statement:

radiant_env.buildIconResource()

which generates the object file using MinGW's windres. I agree that I could have named it better, I'll eventually change it (after the scenegraph mess is cleaned up).

 

Thanks for the explanation of gdb, this will come in handy. :)

 

I'm a bit lost in terms of the splash image / the shutdown crash and I won't have time to investigate this till tomorrow evening, unfortunately.

Share this post


Link to post
Share on other sites

I changed the splash_image() method a bit and used the gtkutil::getLocalPixbuf() instead of the new_local_image() legacy function, but this didn't change anything.

 

I experimentally assigned a shortcut to the splash screen and it seems that the image display is not working only during startup. If I invoke the show_splash() function later (after everything else is initialised), the image is displayed correctly. Strange.

 

The returned GtkImage* pointer is not NULL, I could check that using gdb (the new_local_image() returns NULL btw), but it doesn't get displayed, only the empty window container.

Share this post


Link to post
Share on other sites

The location of the show_splash() call does indeed make a difference. In the gtkutil/image.cpp file there is a (legacy) global g_BitmapsPath() which has to be initialised by each module using the image library. In the case of the DarkRadiant core this happens by calling Environment::init(), almost immediately after startup.

 

After this line:

 

ModuleLoader::loadModules(Environment::Instance().getAppPath());

the g_BitmapsPath variable is empty again and the gtkutil/image library fails to load the image. I don't know why this gets erased, but it is certainly related to the -fPIC change. Any ideas what might be going on?

 

It appears likely that the other crashes are related to this.

Share this post


Link to post
Share on other sites
Any ideas what might be going on?

 

No, but it's a good example of why we don't use global variables in this way. Can it be moved into the XML registry, and initialised once at startup?

 

It appears likely that the other crashes are related to this.

 

Very good point. God knows how much stupid global nonsense is being used elsewhere, which could be affected by linker options.

Share this post


Link to post
Share on other sites

Adding a static qualifier to the g_BitmapsPath like this:

 

static std::string g_bitmapsPath;

 

fixes the problem and the variable does not get cleared after the Modules are loaded. The splash screen is showing up again. I'll commit this single change and then move on to investigate the crash with the Media Browser. I think it happens somewhere in the recursive VFSTreePopulator methods, but I may remember wrongly as well.

 

Can it be moved into the XML registry, and initialised once at startup?

Incidentally, I already prepared the Registry for this case, but I didn't get around to change the gtkutil/image.cpp to query the Registry for the bitmaps path. It's probably necessary to add some GlobalRegistryModuleRef here and there for modules that solely depend on the gtkutil/image library but not on the registry (if there are any modules like this).

Share this post


Link to post
Share on other sites
Incidentally, I already prepared the Registry for this case, but I didn't get around to change the gtkutil/image.cpp to query the Registry for the bitmaps path.

 

Actually the gtkutil library (and any other library) should not call any part of the Radiant module interface, these should be standalone utility methods with no dependencies on Radiant. This is why I changed the gtkutil::errorDialog methods to accept a pointer to the main window, rather than calling GlobalRadiant().getMainWindow() themselves.

 

This probably makes gtkutil::getLocalPixbuf rather useless, since it would need to take a path to the bitmaps directory from the calling function. In this case, it might be better to have getLocalPixbuf be a part of the GlobalRadiant interface instead.

Share this post


Link to post
Share on other sites

Agreed, this is probably the cleaner solution. I will add this to the bugtracker after I committed the temporary workaround with the static.

Share this post


Link to post
Share on other sites

Am I insane or does the HEAD revision work correctly now in 32-Bit-Linux? I just tested it on an Athlon XP (non-64-bit) system under Linux and everything seems to work (no crash on creating stuff, no shutdown crashes, splashscreen is showing up correctly). Are the issues only AMD64-related?

Share this post


Link to post
Share on other sites

It's been working on 32-bit Linux since the -fPIC changes (did you think I was testing on 64-bit?). I don't get a spashscreen though, I suspect your fix will be needed for that.

 

To be honest, I don't have a clue what is going on at this point. Your splashscreen fix should be a NOP since "static" is the default allocation for global objects defined at file scope, and the class casting shouldn't work until --export-dynamic and RTLD_GLOBAL are used, but neither of them seem to be needed and they in fact just cause crashes.

 

The only thing I can be sure of is that we are on very shaky ground at the moment. A random problem that "just disappears" will probably "just reappear" at some particularly inconvenient future time.

Share this post


Link to post
Share on other sites

Hm. My 32-bit Linux here is working fine and the splash screen is showing up regardless of the show_splash() call location. So much for the reproducibility, maybe you're using another version of gplusplus (I'm using 4.1.2 and ld 2.17.50)?

 

I share your feeling about the solidness and I don't quite like the feeling of being at the mercy of all the various compilers and linkers. There's one positive point about the old Radiant stuff, I reckon, at least they work on all the platforms. I've not enough compiler experience myself and it's unlikely that I can do enough research when there is so much other substantial work to be done.

 

What shall we do here in this situation? Should we go back to a stable version, sticking with the old typesystem (without the ContainedCast)? Should we implement namespaces approach (which would involve a (probably small) learning curve)? Should we go forward with these changes (some AMD64 issues have to be fixed then)?

Share this post


Link to post
Share on other sites
What shall we do here in this situation? Should we go back to a stable version, sticking with the old typesystem (without the ContainedCast)? Should we implement namespaces approach (which would involve a (probably small) learning curve)? Should we go forward with these changes (some AMD64 issues have to be fixed then)?

 

I recommend moving forward, not back. If the code is written properly, without global variables, conflicting symbols at global scope and other badly-designed cruft, then there is no reason why it should not work using the modern approach. There may just be some refactoring required to ensure that it is robust and portable.

Share this post


Link to post
Share on other sites

Ok, I will try to get everything to work on my AMD64/gcc 4.1, which seems to be the most challenging system for some reason. :)

 

So far I still have problems with the MediaBrowser/VFSTreePopulator and during shutdown. I haven't even tested loading a large map, that is still to come.

 

(Also, it's about time that I give mohij's contribution a shot, he's still waiting for my feedback on this.)

Share this post


Link to post
Share on other sites

Ok, loading any map crashes my DarkRadiant instantly (on a 32 bit Linux here). Does/did this work fine on your system?

Share this post


Link to post
Share on other sites

I'm experiencing the following (in plugins/mapdoom3/parse.cpp):

 

The map parsing code tries to cast a node onto a MapImporter class, which fails (but shouldn't). I guess this is the same problem but in the opposite direction. Originally we had problems within the core code casting classes created in shared libraries. Now we have a problem within the shared library casting a node created in the core.

 

It seems that this is the case intended to be resolved by the --export-dynamic switch, but I'm not sure.

Share this post


Link to post
Share on other sites

I experimented a bit with the linker flags (-Wl,--export-dynamic and -Bsymbolic) and tried some combinations of RTLD_LAZY/NOW and RTLD_GLOBAL) but this was to no avail.

 

What about this approach: is it possible that (instead of having inlined Node_getYYY() methods) we use a singleton object in the core performing the casts? If the casting only works reliably in the core, a solution might be to leave all the casts there.

 

Or move them into a statically linked library, where the Node_getYYY are not inlined, but properly declared in the header and defined in a source file?

 

Then again, this would mean that all uses of dynamic_cast(scene::Node*) had to be wrapped.

Share this post


Link to post
Share on other sites

I don't think it is where or how you perform the casts that matters, it is where the casted objects are defined that is causing the problem.

 

Do you get a crash if you use RTLD_GLOBAL in the DynamicLibrary constructor? This seems to trigger the OpenGL crash on my Linux system.

Share this post


Link to post
Share on other sites

RTLD_GLOBAL doesn't do anything positive for me, in any combination of -Wl,-Bsymbolic and/or -Wl,-export-dynamic, just a crash at shutdown with a huge list of error output in the console ("glibc detected corrupt double-linked list" or something like that).

 

I don't know if the DarkRadiant codebase in its current form can be "tamed" by any combination of compiler switches in the first place. Maybe there is always a conflict with some implementation, but I can't say for sure.

Share this post


Link to post
Share on other sites

I could get the map loading to work by adding these link flags in the sconstruct script:

LINKFLAGS += '-Wl,-fini,fini_stub -L. -static-libgcc -Wl,--export-dynamic '

I could've sworn that I tried the --export-dynamic before, maybe I missed a '-' or something. I didn't change anything else apart from that (-fPIC is still active), no RTLD_GLOBAL or anything. It still crashes on any tree view though (MediaBrowser, ModelSelector), I will look into this next.

Share this post


Link to post
Share on other sites

The treeviews are working again. There were two calls in VFSTreePopulator and in MediaBrowser storing the result of std::string::rfind into an unsigned int, which discards the sign (without compiler warning, methinks). The subsequent comparison to std::string::npos was therefore never true and the population loop went into infinite. std::string::npos seems to be differently defined on my system, otherwise I can't explain why this didn't crash before. I was lucky to have gdb here, otherwise this would've taken me half a day or so.

 

The crash on shutdown is still left. :)

 

edit: Actually, std::size_t is an unsigned type as well, but it seems that it has a different sizeof() value on 64-bit systems. Makes much more sense now. http://www.cplusplus.com/reference/string/string/npos.html

Share this post


Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Sign in to follow this  

×
×
  • Create New...