So many misconceptions here, I guess I should say something.
Yes, 32-bit is perfectly OK in general.
There is a limit of 2 GB of memory in 32-bit mode, and it is a hard limit: a process cannot use more than 2 GB even if you have 16 GB of RAM physically installed and a 32 GB pagefile for swap. When it tries to allocate more, the process will simply crash on a 32-bit platform.
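The 2 GB figure comes from simple arithmetic. This small sketch assumes default 32-bit Windows settings, meaning the executable is not linked with `/LARGEADDRESSAWARE` (with that flag, a 32-bit process on 64-bit Windows can get up to 4 GB). A 32-bit pointer can address 2^32 bytes, and by default the upper half is reserved for the kernel:

```cpp
#include <cstdint>

// A 32-bit pointer can address 2^32 bytes = 4 GiB in total.
constexpr std::uint64_t kAddressSpace32 = std::uint64_t{1} << 32;

// Assumption: default Windows split, upper half reserved for the kernel,
// which leaves 2 GiB of user address space for the process.
constexpr std::uint64_t kDefaultUserSpace32 = kAddressSpace32 / 2;

// True when this translation unit is compiled as a 32-bit build.
constexpr bool kIs32BitBuild = sizeof(void*) == 4;
```

Physical RAM and pagefile size do not enter this calculation at all, which is why installing more memory does not help a 32-bit process.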
If your workload fits well within this limit, then there is no harm in it. If it does not, the limit becomes an unavoidable blocker. If you have a very big map, you simply cannot dmap it on 32-bit, but if your map is small or medium-sized, it does not matter to you whether the executable is 32-bit or 64-bit.
As for variables in code, when you switch to 64-bit mode some of them stay the same size, but some of them grow from 32-bit to 64-bit.
All pointers inevitably become twice as large, and so do all sizes in STL types (size_t). Also, a 64-bit process has more overhead for heap allocations. This means the process consumes more memory in 64-bit mode, and consequently can run slower, since the cache size does not change. You can probably see the increased memory consumption yourself: take a big brushy map, dmap it in both 32-bit and 64-bit modes, and compare the memory consumption in Process Explorer.
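To see where the extra memory goes, here is a minimal sketch with a hypothetical tree node (not a TDM type) holding two pointers and an int. The int stays 4 bytes, but the pointers double and the struct also picks up padding for 8-byte alignment, so the node doubles from 12 to 24 bytes on typical x86/x64 ABIs:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node: two pointers plus one int.
struct TreeNode {
    TreeNode* left;
    TreeNode* right;
    int value;
};

// Sizes of pointer-carrying types in the current build.
struct LayoutReport {
    std::size_t pointer;     // 4 on 32-bit, 8 on 64-bit
    std::size_t size_type;   // std::size_t matches the pointer size on x86/x64
    std::size_t tree_node;   // 12 on 32-bit, 24 on 64-bit (including padding)
    std::size_t int_vector;  // the std::vector object itself; implementation-defined
};

LayoutReport current_layout() {
    return {sizeof(void*), sizeof(std::size_t), sizeof(TreeNode),
            sizeof(std::vector<int>)};
}
```

Multiply the node growth by millions of nodes and the difference shows up directly in cache misses, independent of any heap-allocator overhead on top.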
While this point is not very strong for native applications and games, it is a major problem for managed languages like Java, where pointers are everywhere. For instance, a modern JVM still uses 32-bit pointers internally as long as you don't request more than 32 GB of heap memory. They say it is 30% faster.
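The 32 GB threshold can be explained with a small sketch of the "compressed pointer" idea (HotSpot calls these compressed oops). This is not the JVM's actual code; it assumes that objects are 8-byte aligned and that the heap is one contiguous region starting at a hypothetical `heap_base`. Because the low 3 bits of an aligned address are always zero, a 32-bit offset counted in 8-byte units covers 2^32 * 8 bytes = 32 GiB:

```cpp
#include <cstdint>

constexpr unsigned kShift = 3;  // log2 of the assumed 8-byte object alignment

// Largest heap a 32-bit compressed value can cover: 2^32 units of 8 bytes = 32 GiB.
constexpr std::uint64_t kMaxHeap = (std::uint64_t{1} << 32) << kShift;

// Store a 64-bit address as a 32-bit offset (in 8-byte units) from heap_base.
// Assumes address is 8-byte aligned and in [heap_base, heap_base + kMaxHeap).
std::uint32_t compress(std::uint64_t heap_base, std::uint64_t address) {
    return static_cast<std::uint32_t>((address - heap_base) >> kShift);
}

// Recover the full 64-bit address from the 32-bit compressed value.
std::uint64_t decompress(std::uint64_t heap_base, std::uint32_t compressed) {
    return heap_base + (static_cast<std::uint64_t>(compressed) << kShift);
}
```

Every reference stored in an object then costs 4 bytes instead of 8, at the price of a shift and an add on each decode.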
64-bit mode does not offer any new capabilities except for using more than 2 GB of memory.
There are many other differences between 32-bit and 64-bit, but they are all about performance.
64-bit mode does not offer better precision. Floating-point numbers are 32-bit and 64-bit in both 32-bit and 64-bit modes, with the same performance. In fact, 32-bit mode also offers 80-bit floating-point numbers (x87). Moreover, before roughly TDM 2.06, this awful 80-bit arithmetic was used everywhere for intermediate values when computing complicated expressions. We got many precision problems when we disabled it, and we are still fixing them (e.g. there is a bunch of such fixes for dmap in the upcoming 2.08). Now 80-bit floats are disabled in both 32-bit and 64-bit builds of TDM.
So it turns out that 32-bit mode offers more floating-point precision, although these 80-bit numbers cannot be used universally in Visual Studio anyway (MSVC's long double is the same as double).
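Why disabling wide intermediates exposes precision problems can be shown with a tiny illustration (not TDM code). Here double stands in for the x87 80-bit intermediates, since double behaves identically on every compiler; the `numeric_limits` constants show how many significand bits each type actually has in the current build:

```cpp
#include <limits>

// Intermediate sum rounded to float, as in SSE-based builds.
float sub_after_float_sum(float a, float b) {
    float sum = a + b;  // rounded to float's 24-bit significand
    return sum - a;
}

// Intermediate kept in wider precision (double as a stand-in for x87 80-bit).
float sub_after_wide_sum(float a, float b) {
    double sum = static_cast<double>(a) + b;
    return static_cast<float>(sum - a);
}

// Significand bits, including the implicit leading bit.
constexpr int kFloatDigits = std::numeric_limits<float>::digits;    // 24
constexpr int kDoubleDigits = std::numeric_limits<double>::digits;  // 53
// Implementation-defined: 53 with MSVC (long double is double),
// 64 with GCC/Clang targeting x86 Linux (x87 80-bit format).
constexpr int kLongDoubleDigits = std::numeric_limits<long double>::digits;
```

With `a = 1e8f` and `b = 1.0f`, an SSE build gives 0 from `sub_after_float_sum` (float values near 1e8 are spaced 8 apart, so 1e8 + 1 rounds back to 1e8) and 1 from `sub_after_wide_sum`. Code that silently relied on the wide intermediates behaves like the second function and starts behaving like the first once they are disabled.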
(assembly geeks may also notice that 32-bit mode without 80-bit arithmetic has a tiny bit of overhead when passing floats into functions, due to the old calling convention)
Speaking of integers, both 32-bit and 64-bit modes offer integers up to 64 bits in size. 64-bit integers are emulated on the 32-bit platform and are pretty slow, but TDM never uses them. 64-bit GCC also offers a built-in 128-bit integer type (__int128), but Visual Studio does not, so TDM does not use it. Anyway, I can hardly imagine what we could use them for.
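What "emulated" means can be sketched in portable C++: a 32-bit compiler has to split each 64-bit value into two 32-bit halves and propagate the carry by hand (on x86 a 64-bit addition becomes an ADD followed by an ADC). Addition is the cheap case; multiplication and division of 64-bit values need several instructions or a runtime helper call. The `Split64` type below is purely illustrative:

```cpp
#include <cstdint>

// A 64-bit value held as two 32-bit halves, as in 32-bit registers.
struct Split64 {
    std::uint32_t lo;
    std::uint32_t hi;
};

Split64 split(std::uint64_t v) {
    return {static_cast<std::uint32_t>(v), static_cast<std::uint32_t>(v >> 32)};
}

std::uint64_t join(Split64 v) {
    return (static_cast<std::uint64_t>(v.hi) << 32) | v.lo;
}

// 64-bit addition built from 32-bit operations.
Split64 add64(Split64 a, Split64 b) {
    Split64 r;
    r.lo = a.lo + b.lo;                          // wraps modulo 2^32
    std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;  // wrap-around means a carry out
    r.hi = a.hi + b.hi + carry;                  // high half absorbs the carry
    return r;
}
```

On a 64-bit build, the same `a + b` on `std::uint64_t` is a single instruction.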
(assembly geeks may also notice that using 32-bit integers in 64-bit mode occasionally has a bit of overhead, due to some additional extension instructions)
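A hypothetical pair of loops shows where that extension can come from: on x64, a 32-bit `int` index has to be widened to 64 bits before it can be used in an address computation (typically a MOVSXD sign-extension), while a pointer-sized index does not. Whether the extra instruction actually appears depends on the compiler and optimization level; in simple loops like these, optimizers often remove it, so the generated assembly (e.g. on Compiler Explorer) is the place to check:

```cpp
#include <cstddef>

// 32-bit int index: on x64, i may need sign-extension before addressing data[i].
// Assumes the sum fits in int.
int sum_int_index(const int* data, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += data[i];
    return total;
}

// Pointer-sized index: std::ptrdiff_t is already 64-bit on x64,
// so it can be used in the address computation directly.
int sum_ptrdiff_index(const int* data, std::ptrdiff_t n) {
    int total = 0;
    for (std::ptrdiff_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}
```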
And of course, the bitness of the executable does not affect the GPU.
The main performance advantage of 64-bit mode is having twice as many CPU registers (16 instead of 8): both general-purpose registers and SSE/AVX registers.
When the compiler runs out of registers in a tight loop, it has to "spill" values into memory. This memory is surely in L1 cache, but L1 cache accesses are still slower than registers, and spills add more dependencies between instructions.
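Register pressure is easy to provoke on purpose. The hypothetical kernel below (not TDM code) sums an array with 12 independent SSE accumulators. Twelve live accumulators plus a load temporary fit in the 16 XMM registers of x64, but not in the 8 XMM registers of 32-bit mode, where the compiler has to keep some of them on the stack. Whether the optimizer actually keeps them all in registers is up to the compiler, so comparing the assembly of both builds is the way to see the spills. It assumes an SSE-capable target (32-bit GCC needs `-msse`):

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cstddef>

// Sum of n floats using 12 independent 4-wide accumulators (48 floats per step).
float sum_12_accumulators(const float* data, std::size_t n) {
    __m128 acc[12];
    for (int k = 0; k < 12; ++k) acc[k] = _mm_setzero_ps();

    std::size_t i = 0;
    for (; i + 48 <= n; i += 48) {
        // Each accumulator is live across the whole loop: 12 XMM registers.
        for (int k = 0; k < 12; ++k)
            acc[k] = _mm_add_ps(acc[k], _mm_loadu_ps(data + i + 4 * k));
    }

    // Combine the accumulators, then the 4 lanes.
    __m128 total = _mm_setzero_ps();
    for (int k = 0; k < 12; ++k) total = _mm_add_ps(total, acc[k]);
    float lanes[4];
    _mm_storeu_ps(lanes, total);
    float result = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    for (; i < n; ++i) result += data[i];  // leftover elements
    return result;
}
```

Many independent accumulators are used to hide addition latency; the same trick that speeds up the 64-bit build can make the 32-bit build slower once it no longer fits in registers.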
In TDM, there are many computationally heavy routines written with SSE/AVX (aka SIMD), which we had to rewrite for 64-bit mode. The original implementation by id Software was written in inline assembly, and 1) you cannot use 32-bit assembly in a 64-bit build, and 2) Visual C++ does not allow inline assembly at all in 64-bit builds. So there are now two completely different implementations of the SIMD routines: the first one (by id) is used in 32-bit mode, and the second one (by us) is used in 64-bit mode. As far as I remember, the second one often runs close to the register limit even in 64-bit mode, so if we start using it in 32-bit mode, it will run a bit slower. Some of the new routines written by us are already used in 32-bit mode, e.g. the AVX code.
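A minimal sketch of the "one implementation per bitness" arrangement, with hypothetical function names (not TDM's). The legacy path would really be MSVC inline assembly (`__asm { ... }`), which only compiles for 32-bit x86, so a scalar loop stands in for it here. The intrinsics version compiles for both x86 and x64, because the compiler rather than the programmer does the register allocation:

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics, usable in 32-bit and 64-bit builds

// Stand-in for the old 32-bit inline-assembly routine.
float dot_legacy32(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

// Intrinsics version: same source builds for x86 and x64.
float dot_intrinsics(const float* a, const float* b, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i) sum += a[i] * b[i];  // leftover elements
    return sum;
}

// Build-time selection by target bitness (MSVC and GCC/Clang macros).
float dot(const float* a, const float* b, std::size_t n) {
#if defined(_M_X64) || defined(__x86_64__)
    return dot_intrinsics(a, b, n);
#else
    return dot_legacy32(a, b, n);
#endif
}
```

Switching the 32-bit build to the intrinsics path is then just a matter of changing the `#if`, which is exactly where the register-count difference starts to matter.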
Quite unlikely, I would say.
Only that the 32-bit build will start running slower than the 64-bit build. People with low-end machines should use the 64-bit build, so it's not a big problem.
As you can see, the performance difference is a very complicated question.
Of everything I wrote, I think the biggest differences come from:
- the 64-bit build wastes more memory (pointers, heap overhead)
- the 64-bit build has more registers
- the 64-bit and 32-bit builds have different SIMD implementations
To be honest, I have no idea which build wins now.