MoroseTroll posted August 18, 2016:

Quoting an earlier post: "The idea behind Bulldozer was sound. Moving floating-point operations to the GPU is what should be done... If you remember the old Nvidia AMD motherboards (nForce), that was one of the things it did (reduced CPU workload by intercepting instructions at the chipset and pre-processing them seamlessly... No 'evangelize the coders and rewrite' needed)."

I'm afraid you're a bit confused. Yes, GPUs are fantastically fast in terms of raw calculation, but they have a different and very difficult-to-learn programming paradigm. Yes, you can make GPUs work for you, but you'll have to learn OpenCL/CUDA/DirectCompute or write very specific shaders in GLSL/HLSL. Yes, there are frameworks that can make your program run on both CPU and GPU, but they are JIT (just-in-time) translators, not compilers. Also, a GPU, even one on the same die as the CPU, has a very limited connection speed to the CPU, so even if you write code that can use the GPU at full speed, you'll suffer long stalls whenever you move a chunk of data from CPU to GPU and vice versa. Admittedly, Direct3D 12 and Vulkan (did you see how amazingly it works in Doom 4?) can reduce that latency almost to zero, but again, you'd have to 1) learn these APIs and 2) persuade TDM players to migrate to DX12/Vulkan-capable graphics cards (i.e. GeForce 600+ and Radeon HD 7700+).

About nVidia's nForce DASP: it was just a cache-prefetch system, no more. There was no instruction interception, only a few tables of memory addresses processed by a reasonably smart analyzer inside the chipset. That's it. Every modern CPU already has something like that built in.
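To give a concrete flavour of the paradigm described above, here is a minimal host-side sketch of offloading one trivial operation via OpenCL (error checking omitted; it assumes an OpenCL 1.2 runtime and at least one GPU device, and the kernel and buffer size are arbitrary examples). Note how much setup surrounds the two explicit copies across the CPU/GPU boundary, and that the kernel source is JIT-compiled for whatever GPU happens to be present:

```cpp
// Minimal OpenCL host sketch: scale a buffer of floats on the GPU.
// Error checking omitted for brevity; kernel name and sizes are illustrative.
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    const char* src =
        "__kernel void scale(__global float* v, float k) {"
        "    size_t i = get_global_id(0);"
        "    v[i] *= k;"
        "}";

    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);

    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, nullptr);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, nullptr, nullptr);
    clBuildProgram(prog, 1, &device, nullptr, nullptr, nullptr);  // JIT-compiled for this GPU
    cl_kernel kernel = clCreateKernel(prog, "scale", nullptr);

    std::vector<float> data(1 << 20, 1.0f);
    size_t bytes = data.size() * sizeof(float);

    // The explicit CPU -> GPU copy the post is talking about.
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, nullptr, nullptr);
    clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, bytes, data.data(), 0, nullptr, nullptr);

    float k = 2.0f;
    clSetKernelArg(kernel, 0, sizeof(buf), &buf);
    clSetKernelArg(kernel, 1, sizeof(k), &k);
    size_t global = data.size();
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &global, nullptr, 0, nullptr, nullptr);

    // ...and the GPU -> CPU copy back.
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, bytes, data.data(), 0, nullptr, nullptr);

    printf("data[0] = %f\n", data[0]);

    clReleaseMemObject(buf);
    clReleaseKernel(kernel);
    clReleaseProgram(prog);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}
```

Compare that with the one-line loop doing the same thing on the CPU; the gap in effort is exactly the "evangelize the coders and rewrite" cost the quoted post hoped to avoid.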
nbohr1more posted August 18, 2016:

Shouldn't the instruction decoder be able to hardcode the API translation needed to feed CPU floating-point requests to the GPU? I get that application designers need to rewrite to get around the quirks of hardware access via the host OS, but if you are in control of the actual hardware, the actual instruction code, the CPU driver stack, the chipset driver stack, and the GPU driver stack, are you telling me that you cannot make data routing better than someone without all that access who has to work from the outside via Vulkan or DX12?
lowenz posted August 18, 2016:

Zen is so Intel "Core" inspired: "AMD stated in the brief that power consumption and efficiency was constantly drilled into the engineers, and as explained in previous briefings, there ends up being a tradeoff between performance and efficiency about what can be done for a number of elements of the core (e.g. 1% performance might cost 2% efficiency)."
lowenz posted August 18, 2016:

Quoting nbohr1more: "Fingers crossed for the Zen APU releases. Maybe someone in the engineering department will sneak this idea back into the architecture unbeknownst to the management at AMD (and do it right this time)."

LOL. Good old romantic nbohr! The "suddenly unemployed engineer hero".
Bikerdude posted August 18, 2016:

Quoting an earlier post: "It seems they are just gonna mimic Intel and hope they can compete on price and efficiency. (Good luck.) Fingers crossed for the Zen APU releases. Maybe someone in the engineering department will sneak this idea back into the architecture unbeknownst to the management at AMD (and do it right this time). Still, it looks like good news for price/performance competition in the PC market for a little while."

Does this mean the current issue, where an AMD CPU paired with an nVidia GPU results in crap performance in idTech 4, won't be an issue with Zen? I really want the option to get away from Intel (due to the boot issue and the management chip).
nbohr1more posted August 18, 2016:

Right, Zen should run idTech 4 much better than the current AMD Bulldozer-based CPUs. That'll be pretty cool to see. It should also run the Wii/GameCube emulator Dolphin much better, along with any old engine that is CPU-bound (Oblivion, Skyrim...).
MoroseTroll posted August 18, 2016:

Quoting nbohr1more: "Shouldn't the instruction decoder be able to hardcode the API translation needed to feed CPU floating-point requests to the GPU?"

There are at least two things that make the CPU<->GPU conversation extremely difficult. First, every GPU generation, even from a single vendor, has its own instruction set (ISA). For example, nVidia's Pascal is not equal to Maxwell; they are similar, but not the same. So if you write low-level code for Maxwell, you'll have to adapt it for Pascal, too. And for Kepler. And for Fermi. Should I continue? The same goes for AMD: GCN 4th gen, 3rd gen, 2nd gen, 1st gen, VLIW4, VLIW5...

Second, when you communicate with the GPU, you go through the PCI-e interface: you fill a structure in RAM with the data you need and then call the driver to transfer it to the GPU. That takes thousands of CPU clocks even for a simple command (say, glBegin) and tens or even hundreds of thousands of CPU clocks to upload a texture. Why? Because PCI-e always runs at a fixed frequency declared by the PCI-SIG (yes, PCI-e comes in different versions and lane counts, but those don't change the situation drastically). Of course, with an integrated GPU the latencies could be much lower, but only if the vendor has built a very fast channel between the CPU and GPU on the same die. I'm not sure how fast those channels are in modern AMD and Intel APUs (CPU + GPU), but I doubt they are much faster than PCI-e. Why would those vendors bother to invent a fast channel just for integrated graphics?

Quoting nbohr1more: "I get that application designers need to rewrite to get around the quirks of hardware access via the host OS, but if you are in control of the actual hardware, the actual instruction code, the CPU driver stack, the chipset driver stack, and the GPU driver stack, are you telling me that you cannot make data routing better than someone without all that access who has to work from the outside via Vulkan or DX12?"

Um... I was saying that Vulkan and DX12 are the best ways to make the GPU run at full speed in the current situation, with current hardware. Could Intel and/or AMD create a chip with a half-new or totally new design, where GPU = FPU and every program can use that feature? Yes, they could, but they never will. The reason is simple: binary compatibility. Like I said earlier, every GPU generation has its own ISA, different from the previous one. The differences can be small or big, but they are always there. If you freeze your GPU ISA, you'll have to live with it for a dozen years, which means your rivals can keep optimizing their GPU architectures while you can't. That is a straight road to losing your market share and then Chapter 11.
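To put rough numbers on that transfer cost, one can time a texture upload. A sketch, assuming GLFW is installed for context creation and that the driver has actually finished the transfer by the time glFinish() returns (drivers are free to defer and optimize, so treat the result as an indication rather than a benchmark):

```cpp
// Rough timing of a 64 MiB texture upload from system RAM to the GPU.
// Not a rigorous benchmark: glFinish() is used as a crude "transfer done" fence.
#include <GLFW/glfw3.h>
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    if (!glfwInit()) return 1;
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);            // off-screen context
    GLFWwindow* win = glfwCreateWindow(64, 64, "upload", nullptr, nullptr);
    if (!win) return 1;
    glfwMakeContextCurrent(win);

    // 4096x4096 RGBA8 = 64 MiB of texel data living in system RAM.
    const int dim = 4096;
    std::vector<unsigned char> pixels(size_t(dim) * dim * 4, 128);

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);

    auto t0 = std::chrono::steady_clock::now();
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, dim, dim, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());
    glFinish();                                          // wait for the driver/GPU
    auto t1 = std::chrono::steady_clock::now();

    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    printf("64 MiB texture upload took %.2f ms\n", ms);

    glfwDestroyWindow(win);
    glfwTerminate();
    return 0;
}
```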
nbohr1more posted August 18, 2016:

Those are good points, but I would suggest they make the APU a core product: then they could build that fast interconnect, and it would make their cores more competitive in scientific GP-GPU fields. It would let them tier calculations by requirement: quick turnaround for branchy work goes to the local floating-point units, while large datasets with lots of SIMD go to dedicated GPU work. I thought that was originally the point of Torrenza? Heterogeneous architecture?
lowenz posted August 18, 2016:

AMD has HSA to share resources between GPU and CPU: https://en.wikipedia.org/wiki/Heterogeneous_System_Architecture
lowenz posted August 18, 2016:

Quoting MoroseTroll: "[...] If you freeze your GPU ISA, you'll have to live with it for a dozen years, which means your rivals can keep optimizing their GPU architectures while you can't. That is a straight road to losing your market share and then Chapter 11."

That's why LLVM (https://en.wikipedia.org/wiki/LLVM) and intermediate representations (https://en.wikipedia.org/wiki/Intermediate_representation) are the future. From the LLVM article: "At version 3.4, LLVM supports many instruction sets, including ARM, Qualcomm Hexagon, MIPS, Nvidia Parallel Thread Execution (PTX; called NVPTX in LLVM documentation), PowerPC, AMD TeraScale, AMD Graphics Core Next (GCN), SPARC, z/Architecture (called SystemZ in LLVM documentation), x86/x86-64, and XCore. Some features are not available on some platforms. Most features are present for x86/x86-64, z/Architecture, ARM, and PowerPC."
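A tiny illustration of the "one IR, many back ends" idea (a sketch assuming the LLVM C++ development headers and libraries are installed; the function being built is an arbitrary example): the module below is target-independent, and it is the back end chosen later that lowers it to x86-64, ARM, NVPTX, AMDGPU, and so on.

```cpp
// Build a trivial add() function as LLVM IR in memory and print it.
// Which machine code it eventually becomes is decided by whichever
// target back end lowers the IR, not by this code.
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"

int main() {
    llvm::LLVMContext ctx;
    llvm::Module mod("demo", ctx);
    llvm::IRBuilder<> builder(ctx);

    // i32 add(i32 a, i32 b) { return a + b; }
    auto* i32 = builder.getInt32Ty();
    auto* fnType = llvm::FunctionType::get(i32, {i32, i32}, /*isVarArg=*/false);
    auto* fn = llvm::Function::Create(fnType, llvm::Function::ExternalLinkage,
                                      "add", &mod);
    auto* entry = llvm::BasicBlock::Create(ctx, "entry", fn);
    builder.SetInsertPoint(entry);

    auto argIt = fn->arg_begin();
    llvm::Value* a = &*argIt++;
    llvm::Value* b = &*argIt;
    builder.CreateRet(builder.CreateAdd(a, b, "sum"));

    mod.print(llvm::outs(), nullptr);   // dump the target-independent IR
    return 0;
}
```

Feeding the printed IR to llc with different -march targets is what turns the same module into code for different ISAs.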
MoroseTroll posted August 19, 2016:

Quoting nbohr1more: "Those are good points, but I would suggest they make the APU a core product: then they could build that fast interconnect, and it would make their cores more competitive in scientific GP-GPU fields. It would let them tier calculations by requirement: quick turnaround for branchy work goes to the local floating-point units, while large datasets with lots of SIMD go to dedicated GPU work."

I hope some day this will happen, maybe even with Zen-based APUs, but I personally wouldn't hold my breath. Some paradigms are too tough to spread all over the world.

Quoting nbohr1more: "I thought that was originally the point of Torrenza? Heterogeneous architecture?"

AFAIK, Torrenza was a failure. HSA is a very good idea, but right now, AFAIK, the technology works on AMD hardware only, if we're talking about the x86 world. Will Intel and/or nVidia ever support HSA? I'm not sure, but if they don't, HSA will be limited to the mobile market and very rarely used on x86 (at least in the client sector; I'm not sure about servers and workstations).

Quoting lowenz: "That's why LLVM and intermediate representations are the future."

Maybe, maybe not. How many universal applications do you know, written on top of LLVM, that can run straight out of the box on many modern OSes (Windows, macOS, Linux, Android, iOS)? How fast do they perform on different hardware? How bug-free are they? I know these are perhaps not simple questions. Sure, LLVM is quite a cool thing, but that alone is not enough to become a de facto standard. If it happens, it would be nice.

About IR: the simpler the IR, the simpler its reverse engineering. I know there are obfuscation technologies, but I think many big companies would prefer to stay with their old-fashioned x86 code just to make sure nobody learns their algorithms.
lowenz posted August 19, 2016:

Quoting MoroseTroll: "Sure, LLVM is quite a cool thing, but that alone is not enough to become a de facto standard."

LLVMPipe is terrific. I can run the old Unreal in OpenGL 2.1 with the *main processor* as the GPU at 25-30 FPS.
jaxa (thread author) posted August 19, 2016:

Quoting MoroseTroll: "HSA is a very good idea, but right now, AFAIK, the technology works on AMD hardware only, if we're talking about the x86 world. Will Intel and/or nVidia ever support HSA? I'm not sure, but if they don't, HSA will be limited to the mobile market and very rarely used on x86 (at least in the client sector; I'm not sure about servers and workstations)."

Even if AMD's desktop offerings died in a fire, AMD has a lot of leverage: it makes the CPUs and GPUs in the Xbox One and PlayStation 4. This should also be the case with the upgraded mid-cycle 4K/VR versions coming within the next year or two.
MoroseTroll posted August 20, 2016:

Quoting lowenz: "LLVMPipe is terrific. I can run the old Unreal in OpenGL 2.1 with the *main processor* as the GPU at 25-30 FPS."

Details, please.

Quoting jaxa: "Even if AMD's desktop offerings died in a fire, AMD has a lot of leverage: it makes the CPUs and GPUs in the Xbox One and PlayStation 4. This should also be the case with the upgraded mid-cycle 4K/VR versions coming within the next year or two."

Sure. But the current game consoles are totally different ecosystems. While the PS4 and XBO use AMD x86 CPUs and Radeon GPUs, they have different OSes, libraries, and so on. You can make your game console's architecture as weird as you want (ask anyone who coded for the PS3 and you'll hear a lot of swearing), but if you try to do something like that on the PC market, be ready to lose.
jaxa (thread author) posted August 20, 2016:

Quoting MoroseTroll: "But the current game consoles are totally different ecosystems. While the PS4 and XBO use AMD x86 CPUs and Radeon GPUs, they have different OSes, libraries, and so on."

The PS4 and Xbone have more in common with the PC than ever before, because they are all x86 platforms.
lowenz posted August 21, 2016:

Quoting MoroseTroll: "Details, please."

1. Download this package: http://download.qt.io/development_releases/qtcreator/4.1/4.1.0-rc1/installer_source/windows_vs2013_32/qtcreator.7z
2. Extract opengl32sw(.dll) and rename it to opengl32(.dll).
3. Test an OpenGL 2.1 application.

It's Mesa's implementation of OpenGL (Gallium3D), compiled to run on an x86 processor ("SoftPipe"), with LLVM/Clang optimizations on top ("LLVMPipe").
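If you want to confirm which OpenGL implementation an application actually picked up after the DLL swap, a quick sketch (assuming GLFW for context creation; with Mesa's software rasterizer the renderer string normally mentions "llvmpipe"):

```cpp
// Print the vendor/renderer/version strings of whatever OpenGL
// implementation the process loaded.
#include <GLFW/glfw3.h>
#include <cstdio>

int main() {
    if (!glfwInit()) return 1;
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 2);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 1);
    GLFWwindow* win = glfwCreateWindow(64, 64, "glinfo", nullptr, nullptr);
    if (!win) return 1;
    glfwMakeContextCurrent(win);

    printf("GL_VENDOR:   %s\n", (const char*)glGetString(GL_VENDOR));
    printf("GL_RENDERER: %s\n", (const char*)glGetString(GL_RENDERER));
    printf("GL_VERSION:  %s\n", (const char*)glGetString(GL_VERSION));

    glfwDestroyWindow(win);
    glfwTerminate();
    return 0;
}
```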
MoroseTroll posted August 21, 2016:

Quoting jaxa: "The PS4 and Xbone have more in common with the PC than ever before, because they are all x86 platforms."

So what? The XBO's DirectX differs enough from the Windows one, and the PS4's GNM and GNMX are neither DirectX, nor OpenGL, nor Vulkan. Neither console has been cracked to the point of running its games directly under Windows on a PC. So it doesn't matter whether they are built on x86 or not: they are still too different from Windows.

Like I said, AMD won't take the risk of investing in and building a chip that nobody else would support, because it already has enough bitter experience with that: 3DNow!, Enhanced 3DNow!, SSE4a, MaSSE, IBS, XOP, LWP, FMA4, TBM. How many applications do you know that support these extensions? How much do they benefit from that support?

lowenz: Got it. Frankly, I expected something... different, say, a universal application that can run on both CPU and GPU, compiled by some LLVM-based framework.
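As an aside on why such vendor-specific extensions see little uptake: every application that wants them has to probe CPUID at run time and ship a fallback path for CPUs that lack them. A sketch using the GCC/Clang cpuid.h helper; the bit positions are quoted from memory for AMD's extended feature leaf, so verify them against the AMD programmer's manual before relying on them.

```cpp
// Probe AMD-specific ISA extensions via CPUID leaf 0x80000001.
// Bit positions below are as I recall the AMD Fn8000_0001 ECX/EDX flags.
#include <cpuid.h>   // GCC/Clang builtin wrapper
#include <cstdio>

static bool bit(unsigned reg, int n) { return (reg >> n) & 1u; }

int main() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx)) {
        printf("extended CPUID leaf not available\n");
        return 1;
    }
    printf("SSE4a : %d\n", bit(ecx, 6));
    printf("XOP   : %d\n", bit(ecx, 11));
    printf("FMA4  : %d\n", bit(ecx, 16));
    printf("TBM   : %d\n", bit(ecx, 21));
    printf("LWP   : %d\n", bit(ecx, 15));
    printf("IBS   : %d\n", bit(ecx, 10));
    printf("3DNow!: %d\n", bit(edx, 31));
    return 0;
}
```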
Oldjim posted August 24, 2016:

Rather than start a new thread: as will have been noticed from my appalling results with Deus Ex: Mankind Divided, I have decided to get a new graphics card, and despite being a long-time Nvidia user I have decided to go with a Radeon RX 470 4GB from Gigabyte. Any thoughts on whether I have made the right decision?

The things that swayed me towards Gigabyte were their excellent UK RMA system and the fact that the card has a metal backplate: http://forums.hexus.net/graphics-cards/204247-graphics-card-warranties-lets-see-how-good-they-really-look.html

The warranty situation had MSI as an option, but this rear-side temperature rather put me off: http://www.kitguru.net/components/graphic-cards/zardon/msi-rx-470-gaming-x-8g-review/28/
lowenz posted August 24, 2016:

The temperature criterion is always a double-edged sword: is the reading high because the power-delivery circuitry is bad, or because the plate is actually doing a good job of drawing heat away?
Bikerdude posted August 25, 2016:

In DX12/Vulkan the RX 480 is showing in various benchmarks as faster than the 1060, but the 1060 is faster in DX11 and below, and if the trend I have seen with idTech 4 holds, the 1060 will be faster there as well. If you do end up getting an RX 480, I will be very interested to see how TDM performs on it; I can only hope the OpenGL performance issue has been fixed in this generation compared to the previous R9 series.
Destined posted August 25, 2016:

I built a new PC at the beginning of the year and got a GeForce GTX 960. While I am pleased with the performance, I experience random freezes of the PC. Usually it is not even a BSOD but a complete freeze. Still, I got a crash dump file once or twice, and when I googled the error code I found that apparently the current version of DirectX and the latest GeForce drivers are not really compatible. At least, I found a couple of complaints about freezing computers that were attributed to an incompatibility between the two, and I strongly suspect my freezes stem from there. Speaking from that experience alone, I would say Radeon is currently definitely the better choice.
Oldjim posted August 25, 2016:

Slight change of plan: this is a real RX 470 killer, https://www.overclockers.co.uk/sapphire-radeon-rx-480-nitro-4096mb-gddr5-pci-express-rgb-graphics-card-gx-37d-sp.html, and the RX 480 4GB knocks spots off the RX 470.
Bikerdude posted August 25, 2016:

@Jim, found you a better deal: £199 with free shipping, https://www.overclockers.co.uk/xfx-radeon-rx-480-rs-edition-4096mb-gddr5-pci-express-graphics-card-with-backplate-gx-23c-xf.html

Also, XFX have updated their graphics card warranty terms since I last looked; they now do a 3+2-year plan. I have asked Overclockers to email me the UK RMA address details for XFX, should you need to send the card back after the first year has passed.
Oldjim posted August 25, 2016:

I may have found a similar one, except I am waiting for them to come back on the warranty period: they say 24 months while the other suppliers say 36 months. https://www.scan.co.uk/products/nda-4gb-sapphire-radeon-rx-480-nitroplus-14nm-polaris-pcie-30-7000mhz-gddr5-1208mhz-gpu-1306mhz-boos

I wouldn't touch XFX with a bargepole; their RMA system, as previously reported on the Scan forums, is an absolute disaster.
Bikerdude posted August 25, 2016:

Yeah, that's why I asked Overclockers for that info. And on the subject of warranties, it's why I have always paid a bit extra and gone for MSI or Gigabyte.