Digital Lifestyle · 2018-09-05

New GPU architecture from NVIDIA promises faster ray tracing & GDDR6 memory support

This article was first published on our sister site, Digital World Native.

GPU architecture

Olichel / Pixabay

NVIDIA recently announced its next-gen Turing GPU architecture at the latest SIGGRAPH industry event. The new features will be incorporated into high-end chips & boards later this year & are aimed at content creators & visualization professionals.

Its main innovations are hybrid rendering, which combines real-time ray tracing with faster rasterization, & much higher memory throughput through support for next-gen GDDR6.

This translates into greatly improved performance numbers in upcoming GPU chips & graphics cards, although consumers & gamers will not see the benefits initially. The ray tracing speedup (a 25x improvement, to 10 Giga rays per second) should make real-time ray tracing during gameplay & video playback a reality, & game developers are already working on support.
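To put the ray tracing figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the two numbers quoted above (10 Giga rays per second & the 25x speedup); the 1920x1080 resolution & 60 frames per second target are assumptions chosen for illustration, not figures from NVIDIA.

```python
# Figures quoted in the article.
TURING_RAYS_PER_SECOND = 10e9  # "10 Giga rays per second"
SPEEDUP = 25                   # "a 25x improvement"


def implied_baseline(rays_per_second: float, speedup: float) -> float:
    """Ray rate of the previous generation implied by the quoted speedup."""
    return rays_per_second / speedup


def rays_per_pixel(rays_per_second: float, width: int, height: int, fps: int) -> float:
    """Average number of rays available for each pixel of each frame."""
    return rays_per_second / (width * height * fps)


baseline = implied_baseline(TURING_RAYS_PER_SECOND, SPEEDUP)

# Assumed target (hypothetical, not from the article): 1080p at 60 fps.
budget = rays_per_pixel(TURING_RAYS_PER_SECOND, 1920, 1080, 60)

print(f"Implied previous-generation rate: {baseline / 1e9:.1f} Giga rays per second")
print(f"Ray budget at 1080p, 60 fps: about {budget:.0f} rays per pixel per frame")
```

This is a peak-rate estimate only; real scenes spend rays unevenly & rarely reach peak throughput.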

This means that pre-rendering will eventually no longer be necessary for CG-heavy film cuts, game intros, etc. NVIDIA now has the hardware architecture to go along with an API, & game engines are catching up, which together should produce the next generation of eye-catching game titles.

Support for GDDR6 memory from Samsung & others translates into a bandwidth increase of 40% over current GDDR5X, with up to 16 Gbps per pin, or a projected total bandwidth of 672 GB/s on the highest-end NVIDIA Quadro RTX 8000.
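Total memory bandwidth is conventionally the per-pin data rate multiplied by the bus width, divided by 8 to convert bits to bytes. The Python sketch below shows that arithmetic. The 384-bit bus width & the 14 Gbps per-pin rate are assumptions that the article does not state; they are used here because together they reproduce the 672 figure quoted for the Quadro RTX 8000.

```python
def total_bandwidth_gbytes(pin_rate_gbps: float, bus_width_bits: int) -> float:
    """Total memory bandwidth in gigabytes per second.

    pin_rate_gbps: data rate of a single pin, in gigabits per second.
    bus_width_bits: number of data pins on the memory bus.
    """
    return pin_rate_gbps * bus_width_bits / 8  # 8 bits per byte


# Assumptions (not stated in the article): 384-bit bus, 14 Gbps per pin.
ASSUMED_BUS_WIDTH_BITS = 384
ASSUMED_PIN_RATE_GBPS = 14

# The article's ceiling of "up to 16Gbps per pin".
MAX_PIN_RATE_GBPS = 16

shipping = total_bandwidth_gbytes(ASSUMED_PIN_RATE_GBPS, ASSUMED_BUS_WIDTH_BITS)
ceiling = total_bandwidth_gbytes(MAX_PIN_RATE_GBPS, ASSUMED_BUS_WIDTH_BITS)

print(f"{ASSUMED_PIN_RATE_GBPS} Gbps per pin: {shipping:.0f} gigabytes per second")
print(f"{MAX_PIN_RATE_GBPS} Gbps per pin: {ceiling:.0f} gigabytes per second")
```

Under these assumptions the quoted total corresponds to memory running below the 16 Gbps per-pin ceiling, which would leave headroom for faster parts on the same bus.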

The new chips will likely be incorporated into 2-way GPU configurations on the upcoming Quadro cards, with 2 NVLinks per board. VirtualLink will also be supported; it is an alternate mode of USB Type-C intended for fast links to VR headsets. The new standard uses a single cable that carries a 10 Gbps data rate, 15 W of power & 4-lane DisplayPort HBR3 video.

NVIDIA’s new chip architecture includes 4608 CUDA cores & 576 tensor cores, with tensor core performance of 125 TFLOPS at FP16 precision, 250 TOPS at INT8 & 500 TOPS at INT4. This results in a 6x performance boost over the previous architecture (Pascal).
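The three tensor core figures follow a simple pattern: each halving of the operand width doubles the peak operation rate. The short Python sketch below checks that pattern & derives a per-core figure, using only the numbers quoted above. The "tera-bits processed per second" measure is a made-up illustrative metric, not something NVIDIA publishes.

```python
# Figures quoted in the article.
TENSOR_CORES = 576
PEAK_TERA_OPS = {"FP16": 125, "INT8": 250, "INT4": 500}

# Operand width of each precision mode, in bits.
OPERAND_BITS = {"FP16": 16, "INT8": 8, "INT4": 4}


def bit_throughput(mode: str) -> int:
    """Illustrative metric: peak tera-ops multiplied by operand width."""
    return PEAK_TERA_OPS[mode] * OPERAND_BITS[mode]


def per_core_tera_ops(mode: str) -> float:
    """Peak rate of a single tensor core, assuming an even split across cores."""
    return PEAK_TERA_OPS[mode] / TENSOR_CORES


for mode in PEAK_TERA_OPS:
    print(
        f"{mode}: {PEAK_TERA_OPS[mode]} tera-ops total, "
        f"{per_core_tera_ops(mode):.3f} per core, "
        f"bit throughput {bit_throughput(mode)}"
    )
```

The bit throughput comes out the same for all three modes, which suggests the lower-precision rates come from packing more, narrower operands through the same hardware rather than from extra units.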

Finally, an SDK will let developers use the tensor cores for deep learning, helping them integrate neural networks into image processing.


 
