NVIDIA CCCL 3.1 Adds Floating-Point Determinism Controls for GPU Computing

2026/03/06 01:46

Caroline Bishop Mar 05, 2026 17:46

NVIDIA's CCCL 3.1 introduces three determinism levels for parallel reductions, letting developers trade performance for reproducibility in GPU computations.

NVIDIA has rolled out determinism controls in CUDA Core Compute Libraries (CCCL) 3.1, addressing a persistent headache in parallel GPU computing: getting identical results from floating-point operations across multiple runs and different hardware.

The update introduces three configurable determinism levels through CUB's new single-phase API, giving developers explicit control over the reproducibility-versus-performance tradeoff that's plagued GPU applications for years.

Why Floating-Point Determinism Matters

Here's the problem: floating-point addition isn't strictly associative. Due to rounding at finite precision, (a + b) + c doesn't always equal a + (b + c). When parallel threads combine values in unpredictable orders, you get slightly different results each run. For many applications—financial modeling, scientific simulations, blockchain computations, machine learning training—this inconsistency creates real problems.
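The effect is easy to reproduce on a CPU. The sketch below uses two illustrative helper functions (the names are made up for this example and are not part of CCCL) that add the same three values with the two possible groupings.

```cpp
// Illustrative helpers (not part of CCCL): the same three values,
// summed with the two possible groupings.
double sum_left_first(double a, double b, double c) {
    return (a + b) + c;  // combine a and b first
}

double sum_right_first(double a, double b, double c) {
    return a + (b + c);  // combine b and c first
}
```

With a = 1e30, b = -1e30, and c = 1.0, the first grouping returns 1.0 and the second returns 0.0, because 1.0 is far below the rounding granularity of a double near 1e30 and is lost when it is added to b.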

The new API lets developers specify exactly how much reproducibility they need through three modes:

Not-guaranteed determinism prioritizes raw speed. It uses atomic operations that execute in whatever order threads happen to run, completing reductions in a single kernel launch. Results may vary slightly between runs, but for applications where approximate answers suffice, the performance gains are substantial—particularly on smaller input arrays where kernel launch overhead dominates.

Run-to-run determinism (the default) guarantees identical outputs when using the same input, kernel configuration, and GPU. NVIDIA achieves this by structuring reductions as fixed hierarchical trees rather than relying on atomics. Elements combine within each thread first, then across the threads of a warp via shuffle instructions, then across the warps of a block using shared memory, with a second kernel aggregating the per-block results.
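The difference between order-dependent accumulation and a fixed tree can be sketched on the CPU. This is only an analogy, not CUB's kernel code: `sum_in_order` and `tree_sum` are hypothetical names, and the `order` argument stands in for whatever sequence atomic updates happen to land in.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Order-dependent accumulation: `order` stands in for whatever sequence
// the hardware happens to commit atomic updates in.
double sum_in_order(const std::vector<double>& values,
                    const std::vector<std::size_t>& order) {
    double total = 0.0;
    for (std::size_t i : order) total += values[i];
    return total;
}

// Fixed pairwise tree: element i is always combined with element i + stride,
// so the grouping depends only on the input length, never on timing.
double tree_sum(std::vector<double> values) {
    assert(!values.empty());
    for (std::size_t stride = 1; stride < values.size(); stride *= 2) {
        for (std::size_t i = 0; i + stride < values.size(); i += 2 * stride) {
            values[i] += values[i + stride];
        }
    }
    return values[0];
}
```

For the input {1e30, 1.0, -1e30, 1.0}, `sum_in_order` returns 1.0 with order {0, 1, 2, 3} and 2.0 with order {0, 2, 1, 3}, while `tree_sum` returns 0.0 on every call. The exact sum is 2.0, which shows that a fixed tree buys reproducibility, not extra accuracy.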

GPU-to-GPU determinism provides the strictest reproducibility, ensuring identical results across different NVIDIA GPUs. The implementation uses a Reproducible Floating-point Accumulator (RFA) that groups input values into fixed exponent ranges—defaulting to three bins—to counter non-associativity issues that arise when adding numbers with different magnitudes.
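To give a feel for why binning by exponent range helps, here is a heavily simplified CPU sketch. It is not NVIDIA's RFA and should not be read as a description of it: the function name `binned_sum`, the two-pass structure, and the constants for headroom and bin width are all assumptions chosen to keep the example short. It also assumes finite inputs of ordinary magnitude, at most 1024 of them, and default round-to-nearest arithmetic without fast-math flags.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Simplified, CPU-only illustration of exponent binning. NOT NVIDIA's RFA:
// the constants and the two-pass structure are assumptions for this sketch.
double binned_sum(const std::vector<double>& values) {
    constexpr int kBins = 3;           // three bins, the default the article cites
    constexpr int kHeadroomBits = 10;  // room to add up to 2^10 slices exactly
    constexpr int kBinWidthBits = 40;  // exponent distance between bins
    assert(values.size() <= 1024);

    // Pass 1: the largest magnitude fixes the bin boundaries. A maximum does
    // not depend on visiting order, so neither do the boundaries.
    double max_abs = 0.0;
    for (double x : values) max_abs = std::max(max_abs, std::fabs(x));
    if (max_abs == 0.0) return 0.0;
    int exponent = 0;
    std::frexp(max_abs, &exponent);  // guarantees max_abs < 2^exponent

    double anchor[kBins];
    double bin[kBins] = {0.0, 0.0, 0.0};
    for (int k = 0; k < kBins; ++k) {
        anchor[k] = std::ldexp(1.5, exponent + 1 + kHeadroomBits - k * kBinWidthBits);
    }

    // Pass 2: split each value into one slice per bin. Adding and subtracting
    // the anchor rounds the value to that bin's fixed granularity, so every
    // slice in a bin is a multiple of the same power of two and the bin total
    // is exact under the stated assumptions, whatever the order.
    for (double x : values) {
        double rest = x;
        for (int k = 0; k < kBins; ++k) {
            volatile double shifted = rest + anchor[k];  // volatile: keep the rounding step
            double slice = shifted - anchor[k];
            bin[k] += slice;
            rest -= slice;
        }
        // Whatever is left in `rest` lies below the last bin and is dropped.
    }
    return (bin[2] + bin[1]) + bin[0];  // bins are combined in a fixed order
}
```

Under these assumptions, `binned_sum` returns 2.0 for {1e30, 1.0, -1e30, 1.0} in any ordering of the four values, the input that trips up both naive summation orders. More bins would capture smaller residuals at the cost of extra work per element, which matches the accuracy-versus-speed tradeoff described for the bin count.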

Performance Trade-offs

NVIDIA's benchmarks on H200 GPUs quantify the cost of reproducibility. GPU-to-GPU determinism increases execution time by 20% to 30% for large problem sizes compared to the relaxed mode. Run-to-run determinism sits between the two extremes.

The three-bin RFA configuration offers what NVIDIA calls an "optimal default" balancing accuracy and speed. More bins improve numerical precision but add intermediate summations that slow execution.

Implementation Details

Developers access the new controls through cuda::execution::require(), which constructs an execution environment object passed to reduction functions. The syntax is straightforward—set determinism to not_guaranteed, run_to_run, or gpu_to_gpu depending on requirements.

The feature only works with CUB's single-phase API; the older two-phase API doesn't accept execution environments.

Broader Implications

Cross-platform floating-point reproducibility has been a known challenge in high-performance computing and blockchain applications, where different compilers, optimization flags, and hardware architectures can produce divergent results from mathematically identical operations. NVIDIA's approach of explicitly exposing determinism as a configurable parameter rather than hiding implementation details represents a pragmatic solution.

The company plans to extend determinism controls beyond reductions to additional parallel primitives. Developers can track progress and request specific algorithms through NVIDIA's GitHub repository, where an open issue tracks the expanded determinism roadmap.
