TL;DR
A team has built a distributed probabilistic computer with one million p-bits, exceeding the capacity of earlier single-chip designs. It performs Gibbs sampling at more than one trillion flips per second and is aimed at hard optimization problems.
Researchers have built a distributed probabilistic computer with one million p-bits, breaking the capacity limits of single-chip systems. This development enables large-scale sampling and optimization tasks, such as solving complex spin glass models, with high speed and efficiency. The system networks FPGAs to operate as a unified Ising machine, a significant step forward in hardware-based probabilistic computing.
The new architecture networks FPGAs into a single, scalable probabilistic platform that keeps all coupling weights in local on-chip memory. It performs Gibbs sampling at more than one trillion flips per second, faster than previous systems confined to a single chip. Devices exchange only 1-bit boundary states with each other, which keeps inter-chip traffic small.
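The idea of partitioned Gibbs sampling with 1-bit boundary exchange can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the team's FPGA design: the ring topology, the random +/-1 couplings, and every function and parameter name (`make_ring`, `exchange_boundaries`, `sweeps_per_exchange`, and so on) are hypothetical. Each partition stands in for one device. It reads its own p-bits live and sees remote p-bits only through a snapshot that is refreshed at each exchange.

```python
import math
import random


def make_ring(n, rng):
    """Ring of n >= 3 p-bits with random +/-1 couplings (toy stand-in for a spin glass)."""
    neighbors = {i: [] for i in range(n)}
    for i in range(n):
        j = (i + 1) % n
        w = rng.choice((-1.0, 1.0))
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))
    return neighbors


def exchange_boundaries(spins, neighbors, owner, n_parts):
    """Give each partition a 1-bit snapshot of the remote p-bits it couples to."""
    views = [dict() for _ in range(n_parts)]
    for i, nbrs in neighbors.items():
        for j, _ in nbrs:
            if owner[j] != owner[i]:
                views[owner[i]][j] = spins[j]
    return views


def sweep_partition(part, spins, views, neighbors, owner, beta, rng):
    """One Gibbs sweep over the p-bits owned by `part`, updating `spins` in place."""
    for i, nbrs in neighbors.items():
        if owner[i] != part:
            continue
        field = 0.0
        for j, w in nbrs:
            # Local neighbours are read live; remote ones come from the snapshot.
            s_j = spins[j] if owner[j] == part else views[part][j]
            field += w * s_j
        # p-bit rule: m_i = sgn(tanh(beta * I_i) - r), with r uniform in (-1, 1).
        spins[i] = 1 if math.tanh(beta * field) > rng.uniform(-1.0, 1.0) else -1


def energy(spins, neighbors):
    """Ising energy E = -sum over bonds of J_ij * m_i * m_j (each bond counted once)."""
    return -sum(w * spins[i] * spins[j]
                for i, nbrs in neighbors.items() for j, w in nbrs if i < j)


def run(n=64, n_parts=4, sweeps=200, sweeps_per_exchange=1, beta=2.0, seed=0):
    rng = random.Random(seed)
    neighbors = make_ring(n, rng)
    owner = [i * n_parts // n for i in range(n)]  # contiguous blocks of the ring
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    views = exchange_boundaries(spins, neighbors, owner, n_parts)
    for t in range(sweeps):
        if t % sweeps_per_exchange == 0:
            views = exchange_boundaries(spins, neighbors, owner, n_parts)
        # Partitions never read each other's live state, so running them one
        # after another here gives the same result as running them in parallel.
        for part in range(n_parts):
            sweep_partition(part, spins, views, neighbors, owner, beta, rng)
    return energy(spins, neighbors), spins
```

Calling `run(sweeps_per_exchange=1)` and `run(sweeps_per_exchange=50)` gives a rough feel for how stale boundary information changes the final energy, although a ring this small is far too simple to reproduce the spin glass results described in the article.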
The team tested the machine on three-dimensional Edwards-Anderson spin glasses and found that performance is governed by a single timing ratio, eta = f_comm / f_p-bit, which compares how often devices exchange boundary states with how often p-bits update. When this ratio exceeds a topology-dependent threshold, the distributed machine matches the performance of a monolithic GPU reference. Below the threshold, residual energy decays more slowly, so communicating less often costs solution quality. A theoretical cluster mean-field model reproduces this behavior, suggesting it is a universal property of partitioned stochastic dynamics.
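A short back-of-the-envelope helper makes the timing ratio concrete. It is only a sketch: the slab partitioning, the lattice side, the frequencies, and the threshold value in the usage note are illustrative assumptions, since the article does not give the actual lattice layout, clock rates, or threshold.

```python
import math


def timing_ratio(f_comm_hz, f_pbit_hz):
    """eta = f_comm / f_p-bit: boundary exchanges per p-bit update."""
    if f_pbit_hz <= 0:
        raise ValueError("f_pbit_hz must be positive")
    return f_comm_hz / f_pbit_hz


def matches_monolithic(eta, eta_threshold):
    """True when eta exceeds the threshold (threshold value is a hypothetical input)."""
    return eta > eta_threshold


def boundary_bits_per_exchange(side, n_slabs):
    """Bits one slab sends per exchange for a periodic side^3 lattice cut into slabs.

    Assumes nearest-neighbour couplings and slabs at least two layers thick,
    so each slab exposes two distinct faces of side * side p-bits.
    """
    if n_slabs < 2 or side % n_slabs != 0 or side // n_slabs < 2:
        raise ValueError("need >= 2 slabs, each an integer number >= 2 of layers")
    return 2 * side * side


def boundary_fraction(side, n_slabs):
    """Share of a slab's p-bits whose state must be sent to a neighbouring slab."""
    pbits_per_slab = side ** 3 // n_slabs
    return boundary_bits_per_exchange(side, n_slabs) / pbits_per_slab
```

For example, a 100 x 100 x 100 lattice (one million sites) cut into 4 slabs would send 20,000 bits per slab per exchange, about 8% of each slab's p-bits. With made-up rates, `timing_ratio(1e6, 1e8)` gives 0.01, and `matches_monolithic(0.01, 0.1)` returns `False` for a hypothetical threshold of 0.1.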
Implications for Large-Scale Probabilistic Computing
This breakthrough makes it possible to scale probabilistic computers well beyond the limitations of single-chip hardware, opening new avenues for solving complex optimization problems, such as Max-Cut and Boolean satisfiability, more efficiently. It also provides a foundation for future hardware accelerators that can perform large-scale sampling with high speed, which could impact fields like machine learning, physics simulations, and combinatorial optimization.

Advances in Probabilistic Hardware and Distributed Systems
Prior to this development, probabilistic computers built from p-bits were limited to single-chip implementations, constrained by memory bandwidth and capacity. The concept of hardware accelerators for sampling and optimization has been explored, but scaling has remained a challenge. This work builds on recent research into Ising machines and stochastic hardware, demonstrating that distributed architectures can overcome these limits by networking multiple FPGAs into a unified system. The approach leverages local memory and minimal boundary communication to maintain performance at scale.
“This architecture demonstrates that distributed probabilistic systems can scale to millions of p-bits while maintaining high performance.”
— an anonymous researcher

Unanswered Questions on Scalability and Practical Use
It is not yet clear how this architecture performs on a wider range of real-world problems beyond the initial tests on spin glasses. The long-term stability, energy efficiency, and integration into existing computing ecosystems remain to be evaluated. Additionally, the optimal boundary exchange frequency for different applications and topologies needs further exploration.

Next Steps for Development and Application Testing
Future work will likely focus on testing the system with a broader set of optimization problems, refining the boundary communication protocols, and exploring hardware implementations for commercial use. Researchers may also investigate how to further increase the number of p-bits and improve energy efficiency, aiming to develop scalable, hardware-accelerated probabilistic computing platforms for practical applications.

Key Questions
What is a p-bit?
A p-bit (probabilistic bit) is a hardware element that can stochastically switch between states, enabling probabilistic computing and sampling for optimization problems.
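In the p-bit literature this behavior is usually written as m = sgn(tanh(beta * I) - r), where I is the input to the p-bit and r is a random number drawn uniformly from (-1, 1). The sketch below simulates that rule in software. The function names and default values are hypothetical, and it says nothing about how the FPGA implementation generates its randomness.

```python
import math
import random


def pbit_sample(input_current, beta, rng):
    """Return +1 or -1; the chance of +1 is (1 + tanh(beta * I)) / 2."""
    return 1 if math.tanh(beta * input_current) > rng.uniform(-1.0, 1.0) else -1


def empirical_mean(input_current, beta=1.0, n=20000, seed=0):
    """Average many samples; this should approach tanh(beta * I)."""
    rng = random.Random(seed)
    return sum(pbit_sample(input_current, beta, rng) for _ in range(n)) / n
```

With zero input the p-bit behaves like a fair coin, and a large positive or negative input pins it close to +1 or -1. Everything in between is what lets a network of p-bits sample from a probability distribution.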
How does this system compare to traditional computers?
Unlike conventional deterministic computers, this system performs large-scale probabilistic sampling at more than one trillion flips per second, making it suited to specific sampling and optimization tasks rather than general-purpose computing.
Can this architecture solve real-world problems?
Initial tests on three-dimensional spin glasses show promise, and problems such as Max-Cut and Boolean satisfiability are natural targets, but further validation is needed to confirm the system's effectiveness in practical applications.
What are the main technical challenges ahead?
Challenges include optimizing boundary communication, scaling the number of p-bits further, and integrating the system into existing hardware and software ecosystems.
When might this technology become commercially available?
It is too early to predict commercial availability; ongoing research will determine how soon these systems can be adapted for practical use.
Source: Hacker News