TL;DR
A developer built an experimental Linux kernel module that makes USB4/Thunderbolt ports on consumer AMD mini PCs emulate InfiniBand devices. This allows fast, low-latency RDMA communication for AI workloads at home, without enterprise networking gear.
The project is a custom kernel module that makes consumer USB4/Thunderbolt ports function as InfiniBand devices, enabling Remote Direct Memory Access (RDMA) directly between two mini PCs. The developer reports around 95 Gb/s of bidirectional raw RDMA throughput with approximately 7 microseconds of one-way latency. This lets AI workloads such as tensor-parallel inference and Fully Sharded Data Parallel (FSDP) training run across multiple consumer boxes without enterprise networking gear.
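To give a feel for what those two figures mean for different message sizes, here is a rough first-order sketch that adds the fixed latency to the time needed to push the bits through the link. The latency and throughput constants are the developer's reported figures; the model itself, the symmetric split of the bidirectional figure, and the message sizes are assumptions made for illustration, not measurements.

```python
# First-order transfer-time model: fixed latency plus serialization time.
# The two reported constants come from the article; the model, the symmetric
# split, and the example message sizes are illustrative assumptions.

ONE_WAY_LATENCY_S = 7e-6                      # ~7 microseconds, as reported
BIDIRECTIONAL_GBPS = 95.0                     # ~95 Gb/s aggregate, as reported
PER_DIRECTION_GBPS = BIDIRECTIONAL_GBPS / 2   # assumes a symmetric split


def transfer_time_s(num_bytes: int) -> float:
    """Estimate the one-way transfer time for a message of num_bytes."""
    bits = num_bytes * 8
    return ONE_WAY_LATENCY_S + bits / (PER_DIRECTION_GBPS * 1e9)


if __name__ == "__main__":
    # Small messages are dominated by latency, large ones by bandwidth.
    for size in (4 * 1024, 1024 ** 2, 1024 ** 3):
        micros = transfer_time_s(size) * 1e6
        print(f"{size:>12} bytes: {micros:12.1f} us")
```

Under this simplified model, small messages are dominated by the fixed latency and large ones by bandwidth, which is why both numbers matter for workloads that mix many small synchronization messages with bulk tensor transfers.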
The tests used two AMD mini PCs linked by experimental RDMA-over-USB4 and showed large gains over traditional Ethernet and soft-RoCE configurations. For example, one FSDP LoRA training step on a 27-billion-parameter model dropped from 1,359 seconds over Ethernet to 126 seconds over USB4 RDMA. The link sustained about 48 Gb/s per direction, with latency notably lower than the Ethernet-based alternatives.
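The reported step times can be turned into a speedup figure with simple arithmetic. The sketch below uses only the numbers quoted above; the variable names are ours, not the developer's.

```python
# Speedup implied by the reported FSDP step times, plus a consistency check
# between the bidirectional and per-direction throughput figures.

ETHERNET_STEP_S = 1359.0    # reported step time over Ethernet
USB4_RDMA_STEP_S = 126.0    # reported step time over USB4 RDMA
BIDIRECTIONAL_GBPS = 95.0   # reported aggregate throughput

speedup = ETHERNET_STEP_S / USB4_RDMA_STEP_S
time_saved_fraction = 1 - USB4_RDMA_STEP_S / ETHERNET_STEP_S
per_direction_gbps = BIDIRECTIONAL_GBPS / 2   # assumes a symmetric split

print(f"speedup: {speedup:.1f}x")                       # speedup: 10.8x
print(f"step time reduced by {time_saved_fraction:.0%}")  # step time reduced by 91%
print(f"per direction: {per_direction_gbps} Gb/s")      # per direction: 47.5 Gb/s
```

The 47.5 Gb/s result matches the "about 48 Gb/s per direction" figure, so the two throughput numbers in the report are consistent with each other.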
The project is experimental. The developer emphasizes that the code is research-grade and largely AI-generated, and that it loads experimental kernel modules that can crash the machine. There is no support or warranty, and the software is not intended for production use.
Why It Matters
If it matures, this approach could bring high-performance AI training and inference within reach of consumer-grade hardware, at interconnect speeds previously limited to enterprise networks. It offers a potentially cost-effective alternative for researchers and hobbyists who want to run large AI workloads at home without expensive networking infrastructure.
While still in early stages, this project demonstrates the possibility of leveraging consumer hardware for advanced data center-like networking, which could influence future hardware and software designs for AI research and deployment.
As an affiliate, we earn on qualifying purchases.
Background
InfiniBand is a high-speed, low-latency networking technology used predominantly in data centers for AI, HPC, and enterprise workloads. Achieving similar performance at home has traditionally required expensive hardware and complex configurations. Recent generations of USB4 and Thunderbolt have increased the available bandwidth, but their use for RDMA-style communication has been limited. This project builds on ongoing efforts to extend high-performance networking to consumer hardware, where previous work has focused on soft-RoCE and other software-defined solutions. The developer’s experiments go further by emulating InfiniBand devices over USB4/Thunderbolt ports.
“This is experimental research code, most of it AI-generated, and it loads experimental kernel modules on machines I was willing to crash repeatedly.”
— the developer
“We built experimental RDMA-over-USB4 for 128GB Strix Halo mini PCs, enabling tensor-parallel inference and FSDP workloads across both machines.”
— the developer
What Remains Unclear
It remains unclear how stable and scalable this solution is for long-term or large-scale deployment. The software is experimental, and performance may vary across different hardware configurations. Compatibility with other systems and the potential for widespread adoption are still untested.
What’s Next
The developer plans to refine the kernel module, improve stability, and explore broader hardware support. Further testing will determine whether this approach can be made more reliable and accessible for a wider audience. Future developments may include open sourcing the project and collaborating with the community to evaluate its practical viability.
Key Questions
Can this be used in production environments?
No, this is experimental research code not intended for production. It is unstable and may cause system crashes.
What hardware is required to try this?
At minimum, a consumer AMD mini PC with USB4/Thunderbolt ports and the experimental kernel module loaded. Compatibility with other hardware remains untested.
How does performance compare to traditional networking?
The developer reports about 95 Gb/s of bidirectional throughput and one-way latency of around 7 microseconds, significantly outperforming Ethernet and soft-RoCE in the developer’s own tests.
Is this technology ready for everyday use?
No, it is still in early research stages, with many unknowns regarding stability, scalability, and compatibility.
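For readers wondering how to tell whether any RDMA device is visible on a Linux machine, the kernel's RDMA subsystem normally lists registered devices under /sys/class/infiniband. The sketch below simply reads that directory. Whether this particular experimental module registers its emulated device there is an assumption on our part, and the function name is hypothetical.

```python
import os
from typing import List


def list_rdma_devices(sysfs_dir: str = "/sys/class/infiniband") -> List[str]:
    """Return the names of RDMA devices listed under sysfs_dir.

    Returns an empty list when the directory does not exist, which usually
    means no RDMA driver has registered a device.
    """
    try:
        return sorted(os.listdir(sysfs_dir))
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    devices = list_rdma_devices()
    if devices:
        print("RDMA devices found:", ", ".join(devices))
    else:
        print("No RDMA devices registered.")
```

An empty result only shows that nothing has registered with the RDMA subsystem; it does not say whether the hardware itself is compatible.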
Source: Hacker News