Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch

TL;DR

NanoEuler is a research project that builds a GPT-2-sized language model entirely from scratch in C and CUDA. It trains a 116M-parameter model on a single consumer GPU, focusing on transparent, from-scratch engineering rather than practical AI capabilities.

A developer has released NanoEuler, a GPT-2-scale language model built entirely from scratch in C and CUDA, without relying on external ML libraries or frameworks. This project emphasizes transparent engineering, verified correctness, and educational value, training a 116 million-parameter model on a single consumer GPU.

NanoEuler’s codebase includes a hand-written tokenizer, a complete training pipeline, and custom CUDA kernels for matrix multiplication and attention. The model is a decoder-only transformer with modern components: RMSNorm, rotary position embeddings, a SwiGLU feed-forward block, and grouped-query attention. It is trained on a mixture of books and web data and produces fluent but shallow English with no real-world knowledge.

The project verifies its gradients rigorously: analytic gradients are compared against finite differences in double precision, which confirms that the backpropagation implementation is correct. Small models run on CPU and larger ones on GPU; training takes a few hours for the small CPU model and longer for the GPU version. The project is explicitly intended for research and education, not practical AI deployment.

At a glance

Report. When: announced April 2024

The development: The project introduces a fully from-scratch implementation of a GPT-2 scale language model using C and CUDA, verified through detailed gradient checks.

Implications for AI Development and Education

By building a language model entirely from scratch in C and CUDA, NanoEuler shows that a transformer can be developed with every component visible and under the author's control. It gives researchers and students a way to study the inner workings of transformers and training pipelines without going through large frameworks such as PyTorch or TensorFlow, which hide much of that detail behind high-level APIs. While the current model’s capabilities are limited, the project shows how the fundamental components can be implemented and verified independently, which supports deeper understanding and may encourage further work in AI engineering.

Background on From-Scratch AI Model Projects

Recent years have seen widespread adoption of large language models built using high-level frameworks, often obscuring the underlying implementation details. This project stands out by intentionally avoiding such dependencies, instead opting for a fully from-scratch approach. The developer notes that previous efforts in this space have focused on scaling or fine-tuning existing models, but NanoEuler aims to provide a complete, verified pipeline for training a transformer from first principles, emphasizing correctness and educational value. The project also draws inspiration from neural ODEs and residual networks, framing the model as a discretized differential equation.
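The source does not spell out the differential-equation framing, but the standard connection between residual networks and neural ODEs is short enough to state. The notation below ($x_l$ for the hidden state after layer $l$, $f_l$ for the attention or feed-forward sub-block, $h$ for the step size) is an assumption for illustration, not the project's own.

```latex
% A residual layer updates the hidden state by adding a learned correction:
x_{l+1} = x_l + f_l(x_l)

% Forward Euler applied to the ODE  dx/dt = f(x, t)  with step size h gives:
x(t + h) \approx x(t) + h \, f\bigl(x(t), t\bigr)

% Setting h = 1 and identifying the layer index l with time t makes the two
% updates coincide, so a stack of L residual layers can be read as L Euler
% steps along a trajectory of the hidden state.
```

Read this way, depth plays the role of integration time, which is presumably where the "Euler" in the project's name comes from.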

“Our goal was to own every piece of the training pipeline, from tokenization to CUDA kernels, ensuring transparency and correctness.”
— the project creator

Unverified Capabilities and Future Potential

While the project confirms the correctness of the implementation and the ability to train a GPT-2-like model from scratch, it remains unclear how well the model performs beyond basic language generation. The output is fluent but shallow, lacking real-world knowledge or robustness. It is also uncertain whether further scaling or optimization could significantly improve its capabilities or efficiency, as the current focus is on correctness and transparency rather than performance.

Next Steps in Development and Community Engagement

The developer plans to extend the training data, improve the model’s fluency, and experiment with fine-tuning for specific tasks. They also intend to enhance the transparency of each component, potentially creating educational resources or tutorials based on the project. Community feedback and collaboration could accelerate development, as the project is openly shared for research and educational use. Further verification and benchmarking against existing models are expected in upcoming updates.

Key Questions

Can NanoEuler be used for practical AI applications?

Currently, NanoEuler is a research and educational project. Its small size and shallow knowledge limit practical use, but it demonstrates foundational engineering principles.

What are the main technical challenges in building from scratch?

Correct backpropagation, efficient CUDA kernels, and a reliable training pipeline, all implemented without external libraries, are the main challenges the project addresses.

How does the model compare to commercial GPT-2 implementations?

While similar in architecture, NanoEuler’s model is smaller, less capable, and primarily for educational purposes. It does not match the performance or knowledge of optimized commercial models.

Is the codebase available for public use?

Yes, the project is open source and publicly available, which encourages transparency and community involvement.

What are the long-term goals of this project?

The main goal is to own and understand every component of a transformer-based language model, paving the way for more transparent and controllable AI development.

Source: Hacker News

Author

Tech Trend Trove Team