TL;DR

Norway’s National Library is developing a Norwegian-language large language model (LLM) using 2 petabytes of Huawei flash storage. The project aims to create a sovereign AI that understands Norwegian culture and language. Training is ongoing, and technical challenges remain.

Norway’s National Library is actively training a Norwegian-language large language model (LLM) using 2 petabytes of Huawei OceanStor Dorado flash storage, aiming to develop a sovereign AI that reflects Norwegian language, culture, and history.

The project was discussed by Marius Husnes, Head of IT at the National Library, during Huawei’s ID Forum 2026 in Paris. Norway’s government tasked the library with creating a sovereign AI that preserves and represents Norwegian cultural heritage. The model draws on the library’s extensive digital collection of books, newspapers, web content, and multimedia, which amounts to around 20 petabytes of unique data.

The library has been digitizing its collection since 2005 and now holds approximately 60 petabytes of data in total, stored across multiple media types, including optical discs and tapes. Husnes explained that the main challenge is not compute power but data quality, cleaning, and pipeline throughput. The data pipeline covers ingestion, cleaning, deduplication, normalization, and validation, and its storage layer consists of Huawei OceanStor Dorado all-flash arrays, which provide low-latency processing.
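To make those pipeline stages more concrete, here is a minimal Python sketch under stated assumptions. It is not the National Library's code, and nothing in the talk describes its implementation. The `Document` type, the stage functions, and the `min_chars` threshold are all hypothetical. Cleaning is reduced to stripping control characters and collapsing whitespace, deduplication is exact-match hashing, normalization is Unicode NFC, and validation is a simple length check. The stages run in the order the article lists them.

```python
import hashlib
import unicodedata
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Document:
    # Hypothetical record type: an identifier plus its raw or processed text.
    doc_id: str
    text: str


def ingest(raw: Iterable[tuple[str, str]]) -> Iterator[Document]:
    # Ingestion: wrap (id, text) pairs from some upstream source as Documents.
    for doc_id, text in raw:
        yield Document(doc_id, text)


def clean(docs: Iterable[Document]) -> Iterator[Document]:
    # Cleaning: control characters (tabs, newlines, ...) become spaces, other
    # invisible characters such as soft hyphens are dropped, whitespace collapses.
    for d in docs:
        out = []
        for c in d.text:
            cat = unicodedata.category(c)
            if cat == "Cc":
                out.append(" ")
            elif not cat.startswith("C"):
                out.append(c)
        yield Document(d.doc_id, " ".join("".join(out).split()))


def dedupe(docs: Iterable[Document]) -> Iterator[Document]:
    # Exact deduplication: keep the first document for each SHA-256 digest of
    # its NFC-normalized text.
    seen: set[str] = set()
    for d in docs:
        key = unicodedata.normalize("NFC", d.text).encode("utf-8")
        digest = hashlib.sha256(key).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield d


def normalize(docs: Iterable[Document]) -> Iterator[Document]:
    # Normalization: Unicode NFC gives characters like "å" one canonical form.
    for d in docs:
        yield Document(d.doc_id, unicodedata.normalize("NFC", d.text))


def validate(docs: Iterable[Document], min_chars: int) -> Iterator[Document]:
    # Validation: drop texts too short to be useful (threshold is arbitrary).
    for d in docs:
        if len(d.text) >= min_chars:
            yield d


def pipeline(raw: Iterable[tuple[str, str]], min_chars: int = 20) -> list[Document]:
    # Stages chained lazily in the order: ingestion, cleaning, deduplication,
    # normalization, validation.
    return list(validate(normalize(dedupe(clean(ingest(raw)))), min_chars))


if __name__ == "__main__":
    raw = [
        ("a", "Bla\u030ab\u00e6rsyltet\u00f8y er\tgodt p\u00e5 br\u00f8d."),
        ("b", "Bl\u00e5b\u00e6rsyltet\u00f8y er godt p\u00e5 br\u00f8d."),
        ("c", "kort"),
        ("d", "Nasjonalbiblioteket\u00ad digitaliserer\n\naviser."),
    ]
    for doc in pipeline(raw):
        print(doc.doc_id, doc.text)
```

Because deduplication runs before normalization in this ordering, the sketch hashes an NFC-normalized copy of each text, so a precomposed "å" and an "a" followed by a combining ring are treated as the same string. A pipeline at the scale described would likely also need near-duplicate detection and streaming I/O, which this sketch leaves out.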

The training process utilizes Norway’s national supercomputer, Sigma2 Olivia, equipped with 448 GPUs and over 64,000 CPU cores, connected to a 5.3 petabyte Cray storage system. Husnes highlighted difficulties in moving data from the large, cost-optimized preservation system to the AI pipeline, which requires high throughput and low latency. The project is still in progress, with ongoing efforts to develop evaluation tools, governance policies, and system orchestration.

Why It Matters

This project demonstrates the increasing role of Huawei’s flash storage solutions in European AI infrastructure, especially for small nations seeking to build sovereign AI models. It highlights the technical and governance challenges involved in developing language-specific LLMs, which are crucial for preserving cultural identity and ensuring national autonomy in AI development.

For countries with less dominant languages, creating a local LLM ensures better representation of local history, culture, and news, addressing limitations of globally trained, English-centric models. Norway’s initiative may serve as a blueprint for other non-English-speaking nations aiming for sovereignty in AI technology.

Background

Norway’s effort is part of a broader global trend where smaller nations seek to develop localized AI models to preserve cultural identity and ensure control over their data and AI applications. The project builds on Norway’s extensive digital archives, digitized since 2005, and responds to the lack of commercial solutions tailored for Norwegian language and culture. The use of Huawei storage solutions indicates significant involvement of Chinese technology in European AI infrastructure, amidst geopolitical considerations.

“No private company has this.”

— Marius Husnes, Head of IT at the Norwegian National Library

“The bottleneck is not compute; it’s data quality, cleaning, and pipeline throughput.”

— Husnes

“We are still learning about evaluation, governance, and orchestration.”

— Husnes

What Remains Unclear

It is not yet clear how effective the Norwegian LLM will be in real-world applications or how governance and access policies will be finalized. The development of evaluation tools and standards for such a language-specific model remains ongoing, and the project’s long-term impact is still uncertain.

What’s Next

The next steps include completing the training process, developing evaluation and governance frameworks, and integrating the LLM into practical applications. Norway plans to refine its models and policies, with potential public deployment once these challenges are addressed. Monitoring how the model performs and is adopted will be key.

Key Questions

Why is Norway developing its own language model?

Norway aims to create a sovereign AI that accurately reflects its language, culture, and history, which is not fully possible with globally trained, English-centric models.

What role does Huawei storage play in this project?

Huawei’s OceanStor Dorado all-flash arrays provide high-performance, low-latency storage crucial for managing and processing the large-scale datasets used in training the Norwegian LLM.

What are the main technical challenges faced?

The primary challenges involve data quality, pipeline throughput, and efficiently moving petabyte-scale datasets from archival storage to AI training environments.

When will the Norwegian LLM be ready for deployment?

The project is still ongoing; no specific deployment date has been announced. Completion depends on resolving evaluation, governance, and technical issues.

Could this approach be adopted by other countries?

Potentially, especially for small or non-English-speaking nations seeking to preserve their language and culture through AI, though technical and governance challenges will vary by context.

Source: Hacker News
