New Offering Delivers a Unique Fusion Architecture That’s Being Leveraged by Industry-Leading AI Pioneers Like Cohere, CoreWeave, and NVIDIA to Deliver Breakthrough Performance Gains and Reduce Infrastructure Requirements For Massive AI Training and Inference Workloads
PARIS and CAMPBELL, Calif., July 8, 2025 /PRNewswire/ — At RAISE SUMMIT 2025, WEKA unveiled NeuralMesh Axon, a groundbreaking storage system that leverages an innovative fusion architecture to tackle the core challenges of running exascale AI applications and workloads. This cutting-edge system seamlessly integrates with GPU servers and AI factories, streamlining deployments, slashing costs, and dramatically enhancing AI workload responsiveness and performance. The result? Underutilized GPU resources are transformed into a unified, high-performance infrastructure layer.
Building on WEKA’s recently announced NeuralMesh storage system, this new offering enhances its containerized microservices architecture with powerful embedded functionality. This advancement enables AI pioneers, cloud providers, and neocloud services to accelerate AI model development at an unprecedented scale, particularly when combined with NVIDIA AI Enterprise software stacks for advanced model training and inference optimization. NeuralMesh Axon also supports real-time reasoning, significantly improving time-to-first-token and overall token throughput, allowing customers to bring innovations to market at breakneck speed.
AI Infrastructure Obstacles Compound at Exascale
In the world of large language model (LLM) training and inference workloads, performance is everything, especially at extreme scale. Organizations running massive AI workloads on traditional storage architectures face a host of challenges. These systems, which rely heavily on replication, waste NVMe capacity, suffer from significant inefficiencies, and struggle with unpredictable performance and resource allocation.
The root of the problem? Traditional architectures simply weren’t built to process and store massive data volumes in real time. This limitation creates bottlenecks in data pipelines and AI workflows that can bring exascale AI deployments to their knees. Underutilized GPU servers and outdated data architectures turn high-end hardware into expensive paperweights, resulting in costly downtime for training workloads. Inference workloads hit memory-bound barriers as key-value (KV) caches and hot data outgrow GPU memory, reducing throughput and straining infrastructure. Limited KV cache offload capacity creates data access bottlenecks and complicates resource allocation for incoming prompts, directly impacting operational costs and time-to-insight. While many organizations are turning to NVIDIA accelerated compute servers paired with NVIDIA AI Enterprise software to address these challenges, without modern storage integration they still face significant limitations in pipeline efficiency and overall GPU utilization.
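To make the KV cache offload problem concrete, here is a minimal sketch of a two-tier cache that spills cold attention key/value blocks from GPU memory to NVMe-backed storage instead of discarding them. All names, types, and capacities are illustrative assumptions, not part of WEKA’s NeuralMesh Axon API.

```python
# Minimal two-tier KV cache sketch: hot blocks stay in (simulated) GPU
# memory; the least recently used blocks spill to an NVMe-backed tier
# when capacity runs out. Illustrative assumption only, not a WEKA API.
from collections import OrderedDict


class TieredKVCache:
    def __init__(self, gpu_capacity_blocks: int):
        self.gpu = OrderedDict()   # hot tier, kept in LRU order
        self.nvme = {}             # cold tier, stands in for NVMe offload
        self.capacity = gpu_capacity_blocks

    def put(self, seq_id: str, kv_block: bytes) -> None:
        self.gpu[seq_id] = kv_block
        self.gpu.move_to_end(seq_id)
        while len(self.gpu) > self.capacity:
            cold_id, cold_block = self.gpu.popitem(last=False)
            self.nvme[cold_id] = cold_block  # offload rather than drop

    def get(self, seq_id: str) -> bytes:
        if seq_id in self.gpu:               # hot hit: memory-speed access
            self.gpu.move_to_end(seq_id)
            return self.gpu[seq_id]
        kv_block = self.nvme.pop(seq_id)     # cold hit: reload from NVMe
        self.put(seq_id, kv_block)           # promote back to the hot tier
        return kv_block                      # KeyError if never cached
```

The payoff of extra offload capacity in a sketch like this is that evicted KV blocks are reloaded from fast storage rather than recomputed from the original prompt, which is the mechanism behind the time-to-first-token improvements the release describes.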
Built For The World’s Largest and Most Demanding Accelerated Compute Environments
NeuralMesh Axon tackles these challenges head-on with its high-performance, resilient storage fabric that integrates directly into accelerated compute servers. By leveraging local NVMe, spare CPU cores, and existing network infrastructure, it creates a unified, software-defined compute and storage layer. This innovative approach delivers consistent microsecond latency for both local and remote workloads, outpacing traditional network protocols like NFS.
Moreover, when harnessing WEKA’s Augmented Memory Grid capability, it can provide near-memory speeds for KV cache loads at massive scale. Unlike replication-heavy approaches that waste aggregate capacity and crumble under failures, NeuralMesh Axon’s unique erasure coding design can withstand up to four simultaneous node losses, maintain full throughput during rebuilds, and enable predefined resource allocation across existing NVMe, CPU cores, and networking resources. This breakthrough transforms isolated disks into a memory-like storage pool at exascale and beyond, while ensuring consistent low-latency access to all addressable data.
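To put the replication-versus-erasure-coding tradeoff in rough numbers, here is a minimal sketch of the capacity math. The 16+4 stripe geometry is an illustrative assumption, not a published NeuralMesh Axon parameter; the release states only that the design tolerates up to four simultaneous node losses.

```python
# Capacity math sketch: replication vs. erasure coding.
# The 16+4 stripe is an assumed example, not WEKA's actual geometry.

def raw_per_usable_byte(data_shards: int, parity_shards: int) -> float:
    """Raw capacity consumed for each byte of usable data in a stripe."""
    return (data_shards + parity_shards) / data_shards

# Triple replication (one copy plus two redundant copies) survives two
# losses but consumes 3.0x raw capacity per usable byte.
print(f"3x replication: {raw_per_usable_byte(1, 2):.2f}x raw, tolerates 2 losses")

# A 16+4 erasure code survives any four simultaneous shard (node) losses,
# since data rebuilds from any 16 of the 20 shards, at only 1.25x raw.
print(f"16+4 erasure code: {raw_per_usable_byte(16, 4):.2f}x raw, tolerates 4 losses")
```

Under these assumed parameters, the erasure-coded layout tolerates more simultaneous failures than triple replication while using less than half the raw capacity, which is the aggregate-capacity advantage described above.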
Cloud service providers and AI innovators operating at exascale need infrastructure that can keep pace with the exponential growth in model complexity and dataset sizes. NeuralMesh Axon is tailor-made for organizations at the cutting edge of AI innovation that require immediate, extreme-scale performance rather than gradual scaling: AI cloud providers and neoclouds building AI services, regional AI factories, major cloud providers developing enterprise AI solutions, and large corporations deploying the most demanding AI inference and training workloads. All of them must scale nimbly and optimize their AI infrastructure investments to support rapid innovation cycles.
Delivering Game-Changing Performance for Accelerated AI Innovation
Early adopters, including Cohere, the industry’s leading security-first enterprise AI company, are already reaping transformative benefits.
Cohere is among the first customers to deploy NeuralMesh Axon to power its AI model training and inference workloads. Faced with sky-high innovation costs, data transfer bottlenecks, and underutilized GPUs, Cohere initially deployed NeuralMesh Axon in the public cloud to unify its AI stack and streamline operations.
\”For AI model builders, speed, GPU optimization, and cost-efficiency are mission-critical. It’s about using less hardware, generating more tokens, and running more models—without capacity constraints or data migration headaches,\” said Autumn Moulder, vice president of engineering at Cohere. \”By embedding WEKA’s NeuralMesh Axon into our GPU servers, we’ve maximized utilization and accelerated every step of our AI pipelines. The performance gains have been nothing short of revolutionary: Inference deployments that once took five minutes now happen in just 15 seconds, with checkpointing that’s 10 times faster. This breakthrough allows our team to iterate on and bring groundbreaking new AI models, like North, to market at an unprecedented pace.\”
To enhance training and support the development of North, Cohere’s secure AI agents platform, the company is deploying WEKA’s NeuralMesh Axon on CoreWeave Cloud. This move creates a robust foundation for real-time reasoning and delivers exceptional experiences for Cohere’s end customers.
\”We’re entering an era where AI advancement goes beyond raw compute power—it’s unleashed by intelligent infrastructure design,\” said Peter Salanki, CTO and co-founder at CoreWeave. \”CoreWeave is redefining the possibilities for AI pioneers by eliminating the complexities that constrain AI at scale. With WEKA’s NeuralMesh Axon seamlessly integrated into CoreWeave’s AI cloud infrastructure, we’re bringing processing power directly to the data, achieving microsecond latencies that slash I/O wait times and deliver more than 30 GB/s read, 12 GB/s write, and 1 million IOPS to a single GPU server. This groundbreaking approach boosts GPU utilization and empowers Cohere with the performance foundation they need to shatter inference speed barriers and deliver cutting-edge AI solutions to their customers.\”
\”AI factories are shaping the future of AI infrastructure built on NVIDIA accelerated compute and our ecosystem of NVIDIA Cloud Partners,\” said Marc Hamilton, vice president of solutions architecture and engineering at NVIDIA. \”By optimizing inference at scale and embedding ultra-low latency NVMe storage close to the GPUs, organizations can unlock more bandwidth and extend the available on-GPU memory for any capacity. Partner solutions like WEKA’s NeuralMesh Axon deployed with CoreWeave provide a critical foundation for accelerated inferencing while enabling next-generation AI services with exceptional performance and cost efficiency.\”
The Benefits of Fusing Storage and Compute For AI Innovation
NeuralMesh Axon delivers immediate, measurable improvements for AI builders and cloud service providers operating at exascale.
\”The infrastructure challenges of exascale AI are unlike anything the industry has faced before,\” said Ajay Singh, chief product officer at WEKA. \”We’re seeing organizations grapple with low GPU utilization during training and GPU overload during inference, while AI costs spiral into millions per model and agent. That’s why we engineered NeuralMesh Axon, born from our deep focus on optimizing every layer of AI infrastructure from the GPU up. Now, AI-first organizations can achieve the performance and cost efficiency required for competitive AI innovation when running at exascale and beyond.\”
Availability
NeuralMesh Axon is currently available in limited release for large-scale enterprise AI and neocloud customers, with general availability scheduled for fall 2025. For more information, visit www.weka.io.
About WEKA
WEKA is transforming how organizations build, run, and scale AI workflows through NeuralMesh™, its intelligent, adaptive mesh storage system. Unlike traditional data infrastructure, which becomes more fragile as AI environments expand, NeuralMesh becomes faster, stronger, and more efficient as it scales, growing with your AI environment to provide a flexible foundation for enterprise and agentic AI innovation. Trusted by 30% of the Fortune 50 and the world’s leading neoclouds and AI innovators, NeuralMesh maximizes GPU utilization, accelerates time to first token, and lowers the cost of AI innovation. Learn more at www.weka.io, or connect with us on LinkedIn and X.
WEKA and the W logo are registered trademarks of WekaIO, Inc. Other trade names herein may be trademarks of their respective owners.