As AI workloads grow with larger, multimodal models and on-device generative and agentic AI, edge systems need more than incremental compute. They require dedicated acceleration for real-time performance, lower power, strong data privacy, and scalability. The Ara240 Discrete Neural Processing Unit (DNPU) is built to meet these edge AI demands.
As NXP’s first discrete neural processing unit (DNPU), the Ara240 offers an AI-optimized architecture with up to 40 equivalent Tera Operations Per Second (eTOPS), large on-chip memory and high off-chip bandwidth. In short, it is purpose-built to run advanced AI workloads, including large language models (LLMs), vision language models (VLMs), multimodal language models and next-generation edge inference.
Whether designing industrial automation systems, autonomous robots, smart infrastructure, advanced human‑machine interface (HMI) platforms or edge servers, the Ara240 DNPU provides the performance headroom needed to run modern AI workloads directly at the edge.
Figure 1. The Ara240 DNPU enables real-time, on-device inference for advanced AI applications
How Ara240 DNPU is Purpose-Built for Advanced AI at the Edge
The Ara240 DNPU is architected specifically for high-demand on-device AI applications where latency, privacy and power efficiency are critical. With support for the most widely used AI model architectures, including Convolutional Neural Networks (CNNs), transformers, LLMs, VLMs and multimodal models, the Ara240 DNPU enables developers to bring generative and high-performance AI to embedded and edge systems without relying on cloud compute.
Key technical capabilities of the DNPU include:
- Up to 40 equivalent TOPS (eTOPS): High throughput for complex, parallel AI workloads offloaded from the host processor
- Large on-chip memory plus a dedicated LPDDR4 interface (up to 16 GB): Supports larger models and higher-bandwidth processing without increasing contention for host memory
- PCIe Gen4 x4 and USB 3.2 Gen1 host interfaces: Delivering flexible, high-speed integration
- Secure boot and hardware root-of-trust: Enabling secure AI pipelines and protected deployment
- Runtime support for Linux and Windows: Offering broad compatibility for edge systems
- Framework support: Including TensorFlow, PyTorch and ONNX (see the export sketch below)
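Because ONNX is one of the named interchange formats, a typical first step in targeting an accelerator like the Ara240 is exporting a trained model to ONNX. The PyTorch snippet below is a minimal sketch of that generic step only; the toy model, file name and any Ara240-specific compilation tooling are illustrative assumptions, not NXP's documented flow.

```python
import torch
import torch.nn as nn

# A toy CNN standing in for a real vision model; the Ara240-specific
# compile/deploy step is not covered here, so this shows only the
# generic export to ONNX, one of the supported interchange formats.
class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        return self.fc(self.pool(x).flatten(1))

model = TinyCNN().eval()
dummy = torch.randn(1, 3, 224, 224)  # example NCHW input

# "tiny_cnn.onnx" is a placeholder file name for this sketch.
torch.onnx.export(
    model, dummy, "tiny_cnn.onnx",
    input_names=["input"], output_names=["logits"],
    opset_version=17,
)
```

From here, the resulting .onnx file would be fed to whatever conversion tooling the Ara240 software stack provides.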
Designed as a scalable AI companion processor, the Ara240 DNPU brings powerful AI acceleration to systems that must execute complex inference locally, delivering lower latency, reduced cloud cost and stronger data privacy.
Prototype Faster with the Ara240 16GB M.2 Module
To help developers evaluate the Ara240 DNPU quickly, NXP offers the Ara240 16GB M.2 Module, designed for seamless integration into any host platform with an M-Key PCI Express (PCIe) interface.
Module highlights include:
- Up to 40 eTOPS of AI performance
- Proprietary neural network processor operating at up to 900 MHz
- 16 GB low-power double data rate 4 (LPDDR4) memory
- M.2 2280 M‑Key form factor
- PCIe Gen4 x1/x2/x4 configurations
- Currently supported with i.MX 8M Plus and i.MX 95 applications processors
This module provides a simplified path for evaluating Ara240 performance, accelerating proof-of-concept development and integrating high-performance AI into existing designs. The Ara240 16GB M.2 Module will be available from nxp.com and distributors in June 2026.
Figure 2. The Ara240 16GB M.2 Module allows developers to prototype faster with the Ara240
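Since the module attaches over PCIe, a quick first check on a Linux host is whether it enumerates at all. The sketch below simply lists every PCIe function visible via sysfs; the Ara240's vendor and device IDs are not given in this post, so no ID filtering is assumed.

```python
from pathlib import Path

# List every PCIe function the Linux kernel has enumerated via sysfs.
# The Ara240's vendor/device IDs are not published here, so we print
# all devices and let the reader spot the new entry (for example, by
# comparing output before and after inserting the module).
for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
    vendor = (dev / "vendor").read_text().strip()
    device = (dev / "device").read_text().strip()
    print(f"{dev.name}: vendor={vendor} device={device}")
```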
Better Together: Ara + i.MX
The Ara240 M.2 module can be used as an AI co-processor with our i.MX applications processors, including the i.MX 8M Plus and i.MX 95.
This adaptability makes it easy for developers currently working with NXP MPUs to significantly scale AI performance using the Ara240 DNPU as a companion accelerator.
Partner Ecosystem Offering Compact, Scalable Ara240 Accelerator Modules
In addition to the NXP Ara240 M.2 module, our ecosystem partners are releasing their own Ara240‑based modules.
| Partner | Product | On-Board Memory | Power | Module Type | Unique Features |
|---|---|---|---|---|---|
| F&S | FS-M2-AI | 16 GB | ~12 W typical | M.2 | Made in Germany; F&S engineering support and AI workshops |
| Forlinx | FAI-ARA240-M | 8/16 GB | ~12 W typical | M.2 | China-based manufacturing; high-bandwidth architecture for large models |
| Gateworks | GW16168 | 16 GB | ~12 W typical | M.2 | Made in USA; secure boot and root of trust |
| Geniatech | AIM-M-K | 8/16 GB | ~12 W typical | M.2 | China-based manufacturing; seamless integration into edge servers |
These modules make it easy to evaluate the Ara240 DNPU in different thermal, mechanical and performance configurations, supporting applications such as industrial PCs, robotics systems and compact embedded edge devices. Together, these boards support a smooth development path from early evaluation through full system design.
Purpose-Built Software to Enable AI in the Physical World
NXP’s eIQ® Agentic AI Framework extends the eIQ AI software development environment with capabilities specifically designed to leverage dedicated NPU acceleration at the edge. The framework enables deterministic, real-time execution of agentic AI workloads by coordinating multiple models, such as vision, language and control, while mapping inference and decision-making efficiently onto hardware accelerators rather than general-purpose CPUs.
By combining hardware-aware model preparation, optimized orchestration and secure on-device execution, our eIQ Agentic AI Framework allows DNPUs to sustain low latency and predictable performance for autonomous and generative AI workloads, reducing cloud dependence while simplifying deployment of complex, multimodal edge AI systems.
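The eIQ Agentic AI Framework's actual APIs are not shown in this post, so as a purely illustrative sketch of the coordination pattern described above, here is a minimal sense-reason-act loop in plain Python. Every function name below is a hypothetical stand-in, not an eIQ call.

```python
# Hypothetical stand-ins for DNPU-resident models; none of these names
# come from the eIQ Agentic AI Framework. They only illustrate the
# vision -> language -> control coordination described above.
def vision_model(frame):
    """Pretend object detector (would run on the Ara240)."""
    return ["pallet", "person"]

def language_model(prompt):
    """Pretend small LLM producing a plan (would run on the Ara240)."""
    return f"slow down and route around: {prompt}"

def control_policy(plan):
    """Maps the plan to a deterministic actuator command on the host."""
    return {"cmd": "slow_down", "reason": plan}

def agent_step(frame):
    # One deterministic sense -> reason -> act cycle; model inference
    # stays on the accelerator, leaving the host CPU for control and I/O.
    objects = vision_model(frame)
    plan = language_model("obstacles: " + ", ".join(objects))
    return control_policy(plan)

if __name__ == "__main__":
    print(agent_step(frame=None))
```

In a real deployment, the framework's orchestration layer, rather than hand-written glue like this, would schedule each model onto the DNPU and enforce the deterministic timing the post describes.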
With the Ara240 DNPU—and a growing ecosystem of M.2 modules—developers gain:
- Scalable AI performance
- Real‑time inference capability
- Improved privacy through local processing
- Lower operational and cloud costs
- Flexibility to support evolving model architectures
Accelerate real-time, on-device AI at the edge with the Ara240, our first DNPU, which delivers up to 40 eTOPS with scalable memory and bandwidth. Learn more about the Ara240 DNPU.