How Lenovo is scaling Level 4 autonomous robotaxis on Arm https://www.edge-ai-vision.com/2026/02/how-lenovo-is-scaling-level-4-autonomous-robotaxis-on-arm/ Fri, 20 Feb 2026 09:00:53 +0000

This blog post was originally published at Arm’s website. It is reprinted here with the permission of Arm.

As L4 robotaxis shift from pilot to production, Arm offers the compute foundation needed to deliver end-to-end physical AI that scales across vehicle fleets.

After years of autonomous driving pilots and controlled trials, the automotive industry is moving toward the production-scale deployment of Level 4 (L4) robotaxis. This marks a significant moment for artificial intelligence (AI): it moves from advising humans on recommended actions to enabling vehicles that perceive their environment and act on it without human intervention, although that shift comes with a steep increase in technical demands.

Compared with today's advanced L2++ vehicles, L4 systems typically require a broader sensor stack, such as LiDAR, cameras, and radar, which drives data processing requirements from roughly 25GB per hour to as much as 19TB per hour. This has forced a fundamental rethink of compute for physical AI.
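To put those hourly figures in terms of sustained throughput, the back-of-the-envelope arithmetic below uses only the numbers quoted above; the conversion to per-second rates is ours, not a figure from Lenovo or Arm.

```python
# Back-of-the-envelope: sustained data rates implied by the figures quoted above.
L2_GB_PER_HOUR = 25   # ~25 GB/hour for an advanced L2++ vehicle
L4_TB_PER_HOUR = 19   # up to ~19 TB/hour for an L4 sensor stack

SECONDS_PER_HOUR = 3600

l2_rate_mb_s = L2_GB_PER_HOUR * 1e3 / SECONDS_PER_HOUR   # MB/s
l4_rate_gb_s = L4_TB_PER_HOUR * 1e3 / SECONDS_PER_HOUR   # GB/s

print(f"L2++: ~{l2_rate_mb_s:.1f} MB/s sustained")
print(f"L4:   ~{l4_rate_gb_s:.2f} GB/s sustained "
      f"(~{L4_TB_PER_HOUR * 1e3 / L2_GB_PER_HOUR:.0f}x more data per hour)")
```

Roughly 7 MB/s versus more than 5 GB/s of sustained sensor data is the scale gap that motivates the rethink described above.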

To that end, Lenovo has developed the L4 Autonomous Driving Domain Controller AD1, a production-ready autonomous driving computing platform powered by dual Arm-based NVIDIA DRIVE AGX Thor chips. WeRide is deploying the platform in its GXR Robotaxi, the world's first mass-produced L4 autonomous vehicle.

Inside Lenovo AD1

The Lenovo AD1 serves as the central brain inside the GXR Robotaxi, managing multiple functions, from perception, prediction, and trajectory planning to real-time motion control and safety monitoring. The platform is designed for production-grade L4 autonomy in robotaxis and other autonomous vehicles. Supporting over 2,000 TOPS of AI capacity, it enables dense perception, prediction, and planning models to run simultaneously for faster, better decision-making on the roads.

For robotaxis, architectures built from many loosely coupled electronic control units (ECUs) cannot deliver the latency, safety, or scalability that L4 requires; centralized, high-performance compute platforms are needed instead. AD1 is therefore powered by NVIDIA DRIVE AGX Thor, a centralized car computer built on the Arm Neoverse V3AE CPU, which brings previously separate driving, parking, cockpit, and monitoring functions into one compute domain.

Efficiency, safety, and foundation for physical AI

Arm serves as the foundational compute architecture of the NVIDIA DRIVE AGX Thor platform, enabling advanced computing capabilities that power Lenovo’s AD1 platform.

  1. Performance per watt for fleet economics: Because robotaxis operate for extended hours in demanding, dense urban environments, the Arm compute platform delivers server-class performance within a highly efficient power envelope, enabling large AI workloads without compromising vehicle battery life or thermal design.
  2. A safety-ready architecture: The Arm ecosystem – including functional-safety-capable technologies, toolchains, software solutions, and long-established automotive partners – supports the platforms designed to meet ASIL-D and other global safety requirements, a critical factor for long-lived commercial deployments.
  3. A mature, scalable software ecosystem: Since Arm provides a unified architecture across cloud, edge and physical environments, it allows developers to build, optimize, and scale AI models using widely available software tools and frameworks.
  4. A roadmap aligned with future AI workloads: As physical AI models continue to grow in size and complexity, compute efficiency and architectural stability become increasingly important. By building on Arm, automakers gain a consistent architectural foundation with a long-term roadmap, helping them avoid future redesigns and keep their compute strategy stable even as AI evolves.

The road to autonomy is being built on Arm

The deployment of Lenovo AD1 in WeRide’s GXR Robotaxis shows how physical AI in autonomous driving systems is moving beyond controlled pilots and into real, complex urban environments. As autonomous capabilities advance through L4 robotaxis and other autonomous vehicles, the industry is converging on platforms that deliver high performance, safety, and power-efficiency through a centralized architecture.

Arm sits at the core of this shift, providing the foundation that enables companies like Lenovo and WeRide to run dense AI workloads continuously, adapt to rapidly evolving models, and support fleets that must operate reliably for years. As robotaxis expand into new cities and global markets, the Arm compute platform – built for safety and engineered to meet the real-world demands of physical AI at scale – is a critical part of the road ahead.

The Next Platform Shift: Physical and Edge AI, Powered by Arm https://www.edge-ai-vision.com/2026/01/the-next-platform-shift-physical-and-edge-ai-powered-by-arm/ Mon, 26 Jan 2026 09:00:15 +0000

This blog post was originally published at Arm’s website. It is reprinted here with the permission of Arm.

The Arm ecosystem is taking AI beyond the cloud and into the real world

As CES 2026 opens, a common thread quickly emerges across the show floor: most of what people are seeing, touching, and experiencing is already built on Arm. Arm-based platforms power the devices and systems behind the product and technology demos, including intelligent vehicles navigating complex environments, robots interacting with humans, and immersive XR devices blending the digital and physical worlds.

These mark a broader inflection point for AI as it becomes increasingly sophisticated, moving from perception to action in the real world. As NVIDIA CEO Jensen Huang put it in his CES 2026 keynote, “the ChatGPT moment for physical AI is here.” And it’s happening on Arm.

Built for the real world: Edge-first design and proven software ecosystem

As AI moves into the physical world, it must operate under real-world constraints. This next phase is defined by systems that can respond instantly, run efficiently, and operate reliably outside the data center. That transition demands compute that is designed for predictable, low-latency performance, extreme power and thermal efficiency, and continuous local inference. Just as critical, safety and security must be foundational, not layered on after deployment.

This is where edge-first platforms become essential, with Arm uniquely positioned. Arm delivers both unmatched energy efficiency and the world’s largest software developer base, making it the natural platform for building and scaling physical and edge AI systems globally. From operating systems and middleware to AI frameworks and developer tools, partners like NVIDIA and Qualcomm have developed their technologies on Arm over decades. That maturity means innovation can move faster, scale more broadly, and deploy more safely as AI transitions from digital intelligence to physical intelligence in the real world.

The next frontier: AI that moves

At CES 2026, NVIDIA outlined its vision for robotics, with on-stage demos of robots powered by its new physical AI stack. NVIDIA unveiled open robot foundation models, simulation tools, and edge hardware – including Jetson Thor that is built on Arm Neoverse – to accelerate AI that can reason, plan, and adapt in dynamic environments. Partners including Boston Dynamics, Caterpillar, LG Electronics, and NEURA Robotics showcased robots trained on NVIDIA’s full physical AI stack that leverages the Arm compute platform and deeply established software ecosystem spanning automotive, autonomous and robotics.

Qualcomm is further advancing its robotics portfolio with the new Dragonwing IQ10 robotics processor for advanced use cases like industrial robots, autonomous mobile robots (AMRs), and humanoid systems. Qualcomm’s robotics portfolio runs on the Arm compute platform, delivering energy-efficient robots and physical AI at the edge.

These robotics announcements build on pre-existing technologies pioneered across automotive, an industry that Arm has enabled for decades. Much like robots, AI systems in vehicles already sense their environment, make split-second decisions, and act safely in the physical world. As robotics evolves, it will increasingly mirror the complexity, safety requirements, and system architecture of modern vehicles. Many of the companies shaping the future of automotive, like Rivian, will also design the robots of tomorrow. With the entire automotive industry already building on Arm, the transition from cars to robots is a natural one.

In automotive at CES 2026, NVIDIA debuted its Drive AV Software in the all-new Mercedes-Benz CLA. The AV stack's in-vehicle compute and Hyperion architecture are powered by the Arm Neoverse-based NVIDIA DRIVE AGX Thor. Meanwhile, Qualcomm's Snapdragon Digital Chassis continues to expand and is now adopted by global automakers transitioning to AI-defined vehicles. These platforms are built on Arm's compute efficiency and consistent software ecosystem across infotainment, advanced driver assistance systems (ADAS), and in-vehicle AI.

Scaling intelligence from edge to cloud

Beyond robotics and automotive, we’re continuing to see momentum for Arm-based platforms both in the cloud and at the edge.

NVIDIA's new Vera Rubin AI platform includes six new chips, two of which – Vera and Bluefield-4 – are built on Arm. Bluefield-4, a DPU powered by the Arm Neoverse V2-based Grace CPU, delivers up to six times the compute performance of its predecessor, transforming the DPU's role in rack-scale inference and enabling new optimizations such as an AI-inference-specific storage solution.

At the developer level, NVIDIA is pushing the frontier with powerful local AI systems. Developers can take advantage of the latest open and frontier AI models on a local deskside system, from 100-billion-parameter models on DGX Spark to 1-trillion-parameter models on DGX Station. Both platforms are powered by the Arm-based Grace Blackwell architecture, delivering petaflop-class performance and enabling seamless development that can scale from desk to data center.
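To give a rough sense of why deskside systems for models of this size are notable, the sketch below estimates the memory needed just to hold the weights. The parameter counts come from the text; the precision choices (FP16, INT8, INT4) are illustrative assumptions on our part, not product specifications.

```python
# Rough memory footprint needed just to hold model weights at different precisions.
# Parameter counts come from the text (100B on DGX Spark, 1T on DGX Station);
# the precision options below are illustrative assumptions, not product specs.
def weights_gb(params: float, bytes_per_param: float) -> float:
    return params * bytes_per_param / 1e9

for params, name in [(100e9, "100B-parameter model"), (1e12, "1T-parameter model")]:
    print(name)
    for label, bytes_pp in [("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
        print(f"  {label}: ~{weights_gb(params, bytes_pp):,.0f} GB of weights")
```

Even at aggressive quantization, holding a trillion-parameter model locally implies hundreds of gigabytes of memory, which is what makes petaflop-class deskside development systems interesting.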

On the personal computing front, the Windows on Arm AI PC portfolio is expanding into the mainstream, enabling OEMs to scale solutions to the mass market, extend battery life, and close the gap with legacy x86 systems.

Arm is the compute foundation powering CES 2026

What connects NVIDIA, Qualcomm, and a global ecosystem of innovators? Arm’s scalable, energy-efficient architecture.

CES 2026 is already demonstrating that the Arm compute platform powers data centers, robots, vehicles and countless edge devices, including:

  • NVIDIA’s accelerated platforms, from cloud to edge;
  • Qualcomm’s mobile, AI PC, XR/Wearables, and automotive systems; and
  • Nuro’s driverless fleets and Uber’s cloud infrastructure.

A prime example is the Nuro-Lucid-Uber partnership. Nuro’s latest driverless platform, built on the Arm Neoverse platform, enables efficient, real-time edge AI in autonomous Lucid Gravity SUVs. These vehicles, featuring NVIDIA DRIVE Thor and Arm Neoverse V3AE, deliver Level 4 autonomy with safety-critical reliability. Uber, meanwhile, is scaling on Arm-based Ampere servers to lower power use while increasing cloud density, illustrating Arm’s pivotal role from cloud to car.

Why ecosystem scale wins

CES 2026 sends a clear message: AI is now becoming embedded in the world around us. Making the physical and edge AI era a reality isn’t about individual chips or product launches; it requires full-stack ecosystem scale. This means:

  • Software portability across devices;
  • Developer familiarity and productivity;
  • Long product lifecycles with stable platforms; and
  • Standards-based innovation across industries.

The next platform shift isn’t defined by model size, but by intelligence that can operate autonomously, adapt in real time, and scale efficiently from cloud to edge. It’s about systems that are designed from day one to learn continuously, distribute decision-making, and perform within real-world constraints.

Arm provides the common compute foundation that makes this possible – trusted, scalable, and optimized for efficiency. That’s why Arm shows up everywhere at CES 2026 and wherever physical AI is taking shape.

The post The Next Platform Shift: Physical and Edge AI, Powered by Arm appeared first on Edge AI and Vision Alliance.

]]>
Arm at NeurIPS 2025: How AI Research is Shaping the Future of Intelligent Computing https://www.edge-ai-vision.com/2025/12/arm-at-neurips-2025-how-ai-research-is-shaping-the-future-of-intelligent-computing/ Fri, 12 Dec 2025 18:40:08 +0000

This blog post was originally published at Arm’s website. It is reprinted here with the permission of Arm.

NeurIPS 2025 provided Arm with a unique opportunity to share the latest technical trends and insights with the global AI research community.

NeurIPS is one of the world’s leading AI research conferences, acting as a thriving global hub for the latest breakthroughs and discussions around machine learning (ML), deep learning, and AI research. With AI moving rapidly from models to full systems – spanning reasoning, multimodality, physical intelligence, and highly efficient training and inference – Arm’s presence at NeurIPS 2025 (2-7 December) aimed to showcase our forward-looking technologies, while exchanging the latest trends and insights with the global AI research community. For Arm, the event is not just an opportunity to highlight the strength of the Arm compute platform; it’s a two-way conversation about the future of AI.

The Arm booth at NeurIPS 

The future of AI: The breakthroughs and research trends highlighted at NeurIPS

NeurIPS 2025 made one thing clear: the industry is moving toward architectural efficiency, theoretical grounding, and stability as the foundation of AI innovation. In fact, efficient compute, memory access patterns, and architectural flexibility will matter more than the race to ever-larger trillion-parameter models.

A range of technical papers at the event underscored this shift, with insights around foundational AI models, generative systems, and training dynamics, to name a few. These reinforced a broader theme that emerged across NeurIPS 2025: AI’s next wave of progress will come not just from scaling, but from better understanding, smarter engineering, and tighter system-level design.

Just some of the award-winning papers showcased at the event included:

  • Understanding model homogeneity and the "Artificial Hivemind" effect: one of the Best Paper award winners, 'Artificial Hivemind' by Jiang et al., reveals a striking pattern of intra- and inter-model homogeneity, showing that models not only repeat their own ideas but also increasingly converge on similar outputs across different architectures because of shared training data and alignment techniques. This challenges assumptions about model diversity and raises questions about the true independence of model ensembles. The authors introduce a dataset and evaluation framework to help diagnose and mitigate these effects, an important step toward more varied and robust generative AI systems.
  • Gated attention improves large language model (LLM) stability and scaling: another Best Paper, 'Gated Attention for LLMs' by Qiu et al., demonstrates that adding a simple, head-specific sigmoid gate to the attention mechanism can meaningfully improve training stability, mitigate activation spikes, and deliver better scaling behavior across both dense and Mixture-of-Experts models (a conceptual sketch of the gating idea follows this list). The approach outperforms standard softmax attention in large-scale experiments spanning 400B to 3.5T tokens, and also enhances the models' ability to handle extended context lengths. This supports a broader trend: incremental architectural refinements can meaningfully boost LLM efficiency and capability without requiring larger model sizes.
  • Why diffusion models generalize without memorization: Bonnaire et al.'s Best Paper on diffusion models shows that these models naturally undergo two predictable training phases: an early generalization phase that is independent of dataset size, and a later memorization phase that grows linearly with the amount of data. This discovery provides the clearest explanation to date for why diffusion models avoid memorizing their training sets and establishes a rigorous foundation for designing more reliable and privacy-preserving generative systems.
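As promised above, here is a minimal PyTorch sketch of the head-specific sigmoid gating idea from the gated-attention bullet. It is a conceptual reconstruction, not the authors' implementation: the placement of the gate (on each head's attention output, computed per token from the input) and all tensor shapes are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttentionHeads(nn.Module):
    """Toy multi-head attention with a per-head sigmoid gate on the output.

    Conceptual sketch only (not the paper's code): each head's output is
    scaled by sigmoid(W_g x), giving one gate value per head per token.
    """
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.gate = nn.Linear(dim, num_heads)   # one scalar gate per head per token
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(z):
            # reshape to (batch, heads, tokens, head_dim)
            return z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        attn = F.scaled_dot_product_attention(q, k, v)    # (b, h, t, hd)
        g = torch.sigmoid(self.gate(x))                   # (b, t, h)
        attn = attn * g.transpose(1, 2).unsqueeze(-1)     # gate each head's output
        out = attn.transpose(1, 2).reshape(b, t, d)
        return self.proj(out)

x = torch.randn(2, 16, 64)
print(GatedAttentionHeads(dim=64, num_heads=8)(x).shape)  # torch.Size([2, 16, 64])
```

The gate lets the network damp individual attention heads whose activations would otherwise spike, which is the intuition behind the stability improvements reported in the paper.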

Why NeurIPS matters for Arm

Arm’s focus in AI research broadly centers around enabling efficient, scalable, and trustworthy AI across cloud, edge, and physical computing. This aligns closely with future trends identified across the Arm ecosystem, covering thousands of technology partners and over 22 million developers.

Arm talk at NeurIPS

NeurIPS offered the perfect opportunity for Arm to have a range of conversations with academic and research partners, including Graphcore Research and leading academic institutions like Carnegie Mellon University (CMU). This included discussions about the AI breakthroughs that are likely to define the next generation of technologies, with several being frequently referenced:

  • Small language models (SLMs) that are shrinking in size while growing in capability. These are supported by breakthroughs in distillation, compression, and new architectural features that enable SLMs to deliver reasoning capabilities at a fraction of the compute, while running on-device with low latency and high privacy.
  • World models that will transform physical AI, from robotics to autonomous machines, by allowing developers and engineers to construct rich virtual environments that can predict and simulate how their AI models and workloads will perform in the real world before deployment. These world models are enabled by advances in video generation, diffusion-transformer hybrids, and high-fidelity simulation.
  • Ultra-efficient AI model training, with "reasoning per joule" emerging as a key benchmark for training efficiency. This is likely to push techniques such as model distillation and low-precision formats like FP8 toward becoming standard across the industry.
  • The rising use of agentic AI and reinforcement learning, with AI evolving from assistants into autonomous systems that perceive, reason, and act with limited oversight.

Engaging directly with the AI research community helps Arm hone its research strategy as well as its future technologies, from optimizing performance and efficiency benefits to adding new features that will matter to our ecosystem in the next decade and beyond. Two great recent examples include Arm neural technology – which adds dedicated neural accelerators to future Arm GPUs for PC-quality, AI-powered graphics on mobile – and Arm Scalable Matrix Extension 2 (SME2) – which accelerates matrix-heavy workloads essential for computer vision (CV) and generative AI directly on the CPU. While both are recent releases, these technologies were years in the making, with Arm's architecture, engineering, and research teams recognizing their value even before the rapid acceleration of AI-based compute.

Demo walkthrough on the Arm booth 

Why Arm is the right partner for the AI research community

NeurIPS 2025 signaled a shift from raw scale and power to AI that is more intelligent, efficient and responsible. Arm is committed to supporting this future direction through deep collaboration with the global research community, sharing insights, advancing state-of-the-art compute, and building the foundations for sustainable, scalable AI. Whether you’re an AI researcher, data scientist, or developer, Arm wants to talk with you and explore how we can work together to enable efficient, scalable, smarter AI for everyone.

The Architecture Shift Powering Next-Gen Industrial AI https://www.edge-ai-vision.com/2025/12/the-architecture-shift-powering-next-gen-industrial-ai/ Fri, 12 Dec 2025 09:00:04 +0000

This blog post was originally published at Arm’s website. It is reprinted here with the permission of Arm.

How Arm is powering the shift to flexible, AI-ready, energy-efficient compute at the "Industrial Edge."

Industrial automation is undergoing a foundational shift. From industrial PCs to edge gateways and smart sensors, compute needs at the edge are changing fast. AI is moving out of the cloud and into real-world environments. Systems once designed around predictable, single-purpose workloads must now run intelligent, updateable, and secure applications in dynamic, high-mix settings.

Traditional compute architectures, long optimized for high-volume, low-variation tasks, weren’t built for this. And as industrial OEMs push for greater flexibility, scalability, and AI-readiness, the Arm architecture – built around AI acceleration, power efficiency, real-time performance, and software portability – is gaining ground.

Edge AI is redefining industrial compute

The fourth industrial revolution isn’t theoretical anymore. AI and machine learning (ML) are increasingly embedded across the industrial edge — guiding robots, managing predictive maintenance, optimizing energy use, and more. But this shift introduces new requirements:

  • AI inference at the edge, close to sensors and actuators;
  • Real-time responsiveness for control and safety;
  • Low power usage in thermally constrained environments; and
  • Secure, software-defined platforms that are easy to update and deploy.

For many OEMs, these challenges aren’t on the horizon; they’re blockers now. Today’s high-mix manufacturing environments are pushing conventional compute architectures to their limits, with three key challenges that are hindering innovation at the edge:

  • Power inefficiency, which restricts innovations around form factor and thermal designs;
  • Limited AI acceleration scalability across diverse workload profiles; and
  • Rigid hardware-software integration, which slows innovation at the edge.

This requires compute platforms that are more adaptable, power-efficient, and scalable: hallmarks of Arm-based systems.

Why industrial OEMs are migrating to Arm

The move to Arm-based compute platforms in industrial settings at the edge is being accelerated by the following critical drivers:

  • Built-in AI acceleration to support AI models running at the edge, enabling real-time decision-making in latency-sensitive environments.
  • Performance-per-watt leadership to support compute-intensive tasks without blowing power budgets, allowing for more compact and thermally efficient designs.
  • Platform scalability from microcontrollers to server-class processors on a common architecture, simplifying development across product lines and reducing time to market.
  • Edge-to-cloud software portability, enabling developers to train in the cloud and deploy at the edge seamlessly – minimizing code rewrites and speeding up deployment cycles (a minimal export sketch follows this list).
  • A diverse silicon ecosystem, offering faster innovation cycles and fit-for-purpose design options across automation, robotics, and industrial control systems.
  • Long-term software and hardware roadmap alignment, giving OEMs confidence in lifecycle support and the ability to standardize on a future-proof architecture.
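As referenced in the edge-to-cloud portability bullet above, here is a minimal, hedged sketch of one common train-in-the-cloud, deploy-at-the-edge flow: a PyTorch model is exported to ONNX so the same artifact can be run by an ONNX-compatible runtime on an Arm-based edge device. The model, tensor names, and file name are placeholders of ours; the specific edge runtime is intentionally left unspecified.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for whatever was trained in the cloud.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
model.eval()

example_input = torch.randn(1, 16)

# Export to ONNX so the same artifact can be deployed on an Arm edge target
# by any ONNX-compatible runtime; "edge_model.onnx" is an illustrative name.
torch.onnx.export(
    model,
    example_input,
    "edge_model.onnx",
    input_names=["features"],
    output_names=["scores"],
)
print("Exported edge_model.onnx for deployment at the edge")
```

The point of the sketch is the portability of the workflow itself: the training environment and the edge target share the exchange format, so moving between them does not require rewriting the model code.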

The latest Armv9 architecture brings these capabilities together, from the smallest Arm Cortex-A320 CPU cores for edge AI applications to the latest Arm Neoverse cloud cores, with adoption by leading industrial silicon vendors and built-in AI acceleration instructions. This architectural consistency streamlines development, reduces maintenance complexity, and enables faster innovation, especially as AI models become software updateable.

Arm’s platform approach, combining flexible compute IP, robust software tools, and a vibrant ecosystem, is helping OEMs reduce development friction while unlocking higher levels of integration, intelligence, and control at the edge. Industrial OEMs, like Siemens, are already embracing the shift.

Listen to the conversation between Arm’s Paul Williamson and VDC Research’s Chris Rommel as they unpack how edge AI is transforming embedded systems and IoT development – covering everything from the rise of Python and Linux to the role of Arm’s ecosystem in enabling intelligent, scalable solutions.

Designing for AI at the edge

For developers and product leaders, the shift to Arm means more than just swapping CPUs. It opens new pathways for design in the following ways:

  • Heterogeneous compute architectures covering CPU, GPU and NPU to support embedded AI more efficiently.
  • Software-updateable intelligence where AI models can evolve without hardware redesign.
  • New form factors — fanless, rugged, or compact — enabled by thermal and power efficiencies.
  • Modern DevOps workflows supported by containerization and cloud-native ML tools.

With one architectural foundation from edge to cloud, developers gain flexibility, faster iteration, and reduced friction across product lifecycles. One great example is Schneider Electric’s proof-of-concept built on Arm SystemReady, demonstrating how standardized Arm platforms simplify deployment and accelerate industrial innovation.

The future is edge-native

As more intelligence moves outside the data center, compute architectures must evolve. The next generation of industrial systems won’t be defined by legacy constraints, but by application needs: AI-readiness, flexibility, and real-time performance at the edge.

The broader trend is clear: industrial compute is becoming edge-native, and increasingly, Arm-native. To learn more about how Arm is revolutionizing AI at the edge to transform industrial infrastructure, visit Arm Edge AI.

Pablo Fraile, Director, Segment Marketing, Edge AI, Arm

Smarter Smartphone Photography: Unlocking the Power of Neural Camera Denoising with Arm SME2 https://www.edge-ai-vision.com/2025/11/smarter-smartphone-photography-unlocking-the-power-of-neural-camera-denoising-with-arm-sme2/ Thu, 06 Nov 2025 09:00:36 +0000

This blog post was originally published at Arm’s website. It is reprinted here with the permission of Arm.

Discover how SME2 brings flexible, high-performance AI denoising to mobile photography for sharper, cleaner low-light images.

Every smartphone photographer has seen it. Images that look sharp in daylight but fall apart in dim lighting. This happens because signal-to-noise ratio (SNR) drops dramatically when sensors capture fewer photons. At 1000 lux, the signal dominates and images look clean. At 1 lux, readout noise appears as grain, color speckles, and loss of fine detail.
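To see why the drop from 1000 lux to 1 lux is so damaging, the toy model below combines shot noise (which scales with the square root of the signal) and a fixed read-noise floor. The photon count and read-noise values are illustrative assumptions, not sensor measurements; the point is that SNR collapses much faster than the light level alone would suggest.

```python
import math

READ_NOISE_E = 3.0            # assumed sensor read noise, in electrons (illustrative)
PHOTONS_AT_1000_LUX = 20000   # assumed electrons captured per pixel at 1000 lux

def snr_db(signal_e: float, read_noise_e: float = READ_NOISE_E) -> float:
    """Simple SNR model: shot noise (sqrt of signal) plus read noise, in dB."""
    noise = math.sqrt(signal_e + read_noise_e ** 2)
    return 20 * math.log10(signal_e / noise)

for lux in (1000, 100, 10, 1):
    signal = PHOTONS_AT_1000_LUX * lux / 1000   # assume signal scales linearly with lux
    print(f"{lux:>5} lux: ~{snr_db(signal):5.1f} dB SNR")
```

Under these assumptions the SNR falls from roughly 43 dB at 1000 lux to near 11 dB at 1 lux, which is where the grain, color speckles, and loss of detail described above become visible.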

That is why neural camera denoising is one of the most critical and computationally demanding steps in the camera pipeline. When done well, it transforms noisy frames into sharp, vibrant captures. When done poorly, it leaves smudges and artifacts that ruin the shot.

Arm Scalable Matrix Extension 2 (SME2) advances denoising on mobile. It is a powerful new technology for CPU-based AI inference, enabled across our new C1 CPUs and featured in several new flagship smartphones; see the device list.

SME2 is designed to accelerate a range of AI operations from generative AI to computer vision. It enhances the latest computational photography experiences. This brings automated image improvements for sharper, cleaner images with unprecedented speed and efficiency.

In this blog post, we explain how this happens.

Scalable Matrix Extensions for imaging innovation

Dedicated image signal processor (ISP) hardware remains highly effective for standard tasks such as denoising, demosaic, and tone mapping. However, imaging algorithms are evolving rapidly, and fixed-function blocks cannot easily adapt.

Arm Scalable Matrix Extension 2 (SME2) adds a new layer of flexibility. SME2 combines wide SIMD and matrix-multiply compute capability, building on Arm's SVE2 (Scalable Vector Extension 2) and SME ISA features.
This combination brings high-throughput AI and computer vision (CV) acceleration directly into the CPU pipeline, making it easier to integrate new algorithms without waiting for hardware refreshes.
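One rough way to picture what matrix-multiply compute of this kind does is outer-product accumulation: a whole tile of partial results is updated from one column of A and one row of B per step. The NumPy sketch below shows that conceptual model only; it says nothing about SME2 register widths, tile sizes, data types, or instruction behavior.

```python
import numpy as np

def matmul_outer_product(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Compute A @ B by accumulating rank-1 (outer-product) updates into a tile."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2
    acc = np.zeros((m, n), dtype=np.float32)   # accumulator "tile"
    for i in range(k):
        # one column of A times one row of B updates the whole tile at once
        acc += np.outer(a[:, i], b[i, :])
    return acc

a = np.random.rand(8, 16).astype(np.float32)
b = np.random.rand(16, 4).astype(np.float32)
print(np.allclose(matmul_outer_product(a, b), a @ b, atol=1e-5))   # True
```

Convolutions and other imaging kernels can be lowered onto matrix multiplications of this form, which is why a matrix extension on the CPU is useful for camera workloads.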

SME2-enabled C1 CPUs enable OEMs and developers to:

  • Match or exceed DSP-level performance in imaging workloads.
  • Run some applications without using separate AI accelerators, thanks to SME2’s scalable throughput.
  • Benefit from a CPU-like programming model, making it easier for developers to optimize and evolve code.

Neural camera denoising on SME2-enabled C1 CPUs

Arm has developed a neural camera denoising pipeline purpose-built for SME2. It operates directly in the RAW domain for superior noise modeling and detail retention.

It is built from two complementary algorithms:

UltraLite

  • Temporal
  • Downscale, per-channel processing, motion mask estimation, temporal accumulation.
  • Efficient; stabilizes video in low light.

CollapseNet

  • Spatial
  • Cascaded, pyramid-based denoiser (UGGV color space)
  • Superior detail retention in sub-lux conditions

When combined, UltraLite and CollapseNet form a spatio-temporal denoising pipeline: UltraLite delivers temporal stability, while CollapseNet restores spatial detail.

This combination ensures versatility. UltraLite excels at video, while CollapseNet ensures high-quality stills. Together, they provide robust denoising across the full range of scenarios.
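A highly simplified NumPy sketch of the spatio-temporal idea is shown below: a motion-masked temporal accumulation step (standing in for the temporal stage) followed by a small spatial smoothing pass (standing in for the spatial stage). This is a conceptual illustration only, not the UltraLite or CollapseNet algorithms; the blend factor, motion threshold, and box filter are arbitrary choices.

```python
import numpy as np

def temporal_step(prev_acc, frame, alpha=0.2, motion_thresh=0.1):
    """Accumulate the new frame into the running estimate, except where motion is detected."""
    motion = np.abs(frame - prev_acc) > motion_thresh      # crude motion mask
    blended = (1 - alpha) * prev_acc + alpha * frame       # temporal accumulation
    return np.where(motion, frame, blended)                # fall back to current frame on motion

def spatial_step(img, k=3):
    """Tiny box-filter smoothing standing in for a learned spatial denoiser."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0, 1, 64), (64, 1))            # synthetic "scene"
acc = clean + 0.05 * rng.standard_normal((64, 64))
for _ in range(8):                                         # a short burst of noisy frames
    frame = clean + 0.05 * rng.standard_normal((64, 64))
    acc = temporal_step(acc, frame)
denoised = spatial_step(acc)
print(f"noise std: raw {np.std(frame - clean):.3f} -> denoised {np.std(denoised - clean):.3f}")
```

The temporal stage averages noise away where the scene is static, and the spatial stage cleans up what remains; the production pipeline applies the same division of labor with far more sophisticated, learned components.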

Real-time performance on a single core

Neural camera denoising achieves real-time throughput, even when running on a single CPU core with SME2 enabled. The table below shows how SME2-enabled CPUs balance efficiency and flexibility to deliver DSP-class performance without requiring separate accelerators.

Pipeline | Resolution | Throughput | Notes
UltraLite (temporal only) | 1080p | >180 fps | Lightweight, efficient temporal denoising
CollapseNet (spatial) | 4K | ~30 fps | High-quality RAW-domain denoising
Combined (spatio-temporal variant) | 4K | ~30 fps | UltraLite + CollapseNet pipeline for video and stills
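One way to read the table above is in pixels processed per second; the figures below are simple arithmetic on the quoted resolutions and frame rates, nothing more.

```python
# Pixel throughput implied by the table above (pure arithmetic on the quoted numbers).
configs = [
    ("UltraLite, 1080p @ 180 fps", 1920 * 1080, 180),
    ("CollapseNet, 4K @ 30 fps",   3840 * 2160, 30),
]
for name, pixels_per_frame, fps in configs:
    mp_per_s = pixels_per_frame * fps / 1e6
    print(f"{name}: ~{mp_per_s:.0f} megapixels/s on a single SME2-enabled core")
```

Both configurations land in the range of a few hundred megapixels per second on one core, which is the headline behind the "DSP-class performance" claim.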

Programmability and developer tools

Neural camera denoising is implemented as optimized C++ code and includes standalone benchmarking binaries for aarch64 targets. Developers can provide custom inputs, measure performance, and debug with ease.
Crucially, SME2 supports Arm C Language Extensions (ACLE) intrinsics. This allows:

  • Low-level tuning of critical kernels, such as convolutions and blending.
  • Familiar workflows, using the same toolchains developers already rely on for Arm CPUs.

For experimentation, PyTorch and Keras models are also available. This enables rapid prototyping before deploying optimized implementations.

Explore the open-source code in the KleidiAI Camera Pipelines repository on GitLab.

Results: Extending image quality

Lab evaluations show that SME2-based neural camera denoising improves image quality in the conditions that matter most: 1 lux and below. In these low-light conditions, SME2-based denoising produces sharper, cleaner, and more natural results than ISP-only pipelines or even premium handsets.

This highlights SME2’s complementary role. It works alongside the ISP, and takes over when fixed-function hardware reaches its limits.

Looking ahead

Neural camera denoising is only the beginning. SME2 also accelerates cinematic mode (depth-of-field effects), low-light enhancement, and other advanced camera features. The combination of performance, programmability, and scalability positions SME2 as a general-purpose imaging accelerator. It complements ISPs and enables continuous software innovation.

Conclusion

Noise is one of the hardest problems in photography. Low-light conditions push sensors to their limits. SME2-enabled C1 CPU neural camera denoising gives device makers a flexible, high-performance tool to deliver superior low-light imaging. It acts not as a replacement for ISP hardware, but as a complementary capability that extends what cameras can do.

SME2 combines ACLE programmability, real-time 4K performance on a single core, and open-source examples available today. Together, these make it a powerful technology for the next generation of computational photography.

Importantly, SME2 demonstrates the power of hardware and software algorithm co-design, where silicon capabilities and software techniques evolve together to unlock entirely new imaging possibilities.

Try it today with the AI Camera Pipelines on GitLab:

AI Camera Pipelines repository on GitLab

David Packwood
Principal Computer Vision Architect, Arm

Smarter, Faster, More Personal AI Delivered on Consumer Devices with Arm's New Lumex CSS Platform, Driving Double-digit Performance Gains https://www.edge-ai-vision.com/2025/09/smarter-faster-more-personal-ai-delivered-on-consumer-devices-with-arms-new-lumex-css-platform-driving-double-digit-performance-gains/ Wed, 10 Sep 2025 13:29:43 +0000

News Highlights:
  • Arm Lumex CSS platform unlocks real-time on-device AI use cases like assistants, voice translation and personalization, with new SME2-enabled Arm CPUs delivering up to 5x faster AI performance

  • Developers can access SME2 performance with KleidiAI, now integrated into all major mobile OSes and AI frameworks, including PyTorch ExecuTorch, Google LiteRT, Alibaba MNN and Microsoft ONNX Runtime

  • For flagship devices, Arm Lumex CSS platform achieves an unprecedented six years of double-digit IPC performance gains

  • New Mali G1-Ultra redefines mobile entertainment and is built for gamers, with 2x ray tracing uplift

AI is no longer a feature; it's the foundation of next-generation mobile and consumer technology. Users now expect real-time assistance, seamless communication, and personalized content that is instant, private, and available on device, without compromise. Meeting these expectations requires more than incremental upgrades; it demands a step change that brings performance, privacy, and efficiency together in a scalable way.

Introducing Arm Lumex

That’s why we’re introducing Arm Lumex, our most advanced compute subsystem (CSS) platform, purpose-built to accelerate AI experiences on flagship smartphones and next-gen PCs.

Lumex unites our highest-performing CPUs with Scalable Matrix Extension version 2 (SME2), GPUs, and system IP, enabling the ecosystem to bring AI devices to market faster and deliver experiences from desktop-class mobile gaming to real-time translation, smarter assistants, and personalized applications.

We are enabling SME2 across every CPU platform, and by 2030 SME and SME2 will add over 10 billion TOPS of compute across more than 3 billion devices, delivering an exponential leap in on-device AI capability.
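As a back-of-the-envelope reading of that projection, dividing the quoted compute by the quoted device count gives the average contribution per device; this is arithmetic on the stated figures only, not an Arm specification.

```python
# Simple average implied by the projection above (not an Arm specification).
total_tops = 10e9   # "over 10 billion TOPS" by 2030
devices = 3e9       # "more than 3 billion devices"
print(f"~{total_tops / devices:.1f} TOPS of SME/SME2 compute per device on average")
```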

Partners can choose exactly how they build Lumex into their SoC – they can take the platform as delivered and leverage cutting-edge physical implementations tailored to their needs, reaping time to market and time to performance benefits. Alternatively, partners can configure the platform RTL for their targeted tiers and harden the cores themselves.

Lumex and our simplified naming conventions across the Arm portfolio were announced earlier this year.

The platform combines:

  • Next-generation SME2-enabled Armv9.3 CPU cluster including C1-Ultra and C1-Pro, powering flagship devices
  • New C1-Premium, purpose built for the sub-flagship market, providing best in class area efficiency
  • New Mali G1-Ultra GPU with next-generation ray tracing enabling advanced graphics and gaming, plus a boost to AI performance
  • The most flexible and power-aware DynamIQ Shared Unit (DSU) Arm has delivered to date: C1-DSU
  • Optimized physical implementations for 3nm nodes
  • Deep integration across the software stack delivering seamless AI acceleration for developers using KleidiAI libraries

Accelerated AI Everywhere with SME2-Enabled CPUs

The SME2-enabled Arm C1 CPU cluster provides dramatic AI performance gains for real-world, AI-driven tasks:

  • Up to 5x uplift in AI performance
  • 4.7x lower latency for speech-based workloads
  • 2.8x faster audio generation

This leap in CPU AI compute enables real-time, on-device AI inference capabilities, providing users with smoother, faster experiences across interactions like audio generation, computer vision, and contextual assistants.

So what does this mean in real-world use cases? SME2 delivers a whole new level of responsiveness and efficiency. For example, our Smart Yoga Tutor demo app saw a 2.4x boost in text-to-speech performance, meaning users get instant feedback on their poses without draining battery life. Together with Alipay and vivo, we achieved a 40% reduction in LLM response time for user interactions, proving SME2 delivers faster real-time generative AI on-device.

SME2 isn’t just about speed; it’s also unlocking AI-powered capabilities that traditional CPUs can’t match. For example, neural camera denoising now runs at over 120fps in 1080p or 30fps in 4K, all on a single core. That enables smartphone users to capture sharper, crystal-clear images even in the darkest scenes, allowing for smoother interactions and richer experiences on everyday devices.

Unlike cloud-first AI, which is constrained by latency, cost, and privacy concerns, Lumex brings intelligence directly to the device where it’s faster, safer, and always available. SME2 is being embraced by leading ecosystem players including Alibaba, Alipay, Samsung LSI, Tencent and vivo.

Architectural Freedom for Every Product Tier

Lumex offers partners the freedom to balance peak performance, sustained efficiency, and silicon area in products ranging from high-end smartphones and PCs to emerging AI-first form factors:

CPU | Key benefit | Performance and efficiency gains | Ideal use cases
C1-Ultra | Flagship peak performance | +25% single-thread performance; double-digit IPC gain year-on-year | Large-model inference, computational photography, content creation, generative AI
C1-Premium | C1-Ultra performance with greater area efficiency | 35% smaller area than C1-Ultra | Sub-flagship mobile segments, voice assistants, multitasking
C1-Pro | Sustained efficiency | +16% sustained performance | Video playback, streaming inference
C1-Nano | Extremely power-efficient | +26% efficiency, using less area | Wearables, smallest form factors

Enabling Desktop-Class Gaming and Faster AI Inference on Mali GPU

With over 12 billion Arm GPUs shipped to date, Arm is at the center of mobile gaming experiences. The new Arm Mali G1-Ultra GPU continues to push the boundaries of mobile gaming, delivering high-fidelity, console-class graphics. This is made possible by a brand-new Ray Tracing Unit v2 (RTUv2), powering advanced lighting, shadows and reflections, leading to a 2x uplift in ray tracing performance compared to its predecessor. For AI workloads, the G1-Ultra enables up to 20% faster inference performance, enhancing responsiveness across real-time applications.

The Mali G1-Ultra delivers 20% better performance across graphics benchmarks compared to the previous generation, with across-the-board improvements for leading titles, including Arena Breakout, Fortnite, Genshin Impact, and Honkai Star Rail. The G1-Premium and G1-Pro GPUs deliver superior performance and power-efficiency for constrained devices.

Finally, Developer-Friendly AI for Mobile

For developers, AI experiences just work on the Lumex platform. Through the KleidiAI integration across major frameworks including PyTorch ExecuTorch, Google LiteRT, Alibaba MNN, and Microsoft ONNX Runtime, apps automatically benefit from SME2 acceleration with no code changes required.
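The "no code changes" point can be illustrated with an ordinary PyTorch inference snippet: the Python stays exactly the same, and on devices whose runtime build integrates KleidiAI with SME2 support, the underlying matrix kernels are dispatched to faster code paths by the framework. The model below is a placeholder of ours; nothing in it is SME2-specific.

```python
import torch
import torch.nn as nn

# An ordinary model: nothing here references SME2 or KleidiAI.
model = nn.Sequential(nn.Linear(256, 512), nn.GELU(), nn.Linear(512, 64))
model.eval()

tokens = torch.randn(8, 256)
with torch.no_grad():
    out = model(tokens)

# On an SME2-capable device with a KleidiAI-integrated framework build,
# the matrix multiplications above are accelerated transparently --
# the application code does not change.
print(out.shape)   # torch.Size([8, 64])
```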

For developers building cross-platform apps, Lumex brings new portability:

  • Google apps like Gmail, YouTube and Google Photos are already SME2-ready, ensuring seamless integration as Lumex-based devices hit the market
  • Cross platform portability means optimizations built for Android can seamlessly extend to Windows on Arm and other platforms
  • Partners like Alipay are already showcasing on device LLMs running efficiently with SME2

Technology leaders – including Apple, Samsung, and MediaTek – are integrating AI acceleration capabilities for faster, more efficient on-device AI. Apple is powering Apple Intelligence; Samsung and MediaTek are improving responsiveness and efficiency of real-time AI applications such as translation, summarization, and personal assistants using Google Gemini.

Arm Lumex: Platform-Level Intelligence for the AI Era

Arm Lumex is more than our most advanced CSS platform for the consumer computing market; it's the foundation for the next era of intelligent, AI-enabled experiences. Whether you're an OEM or a developer, Lumex gives you the tools to deliver personal, private, and high-performance AI at the edge, where it matters most. Built for the AI era, Lumex is where the future of mobile innovation begins.

Chris Bergey
SVP and GM of the Client Line of Business, Arm

Supporting Quotes:

“Through deep integration with SME2, MNN enables low-latency, quantized inference for billion-parameter models like Qwen on smartphones — showcasing Arm and Alibaba’s joint innovation in scalable, next-gen mobile AI.”
Xiaotang Jiang, Head of MNN, Taobao and Tmall Group, Alibaba

“The validation of LLM inference using SME2 has been completed on vivo’s next generation flagship smartphone through the close collaboration of Arm, Alipay and vivo. We observe that prefill and decode performance can be improved by over 40% and 25% respectively. These results demonstrate significant progress in CPU backend, and we are highly encouraged by the outcomes achieved so far.”
Xindan Weng, Head of Client Engineering, Alipay

“SME2-enhanced hardware enables more advanced AI models, like Gemma 3, to run directly on a wide range of devices. As SME2 continues to scale, it will enable mobile developers to seamlessly deploy the next generation of AI features across ecosystems. This will ultimately benefit end-users with low-latency experiences that are widely available on their smartphones.”
Iliyan Malchev, Distinguished Software Engineer, Android at Google

“At Honor, our mission is to bring premium experiences to more users, especially through our upper mid-range smartphones. By leveraging the Arm Lumex CSS platform, we’re able to deliver smooth performance, intelligent AI features, and outstanding power efficiency that elevate everyday mobile experiences.”
Honor

“AI is changing how we interact with our devices and the world around us, and the Arm ecosystem is driving important developments in this space. At Meta, we’re excited about the integration of Arm Kleidi and PyTorch’s ExecuTorch, allowing our applications to seamlessly run on next-generation technology that accelerates end-user experiences.”
Sy Choudhury, Director, AI Partnerships, Meta

“At Samsung, we’re excited to continue our collaboration with Arm by leveraging Arm’s compute subsystem platform to develop the next generation of flagship mobile products. This partnership enables us to push the boundaries of on-device AI, delivering smarter, faster, and more efficient experiences for our users.”
Nak Hee Seong, Vice President and Head of SOC IP Development Team at Samsung Electronics

“SME2 accelerates on-device large language models, like Tencent’s Hunyuan, by addressing key performance bottlenecks and enabling efficient LLM deployment on mobile for enhanced user experiences.”
Felix Yang, Distinguished Expert, Machine Learning Platform, Tencent

The Chiplet Consolidation Wave: How Strategic Acquisitions are Shaping the Future of Silicon https://www.edge-ai-vision.com/2025/08/the-chiplet-consolidation-wave-how-strategic-acquisitions-are-shaping-the-future-of-silicon/ Fri, 15 Aug 2025 13:39:06 +0000

The Disaggregation of the Monolith: A New Paradigm Forged by AI’s Demands

The semiconductor industry is navigating its most significant architectural shift in decades, a transition compelled by the relentless demands of Generative AI. The monolithic System-on-Chip (SoC), the reigning paradigm for half a century, is fracturing under the weight of exponentially growing AI models and their insatiable need for memory bandwidth and computational power. This has given rise to a more flexible and economically viable model: the chiplet. While the concept dates to 2014 or before, the need for chiplets has increased dramatically in recent times. While AI processing is a dominant use case, many other applications can benefit once the chiplet ecosystem and costs have improved.

This transition represents a fundamental re-imagining of how complex circuits are designed and manufactured. However, the popular “Lego Block” analogy belies the immense technical and business challenges that underscore the infancy of this new ecosystem. The path to a genuinely open, multi-vendor chiplet marketplace is not a simple assembly job; it is a frontier defined by profound hurdles in reliability, testing, and security that the industry is only now beginning to address.

Still in the First Rodeo

To date, almost all chiplet implementations involve AI accelerators and their interfaces to High Bandwidth Memory, I/O, and optical networking. These are ultra-high-value applications where a $50 cost difference does not matter. Even relatively high-value networking and industrial applications lack this luxury, not to mention automotive or consumer applications. Importantly, no single entity can solve all these problems; it takes an entire ecosystem that cooperates at speed and scale, and this level of cooperation does not yet exist in mainstream applications. Later in this article, we describe the differences between AMD's walled garden, Arm's open ecosystem, and Nvidia's managed collaborations.

The GenAI Imperative: Why Monolithic Architectures Are Cracking

At its core, the chiplet concept centers on modularity. Instead of one large, complex chip, engineers can build a system by combining multiple specialized dice in an advanced package. Although this idea has been evolving for years, the rapid growth of Generative AI has transformed it from a strategic option into a necessity. As AI models grow and inference workloads become more demanding – requiring larger context lengths and multimodal capabilities – monolithic designs are hitting fundamental limitations in memory, power, and cost.

The modern AI inference workload is not uniform; it involves a complex interaction between prompt processing (which is compute-intensive) and output generation (which is memory-bandwidth-intensive). No single monolithic architecture can be optimally designed for both tasks. As AI startup d-Matrix emphasizes, a “one-size-fits-all” approach is no longer practical. Chiplets offer a solution by enabling modular architectures where logic, memory, and I/O can be individually optimized and scaled to meet the specific needs of various AI applications, from large frontier models in the cloud to smaller, more efficient reasoning models at the edge. The fundamental challenges of the chiplet era include:

1. The Known Good Die (KGD) and Design-for-Test (DFT) Crisis: The most immediate technical barrier is ensuring each chiplet is free of defects before being sealed in an expensive package. The KGD problem becomes exponentially harder with chiplets. Advanced packaging techniques use interconnect pitches of less than 50 microns, making the physical contacts too fine to be tested with traditional wafer probes. A single faulty, untested chiplet can render a multi-thousand-dollar packaged system useless, creating unacceptable financial risk.

This has ignited a revolution in Design-for-Test (DFT). The industry is reviving and adapting established standards and developing new ones specifically for 3D-ICs, to create standardized digital test access pathways into and through stacked dice. Methodologies like Built-In Self-Test (BIST) for logic and memory are no longer optional but mandatory. EDA giants like Synopsys are providing comprehensive toolchains to manage this complexity, enabling everything from at-speed interconnect testing to in-field monitoring, which is critical for addressing the rising concern of Silent Data Corruption (SDC) in large-scale data centers.
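As a loose illustration of what memory BIST does, the sketch below runs a simplified march-style test over a simulated memory array: write a background pattern, then read and verify while writing the complement, in both address orders. Real BIST engines are on-die hardware state machines with far richer algorithms; this Python model shows only the concept.

```python
def march_test(mem_size, fault_at=None):
    """Toy march-style memory test over a simulated RAM; returns True if it passes."""
    mem = [0] * mem_size

    def write(addr, val):
        # Model a stuck-at-1 fault at one address to show how it is detected.
        mem[addr] = 1 if addr == fault_at else val

    def read_expect(addr, expected):
        return mem[addr] == expected

    for addr in range(mem_size):              # ascending: write 0s
        write(addr, 0)
    for addr in range(mem_size):              # ascending: read 0, write 1
        if not read_expect(addr, 0):
            return False
        write(addr, 1)
    for addr in reversed(range(mem_size)):    # descending: read 1, write 0
        if not read_expect(addr, 1):
            return False
        write(addr, 0)
    return all(read_expect(a, 0) for a in range(mem_size))

print(march_test(1024))                  # True: healthy memory
print(march_test(1024, fault_at=42))     # False: stuck-at-1 cell detected
```

The value of building this kind of self-test into each chiplet is that it can be run after packaging, through standardized test access paths, when physical probing is no longer possible.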

2. Reliability, Availability, and Serviceability (RAS): In a monolithic SoC, RAS is a contained problem. In a multi-chiplet system, especially one with dice from different vendors, it becomes a system-level nightmare. This is particularly acute in the automotive sector, where functional safety (e.g., ASIL-D) is non-negotiable.

New technology to address these issues
As pioneered by Athos Silicon, the multiple-Systems-on-Chip™ (mSoC™) architecture redefines system resilience for safety-critical computing. While most of the industry uses chiplets to boost performance within a conventional PC-like architecture, Athos takes a fundamentally different approach. The mSoC platform distributes execution across multiple redundant chiplets, coordinated by a fault-aware voting mechanism that continuously monitors workload integrity. Unlike monolithic designs, where a single transistor upset can lead to catastrophic system failure, mSoC eliminates single points of failure by isolating faulty chiplets and reassigning their functions in real time. Built to meet ASIL-D and aerospace-grade standards, this architecture enables deterministic and certifiable computing platforms. In fact, the certification of high levels of autonomy in ADAS and robotics may not be feasible without the architectural simplifications and fault containment mechanisms introduced by mSoC.
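The redundancy-and-voting idea behind the description above can be illustrated with a toy majority voter across redundant compute elements: a disagreeing result is out-voted and the offending element is flagged and excluded from further work. This is a conceptual sketch of generic redundant voting, not Athos Silicon's implementation; the fault model and isolation policy are our assumptions.

```python
from collections import Counter

def vote(results):
    """Majority-vote across redundant results; flag units that disagree."""
    winner, _ = Counter(results).most_common(1)[0]
    suspects = [i for i, r in enumerate(results) if r != winner]
    return winner, suspects

class RedundantSystem:
    """Toy model of redundant compute units with fault-aware voting."""
    def __init__(self, num_units=3):
        self.active = list(range(num_units))

    def execute(self, workload, faulty_unit=None):
        # Simulate one corrupted result by flipping bits on the faulty unit.
        results = [workload(u) if u != faulty_unit else workload(u) ^ 0xFF
                   for u in self.active]
        value, suspect_idx = vote(results)
        # Isolate any unit whose result was out-voted so it no longer participates.
        for i in sorted(suspect_idx, reverse=True):
            self.active.pop(i)
        return value

system = RedundantSystem()
workload = lambda unit: 42                        # all healthy units agree on the result
print(system.execute(workload, faulty_unit=1))    # 42: the corrupted result is out-voted
print(system.active)                              # [0, 2]: the faulty unit is isolated
```

In a safety-critical system the same pattern runs continuously in hardware, so a single upset is contained rather than propagating into a system-level failure.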

3. Unresolved Business Models and Security: In a multi-vendor system, who is liable if the final part fails? This unresolved business question remains one of the biggest impediments to a truly open ecosystem. Furthermore, security becomes more complex. The system’s Root of Trust (RoT) must be able to extend its authority to third-party chiplets, verifying their identity and integrity through attestation. Establishing common standards for these security and RAS functions is a primary focus of industry bodies and a key area of competition. While its products are not currently in the form of a chiplet, Axiado has made impressive progress in multi-mode security, including Root of Trust.

Interconnects, Packaging, and Testability

The chiplet revolution rests on three critical technological pillars: high-speed interconnects, sophisticated advanced packaging, and a standardized framework for test and serviceability. The development and control of these enabling technologies have become a central battleground for industry dominance.

The Interconnect Imperative and the Rise of UCIe

The performance of any chiplet-based system is gated by the quality of its die-to-die (D2D) interconnect. The need for an open standard gave rise to the Universal Chiplet Interconnect Express (UCIe). Backed by a broad consortium of semiconductor industry leaders, UCIe is the catalyst intended to unlock a vibrant, multi-vendor marketplace. The standard is rapidly evolving to meet the demands of AI. As an example, the UCIe Consortium recently introduced the 3.0 specification with 64 GT/s performance and enhanced manageability, in addition to releasing the 2.0 specification, which adds support for a standardized system architecture for manageability.

As highlighted by Alphawave Semi, the industry is already moving beyond the initial 16-24 Gbps implementations to UCIe 64G, a third-generation IP capable of delivering over 20 Tbps/mm of bandwidth density. This leap in performance is essential for connecting next-generation compute and memory chiplets. The elephant in the room is power consumption. The incredible benefits of LLMs have driven every supplier to the highest performance levels in AI datacenters, at the expense of power. History says this pendulum will swing back: the cost and availability of power loom too large. After the initial exuberance of massive LLM training is over, datacenter operators will strongly resist the purchase of high-power components and systems.

The Memory Bottleneck and the Evolution of HBM

High-Bandwidth Memory (HBM) has become the most critical chiplet in any AI accelerator, and the widening gap between compute throughput (FLOPS) and memory bandwidth is now the primary system bottleneck. Stacking HBM dice and feeding them at full rate is itself an advanced chiplet-integration problem. The upcoming HBM4 standard responds directly to this challenge, doubling the I/O width to 2,048 bits and pushing per-pin speeds higher, effectively doubling the bandwidth per stack to more than 2 TB/s. A key new trend arriving with HBM4 is custom HBM, in which companies design specialized logic dice beneath the DRAM stack, moving memory controller or even compute functions into the HBM base die. By replacing the standard PHY with a direct chiplet interface such as UCIe, this approach yields higher bandwidth, lower latency, and greater efficiency than the traditional JEDEC HBM attachment model.
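
As a back-of-the-envelope check on these figures, peak per-stack bandwidth is simply the interface width multiplied by the per-pin data rate. The per-pin rates below are assumptions chosen to match the generational numbers quoted above, not mandated specification values.

```python
def hbm_stack_bandwidth_gbps(io_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak per-stack bandwidth in GB/s = interface width (bits) * per-pin rate (Gb/s) / 8."""
    return io_width_bits * pin_rate_gbps / 8

# HBM3-class stack: 1,024-bit interface at ~6.4 Gb/s per pin
print(hbm_stack_bandwidth_gbps(1024, 6.4))   # ~819 GB/s
# HBM4-class stack: 2,048-bit interface at an assumed ~8 Gb/s per pin
print(hbm_stack_bandwidth_gbps(2048, 8.0))   # ~2,048 GB/s, i.e. just over 2 TB/s
```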

Holy Grail: A Standardized Framework for Test and Serviceability

As highlighted by the KGD crisis, a robust test framework is as crucial as the physical interconnects. The industry is building a multi-layered test architecture founded on decades of experience, now adapted for 3D systems. It starts with the long-established JTAG standard, which provides basic test access, and layers on newer standards designed for stacked dice and multiple data pathways. The requirements extend to in-field monitoring, which is new territory for many system and chip designers.

The Battle for Ecosystem Control

As the foundational technologies mature, a strategic battle is unfolding over the business model that will define the chiplet era. Four competing philosophies have emerged, each championed by major industry players.

The Walled Garden with Open Gates: AMD’s Anchor Ecosystem

AMD is promoting a structured, semi-open model built around a central “Anchor Chiplet.” This AMD-designed anchor orchestrates essential system-level functions like power, security, and RAS. It then defines two models for third-party integration: Third-Party Die (TPD), using standard interfaces like UCIe, and Third-Party Adapted Die (TPA), which gives a partner’s chiplet premium access to AMD’s proprietary infrastructure, including its high-speed Infinity Fabric, for maximum performance. This strategy creates a curated ecosystem that keeps AMD at the center, ensuring a consistent level of quality and reliability.

The Standards-Based Approach: Arm’s Chiplet System Architecture (CSA)

Arm is championing a more open, standards-based approach. The Arm Chiplet System Architecture (CSA) is a specification that defines how chiplets handle system-level tasks beyond the physical layer. CSA defines the protocols for I/O coherent memory (using AMBA CHI-C2C over UCIe), system control, security, and boot sequences. By standardizing these higher-level functions, Arm aims to create a truly interoperable marketplace. The “Project Leapfrog” initiative, a collaboration between Arm, Samsung Foundry, ADTechnology, and Rebellions, is a real-world demonstration of this multi-vendor ecosystem in action, aiming to build an open AI training platform.

The Fully Open Vision: Tenstorrent’s OCA

Challengers like Tenstorrent are pushing the open model even further. By acquiring Blue Cheetah, an interconnect IP provider, Tenstorrent is not just securing technology for its products but positioning itself as a hub for its Open Chiplet Architecture (OCA). The strategy is to use its IP to attract partners to a fully open ecosystem built on UCIe and the RISC-V instruction set, creating a direct challenge to the proprietary models of incumbents.

Managed Collaboration: Nvidia’s Approach

Nvidia collaborates with third-party chiplet suppliers through its NVLink Fusion program. This initiative marks a significant strategic shift for the company, opening its tightly controlled ecosystem to allow for the integration of external CPUs and custom AI accelerators with its powerful GPUs.

The NVLink Fusion program, unveiled as a key component of Nvidia’s strategy, is designed to enable the creation of semi-custom AI infrastructure. This allows hyperscalers and other large-scale data center operators to tailor their systems with specialized processors while still leveraging Nvidia’s dominant GPU technology. The program has already been joined by industry leaders such as Fujitsu, Qualcomm, MediaTek, Marvell, Alchip Technologies, Astera Labs, Synopsys, and Cadence.

This strategic opening of its platform allows Nvidia to maintain its central role in the AI hardware landscape while offering the flexibility that large-scale customers are increasingly demanding. By fostering a collaborative ecosystem, Nvidia is positioning itself to be the core of a broader range of customized, high-performance computing solutions.

The Acquisition Spree and The New Value Chain

The technical complexities and strategic importance of chiplets have triggered a land grab for foundational IP and talent. This M&A activity is reshaping the semiconductor value chain, creating new opportunities and elevating the importance of specialized innovators.

Anatomy of a Strategic Acquisition: Tenstorrent and Blue Cheetah

The acquisition of Blue Cheetah by Tenstorrent is a perfect case study. It was a multi-layered move: to de-risk its roadmap by internalizing critical D2D interconnect IP, to “acqui-hire” a world-class analog design team that fills a key expertise gap, and to gain complete control over the technology’s cost and evolution. Most importantly, it was a strategic play to fuel its Open Chiplet Architecture vision by owning the foundational “picks and shovels” needed to build the ecosystem.

The New Chiplet Value Chain and M&A Targets

The chiplet paradigm creates value in new and diverse areas, making companies in these niches essential to consider:

  • Interconnect Specialists: Companies like Eliyan and Kandou AI, which provide the critical D2D PHY IP, remain the most sought-after targets.
  • Specialized Function Providers (Extreme Heterogeneity): The true power of chiplets is unlocked by integrating diverse technologies.
  • Analog/Digital: Companies like Sagence AI are using chiplets to partition their analog in-memory compute engines from their digital control logic.
  • Compound Semiconductors: Innovators like PseudolithIC are developing paradigms to integrate compound semiconductor (GaN, InP) chiplets onto standard silicon wafers for high-performance RFICs.
  • I/O and Memory Chiplets: Disaggregating I/O into separate chiplets is a significant trend. Companies like Alphawave Semi (acquired by Qualcomm for $2.4B in June 2025) are providing off-the-shelf, multi-protocol I/O chiplets.
  • Ecosystem Enablers: A new category of startups is emerging to build the infrastructure for a chiplet marketplace, including companies like Yorchip, Chipletz, and zGlue.
  • NoC IP: AMD recently licensed Arteris’ FlexGen network-on-chip (NoC) interconnect IP for its next-gen AI chiplet design, providing high-performance data transport in AMD chiplets to power AI across applications, including data centers, edge, and end devices.

The acquisition wave has already started, as illustrated by this list of select recent acquisitions:

A New Life for Mature Foundries

Notably, the need for heterogeneous integration creates a significant opportunity for foundries without leading-edge EUV capabilities, such as Tower Semiconductor, SkyWater Technology, and XFab. To start, chiplets enable the integration of dice from completely different fabrication processes. Just as importantly, the cost equation can be favorable even on the smaller wafer sizes common in older and specialized processes: chiplet dice are tiny, which increases the number of dice per wafer and improves yield, making these processes cost-effective even in performance-driven applications. This does not solve the testability challenge, but it does make mature processes usable and economically attractive.

For example, SkyWater recently finalized the acquisition of Infineon’s 200 mm facility in Austin, Texas, which it will open to foundry customers. It will increase output for what it calls “foundational” chips on nodes from 130 nm to 65 nm. Customers like Infineon, the Department of Defense (DoD), and even quantum-computing companies like D-Wave are expected to use the fab.

Strategic Implications and the Future Trajectory: The Optical Horizon

The shift towards chiplets and the subsequent consolidation are forging the future of the semiconductor industry. The strategic maneuvers of today are defining the competitive fault lines for the next decade and setting the stage for the next great technological leap in connectivity: the transition from electrical to optical I/O.

The Next Frontier: Scaling with Optical I/O

Even as electrical interconnects like UCIe improve, the rapid growth of large-scale AI systems is pushing them to their physical limits. The power needed to drive high-speed electrical signals across a rack, and the resulting thermal density, is becoming unsustainable. As Ayar Labs points out, the power per rack for GPU systems is expected to skyrocket, creating a “connectivity problem” that electrical I/O cannot solve efficiently.

The solution lies in replacing electrons with photons. Optical I/O, which uses co-packaged optical chiplets to transmit data as light, represents the next frontier. This technology promises a transformative leap in performance, offering:

  • Massively Higher Bandwidth Density: Optical waveguides can carry far more data in the same physical space.
  • Lower Power Consumption: Transmitting data with light is significantly more energy-efficient, with projections of a 10x improvement at iso-performance.
  • Longer Reach: Optical links can efficiently transmit data over hundreds of meters, enabling true resource disaggregation and the creation of vast, coherent compute fabrics that can span multiple racks.

This future is rapidly approaching. Companies like Ayar Labs are already producing reliable, UCIe-compliant optical I/O chiplets and the external laser sources needed to power them. Innovators like Xscape Photonics are developing novel multi-wavelength laser platforms that can further increase bandwidth per fiber by over 10x using Dense Wavelength Division Multiplexing (DWDM). These optical chiplets are the key to breaking the I/O wall and enabling the scalable, flexible, and composable AI infrastructure of the future.
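
A rough illustration of why DWDM matters: per-fiber bandwidth is simply the number of wavelengths multiplied by the data rate carried on each wavelength. The figures below are assumptions for the sketch, not Ayar Labs or Xscape Photonics product specifications.

```python
def fiber_bandwidth_tbps(wavelengths: int, rate_per_wavelength_gbps: float) -> float:
    """Aggregate per-fiber bandwidth in Tb/s for a wavelength-division-multiplexed link."""
    return wavelengths * rate_per_wavelength_gbps / 1000

print(fiber_bandwidth_tbps(1, 64))    # 0.064 Tb/s on a single wavelength
print(fiber_bandwidth_tbps(16, 64))   # ~1 Tb/s per fiber with 16 wavelengths (DWDM)
```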

Conclusion and Outlook

The semiconductor industry has embarked on an irreversible journey away from the monolithic chip. Driven by the unyielding demands of AI, the chiplet architecture has moved from a niche concept to the mainstream. This disaggregation has ignited a vibrant ecosystem, but its full potential is constrained by significant challenges in testing (KGD), reliability (RAS), and security.

The acquisition spree is a direct result of these complexities and is set to continue. Well-funded, technologically differentiated startups focused on interconnects, packaging, security, and, increasingly, test and optical I/O solutions, are the most logical targets for the next M&A wave.

Looking ahead, the market will not resolve into a purely “open” or “closed” state. A hybrid model will emerge where large players leverage proprietary solutions for their flagship products while engaging with open standards to tap into a broader ecosystem of innovation. The transition to optical I/O will mark the next great inflection point, promising to shatter current performance barriers. The race to develop and control the key technologies for that optical future has already begun, ensuring that the chiplet consolidation wave is not an endpoint, but merely the beginning of the next chapter in the evolution of silicon.

What will the second and third rodeos look like? We will continue to follow the market and report on the most important developments and trends. Please let us know what’s important to you!

George Jones
Managing Director, Woodside Capital Partners

Alain Bismuth
Managing Director, Woodside Capital Partners


Woodside Capital Partners is one of the leading corporate finance advisory firms for tech companies in M&A and financings in the $30M – $500M segment. The firm has worked with some of the best entrepreneurs and investors since 2001, providing ultra-personalized service to select clients. Our team has global vision and reach, and has completed hundreds of successful engagements. We have deep industry knowledge and extensive domain experience in the following sectors: Autonomous Vehicles and ADAS, Computer Vision, Artificial Intelligence, CloudTech, Enterprise Software, Information Security, Digital Entertainment & Lifestyle, Health Tech, Internet of Things, Networking / Infrastructure, Robotics, Semiconductors, Batteries, Energy Storage, Aerospace and Defense. Woodside Capital Partners is a specialist in cross-border transactions, with extensive relationships among venture capitalists, private equity investors, and corporate executives from global 1000 companies. More about Woodside Capital Partners here.

Questions? Contact George Jones, Managing Director, Woodside Capital Partners at gjones@woodsidecap.com.

The post The Chiplet Consolidation Wave: How Strategic Acquisitions are Shaping the Future of Silicon appeared first on Edge AI and Vision Alliance.

Arm Neural Technology Delivers Smarter, Sharper, More Efficient Mobile Graphics for Developers https://www.edge-ai-vision.com/2025/08/arm-neural-technology-delivers-smarter-sharper-more-efficient-mobile-graphics-for-developers/ Tue, 12 Aug 2025 13:52:10 +0000 https://www.edge-ai-vision.com/?p=54845 News Highlights: Arm neural technology is an industry first, adding dedicated neural accelerators to Arm GPUs, bringing PC-quality, AI powered graphics to mobile for the first time – and laying the foundation for future on-device AI innovation Neural Super Sampling is the first application, an AI-driven graphics upscaler that enables potential for 2x resolution uplift […]

News Highlights:
  • Arm neural technology is an industry first, adding dedicated neural accelerators to Arm GPUs, bringing PC-quality, AI powered graphics to mobile for the first time – and laying the foundation for future on-device AI innovation

  • Neural Super Sampling is the first application, an AI-driven graphics upscaler that enables potential for 2x resolution uplift at 4ms per frame

  • Developers can start building now with the industry’s first open development kit for neural graphics with an Unreal Engine plugin, emulators, and open models on GitHub and Hugging Face

On-device AI is transforming workloads everywhere, from mobile gaming to productivity tools to intelligent cameras. This is driving demand for stunning visuals, high frame rates and smarter features – without draining battery or adding friction. Announced today at SIGGRAPH, Arm neural technology is an industry first, bringing dedicated neural accelerators to Arm GPUs from 2026. This takes the performance of GPUs for graphics rendering to new heights, delivering up to 50% GPU workload reduction for today’s most intensive mobile content, starting with mobile gaming. And this is just the beginning – the availability of this new technology lays the foundations for the industry to deliver even more on-device AI innovation in the future. 

Alongside this technology, we are launching the world’s first publicly available neural graphics development kit, designed to integrate AI-powered rendering into existing workflows so that developers can start building today, a full year ahead of hardware availability. All of Arm’s neural technology will be completely open – that means the model architecture, the weights and the tools that a studio would need to retrain the model. Partners that have shown support for the development kit to date include Enduring Games, Epic Games (Unreal Engine), NetEase Games, Sumo Digital, Tencent Games, and Traverse Research.

This marks the arrival of desktop-quality neural graphics on mobile, a major milestone for game developers on the frontlines of the shift to on-device AI. But Arm neural technology isn’t just about games – it will have an impact in applications such as neural camera workloads, giving developers the tools to bring graphics to life, on-device and at scale in use cases ranging from upscaling to path tracing.

Meeting Developers Where They Are: An Open Development Kit for Neural Technology

Developers can get going today with the neural graphics development kit – giving them a head start on integrating AI-powered graphics before hardware ships. Built with mobile gaming in mind, the kit includes everything needed to integrate and customize AI-driven visuals, including:

  • An Unreal Engine plugin
  • PC-based Vulkan emulation
  • Updated profiling tools
  • Fully open models available via GitHub and Hugging Face
  • Arm ML extensions for Vulkan

The open Arm ML extensions for Vulkan let developers bring AI directly into familiar rendering pipelines. While traditional Vulkan supports graphics and compute pipelines, Arm’s extensions introduce a third: the Graph Pipeline, designed specifically for neural network inference. All of this makes it dramatically easier to incorporate AI into mobile rendering as a native part of the graphics pipeline. Developers can find out more here.

Neural Graphics in Action: Neural Super Sampling

Leveraging all the pieces of the development kit is Arm Neural Super Sampling (NSS) – our AI-powered graphics upscaling engine. It builds on the foundation laid by Arm Accuracy Super Resolution (ASR), which is already being leveraged by the studios behind games including Fortnite and Infinity Nikki.

NSS delivers the potential for upscaling from 540p resolution to 1080p at a cost of 4ms per frame, while delivering near-native quality. Developers can save up to 50% of the GPU workload compared with rendering the full frame using traditional methods, and can either bank that saving to reduce the overall power consumption of their game, spend it on a higher frame rate, or invest it in richer visuals. With NSS, developers can use AI to preserve surface detail, lighting, and motion clarity, giving them the flexibility to balance visual fidelity with energy efficiency depending on their game’s needs.
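
The headline numbers are easy to sanity-check: rendering internally at 540p shades a quarter of the pixels of native 1080p, and a 4ms upscaling pass fits comfortably inside a 60 fps frame budget. The arithmetic below uses the resolutions and cost quoted above; the 60 fps target is an assumption for illustration.

```python
# Pixel counts at the two resolutions mentioned in the text
native_pixels = 1920 * 1080      # 1080p output
internal_pixels = 960 * 540      # 540p internal render before NSS upscaling
print(native_pixels / internal_pixels)   # 4.0x fewer pixels shaded per frame

# Share of a 60 fps frame budget consumed by the quoted 4 ms upscaling cost
frame_budget_ms = 1000 / 60              # ~16.7 ms per frame
print(4.0 / frame_budget_ms)             # ~0.24, i.e. roughly a quarter of the budget
```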

Check out this technical blog to see NSS in action and find out how to access the neural graphics development kit and other resources.

Beyond Upscaling: More Neural Graphics Ahead

In 2026, we’ll expand our roadmap of neural technology applications with Neural Frame Rate Upscaling – which uses AI to double frame rates without doubling the rendering load – and Neural Super Sampling and Denoising – which applies AI to enable real-time path tracing on mobile with fewer rays per pixel. Both will be available ahead of hardware.

With these new developments, we’re enabling neural graphics that are open, accessible, and optimized for real-world performance. By giving developers a unified, open platform, we’re making it easier to deploy AI across the full range of on-device experiences, built on Arm.

Geraint North
Fellow, AI and Developer Platforms, Arm

Supporting partner quotes

“Enduring Games and Arm share a vision for empowering developers with greater control, performance and content quality – regardless of device. Arm’s Neural Graphics Development Kit is delivering a purpose-built platform for the future of mobile game development, giving developers exciting new options to explore AI-enhanced workflows and bring richer, more immersive experiences to players everywhere.” Adam Creighton, Founder and CEO of Enduring Games

“One of our missions at NetEase Games is to bring console-class visuals to every mobile handset. We see this initiative and development direction as a dynamic step forward, and we’re excited to collaborate with Arm to optimize Neural Graphics Development Kit, leveraging the outstanding performance of our mobile engine as it reaches real-world devices.” Yuwen Wu, Senior Engine Development Expert, NetEase Games

“Sumo Digital believes that neural graphics and AI-based upscaling will revolutionize mobile gaming – delivering console-quality visuals and deeply immersive experiences without draining battery life. Neural technology unlocks a new era where stunning graphics meet portability and we are delighted to be working with Arm to explore what the future holds for mobile game creative and technical innovation.”  Scott Kirkland, Group Technology Director, Sumo Digital

“The Engine Technology Team at Tencent Games is working very closely with Arm on the Neural Graphics Development Kit, jointly exploring and advancing capabilities of delivering console-level rendering effects on mobile devices. We look forward to our continued collaboration with Arm as we play a key role in the evolution and widespread adoption of this toolkit, supporting the development of the next generation of mobile games.” Tencent Games

“At Traverse Research we’re always exploring the boundaries of computer graphics and proud to partner with Arm to bring their vision for neural rendering to life.” Jasper Bekkers, CEO, Traverse Research

The post Arm Neural Technology Delivers Smarter, Sharper, More Efficient Mobile Graphics for Developers appeared first on Edge AI and Vision Alliance.

Renesas Sets New MCU Performance Bar with 1-GHz RA8P1 Devices with AI Acceleration https://www.edge-ai-vision.com/2025/07/renesas-sets-new-mcu-performance-bar-with-1-ghz-ra8p1-devices-with-ai-acceleration/ Tue, 01 Jul 2025 19:00:29 +0000 https://www.edge-ai-vision.com/?p=54364 Single- and Dual-Core MCUs Combine Arm Cortex-M85 and M33 Cores with Arm Ethos-U55 NPU to Deliver Superior AI Performance up to 256 GOPs Unprecedented 7300+ CoreMarks[1] with Dual Arm CPU cores TSMC 22ULL Process Delivers High Performance and Low Power Consumption Embedded MRAM with Faster Write Speeds and Higher Endurance and Retention Dedicated Peripherals Optimized […]

Single- and Dual-Core MCUs Combine Arm Cortex-M85 and M33 Cores with Arm Ethos-U55 NPU to Deliver Superior AI Performance up to 256 GOPs
  • Unprecedented 7300+ CoreMarks[1] with Dual Arm CPU cores

  • TSMC 22ULL Process Delivers High Performance and Low Power Consumption

  • Embedded MRAM with Faster Write Speeds and Higher Endurance and Retention

  • Dedicated Peripherals Optimized for Vision and Voice AI plus Real-Time Analytics

  • New AI Software Framework Eases Development and Enables Easy Migration with MPUs

  • Leading-Edge Security Features Ensure Data Privacy

TOKYO, Japan, July 1, 2025 ― Renesas Electronics Corporation (TSE:6723), a premier supplier of advanced semiconductor solutions, today introduced the RA8P1 microcontroller (MCU) Group targeted at Artificial Intelligence (AI) and Machine Learning (ML) applications, as well as real-time analytics. The new devices establish a new performance level for MCUs by combining 1GHz Arm® Cortex®-M85 and 250MHz Cortex-M33 CPU cores with the Arm Ethos™-U55 Neural Processing Unit (NPU). This combination delivers class-leading CPU performance of over 7,300 CoreMarks and AI performance of 256 GOPS at 500 MHz.

Designed for Edge/Endpoint AI

The RA8P1 is optimized for edge and endpoint AI applications, using the Ethos-U55 NPU to offload the CPU for compute-intensive operations in Convolutional and Recurrent Neural Networks (CNNs and RNNs), delivering up to 256 MACs per cycle for 256 GOPS of performance at 500 MHz. The new NPU supports most commonly used networks, including DS-CNN, ResNet, MobileNet, TinyYOLO and more. Depending on the neural network used, the Ethos-U55 provides up to 35x more inferences per second than the Cortex-M85 processor on its own.
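
The quoted throughput follows directly from the MAC count and clock rate, using the common convention that each multiply-accumulate counts as two operations. A quick check:

```python
macs_per_cycle = 256      # MAC units active per cycle in the cited Ethos-U55 configuration
clock_hz = 500e6          # 500 MHz NPU clock
ops_per_mac = 2           # one multiply plus one accumulate

print(macs_per_cycle * clock_hz * ops_per_mac / 1e9)  # 256.0 GOPS
```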

Advanced Technology

The RA8P1 MCUs are manufactured on the 22ULL (22nm ultra-low leakage) process from TSMC, enabling ultra-high performance with very low power consumption. This process also enables the use of embedded Magnetoresistive RAM (MRAM) in the new MCUs. MRAM offers faster write speeds along with higher endurance and retention compared with Flash.

“There is explosive growth in demand for high-performance edge AIoT applications. We are thrilled to introduce what we believe are the best MCUs to address this trend,” said Daryl Khoo, Vice President of Embedded Processing Marketing Division at Renesas. “The RA8P1 devices showcase our technology and market expertise and highlight the strong partnerships we have built across the industry. Customers are eager to employ these new MCUs in multiple AI applications.”

“The pace of innovation in the age of AI is faster than ever, and new edge use cases demand ever-improving performance and machine learning on-device,” said Paul Williamson, senior vice president and general manager, IoT Line of Business at Arm. “By building on the advanced AI capabilities of the Arm compute platform, Renesas’ RA8P1 MCUs meet the demands of next generation voice and vision applications, helping to scale intelligent, context-aware AI experiences.”

“It is gratifying to see Renesas harness the performance and reliability of TSMC 22ULL embedded MRAM technology to deliver outstanding results for its RA8P1 devices,” said Chien-Hsin Lee, Senior Director of Specialty Technology Business Development at TSMC. “As TSMC continues to advance our embedded non-volatile memory (eNVM) technologies, we look forward to strengthening our long-standing collaboration with Renesas to drive innovation in future groundbreaking devices.”

Robust, Optimized Peripheral Set for AI

Renesas has integrated dedicated peripherals, ample memory and advanced security to address Voice and Vision AI and Real-time Analytics applications. For vision AI, a 16-bit camera interface (CEU) supporting sensors up to 5 megapixels enables demanding camera and vision AI applications. A separate MIPI CSI-2 interface offers a low pin-count option with two lanes, each running at up to 720Mbps. In addition, multiple audio interfaces including I2S and PDM support microphone inputs for voice AI applications.

The RA8P1 offers both on-chip and external memory options for efficient, low latency neural network processing. The MCU includes 2MB SRAM for storing intermediate activations or graphics framebuffers. 1MB of on-chip MRAM is also available for application code and storage of model weights or graphics assets. High-speed external memory interfaces are available for larger models. SIP options with 4 or 8 MB of external flash in a single package are also available for more demanding AI applications.

New RUHMI Framework

Along with the RA8P1 MCUs, Renesas has introduced RUHMI (Renesas Unified Heterogeneous Model Integration), a comprehensive framework for MCUs and MPUs. RUHMI offers efficient AI deployment of the latest neural network models in a framework-agnostic manner. It enables model optimization, quantization, graph compilation and conversion, and generates efficient source code. RUHMI provides native support for machine learning frameworks such as TensorFlow Lite, PyTorch and ONNX. It also provides the tools, APIs, code generator, and runtime needed to deploy a pre-trained neural network, including ready-to-use application examples and models optimized for the RA8P1. RUHMI is integrated with Renesas’ own e2 studio IDE to allow seamless AI development. This integration will facilitate a common development platform for MCUs and MPUs.
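
RUHMI itself is Renesas tooling, but the quantization step it automates is conceptually similar to standard post-training integer quantization. The sketch below is a generic TensorFlow Lite example, not the RUHMI API; the model and the random calibration data are placeholders.

```python
import numpy as np
import tensorflow as tf

# Placeholder model; a real flow would start from a network trained for the target task.
model = tf.keras.applications.MobileNetV2(weights=None, input_shape=(224, 224, 3))

def representative_data():
    # Placeholder calibration samples; real flows use data from the target application.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
# Integer-only NPUs generally require full int8 quantization of weights and activations.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```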

Advanced Security Features

The RA8P1 MCUs provide leading-edge security for critical applications. The new Renesas Security IP (RSIP-E50D) includes numerous cryptographic accelerators, including ChaCha20, Ed25519, NIST ECC curves up to 521 bits, enhanced RSA up to 4K, SHA-2 and SHA-3. In concert with Arm TrustZone®, this provides comprehensive, fully integrated secure element-like functionality. The new MCUs also provide a strong hardware Root of Trust and Secure Boot, with the First Stage Bootloader (FSBL) held in immutable storage. XSPI interfaces with decryption-on-the-fly (DOTF) allow encrypted code images to be stored in external flash and decrypted on the fly as they are securely transferred to the MCU for execution.

Ready to Use Solutions

Renesas provides a wide range of easy-to-use tools and solutions for the RA8P1 MCUs, including the Flexible Software Package (FSP), evaluation kits and development tools. FreeRTOS and Azure RTOS are supported, as is Zephyr. Several Renesas software example projects and application notes are available to enable faster time to market. In addition, numerous partner solutions are available to support development with the RA8P1 MCUs, including a driver monitoring solution from Nota.AI and a traffic/pedestrian monitoring solution from Irida Labs. Other solutions can be found at the Renesas RA Partner Ecosystem Solutions Page.

Key Features of the RA8P1 MCUs

  • Processors: 1GHz Arm Cortex-M85, 500MHz Ethos-U55, 250 MHz Arm Cortex-M33 (Optional)
  • Memory: 1MB/512KB On-chip MRAM, 4MB/8MB External Flash SIP Options, 2MB SRAM fully ECC protected, 32KB I/D caches per core
  • Graphics Peripherals: Graphics LCD controller supporting resolutions up to WXGA (1280×800), parallel RGB and MIPI-DSI display interfaces, powerful 2D Drawing engine, parallel 16bit CEU and MIPI CSI-2 camera interfaces, 32bit external memory bus (SDRAM and CSC) interface
  • Other Peripherals: Gigabit Ethernet and TSN Switch, XSPI (Octal SPI) with XIP and DOTF, SPI, I2C/I3C, SDHI, USBFS/HS, CAN-FD, PDM and SSI audio interfaces, 16bit ADC with S/H circuits, DAC, comparators, temperature sensor, timers
  • Security: Advanced RSIP-E50D cryptographic engine, TrustZone, Immutable storage, secure boot, tamper resistance, DPA/SPA attack protection, secure debug, secure factory programming, Device Lifecycle management
  • Packages: 224BGA, 289BGA

Winning Combinations

Renesas has combined the new RA8P1 MCUs with numerous compatible devices from its portfolio to offer a wide array of Winning Combinations, including Video Conferencing Camera with AI Capabilities, AI Drawing Robot Arm and AI-Enabled Surveillance Camera. These designs are technically vetted system architectures from mutually compatible devices that work together seamlessly to bring an optimized, low-risk design for faster time to market. Renesas offers more than 400 Winning Combinations with a wide range of products from the Renesas portfolio to enable customers to speed up the design process and bring their products to market more quickly. They can be found at renesas.com/win.

Availability

The RA8P1 MCUs are available now. Renesas is also shipping an RA8P1 Evaluation Kit. More information is available at renesas.com/RA8P1. Samples and kits can be ordered either on the Renesas website or through distributors.

Renesas MCU Leadership

A world leader in MCUs, Renesas ships more than 3.5 billion units per year, with approximately 50% of shipments serving the automotive industry, and the remainder supporting industrial and Internet of Things applications as well as data center and communications infrastructure. Renesas has the broadest portfolio of 8-, 16- and 32-bit devices, delivering unmatched quality and efficiency with exceptional performance. As a trusted supplier, Renesas has decades of experience designing smart, secure MCUs, backed by a dual-source production model, the industry’s most advanced MCU process technology and a vast network of more than 250 ecosystem partners. For more information about Renesas MCUs, visit renesas.com/MCUs.

About Renesas Electronics Corporation

Renesas Electronics Corporation (TSE: 6723) empowers a safer, smarter and more sustainable future where technology helps make our lives easier. A leading global provider of microcontrollers, Renesas combines our expertise in embedded processing, analog, power and connectivity to deliver complete semiconductor solutions. These Winning Combinations accelerate time to market for automotive, industrial, infrastructure and IoT applications, enabling billions of connected, intelligent devices that enhance the way people work and live. Learn more at renesas.com. Follow us on LinkedIn, Facebook, X, YouTube and Instagram.

[1] EEMBC’s CoreMark® benchmark measures performance of MCUs and CPUs used in embedded systems.

The post Renesas Sets New MCU Performance Bar with 1-GHz RA8P1 Devices with AI Acceleration appeared first on Edge AI and Vision Alliance.

Embedded Quest 2025: MCU Vendors Step up Edge AI Play https://www.edge-ai-vision.com/2025/07/embedded-quest-2025-mcu-vendors-step-up-edge-ai-play/ Tue, 01 Jul 2025 15:03:54 +0000 https://www.edge-ai-vision.com/?p=54352 Edge AI was the primary focus of microcontroller vendors at the 2025 Embedded World Exhibition. We caught up with some industry executives for our annual Embedded Quest program, asking them about their take on Edge AI. As usual, the Embedded World Exhibition in Nuremberg, Germany, drew a large crowd of companies in the computing processor […]

Edge AI was the primary focus of microcontroller vendors at the 2025 Embedded World Exhibition. We caught up with some industry executives for our annual Embedded Quest program, asking them about their take on Edge AI.

As usual, the Embedded World Exhibition in Nuremberg, Germany, drew a large crowd of companies in the computing processor space. What was different at the event this year was their focus. This time, it was all about artificial intelligence. No surprise about this, right? Except the majority of exhibitors at EW2025 were focused more on Edge AI, rather than on generic or cloud AI.

Microcontroller vendors and semiconductor IP companies led the group. Companies like Arm, Microchip, NXP, STMicroelectronics, MIPS, CEVA and others were in the throng, displaying and talking passionately about moving AI from the “talkshop” to the workshop. Despite massive travel disruptions, the event teemed with executives eager to display their companies’ Edge AI offerings. Arm spoke and demonstrated devices that showed its “early adoption of AI and edge computing,” as Paul Williamson, general manager, IoT line of business at the Cambridge, UK-based company told our editors during an interview.

“We’ve been looking at the evolution of AI for a long time,” Williamson said. “We’ve had to step up the tooling, the software and the computing elements because we’re seeing people move from convolutional networks to transformer networks and to vision-based solutions very quickly and they want to run all of those at the edge in embedded systems.”

It’s not a surprise that semiconductor suppliers in the microcontroller sector are diving deep into the Edge AI pool. They see massive opportunities in the billions of devices that are being rolled out with the capacity to process data locally rather than in the Cloud. The market for Edge AI is massive, ranging from automotive to aviation, industrial, medical and farming, according to industry executives. The devices are already deployed in many of these markets, they said, noting that the next waves of electronic equipment going into specialized manufacturing sectors are being fitted with Edge AI.

“Edge AI is a broad term,” said Sameer Wasson, CEO of MIPS, in an interview. “Edge AI could be on an IoT device such as a camera, or on robots and dishwashers. Think about a wide set of applications. We’re building technology that will make them better.”

He added: “For MIPS, the common thing about all of these is that there is always going to be a moving part. And when you are moving things, there's got to be three things. You must be precise; you must have low latency, and you have to be functionally safe. If we can build technology which does these three consistently with the right AI models and the right platform with the right RISC-V open-source architecture, you will help move this market.”

As in our 2024 edition of our Embedded Quest program, we asked some industry executives at the Embedded World Exhibition about their focus at the event and condensed their thoughts in the following video compilation. The executives we spoke with are from Arm, MIPS, CEVA and STMicroelectronics.

Click on the video below for short clips from the interviews.

Bolaji Ojo
Publisher and Editor in Chief, TechSplicit (formerly the Ojo-Yoshida Report)


This article was published by TechSplicit (formerly The Ojo-Yoshida Report). For more in-depth analysis, register today and get a free two-month all-access subscription.

The post Embedded Quest 2025: MCU Vendors Step up Edge AI Play appeared first on Edge AI and Vision Alliance.
