BrainChip - Edge AI and Vision Alliance
https://www.edge-ai-vision.com/category/provider/brainchip/

Right Sizing AI for Embedded Applications
https://www.edge-ai-vision.com/2026/02/right-sizing-ai-for-embedded-applications/
Tue, 03 Feb 2026

This blog post was originally published at BrainChip’s website. It is reprinted here with the permission of BrainChip.

We all know the AI revolution train is heading straight for the Embedded Station. Some of us are already in the driver’s seat, while others are waiting for the first movers to pave the way so we can become fast adopters. No matter where you are on this journey, one thing becomes clear: AI must adapt to the embedded application sandbox—not the other way around.

Embedded applications typically operate within a power envelope ranging from milliwatts to around 10 watts. For AI to be effective in many embedded markets, it must respect the power-performance boundaries of the application. Imagine your favorite device that you charge once a day. If adding embedded AI to a product means you now need to charge it every four hours, you are likely to stop using the product altogether.

This is where embedded AI fundamentally differs from cloud AI. In the cloud, adding more computations is often the default solution. But in embedded systems, the level of AI compute must be dictated by what the overall power and performance constraints allow. You can’t just throw more compute silicon at the problem.

There are two key approaches to scaling AI effectively for embedded applications:

1. Process Technology

At the foundational level, advanced process technologies like GlobalFoundries’ 22FDX+ with Adaptive Body Biasing offer a compelling solution. These transistors can deliver high performance during compute-intensive tasks while maintaining low leakage during idle or always-on modes. This dynamic adaptability ensures that the overall power-performance integrity of the application is preserved.

2. Alternative Compute Architectures

Emerging architectures like neuromorphic computing are gaining attention for their ability to run inference at a fraction of the power—and with lower latency—compared to traditional models. These ultra-low-power solutions are particularly promising for applications where energy efficiency is paramount and real-time response is also important.

BrainChip’s AKD1500 Edge AI co-processor, built on the GlobalFoundries 22FDX platform, demonstrates how neuromorphic design can make AI practical for the smallest and most power-sensitive devices. Powered by the company’s Akida™ technology, the chip uses an event-based approach, processing only when there is information, thereby avoiding the constant compute cycles that waste energy reading and writing to on-chip SRAM or off-chip DRAM in traditional AI systems. The co-processor performs event-based convolutions that exploit sparsity throughout the whole network, in both activation maps and kernels, significantly reducing computation power and latency by running as many layers as possible on the Akida™ fabric. The diagram below shows all the interfaces, with the 8-node Akida IP as the centerpiece of the AI co-processor.

The design further improves efficiency by handling data locally and using operations that cut power consumption dramatically. The result is a chip that delivers real-time intelligence while operating within just a few hundred milliwatts, making it possible to add AI features to wearables, sensors, and other AIoT devices that previously relied on the cloud for such capability.

The Akida low-cost, low-power AI co-processor solution offers a silicon-proven design that has already demonstrated critical performance metrics, substantially reducing risk for developers. With fully functional interfaces tested at operational speeds and proven interoperability across multiple MCU and MPU boards, the platform ensures seamless integration. The AKD1500 co-processor supports both power-conscious MCUs via SPI4 and high-performance MPUs through M.2 and PCIe interfaces, providing flexibility across many configurations. Enabling software development early with silicon prototypes accelerates time to market. Several customers have already advanced to prototype stages, validating the design’s maturity and readiness for deployment. As an example, Onsor Technologies’ Nexa smart glasses use the AKD1500 for low-power inference to predict epileptic seizures, providing quality-of-life benefits for those living with epilepsy.

The best part is that the AKD1500 can pair with any existing low-cost MCU over a SPI interface, or with an applications processor over PCIe where higher performance is needed. Because those host processors are available today, adding the AKD1500 AI co-processor keeps time to market very short.

Final Thoughts

As AI starts to sweep across the length and breadth of the embedded space, right-sizing becomes not just a technical necessity but a strategic imperative. The goal isn’t to fit the biggest model into the smallest device; it’s to fit the right model into the right device, with the right balance of performance, power, and user experience.

 

Anand Rangarajan
Director, End Markets, GlobalFoundries

Todd Vierra
Vice President, Customer Engagement, BrainChip

BrainChip Unveils Breakthrough AKD1500 Edge AI Co-Processor at Embedded World North America
https://www.edge-ai-vision.com/2025/11/brainchip-unveils-breakthrough-akd1500-edge-ai-co-processor-at-embedded-world-north-america/
Tue, 04 Nov 2025

Laguna Hills, Calif. — November 4th, 2025 — BrainChip Holdings Ltd (ASX: BRN, OTCQX: BRCHF, ADR: BCHPY), a global leader in ultra-low power, fully digital, event-based neuromorphic AI, today announced the launch of its AKD1500, a neuromorphic Edge AI accelerator co-processor chip, at Embedded World North America.

Designed to deliver exceptional performance with minimal power consumption, the AKD1500 achieves 800 giga operations per second (GOPS) while operating under 300 milliwatts—setting a new benchmark for edge AI efficiency. This makes the AKD1500 ideal for deployment in battery-powered wearables, smart sensors, and heat-constrained environments where battery life and thermal limits are critical.

The AKD1500 integrates seamlessly with x86, ARM, and RISC-V host processing platforms via PCIe or serial interfaces, enabling rapid adoption across a wide range of applications. The AKD1500 co-processor approach suits a wide range of environments and industries: upgrading multi-processor SoCs within defense, industrial, and enterprise settings, and upgrading embedded microcontrollers for AI solutions in healthcare, wearables, and consumer electronics without a complete system redesign. The AKD1500 has been delivered and designed into several end solutions in AI-enabled sensing for medical and defense-related applications, including solutions from Parsons, Bascom Hunter, and Onsor Technologies.

“The AKD1500 is a catalyst for the next wave of intelligent AIoT devices,” said Sean Hehir, CEO of BrainChip. “We’re empowering developers to break free from cloud dependency and bring adaptive learning directly to the edge in a compact, cost-effective package. This technology will make AI truly ubiquitous in smart factories, homes, and wearable devices.”

“BrainChip’s AKD1500 on our 22FDX® platform delivers outstanding compute and memory efficiency,” said Anand Rangarajan, Director of AI & IoT Compute at GlobalFoundries. “Embedded developers are constantly innovating to get the right level of AI to fit within performance, power and area constraints. Using BrainChip’s neuromorphic architecture combined with GlobalFoundries’ 22FDX® process technology, the AKD1500 offers an excellent performance, power and cost envelope that fits into edge devices. We’re proud to support BrainChip’s end-to-end embedded AI solutions using GlobalFoundries silicon.”

AKD1500 is supported by BrainChip’s MetaTF™ software development environment, enabling machine learning engineers to easily convert, quantize, compile, and deploy models on Akida using standard TensorFlow/Keras formats, which dramatically reduces development time and cost while expanding accessibility for AI developers. BrainChip’s event-based Akida™ neuromorphic architecture also enables the AKD1500 to provide on-chip learning, a critical differentiator from conventional AI accelerators that rely solely on cloud-based training.
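For a sense of what that workflow looks like in practice, here is a minimal sketch of a MetaTF-style pipeline. The package and function names (quantizeml, cnn2snn.convert) follow BrainChip’s publicly documented MetaTF tooling, but exact signatures vary by release, and the tiny Keras model is purely illustrative.

```python
# Minimal sketch of a MetaTF-style workflow: train in Keras, quantize, convert to Akida.
# The quantizeml and cnn2snn names below follow BrainChip's public MetaTF documentation,
# but exact signatures vary by release; treat this as illustrative, not a drop-in recipe.
import tensorflow as tf
from quantizeml.models import quantize, QuantizationParams  # assumed MetaTF quantization API
from cnn2snn import convert                                  # assumed MetaTF conversion API

# 1. Build or load a standard Keras model (a tiny CNN stands in for a real workload).
model = tf.keras.Sequential([
    tf.keras.layers.Input((64, 64, 3)),
    tf.keras.layers.Conv2D(16, 3, strides=2, activation="relu"),
    tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# 2. Quantize to 8-bit weights and activations so the model fits Akida's integer fabric.
q_model = quantize(model, qparams=QuantizationParams(weight_bits=8, activation_bits=8))

# 3. Convert the quantized Keras model into an Akida model for deployment on AKD1500.
akida_model = convert(q_model)
akida_model.summary()
```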

AKD1500 samples are available today with volume production scheduled for Q3’26.

BrainChip’s Chief Development Officer, Jonathan Tapson, will present “The Impact of GenAI Workloads on Compute-in-Memory Architectures” at Embedded World North America on November 4th.

For more information: join us for a demo of AKD1500 at Booth 3080; visit the BrainChip developer site for free tutorials, tools, and models for neuromorphic computing; and check out the embedded world North America site.


About BrainChip Holdings Ltd (ASX: BRN, OTCQX: BRCHF, ADR: BCHPY)

BrainChip is the worldwide leader in neuromorphic Edge AI on-chip processing and learning. The company’s first-to-market, fully digital, event-based AI processor, Akida™, uses neuromorphic principles to mimic the human brain, analyzing only essential sensor inputs at the point of acquisition and processing data with unmatched efficiency, precision, and energy economy. BrainChip’s Temporal Event-based Neural Networks (TENNs) build on State-Space Models (SSMs) with time-sensitive, event-driven frameworks that are ideal for real-time streaming applications. These innovations make low-power Edge AI deployable across industries such as aerospace, autonomous vehicles, robotics, industrial IoT, consumer devices, and wearables. BrainChip is advancing the future of intelligent computing, bringing AI closer to the sensor and closer to real-time.

Unleash Real-time LiDAR Intelligence with BrainChip Akida On-chip AI
https://www.edge-ai-vision.com/2025/10/unleash-real-time-lidar-intelligence-with-brainchip-akida-on-chip-ai/
Mon, 27 Oct 2025

This blog post was originally published at BrainChip’s website. It is reprinted here with the permission of BrainChip.

Accelerating LiDAR Point Cloud Processing with BrainChip’s Akida™ PointNet++ Model.

LiDAR (Light Detection and Ranging) technology is the key enabler for advanced Spatial AI—the ability of a machine to understand and interact with the physical world in three dimensions. A LiDAR sensor pulses laser beams to create a highly accurate, three-dimensional map of the surrounding space.

This 3D map is known as a LiDAR point cloud.

A point cloud is a massive collection of data points, where each point represents a specific coordinate (X, Y, Z) in the environment. It essentially creates a rich, detailed digital twin of the surrounding space, packed with geometric information about objects, infrastructure, and terrain.
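To make that concrete, here is a minimal, generic sketch (plain NumPy, not BrainChip code) of a point cloud as data and a few of the geometric queries it supports directly:

```python
# A point cloud is just an (N, 3) array of X, Y, Z coordinates.
# Minimal NumPy sketch for illustration; not BrainChip code.
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(low=[-10.0, -10.0, 0.0], high=[10.0, 10.0, 3.0], size=(5000, 3))

# Simple geometric queries the 3D map supports directly:
centroid = points.mean(axis=0)                       # where is the mass of the scene?
extent = points.max(axis=0) - points.min(axis=0)     # bounding-box size in metres
nearest = np.linalg.norm(points, axis=1).min()       # closest return to the sensor

print(f"{len(points)} points, centroid {centroid}, extent {extent}, nearest {nearest:.2f} m")
```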

The Critical Importance of 3D Spatial Perception

For next-generation applications like autonomous vehicles, advanced robotics, and intelligent infrastructure, the point cloud is the gold standard for spatial perception because it provides:

  1. Unmatched Precision: Highly accurate distance and volume measurements, essential for safe navigation and manipulation.
  2. Depth and Geometry: True 3D context that is not susceptible to the lighting and occlusion issues of standard 2D imaging.
  3. Instant Interpretation: Enables devices to instantly interpret complex environments for object classification, obstacle detection, and path planning.

The Problem: Cloud-Dependent LiDAR Creates Dangerous Delays

While the data is invaluable, the sheer volume of a point cloud creates a critical processing challenge. To analyze this data, many systems rely on centralized computing or the cloud. The issue? The round-trip journey to the cloud introduces latency. In time-sensitive scenarios—like an autonomous vehicle needing to identify a sudden obstacle or a robotic arm requiring immediate process control—this delay is unacceptable. This reliance on off-device processing prevents systems from turning massive datasets into instant, real-time decisions, posing a safety and operational risk. To achieve true instant action, the heavy lifting of point cloud analysis must happen directly on the device—a requirement known as the Edge AI Imperative.

The Solution: Unleashing Real-Time 3D Intelligence with BrainChip’s Akida™

BrainChip addresses this critical latency challenge with the Akida™ PointNet++ model, an advanced, on-chip point cloud AI solution adapted from the original PointNet++ architecture.*

The Akida PointNet++ model is a compact, neuromorphic-friendly neural network that is uniquely optimized to perform real-time classification of 3D LiDAR point clouds directly at the edge. By running this sophisticated model on a hyper-efficient neuromorphic processor, the key benefits are immediately realized:

  • Real-Time Responsiveness: Selective data handling delivers instant decision-making for streaming applications where milliseconds are crucial.
  • Energy Efficiency: The system operates in the milliwatt range, making it ideal for battery-powered, always-on, and field deployments.
  • Ultra-Compact Design: The processing runs efficiently, even on memory-limited edge devices without compromising performance.

How Akida Point Cloud Delivers Speed and Efficiency

What makes the Akida approach uniquely suited for sparse, unordered LiDAR data is its architecture, which maximizes efficiency and accuracy:

  1. Native 3D Processing: Unlike traditional methods that often convert the 3D point cloud into grids or images, the Akida PointNet++ model works natively on the raw point sets. This preserves data integrity while maximizing efficiency.
  2. Sparsity-Driven Efficiency: Akida’s architecture processes only the most meaningful LiDAR data points. This focus eliminates computational waste associated with processing empty space or redundant data, enhancing both speed and model accuracy simultaneously.
  3. Hierarchical Learning: The model utilizes a Hierarchical PointNet++ Backbone to capture both fine-grained local details and the overall global context of the 3D shape, boosting accuracy on sparse, large-scale data.
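As an illustration of the sampling-and-grouping step behind that hierarchical backbone, here is a generic NumPy sketch of farthest-point sampling and radius grouping as used in PointNet++-style set abstraction (Qi et al., 2017). It is reference logic for the published architecture, not BrainChip’s Akida implementation:

```python
# Illustrative NumPy sketch of the sampling-and-grouping step used by PointNet++-style
# set abstraction (Qi et al., 2017). Generic reference logic, not Akida code.
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Pick n_samples indices that cover the cloud as evenly as possible."""
    n = len(points)
    chosen = np.zeros(n_samples, dtype=int)
    dist = np.full(n, np.inf)
    chosen[0] = 0                       # start from an arbitrary point
    for i in range(1, n_samples):
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        dist = np.minimum(dist, d)      # distance to the nearest already-chosen point
        chosen[i] = int(dist.argmax())  # next centroid: the point farthest from all chosen
    return chosen

def group_by_radius(points: np.ndarray, centroids: np.ndarray, radius: float):
    """For each centroid, gather neighbouring points to form a local region."""
    return [np.where(np.linalg.norm(points - c, axis=1) < radius)[0] for c in centroids]

rng = np.random.default_rng(1)
cloud = rng.normal(size=(2048, 3))
idx = farthest_point_sampling(cloud, n_samples=128)
regions = group_by_radius(cloud, cloud[idx], radius=0.3)
print(f"{len(regions)} local regions; mean size {np.mean([len(r) for r in regions]):.1f} points")
```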

Transforming Industries with Real-Time LiDAR Intelligence

The ability to process 3D spatial data instantly at the source is vital for next-generation technology across multiple sectors:

  • Autonomous Vehicles & Drones: Precision navigation, real-time obstacle detection, and environmental mapping from raw 3D scans.
  • Industrial Automation: Real-time asset location, safety monitoring, and precise process control in large facilities.
  • Smart Cities & Infrastructure: Scalable urban planning, traffic management, and infrastructure inspection using direct 3D analysis.
  • Security & Surveillance: Accurate 3D scene understanding for perimeter security and immediate anomaly detection.
  • Robotics & Warehousing: Advanced pick-and-place, navigation, and inventory control with sophisticated spatial awareness.

Ready to integrate intelligent LiDAR processing into your next product design? BrainChip offers a comprehensive development ecosystem, including the Akida Cloud Platform and essential development packages, to help you convert and optimize your models for Akida deployment and bring your vision to reality.

* Qi et al (2017) “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space” https://arxiv.org/abs/1706.02413

Doug McLelland
Senior Researcher, BrainChip

Sasskia Brüers Freyssinet
Deep Learning R&D Engineer, BrainChip

Akida Exploits Sparsity For Low Power in Neural Networks
https://www.edge-ai-vision.com/2025/08/akida-exploits-sparsity-for-low-power-in-neural-networks/
Fri, 22 Aug 2025

This blog post was originally published at BrainChip’s website. It is reprinted here with the permission of BrainChip.

In the rapidly evolving field of artificial intelligence, edge computing has become increasingly vital for deploying intelligent systems in real-world environments where power, latency, and bandwidth are limited: we need neural network models to run efficiently. For most of the AI field, that means developing new model architectures or training techniques to achieve sufficient accuracy from ever smaller models. As hardware providers, however, we have an extra strategy to leverage when we look for more efficient ways to carry out the computations themselves. Crucially, standard CNNs naturally show high levels of sparsity, particularly in their activations, meaning many of the values involved in computation are zero. This creates the opportunity for hardware designs like Akida’s, which can skip unnecessary operations and dramatically improve efficiency without sacrificing model fidelity.

BrainChip’s Akida is principally an accelerator for Convolutional Neural Networks (CNNs). Of course, it includes many optimizations, from co-localization of memory and compute to aggressive quantization, enabling int8 or even lower bit-width calculations throughout. The single greatest differentiating factor, however, is the ability to exploit “sparsity”: that is, to avoid doing the computation altogether when possible. Where there are zeros in the values to be multiplied (Figure 1), the multiplication is simply not scheduled. Most accelerators must perform this calculation even though it will produce a zero output, since they are built from tightly coupled multiplier-accumulator arrays. While exploiting sparsity can yield significant efficiency gains, it does require additional hardware logic to detect and skip zero-valued operations. This introduces some complexity in design and verification, though the trade-off is often favorable in power-constrained environments.

Of course, for that strategy to work, there must be a significant amount of sparsity in the CNN model being inferenced. Fortunately, that’s not something the average user needs to be concerned with: as we’ll see, standard models already include very significant levels of sparsity. For engineers, researchers, and developers working on AI at the edge, understanding how Akida leverages sparsity can be a route to achieving even greater efficiency with your models. By embracing sparsity, Akida not only improves computational efficiency but also opens new pathways for designing intelligent systems that operate where conventional AI solutions fall short.

Figure 1: Left – Conventional hardware for running deep models processes all values. On the right-hand side of the figure, Akida hardware skips operations that multiply by zero (i.e., it takes advantage of event sparsity), leading to better efficiency (i.e., lower latency with no impact on accuracy, because the output of the calculation is unchanged).

What is Sparsity?

  • Sparsity, in the context of neural networks, refers to the presence of zero-valued elements in the data or parameters involved in computation, specifically in inputs, activations, or weights. Rather than being a single metric, sparsity manifests in different forms depending on where these zeros occur.
  • Activation sparsity: This is the type of sparsity that Akida principally exploits, precisely what we want to focus on here. Its naming is a little counterintuitive, so let’s take a moment to unwrap that: for every layer of a neural network model after the first, the input to that layer is the output of the preceding layer. Those outputs have historically been called activations, hence “activation sparsity”. It will be demonstrated that activation sparsity is often high in models, typically because the commonly used “ReLU” activation function rectifies all negative values to zero.
  • Input sparsity: The inputs to the first layer (and thus, to the model) must be considered a special case. For a typical model receiving, say, RGB image inputs, we do not expect there to be any appreciable sparsity at all. However, it’s easy to produce preprocessing schemes (e.g. difference of frames in video input) that can generate significant input sparsity. Equally, there are specialized sensors (e.g. dynamic vision sensors or “event-based” cameras) specifically designed to generate sparse input signals. One key takeaway here is that Akida does not need this kind of input sparsity to do well! The activation sparsity described above will naturally arise between layers in the model regardless of the input to the first layer. That said, Akida is naturally placed to exploit the input sparsity from those specialized sensors, so we will return to consider those cases.
  • Weight sparsity: This refers to zeros in the model weights. These arise naturally during training and are typically “unstructured” (they have no spatial pattern within the weight matrices), although various schemes exist to establish structured sparsity patterns (e.g. 2:4 sparsity): those can be more efficiently exploited than unstructured weight sparsity but nonetheless require dedicated hardware. Programmers can take advantage of weight sparsity with offline preprocessing of the model by pruning model weights that are near zero and compressing the network. For the sake of clarity, this is often what other manufacturers are calling sparsity, and this is also not the type exploited by Akida hardware.

Figure 2: Maps of randomly generated input sparsity at different levels of sparsity. Black pixels indicate zeros. This is used to measure the sparsity of a single layer CNN. See text.

The main distinction is that input sparsity comes from the data itself, weight sparsity is a built-in characteristic of the model, and activation sparsity (a.k.a. event sparsity) is a dynamic trait of the model’s activations, varying with the input from one sample to another.

Here, our interest is in exploiting sparsity to reduce the computation required through the model. Since the key computation in neural network layers boils down to a series of multiplications between the inputs and weights of a layer, those are going to be the sparsity values we care about.

Exploiting Sparsity with Akida: Event-based Convolution

If exploiting sparsity were trivial, then everyone would do it. It comes with a cost: standard efficient implementations of the 2D convolution operations common in CNNs run via vector and matrix instructions. You simply cannot take that approach if trying to exploit unstructured sparsity. Instead, Akida uses a neuromorphic-inspired “event-based convolution” approach: broadly, rather than multiplying the input by the weight kernel for each position of the output space, the algorithm iterates over the input values and only projects the multiplied kernel to the output space where the input is non-zero. This means computations are triggered only when needed, a strategy that aligns closely with neuromorphic principles, where processing mimics the brain’s efficiency by activating only in response to relevant stimuli. If there were no sparsity, this would not be the most efficient approach. The bet is that for real CNNs there is enough sparsity to make this worthwhile.
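A minimal sketch of the idea in plain NumPy (not the Akida hardware implementation) is shown below: instead of sliding the kernel over every output position, it iterates over the non-zero inputs and scatters each one’s contribution into the output, producing exactly the same result as a dense convolution.

```python
# Illustrative NumPy sketch of event-based convolution: only non-zero inputs trigger work.
# This mirrors the idea described above; the real Akida hardware scheduling differs.
import numpy as np

def event_based_conv2d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """x: (H, W) input map, w: (kH, kW) kernel. Zero padding, stride 1."""
    H, W = x.shape
    kH, kW = w.shape
    pad_h, pad_w = kH // 2, kW // 2
    out = np.zeros((H, W))
    ys, xs = np.nonzero(x)                      # the "events": positions holding non-zero values
    for y, x_pos in zip(ys, xs):                # work scales with the number of events, not H*W
        val = x[y, x_pos]
        for dy in range(kH):                    # project this event through the kernel
            for dx in range(kW):
                oy, ox = y - dy + pad_h, x_pos - dx + pad_w
                if 0 <= oy < H and 0 <= ox < W:
                    out[oy, ox] += val * w[dy, dx]
    return out

def dense_conv2d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Conventional reference: visits every output position regardless of sparsity."""
    H, W = x.shape
    kH, kW = w.shape
    xp = np.pad(x, ((kH // 2, kH // 2), (kW // 2, kW // 2)))
    return np.array([[np.sum(xp[i:i + kH, j:j + kW] * w) for j in range(W)] for i in range(H)])

rng = np.random.default_rng(0)
x = np.where(rng.random((32, 32)) > 0.8, rng.random((32, 32)), 0.0)   # roughly 80% sparse input
w = rng.random((3, 3))
assert np.allclose(event_based_conv2d(x, w), dense_conv2d(x, w))       # identical outputs
```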

How does that work out in practice? We can directly measure the behavior of individual layers in hardware. The following are results for a very typical CNN layer, running a standard convolution with kernel size 3×3, input height and width of 32 with 64 input channels and 64 filters mapped to a single Neural Processing node (NP) on Akida 2 hardware. Processing duration was tested using a set of artificially generated inputs with sparsity at controlled levels, much like Figure 2.

The first thing to note about the results is the almost perfectly linear decrease in processing duration as sparsity increases: every zero in the input really does get skipped (Figure 3). The second point is that, when sparsity reaches 100% the processing time for the layer is very close to zero. That’s important: it means that the processing is really dominated by the inputs to be processed; there is no large input-independent overhead for the layer which could otherwise limit our benefit from exploiting sparsity. One final point is that this is a scatter plot; the measurements at each sparsity level are not averaged but show 10 repeats that are perfectly superimposed: the processing time is extremely repeatable.

Figure 3: Inference duration vs incoming activation sparsity for a convolutional layer (input height and width 32, input channels and output filters 64), mapped to a single Neural Processing node (NP) on Akida 2 hardware. Duration is reported as ticks of the hardware clock (in this case, an FPGA running at 50 MHz). Note that the scatter plot does not show averages at each sparsity level; rather, measurements from 10 repeats are shown but superimpose perfectly. Inference duration shows a very linear decrease with increasing sparsity. As sparsity approaches 100%, processing duration for the layer approaches zero (that is, there is almost no input-independent overhead).

There are some subtle advantages to this approach that should be mentioned. The algorithm is extremely scalable: unlike some other approaches, it does not require large layer or batch sizes to be optimal (actually, the algorithm runs natively at batch size 1, a distinct advantage in the edge setting).

Sparsity in Actual Models

It should be clear by now that Akida can be extremely efficient, but that it needs models to have significant activation sparsity. Fortunately, that is not a problem: it turns out that models are naturally sparse. If there is just one thing to take away from this blog it should be that: standard CNNs naturally show high levels of sparsity.

To make the point, we’ll turn to that most iconic of models, ResNet50 processing standard images from the ImageNet dataset (we have used the pretrained version of ResNet50 provided via tensorflow.keras.applications). Here, we have measured the sparsity (very simply, the proportion of zeros) in the outputs from each layer, averaging over 1000 input images. Please see Figure 4 for the results.
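The measurement itself is straightforward to reproduce. The sketch below shows one way to script it with standard Keras APIs; random inputs stand in for the 1000 ImageNet images used in our measurement, so absolute numbers will differ from Figure 4.

```python
# Sketch of the sparsity measurement described above: fraction of zeros in each ReLU layer's
# output of a pretrained ResNet50. Random inputs stand in for real ImageNet images here,
# so the absolute numbers will differ from the figure.
import numpy as np
import tensorflow as tf

base = tf.keras.applications.ResNet50(weights="imagenet")
relu_layers = [l for l in base.layers if isinstance(l, tf.keras.layers.Activation)]
probe = tf.keras.Model(inputs=base.input, outputs=[l.output for l in relu_layers])

images = np.random.uniform(0, 255, size=(8, 224, 224, 3)).astype("float32")
images = tf.keras.applications.resnet50.preprocess_input(images)

activations = probe(images, training=False)
for layer, act in zip(relu_layers, activations):
    sparsity = float(np.mean(np.asarray(act) == 0.0))   # proportion of zeros in the output
    print(f"{layer.name:30s} sparsity = {sparsity:.2%}")
```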

The results are impressive: except for a very few layers, the model shows around 50% sparsity from even the first blocks. That increases steadily, such that by the final stage, layers are approaching 80% sparsity!

Figure 4. Mean activation sparsity per layer for ResNet50 processing natural images. Each bar shows the average sparsity (proportion of zeros) in the output of a single layer, averaging over 1000 images. Layer names are indicated below; for legibility, only the final layer of each block is labeled.

How does that happen? The initial contribution comes from the commonly used Rectifying Linear Unit (ReLU) activation function, applied to the output of each layer. As its name suggests, this rectifies its inputs (sets all negative values to zero). On average, if the distribution of values entering a ReLU was normally distributed with zero mean (thus, with half the values negative), we would expect to see 50% sparsity coming out of the ReLU. In practice, in early layers of typical CNNs, we see slightly lower values than that: there are few filters in early model layers, and they encode very general, low-level features with high spatial precision. In subsequent layers, very high sparsity levels can naturally arise: there are many more filters, and they encode much higher-level features, only a minority of which will be present in any given image.
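A quick numeric check of that argument, assuming zero-mean, normally distributed pre-activations:

```python
# Quick numeric check of the ReLU argument: for zero-mean inputs, about half the values
# are negative, so the ReLU output is roughly 50% zeros.
import numpy as np

pre_activations = np.random.default_rng(0).normal(loc=0.0, scale=1.0, size=1_000_000)
post = np.maximum(pre_activations, 0.0)                      # ReLU
print(f"activation sparsity: {np.mean(post == 0.0):.1%}")    # approximately 50%
```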

ResNet50, while a classic, is not optimized to be efficient and thus is not a good target for Akida. What about some more edge-appropriate models? The following plot (Figure 5) shows sparsity in a selection of models from our Model Zoo (indicated by model name / dataset).

Figure 5. Activation sparsity per layer for a selection of models from the BrainChip Model Zoo (Ready-to-Use Akida Neural Network Models – BrainChip). Detailed code and examples are available in the Developer Hub. The selection includes models spanning different tasks, such as object classification, object detection, and keyword spotting, to study how activation sparsity varies depending on both the task and the dataset. This diversity helps illustrate the generality of sparsity across real-world applications.

The pattern of sparsity described above is evident repeatedly here. In nearly every case, early network layers show lower sparsity, but always over 20% and typically rising to over 50% within a few layers. By the later network layers, sparsity is high, often greater than 80%. The notable exception is CenterNet/VOC, an object detection model with an “hourglass” shape: in the later layers, with a decrease in the number of filters and a requirement for precise spatial information to resolve the task, we see sparsity reduced again.

The final example, AkidaNet/Visual Wake Words, is of note: sparsity is remarkably high in this case. This is the only case in which extra measures were taken during training to increase activation sparsity. This was achieved by adding “regularization” to the training loss function, to encourage the model to learn with reduced activation values. This is remarkably successful. You can read more about this in our educational materials in our Developer Hub education tab.

Takeaways

As AI matures, it penetrates our daily life. For that to happen, AI models should be able to run on edge devices. This means they should be small, fast, and consume low power. At BrainChip, we are targeting a regime called extreme low power, in which models should run in microwatts or milliwatts. Our approach is a combination of building both hardware and software. Sparsity sits at the core of our technology and is inherent in neural networks. It is here to stay!

To see how sparsity is applied to a real-world case scenario, please see our blog post on Akida in Space where we show how Akida is used to optimize a Satellite Workflow. Please also visit our website for general Use Cases of models we build at BrainChip.

Authors

Doug McLelland holds a doctorate in Computational Neuroscience from the University of Oxford, which he received in 2006. That was followed by two post-doctoral positions, first at Oxford, and then at the University of Toulouse from 2011; studying mechanisms of visual processing and attention. In 2017, he joined BrainChip, where he has been focused on making sure that popular models can make the most of Akida’s hardware advantages.

Ali Kayyam is a Principal Research Scientist at BrainChip, with a Ph.D. in Computational Neuroscience from the Institute for Studies in Fundamental Sciences (IPM) in Tehran, and earlier degrees in Computer Engineering from the Petroleum University of Technology and Shiraz University. He has held academic and research roles at the University of Southern California, University of Wisconsin–Milwaukee, and University of Central Florida. His work focuses on computer vision, machine learning, and neuroscience, particularly in visual attention, active learning, neural networks, and biologically inspired vision models.

BrainChip Launches Akida Cloud for Instant Access to Latest Akida Neuromorphic Technology
https://www.edge-ai-vision.com/2025/08/brainchip-launches-akida-cloud-for-instant-access-to-latest-akida-neuromorphic-technology/
Tue, 05 Aug 2025

Aligns with BrainChip’s long-term strategy to accelerate customer access to its innovations and reduce development cycles

LAGUNA HILLS, Calif.–(BUSINESS WIRE)–BrainChip Holdings Ltd (ASX: BRN, OTCQX: BRCHF, ADR: BCHPY), the world’s first commercial producer of ultra-low power, fully digital, event-based neuromorphic AI, today announced the launch of the BrainChip Developer Akida Cloud, a new cloud-based access point to multiple generations and configurations of the company’s Akida™ neuromorphic technology. The initial Developer Cloud release will feature the latest version of Akida’s second-generation technology, Akida 2.

“Our developer cloud provides instant access to the latest Akida technology, as we deliver our Akida products on our roadmap to customers quickly and easily,” said Jonathan Tapson, chief development officer at BrainChip. “We’re reducing the time and effort to utilize Akida, so developers can program and execute their network models for immediate results so they can accelerate product development. We’ve shortened the time between our innovations and our customers’ ability to seamlessly and cost-effectively access Akida.”

One of the key innovations of this approach is that developers can stream their real-time data to Akida Cloud, perform inferencing, and stream the results back locally. This is demonstrated in a featured use case, eye-tracking, with immediate results used to measure accuracy and allow iteration of the training of the model to improve the results over a range of actual operating conditions. Hélder Rodríguez López, Embedded Software Research Engineer at Arquimea Research, said, “The Akida Cloud’s ability to provide us advanced access to the latest features of Akida and easily test our neuromorphic model innovations remotely is a real advantage for progressing our advanced model development programs.”

BrainChip Developer Akida Cloud key benefits include:

  • Rapid prototyping: Developers can access and leverage Akida’s latest features without needing physical hardware.
  • Developer-first access: Engineers can start development in parallel with hardware deployment.
  • Extensibility: As new versions and configurations of Akida are released, BrainChip can make them available via the cloud.
  • Partner benefits: Partners can demo working models and prototypes for customers so they can work in parallel before gaining access to Akida boards or chips.
  • Flexible business model: Includes limited free access and usage-based pricing with credit toward eventual hardware purchases. BrainChip also sells an Akida FPGA Developer Platform if a customer wants an on-premises solution.

Engineered for ultra-efficient, low-power AI, the second-generation Akida 2 is now available in the cloud, delivering a fourfold performance and efficiency gain over Akida 1. Developers can now build more sophisticated models on Akida 2 with greater accuracy, thanks to new architectural support for 8-bit quantization. New models supported on Akida 2 include state-space-based Temporal Event-based Neural Networks (TENNs), which enhance the ability to process raw temporal data from video, audio, and sensors. This combination simplifies the development pipeline, reduces model size, and accelerates the deployment of advanced AI in edge applications across multiple sectors.

About BrainChip Holdings Ltd (ASX: BRN, OTCQX: BRCHF, ADR: BCHPY)

BrainChip is the worldwide leader in Edge AI on-chip processing and learning. The company’s first-to-market, fully digital, event-based AI processor, Akida™, uses neuromorphic principles to mimic the human brain, analyzing only essential sensor inputs at the point of acquisition and processing data with unmatched efficiency, precision, and energy economy.

BrainChip’s Temporal Event-based Neural Networks (TENNs) build on State-Space Models (SSMs) with time-sensitive, event-driven frameworks that are ideal for real-time streaming applications. These innovations make low-power Edge AI deployable across industries such as aerospace, autonomous vehicles, robotics, industrial IoT, consumer devices, and wearables. BrainChip is advancing the future of intelligent computing, bringing AI closer to the sensor and closer to real-time.

Explore more at www.brainchip.com.

Follow BrainChip:

Twitter: https://www.twitter.com/BrainChip_inc
LinkedIn: https://www.linkedin.com/company/7792006

How to Think About Large Language Models on the Edge
https://www.edge-ai-vision.com/2025/07/how-to-think-about-large-language-models-on-the-edge/
Fri, 25 Jul 2025

This blog post was originally published at BrainChip’s website. It is reprinted here with the permission of BrainChip.

ChatGPT was released to the public on November 30th, 2022, and the world – at least, the connected world – has not been the same since. Surprisingly, almost three years later, despite massive adoption, we do not seem much closer to understanding how to use Large Language Models effectively in our personal lives or, as importantly, in professional and business applications.

What LLMs Really Are

A large part of this uncertainty stems from misunderstandings about what an LLM is, and how it really works. In this article I’ll unpack some of that and hopefully give a clear picture of LLMs that enable good decision-making.

The key to understanding LLMs is that they all start as what are called Foundational LLMs.  These are actually really simple mechanisms, despite being composed of billions of neural elements. The simplicity arises from the way they are trained.

The training consists of taking some text from the internet – e.g., the whole of Wikipedia in all its languages – then feeding it to the LLM one word at a time. The LLM is then trained to predict the next word most likely to appear in that context. The entirety of the apparent intelligence of an LLM is based on its ability to predict what comes next in a sentence.
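A toy illustration of that objective, using simple bigram counts on a made-up sentence rather than a billion-parameter network, shows the mechanism in miniature:

```python
# Toy illustration of the next-word-prediction objective, using bigram counts instead of a
# billion-parameter network. The training text is made up for the example.
from collections import Counter, defaultdict

text = "the cat sat on the mat and the cat slept on the mat"
words = text.split()

# "Training": count which word follows which.
next_word = defaultdict(Counter)
for current, following in zip(words, words[1:]):
    next_word[current][following] += 1

# "Inference": always emit the most likely next word, much like a greedy LLM decode.
word, generated = "the", ["the"]
for _ in range(6):
    word = next_word[word].most_common(1)[0][0]
    generated.append(word)
print(" ".join(generated))   # fluent-looking, but only statistics of the training text
```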

This simple process can be carried out until the LLM has been trained on pretty much any text ever digitized in any language, which builds a model that has an incredible ability to build sentences and paragraphs. LLMs are amazing artifacts, containing a model of all of language, on a scale no human could conceive or visualize. What they do not do, though, is apply any value to the information, or the truthfulness of the sentences and paragraphs they have learned to produce.

An Illusion of Intelligence

I think of LLMs as being the equivalent of that one person we often have in our social circles – that person who can’t bear conversational silence and fills it with an endless stream-of-consciousness babble.  What you are hearing is a grammatical flow of words, more or less connected in context, but there’s no information or usefulness to be derived from most of it.

LLMs are powerful pattern-matching machines but lack human-like understanding, common sense, or ethical reasoning. They can generate content that appears clearly inappropriate to humans but is merely a statistically probable sequence of words based on their training. For example, if you train an LLM on racist or deviant content, it will successfully reproduce this in any context, without any understanding of its meaning.

This lack of factualness notwithstanding, LLMs are amazingly convincing to talk to because they are trained that way. They know, way better than a human, precisely what to say, but they don’t in any real sense know any facts; they know what a fact is supposed to sound like, so they can convincingly produce “facts” on cue.

The Risks of Misusing LLMs

The tech industry being what it is, multiple products based on foundational LLMs have been launched without much thought about how they would be used, just to see what people would do with them. LLMs are very good at summarizing, and this use case works pretty well, but the inappropriate use of LLMs as search engines has produced lots of unhappy results.

A great way to think of an LLM is that it produces a surface of language, like a giant lumpy golf putting green, in the form of interconnected words. Any input sentence, or “prompt”, is like placing a ball down and putting it. The ball rolls along, connecting words into sentences according to its direction and velocity, until it comes to rest. A different ball, hit from the same point but in a different direction, produces different sentences. An LLM simply takes a bunch of input sentences and extends them along the surface of the language. Just as a golf ball rolls downhill and along the path of least resistance, the LLM output follows the path of the most likely words and assembles them into sentences.

As long as we think of an LLM as a machine for producing the next most likely sentences and paragraphs, we can make great use of it. As soon as we try and use a raw Foundational LLM as a search engine or a source of information, it’s like talking to a pathological liar. We’re going to get a response that sounds great but has only a coincidental relationship with the truth, and the algorithm is only guessing the next words based on the previous words from the text it was trained on.

So, how should we use LLMs? The answers depend on applications, but they are incredibly good at turning pre-existing information into words. Don’t let them find (or make up) the facts, but give them facts and let them explain or impart them.

Enter RAG: Retrieval-Augmented Generation

One way to use LLMs that offers a simple approach to this problem is the RAG-LLM, where RAG stands for Retrieval-Augmented Generation. RAG LLMs are usually designed for answering queries in a specific subject, for example, how to operate a particular appliance, tool, or type of machinery. The system works by taking as much of the textual information about the subject as possible (user manuals and so forth), then pre-processing it into small chunks containing a few specific facts. When the user asks a question, the software system identifies the chunk of text most likely to contain the answer. The question and the retrieved text are then fed to an LLM, which generates a human-language answer in response to the query.
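A minimal sketch of that pattern is shown below. The toy chunks, the word-overlap scoring, and the prompt template are all invented for illustration; a real system would use an embedding model and a vector store, but the shape of the pipeline is the same.

```python
# Minimal sketch of the RAG pattern described above. Toy document chunks and a simple
# word-overlap score stand in for a real vector database and embedding model; the prompt
# template and chunks are invented for illustration.
chunks = [
    "To descale the coffee machine, run one tank of descaling solution through the brew cycle.",
    "The warranty covers manufacturing defects for 24 months from the date of purchase.",
    "Error E3 means the water tank is empty or not seated correctly; refit the tank and retry.",
]

def score(question: str, chunk: str) -> int:
    """Count shared words between question and chunk (a crude stand-in for embeddings)."""
    return len(set(question.lower().split()) & set(chunk.lower().split()))

question = "What does error E3 mean on my machine?"
best_chunk = max(chunks, key=lambda c: score(question, c))

# The retrieved chunk plus the question is what actually gets sent to the LLM:
prompt = (
    "Answer the question using only the context below.\n"
    f"Context: {best_chunk}\n"
    f"Question: {question}\n"
    "Answer:"
)
print(prompt)   # feed this prompt to the LLM of your choice
```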

When one first builds RAG-LLMs, it seems like a completely counter-intuitive way to use LLMs. All the action of finding the answer happens before LLM involvement; why bother with that? Once you understand the issues with LLMs, it becomes obvious that RAG plays to the strengths of LLMs while mostly addressing their problems. There are many more sophisticated ways to enforce factualness on LLMs, but by and large they follow the RAG pattern in some way.

BrainChip’s Approach to LLMs at the Edge

At BrainChip, we build edge hardware systems that can execute LLMs to provide domain-specific intelligent assistance at the Edge. We also build models using an extremely compact LLM topology, Temporal Event-based Neural Networks (TENNs), based on state-space models and combined with pre-processed information in a RAG system. Using this technology platform of optimized hardware and LLM models, BrainChip is able to demonstrate a stand-alone, battery-powered AI assistant that covers a huge amount of information. Like many companies working in this space, we believe we’re learning how to deploy LLMs in a way that starts to deliver on their massive promise in the Edge AI space.

Dr. Jonathan Tapson
Chief Development Officer, BrainChip

DeGirum Demonstration of Its PySDK Running on BrainChip Hardware for Real-time Edge AI
https://www.edge-ai-vision.com/2025/07/degirum-demonstration-of-its-pysdk-running-on-brainchip-hardware-for-real-time-edge-ai/
Wed, 09 Jul 2025

Stephan Sokolov, Software Engineer at DeGirum, demonstrates the company’s latest edge AI and vision technologies and products in BrainChip’s booth at the 2025 Embedded Vision Summit. Specifically, Sokolov demonstrates the power of real-time AI inference at the edge, running DeGirum’s PySDK application directly on BrainChip hardware.

This demo showcases low-latency, high-efficiency performance as a script performs live inference on a video stream. Sokolov also highlights the DeGirum AI Hub—a cloud-based platform that allows developers to evaluate models and test deployments.

BrainChip Demonstration of LLM Inference On an FPGA at the Edge using the TENNs Framework
https://www.edge-ai-vision.com/2025/07/brainchip-demonstration-of-llm-inference-on-an-fpga-at-the-edge-using-the-tenns-framework/
Tue, 08 Jul 2025

Kurt Manninen, Senior Solutions Architect at BrainChip, demonstrates the company’s latest edge AI and vision technologies and products at the 2025 Embedded Vision Summit. Specifically, Manninen demonstrates his company’s large language models (LLMs) running on an FPGA edge device, powered by BrainChip’s proprietary TENNs (Temporal Event-based Neural Networks) framework.

BrainChip enables real-time generative AI at the edge with ultra-low power consumption and minimal compute resources: ideal for AI developers building smarter and more efficient edge solutions.

BrainChip Demonstration of Its Latest Audio AI Models in Action At the Edge
https://www.edge-ai-vision.com/2025/07/brainchip-demonstration-of-its-latest-audio-ai-models-in-action-at-the-edge/
Tue, 08 Jul 2025

Richard Resseguie, Senior Product Manager at BrainChip, demonstrates the company’s latest edge AI and vision technologies and products at the 2025 Embedded Vision Summit. Specifically, Resseguie demonstrates the company’s latest advancements in edge audio AI.

The demo features a suite of models purpose-built for real-world applications including automatic speech recognition, denoising, keyword spotting, and LLM integration. See how BrainChip’s neuromorphic technology enables low-power, real-time audio processing for smarter, more responsive edge devices for smart home, automobiles, wearables and more.

“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presentation from BrainChip
https://www.edge-ai-vision.com/2025/06/state-space-models-vs-transformers-for-ultra-low-power-edge-ai-a-presentation-from-brainchip/
Thu, 05 Jun 2025

Tony Lewis, Chief Technology Officer at BrainChip, presents the “State-space Models vs. Transformers for Ultra-low-power Edge AI” tutorial at the May 2025 Embedded Vision Summit.

At the embedded edge, choices of language model architectures have profound implications on the ability to meet demanding performance, latency and energy efficiency requirements. In this presentation, Lewis contrasts state-space models (SSMs) with transformers for use in this constrained regime. While transformers rely on a read-write key-value cache, SSMs can be constructed as read-only architectures, enabling the use of novel memory types and reducing power consumption. Furthermore, SSMs require significantly fewer multiply-accumulate units—drastically reducing compute energy and chip area.

New techniques enable distillation-based migration from transformer models such as Llama to SSMs without major performance loss. In latency-sensitive applications, techniques such as precomputing input sequences allow SSMs to achieve sub-100 ms time-to-first-token, enabling real-time interactivity. Lewis presents a detailed side-by-side comparison of these architectures, outlining their trade-offs and opportunities at the extreme edge.
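To make the contrast concrete, here is a toy sketch (not BrainChip’s TENNs implementation) of the inference-time difference: a linear state-space layer updates a fixed-size state per token, while a transformer-style attention layer must keep appending to a key-value cache.

```python
# Toy illustration of the inference-time contrast described above (not TENNs / Akida code):
# a linear state-space layer keeps a fixed-size state per step, while transformer-style
# attention appends to a key-value cache that grows with the sequence length.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_state, seq_len = 16, 32, 128

# Fixed, read-only parameters of a discretized linear state-space layer: x' = A x + B u, y = C x
A = np.eye(d_state) * 0.95
B = rng.normal(scale=0.1, size=(d_state, d_model))
C = rng.normal(scale=0.1, size=(d_model, d_state))

state = np.zeros(d_state)     # the only thing updated at inference time
kv_cache = []                 # what a transformer layer would have to keep instead

for t in range(seq_len):
    u = rng.normal(size=d_model)          # the next token's embedding
    state = A @ state + B @ u             # constant memory: the state never grows
    y = C @ state                         # layer output for this token
    kv_cache.append((u.copy(), u.copy())) # attention must retain keys/values for every past token

print(f"SSM state size: {state.size} values (constant)")
print(f"KV cache size after {seq_len} tokens: "
      f"{sum(k.size + v.size for k, v in kv_cache)} values (grows with sequence length)")
```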

See here for a PDF of the slides.
