AI at the Edge

By Lynnette Reese May 13, 2019 2:51pm

Sensors bring information to data science across a growing variety of markets, including industrial, agricultural, medical, aerospace, and military applications, among many others. Although the falling cost of ever-higher-performance processors is key to the burgeoning Internet of Things (IoT), storing and sending enormous amounts of data to cloud servers can be costly and slow, and can expand exposure to hackers. Locating computing power near sensors and other data-collection devices can boost the overall efficiency of data processing.

Networks of sensors can send data to high-performance, memory-rich processing systems situated between the sensor network and the cloud, an arrangement referred to as edge computing. At a minimum, an edge computing platform is meant to parse out irrelevant data and decide what data is worth sending on to the cloud for further action. Edge computing enables numerous sensors to operate without each requiring enough energy, memory, or processing power to reach the cloud individually. Edge computing can also reduce the traffic spikes, latency (delay), and security risks involved in transmitting data directly from sensor to cloud.

Figure 1: Edge computing, a.k.a. “a smart gateway,” enables numerous sensors to operate without requiring enough energy, memory, or processing power to reach the cloud individually. (Source: OpenFog Consortium “OpenFog Reference Architecture for Fog Computing”)

Edge computing can be accomplished with a simple If-Then statement to weed out predictable unwanted data that sensors pick up. However, Artificial Intelligence (AI) is increasingly deployed in applications for its ability to make intelligent, fine-grained decisions, much like humans do. AI and IoT can work well together to enhance edge computing beyond simple filters and reduce transmission cost and latency, intelligently culling repetitive, incorrect, or improbable data and noise.
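To make the "simple If-Then" idea concrete, here is a minimal sketch of a rule-based edge filter. The sensor type (temperature), the plausibility limits, the reporting threshold, and the function name `filter_readings` are all assumptions chosen for illustration; none of them come from a specific product.

```python
from typing import Iterable, List, Optional

# Hypothetical plausibility limits for a temperature sensor, in degrees Celsius.
MIN_VALID_C = -40.0
MAX_VALID_C = 125.0
# Hypothetical minimum change that is worth reporting to the cloud.
MIN_DELTA_C = 0.5


def filter_readings(readings: Iterable[float]) -> List[float]:
    """Return only the readings worth forwarding to the cloud."""
    forwarded: List[float] = []
    last_sent: Optional[float] = None
    for value in readings:
        # If the value is physically implausible, then drop it as noise.
        if value < MIN_VALID_C or value > MAX_VALID_C:
            continue
        # If the value barely differs from the last one sent, then skip it.
        if last_sent is not None and abs(value - last_sent) < MIN_DELTA_C:
            continue
        forwarded.append(value)
        last_sent = value
    return forwarded


if __name__ == "__main__":
    samples = [21.0, 21.1, 21.2, 999.0, 22.0, 22.1, -80.0, 23.0]
    print(filter_readings(samples))  # prints [21.0, 22.0, 23.0]
```

A filter like this handles predictable junk cheaply, but it only knows the rules someone wrote down in advance, which is the gap that AI-based filtering is meant to fill.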

A broad term, AI can be applied in different ways: through expert systems, natural language processing, robotics, computer vision, computerized speech recognition, and machine learning (ML). ML, a subset of AI, learns to distinguish patterns by training on enormous datasets. Deep Learning (DL) is a subset of ML. Making intelligent decisions using DL at the edge can require high-performance processors that are low cost, power efficient, and have copious memory.

Edge processors using AI need to run compute-intensive decision-making (inference) models that are created by training layers of memory in the deep learning process. An inference model is an algorithm that uses a database and working memory to infer (deduce) information from data and analysis. ML is a way of programming a computer without explicitly writing a program in the traditional sense. ML creates an inference model by using huge amounts of data to train a neural network. Neural networks are layers of memory that were inspired by the way the human brain works. In layman’s terms, a chunk of computer memory represents a brain cell that holds data and a weighted value.
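The "layers of weighted values" description can be sketched in a few lines of code. The example below is a toy inference pass, assuming fully connected layers and a sigmoid activation; the weights shown are made-up placeholders rather than trained values, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One layer = (weights, biases); weights[k] holds the weights of neuron k.
Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]


def sigmoid(x: float) -> float:
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs: Sequence[float], layer: Layer) -> List[float]:
    """Each neuron multiplies the inputs by its weights, sums, and activates."""
    weights, biases = layer
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = bias + sum(w * x for w, x in zip(neuron_weights, inputs))
        outputs.append(sigmoid(total))
    return outputs


def infer(inputs: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Feed the input through each layer in turn (inference only, no training)."""
    activations = list(inputs)
    for layer in layers:
        activations = dense_layer(activations, layer)
    return activations


if __name__ == "__main__":
    # Placeholder weights for a 2-input, 2-hidden-neuron, 1-output network.
    toy_layers = [
        ([[0.4, -0.6], [0.3, 0.8]], [0.0, -0.1]),
        ([[1.2, -0.7]], [0.05]),
    ]
    print(infer([0.5, 0.9], toy_layers))
```

Training is the process of finding good values for those weights; inference, as above, only reads them, which is why it is the part that can move to the edge.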

The developer’s challenge is to gather accurate data in a large enough data set while maintaining a clear objective. Just gathering the data set can be tedious. For example, an inference engine that compares a real-time image of a skin spot to an established inference model would be more accurate if the inference model were trained on millions of images of various skin spots that are accurately labeled as skin cancer or not. AI is nascent. How does one obtain a million different photos that are accurately labeled? Training with enormous amounts of good, external data makes a difference in the level of accuracy in the inference model.

Training an inference engine requires wide-ranging, objective forethought. As another example, suppose that Janet is an AI engineer tasked with identifying a gymnast in any video available online. A computer can rapidly analyze each frame of video as an image, so Janet trains an inference model to identify gymnasts by loading a training set of 100,000 images, each labeled as containing a gymnast or no gymnast. If no one thinks about how the data set influences the outcome, the inference model might identify an image of only parallel bars as “a gymnast.” The data is the source of the problem because many of the images that affirm the presence of a gymnast also include parallel bars. Several images with parallel bars and no gymnast need to be in the data set, labeled “no gymnast.” Thus, an inference model must be well planned and tested before deployment.
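One way to catch this kind of blind spot before training is to audit how often the label co-occurs with an incidental feature. The sketch below assumes each training image carries two hypothetical annotations, `has_gymnast` and `has_bars`, and flags any combination that is rare or missing; the 5 percent threshold is an arbitrary illustration, not a recommended value.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# (has_gymnast, has_bars): hypothetical annotation fields for each image.
Label = Tuple[bool, bool]

ALL_CASES: List[Label] = [
    (True, True),    # gymnast with bars
    (True, False),   # gymnast without bars
    (False, True),   # bars only, no gymnast
    (False, False),  # neither
]


def underrepresented_cases(
    labels: Sequence[Label], min_fraction: float = 0.05
) -> List[Label]:
    """Return the label combinations that make up less than min_fraction of the set."""
    total = len(labels)
    if total == 0:
        return list(ALL_CASES)
    counts = Counter(labels)
    return [case for case in ALL_CASES if counts[case] / total < min_fraction]


if __name__ == "__main__":
    # A skewed toy data set: every gymnast image also shows bars.
    data = [(True, True)] * 90 + [(False, False)] * 10
    print(underrepresented_cases(data))
    # prints [(True, False), (False, True)]
```

In this toy set, the missing "bars only, no gymnast" case is exactly the gap that would let a model learn "bars" instead of "gymnast."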

Hardware platforms for supervised machine learning include high-performance Application-Specific Integrated Circuits (ASICs), Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), or Central Processing Units (CPUs) that consume tens to hundreds of watts. However, IoT devices and the sensors that gather their data typically operate on a small power budget for long periods. AI can also require a lot of working memory. Inference engines repetitively fetch weighted data through successive layers stored in memory. Therefore, the challenge in achieving edge computing for IoT with AI is in combining high performance with low power consumption and large amounts of local memory, which can require accepting design trade-offs. The inference engine performs rapid, repetitive mathematical functions (e.g., convolutions) on the data and stores the result. To achieve lower power consumption, the data can be made smaller. Continually processing many repetitive operations on 8-bit-wide data consumes less power than doing so on 32-bit-wide data. The trade-off is that some accuracy is lost, but only in the range of a few percentage points. For some applications, it might be acceptable to increase the number of sensor data transmissions from the edge to the cloud for further analysis.
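The 8-bit versus 32-bit trade-off can be illustrated with a small sketch of weight quantization. The scheme below is symmetric linear quantization, chosen here only because it is simple to show; real toolchains use a variety of schemes, and the function names and sample weights are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize(values: Sequence[float], num_bits: int = 8) -> Tuple[List[int], float]:
    """Map floats to signed integers of num_bits using one shared scale factor."""
    if not values:
        return [], 1.0
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]  # made-up 32-bit float weights
    codes, scale = quantize(weights)
    restored = dequantize(codes, scale)
    for original, approx in zip(weights, restored):
        print(f"{original:+.4f} -> {approx:+.4f} (error {abs(original - approx):.4f})")
```

Each restored weight differs from the original by at most half a quantization step, which is the small loss of precision that the smaller data width buys.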

As for local memory, storing processor output nearby means that the data doesn’t have to travel far (reducing local latency and power consumption). As the cost of memory decreases, inference at the edge becomes more feasible. One possible strategy for lowering the energy consumed in fetching and storing millions of chunks of data and their weighted values, in layers represented by arrays, is nonvolatile analog memory. For example, an AI processor with arrays of a new data storage technology, Resistive Random Access Memory (ReRAM or RRAM), can keep the values of learned weights by utilizing memory in a clever way.

According to Diederik Verkest, Ph.D. and Distinguished Member of Technical Staff at Imec, “The heart of such an AI processor are thus memory arrays that permanently store the values of the learned weights using analog non-volatile devices, e.g., resistive RAM technology. Each such array represents one layer of the neural network. And in the array, the learned weights are encoded in the individual device conductances. So how are we then to multiply and add these weights with the input value? By setting the input values as the word line voltages of the ReRAM arrays. Each cell’s current will then be the multiplication of the weight and the input value (Ohm’s law). And the word line’s current will be the summation of the cell currents in that line (Kirchhoff's law). That way, we can effectively implement convolutions without having to fetch and move the weights over and over again.”
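The multiply-and-add that Verkest describes can be mimicked numerically. The sketch below is an idealized model only: it assumes perfectly linear cells and ignores wire resistance, noise, and device variation. The function name and the conductance and voltage values are made up for illustration.

```python
from typing import List, Sequence


def crossbar_output_currents(
    conductances: Sequence[Sequence[float]], input_voltages: Sequence[float]
) -> List[float]:
    """Idealized analog multiply-accumulate in a resistive memory array.

    conductances[i][j] is the conductance (siemens) of the cell joining
    input line i to output line j; it plays the role of a learned weight.
    input_voltages[i] is the voltage (volts) applied to input line i.
    """
    num_outputs = len(conductances[0])
    currents = [0.0] * num_outputs
    for row, voltage in zip(conductances, input_voltages):
        for j, conductance in enumerate(row):
            # Ohm's law: each cell passes a current equal to conductance * voltage.
            # Kirchhoff's current law: currents on a shared line add together.
            currents[j] += conductance * voltage
    return currents


if __name__ == "__main__":
    weights_as_conductances = [
        [1.0, 2.0],
        [3.0, 4.0],
    ]
    inputs_as_voltages = [1.0, 0.5]
    print(crossbar_output_currents(weights_as_conductances, inputs_as_voltages))
    # prints [2.5, 4.0]
```

The result is the same weighted sum a digital processor would compute with a loop of multiply and add instructions, except that in the analog array the weights never leave the memory cells.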

AI at the edge is gaining traction with new semiconductor-level technology that addresses the need for large amounts of memory. Researchers from Stanford University and CEA-Leti (France) presented a chip at the 2019 International Solid-State Circuits Conference that integrates both ReRAM and multiple-bit non-volatile memory (NVM) in a single chip, claiming to deliver ten times the energy efficiency of standard flash memory. The chip merges both processing capability and memory, eliminating the need to pass data between separate chips. Researchers Mitra and Wong, working on the memory portion of the chip, devised a way to preserve five values rather than two in each memory cell. Their prototype enables “storage density to pack more data into less space than other forms of memory; energy efficiency that won’t overtax limited power supplies; and the ability to retain data when the chip hibernates, as it is designed to do as an energy-saving tactic,” according to a news release from Stanford. The team also improved the endurance of ReRAM, with testing that suggests a ten-year life span.

Cars may someday be IoT on wheels. In some cases, it could be unfeasible for IoT to send enormous amounts of data to a cloud where an AI engine lives. Self-driving vehicles would have no time to send LiDAR data to a cloud-based inference engine for a decision to stop the vehicle if an instant reaction to a sensor’s input saves limb, life, or property. Inference engines need to be where sensors reside. Autonomous cars must act without continuous cloud connectivity, which requires transforming them into high-performance mobile edge computing nodes.
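A quick back-of-the-envelope calculation shows why round trips to the cloud are a problem for a moving vehicle. The speed and delay values below are hypothetical examples, not measurements, and the function name is an assumption made for illustration.

```python
def distance_during_delay_m(speed_kmh: float, delay_ms: float) -> float:
    """Distance in meters a vehicle covers while waiting delay_ms for a decision."""
    speed_m_per_s = speed_kmh * 1000.0 / 3600.0
    return speed_m_per_s * (delay_ms / 1000.0)


if __name__ == "__main__":
    # Hypothetical highway speed and hypothetical cloud round-trip delays.
    for delay_ms in (10.0, 100.0, 500.0):
        traveled = distance_during_delay_m(100.0, delay_ms)
        print(f"{delay_ms:5.0f} ms delay at 100 km/h -> {traveled:.2f} m traveled")
```

At an assumed 100 km/h, a 100 ms delay already means the car has moved about 2.8 meters before any decision arrives, and that is before counting the time needed to run the inference itself.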

Figure 2: Autonomous cars must act without continuous cloud connectivity, which requires transforming them into high-performance mobile edge computing nodes. (Source: AIStorm.ai)

AIStorm, another company attempting to push AI to the edge, aims to cut power and cost by processing neural networks at the level of sensor signals. AIStorm’s website presents a high-performance “AI-in-Sensor” System-on-a-Chip (SoC). Real-world sensor data is analog, but processors are digital. The extra step of constantly digitizing sensor signals for processor-ready input can introduce unwanted latency, distortion, and noise. AIStorm’s goal is to process a sensor’s analog signal in real time for AI use. The company recently closed $13.2 million in financing from Egis Technology.

Figure 3: AIStorm aims to cut power and cost by processing neural networks at the level of sensor signals. The goal is to process a sensor’s analog signal in real time for AI use, skipping signal digitization.

According to Todd Lin, COO of Egis Technology Inc., “Edge applications must process huge amounts of data generated by sensors. Digitizing that data takes time, which means that these applications don’t have time to intelligently select data from the sensor data stream, and instead have to collect volumes of data and process it later. For the first time, AIStorm’s approach allows us to intelligently prune data from the sensor stream in real time and keep up with the massive sensor input tasks.” AIStorm’s AI-in-Sensor, once it’s available, would eliminate the digitizing step, along with its latency, distortion, communication buses, and high power consumption, by going directly from analog signal to machine learning input and skipping analog-to-digital converters.

AI is improving by leaps and bounds, and sensors play a huge part in collecting the data that feeds the machine. Part of the allure for technologists is the potential productivity gained with AI. AI can be harnessed by app developers who know little about machine learning, which opens AI to consumers and small businesses, not just large corporate entities, and extends the proliferation of data-collecting sensors. It’s not far-fetched to use data and AI as the next-best thing to a crystal ball by pulling billions of data points together in an algorithm that paints a picture minutes or days into the future with a reasonable expectation of accuracy, at least most of the time. However, just as fast computer trading algorithms caused the first notable flash crash in 2010 by quickly buying and reselling contracts, it’s possible for AI to run away from us despite our best intentions.
