With world’s first AI sensor, Sony takes aim at the edge

When it comes to video cameras and edge computing for commercial applications, there are two typical strategies: First, use a bunch of dumb cameras with no built-in intelligence, send the data to a local server for pre-processing and then on to the cloud for storage and analysis (or skip the pre-processing and send directly to the cloud). Second, use a smart camera, do all the processing locally, and then send to the cloud for archiving.

There are trade-offs with both. Cheap cameras typically require a massive and costly IT infrastructure, and there’s a latency issue when the processing takes place somewhere else. Expensive cameras are, well, expensive, power-hungry, and likely require some network capability. Both may require installation expertise.

“Imagine a retail outlet with 1,000+ cameras that is streaming a lot of video,” said Sony vice president of business and technology innovation Mark Hanson. “If the camera feeds are pre-processed, you have to put multiple servers in to connect all of these cameras, and if you are streaming 100% of your video to the cloud, you’re paying a lot of backhaul and storage costs, plus there is latency. Those issues, coupled with growing privacy concerns--particularly in Europe where requirements restrict the ability for images of private individuals to be distributed beyond a local LAN--got us thinking there must be a better way.”

The “us” in that sentence is a new division at Sony. Called the Sony Semiconductor Solutions Group, its job is to get a little closer to customers in order to understand their pain points and needs.

In fact, it was a deeper understanding of the inherent challenges and the real struggles that customers, particularly in retail, were having with the implementation of edge computing solutions—many of them custom projects—that led Hanson’s group to develop the world’s first AI image sensor.

“Our thinking was that if we could tackle the things that complicate the process at the edge, we will improve time-to-market for the customer at a lower cost,” said Hanson.

The IMX500 image sensor, the first to be equipped with AI processing capability, is a logical extension of Sony’s existing image sensors that use backside illumination. That structure was made possible by taking the logic circuitry historically placed on the front side of the sensor and relocating it behind the pixel array, leaving more of the sensor’s surface for light-gathering pixels and improving light sensitivity. A wide bus structure carries the data from the image sensor to the image signal processor (ISP).

On Sony’s new IMX500, signals acquired by the pixel chip (with approximately 12.3 effective megapixels for capturing information) are similarly run through an ISP, and AI processing is done in the processing stage on the logic chip. Object recognition happens in as little as 3.1 milliseconds. The extracted data is output as metadata, reducing the amount of data handled.
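The article does not describe the IMX500’s actual on-chip interfaces or metadata format, but the basic idea of emitting compact metadata rather than full frames can be sketched as follows. Every name and field below is a hypothetical stand-in, not Sony’s API.

```python
# Conceptual sketch only: the IMX500's real on-chip interfaces and metadata
# format are not public in this article, so every name below is a stand-in.
import json
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class Detection:
    label: str         # e.g. "person"
    confidence: float  # model confidence, 0.0-1.0
    bbox: List[int]    # [x, y, width, height] in pixel coordinates

def run_on_chip_model(frame) -> List[Detection]:
    """Stand-in for the AI model the logic chip's DSP runs on each ISP-processed frame."""
    # Placeholder result; on the real sensor this would come from the loaded inference model.
    return [Detection(label="person", confidence=0.94, bbox=[120, 80, 60, 140])]

def frame_to_metadata(frame) -> str:
    """Emit compact metadata instead of streaming the full ~12.3 MP frame."""
    detections = run_on_chip_model(frame)
    return json.dumps([asdict(d) for d in detections])

print(frame_to_metadata(frame=None))  # a small JSON payload, not megapixels of image data
```

The payoff, per Hanson’s earlier point, is that only this kind of small payload ever needs to cross the network.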

[Image: Sony intelligent vision sensor block diagram]

The IMX500's logic chip contains a conventional image sensor operation circuit, a Sony digital signal processor dedicated to AI signal processing, and memory for the AI model.

The lightning speed of image recognition and the ability to throw out all extraneous data before it hits the network open up the possibility of new applications, from detecting face masks to distinguishing whether a human or a robot is entering a restricted area of a factory. People counting, including demographic data such as kids versus adults, is another potential application.
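As a rough illustration of that last use case, an application could tally people directly from the metadata stream without ever touching raw video. The field names and the adult/child attribute below are assumptions for the sake of the example, not Sony’s schema.

```python
# Hypothetical example of consuming a metadata stream for people counting;
# the "age_group" attribute and field names are assumptions, not Sony's schema.
import json
from collections import Counter

def count_people(metadata_json: str) -> Counter:
    """Tally person detections by demographic label from one frame's metadata."""
    counts = Counter()
    for det in json.loads(metadata_json):
        if det["label"] == "person":
            counts[det.get("age_group", "unknown")] += 1
    return counts

sample = ('[{"label": "person", "confidence": 0.91, "age_group": "adult"},'
          ' {"label": "person", "confidence": 0.88, "age_group": "child"}]')
print(count_people(sample))  # Counter({'adult': 1, 'child': 1})
```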

Sony started shipping the bare chip in April and had expected to ship a packaged version of it (the IMX501) in June, though Hanson was unable to confirm that timing to FierceElectronics.

Hanson says that in order for the technology, which he has dubbed an “edge image platform,” to gain traction, it must be easy to test and deploy at scale at a cost that doesn’t blow up the customer’s bottom line.

In order to do that, Hanson envisions a suite of simple hardware solutions with a menu of options or use cases for the customer to choose from. No massive customization or implementation nightmares.

Keys to realizing that level of simplicity will be reference designs and partnerships: with organizations like Microsoft that have an existing development community, and with systems integrators and independent software vendors that have experience with the applications Sony is targeting as well as expertise in machine learning and tinyML-oriented engineering, he said.

In mid-May, Sony announced a partnership with Microsoft to implement smart camera solutions using Microsoft’s Azure infrastructure.
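Details of that integration weren’t disclosed, but as a rough sketch of the kind of plumbing involved, a camera-side application could forward the sensor’s metadata to the cloud with Microsoft’s azure-iot-device Python SDK. The connection string and payload shape here are placeholders, and nothing below is specific to the Sony partnership.

```python
# Rough illustration only: the actual Sony/Microsoft integration is not described
# in the article. This simply forwards detection metadata as a device-to-cloud message.
import json
from azure.iot.device import IoTHubDeviceClient, Message

# Placeholder credentials; a real deployment supplies its own IoT Hub connection string.
CONNECTION_STRING = "HostName=<your-hub>.azure-devices.net;DeviceId=<camera-id>;SharedAccessKey=<key>"

def send_metadata(client: IoTHubDeviceClient, detections: list) -> None:
    """Send one frame's worth of detection metadata to Azure IoT Hub."""
    msg = Message(json.dumps({"detections": detections}))
    msg.content_type = "application/json"
    msg.content_encoding = "utf-8"
    client.send_message(msg)

if __name__ == "__main__":
    client = IoTHubDeviceClient.create_from_connection_string(CONNECTION_STRING)
    client.connect()
    send_metadata(client, [{"label": "person", "confidence": 0.94}])
    client.shutdown()
```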

While Hanson said there is theoretically nothing that would prevent the stacked-sensor technology from being enhanced, say with depth sensing, the primary focus today is on developing applications for the IMX500. Initially, Sony plans to focus on large deployments of AI-enabled cameras in retail and factory automation and on the development of fundamental inference models.

“We are the first ones to do this, so there is a lot of learning that needs to happen in order to understand how best to commercialize the technology,” said Hanson. “Part of the strategy is to do proofs of concept so that we can quickly learn what works and what doesn’t and feed our findings back into the engineering process.”

RELATED: The relentless rise of CMOS image sensors