NXP likes to do things differently: to lead and innovate. We've been successfully supporting camera module interfaces on i.MX applications processors and enabling machine learning on shared resources such as CPUs and GPUs across many NXP SoCs. While this approach still works well for many application requirements, this blog explains why we decided to go further and add both an image signal processor (ISP) and a machine learning accelerator to select applications processors, including the
i.MX 8M Plus.
The Importance of Machine Learning Continues to Grow
Performing machine learning in the cloud is the key technology supporting
voice assistants in smartphones and smart speakers. It also enables social
media sites and even cell phones to group together photos containing a given
person. While these use cases all rely on machine learning running in a server
somewhere in the cloud, the real challenge that NXP enables is machine
learning at the edge. This is where all the machine learning inference runs
locally on an edge processor, such as the
i.MX 8M Plus. Running ML inference at the edge means the application continues
to run even if network access is disrupted, a critical feature for
applications such as surveillance or anomaly detection, or when operating in
remote areas without network access. It also provides much lower decision
latency than sending data to a server, processing it there, and returning the
result. Low latency matters, for example, in industrial factory-floor visual
inspection, where the system must decide whether to accept or reject products
whizzing by.
Another key benefit of machine learning at the edge is safeguarding data.
Proprietary data captured by the edge device, such as employee, production and
logistics data, is processed locally and stays at the edge. Information is not
sent to the cloud for processing, where it could be recorded and tracked. The
company's privacy remains intact, giving businesses the choice of whether or
not to share information in the cloud.
Machine Learning at the Edge: How Much Do You Need?
So now, given the need for machine learning at the edge, the question becomes
how much machine learning is needed. One common way to measure machine learning
accelerators is the number of operations (usually 8-bit integer
multiply-accumulate operations) per second, referred to as TOPS: tera
(trillion) operations per second. It is a rudimentary benchmark, as overall
system performance also depends on many other factors, but it is one of the
most widely quoted machine learning measurements.
It turns out that full speech recognition (not just keyword spotting) at the
edge takes around 1-2 TOPS, depending on the algorithm, and more if you
actually wish to understand what the user is saying rather than just
converting speech to text. Performing object detection (using an algorithm
such as YOLOv3) at 60 fps likewise takes around 2-3 TOPS. That makes a machine
learning accelerator such as the 2.3 TOPS one on the i.MX 8M Plus the sweet
spot for these types of applications.
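As a rough sanity check, the TOPS requirement for a given detector can be estimated from its per-frame operation count and the target frame rate. The sketch below assumes roughly 39 billion operations per frame, a figure sometimes cited for YOLOv3 at a 320x320 input; the exact count varies with model variant and input size:

```python
def required_tops(ops_per_inference: float, fps: float) -> float:
    """Accelerator throughput (in TOPS) needed to sustain `fps` inferences/s."""
    return ops_per_inference * fps / 1e12  # 1 TOPS = 1e12 ops/second

# ~39e9 ops/frame is an assumed figure for YOLOv3 at a 320x320 input
print(round(required_tops(39e9, 60), 2))  # about 2.3 TOPS at 60 fps
```

This kind of back-of-envelope estimate ignores memory bandwidth and accelerator utilization, which is why TOPS alone is only a rudimentary benchmark.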
Next: The Image Signal Processor (ISP)
ISP functionality exists in any camera-based system, although it may be
integrated into a camera module or embedded in an applications processor, and
so potentially hidden from the user. ISPs typically handle many types of image
enhancement, but their key purpose is converting the one-color-per-pixel
output of a raw image sensor into the RGB or YUV images more commonly used
elsewhere in the system.
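To make the "one color per pixel" point concrete, here is a deliberately minimal sketch, not NXP's ISP pipeline, that collapses each 2x2 RGGB Bayer tile into a single RGB pixel. Real ISPs interpolate at full resolution with edge-aware filters and also perform white balance, denoising, lens correction and more:

```python
def demosaic_rggb(bayer):
    """Toy demosaic: each 2x2 RGGB tile becomes one RGB pixel (half resolution).
    `bayer` is a list of rows of raw sensor values with even dimensions."""
    rgb = []
    for y in range(0, len(bayer), 2):
        row = []
        for x in range(0, len(bayer[0]), 2):
            r = bayer[y][x]                              # top-left: red
            g = (bayer[y][x + 1] + bayer[y + 1][x]) / 2  # average the two greens
            b = bayer[y + 1][x + 1]                      # bottom-right: blue
            row.append((r, g, b))
        rgb.append(row)
    return rgb

print(demosaic_rggb([[10, 20], [30, 40]]))  # [[(10, 25.0, 40)]]
```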
Applications processors without ISPs work well in vision-based systems when
the camera inputs are coming from network or web cameras, typically connected
to the applications processor by Ethernet or USB. For these applications, the
camera can be some distance, even up to 100 m or so, away from the processor.
The camera itself has a built-in ISP and processor to convert the image sensor
information and encode the video stream before sending it over the network.
Applications processors without ISPs also work well for relatively
low-resolution cameras. At resolutions of 1 megapixel or below, image sensors
often have an embedded ISP within them and can output RGB or YUV images to an
applications processor, leaving no need for an ISP within the
processor.
But at resolutions of around 2 megapixels (1080p) or higher, most image
sensors do not have an embedded ISP and instead rely on an ISP elsewhere in
the system. This may be a standalone ISP chip (which works, but adds power and
cost) or an ISP integrated within the applications processor. The latter is
the approach NXP chose for the i.MX 8M Plus, offering a high-quality,
optimized imaging solution, particularly at 2 megapixel and higher
resolutions.
With intelligent vision-based systems, smart factories can get a boost in
productivity, quality and safety.
Driving an Intelligent Breed of Edge Devices
Putting all of this together, with its combination of a 2.3 TOPS machine
learning accelerator and an ISP, the i.MX 8M Plus applications processor is
well positioned to be a key element of embedded vision systems at the edge,
whether for smart building, smart city or other industrial IoT applications.
With its embedded ISP, it can be used to build systems optimized for high
image quality that connect directly to local image sensors, and even feed this
image data to the latest machine learning algorithms, all offloaded to the
local machine learning accelerator.
The i.MX 8M Plus architecture, optimized for machine learning and vision
systems, enables edge device designers to do things differently, to lead and
innovate, as NXP does. They have in their hands a powerful machine learning
capability paired with a high-definition camera system that allows devices to
see more clearly and farther. A new set of innovative opportunities is opening
in the embedded landscape.
For more information on the i.MX 8M Plus, visit:
i.MX 8M Plus
and
i.MX 8M Plus Tools and Software.