Real-Time AI Gets Close To A Brainwave
Microsoft has the Azure ML platform to develop machine learning applications in the cloud, and Windows ML lets you bring your models to desktop PCs and edge systems using the ONNX standard.
Now it’s bringing machine learning to a new platform: Azure’s high-performance FPGA (field-programmable gate array) systems, with a public beta of its Project Brainwave service, originally announced a year ago.
General-purpose CPUs like those in our PCs, our datacenters, and the public cloud aren’t the fastest way to process data. They’re designed to be adaptable, running many different workloads. That gives them an economic advantage, as manufacturers can make many millions of them with no need to know how they’re going to be used.
But if you go back to the early days of computing, many of the fastest systems were single-purpose, using dedicated silicon to solve specific problems.
That approach isn’t really possible today, outside of scientific research or the military, where there’s the budget to build those single-purpose machines. So how do you give the cloud supercomputer-like performance on commodity hardware, for workloads such as machine learning?
Luckily, there’s a middle road: programmable silicon. FPGAs are reconfigurable arrays of logic gates, set up as blocks of functionality that can be connected to implement specific functions. Unlike traditional logic circuits, they often also contain memory elements, increasing the complexity of the functions that can be implemented.
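To make “reconfigurable logic” concrete, here is a minimal Python sketch of a lookup-table (LUT) logic block, the basic unit an FPGA is built from. The class and names are purely illustrative, not any vendor’s API; real blocks have wider LUTs, carry chains, and a routing fabric connecting thousands of them.

```python
# A minimal sketch of an FPGA logic block, assuming a 2-input
# lookup table (LUT) plus an optional flip-flop as the memory
# element. This illustrates the "configure, don't rewire" idea;
# it is not how any real FPGA toolchain works.

class LogicBlock:
    def __init__(self, truth_table, registered=False):
        # truth_table maps each input combination to an output bit;
        # "programming" the FPGA means loading tables like this one.
        self.truth_table = truth_table
        self.registered = registered
        self.state = 0  # the optional memory element

    def evaluate(self, a, b):
        out = self.truth_table[(a, b)]
        if self.registered:
            # A registered output lags the input by one clock tick.
            self.state, out = out, self.state
        return out

# The same block configured first as an AND gate, then as an XOR gate.
and_block = LogicBlock({(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1})
xor_block = LogicBlock({(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0})
print(and_block.evaluate(1, 1), xor_block.evaluate(1, 0))  # 1 1
```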
But FPGAs aren’t for everyone, because they can’t be programmed in familiar languages or with everyday development tools. To get the most out of an FPGA, you need silicon design tools and hardware description languages to define the functions you want it to deliver.
Internally, Microsoft has developed a C-derived language to program Azure’s FPGAs, and Amazon is doing the same with its AWS FPGA systems. The startup Reconfigure.io is developing tools for programming FPGAs using Google’s Go language, targeting AWS’s F1 instances.
All of these are great first steps, but you need to be careful: You can physically damage hardware if you make a programming error.
Once programmed, FPGAs are powerful tools. They can simulate new CPUs, handle complex mathematical operations, even crack codes. In Azure, they’re being used to accelerate the public cloud, with support for FPGA-based coprocessors in Microsoft’s Project Olympus open hardware designs, and powering much of Azure’s advanced networking features as well as elements of the Bing search hardware.
So, with FPGAs in the public cloud, why not find a simpler way to program them that reduces the associated risk? By using FPGAs as a runtime host for higher-level machine learning models, Project Brainwave tries to create what Microsoft calls a “system for real-time AI.”
You don’t need to write code for the FPGA; instead, you run your model on it, using it as a hardware accelerator. Microsoft’s own developers handle the complex task of programming and updating the FPGAs, providing the underlying neural networks.
It uses Azure’s directly attached FPGA hardware to process data that needs to be analyzed as soon as it arrives. After all, there’s no point in using cloud-based machine learning for image processing if a single frame takes hundreds of milliseconds to process, when you’re working with streamed data from tens or even hundreds of cameras across an entire production line.
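The arithmetic behind that concern is worth spelling out. A back-of-envelope sketch with illustrative numbers (none of these figures are measured Brainwave results):

```python
# Back-of-envelope arithmetic for the camera example above.
# All figures are assumptions for the sake of the example.

cameras = 100          # cameras on the production line
fps = 30               # frames per second per camera
frames_per_sec = cameras * fps          # 3,000 frames/s arriving

per_frame_ms = 250     # a "hundreds of milliseconds" cloud round trip
# One sequential pipeline handles only 1000 / 250 = 4 frames/s,
# so you'd need roughly this many parallel pipelines to keep up:
pipelines_needed = frames_per_sec * per_frame_ms / 1000
print(pipelines_needed)  # 750.0 -- hence the need for line-speed inference
```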
Project Brainwave: a neural net accelerator
At the heart of Project Brainwave is a neural net engine implemented directly on an FPGA. That FPGA is then used as part of a pool of FPGAs connected to Azure’s high-performance network. Applications using Brainwave run on a dedicated server instance that can call into that pool of deep neural networks (DNNs), each running a pre-trained machine learning model.
Data is pushed from the server straight to the DNNs, which deliver results as fast as data arrives. The result is an extremely fast machine learning system that can operate at line speed. If you need more resources, add a new FPGA to the pool and marshal the requests appropriately.
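As a rough illustration of that pool-and-marshal design, here is a minimal Python sketch. The FpgaDnn and BrainwavePool names are hypothetical stand-ins; Brainwave’s actual transport and scheduling are internal to Azure and not exposed at this level.

```python
# A minimal sketch of the "pool of FPGAs" idea: a server-side
# dispatcher that marshals incoming requests across accelerators
# round-robin. FpgaDnn and its run() method are hypothetical.

from itertools import cycle

class FpgaDnn:
    """Hypothetical handle to one FPGA running a pre-trained model."""
    def __init__(self, address):
        self.address = address

    def run(self, frame):
        # In reality this would push the tensor over Azure's
        # high-performance network and block until the result returns.
        return f"result for {frame} from {self.address}"

class BrainwavePool:
    def __init__(self, addresses):
        self._fpgas = cycle(FpgaDnn(a) for a in addresses)

    def infer(self, frame):
        # Scaling out is just adding another address to the pool.
        return next(self._fpgas).run(frame)

pool = BrainwavePool(["fpga-0", "fpga-1", "fpga-2"])
for frame in ["frame-a", "frame-b", "frame-c", "frame-d"]:
    print(pool.infer(frame))
```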
Unlike Google’s TPUs, Brainwave’s FPGAs are flexible and can be reprogrammed to take advantage of new DNN types. As a result, Azure can tune its machine learning engine performance for both changes in loads and in the type of neural networks it supports.
The FPGAs Microsoft uses mix hardened, ASIC-like signal-processing units with programmable logic blocks, and the turnaround between writing new code and running it in hardware is measured in weeks. Crucially, Microsoft’s changes can be applied to FPGAs already deployed in Azure datacenters; by contrast, devices like Google’s TPU must be redesigned, manufactured, and deployed in a new generation of hardware.
Using Project Brainwave in Your Applications
Microsoft is starting to open Brainwave to paying customers, with the launch of a public beta. Full pricing hasn’t been announced, but customers will be able to bring existing machine learning models to the platform. There’s currently support for both Microsoft’s own CNTK and Google’s TensorFlow, with others to follow.
Models are imported and converted to run on FPGAs by software running on the Brainwave servers. It’s a relatively simple process, and there’s additional scope for using ONNX to handle imports from other platforms: take a model, convert it to CNTK on Azure ML, test it, then download it to a Brainwave FPGA pool.
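A hedged sketch of that import path in Python: the CNTK-to-ONNX export call matches CNTK’s documented API, while the final deployment call is a hypothetical placeholder, since Brainwave’s conversion tooling runs on Microsoft’s servers rather than yours.

```python
# Export a trained CNTK model to ONNX, then validate the ONNX file
# before handing it to a conversion/deployment step.

import cntk as C
import onnx

model = C.load_model("classifier.cntkmodel")          # trained CNTK model
model.save("classifier.onnx", format=C.ModelFormat.ONNX)

onnx_model = onnx.load("classifier.onnx")
onnx.checker.check_model(onnx_model)                  # structural validation
# deploy_to_brainwave("classifier.onnx")              # hypothetical step
```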
While Brainwave currently only supports the ResNet-50 DNN, that’s a common enough network that it can be retrained to serve as the backbone for many different machine learning tasks, as sketched below.
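In practice that usually means treating ResNet-50 as a featurizer and training a lightweight classifier on its output. Here is a minimal sketch of the pattern using Keras and scikit-learn; the random images and labels are placeholders, and this mirrors the general technique rather than Brainwave’s own retraining pipeline.

```python
# Transfer learning with ResNet-50 as a fixed featurizer: the heavy
# convolutional network produces a 2048-dim feature vector per image,
# and a small classifier is trained on those features.

import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input
from sklearn.linear_model import LogisticRegression

featurizer = ResNet50(weights="imagenet", include_top=False, pooling="avg")

# Placeholder data: 32 RGB images at 224x224 and binary labels.
images = np.random.rand(32, 224, 224, 3).astype("float32") * 255
labels = np.random.randint(0, 2, size=32)

features = featurizer.predict(preprocess_input(images))  # shape (32, 2048)
classifier = LogisticRegression().fit(features, labels)
print(classifier.score(features, labels))
```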
FPGA-based DNNs are also suitable for use on the edge, adding a new level of intelligence to Microsoft CEO Satya Nadella’s notion of the intelligent edge. Microsoft is working with partners to put Brainwave FPGAs in edge servers, delivering fast machine learning where it’s needed: as standalone systems today and, in the future, as part of platforms like Azure Stack or Azure’s Data Box data-ingestion hardware.
AI on the Edge
Bringing programmable hardware from the cloud and into local datacenters or on the edge of the network is a logical step. To get the most from real-time machine learning systems, your models need to be close to the data, especially if you’re working with high-volume data sources where low latency is a business requirement.
Training models on cloud machine learning systems uses cloud compute resources where they’re cheap, while bringing the trained model to the data source cuts bandwidth and storage costs.
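A rough illustration of the bandwidth claim, with all figures assumed for the sake of the example:

```python
# Streaming raw video to the cloud versus sending only inference
# results from an edge model. All figures are illustrative.

cameras = 100
fps = 30
raw_frame_mb = 0.5            # a compressed 1080p frame, roughly
result_kb = 1                 # a small JSON inference result

raw_mb_per_s = cameras * fps * raw_frame_mb        # 1,500 MB/s to the cloud
edge_mb_per_s = cameras * fps * result_kb / 1024   # ~2.9 MB/s of results
print(raw_mb_per_s, round(edge_mb_per_s, 1))
```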
Real-time AI is a trade-off among compute, network, and storage; Project Brainwave helps balance those elements and puts high-performance machine learning hardware right where you need it.
Support for common machine learning environments reduces risks, and lets you treat machine learning as a cross-platform, cross-cloud resource. Maybe your code runs on Amazon Web Services or on Google Cloud Platform. That’s fine.
If you’re using Project Brainwave at the edge, you can access resources similar to TPUs and AWS F1 instances without paying for MPLS WAN connections to the cloud.
Azure ML may have been originally intended to be a cloud-only service, but with Project Brainwave you can put it where it’s needed: in an intelligent cloud with an intelligent edge.