The field of artificial intelligence and machine learning has seen a considerable increase in recent years in the computing power needed to achieve specific tasks. With traditional machine learning methods, the compute needed to achieve a model's intended goal grows in proportion to the amount of data required to train it. In addition, as models get larger and more sophisticated, the computational requirements increase significantly.
Deep learning is a domain of machine learning that deals with neural networks containing a large number of layers. These layers are usually densely connected to one another, meaning that the more extensive the network, the higher the computational cost. These networks have seen major adoption in the field due to their high performance and relatively easy implementation for various tasks.
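As a rough illustration of how cost grows with network size, the following Python sketch counts the multiply-accumulate operations needed to pass one input through a stack of densely connected layers. The layer widths are hypothetical and chosen only for the example; biases and activation functions are ignored.

```python
def dense_layer_macs(layer_widths):
    """Multiply-accumulate count for each dense layer, for one input sample.

    A dense layer with n_in inputs and n_out outputs needs n_in * n_out
    multiply-accumulates, because every input feeds every output.
    """
    return [n_in * n_out for n_in, n_out in zip(layer_widths, layer_widths[1:])]


def total_macs(layer_widths):
    """Total multiply-accumulate count across the whole network."""
    return sum(dense_layer_macs(layer_widths))


# Hypothetical layer widths, from input size to output size.
small_network = [784, 128, 10]
large_network = [784, 1024, 1024, 1024, 10]

print(total_macs(small_network))  # 101632
print(total_macs(large_network))  # 2910208
```

Adding two hidden layers and widening them raises the count by a factor of roughly 29 in this example, which is why larger dense networks become expensive so quickly.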
Current Processing Power
In the early days, central processing units (CPUs) were the go-to processor for all artificial intelligence work, but these were soon replaced by the graphics processing unit (GPU). This switch was mainly because GPUs carry hundreds of cores well suited to the simple matrix multiplications required in a deep neural network.
CPUs, by comparison, typically carry sixteen cores at most, so the space saved by packing so many cores onto one device is cause enough to explain the use of GPUs for deep learning. In addition to this, graphics processors offer an unmatched level of parallelization, meaning that computation can occur simultaneously throughout a network layer, compared to the largely serial computation offered by the CPU. As a result, the GPU drastically reduces the time needed for the extensive computation required for deep learning.
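To see why a network layer lends itself to parallel hardware, consider the minimal Python sketch below. It is not GPU code; it only shows that each output of a dense layer is an independent dot product, so the outputs can be computed in any order, or all at once, and give the same result. The function names, weights and inputs are hypothetical.

```python
def dot(row, x):
    """Dot product of one weight row with the input vector."""
    return sum(w * v for w, v in zip(row, x))


def dense_forward_serial(weights, x):
    """Compute the outputs one after another, as a single core would."""
    return [dot(row, x) for row in weights]


def dense_forward_any_order(weights, x, order):
    """Compute the same outputs in an arbitrary order.

    No output depends on any other, which is the property that lets a
    parallel processor assign each one to a different core.
    """
    outputs = [None] * len(weights)
    for i in order:
        outputs[i] = dot(weights[i], x)
    return outputs


# Hypothetical 3-output layer with 2 inputs.
weights = [[1, 2], [3, 4], [5, 6]]
x = [1, 1]

print(dense_forward_serial(weights, x))                # [3, 7, 11]
print(dense_forward_any_order(weights, x, [2, 0, 1]))  # [3, 7, 11]
```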
Despite these benefits, GPUs still have their own challenges. Firstly, the architecture of a GPU is built specifically for operations on vectorized data, while CPUs largely operate on scalar data. This difference means that GPUs cannot be used in isolation for training a deep learning model.
While the GPU may handle the calculations required to train a network, a CPU is still required to cope with all other aspects of the training. Furthermore, the onboard memory (or cache) that graphics processors hold is relatively small, meaning that data must be stored elsewhere and ferried to the processor when computation is required. This constant transfer of data is the main bottleneck of current deep learning hardware, and as such, various companies have sought ways to combat this.
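The effect of this bottleneck can be sketched with simple arithmetic: if moving the data takes longer than processing it, the processor sits idle waiting for input. The Python snippet below compares the two times for a single training step. Every figure in it (data volume, link bandwidth, operation count, processor throughput) is a hypothetical value picked for illustration, not a measurement of any real device.

```python
def step_times(bytes_moved, bandwidth_bytes_per_s, operations, ops_per_s):
    """Return (transfer time, compute time) in seconds for one step."""
    transfer_s = bytes_moved / bandwidth_bytes_per_s
    compute_s = operations / ops_per_s
    return transfer_s, compute_s


def bottleneck(transfer_s, compute_s):
    """Name whichever stage takes longer and therefore limits the step."""
    return "data transfer" if transfer_s > compute_s else "compute"


# Hypothetical figures: 4 GB moved over a 16 GB/s link,
# 1e12 operations on a processor sustaining 1e13 operations per second.
transfer_s, compute_s = step_times(
    bytes_moved=4e9,
    bandwidth_bytes_per_s=16e9,
    operations=1e12,
    ops_per_s=1e13,
)

# Transfer takes 0.25 s and compute 0.1 s, so the transfer dominates.
print(transfer_s, compute_s, bottleneck(transfer_s, compute_s))
```

Under these assumed numbers, a faster processor would not shorten the step at all; only more onboard memory or a faster link would.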
Advancements in Processing Power
To overcome these limitations, deep learning processors are being developed with larger memory and greater processing power.
The goal is to make deep learning more effective by optimizing the hardware specifically for the task. Google has been one of the many companies to develop specialized processors. These “Tensor Processing Units,” or TPUs for short, are chips with custom silicon that are optimized for machine learning. They boast a large onboard cache memory, with extremely high bandwidth for data transfer.
These devices are also designed to run Google’s own task-optimized deep learning software, TensorFlow, which comes on board, making the process of training a network much more streamlined for anyone wishing to use one. Similar to GPUs, TPUs are optimized for matrix multiplication and have shown their best performance in the domain of computer vision, which is notorious for high-dimensional data sets and high compute requirements. There, they reportedly operate around 50 times faster than the best GPUs.
Another company to take on the challenge of designing improved AI accelerators is the Bristol-based Graphcore. Its main innovation is the “Intelligence Processing Unit” (IPU), which has been shown to outperform many of the current leading AI processors, such as Nvidia’s A100, by orders of magnitude on baseline tests. The processor is also cheaper and more compact than the A100, making it the option garnering the most interest in the research community.
With both the IPU and TPU, the data transfer bottleneck shrinks with each hardware update released by these companies, allowing for faster learning, reduced computation, and more sophisticated models to be built. Despite the advancements, these devices still require a CPU to run the entire system; a stand-alone system has not yet been achieved.
The Future of Optimized Deep Learning Processors
While many companies have focussed on simply optimizing the hardware for deep learning, the AI start-up Neural Magic believes that there is another way to achieve the same level of computing without extensive chip design.
Their approach is to optimize deep learning algorithms to work efficiently on a CPU. The Deep Sparse software architecture allows CPUs to deliver performance typically only seen on a GPU, a key property in an age where the form factor of processors is as important as their processing power.
With CPUs being significantly smaller than all other types of processors, machine learning algorithms can be implemented on a much broader range of devices.
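The article does not describe how Deep Sparse works internally, and the Python sketch below is not Neural Magic's implementation. It only illustrates the general idea the name points to, under the assumption that many weights in a layer are zero: storing and multiplying only the non-zero weights reduces the work a processor has to do. The function names and example values are hypothetical.

```python
def to_sparse_rows(weights):
    """Keep only (column index, value) pairs for the non-zero weights."""
    return [[(j, w) for j, w in enumerate(row) if w != 0] for row in weights]


def sparse_matvec(sparse_rows, x):
    """Multiply a sparse weight matrix by a vector, skipping zero weights."""
    return [sum(w * x[j] for j, w in row) for row in sparse_rows]


def count_multiplies(sparse_rows):
    """Number of multiplications the sparse product performs."""
    return sum(len(row) for row in sparse_rows)


# Hypothetical 3 x 4 weight matrix in which most entries are zero.
weights = [
    [0, 2, 0, 0],
    [1, 0, 0, 3],
    [0, 0, 0, 0],
]
x = [1, 2, 3, 4]

sparse_rows = to_sparse_rows(weights)
print(sparse_matvec(sparse_rows, x))   # [4, 13, 0]
print(count_multiplies(sparse_rows))   # 3, versus 12 for the dense product
```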
Over the past year, there have been extreme supply chain shortages, which, when combined with the depletion of available silicon, have significantly increased the price of manufacturing and purchasing processors. Neural Magic’s approach may allow researchers to use the already existing computing power available without the need for specialist hardware, making deep learning technology accessible to the masses.