AI Models on Analog In-Memory Computing Chips
Category: Artificial Intelligence. Thursday, October 12, 2023, 15:34 UTC.
IBM has designed an analog AI chip that performs matrix-vector multiplications in parallel on 'memory tiles' for better energy efficiency. It has 34 tiles holding 35 million phase-change memory devices and can achieve up to 12.4 TOPS/W chip-sustained performance. Experiments demonstrate software-equivalent accuracy for a keyword-spotting network and near-software-equivalent accuracy on the much larger MLPerf RNNT, whose more than 45 million weights were mapped across five chips.
The design created by the team at IBM Research can encode 35 million phase-change memory devices per chip, enough for models with up to 17 million parameters. While this is not yet comparable in size to today's cutting-edge generative AI models, combining several of these chips has allowed the team to run experiments on real AI use cases as effectively as digital chips could.
IBM has optimized the multiply-accumulate (MAC) operations that dominate deep-learning compute. By reading the rows of an array of resistive non-volatile memory (NVM) devices, and then collecting currents along the columns, the team showed they can perform MACs within the memory. This eliminates the need to move the weights between memory and compute regions of a chip, or across chips. The analog chips can also carry out many MAC operations in parallel, which saves time and energy.
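The row-and-column MAC described above can be illustrated with a small idealized model. This is only a sketch under stated assumptions, not IBM's implementation: each device contributes a current equal to its row voltage times its conductance (Ohm's law), and the currents sharing a column add up (Kirchhoff's current law). Storing each signed weight as a differential pair of conductances is an assumption made for illustration, as are the function names; device noise and analog-to-digital conversion are ignored.

```python
# Idealized model of an analog in-memory multiply-accumulate (MAC).
# Assumptions (not from the article): signed weights are stored as a
# differential pair of conductances; no device noise; no ADC quantization.

def encode_weights(weights, g_max=1.0):
    """Map signed weights to (g_plus, g_minus) conductances in [0, g_max]."""
    scale = max(abs(w) for row in weights for w in row) or 1.0
    g_plus = [[g_max * max(w, 0.0) / scale for w in row] for row in weights]
    g_minus = [[g_max * max(-w, 0.0) / scale for w in row] for row in weights]
    return g_plus, g_minus, scale / g_max


def column_currents(conductances, row_voltages):
    """Ohm's law per device, then Kirchhoff's current law per column."""
    n_cols = len(conductances[0])
    return [
        sum(v * g_row[j] for v, g_row in zip(row_voltages, conductances))
        for j in range(n_cols)
    ]


def analog_matvec(weights, inputs):
    """Compute the matrix-vector product as a difference of column currents."""
    g_plus, g_minus, rescale = encode_weights(weights)
    i_plus = column_currents(g_plus, inputs)
    i_minus = column_currents(g_minus, inputs)
    return [rescale * (p - m) for p, m in zip(i_plus, i_minus)]


# 3 rows (inputs) x 2 columns (outputs); values are made up for illustration.
weights = [[0.5, -1.0], [2.0, 0.25], [-0.5, 1.0]]
inputs = [1.0, 2.0, 4.0]
result = analog_matvec(weights, inputs)  # [2.5, 3.5]
```

Every column sum in `column_currents` is independent of the others, which is why a physical array can produce all outputs at once while the weights stay in place.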
Models of artificial intelligence (AI) that have billions of parameters can achieve high accuracy across a range of tasks, but they exacerbate the poor energy efficiency of conventional general-purpose processors, such as graphics processing units or central processing units. Analog in-memory computing (analog-AI) can provide better energy efficiency by performing matrix–vector multiplications in parallel on ‘memory tiles’. However, analog-AI has yet to demonstrate software-equivalent (SWeq) accuracy on models that require many such tiles and efficient communication of neural-network activations between the tiles. Here we present an analog-AI chip that combines 35 million phase-change memory devices across 34 tiles, massively parallel inter-tile communication and analog, low-power peripheral circuitry that can achieve up to 12.4 tera-operations per second per watt (TOPS/W) chip-sustained performance. We demonstrate fully end-to-end SWeq accuracy for a small keyword-spotting network and near-SWeq accuracy on the much larger MLPerf recurrent neural-network transducer (RNNT), with more than 45 million weights mapped onto more than 140 million phase-change memory devices across five chips.
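To put the efficiency figure in more tangible terms, TOPS/W can be converted into energy per operation, since one operation per second per watt is the same as one operation per joule. The helper below is a hypothetical illustration of that unit conversion. The text does not say how an "operation" is counted, and because 12.4 TOPS/W is an "up to" figure, the result is a best-case value.

```python
def energy_per_op_femtojoules(tops_per_watt):
    """Convert an efficiency in TOPS/W to energy per operation in femtojoules."""
    ops_per_joule = tops_per_watt * 1e12  # 1 TOPS/W = 1e12 operations per joule
    return 1e15 / ops_per_joule           # 1 J = 1e15 fJ


# Best-case figure quoted for the chip: roughly 80.6 fJ per operation.
peak_fj = energy_per_op_femtojoules(12.4)
```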