Researchers from MIT and NVIDIA have developed two techniques that accelerate the processing of sparse tensors, a type of data structure used in high-performance computing tasks. The techniques could improve the performance and energy efficiency of systems such as generative artificial intelligence models.

Tensors are the data structures that machine-learning models operate on, and both methods exploit the zero values they contain. Hardware that skips over the zeros saves both computation and memory, and a tensor with many zeros can be compressed so that a larger portion of it fits in on-chip memory.

Exploiting sparsity is not straightforward, however. Finding the nonzero values in a large tensor is difficult, and the number of nonzero values can vary from one region of the tensor to another. The two techniques address these problems separately. One enables hardware to efficiently find nonzero values across a wider variety of sparsity patterns. The other handles the case where data doesn’t fit in memory, which increases utilization of the storage buffer and reduces off-chip memory traffic. Both improve the performance and reduce the energy demands of hardware accelerators designed for processing sparse tensors.

For the first technique, the MIT researchers designed a hardware accelerator called HighLight, which handles a variety of sparsity patterns and still performs well on models that contain no zero values. It relies on “hierarchical structured sparsity” to represent different sparsity patterns efficiently. The tensor values are divided into smaller blocks, each with its own sparsity pattern, and those blocks are combined into a hierarchy of simple patterns, which lets HighLight find and skip zeros more efficiently. On average, the accelerator design had about six times better energy efficiency than other approaches. The researchers plan to apply hierarchical structured sparsity to more machine-learning models and tensor types in the future.
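The idea of blocks with their own sparsity patterns, combined into a hierarchy, can be illustrated in software. What follows is a minimal Python sketch and not the HighLight design: the two-level layout, the function names `compress_hss` and `sparse_dot`, and the default limits (at most 2 nonzeros per block of 4, at most 1 nonzero block per group of 2) are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# (block index, offset inside the block, value) for one nonzero
Entry = Tuple[int, int, float]


def compress_hss(
    values: List[float],
    block_size: int = 4,            # values per block (hypothetical default)
    max_per_block: int = 2,         # lower level: at most this many nonzeros per block
    group_size: int = 2,            # blocks per group (hypothetical default)
    max_blocks_per_group: int = 1,  # upper level: at most this many nonzero blocks per group
) -> List[Entry]:
    """Check a two-level structured pattern and keep only the nonzeros."""
    if len(values) % (block_size * group_size) != 0:
        raise ValueError("length must be a multiple of block_size * group_size")

    entries: List[Entry] = []
    n_blocks = len(values) // block_size
    for group_start in range(0, n_blocks, group_size):
        nonzero_blocks = 0
        for b in range(group_start, group_start + group_size):
            block = values[b * block_size:(b + 1) * block_size]
            nonzeros = [(off, v) for off, v in enumerate(block) if v != 0]
            if not nonzeros:
                continue  # the whole block is zero, so it is skipped entirely
            if len(nonzeros) > max_per_block:
                raise ValueError(f"block {b} violates the lower-level pattern")
            nonzero_blocks += 1
            entries.extend((b, off, v) for off, v in nonzeros)
        if nonzero_blocks > max_blocks_per_group:
            raise ValueError(
                f"group starting at block {group_start} violates the upper-level pattern"
            )
    return entries


def sparse_dot(entries: List[Entry], dense: List[float], block_size: int = 4) -> float:
    """Dot product that touches only the stored nonzeros."""
    return sum(v * dense[b * block_size + off] for b, off, v in entries)


# 16 values, only 3 of them nonzero
values = [0, 3, 0, 5,  0, 0, 0, 0,  0, 0, 0, 0,  7, 0, 0, 0]
entries = compress_hss(values)
print(entries)                                # [(0, 1, 3), (0, 3, 5), (3, 0, 7)]
print(sparse_dot(entries, list(range(16))))   # 102
```

Because each level allows only a small, fixed number of nonzero positions, the location of every nonzero is described by a short block index and a short offset, and the dot product performs 3 multiplications instead of 16. Choosing different limits at each level yields different overall densities from the same two simple rules.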
The second technique leverages sparsity to move and process data on a computer chip more effectively. Because tensors are often larger than the chip’s memory buffer, the chip processes one chunk at a time, called a tile. To maximize buffer utilization and limit off-chip memory accesses, researchers try to use the largest tile that fits.

In a sparse tensor, many values are zero, so a larger tile can fit in the buffer than its capacity would otherwise suggest. Because the number of zero values varies across regions of the tensor, though, selecting the right tile size is challenging. To address this, the researchers propose “overbooking”: they select a larger tile size, knowing that the tiles will usually fit. When a tile has more nonzero values than the buffer can hold, the excess values are bumped out of the buffer, and the hardware re-fetches only the bumped values without processing the entire tile again. This technique, called Tailors, improves memory utilization.

The researchers also developed an approach called Swiftiles, which swiftly estimates the ideal tile size while taking advantage of overbooking. This reduces the number of times the hardware needs to check the tensor to find the ideal tile size. The names of both techniques pay homage to Taylor Swift, whose recent tour was marked by overbooked tickets.

Together, the techniques improve the performance and energy efficiency of sparse tensor processing, with implications for high-performance computing tasks and generative artificial intelligence systems.
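The trade-off behind overbooking can be sketched with a small software model. This is an illustrative Python sketch under stated assumptions, not the Tailors or Swiftiles hardware: tiles are modeled as groups of consecutive rows, the buffer holds a fixed number of nonzero values, `target_fraction` is a hypothetical knob for how many tiles may overflow, and the exhaustive scan in `largest_tile_within_target` is a simple stand-in for the estimation step rather than the researchers' method.

```python
from typing import List


def tile_occupancies(nnz_per_row: List[int], rows_per_tile: int) -> List[int]:
    """Number of nonzero values each tile would place in the buffer."""
    return [
        sum(nnz_per_row[i:i + rows_per_tile])
        for i in range(0, len(nnz_per_row), rows_per_tile)
    ]


def overbooked_fraction(nnz_per_row: List[int], rows_per_tile: int, capacity: int) -> float:
    """Fraction of tiles whose nonzeros exceed the buffer capacity."""
    occ = tile_occupancies(nnz_per_row, rows_per_tile)
    return sum(o > capacity for o in occ) / len(occ)


def bumped_values(nnz_per_row: List[int], rows_per_tile: int, capacity: int) -> int:
    """Nonzeros that do not fit and would have to be re-fetched."""
    return sum(max(0, o - capacity) for o in tile_occupancies(nnz_per_row, rows_per_tile))


def largest_tile_within_target(
    nnz_per_row: List[int], capacity: int, target_fraction: float
) -> int:
    """Largest rows_per_tile whose overbooked fraction stays within the target.

    Assumes a single row always fits, so a tile of one row is the fallback.
    """
    best = 1
    for rows_per_tile in range(1, len(nnz_per_row) + 1):
        if overbooked_fraction(nnz_per_row, rows_per_tile, capacity) <= target_fraction:
            best = rows_per_tile
    return best


# Nonzero count of each row in a made-up 12-row sparse tensor
nnz = [2, 1, 3, 0, 1, 2, 6, 1, 0, 2, 1, 1]
capacity = 6  # buffer holds 6 nonzero values (hypothetical)

safe = largest_tile_within_target(nnz, capacity, target_fraction=0.0)
overbooked = largest_tile_within_target(nnz, capacity, target_fraction=0.25)
print(safe, overbooked)                       # 1 3
print(tile_occupancies(nnz, overbooked))      # [6, 3, 7, 4]
print(bumped_values(nnz, overbooked, capacity))  # 1
```

In this toy data, a tile size that must never overflow is limited to one row per tile by the single dense row. Allowing a quarter of the tiles to overflow triples the tile size, at the cost of re-fetching one bumped value.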