Algorithm-Hardware Co-Design for Ultra-Low-Power Large Language Models
Steven Abreu, Jason Eshraghian
TLDR
This introductory review examines the algorithm-hardware co-design strategies necessary for developing ultra-low-power LLMs and offers a blueprint for enabling them, emphasizing the necessity of cross-disciplinary collaboration for efficient AI at scale.
Abstract
Large Language Models (LLMs) have demonstrated unprecedented capabilities in language understanding and generation, yet their significant computational requirements pose substantial challenges for scalability and environmental sustainability. In this introductory review, we examine the algorithm-hardware co-design strategies necessary for developing ultra-low-power LLMs. We begin by reviewing model-level approaches (efficient architectures, quantization, sparsity, and distillation) that reduce parameter count and memory movement. We discuss hardware-centric innovations, including event-driven neuromorphic accelerators, near-memory computing paradigms, and specialized number formats, illustrating how these platforms leverage compressed models for substantial gains in energy efficiency. Finally, we address emerging frontiers beyond digital systems that may further reduce power consumption. We offer a blueprint for enabling low-power LLMs, emphasizing the necessity of cross-disciplinary collaboration for efficient AI at scale.
