

Poster

The Hidden Bloat in Machine Learning Systems

Huaifeng Zhang · Ahmed Ali-Eldin Hassan


Abstract:

Software bloat refers to code and features that are not used by software at runtime. For Machine Learning (ML) systems, bloat is a major contributor to technical debt, leading to decreased performance and resource wastage. In this work, we present Negativa-ML, a novel tool that identifies and removes bloat in ML frameworks by analyzing their shared libraries. Our approach includes novel techniques to detect and locate unnecessary code within device code, a key area overlooked by existing research, which focuses primarily on host code. We evaluate Negativa-ML using four popular ML frameworks across ten workloads over 300 shared libraries. The results demonstrate that the ML frameworks are highly bloated on both the device and host code side. On average, Negativa-ML reduces the device code size in these frameworks by up to 75% and the host code by up to 72%, resulting in total file size reductions of up to 55%. The device code is a primary source of bloat within ML frameworks. Through debloating, we achieve reductions in peak host memory usage, peak GPU memory usage, and execution time by up to 74.6%, 69.6%, and 44.6%, respectively.
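To make the host/device distinction concrete, the sketch below measures how much of a shared library is device code versus host code by reading its ELF section headers. It is not Negativa-ML's method: it only reports section sizes and does not detect which code is unused. It assumes a 64-bit little-endian ELF file, and it assumes that device code sits in the `.nv_fatbin` section (where CUDA toolchains typically embed it) and host code in `.text`. The names `section_sizes` and `device_share` are hypothetical helpers introduced for illustration.

```python
import struct

DEVICE_SECTIONS = {".nv_fatbin"}  # assumption: CUDA device code is embedded here
HOST_SECTIONS = {".text"}         # host machine code


def section_sizes(data: bytes) -> dict:
    """Map section name -> size in bytes for a 64-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF file")
    # ELF64 header: e_shoff at 0x28; e_shentsize, e_shnum, e_shstrndx at 0x3A.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)

    def header(i):
        # Fields: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size, ...
        return struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + i * e_shentsize)

    # The section-name string table is itself a section.
    str_hdr = header(e_shstrndx)
    strtab = data[str_hdr[4]:str_hdr[4] + str_hdr[5]]

    sizes = {}
    for i in range(e_shnum):
        h = header(i)
        end = strtab.index(b"\0", h[0])      # names are NUL-terminated
        name = strtab[h[0]:end].decode()
        sizes[name] = sizes.get(name, 0) + h[5]
    return sizes


def device_share(sizes: dict) -> float:
    """Fraction of (device + host) code bytes that is device code."""
    device = sum(sizes.get(s, 0) for s in DEVICE_SECTIONS)
    host = sum(sizes.get(s, 0) for s in HOST_SECTIONS)
    return device / (device + host) if device + host else 0.0
```

Applied to a framework library, for example `device_share(section_sizes(open(path, "rb").read()))` with `path` pointing at a `.so` file of your choice, this gives a rough picture of how much of the file is device code before any debloating is attempted.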
