Deep Learning (DL) algorithms are the central focus of modern machine learning systems. As data volumes keep growing, it has become customary to train large neural networks with hundreds of millions of parameters, with enough capacity to memorize these volumes and obtain state-of-the-art accuracy. To get around the costly computations associated with large models and data, the community is increasingly investing in specialized hardware for model training. However, specialized hardware is expensive and hard to generalize to a multitude of tasks. Progress on the algorithmic front has so far failed to demonstrate a direct advantage over powerful hardware such as NVIDIA V100 GPUs. This paper provides an exception. We propose SLIDE (Sub-LInear Deep learning Engine), which uniquely blends smart randomized algorithms with multi-core parallelism and workload optimization. Using just a CPU, SLIDE drastically reduces the computation during both training and inference, outperforming an optimized implementation of TensorFlow (TF) on the best available GPU. Our evaluations on industry-scale recommendation datasets, with large fully connected architectures, show that training with SLIDE on a 44-core CPU is more than 3.5 times faster (1 hour vs. 3.5 hours) than the same network trained using TF on a Tesla V100 at any given accuracy level. On the same CPU hardware, SLIDE is over 10x faster than TF. We provide code and scripts for reproducibility.
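The core idea hinted at in the abstract — using randomized hashing to retrieve a small set of likely-relevant neurons instead of computing a full layer — can be illustrated with a minimal SimHash sketch. This is a hypothetical toy, not SLIDE's actual implementation; all names and parameters here are illustrative assumptions:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

d, n_neurons, k = 16, 1000, 8   # input dim, layer width, hash bits (toy sizes)

# One weight vector per neuron in a fully connected layer.
W = rng.standard_normal((n_neurons, d))

# SimHash: k random hyperplanes; the sign pattern of projections is the bucket id.
planes = rng.standard_normal((k, d))

def simhash(v):
    # k-bit signature: which side of each hyperplane v falls on.
    return tuple(bool(b) for b in (planes @ v) > 0)

# Pre-bucket every neuron by the signature of its weight vector.
table = defaultdict(list)
for i in range(n_neurons):
    table[simhash(W[i])].append(i)

def sparse_forward(x):
    # Compute dot products only for neurons colliding with x's bucket,
    # i.e. neurons whose weights point in a similar direction as x.
    active = table.get(simhash(x), [])
    return {i: float(W[i] @ x) for i in active}
```

Because SimHash collides vectors with a small angle between them, the retrieved neurons tend to be those with large inner product with the input, so the layer's heaviest activations are approximated at a fraction of the dense cost. SLIDE's actual engine additionally maintains these hash tables under gradient updates and parallelizes across cores.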
Beidi Chen (Rice University)
Beidi Chen is a Ph.D. student at Rice University, working with Dr. Anshumali Shrivastava. She works on large-scale machine learning algorithms; most of her research involves designing algorithms for efficient, accurate, and secure representation of data. She received her B.S. in Electrical Engineering and Computer Science from UC Berkeley. She was selected for EECS Rising Stars 2019 at UIUC. She has published in top conferences and journals on machine learning and statistics, including NeurIPS, UAI, ICLR, AoAS, and SysML. She has completed internships at NVIDIA, Amazon, Apple, and VMware.
Tharun Medini (Rice University)
I'm a 4th-year PhD student at Rice University. I work in the RUSHLAB with Prof. Anshumali Shrivastava. My research area is large-scale machine learning using randomized hashing. I previously worked as an Applied Scientist Intern at Amazon Search, Palo Alto, from May 2018 to Aug 2019, on projects including query-to-product prediction using extreme classification, fast query reformulation for zero-result queries, and fast approximate nearest neighbor search. I have published papers at NeurIPS 2019 and MLSys 2020. I completed my bachelor's in Electrical Engineering at IIT Bombay.
James Farwell (Intel Corporation)
Sameh Gobriel (Intel Corporation)
Charlie Tai (Intel Corporation)
Anshumali Shrivastava (Rice University)
Related Events (a corresponding poster, oral, or spotlight)
2020 Poster: SLIDE : Training Deep Neural Networks with Large Outputs on a CPU faster than a V100-GPU »
Mon Mar 2nd 06:30 -- 09:00 PM Room Ballroom A