Invited Talk 4 in Workshop: Personalized Recommendation Systems and Algorithms

Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale

Summer Deng


Abstract:

This talk presents the low-precision techniques, analysis, and tool chain we explored to optimize the performance of production-scale recommendation models while meeting stringent accuracy requirements. We also share the unique challenges and lessons learned from deploying Facebook’s production recommendation models in low precision on existing hardware platforms, including CPUs and accelerators. We hope the methodologies we share apply to many ML domains and to low-precision architectures in general.