MLSys Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale

Invited Talk 4 in Workshop: Personalized Recommendation Systems and Algorithms

Summer Deng

2021 Invited Talk 4

Abstract:

This talk presents the low-precision techniques, analysis methods, and toolchain we explored to optimize the performance of production-scale recommendation models while meeting their stringent accuracy requirements. We also share the unique challenges and lessons learned from deploying Facebook's production recommendation models in low precision on existing hardware platforms, including CPUs and accelerators. We hope the methodologies we share are applicable to many ML domains and to low-precision architectures in general.
