Feature Engineering Systems for Machine Learning


Room 203
Tue 30 Aug, 8:00 a.m. to 10:15 a.m. PDT


For machine learning (ML) applications, feature engineering plays an important role in generating derived features that are informative and useful for subsequent ML model training and online inference. In practice, building a feature engineering system for large-scale, real-world ML applications is challenging: the system must not only meet the different performance requirements of offline training and online inference, but also guarantee online-offline feature consistency, i.e., a feature must yield the same value whether it is computed offline for training or online at inference time. In this tutorial, we will cover three topics about feature engineering systems. (1) We will first introduce the concept of feature engineering and its design requirements, showing that online-offline feature consistency, efficiency, and development cost are the three major concerns. (2) We will then review existing industry solutions, which are usually categorized as feature stores, such as Amazon SageMaker Feature Store, Tecton, and Databricks Feature Store. We will also present the detailed design methodology and architecture of our open-source project OpenMLDB, a machine learning database that meets all the requirements of a feature store and provides unified SQL APIs for both offline and online development. It originated from our in-house commercial product and has been deployed in hundreds of real-world applications. (3) Finally, we will give a hands-on demonstration of building a complete machine learning application on the open-source machine learning database OpenMLDB that can go live in production within a few minutes.
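To make online-offline feature consistency concrete, here is a minimal Python sketch of the idea, not OpenMLDB's API or SQL dialect. It assumes a hypothetical event log of (user, timestamp, amount) records and a hypothetical feature, "total amount spent by this user over the last day". The offline path computes the feature point-in-time for every row of a time-ordered log (as when building a training set); the online path maintains a bounded per-user window in memory and computes the feature as each event arrives. Both paths call the same feature definition, so they cannot drift apart. All names here (`Event`, `sum_last_day`, `OnlineFeatureStore`, `WINDOW_SECONDS`) are illustrative assumptions.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

WINDOW_SECONDS = 86_400  # hypothetical 1-day window


@dataclass(frozen=True)
class Event:
    user: str
    ts: int        # event time, in seconds
    amount: float


def sum_last_day(history, ts):
    """The single shared feature definition: total amount in [ts - WINDOW, ts]."""
    return sum(e.amount for e in history if ts - WINDOW_SECONDS <= e.ts <= ts)


def offline_features(events):
    """Batch path: point-in-time feature for every row of a time-ordered log."""
    rows = []
    for i, ev in enumerate(events):
        # Only rows up to and including this one are visible (no future leakage).
        past = [e for e in events[: i + 1] if e.user == ev.user]
        rows.append(sum_last_day(past, ev.ts))
    return rows


class OnlineFeatureStore:
    """Request path: keeps a bounded per-user window in memory (hypothetical)."""

    def __init__(self):
        self._windows = defaultdict(deque)

    def on_event(self, ev):
        window = self._windows[ev.user]
        window.append(ev)
        # Evict events that can no longer fall inside the window.
        while window and window[0].ts < ev.ts - WINDOW_SECONDS:
            window.popleft()
        return sum_last_day(window, ev.ts)


events = [  # assumed to arrive in timestamp order
    Event("u1", 0, 10.0),
    Event("u2", 100, 5.0),
    Event("u1", 3_600, 20.0),
    Event("u1", 90_000, 7.0),
]

offline = offline_features(events)
store = OnlineFeatureStore()
online = [store.on_event(e) for e in events]
print(offline)           # [10.0, 5.0, 30.0, 27.0]
print(online == offline)  # True
```

The design choice the sketch illustrates is sharing one feature definition between the two paths; when the offline logic is written once for batch processing and then reimplemented separately for serving, the two versions can silently diverge. A unified SQL API for both offline and online development, as described above for OpenMLDB, addresses the same problem at the system level.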
