Poster
Optimizing LLM Queries in Relational Data Analytics Workloads
Shu Liu · Asim Biswal · Audrey Cheng · Amog Kamsetty · Luis Gaspar Schroeder · Liana Patel · Shiyi Cao · Xiangxi Mo · Ion Stoica · Joseph Gonzalez · Matei Zaharia
Batch data analytics has become a growing application for Large Language Models (LLMs). LLMs enable users to perform a wide range of natural language tasks, such as classification, entity extraction, and translation, over large datasets. However, LLM inference is expensive in both computational and monetary cost: for example, an NVIDIA L4 GPU running Llama3-8B can only process 6 KB of text per second, taking about a day to handle 15 GB of data, while processing a similar amount of data costs around $10K on OpenAI’s GPT-4o. In this paper, we propose novel techniques that can significantly reduce the cost of LLM calls for relational data analytics workloads. Our key contribution is developing efficient algorithms for reordering the rows and the fields within each row of an input table to maximize key-value (KV) cache reuse during LLM serving. Our approach can be easily applied to existing analytics systems and serving platforms. Evaluations show that our solution yields up to 3.4× improvement in end-to-end latency on a benchmark of diverse LLM-based queries using Llama 3 models. Our solution also achieves 32% cost savings under OpenAI’s and Anthropic’s prefix-cache pricing models.
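To make the core idea concrete, below is a minimal illustrative sketch in Python of how reordering fields and rows can increase prefix (KV-cache) reuse. The function name `reorder_for_prefix_cache` and the heuristic (placing low-cardinality fields first, then sorting rows so identical leading values become adjacent) are assumptions for illustration, not the paper’s exact algorithms.

```python
from typing import Any

def reorder_for_prefix_cache(rows: list[dict[str, Any]]) -> list[str]:
    """Illustrative sketch: serialize table rows into prompts whose shared
    prefixes are maximized, so an LLM server with prefix (KV-cache) reuse
    can skip recomputing repeated tokens. Hypothetical heuristic, not the
    paper's exact algorithm."""
    if not rows:
        return []
    fields = list(rows[0].keys())
    # Put low-cardinality fields first: repeated values up front yield
    # longer prefixes shared across consecutive rows.
    fields.sort(key=lambda f: len({str(r[f]) for r in rows}))
    # Sort rows lexicographically on the reordered fields so rows with
    # identical leading values become adjacent in the batch.
    ordered = sorted(rows, key=lambda r: tuple(str(r[f]) for f in fields))
    # Serialize each row; consecutive prompts now share long prefixes.
    return [" | ".join(f"{f}: {r[f]}" for f in fields) for r in ordered]

# Example: the two "US" rows become adjacent and share the prefix
# "country: US | ...", which a prefix-caching server computes only once.
prompts = reorder_for_prefix_cache([
    {"name": "Ada", "country": "US"},
    {"name": "Bo", "country": "FR"},
    {"name": "Cy", "country": "US"},
])
for p in prompts:
    print(p)
```

In this toy example, moving the repetitive `country` field ahead of the unique `name` field and grouping matching rows together lengthens the token prefix shared between consecutive LLM calls, which is the quantity that prefix-cache-aware serving systems and prefix-cache pricing reward.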