MLSys 2026 Career Opportunities
Here we highlight career opportunities submitted by our Exhibitors, and other top industry, academic, and non-profit leaders. We would like to thank each of our exhibitors for supporting MLSys 2026.
Inception creates the world’s fastest, most efficient AI models. Our Mercury model is the world’s fastest reasoning LLM and first commercially available diffusion LLM, delivering 5x greater speed and efficiency than today’s LLMs, with best-in-class quality.
We are the AI researchers and engineers behind such breakthrough AI technologies as diffusion models, flash attention, and DPO.
The Role
We're looking for engineers and scientists to design, optimize, and scale the systems that power our diffusion LLMs in production. Your work will make inference faster, more cost-effective, and more reliable.
Key Responsibilities
- Build and optimize high-performance model serving systems for low-latency inference of diffusion LLMs.
- Extend orchestration frameworks (Kubernetes, Ray, SLURM) for distributed inference, evaluation, and large-batch serving.
- Implement and manage load balancing, autoscaling, and traffic routing for model endpoints.
- Build systems for model versioning, canary deployments, and zero-downtime rollouts.
- Develop monitoring, alerting, and observability tooling to ensure SLA compliance and rapid incident response.
- Collaborate with ML researchers to translate model advances (new architectures, quantization techniques, batching strategies) into production-ready serving improvements.
Qualifications
- BS/MS/PhD in Computer Science, Engineering, or a related field (or equivalent experience).
- Knowledge of ML serving frameworks (SGLang, vLLM, Triton Inference Server, TensorRT-LLM).
- Understanding of ML frameworks (PyTorch, TensorFlow) from a systems perspective.
- Familiarity with high-performance computing and GPU programming (CUDA).
- Experience with containerization (Docker), orchestration (Kubernetes), and CI/CD pipelines.
- Background in performance optimization and profiling of ML systems.
Preferred Skills
- Experience building and maintaining large-scale language models with tens of billions of parameters or more.
- Experience with distributed systems and cloud computing platforms (AWS/GCP/Azure).
- Experience with ML workflow orchestration tools (Kubeflow, Airflow).
- Experience with model optimization techniques (quantization, distillation, speculative decoding, continuous batching).
- Knowledge of ML-specific infrastructure challenges (checkpointing, resource scheduling, etc.).
Why Join Inception
- Work with World-Class Talent: Collaborate with the inventors of diffusion models and leading AI researchers
- Shape Foundational Technology: Your decisions will influence how the next generation of AI products is built and used
- Immediate Impact: Join at the ground floor where your contributions directly shape product direction and company trajectory
Perks & Benefits
- Competitive salary and equity in a rapidly growing startup
- Flexible vacation and paid time off (PTO)
- Health, dental, and vision insurance
- Catered meals (breakfast, lunch, & dinner)
- Commuter subsidies
- A collaborative and inclusive culture
About the job
Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.
At YouTube, we believe that everyone deserves to have a voice, and that the world is a better place when we listen, share, and build community through our stories. We work together to give everyone the power to share their story, explore what they love, and connect with one another in the process. Working at the intersection of cutting-edge technology and boundless creativity, we move at the speed of culture with a shared goal to show people the world. We explore new ideas, solve real problems, and have fun — and we do it all together.
The US base salary range for this full-time position is $147,000-$211,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.
Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.
Responsibilities
- Write product or system development code.
- Collaborate with peers and stakeholders through design and code reviews to ensure best practices amongst available technologies (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).
- Contribute to existing documentation or educational content and adapt content based on product/program updates and user feedback.
- Triage product or system issues and debug/track/resolve by analyzing the sources of issues and the impact on hardware, network, or service operations and quality.
- Implement solutions in one or more specialized ML areas, utilize ML infrastructure, and contribute to model optimization and data processing.
Team Description:
The AI Foundations team is at the center of bringing our vision for AI at Capital One to life. Our work touches every aspect of the research life cycle, from partnering with Academia to building production systems. We work with product, technology and business leaders to apply the state of the art in AI to our business.
In this role, you will:
Partner with a cross-functional team of data scientists, software engineers, machine learning engineers and product managers to deliver AI-powered products that change how customers interact with their money.
Leverage a broad stack of technologies (PyTorch, AWS Ultraclusters, Hugging Face, Lightning, VectorDBs, and more) to reveal the insights hidden within huge volumes of numeric and textual data.
Build AI foundation models through all phases of development, from design through training, evaluation, validation, and implementation.
Engage in high impact applied research to take the latest AI developments and push them into the next generation of customer experiences.
Flex your interpersonal skills to translate the complexity of your work into tangible business goals.
The Ideal Candidate:
You love the process of analyzing and creating, but also share our passion to do the right thing. You know at the end of the day it’s about making the right decision for our customers.
Innovative. You continually research and evaluate emerging technologies. You stay current on published state-of-the-art methods, technologies, and applications and seek out opportunities to apply them.
Creative. You thrive on bringing definition to big, undefined problems. You love asking questions and pushing hard to find answers. You’re not afraid to share a new idea.
A leader. You challenge conventional thinking and work with stakeholders to identify and improve the status quo. You’re passionate about talent development for your own team and beyond.
Technical. You’re comfortable with open-source languages and are passionate about developing further. You have hands-on experience developing AI foundation models and solutions using open-source tools and cloud computing platforms.
Has a deep understanding of the foundations of AI methodologies.
Experience building large deep learning models, whether on language, images, events, or graphs, as well as expertise in one or more of the following: training optimization, self-supervised learning, robustness, explainability, RLHF.
An engineering mindset as shown by a track record of delivering models at scale both in terms of training data and inference volumes.
Experience in delivering libraries, platform level code or solution level code to existing products.
A professional with a track record of coming up with new ideas or improving upon existing ideas in machine learning, demonstrated by accomplishments such as first author publications or projects.
Possess the ability to own and pursue a research agenda, including choosing impactful research problems and autonomously carrying out long-running projects.
Location: San Francisco · On-site
ABOUT THE COMPANY
We're building autonomous research agents for recursive self-improvement (multi-agent systems that propose, run, and analyze machine learning experiments). We're a small team based in San Francisco, working on-site.
ABOUT THE ROLE
You build and operate the inference systems that serve our models in production. The work spans serving infrastructure, runtime optimization, and the long tail of production infrastructure that comes with running real workloads.
This is an engineering role, not a research role. You'll measure, profile, debug, and ship. You'll work alongside researchers, but your job is to make their work fast and reliable in production. Real ownership, real autonomy.
WHAT YOU'LL DO
- Build, operate, and harden production inference systems serving large models at high throughput
- Own the performance characteristics of those systems end-to-end: throughput, latency, cost-per-token, reliability under load
- Profile real workloads to identify bottlenecks; ship fixes that move the metric you set out to improve
- Implement and integrate inference optimizations from the research team (quantization, custom kernels, scheduling improvements, memory management) into production
- Design observability into the inference layer: metrics, tracing, alerting that surface regressions before users notice them
- Run capacity planning, autoscaling, and load testing for varied workload shapes (batch, online, mixed, agentic)
- Diagnose and resolve production incidents; write postmortems that turn bugs into systemic fixes
WHAT WE'RE LOOKING FOR
- Senior ML systems engineer with 3+ years building production-grade, large-scale serving infrastructure
- Strong distributed systems experience; you've been on-call for systems that matter
- Performance profiling and optimization fluency: you read flame graphs, and you analyze and measure before you change anything
- Experience with GPU-accelerated inference at scale (multi-GPU, multi-node, batched and streaming workloads), preferably including AMD GPUs
- Fluent Python; comfortable reading and writing systems-level code in at least one of the following: C++, CUDA, ROCm, or Triton
- Track record of shipping production infrastructure, preferably surfaces serving millions of requests across diverse workloads
- Good written communication; you can write a runbook that someone else can follow at 3am
NICE TO HAVE
- Open-source contributions to inference / serving frameworks
- Experience with mixed cloud and on-premises deployments
- Familiarity with hardware-aware optimization (memory hierarchy, NCCL/RDMA, NUMA)
- Background in compilers, runtimes, or accelerator software stacks
THIS ROLE IS PROBABLY NOT FOR YOU IF
- You're primarily a researcher; the work here is building, not exploring
- You want to focus narrowly on one component; this role spans the stack
- Production responsibility (incidents, on-call, ownership of running systems) isn't appealing
P.S. We’re also hosting a small private dinner during MLSys for people interested in agents, recursive self-improvement, and AI infrastructure. Apply to join us here: https://luma.com/u6yt1gri
Team Description:
The AI Foundations team is at the center of bringing our vision for AI at Capital One to life. Our work touches every aspect of the research life cycle, from partnering with Academia to building production systems. We work with product, technology and business leaders to apply the state of the art in AI to our business.
This is an individual contributor (IC) role driving strategic direction through collaboration with Applied Science, Engineering and Product leaders across Capital One. As a well-respected IC leader, you will guide and mentor a team of applied scientists and their managers without being a direct people leader. You will be expected to be an external leader representing Capital One in the research community, collaborating with prominent faculty members in the relevant AI research community.
In this role, you will:
Partner with a cross-functional team of data scientists, software engineers, machine learning engineers and product managers to deliver AI-powered products that change how customers interact with their money.
Leverage a broad stack of technologies (PyTorch, AWS Ultraclusters, Hugging Face, Lightning, VectorDBs, and more) to reveal the insights hidden within huge volumes of numeric and textual data.
Build AI foundation models through all phases of development, from design through training, evaluation, validation, and implementation.
Engage in high impact applied research to take the latest AI developments and push them into the next generation of customer experiences.
Flex your interpersonal skills to translate the complexity of your work into tangible business goals.
The Ideal Candidate:
You love the process of analyzing and creating, but also share our passion to do the right thing. You know at the end of the day it’s about making the right decision for our customers.
Innovative. You continually research and evaluate emerging technologies. You stay current on published state-of-the-art methods, technologies, and applications and seek out opportunities to apply them.
Creative. You thrive on bringing definition to big, undefined problems. You love asking questions and pushing hard to find answers. You’re not afraid to share a new idea.
A leader. You challenge conventional thinking and work with stakeholders to identify and improve the status quo. You’re passionate about talent development for your own team and beyond.
Technical. You’re comfortable with open-source languages and are passionate about developing further. You have hands-on experience developing AI foundation models and solutions using open-source tools and cloud computing platforms.
Has a deep understanding of the foundations of AI methodologies.
Experience building large deep learning models, whether on language, images, events, or graphs, as well as expertise in one or more of the following: training optimization, self-supervised learning, robustness, explainability, RLHF.
An engineering mindset as shown by a track record of delivering models at scale both in terms of training data and inference volumes.
Experience in delivering libraries, platform level code or solution level code to existing products.
A professional with a track record of coming up with new ideas or improving upon existing ideas in machine learning, demonstrated by accomplishments such as first author publications or projects.
Possess the ability to own and pursue a research agenda, including choosing impactful research problems and autonomously carrying out long-running projects.
About the job
Google Cloud’s mission is to make every business successful through AI by combining cutting-edge technology, infrastructure, and talent. AI/ML software engineers in Cloud bridge the gap between pioneering models and a massive product vehicle reaching billions. Our talent density and AI-powered tools drive rapid development, rooted in a culture of empowerment and a bias to action. In this role, you aren’t just building technology; you’re shaping the frontier of enterprise and driving the evolution of advanced models.
We build the industry's best data agents to help customers make more, better, and faster data-driven decisions—achieved by enriching the customer knowledge layer, automating data preparation, providing tailored agent harnesses, and leveraging the advanced capabilities of BigQuery and its ecosystem.
The AI and Infrastructure team is redefining what’s possible. We empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity. Our customers include Googlers, Google Cloud customers, and billions of Google users worldwide.
We're the driving channel behind Google's groundbreaking innovations, empowering the development of our cutting-edge AI models, delivering unparalleled computing power to global services, and providing the essential platforms that enable developers to build the future. From software to hardware, our teams are shaping the future of world-leading hyperscale computing, with key teams working on the development of our TPUs, Vertex AI for Google Cloud, Google Global Networking, Data Center operations, systems research, and much more.
The US base salary range for this full-time position is $207,000-$300,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.
Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.
Responsibilities
- Lead the technical strategy and architectural design of the core reasoning engine that translates natural language into reliable SQL insights, ensuring the platform scales to support complex enterprise data exploration.
- Drive cross-functional collaboration with AI/ML, UX, and Product teams to define the "agentic" future of BigQuery, bridging the gap between raw data and business-ready answers.
- Establish and maintain engineering excellence by setting the bar for performance, reliability, and observability of production-critical agent services across the BigQuery ecosystem.
- Mentor and influence a broad group of engineers, identifying and refining ambiguous, high-impact problems into tractable projects that advance our data-centric AI capabilities.