MLSys 2026 Career Opportunities
Here we highlight career opportunities submitted by our Exhibitors, and other top industry, academic, and non-profit leaders. We would like to thank each of our exhibitors for supporting MLSys 2026.
The Intelligent Foundations and Experiences (IFX) team is at the center of bringing our vision for AI at Capital One to life. We work hand-in-hand with our partners across the company to advance the state of the art in science and AI engineering, and we build and deploy proprietary solutions that are central to our business and deliver value to millions of customers. Our AI models and platforms empower teams across Capital One to enhance their products with the transformative power of AI, in responsible and scalable ways for the highest leverage impact.
In this role, you will:
Partner with a cross-functional team of engineers, research scientists, technical program managers, and product managers to deliver AI-powered products that change how our associates work and how our customers interact with Capital One.
Design, develop, test, deploy, and support AI software components, including foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, and observability.
Leverage a broad stack of open-source and SaaS AI technologies such as AWS UltraClusters, Hugging Face, vector databases, NeMo Guardrails, PyTorch, and more.
Invent and introduce state-of-the-art LLM optimization techniques to improve the performance — scalability, cost, latency, throughput — of large scale production AI systems.
Contribute to the technical vision and the long term roadmap of foundational AI systems at Capital One.
The Ideal Candidate:
You love to build systems, take pride in the quality of your work, and also share our passion to do the right thing. You want to work on problems that will help change banking for good.
You have a passion for staying abreast of the latest research, and an ability to intuitively understand scientific publications and judiciously apply novel techniques in production.
You adapt quickly and thrive on bringing clarity to big, undefined problems. You love asking questions and digging deep to uncover the root of problems, and you can articulate your findings concisely and clearly. You have the courage to share new ideas even when they are unproven.
You are deeply technical. You possess a strong foundation in engineering and mathematics, and your expertise in hardware, software, and AI enables you to see and exploit optimization opportunities that others miss.
You are a resilient trail blazer who can forge new paths to achieve business goals when the route is unknown.
About Unconventional
Since 2022, AI has entered the mainstream, reshaping entire industries from education and software development to fundamental consumer behaviors. This revolution has created an unprecedented demand for computation - a demand that is now fundamentally limited by energy, not just in the datacenter, but at a global scale.
At Unconventional, our mission is to solve this. We are rethinking computing from the ground up to build a new foundation for AI that is 1000x more efficient. We're doing this by exploiting the rich physics of semiconductors, mapping neural networks directly to the device physics rather than relying on layers of inefficient abstraction.
The Role
As a Member of Technical Staff, Language & Reasoning Models, you will drive the development of foundational language and reasoning models that fundamentally leverage the dynamics of our novel silicon. Your goal is to map the behaviors of modern language models directly onto the physics of our hardware.
You will sit at the intersection of NLP/reasoning research and hardware codesign, proving that high-fidelity, large-scale language understanding and generation can be achieved natively on an unconventional computing substrate.
What You'll Do
- Model Development: Design, train, and scale next-generation language and reasoning architectures (such as transformers, state space models, diffusion/flow models, and deep equilibrium models) specifically tailored for unconventional compute.
- Physics-Informed Architecture: Rethink standard sequence modeling to exploit the continuous-time dynamics of silicon, moving away from layers of inefficient digital abstraction.
- Evaluation & Scaling: Establish the training recipes, loss functions, and evaluation metrics needed to reach the frontier of language comprehension, logical reasoning, and generation speed while maintaining the massive energy efficiency of our platform.
- Extreme Codesign: Collaborate with hardware designers, theorists, and system builders to co-design the model architecture alongside the underlying physical compute primitives.
Minimum Qualifications
- Education: An MS/PhD or equivalent research/project experience in a quantitative field such as AI/Machine Learning, Computer Science, Physics, Electrical Engineering, or Applied Math.
- Experience: Deep, hands-on expertise in the theory, architecture, and training of modern foundation models (transformers, SSMs, text diffusion/flow, etc.).
- Systems Fluency: Hands-on, battle-tested experience dealing with model scaling. You have successfully designed and executed full-scale, distributed training runs for large language or reasoning models, managing the complexities of massive compute clusters.
- Software Development: You are fluent in modern deep learning frameworks (PyTorch or JAX) and have a proven track record of writing clean, scalable training code for large language models.
Preferred Qualifications (Nice to Have)
- Unconventional Experience: As a bonus, you may have experience working with hardware-in-the-loop training, mixed-signal hardware, quantization, or physics-informed neural networks.
Why Join Us?
- The Mission: Redefine computing for the next 50 years by solving the fundamental energy limitation of AI at a global scale.
- The Impact: Shape the company's future as a foundational team member. Enjoy massive ownership and an outsized opportunity to drive change.
- The Perks: A comprehensive package including best-in-class health benefits, 401k matching, truly unlimited PTO, and complimentary meals in our Palo Alto office.
About the Role
We are seeking a Member of Technical Staff, ML Kernels to design, optimize, and benchmark high-performance compute kernels for modern ML workloads. This role is for a deeply technical engineer who enjoys working close to hardware — writing CUDA kernels, investigating performance artifacts, building benchmarks, and serving as a go-to expert on accelerator behavior.
You will partner closely with research, systems, and infrastructure teams to unlock efficiency gains across GPUs today and other accelerators (e.g., TPU, Trainium) as we expand our hardware partnerships.
This role will be performed onsite in Santa Clara, CA or Boston, MA.
Essential Duties & Responsibilities
- Design, implement, and optimize high-performance ML kernels targeting GPUs (CUDA), with an emphasis on throughput, latency, and memory efficiency.
- Profile, benchmark, and analyze performance across hardware configurations, identifying bottlenecks.
- Debug low-level performance issues involving memory hierarchy, scheduling, synchronization, and numerical formats.
- Build and maintain benchmarking tools to compare performance across GPUs and other accelerators.
- Advise internal teams on GPU and accelerator performance characteristics, tradeoffs, and best practices.
- Explore and prototype support for alternative accelerator platforms (e.g., TPU, Trainium) as needs evolve.
- Collaborate with ML researchers and systems engineers to translate algorithmic needs into efficient kernel implementations.
Qualifications
- Strong experience writing and optimizing CUDA kernels or equivalent low-level accelerator code.
- Deep understanding of GPU architecture, including memory systems, parallel execution, and performance tradeoffs.
- Experience with profiling and benchmarking tools (e.g., Nsight Systems/Compute, nvprof).
- Proficiency in C++ and low-level performance-oriented programming.
- Ability to independently investigate ambiguous performance issues and drive them to resolution.
Preferred Qualifications
- Experience with ML framework internals (e.g., PyTorch, TensorFlow, XLA) and custom operator development.
- Prior work with non-GPU accelerators such as TPU, Trainium, or IPU.
- Familiarity with mixed-precision compute (e.g., FP16, BF16, FP8).
- Contributions to open-source performance, systems, or ML infrastructure projects.
Compensation & Benefits
- Competitive base salary, performance-based bonus, and early stage equity grant
- Comprehensive health, dental, vision, and life insurance
- Relocation assistance and visa sponsorship
- Daily lunch stipend, 401k match, and more
- Sunny offices in Santa Clara, CA and Boston, MA
The Opportunity
- Impact: We are tackling a fundamental challenge at the infrastructure layer: unlocking greater AI capability while dramatically improving efficiency. The work we do here compounds across state-of-the-art AI models, systems, and real-world applications.
- Timing: Joining now means real ownership of the company and meaningful influence over product direction and execution. You'll work from first principles, move quickly from insight to execution, and see your contributions directly reflected in what we build.
- Culture: You'll work alongside a group of people who care deeply about rigor, clarity, and impact. We value thoughtful disagreement, fast learning, and intellectual fearlessness. This is a place where strong ideas shine, curiosity is encouraged, and growth is a daily practice.
Location: San Francisco · On-site
ABOUT THE COMPANY
We're building autonomous research agents for recursive self-improvement (multi-agent systems that propose, run, and analyze machine learning experiments). We're a small team based in San Francisco, working on-site.
ABOUT THE ROLE
As a Researcher on our team, you'll design experiments and develop methods that drive how our autonomous research agents make decisions. You'll work across the full ML research stack (problem formulation, method design, experimentation, analysis, write-up) and you'll do it on problems that don't always have established benchmarks because we're inventing the workloads.
The work is open-ended and concrete at the same time. Open-ended because the research problems are constantly evolving and we don't prescribe approaches or research directions. Concrete because the research questions are motivated by real-world applications, and every experiment ties to something the agents will actually do. You'll have real autonomy (and the corresponding responsibility for choosing well).
WHAT YOU'LL DO
- Identify research questions that, when answered, would meaningfully change what our agents are capable of
- Design and run experiments end-to-end (from problem framing through method design, infrastructure, evaluation, and write-up)
- Develop new methods spanning RL, LLMs, agentic systems, multi-agent coordination, search, evaluation, or wherever the problem leads
- Work closely with engineers to take the most promising methods from research code into production
- Read deeply across the literature; bring useful work from outside in
- Help shape how the team picks problems
WHAT WE'RE LOOKING FOR
- Strong track record of ML research at the frontier: RL, LLMs, agentic ML, multi-agent systems, evaluation, or adjacent
- 5+ years of hands-on research experience in industry or academia
- Comfortable designing experiments and running them at scale, not just proposing them
- Strong written communication: you can summarize your research findings into actionable insights for next steps
- Fluent in PyTorch, JAX, or equivalent; comfortable working with large-scale training infrastructure
- Bias toward shipping research rather than handing it off
- Comfortable with ambiguity: many of our problems don't have a known right answer, and navigating that uncertainty is core to the role.
- Published research at NeurIPS, ICML, ICLR, COLM, RLC, or comparable venues
NICE TO HAVE
- PhD in ML, statistics, computer science, or adjacent
- Open-source contributions to ML research infrastructure
- Experience with agentic systems, tool use, long-horizon planning, or multi-agent coordination
THIS ROLE IS PROBABLY NOT FOR YOU IF
- You want to focus on one specific benchmark and watch the metric tick up (our problems are broader and shift)
- You prefer pure research that never touches a production system
- You'd rather work alone than share research taste openly with a small team
P.S. We’re also hosting a small private dinner during MLSys for people interested in agents, recursive self-improvement, and AI infrastructure. Apply to join us here: https://luma.com/u6yt1gri
About the Role
We're looking for a motivated LLM Systems Engineer willing to explore new and unconventional inference systems based on emerging hardware.
This role is part engineering, part research — you'll prototype algorithms suitable for our inference hardware and guide our hardware team on product definition. The ideal candidate has a proven track record of pursuing ML systems research and is very familiar with industry-standard LLM inference systems.
This role will be performed onsite in Santa Clara, CA or Boston, MA.
Essential Duties & Responsibilities
- Prototype and optimize emerging ML inference systems.
- Develop novel memory models for expandable vRAM.
- Write efficient GPU kernels for data movement.
- Perform design-space exploration, implementation, and benchmarking of inference engines, both in simulation and on real hardware.
Qualifications
- MS or PhD in computer systems, ideally with a focus on LLM inference and/or distributed systems.
- Prior experience contributing to core LLM inference infrastructures (vLLM, SGLang, TensorRT, etc.).
- Prior experience in accelerator programming (e.g. CUDA, JAX/Pallas, ROCm).
- Advanced computer architecture and performance engineering skills are a big plus.
Compensation & Benefits
- Competitive base salary, incentive-based bonus, and early stage equity grant
- Comprehensive health, dental, vision, and life insurance
- Relocation assistance and visa sponsorship
- Daily lunch stipend, 401k match, and more
- Sunny offices in Santa Clara, CA and Boston, MA
The Opportunity
- Impact: We are tackling a fundamental challenge at the infrastructure layer: unlocking greater AI capability while dramatically improving efficiency. The work we do here compounds across state-of-the-art AI models, systems, and real-world applications.
- Timing: Joining now means real ownership of the company and meaningful influence over product direction and execution. You'll work from first principles, move quickly from insight to execution, and see your contributions directly reflected in what we build.
- Culture: You'll work alongside a group of people who care deeply about rigor, clarity, and impact. We value thoughtful disagreement, fast learning, and intellectual fearlessness. This is a place where strong ideas shine, curiosity is encouraged, and growth is a daily practice.
About the job
Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.
Google's engineers develop next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at a massive scale. You'll be at the forefront of innovation, developing systems and AI and Machine Learning solutions.
As a PhD graduate, your research expertise is invaluable to us. Explore a variety of projects, collaborate with various teams, and contribute to products that are changing the world, across many product areas, including AI & Infrastructure, Cloud, YouTube, Search, Ads and more!
Our engineering teams include thousands of PhDs who bring their deep knowledge and research experience to enhance our systems and products. As a Google PhD Software Engineer, you will work on critical projects, with many opportunities to learn and follow your interests. We expect our engineers to be creative and versatile, leading and identifying new problems to push the field and Google technology forward.
Google offers you exciting opportunities as it is one of the world’s leading producers and consumers of ML and AI technology, with decades of experience in designing, deploying, and using ML software and custom ML hardware infrastructure at massive scale.
The US base salary range for this full-time position is $147,000-$211,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.
Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.
Responsibilities
Collaborate or lead on team projects to carry out design, analysis, and development of advanced ML systems across the stack using your research expertise.
Support building end-to-end ML systems, which involves working across the full stack, from low-level hardware acceleration and compiler optimizations to high-level model architecture and production APIs, transforming your research expertise into robust, scalable products.
Optimize complex system performance by analyzing and fixing performance bottlenecks, memory inefficiencies, and errors in production systems to meet stringent customer goals.
Elevate engineering excellence by writing well-tested code, conducting code reviews, and fostering a culture of quality by advocating best engineering practices.