MLSys 2026 Career Opportunities
Here we highlight career opportunities submitted by our Exhibitors, and other top industry, academic, and non-profit leaders. We would like to thank each of our exhibitors for supporting MLSys 2026.
Location: Santa Clara, US or Toronto, Canada
Description: At Lemurian Labs, we’re on a mission to bring the power of AI to everyone, without leaving a massive environmental footprint. We care deeply about the impact AI has on our society and planet, and we’re building a rock-solid foundation for its future, ensuring AI grows sustainably and responsibly. Because let’s face it, what good is innovation if it doesn’t help the world?
We are building a high-performance, portable compiler that lets developers “build once, deploy anywhere.” Yes, anywhere. We’re talking about seamless cross-platform compatibility, so you can train your models in the cloud, deploy them to the edge, and everything in between—all while optimizing for resource efficiency and scalability.
If the idea of sustainably scaling AI motivates you and you’re excited about making AI development both powerful and accessible, then we’d love to have you. Join us at Lemurian Labs, where you can have fun building the future—without leaving a mess behind.
Here is what you will do:
- Design, develop, maintain, and improve our multi-target runtime
- Use the latest techniques in parallelization and partitioning to automate the generation of highly optimized kernels and exploit them
- Rapidly prototype new ideas and explore them in a data-driven way
- Benchmark and analyze the outputs produced by our optimizing compiler on target hardware
- Work closely with our product team to understand the evolving needs of ML engineers and drive improvements in architecture
- Build tools to collect and analyze performance bottlenecks
Essential Skills and Experience:
- BS degree in computer science, computer engineering, electrical engineering, or equivalent practical experience.
- 4+ years of experience working with compilers.
- A deep understanding of asynchronous, concurrent programming.
- 4+ years of experience with C/C++ (C++14 or newer).
- An understanding of hardware architecture (vector vs. scalar registers and instructions, memory hierarchies).
- Knowledge of operating system kernel development or hypervisor development.
Preferred Skills and Experience:
- Master's or PhD degree in computer science, computer engineering, electrical engineering, or equivalent practical experience.
- Experience developing or maintaining libraries like CUDA or ROCm.
- Experience with GPU programming.
- Experience with high performance computing (HPC).
- Knowledge of DL frameworks such as PyTorch, JAX, or Triton.
- Experience with programming large compute clusters.
Salary depends on experience and geographical location.
This salary range may be inclusive of several career levels and will be narrowed during the interview process based on a number of factors, such as the candidate’s experience, knowledge, skills, and abilities, as well as internal equity among our team.
Additional benefits for this role may include: equity; company bonus opportunities; medical, dental, and vision benefits; a retirement savings plan; and supplemental wellness benefits.
Lemurian Labs ensures equal employment opportunity without discrimination or harassment based on race, color, religion, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity or expression, age, disability, national origin, marital or domestic/civil partnership status, genetic information, citizenship status, veteran status, or any other characteristic protected by law.
EOE
Team Description:
The AI Foundations team is at the center of bringing our vision for AI at Capital One to life. Our work touches every aspect of the research life cycle, from partnering with academia to building production systems. We work with product, technology, and business leaders to apply the state of the art in AI to our business.
This is a people manager role that will lead teams to drive strategic direction through collaboration with Applied Science, Engineering and Product leaders across Capital One. As a well-respected people leader, you will guide and mentor a team of applied scientists. You will be expected to be an external leader representing Capital One in the research community, collaborating with prominent faculty members in the relevant AI research community.
In this role, you will:
Partner with a cross-functional team of data scientists, software engineers, machine learning engineers and product managers to deliver AI-powered products that change how customers interact with their money.
Leverage a broad stack of technologies, including PyTorch, AWS Ultraclusters, Hugging Face, Lightning, VectorDBs, and more, to reveal the insights hidden within huge volumes of numeric and textual data.
Build AI foundation models through all phases of development, from design through training, evaluation, validation, and implementation.
Engage in high impact applied research to take the latest AI developments and push them into the next generation of customer experiences.
Flex your interpersonal skills to translate the complexity of your work into tangible business goals.
The Ideal Candidate:
You love the process of analyzing and creating, but also share our passion to do the right thing. You know at the end of the day it’s about making the right decision for our customers.
Innovative. You continually research and evaluate emerging technologies. You stay current on published state-of-the-art methods, technologies, and applications and seek out opportunities to apply them.
Creative. You thrive on bringing definition to big, undefined problems. You love asking questions and pushing hard to find answers. You’re not afraid to share a new idea.
A leader. You challenge conventional thinking and work with stakeholders to identify and improve the status quo. You’re passionate about talent development for your own team and beyond.
Technical. You’re comfortable with open-source languages and are passionate about developing further. You have hands-on experience developing AI foundation models and solutions using open-source tools and cloud computing platforms.
A deep understanding of the foundations of AI methodologies.
Experience building large deep learning models, whether on language, images, events, or graphs, as well as expertise in one or more of the following: training optimization, self-supervised learning, robustness, explainability, RLHF.
An engineering mindset as shown by a track record of delivering models at scale both in terms of training data and inference volumes.
Experience in delivering libraries, platform level code or solution level code to existing products.
A professional with a track record of coming up with new ideas or improving upon existing ideas in machine learning, demonstrated by accomplishments such as first author publications or projects.
The ability to own and pursue a research agenda, including choosing impactful research problems and autonomously carrying out long-running projects.
Key Responsibilities:
Partner with a cross-functional team of scientists, machine learning engineers, software engineers, and product managers to deliver AI-powered platforms and solutions that change how customers interact with their money.
Location: San Francisco · On-site
ABOUT THE COMPANY
We're building autonomous research agents for recursive self-improvement (multi-agent systems that propose, run, and analyze machine learning experiments). We're a small team based in San Francisco, working on-site.
ABOUT THE ROLE
You'll build and maintain the ML systems and pipelines that our research runs on top of: data pipelines, training infrastructure, evaluation tooling, deployment, observability. The work bridges research and production, and you'll be the person who makes "we ran an experiment" actually mean "we ran it correctly, at scale, with results we trust."
This is a senior ML engineering role. You'll own systems end-to-end. You'll work with researchers daily and translate research code into infrastructure that the team can rely on. You'll move fast, and you'll be measured on whether your systems make the team faster.
WHAT YOU'LL DO
- Build and maintain the training, evaluation, and deployment pipelines that our research runs on
- Take research code from prototype to production: refactor, harden, instrument, test
- Design observability into our ML systems (metrics, logs, traces, eval dashboards) so failures surface fast
- Own data pipelines for training and evaluation: ingest, dedup, version, validate
- Work closely with researchers to understand what they need, what's slow, and what's brittle
- Set engineering standards across our ML stack (testing, reviews, runbooks) so the team scales
- Contribute to architectural decisions that shape how research and production interact
WHAT WE'RE LOOKING FOR
- Senior ML engineer with 6+ years building production-grade ML systems
- Track record across the full lifecycle: data, training, evaluation, deployment, monitoring
- Strong distributed systems experience; you've shipped systems that have to be on
- Fluent in Python and in at least one of PyTorch or JAX; comfortable at the systems level when needed
- Comfortable with experimentation infrastructure (Ray, Slurm, Kubernetes, or similar)
- Bias toward shipping; you prefer working code over working diagrams
- Strong written communication
NICE TO HAVE
- Experience building experimentation platforms or research infrastructure at a frontier ML lab
- Background in distributed training systems
- Open-source contributions to ML infrastructure
- History of working effectively with small senior teams
THIS ROLE IS PROBABLY NOT FOR YOU IF
- You want to do research with engineering as a side activity: this is engineering as the main thing
- Cross-functional work with researchers (translation, scoping, education) doesn't appeal
- Long-running ownership of running systems isn't appealing: this role has it
P.S. We’re also hosting a small private dinner during MLSys for people interested in agents, recursive self-improvement, and AI infrastructure. Apply to join us here: https://luma.com/u6yt1gri
About Unconventional
Since 2022, AI has entered the mainstream, reshaping entire industries from education and software development to fundamental consumer behaviors. This revolution has created an unprecedented demand for computation, a demand that is now fundamentally limited by energy, not just in the datacenter but at a global scale.
At Unconventional, our mission is to solve this. We are rethinking computing from the ground up to build a new foundation for AI that is 1000x more efficient. We're doing this by exploiting the rich physics of semiconductors, mapping neural networks directly to the device physics rather than relying on layers of inefficient abstraction.
The Role
As a Member of Technical Staff, you will be a foundational member of our small, multi-disciplinary R&D team. We are looking for 'first principles' thinkers who are excited to tackle the hardest, most ambiguous technical challenges at the intersection of AI, physics, and computer architecture. You will be responsible for driving invention, prototyping, and validation of the core components of our novel computing platform.
Your work will be fluid and could span from theoretical modeling and simulation to algorithm development, hardware/software co-design, or experimental validation in collaboration with other team members. We're hiring exceptional problem-solvers who can navigate deep uncertainty and help chart our technical roadmap.
What We're Looking For
- Exceptional technical ability in a quantitative field (e.g., Physics, Computer Science, Electrical Engineering, Applied Math, or a related discipline).
- An MS/PhD or equivalent research/project experience is strongly preferred.
- A "0-to-1" mindset. You have a demonstrated history of tackling complex, ambiguous R&D problems, often from a blank slate.
- Deep curiosity. You are comfortable diving into new domains, whether it's semiconductor physics, machine learning theory, or systems-level design.
- A creative and unconventional approach to problem-solving.
Core Technical Competencies
- Analytic Foundations: Core competence in the analysis of nonlinear dynamical systems (ODEs, PDEs, SDEs), ideally with experience analyzing the stability, noise robustness, and capacity of such systems. Strong candidates are able to leverage the properties of (or impose constraints on) such systems to develop analytic insights.
- Practical Eye: The ability to leverage analytic insights to build practical tools (metrics, algorithmic optimizations, and automated analyses) that can be used to study dynamical systems. Strong candidates are comfortable navigating the tradeoff space between theoretical purity and practicality to realize useful tooling.
- Programming Proficiency: Strong command of Python and expertise in numeric computing and visualization libraries such as numpy, scipy, and matplotlib. Experience with analysis-oriented libraries, such as computer algebra libraries (e.g., sympy), is also recommended.
- ML/AI Familiarity: Familiarity with dynamics-based ML model architectures, such as diffusion models and energy-based models, and general experience with ML model training flows. Experience with high-level ML frameworks such as PyTorch and JAX.
Bonus Points (Nice to Have)
- Compute model technical staff may focus primarily on the above skill sets, or may be cross-disciplinary, with additional expertise in hardware or AI/ML algorithms.
- Collaboration & Communication
- Cross-Functional Leadership: Excellent ability to translate complex technical concepts for diverse teams. You will act as a translator, discussing algorithmic/model trade-offs with ML/AI teams and eliciting hardware constraints and features from hardware engineering teams.
Why Join Us?
- The Mission: Tackle a fundamental problem that could redefine computing for the next 50 years.
- The Impact: Be a foundational member of a world-class team with an outsized opportunity for ownership and impact.
Team Description:
The Intelligent Foundations and Experiences (IFX) team is at the center of bringing our vision for AI at Capital One to life. We work hand-in-hand with our partners across the company to advance the state of the art in science and AI engineering, and we build and deploy proprietary solutions that are central to our business and deliver value to millions of customers. Our AI models and platforms empower teams across Capital One to enhance their products with the transformative power of AI, in responsible and scalable ways for the highest leverage impact.
In this role, you will:
Partner with a cross-functional team of engineers, research scientists, technical program managers, and product managers to deliver AI-powered products that change how our associates work and how our customers interact with Capital One.
Design, develop, test, deploy, and support AI software components, including foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, and observability.
Leverage a broad stack of open-source and SaaS AI technologies such as AWS Ultraclusters, Hugging Face, VectorDBs, NeMo Guardrails, PyTorch, and more.
Invent and introduce state-of-the-art LLM optimization techniques to improve the performance (scalability, cost, latency, throughput) of large-scale production AI systems.
Contribute to the technical vision and the long term roadmap of foundational AI systems at Capital One.
The Ideal Candidate:
You love to build systems, take pride in the quality of your work, and also share our passion to do the right thing. You want to work on problems that will help change banking for good.
Passion for staying abreast of the latest research, and an ability to intuitively understand scientific publications and judiciously apply novel techniques in production.
You adapt quickly and thrive on bringing clarity to big, undefined problems. You love asking questions and digging deep to uncover the root of problems, and you can articulate your findings concisely and clearly. You have the courage to share new ideas even when they are unproven.
You are deeply technical. You possess a strong foundation in engineering and mathematics, and your expertise in hardware, software, and AI enables you to see and exploit optimization opportunities that others miss.
You are a resilient trailblazer who can forge new paths to achieve business goals when the route is unknown.
Basic Qualifications:
Bachelor's degree in Computer Science, AI, Electrical Engineering, Computer Engineering, or related fields plus at least 10 years of experience developing AI and ML algorithms or technologies, or a Master's degree in Computer Science, AI, Electrical Engineering, Computer Engineering, or related fields plus at least 8 years of experience developing AI and ML algorithms or technologies
At least 10 years of experience programming with Python, Go, Scala, or Java
Preferred Qualifications:
9 years of experience deploying scalable and responsible AI solutions on cloud platforms (e.g. AWS, Google Cloud, Azure, or equivalent private cloud)
Experience architecting, designing, developing, integrating, delivering, and supporting complex enterprise AI systems
Demonstrated ability to lead and mentor an engineering organization and influence cross-functional stakeholders up to the SVP level
Experience developing AI and ML algorithms or technologies (e.g. LLM Inference, Similarity Search and VectorDBs, Guardrails, Memory) using Python, C++, C#, Java, or Golang
Experience developing and applying state-of-the-art techniques for optimizing training and inference software to improve hardware utilization, latency, throughput, and cost
Passion for staying abreast of the latest AI research and AI systems, and for judiciously applying novel techniques in production
Excellent communication and presentation skills
Location: San Francisco · On-site
ABOUT THE COMPANY
We're building autonomous research agents for recursive self-improvement (multi-agent systems that propose, run, and analyze machine learning experiments). We're a small team based in San Francisco, working on-site.
ABOUT THE ROLE
You'll be researching the agents at the core of our work: multi-agent systems that conduct automated machine learning research and discovery. You'll design how these agents plan, decompose problems, choose what to try next, evaluate their own outputs, and recover from mistakes.
This is a deeply open-ended research role. The benchmarks for agents that do real research don't exist yet, and inventing them is part of the job. You'll move between method design, careful experimentation, building evaluation frameworks, and shipping into production. Real autonomy, real ownership, and the corresponding responsibility for choosing well.
WHAT YOU'LL DO
- Design methods that improve how our agents plan, decompose tasks, use tools, manage context, and recover from failures across long-horizon research workflows
- Develop multi-agent coordination patterns: how multiple agents share context, divide labor, supervise each other, and combine their outputs
- Build and maintain evaluation frameworks for agent capability on open-ended tasks (the kind where the right answer isn't pre-specified)
- Run rigorous experiments to characterize what works, what doesn't, and why: controls, ablations, statistical significance
- Co-design agent architectures with engineering teammates; ship the most promising methods into production
- Read deeply across the agentic ML, planning, RL, and tool-use literature; bring useful work from outside in
- Share findings internally so the rest of the team builds on them
- Help shape research direction across the team: agentic research taste compounds when discussed openly
WHAT WE'RE LOOKING FOR
- Strong track record of ML research with focus on agents, RL, LLMs, planning, tool use, or multi-agent systems
- 5+ years of hands-on research experience in industry or academia
- Comfort designing experiments and running them end-to-end at scale
- Track record of building evaluation frameworks for capabilities that aren't easily benchmarked
- Bias toward shipping research, not handing it off
- Strong written communication: you can compress a result into a paragraph that changes what someone else does next
- Comfort with ambiguity: open-ended problems without fixed benchmarks are the work, not a frustration
- Published research at NeurIPS, ICML, ICLR, COLM, RLC, or comparable venues
NICE TO HAVE
- PhD in ML, statistics, CS, or adjacent
- Published research on agentic systems, tool use, long-horizon planning, multi-agent coordination, or self-improvement methods
- Open-source contributions in the agentic ML ecosystem (coding agents, research assistants, autonomous workflows)
- Experience with reasoning models, chain-of-thought / scratchpad methods, or supervised fine-tuning for agentic behaviors
- Background in evaluation methodology for capabilities that don't have established benchmarks
THIS ROLE IS PROBABLY NOT FOR YOU IF
- You want to focus on a single stable benchmark: our agents work on open-ended problems and the targets shift
- You prefer to keep research paper-only; these agents need to actually work
- You'd rather work alone than share research taste openly with a small team
P.S. We’re also hosting a small private dinner during MLSys for people interested in agents, recursive self-improvement, and AI infrastructure. Apply to join us here: https://luma.com/u6yt1gri