MLSys 2026 Career Opportunities
Here we highlight career opportunities submitted by our Exhibitors, and other top industry, academic, and non-profit leaders. We would like to thank each of our exhibitors for supporting MLSys 2026.
About Unconventional
Since 2022, AI has entered the mainstream, reshaping entire industries from education and software development to fundamental consumer behaviors. This revolution has created an unprecedented demand for computation - a demand that is now fundamentally limited by energy, not just in the datacenter, but at a global scale.
At Unconventional, our mission is to solve this. We are rethinking computing from the ground up to build a new foundation for AI that is 1000x more efficient. We're doing this by exploiting the rich physics of semiconductors, mapping neural networks directly to the device physics rather than relying on layers of inefficient abstraction.
The Role
As a Member of Technical Staff, AI Systems, you will develop state-of-the-art architectural components, write their bespoke implementations for our unconventional software framework, and map them efficiently down to the physical silicon. You are critical to preparing our software stack for upcoming tapeouts by acting as the bridge between model architecture and physical compute.
What You'll Do
- AI Architectural Modeling: Co-design and evaluate next-generation AI models (e.g., transformers, diffusion, flow, and energy-based models).
- You will collaborate closely across the team to combine, modify, and implement core modeling components, including both conventional (e.g., attention, normalization, Mixture-of-Experts, FFNs) and unconventional components.
- You will ensure that they function optimally across our novel compute substrates.
- Performance Modeling & Scaling: Establish and test scaling laws specific to our novel hardware. Develop rigorous performance models to evaluate compute vs. memory trade-offs.
- Advanced Mapping & Partitioning: Drive the partitioning and mapping of complex AI models down to hardware. Apply and invent advanced optimization strategies from first principles, including custom quantization schemes, sparsity/pruning, and distillation to fit the physical constraints of our substrates.
- GPU Optimization & Kernel Development: Develop and optimize GPU kernels using low-level programming models like CUDA, Triton, or CUTLASS. Profile and debug complex ML codebases to resolve performance bottlenecks (training and inference).
- Cross-Functional Collaboration: Act as a translator, discussing algorithmic trade-offs with theorists and converting model requirements into concrete specifications for infrastructure and hardware engineering teams.
Minimum Qualifications
- Education: An MS/PhD or equivalent research/project experience in a quantitative field such as AI/Machine Learning, Computer Science, Physics, Electrical Engineering, or Applied Math.
- Experience: Deep, practical understanding of the modern AI/ML stack and optimized compilation and execution of algorithms on modern GPU systems.
- Proven experience in profiling, identifying, and resolving performance bottlenecks in complex ML codebases.
- Systems Fluency: Demonstrated ability to map state-of-the-art AI model architectures (e.g., Transformers, Mixture of Experts, diffusion models) to system performance implications and apply advanced efficiency techniques such as sparsity, quantization, and distillation.
- Software Development: Deep experience with PyTorch, including its internals, torch.compile, and distributed data parallel (DDP) / fully sharded data parallel (FSDP) libraries.
Preferred Qualifications (Nice to Have)
- Unconventional Co-Design: A forward-looking perspective on co-designing algorithms for unconventional computing paradigms that map closely to the physics of underlying systems.
- Next-Gen Efficiency: Theoretical or research experience in advanced approximation/compression techniques beyond standard quantization.
About Modular
At Modular, we’re on a mission to revolutionize AI infrastructure by systematically rebuilding the AI software stack from the ground up. Our team, made up of industry leaders and experts, is building cutting-edge, modular infrastructure that simplifies AI development and deployment. By rethinking the complexities of AI systems, we’re empowering everyone to unlock AI’s full potential and tackle some of the world’s most pressing challenges.
If you’re passionate about shaping the future of AI and creating tools that make a real difference in people’s lives, we want you on our team. You can read about our culture and careers to understand how we work and what we value.
About the role:
ML developers today face significant friction in taking trained models into deployment. They work in a highly fragmented space, with incomplete and patchwork solutions that require significant performance tuning and non-generalizable, model-specific enhancements. At Modular, we are building the Modular platform: a next generation AI platform that will radically improve the way developers build and deploy AI models.
We're continuously working to improve the performance and scalability of the Modular platform through advanced systems programming and distributed architectures. As a GenAI Systems Engineer, you'll architect flexible, robust, and scalable frameworks, supporting advanced inference optimizations like Disaggregated Inference, Speculative Decoding, and Distributed KV Caching.
LOCATION: Candidates based in the US or Canada are welcome to apply. To support growth and collaboration, those in earlier career stages work in a hybrid capacity at our Los Altos, CA office (minimum 2 days per week on-site) with relocation assistance provided for out-of-state candidates. Onboarding for new hires is conducted in-person in our Los Altos, CA office.
What you will do:
- Leverage a broad understanding of available libraries and concurrency techniques to inform high impact architecture decisions
- Identify and implement architecture-level optimizations in complex distributed systems
- Architect and implement building blocks and APIs to accelerate the development of advanced distributed optimizations
- Lead cross-functional projects spanning multiple teams and multiple layers of a deep tech stack
- Build beautiful abstractions to seamlessly weave async RESTful layers with intensive data processing layers
- Collaborate with cloud inference team to maximize flexibility in scalable cluster deployments
- Develop extensible customization interfaces to support open source community models and features
- Develop detailed and intuitive metrics, logging, and profiling tools
What you bring to the table:
- Expert-level Python programming with deep understanding of asyncio and event loops
- 5+ years of systems programming experience with focus on performance and concurrency
- Hands on experience with robust low-latency applications running production workloads
- Extensive experience designing software architecture, interfaces, and collaboration
- Deep understanding of the fundamentals of profiling, benchmarking, and performance optimization
- Creativity and curiosity for learning and solving complex distributed systems problems
Helpful, but not required
- Experience working inside high-performance ML inference systems (e.g. vLLM, SGLang, etc.)
- Experience with Kubernetes, containers, microservices, and cloud-native architectures
- Experience with graph based (e.g. dataflow, actors) programming models and runtimes
- Experience with distributed runtimes such as Ray, Open MPI, Dask, Spark, etc
Location: San Francisco · On-site
ABOUT THE COMPANY
We're building autonomous research agents for recursive self-improvement (multi-agent systems that propose, run, and analyze machine learning experiments). We're a small team based in San Francisco, working on-site.
ABOUT THE ROLE
You build and operate the inference systems that serve our models in production. The work spans serving infrastructure, runtime optimization, and the long tail of production infrastructure that comes with running real workloads.
This is an engineering role, not a research role. You'll measure, profile, debug, and ship. You'll work alongside researchers, but your job is to make their work fast and reliable in production. Real ownership, real autonomy.
WHAT YOU'LL DO
- Build, operate, and harden production inference systems serving large models at high throughput
- Own the performance characteristics of those systems end-to-end: throughput, latency, cost-per-token, reliability under load
- Profile real workloads to identify bottlenecks; ship fixes that move the metric you set out to improve
- Implement and integrate inference optimizations from the research team (quantization, custom kernels, scheduling improvements, memory management) into production
- Design observability into the inference layer: metrics, tracing, alerting that surface regressions before users notice them
- Run capacity planning, autoscaling, and load testing for varied workload shapes (batch, online, mixed, agentic)
- Diagnose and resolve production incidents; write postmortems that turn bugs into systemic fixes
WHAT WE'RE LOOKING FOR
- Senior ML systems engineer with 3+ years building production-grade, large-scale serving infrastructure
- Strong distributed systems experience; you've been on-call for systems that matter
- Performance profiling and optimization fluency: you read flame graphs, and you analyze and measure before you change anything
- Experience with GPU-accelerated inference at scale (multi-GPU, multi-node, batched and streaming workloads), preferably experience with AMD GPUs
- Fluent Python; comfortable reading and writing systems-level code in at least one of the following languages: C++, CUDA, ROCm, or Triton
- Track record of shipping production infrastructure, preferably surfaces serving millions of requests across diverse workloads
- Good written communication; you can write a runbook that someone else can follow at 3am
NICE TO HAVE
- Open-source contributions to inference / serving frameworks
- Experience with mixed cloud and on-premises deployments
- Familiarity with hardware-aware optimization (memory hierarchy, NCCL/RDMA, NUMA)
- Background in compilers, runtimes, or accelerator software stacks
THIS ROLE IS PROBABLY NOT FOR YOU IF
- You're primarily a researcher; the work here is building, not exploring
- You want to focus narrowly on one component; this role spans the stack
- Production responsibility (incidents, on-call, ownership of running systems) isn't appealing
P.S. We’re also hosting a small private dinner during MLSys for people interested in agents, recursive self-improvement, and AI infrastructure. Apply to join us here: https://luma.com/u6yt1gri
Inception creates the world’s fastest, most efficient AI models. Our Mercury model is the world’s fastest reasoning LLM and first commercially available diffusion LLM, delivering 5x greater speed and efficiency than today’s LLMs, with best-in-class quality.
We are the AI researchers and engineers behind such breakthrough AI technologies as diffusion models, flash attention, and DPO.
The Role
We're looking for engineers and scientists to design, optimize, and scale the systems that power our diffusion LLMs in production. Your work will make inference faster, more cost-effective, and more reliable.
Key Responsibilities
- Build and optimize high-performance model serving systems for low-latency inference of diffusion LLMs.
- Extend orchestration frameworks (Kubernetes, Ray, SLURM) for distributed inference, evaluation, and large-batch serving.
- Implement and manage load balancing, autoscaling, and traffic routing for model endpoints.
- Build systems for model versioning, canary deployments, and zero-downtime rollouts.
- Develop monitoring, alerting, and observability tooling to ensure SLA compliance and rapid incident response.
- Collaborate with ML researchers to translate model advances (new architectures, quantization techniques, batching strategies) into production-ready serving improvements.
Qualifications
- BS/MS/PhD in Computer Science, Engineering, or a related field (or equivalent experience).
- Knowledge of ML serving frameworks (SGLang, vLLM, Triton Inference Server, TensorRT-LLM).
- Understanding of ML frameworks (PyTorch, TensorFlow) from a systems perspective.
- Familiarity with high-performance computing and GPU programming (CUDA).
- Experience with containerization (Docker), orchestration (Kubernetes), and CI/CD pipelines.
- Background in performance optimization and profiling of ML systems.
Preferred Skills
- Experience building and maintaining large-scale language models with tens of billions of parameters or more.
- Experience with distributed systems and cloud computing platforms (AWS/GCP/Azure).
- Experience with ML workflow orchestration tools (Kubeflow, Airflow).
- Experience with model optimization techniques (quantization, distillation, speculative decoding, continuous batching).
- Knowledge of ML-specific infrastructure challenges (checkpointing, resource scheduling, etc.).
Why Join Inception
- Work with World-Class Talent: Collaborate with the inventors of diffusion models and leading AI researchers
- Shape Foundational Technology: Your decisions will influence how the next generation of AI products are built and used
- Immediate Impact: Join at the ground floor where your contributions directly shape product direction and company trajectory
Perks & Benefits
- Competitive salary and equity in a rapidly growing startup
- Flexible vacation and paid time off (PTO)
- Health, dental, and vision insurance
- Catered meals (breakfast, lunch, & dinner)
- Commuter subsidies
- A collaborative and inclusive culture
About the Role
We're looking for a motivated LLM Systems Engineer willing to explore new and unconventional inference systems based on emerging hardware.
This role is part engineering, part research — you'll prototype algorithms suitable for our inference hardware and guide our hardware team on product definition. The ideal candidate has a proven track record of pursuing ML systems research and is very familiar with industry-standard LLM inference systems.
This role will be performed onsite in Santa Clara, CA or Boston, MA.
Essential Duties & Responsibilities
- Prototype and optimize emerging ML inference systems.
- Develop novel memory models for expandable vRAM.
- Write efficient GPU kernels for data movement.
- Perform design-space exploration, implementation, and benchmarking of inference engines, both in simulation and on real hardware.
Qualifications
- MS or PhD in computer systems, ideally with a focus on LLM inference and/or distributed systems.
- Prior experience contributing to core LLM inference infrastructures (vLLM, SGLang, TensorRT, etc.).
- Prior experience in accelerator programming (e.g. CUDA, JAX/Pallas, ROCm).
- Advanced computer architecture and performance engineering skills are a big plus.
Compensation & Benefits
- Competitive base salary, incentive-based bonus, and early stage equity grant
- Comprehensive health, dental, vision, and life insurance
- Relocation assistance and visa sponsorship
- Daily lunch stipend, 401k match, and more
- Sunny offices in Santa Clara, CA and Boston, MA
The Opportunity
- Impact: We are tackling a fundamental challenge at the infrastructure layer: unlocking greater AI capability while dramatically improving efficiency. The work we do here compounds across state-of-the-art AI models, systems, and real-world applications.
- Timing: Joining now means real ownership of the company and meaningful influence over product direction and execution. You'll work from first principles, move quickly from insight to execution, and see your contributions directly reflected in what we build.
- Culture: You'll work alongside a group of people who care deeply about rigor, clarity, and impact. We value thoughtful disagreement, fast learning, and intellectual fearlessness. This is a place where strong ideas shine, curiosity is encouraged, and growth is a daily practice.
About Modular
At Modular, we’re on a mission to revolutionize AI infrastructure by systematically rebuilding the AI software stack from the ground up. Our team, made up of industry leaders and experts, is building cutting-edge, modular infrastructure that simplifies AI development and deployment. By rethinking the complexities of AI systems, we’re empowering everyone to unlock AI’s full potential and tackle some of the world’s most pressing challenges.
If you’re passionate about shaping the future of AI and creating tools that make a real difference in people’s lives, we want you on our team. You can read about our culture and careers to understand how we work and what we value.
About the role:
In the Cloud Inference team, we are focused on building end-to-end distributed LLM inference deployments that are fully vertically integrated with the MAX stack. Our goal is to make inference both the fastest and most scalable while also building the easiest platform for deploying and scaling models for enterprises and developers alike. We're seeking engineers who are passionate about pushing the boundaries of distributed inference systems and enjoy working at the intersection of large-scale systems and machine learning. We are looking for candidates based on their breadth and depth of experience in backend engineering, AI inference, and distributed systems development. If this sounds exciting, we invite you to join our world-leading AI infrastructure team and help drive our industry forward!
LOCATION: Candidates based in the US or Canada are welcome to apply. You can work out of our office in Los Altos, CA or remotely from home. Onboarding for new hires is conducted in-person in our Los Altos, CA office.
What you will do:
- Build & ship an LLM-focused inference platform using best-in-class inference techniques (disaggregated inference, multi-node deployment of large models, high performance networking, distributed kv-cache management, high throughput batch processing, etc.)
- Push the envelope for operational excellence with request-to-kernel observability, multi-cloud deployments, clever autoscaling, cold-start optimizations, and more.
- Collaborate with our kernels and genAI teams to achieve SOTA application performance by integrating SOTA kernel & serving optimizations with SOTA cluster optimizations.
- Build Helm charts, Kubernetes operators, and more to create simple, effective, maintainable deployments.
What you bring to the table:
- 5+ years of experience working in backend engineering
- Experience with Kubernetes and operating your own services
- Ability to create durable, reusable software tools and libraries that are leveraged across teams and functions
- Experience in machine learning technologies and use cases
- Creativity and curiosity for solving complex problems, a team-oriented attitude that enables you to work well with others, and alignment with our culture
- Strong identification with our core company cultural values
Helpful but not required:
- Experience with high performance computing / networking
- Experience working on high scale ML inference infrastructure (traditional AI or genAI)
- Familiarity with golang
About the job
Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.
In this role, you will be advancing fundamental capabilities of AI to drive significant benefits to humanity. You will pioneer AI research in Singapore, focused on delivering the most performant, efficient and capable generative AI models.
Google Research is building the next generation of intelligent systems for all Google products. To achieve this, we’re working on projects that utilize the latest computer science techniques developed by skilled software developers and research scientists. Google Research teams collaborate closely with other teams across Google, maintaining the flexibility and versatility required to adapt new projects and foci that meet the demands of the world's fast-paced business needs.
Responsibilities
- Abstract out key problems, design elegant and deep solutions for these problems through theoretical or empirical insights.
- Prototype, profile and benchmark solutions to showcase effectiveness.
- Lead and collaborate with research teams located across the globe.
- Drive and grow collaborations with product teams to land product innovations.
- Collaborate with hardware architects/infrastructure teams to inform design and algorithm decisions.
Team Description:
The Intelligent Foundations and Experiences (IFX) team is at the center of bringing our vision for AI at Capital One to life. We work hand-in-hand with our partners across the company to advance the state of the art in science and AI engineering, and we build and deploy proprietary solutions that are central to our business and deliver value to millions of customers. Our AI models and platforms empower teams across Capital One to enhance their products with the transformative power of AI, in responsible and scalable ways for the highest leverage impact.
In this role, you will:
Partner with a cross-functional team of engineers, research scientists, technical program managers, and product managers to deliver AI-powered products that change how our associates work and how our customers interact with Capital One.
Design, develop, test, deploy, and support AI software components including foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, and observability, etc.
Leverage a broad stack of Open Source and SaaS AI technologies such as AWS Ultraclusters, Hugging Face, VectorDBs, NeMo Guardrails, PyTorch, and more.
Invent and introduce state-of-the-art LLM optimization techniques to improve the performance — scalability, cost, latency, throughput — of large scale production AI systems.
Contribute to the technical vision and the long term roadmap of foundational AI systems at Capital One.
The Ideal Candidate:
You love to build systems, take pride in the quality of your work, and also share our passion to do the right thing. You want to work on problems that will help change banking for good.
Passion for staying abreast of the latest research, and an ability to intuitively understand scientific publications and judiciously apply novel techniques in production.
You adapt quickly and thrive on bringing clarity to big, undefined problems. You love asking questions and digging deep to uncover the root of problems and can articulate your findings concisely with clarity. You have the courage to share new ideas even when they are unproven.
You are deeply technical. You possess a strong foundation in engineering and mathematics, and your expertise in hardware, software, and AI enables you to see and exploit optimization opportunities that others miss.
You are a resilient trailblazer who can forge new paths to achieve business goals when the route is unknown.
Basic Qualifications:
Bachelor's degree in Computer Science, AI, Electrical Engineering, Computer Engineering, or related fields plus at least 10 years of experience developing AI and ML algorithms or technologies, or a Master's degree in Computer Science, AI, Electrical Engineering, Computer Engineering, or related fields plus at least 8 years of experience developing AI and ML algorithms or technologies
At least 10 years of experience programming with Python, Go, Scala, or Java
Preferred Qualifications:
9 years of experience deploying scalable and responsible AI solutions on cloud platforms (e.g. AWS, Google Cloud, Azure, or equivalent private cloud)
Experience architecting, designing, developing, integrating, delivering, and supporting complex enterprise AI systems
Demonstrated ability to lead and mentor an engineering organization and influence cross-functional stakeholders up to the SVP level
Experience developing AI and ML algorithms or technologies (e.g. LLM Inference, Similarity Search and VectorDBs, Guardrails, Memory) using Python, C++, C#, Java, or Golang
Experience developing and applying state-of-the-art techniques for optimizing training and inference software to improve hardware utilization, latency, throughput, and cost
Passion for staying abreast of the latest AI research and AI systems, and for judiciously applying novel techniques in production
Excellent communication and presentation skills
Location: San Francisco · On-site
ABOUT THE COMPANY
We're building autonomous research agents for recursive self-improvement (multi-agent systems that propose, run, and analyze machine learning experiments). We're a small team based in San Francisco, working on-site.
ABOUT THE ROLE
You'll build and maintain the ML systems and pipelines that our research runs on top of: data pipelines, training infrastructure, evaluation tooling, deployment, observability. The work bridges research and production, and you'll be the person who makes "we ran an experiment" actually mean "we ran it correctly, at scale, with results we trust."
This is a senior ML engineering role. You'll own systems end-to-end. You'll work with researchers daily and translate research code into infrastructure that the team can rely on. You'll move fast and you'll be measured on whether your systems make the team faster.
WHAT YOU'LL DO
- Build and maintain the training, evaluation, and deployment pipelines that our research runs on
- Take research code from prototype to production: refactor, harden, instrument, test
- Design observability into our ML systems (metrics, logs, traces, eval dashboards) so failures surface fast
- Own data pipelines for training and evaluation: ingest, dedup, version, validate
- Work closely with researchers to understand what they need, what's slow, and what's brittle
- Set engineering standards across our ML stack (testing, reviews, runbooks) so the team scales
- Contribute to architectural decisions that shape how research and production interact
WHAT WE'RE LOOKING FOR
- Senior ML engineer with 6+ years building production-grade ML systems
- Track record across the full lifecycle: data, training, evaluation, deployment, monitoring
- Strong distributed systems experience; you've shipped systems that have to be on
- Fluent Python, fluent with at least one of (PyTorch, JAX); comfortable at the systems level when needed
- Comfortable with experimentation infrastructure (Ray, Slurm, Kubernetes, or similar)
- Bias toward shipping; you prefer working code over working diagrams
- Strong written communication
NICE TO HAVE
- Experience building experimentation platforms or research infrastructure at a frontier ML lab
- Background in distributed training systems
- Open-source contributions to ML infrastructure
- History of working effectively with small senior teams
THIS ROLE IS PROBABLY NOT FOR YOU IF
- You want to do research with engineering as a side activity: this is engineering as the main thing
- Cross-functional work with researchers (translation, scoping, education) doesn't appeal
- Long-running ownership of running systems isn't appealing: this role has it
P.S. We’re also hosting a small private dinner during MLSys for people interested in agents, recursive self-improvement, and AI infrastructure. Apply to join us here: https://luma.com/u6yt1gri