

MLSys 2026 Career Opportunities

Here we highlight career opportunities submitted by our Exhibitors, and other top industry, academic, and non-profit leaders. We would like to thank each of our exhibitors for supporting MLSys 2026.


About the job

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.

In this role, you will be advancing fundamental capabilities of AI to drive significant benefits to humanity. You will pioneer AI research in Singapore, focused on delivering the most performant, efficient and capable generative AI models.

Google Research is building the next generation of intelligent systems for all Google products. To achieve this, we’re working on projects that utilize the latest computer science techniques developed by skilled software developers and research scientists. Google Research teams collaborate closely with other teams across Google, maintaining the flexibility and versatility required to adapt to new projects and foci that meet the demands of the world's fast-paced business needs.

Responsibilities

Abstract out key problems, design elegant and deep solutions for these problems through theoretical or empirical insights. Prototype, profile and benchmark solutions to showcase effectiveness. Lead and collaborate with research teams located across the globe. Drive and grow collaborations with product teams to land product innovations. Collaborate with hardware architects/infrastructure teams to inform design and algorithm decisions.

Inception creates the world’s fastest, most efficient AI models. Our Mercury model is the world’s fastest reasoning LLM and first commercially available diffusion LLM, delivering 5x greater speed and efficiency than today’s LLMs, with best-in-class quality.

We are the AI researchers and engineers behind such breakthrough AI technologies as diffusion models, flash attention, and DPO.

The Role

We're looking for engineers and scientists to design, optimize, and maintain the core systems that enable scalable, efficient reinforcement learning for large models. This role sits at the intersection of research and large-scale systems engineering: you'll wear many hats, from optimizing rollout and reward pipelines to enhancing reliability, observability, and orchestration, collaborating closely with researchers to make RL stable, fast, and production-ready.

Key Responsibilities

  • Design, build, and optimize the infrastructure that powers large-scale reinforcement learning and post-training workloads.
  • Improve the reliability and scalability of RL training pipelines, distributed RL workloads, and training throughput.
  • Develop shared monitoring and observability tools to ensure high uptime, debuggability, and reproducibility for RL systems.

Qualifications

  • BS/MS/PhD in Computer Science, Engineering, or a related field (or equivalent experience).
  • Understanding of ML frameworks (PyTorch, TensorFlow, Ray, Megatron) from a systems perspective.
  • Experience working with reinforcement learning workloads (PPO, DPO, RLHF, or reward modeling).
  • Experience with containerization (Docker), orchestration (Kubernetes), and CI/CD pipelines.

Preferred Skills

  • Experience building and maintaining large-scale language models with tens of billions of parameters or more.
  • Experience with ML workflow orchestration tools (Kubeflow, Airflow).
  • Background in performance optimization and profiling of ML systems.

Why Join Inception

  • Work with World-Class Talent: Collaborate with the inventors of diffusion models and leading AI researchers
  • Shape Foundational Technology: Your decisions will influence how the next generation of AI products are built and used
  • Immediate Impact: Join at the ground floor where your contributions directly shape product direction and company trajectory

Perks & Benefits

  • Competitive salary and equity in a rapidly growing startup
  • Flexible vacation and paid time off (PTO)
  • Health, dental, and vision insurance
  • Catered meals (breakfast, lunch, & dinner)
  • Commuter subsidies
  • A collaborative and inclusive culture

Location: San Francisco · On-site


ABOUT THE COMPANY

We're building autonomous research agents for recursive self-improvement (multi-agent systems that propose, run, and analyze machine learning experiments). We're a small team based in San Francisco, on-site.

ABOUT THE ROLE

You build and operate the inference systems that serve our models in production. The work spans serving infrastructure, runtime optimization, and the long tail of production infrastructure that comes with running real workloads.

This is an engineering role, not a research role. You'll measure, profile, debug, and ship. You'll work alongside researchers, but your job is to make their work fast and reliable in production. Real ownership, real autonomy.

WHAT YOU'LL DO

  • Build, operate, and harden production inference systems serving large models at high throughput
  • Own the performance characteristics of those systems end-to-end: throughput, latency, cost-per-token, reliability under load
  • Profile real workloads to identify bottlenecks; ship fixes that move the metric you set out to improve
  • Implement and integrate inference optimizations from the research team (quantization, custom kernels, scheduling improvements, memory management) into production
  • Design observability into the inference layer: metrics, tracing, alerting that surface regressions before users notice them
  • Run capacity planning, autoscaling, and load testing for varied workload shapes (batch, online, mixed, agentic)
  • Diagnose and resolve production incidents; write postmortems that turn bugs into systemic fixes

WHAT WE'RE LOOKING FOR

  • Senior ML systems engineer with 3+ years building production-grade, large-scale serving infrastructure
  • Strong distributed systems experience; you've been on-call for systems that matter
  • Performance profiling and optimization fluency: you read flame graphs, and you are analytical and measured before you change anything
  • Experience with GPU-accelerated inference at scale (multi-GPU, multi-node, batched and streaming workloads), preferably experience with AMD GPUs
  • Fluent Python; comfortable reading and writing systems-level code in at least one of the following: C++, CUDA, ROCm, or Triton
  • Track record of shipping production infrastructure, preferably surfaces serving millions of requests across diverse workloads
  • Good written communication; you can write a runbook that someone else can follow at 3am

NICE TO HAVE

  • Open-source contributions to inference / serving frameworks
  • Experience with mixed cloud and on-premises deployments
  • Familiarity with hardware-aware optimization (memory hierarchy, NCCL/RDMA, NUMA)
  • Background in compilers, runtimes, or accelerator software stacks

THIS ROLE IS PROBABLY NOT FOR YOU IF

  • You're primarily a researcher; the work here is building, not exploring
  • You want to focus narrowly on one component; this role spans the stack
  • Production responsibility (incidents, on-call, ownership of running systems) isn't appealing

P.S. We’re also hosting a small private dinner during MLSys for people interested in agents, recursive self-improvement, and AI infrastructure. Apply to join us here: https://luma.com/u6yt1gri

About the job

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.

At YouTube, we believe that everyone deserves to have a voice, and that the world is a better place when we listen, share, and build community through our stories. We work together to give everyone the power to share their story, explore what they love, and connect with one another in the process. Working at the intersection of cutting-edge technology and boundless creativity, we move at the speed of culture with a shared goal to show people the world. We explore new ideas, solve real problems, and have fun, and we do it all together.

The US base salary range for this full-time position is $174,000-$252,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities

Write and test product or system development code. Collaborate with peers and stakeholders through design and code reviews to ensure best practices amongst available technologies (e.g., style guidelines, checking code in, accuracy, testability, and efficiency). Contribute to existing documentation or educational content and adapt content based on product/program updates and user feedback. Triage product or system issues and debug/track/resolve by analyzing the sources of issues and the impact on hardware, network, or service operations and quality. Design and implement solutions in one or more specialized ML areas, leverage ML infrastructure, and demonstrate expertise in a chosen field.

Location: San Francisco · On-site


ABOUT THE COMPANY

We're building autonomous research agents for recursive self-improvement (multi-agent systems that propose, run, and analyze machine learning experiments). We're a small team based in San Francisco, on-site.

ABOUT THE ROLE

You'll be researching the agents at the core of our work: multi-agent systems that conduct automated machine learning research and discovery. You'll design how these agents plan, decompose problems, choose what to try next, evaluate their own outputs, and recover from mistakes.

This is a deeply open-ended research role. The benchmarks for agents that do real research don't exist yet, and inventing them is part of the job. You'll move between method design, careful experimentation, building evaluation frameworks, and shipping into production. Real autonomy, real ownership, and the corresponding responsibility for choosing well.

WHAT YOU'LL DO

  • Design methods that improve how our agents plan, decompose tasks, use tools, manage context, and recover from failures across long-horizon research workflows
  • Develop multi-agent coordination patterns: how multiple agents share context, divide labor, supervise each other, and combine their outputs
  • Build and maintain evaluation frameworks for agent capability on open-ended tasks (the kind where the right answer isn't pre-specified)
  • Run rigorous experiments to characterize what works, what doesn't, and why: controls, ablations, statistical significance
  • Co-design agent architectures with engineering teammates; ship the most promising methods into production
  • Read deeply across the agentic ML, planning, RL, and tool-use literature; bring useful work from outside in
  • Share findings internally so the rest of the team builds on them
  • Help shape research direction across the team: agentic research taste compounds when discussed openly

WHAT WE'RE LOOKING FOR

  • Strong track record of ML research with focus on agents, RL, LLMs, planning, tool use, or multi-agent systems
  • 5+ years of hands-on research experience in industry or academia
  • Comfort designing experiments and running them end-to-end at scale
  • Track record of building evaluation frameworks for capabilities that aren't easily benchmarked
  • Bias toward shipping research, not handing it off
  • Strong written communication: you can compress a result into a paragraph that changes what someone else does next
  • Comfort with ambiguity: open-ended problems without fixed benchmarks are the work, not a frustration
  • Published research at NeurIPS, ICML, ICLR, COLM, RLC, or comparable venues

NICE TO HAVE

  • PhD in ML, statistics, CS, or an adjacent field
  • Published research on agentic systems, tool use, long-horizon planning, multi-agent coordination, or self-improvement methods
  • Open-source contributions in the agentic ML ecosystem (coding agents, research assistants, autonomous workflows)
  • Experience with reasoning models, chain-of-thought / scratchpad methods, or supervised fine-tuning for agentic behaviors
  • Background in evaluation methodology for capabilities that don't have established benchmarks

THIS ROLE IS PROBABLY NOT FOR YOU IF

  • You want to focus on a single stable benchmark: our agents work on open-ended problems and the targets shift
  • You prefer to keep research paper-only; these agents need to actually work
  • You'd rather work alone than share research taste openly with a small team

P.S. We’re also hosting a small private dinner during MLSys for people interested in agents, recursive self-improvement, and AI infrastructure. Apply to join us here: https://luma.com/u6yt1gri

We’re looking for an exceptional Performance Engineer to join our growing technology organization. Interviewing at PDT is intentionally focused on finding great people who can build long-term, impactful careers with us.

Performance Engineers at PDT are responsible for deeply understanding and optimizing the systems that enable our trading strategies at scale. You will work at the intersection of software, systems, and hardware to analyze performance, drive infrastructure efficiency, and free up critical compute capacity. Your work directly amplifies researcher velocity and scales our core models, creating massive impact through both cost savings and accelerated innovation. You'll thrive at PDT if you love open-ended problems, enjoy diving into GPU optimization and system optimization and design, and are excited to take your discoveries all the way to production at scale.

This is a hybrid position that requires working from our New York City office at least 3 days a week.

Why join us

PDT Partners has a stellar 30+ year track record and a reputation for excellence. Our goal is to be the best quantitative investment manager in the world, measured by the quality of our products, not their size. PDT’s very high employee-retention rate speaks for itself. Our people are intellectually extraordinary, and our community is close-knit, down-to-earth, and diverse.

Key Responsibilities

Analyze and understand system performance to enhance researcher throughput and velocity.

Focus on infrastructure/system-level efficiency, working across Python, PyTorch, OS, networking, storage, and CPU/GPU layers to optimize compute resource utilization.

Read and understand software layers, providing suggestions/PRs that optimize parts of codebases.

Free up capacity and reduce costs by improving computational efficiency.

Support scaling of core models by ensuring efficient implementation.

Propose and implement systems to improve performance telemetry.

Conduct proof-of-concept (PoC) evaluations and contribute to system design.

Identify and act on optimization opportunities across the stack.

Below is a list of skills and experiences we think are relevant. Even if you don’t think you’re a perfect match, we still encourage you to apply because we are committed to developing our people.

Strong proficiency in Linux and its associated performance engineering toolset.

Experience with PyTorch, GPUs and CUDA for optimization.

Deep understanding and appreciation of what happens at the hardware-software interface.

Versatile engineering mindset: ability to learn quickly, tackle diverse challenges, and adapt.

Skills in coding, micro-optimization, and understanding multiple programming languages.

Ability to analyze performance without being solely focused on heads-down optimization.

The salary range for this role is between $195,000 and $225,000. This range is not inclusive of any potential bonus amounts. Factors that may impact the agreed upon salary within the range for a particular candidate include years of experience, level of education obtained, skill set, and other external factors.

PRIVACY STATEMENT: For information on ways PDT may collect, use, and process your personal information, please see PDT’s privacy notices.

Inception creates the world’s fastest, most efficient AI models. Our Mercury model is the world’s fastest reasoning LLM and first commercially available diffusion LLM, delivering 5x greater speed and efficiency than today’s LLMs, with best-in-class quality.

We are the AI researchers and engineers behind such breakthrough AI technologies as diffusion models, flash attention, and DPO.

The Role

We're looking for engineers and scientists to design, optimize, and scale the systems that power our diffusion LLMs in production. Your work will make inference faster, more cost-effective, and more reliable.

Key Responsibilities

  • Build and optimize high-performance model serving systems for low-latency inference of diffusion LLMs.
  • Extend orchestration frameworks (Kubernetes, Ray, SLURM) for distributed inference, evaluation, and large-batch serving.
  • Implement and manage load balancing, autoscaling, and traffic routing for model endpoints.
  • Build systems for model versioning, canary deployments, and zero-downtime rollouts.
  • Develop monitoring, alerting, and observability tooling to ensure SLA compliance and rapid incident response.
  • Collaborate with ML researchers to translate model advances (new architectures, quantization techniques, batching strategies) into production-ready serving improvements.

Qualifications

  • BS/MS/PhD in Computer Science, Engineering, or a related field (or equivalent experience).
  • Knowledge of ML serving frameworks (SGLang, vLLM, Triton Inference Server, TensorRT-LLM).
  • Understanding of ML frameworks (PyTorch, TensorFlow) from a systems perspective.
  • Familiarity with high-performance computing and GPU programming (CUDA).
  • Experience with containerization (Docker), orchestration (Kubernetes), and CI/CD pipelines.
  • Background in performance optimization and profiling of ML systems.

Preferred Skills

  • Experience building and maintaining large-scale language models with tens of billions of parameters or more.
  • Experience with distributed systems and cloud computing platforms (AWS/GCP/Azure).
  • Experience with ML workflow orchestration tools (Kubeflow, Airflow).
  • Experience with model optimization techniques (quantization, distillation, speculative decoding, continuous batching).
  • Knowledge of ML-specific infrastructure challenges (checkpointing, resource scheduling, etc.).

Why Join Inception

  • Work with World-Class Talent: Collaborate with the inventors of diffusion models and leading AI researchers
  • Shape Foundational Technology: Your decisions will influence how the next generation of AI products are built and used
  • Immediate Impact: Join at the ground floor where your contributions directly shape product direction and company trajectory

Perks & Benefits

  • Competitive salary and equity in a rapidly growing startup
  • Flexible vacation and paid time off (PTO)
  • Health, dental, and vision insurance
  • Catered meals (breakfast, lunch, & dinner)
  • Commuter subsidies
  • A collaborative and inclusive culture