MLSys 2025 Career Opportunities
Here we highlight career opportunities submitted by our Exhibitors, and other top industry, academic, and non-profit leaders. We would like to thank each of our exhibitors for supporting MLSys 2025.
VLM Run – Founding ML Systems Engineer / Researcher
We're building bleeding-edge infrastructure for Vision Language Models (VLMs). Join us as a founding engineer to reimagine the visual AI infrastructure layer for enterprises.
📍 Location: Santa Clara, CA (3+ days/week)
🧠 Roles: ML Systems Engineer & Applied ML/CV Researcher
💰 Comp: $150K – $220K + 0.5 – 3% equity
📬 Apply: hiring@vlm.run with GitHub + standout work
🧱 What We’re Building
VLM Run is a horizontal platform to fine-tune, serve, and specialize VLMs with structured JSON outputs — for docs, images, video, and beyond.
Think of it as the orchestration layer for next-gen visual agents — built on a developer-friendly API and production-grade runtime.
We're tackling:
- Fast inference: high-throughput, low-latency inference for multimodal ETL (vLLM-style, but for visual content like images, videos, streaming content)
- Fine-tuning infra: scalable fine-tuning and distillation for structured, multi-modal tasks (OCR++, layout parsing, video QA)
- Compiler infra: all kinds of optimizations to make our GPUs go brrr (OpenAI Triton kernels, speculative/guided decoding, etc.); a taste of this kernel work is sketched below
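As a hedged taste of the kernel work named above: a minimal OpenAI Triton kernel implementing a generic fused multiply-add over a flat tensor. This is an illustrative sketch (block size, fused op, and names are ours, not VLM Run's actual kernels) and requires a CUDA GPU to run.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fma_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements              # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x * y + 1.0, mask=mask)  # fused multiply-add

def fma(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)           # one program instance per block
    fma_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```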
We’re early — you’ll define the infrastructure backbone of VLMs in production.
💡 Why This Matters
Most VLMs are stuck in demos — slow, flaky, and hard to deploy.
We're fixing that with:
- Developer-native APIs (not chat-based hacks)
- Structured JSON outputs for automation (illustrated below)
- Fast, predictable inference on non-text modalities
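For a sense of what "structured JSON outputs" means in practice, here is a tiny sketch that validates a model's JSON response against a schema using pydantic v2. The `Invoice` schema and the raw response are hypothetical; VLM Run's actual API is not depicted.

```python
from pydantic import BaseModel

class Invoice(BaseModel):
    # Hypothetical extraction schema; fields are illustrative.
    vendor: str
    total: float
    currency: str

def parse_vlm_response(raw_json: str) -> Invoice:
    # Validate the model's output against the schema instead of trusting free text.
    return Invoice.model_validate_json(raw_json)

print(parse_vlm_response('{"vendor": "Acme", "total": 42.5, "currency": "USD"}'))
```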
You'll work on core ML systems — not glue code — with full ownership over compiler paths, serving infra, and fine-tuning pipelines.
👩‍💻 What You’ll Do
You'll shape the future of how VLMs are trained, served, and used in production. Your work could include:
- Building low-latency runtimes and speculative decoders
- Shipping distillation pipelines that power real-time visual agents
- Designing APIs that make visual data programmable for developers
✅ You Might Be a Fit If:
- You've built or optimized ML compilers, kernels, or serving infra (Triton, vLLM, TVM, XLA, ONNX)
- You have deep PyTorch/HuggingFace experience and have trained ViTs or LLaMA/Qwen-class models
- You have 2+ YOE post-MS or 4+ YOE post-BS on ML infra, CV systems, or compiler teams
- Bonus: you've published OSS or papers, shipped SaaS infra, or scaled training/serving infra
🌎 Logistics
- Compensation: $150K – $220K + 0.5 – 3% equity
- In-Person: 3+ days/week in Santa Clara, CA
- Benefits: Top-tier healthcare, 401(k), early ownership
🔗 Apply Now
📧 hiring@vlm.run
🌐 www.vlm.run
💼 LinkedIn
📎 Send GitHub, standout projects, or a quick note on why this is a fit.
Let’s build the future of visual intelligence — fast, structured, and programmable.
Location: Seattle or Cupertino
Description
AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cutting-edge cloud computing offerings across the AWS portfolio.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago—even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
AWS Neuron is the complete software stack for AWS Inferentia (Inf1/Inf2) and Trainium (Trn1), our cloud-scale machine learning accelerators. This role is for a senior machine learning engineer on the Distributed Training team for AWS Neuron, responsible for the development, enablement, and performance tuning of a wide variety of ML model families, including massive-scale Large Language Models (LLMs) such as GPT and Llama, as well as Stable Diffusion, Vision Transformers (ViT), and many more.
The ML Distributed Training team works side by side with chip architects, compiler engineers, and runtime engineers to create, build, and tune distributed training solutions on Trainium instances. Experience training these large models using Python is a must. FSDP (Fully Sharded Data Parallel), DeepSpeed, and other distributed training libraries are central to this work, and extending them to the Neuron-based system is key.
Key job responsibilities
You will help lead efforts to build distributed training support into PyTorch and TensorFlow using XLA and the Neuron compiler and runtime stacks. You will help tune these models to ensure the highest performance and to maximize their efficiency on the custom AWS Trainium and Inferentia silicon and the Trn1 and Inf1/Inf2 servers. Strong software development and machine learning knowledge are both critical to this role.
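For context on the kind of work involved, here is a minimal sketch of FSDP-style data-parallel training on stock PyTorch with CUDA (launched via, e.g., `torchrun --nproc_per_node=8 train.py`). The model and hyperparameters are placeholders, and the Neuron-specific enablement this role covers is not depicted.

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")                  # one process per GPU
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
    ).cuda()
    model = FSDP(model)                              # shard params, grads, optimizer state
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(10):                              # toy training loop
        x = torch.randn(8, 1024, device="cuda")
        loss = model(x).pow(2).mean()
        loss.backward()
        opt.step()
        opt.zero_grad()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```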
About the team
Annapurna Labs was a startup company acquired by AWS in 2015, and is now fully integrated. If AWS is an infrastructure company, then think of Annapurna Labs as the infrastructure provider of AWS. Our org covers multiple disciplines including silicon engineering, hardware design and verification, software, and operations. AWS Nitro, ENA, EFA, Graviton and F1 EC2 instances, AWS Neuron, the Inferentia and Trainium ML accelerators, and scalable NVMe storage are some of the products we have delivered over the last few years.
Location: Seattle or Cupertino
Description
The Product: AWS Machine Learning accelerators are at the forefront of AWS innovation and one of several AWS tools used for building Generative AI on AWS. The Inferentia chip delivers best-in-class ML inference performance at the lowest cost in the cloud. Trainium will deliver the best-in-class ML training performance with the most teraflops (TFLOPS) of compute power for ML in the cloud. This is all enabled by a cutting-edge software stack, the AWS Neuron Software Development Kit (SDK), which includes an ML compiler and runtime and natively integrates into popular ML frameworks such as PyTorch, TensorFlow, and MXNet. AWS Neuron and Inferentia are used at scale by customers such as Snap, Autodesk, Amazon Alexa, and Amazon Rekognition, among many others in various segments.
The Team: As a whole, the Amazon Annapurna Labs team is responsible for silicon development at AWS. The team covers multiple disciplines including silicon engineering, hardware design and verification, software and operations.
The AWS Neuron team works to optimize the performance of complex neural net models on our custom-built AWS hardware. More specifically, the AWS Neuron team is developing a deep learning compiler stack that takes neural network descriptions created in frameworks such as TensorFlow, PyTorch, and MXNet, and converts them into code suitable for execution. As you might expect, the team comprises some of the brightest minds in the engineering, research, and product communities, focused on the ambitious goal of creating a toolchain that will provide a quantum leap in performance.
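To make that framework-to-compiler flow concrete, here is a generic sketch using JAX's public lowering API to inspect the compiler IR for a tiny model. This illustrates the general XLA-style pipeline, not the Neuron compiler itself; the function and shapes are placeholders.

```python
import jax
import jax.numpy as jnp

def layer(w, x):
    return jax.nn.relu(x @ w)                 # a tiny neural-net building block

w = jnp.ones((4, 4))
x = jnp.ones((2, 4))

lowered = jax.jit(layer).lower(w, x)          # trace Python into compiler IR
print(lowered.as_text()[:400])                # the IR a backend compiler consumes
compiled = lowered.compile()                  # backend code generation
print(compiled(w, x))                         # execute the compiled artifact
```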
You: As a Sr. Machine Learning Compiler Engineer III on the AWS Neuron team, you will be a thought leader supporting the ground-up development and scaling of a compiler to handle the world's largest ML workloads. Architecting and implementing business-critical features, publishing cutting-edge research, and mentoring a brilliant team of experienced engineers excite and challenge you. You will leverage your technical communication skills as a hands-on partner to AWS ML services teams, and you will be involved in pre-silicon design, bringing new products and features to market, and many other exciting projects. A background in machine learning and AI accelerators is preferred, but not required.
About the team
Why AWS? Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
New York
Quantitative Strategies / Technology
Overview
At the D. E. Shaw group, technology is integral to virtually everything we do. We’re seeking exceptional software developers with expertise in generative AI (GAI) to join our team. As a lead software developer in GAI, you’ll lead innovative projects and teams, leveraging your extensive experience and leadership skills to advance our GAI initiatives. By making GAI more accessible for both technical and non-technical users across the firm, you’ll drive substantial business impact.
What you’ll do day-to-day
You’ll join a dynamic environment, leading efforts in advancing GAI capabilities. Depending on your skills and interests, potential areas of focus may include:
- Leading the development and maintenance of shared GAI infrastructure and applications, ensuring data is prepared and integrated for effective use in GAI initiatives, and enhancing software development team productivity through GAI.
- Building sophisticated retrieval-augmented generation (RAG) pipelines over large document sets to improve data utility and accessibility across the firm (a minimal sketch appears after this list).
- Managing collaboration with internal groups and end users, accelerating AI product development and deployment, and customizing solutions to their needs.
- Leading experimentation with new AI-driven tools and applications, integrating them into various platforms, and fostering collaboration to enhance AI effectiveness.
- Driving greenfield projects, which offer significant opportunities for ownership and growth in a rapidly expanding GAI landscape.
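As a hedged illustration of the RAG bullet above, here is a minimal retrieval-augmented generation loop. The corpus, hash-based embeddings, and stubbed generation step are placeholders, not the firm's actual stack.

```python
import numpy as np

DOCS = [
    "The 401(k) match is administered quarterly.",
    "GPU clusters are reserved through the internal scheduler.",
    "Year-end bonus policies are described in the handbook.",
]

def embed(text: str) -> np.ndarray:
    # Stand-in embedding: a deterministic (per process) pseudo-random unit vector.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    scores = [float(q @ embed(d)) for d in DOCS]   # cosine similarity of unit vectors
    return [DOCS[i] for i in np.argsort(scores)[::-1][:k]]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return prompt  # a real pipeline would send this prompt to an LLM

print(answer("How is the 401(k) match handled?"))
```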
Who we’re looking for
- We’re looking for candidates who have a strong background in software development and a solid understanding of GAI technologies.
- Successful developers have traditionally been top performers in their academic programs and possess a strong foundation in AI-related projects.
- We’re particularly interested in outstanding candidates who have 6+ years of overall experience; who are eager to thrive in an inclusive, collaborative, and fast-paced environment; and who have a proven track record of leading projects and successfully leading or managing teams.
- The expected annual base salary for this position is USD 275,000 to USD 350,000. Our compensation and benefits package includes substantial variable compensation in the form of a year-end bonus, guaranteed in the first year of hire, and benefits including medical and prescription drug coverage, 401(k) contribution matching, wellness reimbursement, family building benefits, and a charitable gift match program.
Location: San Jose, California, US
Alternate Location: San Francisco, CA; Seattle, WA
Why You’ll Love Cisco
Everything is converging on the Internet, making networked connections more meaningful than ever before in our lives. Our employees' groundbreaking ideas impact everything. Here, that means we take creative ideas from the drawing board and build dynamic solutions that have real-world impact. You'll collaborate with Cisco leaders, partner with mentors, and develop incredible relationships with colleagues who share your interest in connecting the unconnected. You'll be part of a team that cares about its customers and enjoys having fun, and you'll take part in changing the lives of those in our local communities. Come prepared to be encouraged and inspired.
Who We Are
Cisco's AI Research team consists of AI research scientists, data scientists, and network engineers with subject matter expertise who collaborate on both basic and applied research projects. We are motivated by tackling unique research challenges that arise when connecting people and devices at a worldwide scale.
Who You’ll Work With
You will join a newly formed, dynamic AI team as one of the core members, and have the opportunity to influence the culture and direction of the growing team. Our team includes AI experts and networking domain experts who work together and learn from each other. We work closely with engineers, product managers and strategists who have deep expertise and experience in AI and/or distributed systems.
What You’ll Do
Your primary role is to produce research advances in the field of Generative AI that improve the capabilities of models or agents for networking automation, human-computer interaction, model safety, and other strategic gen-AI-powered networking areas. You will research and build domain-specific foundational representations relevant to networking that provide differentiated value across diverse sets of applications. You will be a thought leader in the global research community, publishing papers, giving technical talks, organizing workshops, and more.
Minimum qualifications
- PhD in Computer Science or a relevant technical field and 2+ years of experience within an industry or academic research lab, or a Master's degree and 6+ years of experience within an industry or academic research lab, plus a minimum of 3 publications at top AI venues such as ACL, EMNLP, ICLR, ICML, NAACL, or NeurIPS
- Experience working with Machine Learning Models (MLMs) and familiarity with associated frameworks, such as TensorFlow, PyTorch, Hugging Face, or equivalent platforms
Preferred qualifications
- Experience driving research projects within an industry or university lab
- Interest in combining representation learning and problem-specific properties
- Experience in building, fine-tuning foundation models including LLMs and multi-modal models or domain specific models
- Ability to maintain cutting-edge knowledge in generative AI, Large Language Models (LLMs), and multi-modal models and apply these technologies innovatively to emerging business problems, use cases, and scenarios
- Outstanding communication, interpersonal, and relationship-building skills conducive to collaboration
- Experience working in an industrial research lab (full-time, internship, sabbatical, etc.)
Lecturer / Senior Lecturer
The Department of Computer Science at the University of Bath invites applications for up to seven faculty positions at various ranks from candidates who are passionate about research and teaching in artificial intelligence and machine learning. These are permanent positions with no tenure process. The start date is flexible.
The University of Bath is based on an attractive, single-site campus that facilitates interdisciplinary research. It is located on the edge of the World Heritage City of Bath and offers the lifestyle advantages of working and living in one of the most beautiful areas in the United Kingdom.
For more information and to apply, please visit: https://www.bath.ac.uk/campaigns/join-the-department-of-computer-science/
Location
Cupertino
Description
AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators. As part of the Neuron Frameworks team, you'll develop and enhance support for PyTorch and JAX on AWS Neuron, working with the open source ecosystem.
You will develop and extend support for the leading ML frameworks, delivering an outstanding user experience for PyTorch and JAX ML model development on the Trainium and Inferentia accelerators. You will work closely with teams across AWS Neuron, including compiler, training, and inference optimization, to optimize frameworks for AWS's accelerator architectures, and engage closely with the PyTorch, JAX, and other ML framework communities to take advantage of their latest capabilities and improve performance and usability for ML model developers.
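As a rough illustration of framework-on-XLA plumbing, the sketch below runs a PyTorch training step through an XLA device using generic torch_xla APIs; it does not depict the Neuron integration itself, and device availability depends on the installed XLA backend.

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                      # XLA device (backend-dependent)
model = torch.nn.Linear(128, 64).to(device)
x = torch.randn(32, 128, device=device)

loss = model(x).sum()
loss.backward()
xm.mark_step()                                # cut the lazy graph: compile and execute
print(loss.item())
```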
A successful candidate will have experience developing machine learning infrastructure and/or ML frameworks, a demonstrated ability to work with open source communities to influence future community direction, robust technical ability, and motivation to achieve results. Experience with technologies and tools such as XLA, vLLM, or Hugging Face Transformers is highly valued.
Utility Computing (UC)
AWS Utility Computing (UC) provides product innovations — from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
SF Bay Area or New York City
About the role
We're looking for seasoned ML infrastructure engineers with experience designing, building, and maintaining training and serving infrastructure for ML research.
Responsibilities:
Provide infrastructure support to our ML research and product
Build tooling to diagnose cluster issues and hardware failures
Monitor deployments, manage experiments, and generally support our research
Maximize GPU allocation and utilization for both serving and training (a utilization-probe sketch follows this list)
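As one hedged example of the utilization work above, here is a minimal GPU-utilization probe built on NVIDIA's NVML Python bindings (the pynvml module from nvidia-ml-py); the idle threshold and messages are illustrative.

```python
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        h = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(h)  # % over the last sample window
        mem = pynvml.nvmlDeviceGetMemoryInfo(h)
        print(f"GPU{i}: sm={util.gpu}% mem={mem.used / mem.total:.0%}")
        if util.gpu < 10:
            print(f"  warning: GPU{i} looks idle; check for stuck or starved jobs")
finally:
    pynvml.nvmlShutdown()
```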
Requirements:
4+ years of experience supporting the infrastructure within an ML environment
Experience in developing tools used to diagnose ML infrastructure problems and failures
Experience with cloud platforms (e.g., Compute Engine, Kubernetes, Cloud Storage)
Experience working with GPUs
Nice to have
Experience with large GPU clusters and high-performance computing/networking
Experience with supporting large language model training
Experience with ML frameworks like PyTorch/TensorFlow/JAX
Experience with GPU kernel development
San Francisco, California
Founded in late 2020 by a small group of machine learning researchers, Mosaic AI enables companies to create state-of-the-art AI models from scratch on their own data. From a business perspective, Mosaic AI is committed to the belief that a company’s AI models are just as valuable as any other core IP, and that high-quality AI models should be available to all. From a scientific perspective, Mosaic AI is committed to reducing the cost of training state-of-the-art models - and sharing our knowledge about how to do so with the world - to allow everyone to innovate and create models of their own.
Now part of Databricks since July 2023 as the GenAI Team, we are passionate about enabling our customers to solve the world's toughest problems by building and running the world's best data and AI platform. We leap at every opportunity to solve technical challenges, striving to empower our customers with the best data and AI capabilities.
You will:
- Explore and analyze performance bottlenecks in ML training and inference
- Design, implement, and benchmark libraries and methods to overcome the aforementioned bottlenecks
- Build tools for performance profiling, analysis, and estimation for ML training and inference (see the sketch after this list)
- Balance the tradeoff between performance and usability for our customers
- Facilitate our community through documentation, talks, tutorials, and collaborations
- Collaborate with external researchers and leading AI companies on various efficiency methods
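As one small, hedged example of the profiling work in the list above, the snippet below uses torch.profiler to rank the hottest ops in a toy training step; the model and sizes are placeholders.

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 1024)
)
x = torch.randn(64, 1024)

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    loss = model(x).sum()
    loss.backward()

# Rank ops by self CPU time to see where the step actually spends time.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=5))
```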
We look for:
- Hands-on experience with the internals of deep learning frameworks (e.g., PyTorch, TensorFlow) and deep learning models
- Experience with high-performance linear algebra libraries such as cuDNN, CUTLASS, Eigen, MKL, etc.
- General experience with the training and deployment of ML models
- Experience with compiler technologies relevant to machine learning
- Experience with distributed systems development or distributed ML workloads
- Hands-on experience writing CUDA code and knowledge of GPU internals (preferred)
- Publications in top-tier ML or systems conferences such as MLSys, ICML, ICLR, KDD, NeurIPS (preferred)
We value candidates who are curious about all parts of the company's success and are willing to learn new technologies along the way.
Location: San Jose, California, US
Alternate Location: San Francisco, CA; Seattle, WA
Meet the Team
Cisco's AI Research team consists of AI research scientists, data scientists, and network engineers with subject matter expertise who collaborate on both basic and applied research projects. We are motivated by tackling unique research challenges that arise when connecting people and devices at a worldwide scale.
Who You’ll Work With
You will join a newly formed, dynamic AI team as one of the core members, and have the opportunity to influence the culture and direction of the growing team. Our team includes AI experts and networking domain experts who work together and learn from each other. We work closely with engineers, product managers and strategists who have deep expertise and experience in AI and/or distributed systems.
What You’ll Do
Your primary role is to produce research advances in the field of Generative AI that improve the capabilities of models or agents for networking automation, human-computer interaction, model safety, and other strategic gen-AI-powered networking areas. You will research and build domain-specific foundational representations relevant to networking that provide differentiated value across diverse sets of applications. You will be a thought leader in the global research community, publishing papers, giving technical talks, organizing workshops, and more.
Minimum qualifications
- PhD in Computer Science or a relevant technical field with experience within an industry or academic research lab, or a Master's degree with strong LLM pre-training and post-training experience within an industry or academic research lab, plus a minimum of 3 publications at top AI venues such as ACL, EMNLP, ICLR, ICML, NAACL, or NeurIPS
- Experience working with Machine Learning Models (MLMs) and familiarity with associated frameworks, such as TensorFlow, PyTorch, Hugging Face, or equivalent platforms
Preferred qualifications
- Experience driving research projects within an industry or university lab
- Interest in combining representation learning and problem-specific properties
- Experience in building, fine-tuning foundation models including LLMs and multi-modal models or domain specific models
- Ability to maintain cutting-edge knowledge in generative AI, Large Language Models (LLMs), and multi-modal models and apply these technologies innovatively to emerging business problems, use cases, and scenarios
- Outstanding communication, interpersonal, and relationship-building skills conducive to collaboration
- Experience working in an industrial research lab (full-time, internship, etc.)