

MLSys 2026 Career Opportunities

Here we highlight career opportunities submitted by our Exhibitors, and other top industry, academic, and non-profit leaders. We would like to thank each of our exhibitors for supporting MLSys 2026.


The D. E. Shaw group seeks exceptional software engineers with expertise in applied AI, AI agents, and agentic systems to join the firm. This role offers the chance to work directly with a variety of groups at the firm on innovative, greenfield projects that transform how teams operate—leveraging quantitative and programming skills to design, build, and deploy AI solutions that drive efficiency, enhance analytical capabilities, and accelerate decision-making across the firm.

What you'll do day-to-day

You’ll join a dynamic team, with the potential to:
  • Collaborate directly with internal groups and end users across various functions to build bespoke AI agents and applications tailored to nuanced, real-world business needs.
  • Lead and contribute to greenfield AI projects, taking ownership from concept through production and helping shape internal AI strategy and adoption.
  • Experiment with emerging AI tools and model capabilities, rapidly prototyping and integrating them across platforms to enhance usability, scalability, and effectiveness.
  • Scale the adoption of AI tools firmwide by developing best practices, frameworks, and reusable components that drive innovation and productivity.
  • Build foundational AI components, such as agent frameworks, reusable “skills,” and large-scale retrieval systems, to support AI tools and applications.
  • Design, develop, and maintain shared AI infrastructure and agentic applications, ensuring firmwide data integration and enhancing software development efficiency.

Who we're looking for
  • A bachelor’s degree in any field is required, along with an extensive background in software development, and hands-on experience building and scaling AI solutions at the product, system, or company level.
  • Solid understanding of AI technologies and an interest in developing advanced AI applications and frameworks.
  • Demonstrated ability to thrive in technical or entrepreneurial environments, along with the capability to solve complex challenges and lead projects from inception to deployment.
  • A record of strong academic or professional achievement, with analytical depth and creativity in AI-related projects.
  • We welcome outstanding candidates at all experience levels who are excited to work in a collegial, collaborative, and fast-paced environment.
  • The expected annual base salary for this position is $225,000 to $275,000. Our compensation and benefits package includes substantial variable compensation in the form of a year-end bonus, guaranteed in the first year of hire, a sign-on bonus, and benefits including medical and prescription drug coverage, 401(k) contribution matching, wellness reimbursement, family building benefits, and a charitable gift match program.

Overview:

At Capital One, we are creating trustworthy and reliable AI systems, changing banking for good. For years, Capital One has been leading the industry in using machine learning to create real-time, intelligent, automated customer experiences. From informing customers about unusual charges to answering their questions in real time, our applications of AI & ML are bringing humanity and simplicity to banking. We are committed to building world-class applied science and engineering teams and continue our industry leading capabilities with breakthrough product experiences and scalable, high-performance AI infrastructure. At Capital One, you will help bring the transformative power of emerging AI capabilities to reimagine how we serve our customers and businesses who have come to love the products and services we build.

Team Description:

The AI Foundations team is at the center of bringing our vision for AI at Capital One to life. Our work touches every aspect of the research life cycle, from partnering with Academia to building production systems. We work with product, technology and business leaders to apply the state of the art in AI to our business.

In this role, you will:

Partner with a cross-functional team of data scientists, software engineers, machine learning engineers and product managers to deliver AI-powered products that change how customers interact with their money.

Leverage a broad stack of technologies (PyTorch, AWS UltraClusters, Hugging Face, Lightning, VectorDBs, and more) to reveal the insights hidden within huge volumes of numeric and textual data.

Build AI foundation models through all phases of development, from design through training, evaluation, validation, and implementation.

Engage in high impact applied research to take the latest AI developments and push them into the next generation of customer experiences.

Flex your interpersonal skills to translate the complexity of your work into tangible business goals.

The Ideal Candidate:

You love the process of analyzing and creating, but also share our passion to do the right thing. You know at the end of the day it’s about making the right decision for our customers.

Innovative. You continually research and evaluate emerging technologies. You stay current on published state-of-the-art methods, technologies, and applications and seek out opportunities to apply them.

Creative. You thrive on bringing definition to big, undefined problems. You love asking questions and pushing hard to find answers. You’re not afraid to share a new idea.

A leader. You challenge conventional thinking and work with stakeholders to identify and improve the status quo. You’re passionate about talent development for your own team and beyond.

Technical. You’re comfortable with open-source languages and are passionate about developing further. You have hands-on experience developing AI foundation models and solutions using open-source tools and cloud computing platforms.

Has a deep understanding of the foundations of AI methodologies.

Experience building large deep learning models, whether on language, images, events, or graphs, as well as expertise in one or more of the following: training optimization, self-supervised learning, robustness, explainability, RLHF.

An engineering mindset as shown by a track record of delivering models at scale both in terms of training data and inference volumes.

Experience in delivering libraries, platform level code or solution level code to existing products.

A professional with a track record of coming up with high quality ideas or improving upon existing ideas in machine learning, demonstrated by accomplishments such as first author publications or projects.

Possess the ability to own and pursue a research agenda, including choosing impactful research problems and autonomously carrying out long-running projects.

About the job

Like Google's own ambitions, the work of a Software Engineer goes beyond just Search. Software Engineering Managers have not only the technical expertise to take on and provide technical leadership to major projects, but also manage a team of Engineers. You not only optimize your own code but make sure Engineers are able to optimize theirs. As a Software Engineering Manager you manage your project goals, contribute to product strategy and help develop your team. Teams work all across the company, in areas such as information retrieval, artificial intelligence, natural language processing, distributed computing, large-scale system design, networking, security, data compression, user interface design; the list goes on and is growing every day. Operating with scale and speed, our exceptional software engineers are just getting started -- and as a manager, you guide the way.

With technical and leadership expertise, you manage engineers across multiple teams and locations, manage a large product budget, and oversee the deployment of large-scale projects across multiple sites internationally.

As a Software Engineering Manager in the Google Distributed Cloud (GDC) and Sovereign Cloud organization, you will lead teams bringing Google’s infrastructure and AI to secure on-premises and edge environments. You will architect orchestration platforms that define the industry’s approach to distributed computing. Your teams will solve challenges across hardware footprints, including accelerator fleets, to optimize performance and efficiency. By driving sovereign initiatives, you ensure advanced capabilities operate within complex regulated frameworks globally. We seek a leader to manage distributed engineering teams and deploy large-scale projects at the intersection of global cloud infrastructure and modern security.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

The US base salary range for this full-time position is $207,000-$300,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities

  • Set team priorities supporting GDC and Sovereign Cloud goals, specifically for GenAI and air-gapped runtimes. Align direction and decision-making across distributed teams to ensure high-velocity platform execution.
  • Scale and mentor an engineering organization specialized in container orchestration and AI infrastructure. Provide continuous coaching to help individuals navigate technical complexities and career growth within highly regulated environments.
  • Architect the mid-term technical road map for GKE AI, optimizing for accelerator (GPU/TPU) efficiency and cost-per-token. Evolve systems to meet future infrastructure needs for autonomous, self-managing cloud operations in restricted settings.
  • Design and vet complex system architectures, solving ambiguous problems that balance performance and sovereignty. Advocate engineering best practices, style, testability, and efficiency to ensure secure and auditable deployments.

About Unconventional

Since 2022, AI has entered the mainstream, reshaping entire industries from education and software development to fundamental consumer behaviors. This revolution has created an unprecedented demand for computation - a demand that is now fundamentally limited by energy, not just in the datacenter, but at a global scale.

At Unconventional, our mission is to solve this. We are rethinking computing from the ground up to build a new foundation for AI that is 1000x more efficient. We're doing this by exploiting the rich physics of semiconductors, mapping neural networks directly to the device physics rather than relying on layers of inefficient abstraction.

The Role

You will be a key contributor to our training ecosystem. Your goal is to build the next-generation ML model training platform tailored for a world where compute is no longer constrained by the digital abstraction.

You will co-design and implement training systems alongside novel AI models and hardware platforms that push the boundaries of physics-based compute.

What You’ll Do

  • The Model Architectures: Build and maintain highly optimized, model-specific training stacks tuned for state-of-the-art generative vision, language, and world models.
  • The Training Infrastructure: Design and scale multi-node distributed training systems, implementing elastic sharding and robust data streaming pipelines for fast, large-scale iteration. Implement robust model checkpointing and recovery mechanisms.
  • Optimization & Benchmarking: Develop and optimize kernels using low-level programming models like CUDA and Triton. Design rigorous benchmarking suites to track Model FLOPs Utilization (MFU), memory bandwidth, and convergence stability.
  • Cross-Functional Collaboration: Act as a translator, discussing algorithmic trade-offs with theorists and converting model requirements into concrete specifications for infrastructure and hardware engineering teams.

Minimum Qualifications

  • Education: An MS/PhD or equivalent research/project experience in a quantitative field such as AI/Machine Learning, Computer Science, Physics, Electrical Engineering, or Applied Math.
  • Experience: Veteran of the modern ML software stack. Demonstrated ability to map state-of-the-art AI model architectures (e.g., transformers, Mixture of Experts, diffusion models) to their system performance implications. Deep expertise in how models are partitioned across a cluster, with mastery of communication primitives and parallelism strategies.
  • Software Development: Proven track record of implementing, debugging, and maintaining production-grade training frameworks (such as Megatron-LM, DeepSpeed, Ray, and PyTorch Lightning), turning raw compute into a reliable model-building factory.

Preferred Qualifications (Nice to Have)

  • Unconventional Co-Design: A forward-looking perspective on co-designing algorithms for unconventional computing paradigms that map closely to the physics of underlying systems.

Why Join Us?

  • The Mission: Redefine computing for the next 50 years by solving the fundamental energy limitation of AI at a global scale.
  • The Impact: Shape the company's future as a foundational team member. Enjoy massive ownership and an outsized opportunity to drive change.
  • The Perks: A comprehensive package including best-in-class health benefits, 401k matching, truly unlimited PTO, and complimentary meals in our Palo Alto office.

About Unconventional

Since 2022, AI has entered the mainstream, reshaping entire industries from education and software development to fundamental consumer behaviors. This revolution has created an unprecedented demand for computation - a demand that is now fundamentally limited by energy, not just in the datacenter, but at a global scale.

At Unconventional, our mission is to solve this. We are rethinking computing from the ground up to build a new foundation for AI that is 1000x more efficient. We're doing this by exploiting the rich physics of semiconductors, mapping neural networks directly to the device physics rather than relying on layers of inefficient abstraction.

The Role

As a Member of Technical Staff, you will be a foundational member of our small, multi-disciplinary R&D team. We are looking for 'first principles' thinkers who are excited to tackle the hardest, most ambiguous technical challenges at the intersection of AI, physics, and computer architecture. You will be responsible for driving invention, prototyping, and validation of the core components of our novel computing platform.

Your work will be fluid and could span from theoretical modeling and simulation to algorithm development, hardware/software co-design, or experimental validation in collaboration with other team members. We're hiring exceptional problem-solvers who can navigate deep uncertainty and help chart our technical roadmap.

What We're Looking For

  • Exceptional technical ability in a quantitative field (e.g., Physics, Computer Science, Electrical Engineering, Applied Math, or a related discipline).
  • An MS/PhD or equivalent research/project experience is strongly preferred.
  • A "0-to-1" mindset. You have a demonstrated history of tackling complex, ambiguous R&D problems, often from a blank slate.
  • Deep curiosity. You are comfortable diving into new domains, whether it's semiconductor physics, machine learning theory, or systems-level design.
  • A creative and unconventional approach to problem-solving.

Core Technical Competencies

  • Analytic Foundations: Core competences in the analysis of nonlinear dynamical systems (ODEs, PDEs, SDEs), ideally with experience analyzing the stability, noise robustness, and capacity of such systems. Strong candidates are able to leverage the properties of (or impose constraints on) such systems to develop analytic insights.
  • Practical Eye: The ability to leverage analytic insights to build practical tools – metrics, algorithmic optimizations, and automated analyses – that can be used to study dynamical systems. Strong candidates are comfortable navigating the tradeoff space between theoretic purity and practicality to realize useful tooling.
  • Programming Proficiency: Strong command of Python and expertise in using numerical computing and visualization libraries, such as NumPy, SciPy, and Matplotlib. Experience with libraries geared toward analysis, such as computer algebra libraries (e.g., SymPy), is also recommended.
  • ML/AI Familiarity: Familiarity with dynamics-based ML model architectures, such as diffusion models and energy-based models, and general experience with ML model training flows. Experience using high-level ML frameworks, such as PyTorch and JAX.

Bonus Points (Nice to Have)

  • Compute model technical staff may focus primarily on the above skillsets, or may be cross-disciplinary with hardware or AI/ML algorithms expertise.
  • Collaboration & Communication
  • Cross-Functional Leadership: Excellent ability to translate complex technical concepts for diverse teams. You will act as a translator, discussing algorithmic/model trade-offs with ML/AI teams and eliciting hardware constraints and features from hardware engineering teams.

Why Join Us?

  • The Mission: Tackle a fundamental problem that could redefine computing for the next 50 years.
  • The Impact: Be a foundational member of a world-class team with an outsized opportunity for ownership and impact.

The D. E. Shaw group seeks a highly motivated and entrepreneurial technical product engineer to join its newly formed private equity venture, Cove, and help build the AI-powered platform at its core. This role sits at the intersection of product strategy and technical execution, offering the opportunity to define, shape, and deliver technology solutions that will become the operational backbone of the group. As an early team member, this product engineer will play a key role in addressing the open challenge of applying AI to private equity investments and operations, with the backing of one of the most technologically sophisticated investment firms in the world.

What you'll do day-to-day

You'll be involved in all aspects of building and scaling technology products for the fund's investment activities, including:
  • Work closely with the investment and operations teams to surface high-impact opportunities, pressure-test ideas, and translate workflow challenges into clear product direction.
  • Own product design end-to-end, from how data is structured and connected to the business logic that determines how a tool actually behaves, bringing both conceptual clarity and technical precision to each iteration.
  • Design and build AI-native products that use LLMs to change how investment teams work, with a solid intuition for how model behavior shapes user experience and where AI can add genuine leverage.
  • Drive products from prototype to production, contributing code directly (especially in early stages) when tight product and business judgment matters most.

Who we're looking for
  • A bachelor’s degree or higher, an impressive record of academic and professional achievement, and at least five years of relevant experience.
  • At least two years of experience developing technology products in direct collaboration with engineering teams, including at least one year focused on workflow products that streamline business operations and processes.
  • Experience successfully taking a product from conception to completion, ideally in a startup environment; prior experience developing products for vertical-specific or industry-focused applications is a plus.
  • A solid technical foundation in full-stack product development—spanning APIs, databases, and user interfaces—with the ability to read and execute code, and proficiency in overseeing technical aspects from architecture decisions to implementation details.
  • At least one year of experience developing and integrating LLM-powered systems into production applications, with knowledge of agentic frameworks and their practical implementation; demonstrated ability to translate AI capabilities (including autonomous agents, tool use, and multi-step reasoning) into practical product features that solve real-world problems.
  • Well-developed communication skills, a collaborative and entrepreneurial mindset, and the ability to successfully manage multiple projects at once.
  • The expected annual base salary for this position is $185,000 to $250,000. Our compensation and benefits package includes variable compensation in the form of a year-end bonus, guaranteed in the first year of hire, and benefits including medical and prescription drug coverage, 401(k) contribution matching, wellness reimbursement, family building benefits, and a charitable gift match program.

Inception creates the world’s fastest, most efficient AI models. Our Mercury model is the world’s fastest reasoning LLM and first commercially available diffusion LLM, delivering 5x greater speed and efficiency than today’s LLMs, with best-in-class quality.

We are the AI researchers and engineers behind such breakthrough AI technologies as diffusion models, flash attention, and DPO.

The Role

We're looking for engineers and scientists to design, optimize, and maintain the compute foundations that power large-scale language model training and inference. You will develop high-performance ML kernels, enable efficient low-precision arithmetic, and improve the distributed compute stack that makes training and serving large models possible.

Key Responsibilities

  • Design and implement custom ML kernels (CUDA, CuTe, Triton) for core dLLM operations such as attention, matrix multiplication, gating, and normalization, optimized for modern GPU architectures.
  • Design compute primitives to reduce memory bandwidth bottlenecks and improve kernel efficiency.
  • Contribute to infrastructure stability and scalability, ensuring reproducibility, consistency across precision formats, and high utilization of compute resources.

Qualifications

  • BS/MS/PhD in Computer Science, Engineering, or a related field (or equivalent experience).
  • Proficiency in CUDA, CuTe, Triton, or other GPU programming frameworks.
  • Understanding of ML frameworks (PyTorch, TensorFlow) from a systems perspective.
  • Background in performance optimization and profiling of ML systems.
  • Experience implementing low-precision formats (FP8, INT8, block floating point) or contributing to related compiler stacks (XLA, TVM).
  • Familiarity with distributed training techniques (data parallel, model parallel, pipeline parallel).
  • Proficiency in Python and at least one systems programming language (C++/Rust/Go).
  • Experience with containerization (Docker), orchestration (Kubernetes), and CI/CD pipelines.

Preferred Skills

  • Experience building and maintaining large-scale language models with tens of billions of parameters or more.
  • Experience with distributed systems and cloud computing platforms (AWS/GCP/Azure).
  • Familiarity with distributed frameworks such as PyTorch/XLA, DeepSpeed, Megatron-LM.
  • Prior contributions to open-source deep learning infrastructure such as PyTorch, DeepSpeed, or XLA.

Why Join Inception

  • Work with World-Class Talent: Collaborate with the inventors of diffusion models and leading AI researchers
  • Shape Foundational Technology: Your decisions will influence how the next generation of AI products are built and used
  • Immediate Impact: Join at the ground floor where your contributions directly shape product direction and company trajectory

Perks & Benefits

  • Competitive salary and equity in a rapidly growing startup
  • Flexible vacation and paid time off (PTO)
  • Health, dental, and vision insurance
  • Catered meals (breakfast, lunch, & dinner)
  • Commuter subsidies
  • A collaborative and inclusive culture

Location: San Francisco · On-site


ABOUT THE COMPANY

We're building autonomous research agents for recursive self-improvement (multi-agent systems that propose, run, and analyze machine learning experiments). We're a small team based in San Francisco, working on-site.

ABOUT THE ROLE

As a Researcher on our team, you'll design experiments and develop methods that drive how our autonomous research agents make decisions. You'll work across the full ML research stack (problem formulation, method design, experimentation, analysis, write-up) and you'll do it on problems that don't always have established benchmarks because we're inventing the workloads.

The work is open-ended and concrete at the same time. Open-ended because the research problems are constantly evolving and we don't prescribe approaches or research directions; concrete because the research questions are motivated by real-world applications and every experiment ties to something the agents will actually do. You'll have real autonomy (and the corresponding responsibility for choosing well).

WHAT YOU'LL DO

  • Identify research questions that, when answered, would meaningfully change what our agents are capable of
  • Design and run experiments end-to-end (from problem framing through method design, infrastructure, evaluation, and write-up)
  • Develop new methods spanning RL, LLMs, agentic systems, multi-agent coordination, search, evaluation, or wherever the problem leads
  • Work closely with engineers to take the most promising methods from research code into production
  • Read deeply across the literature; bring useful work from outside in
  • Help shape how the team picks problems

WHAT WE'RE LOOKING FOR

  • Strong track record of ML research at the frontier: RL, LLMs, agentic ML, multi-agent systems, evaluation, or adjacent
  • 5+ years of hands-on research experience in industry or academia
  • Comfortable designing experiments and running them at scale, not just proposing them
  • Strong written communication: you can summarize your research findings into actionable insights for next steps
  • Fluent in PyTorch, JAX, or equivalent; comfortable working with large-scale training infrastructure
  • Bias toward shipping research rather than handing it off
  • Comfortable with ambiguity: many of our problems don't have a known right answer, and navigating that uncertainty is core to the role.
  • Published research at NeurIPS, ICML, ICLR, COLM, RLC, or comparable venues

NICE TO HAVE

  • PhD in ML, statistics, computer science, or adjacent
  • Open-source contributions to ML research infrastructure
  • Experience with agentic systems, tool use, long-horizon planning, or multi-agent coordination

THIS ROLE IS PROBABLY NOT FOR YOU IF

  • You want to focus on one specific benchmark and watch the metric tick up (our problems are broader and shift)
  • You prefer pure research that never touches a production system
  • You'd rather work alone than share research taste openly with a small team

P.S. We’re also hosting a small private dinner during MLSys for people interested in agents, recursive self-improvement, and AI infrastructure. Apply to join us here: https://luma.com/u6yt1gri