



MLSys 2024 Career Website

The MLSys 2024 conference is not accepting applications to post at this time.

Here we highlight career opportunities submitted by our Exhibitors, and other top industry, academic, and non-profit leaders. We would like to thank each of our exhibitors for supporting MLSys 2024. Opportunities can be sorted by job category, location, and filtered by any other field using the search box. For information on how to post an opportunity, please visit the help page, linked in the navigation bar above.

Search Opportunities

Do you love building and pioneering in the technology space? Do you enjoy solving complex business problems in a fast-paced, collaborative, inclusive, and iterative delivery environment? At Capital One, you'll be part of a big group of makers, breakers, doers and disruptors, who solve real problems and meet real customer needs. We are seeking Full Stack Software Engineers who are passionate about marrying data with emerging technologies. As a Capital One Software Engineer, you’ll have the opportunity to be on the forefront of driving a major transformation within Capital One.


Apply

As a Capital One Machine Learning Engineer (MLE), you'll be part of an Agile team dedicated to productionizing machine learning applications and systems at scale. You’ll participate in the detailed technical design, development, and implementation of machine learning applications using existing and emerging technology platforms. You’ll focus on machine learning architectural design, develop and review model and application code, and ensure high availability and performance of our machine learning applications. You'll have the opportunity to continuously learn and apply the latest innovations and best practices in machine learning engineering.


Apply

Our mission at Capital One is to create trustworthy, reliable and human-in-the-loop AI systems, changing banking for good. For years, Capital One has been leading the industry in using machine learning to create real-time, intelligent, automated customer experiences. From informing customers about unusual charges to answering their questions in real time, our applications of AI & ML are bringing humanity and simplicity to banking. Because of our investments in public cloud infrastructure and machine learning platforms, we are now uniquely positioned to harness the power of AI. We are committed to building world-class applied science and engineering teams and continue our industry leading capabilities with breakthrough product experiences and scalable, high-performance AI infrastructure. At Capital One, you will help bring the transformative power of emerging AI capabilities to reimagine how we serve our customers and businesses who have come to love the products and services we build.

We are looking for an experienced Director, AI Platforms to help us build the foundations of our enterprise AI capabilities. In this role, you will develop shared platform services to support applications powered by generative AI. You will build SDKs and APIs for agents, information retrieval, and models-as-a-service that power generative AI workflows, such as optimizing LLMs via retrieval-augmented generation (RAG).
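For readers less familiar with the RAG workflow mentioned above, the core loop is: retrieve documents relevant to a query, then assemble them into the model's prompt. The sketch below is purely illustrative; the function names, toy word-overlap scoring, and prompt template are hypothetical stand-ins for a real vector store and LLM API.

```python
# Minimal RAG sketch: retrieve the most relevant documents for a query,
# then assemble them into a grounded prompt for a language model.
# The corpus, scoring, and template are illustrative placeholders.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (a toy stand-in
    for embedding similarity against a vector store)."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Inline the retrieved passages so the LLM can ground its answer."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

corpus = [
    "Capital One uses machine learning for real-time fraud alerts.",
    "RAG augments an LLM prompt with retrieved documents.",
    "Spine and leaf is a data center network topology.",
]
query = "How does RAG help an LLM?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)
```

A production service would swap the scoring function for embedding search and send the assembled prompt to a hosted model; the shape of the pipeline stays the same.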

Additionally, you will manage end-to-end coordination with operations, oversee the creation of high-quality curated datasets and the productionization of models, and work with applied research and product teams to identify and prioritize ongoing and upcoming services.


Apply

San Francisco, CA


As a Systems Research Engineer specialized in Machine Learning Systems, you will play a crucial role in researching and building the next generation AI platform at Together. Working closely with the modeling, algorithm, and engineering teams, you will design large-scale distributed training systems and a low-latency/high-throughput inference engine that serves a diverse, rapidly growing user base. Your research skills will be vital in staying up-to-date with the latest advancements in machine learning systems, ensuring that our AI infrastructure remains at the forefront of innovation.

Requirements

- Strong background in machine learning systems, such as distributed learning and efficient inference for large language models and diffusion models
- Knowledge of ML/AI applications and models, especially foundation models such as large language models and diffusion models: how they are constructed and how they are used
- Knowledge of system performance profiling and optimization tools for ML systems
- Excellent problem-solving and analytical skills
- Bachelor's, Master's, or Ph.D. degree in Computer Science, Electrical Engineering, or equivalent practical experience

Responsibilities

- Optimize and fine-tune the existing training and inference platform to achieve better performance and scalability
- Collaborate with cross-functional teams to integrate cutting-edge research ideas into existing software systems
- Develop your own ideas for optimizing the training and inference platforms, and push the frontier of machine learning systems research
- Stay up-to-date with the latest advancements in machine learning systems techniques and apply them to the Together platform

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is: $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.


Apply

MatX is on a mission to be the compute platform for AGI. We are developing vertically integrated full-stack solutions from silicon to systems including hardware and software to train and run the largest ML workloads for AGI. We are looking for people who are excited about systems-focused ML research.

Responsibilities include:
- Train and optimize LLMs for our hardware
- Run quality evaluations
- Build and set up distributed infrastructure for training and inference
- Advise on the hardware architecture from an ML perspective

Requirements:
- Excellent software engineering skills
- Experience training and tweaking neural networks, ideally LLMs
- Ideally, experience optimizing neural networks for hardware efficiency, for example regarding FLOPs, memory bandwidth, communication bandwidth, precision, parallel layout, and batch sizes

Compensation: The US base salary for this full-time position is $120,000 - $400,000 + equity + benefits

As part of our dedication to the diversity of our team and our focus on creating an inviting and inclusive work experience, MatX is committed to a policy of Equal Employment Opportunity and will not discriminate against an applicant or employee on the basis of race, color, religion, creed, national origin or ancestry, sex, gender, gender identity, gender expression, sexual orientation, age, physical or mental disability, medical condition, marital/domestic partner status, military and veteran status, genetic information or any other legally recognized protected basis under federal, state or local laws, regulations or ordinances.

All candidates must be authorized to work in the United States and work from our offices in Mountain View Tuesdays-Thursdays.

This position requires access to information that is subject to U.S. export controls. This offer of employment is contingent upon the applicant's capacity to perform job functions in compliance with U.S. export control laws without obtaining a license from U.S. export control authorities.


Apply

Remote

What You’ll Do

-Build out beautiful and easy-to-use interfaces to deliver an industry-leading ML and AI cloud
-Bring the best models, tooling, workflows, etc. from the AI space to our platform
-Own features end-to-end, from design to deployment to monitoring

You

-Have frontend web app development experience – minimum of 6 years building product-grade “responsive” frontend software using:
  -TypeScript
  -React (or equivalent strong experience in Vue or Svelte)
  -HTML and modern CSS
  -Vite
-Have backend web app development experience – minimum of 8 years implementing business-critical services, from initial conception to successful launch, using:
  -Python, Unix/command line
  -Django or FastAPI
  -A relational database like PostgreSQL or MySQL
-Have CI/CD experience – automation around testing and deployment to create a smooth developer experience
-Have reliability & observability experience – building highly available systems and SRE work, including observability, alerting, and logging
-Have experience with cloud-native services – strong understanding of public cloud offerings like Cloudflare, Okta, AWS, etc.

Nice-to-haves

-Experience with Kubernetes
-Experience with IaC (Terraform, Atlantis, Crossplane, etc.)
-Experience with event-based or serverless technologies (AWS Lambda, Kinesis, etc.)
-Experience in machine learning, AI, or data science fields
-Have held a leadership role, with the ability to lead and mentor junior team members
-Knowledge of application security and web hardening
-Strong engineering background – EECS preferred; Mathematics, Software Engineering, or Physics also considered

About Lambda

-We offer generous cash & equity compensation
-Investors include Gradient Ventures, Google’s AI-focused venture fund
-We are experiencing extremely high demand for our systems, with quarter-over-quarter, year-over-year profitability
-Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
-We have a wildly talented team of 250, and growing fast
-Health, dental, and vision coverage for you and your dependents
-Commuter/Work from home stipends
-401k Plan with 2% company match
-Flexible Paid Time Off Plan that we all actually use

Salary Range Information

Based on market data and other factors, the salary range for this position is $169,000-$243,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.


Apply

Location: China

- We are looking for doctoral students graduating in 2024, 2025, or 2026, as well as outstanding master's students.
- Focus areas: deep learning / machine learning, computer vision, natural language processing, knowledge graphs, speech, big data / data science, autonomous driving / robotics, and other fields.
- Substantial project experience and publications in top computer science conferences and journals are a plus.
- Internship experience at a top AI laboratory or top startup is preferred.
- Awards in international AI competitions or ACM competitions are preferred.

You will get:

Treatment
- Top-tier salary!
- Priority support for household registration (hukou)!
- Company-sponsored postdoc program!

Growth
- 1:1 guidance from a mentor group of Baidu's top-20 AI experts.
- Professional training, such as leadership and business-thinking programs.
- Priority to attend top academic conferences.
- Regular AIDUer academic exchanges and project sharing.

Platform
- A first-rate research team, the most advanced research projects, and the best AI deployment scenarios.
- Massive real-world data, with research results implemented in real business scenarios.


Apply

Please visit our careers page at the link below.


Apply

San Francisco, CA


As an AI Researcher, you will push the frontier of foundation model research and make it a reality in products. You will develop novel architectures, system optimizations, optimization algorithms, and data-centric optimizations that go beyond the state of the art. As a team, we have been pushing on all these fronts (e.g., Hyena, FlashAttention, FlexGen, and RedPajama). You will work closely with the machine learning systems, NLP/CV, and engineering teams for inspiration on research problems and to jointly solve practical challenges. You will also interact with customers to help them in their journey of training, using, and improving their AI applications with open models. Your research skills will be vital in staying up-to-date with the latest advancements in machine learning, ensuring that we stay at the cutting edge of open model innovation.

Requirements

- Strong background in machine learning
- Experience building state-of-the-art models at large scale
- Experience developing algorithms in areas such as optimization, model architecture, and data-centric optimizations
- Passion for contributing to the open model ecosystem and pushing the frontier of open models
- Excellent problem-solving and analytical skills
- Bachelor's, Master's, or Ph.D. degree in Computer Science, Electrical Engineering, or a related field

Responsibilities

- Develop novel architectures, system optimizations, optimization algorithms, and data-centric optimizations that significantly improve over the state of the art
- Take advantage of Together's computational infrastructure to create the best open models in their class
- Understand and improve the full lifecycle of building open models; release and publish your insights (blogs, academic papers, etc.)
- Collaborate with cross-functional teams to deploy your models and make them available to a wider community and customer base
- Stay up-to-date with the latest advancements in machine learning

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance and other competitive benefits. The US base salary range for this full-time position is: $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.


Apply

Remote

What You'll Do

-Help scale Lambda’s high-performance cloud network
-Contribute to the reproducible automation of network configuration
-Contribute to the design and development of software-defined networks
-Help manage spine-and-leaf networks
-Ensure high availability of our network through monitoring, failover, and redundancy
-Ensure VMs have predictable networking performance
-Help deploy and maintain network monitoring and management tools

You

-Have led the implementation of production-scale networking projects
-Have experience managing BGP
-Have experience with spine-and-leaf (Clos) network topologies
-Have experience with multi-data center networks and hybrid cloud networks
-Have experience building and maintaining software-defined networks (SDN)
-Are comfortable on the Linux command line, with an understanding of the Linux networking stack
-Have Python programming experience

Nice To Have

-Experience with OpenStack
-Experience with HPC networking, such as InfiniBand
-Experience automating network configuration within public clouds, with tools like Terraform
-Experience with configuration management tools like Ansible
-Experience building and maintaining multi-data center networks
-Have led implementation of production-scale SDNs in a cloud context (e.g. helped implement the infrastructure that powers an AWS VPC-like feature)
-Deep understanding of the Linux networking stack and its interaction with network virtualization
-Understanding of the SDN ecosystem (e.g. OVS, Neutron, DPDK, Cisco ACI or Nexus Fabric Controller, Arista CVP)
-Experience with Next-Generation Firewalls (NGFW)

About Lambda

-We offer generous cash & equity compensation
-Investors include Gradient Ventures, Google’s AI-focused venture fund
-We are experiencing extremely high demand for our systems, with quarter-over-quarter, year-over-year profitability
-Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
-We have a wildly talented team of 250, and growing fast
-Health, dental, and vision coverage for you and your dependents
-Commuter/Work from home stipends
-401k Plan with 2% company match
-Flexible Paid Time Off Plan that we all actually use

Salary Range Information

Based on market data and other factors, the salary range for this position is $180,000 - $230,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.


Apply

San Francisco, CA


Together AI is looking for an ML Engineer to develop the systems and APIs that enable our customers to perform inference and fine-tune LLMs. Relevant experience includes implementing runtime systems that perform inference at scale using AI/ML models, from simple models up to the largest LLMs.
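For background on what "inference at scale" typically involves, one core serving technique is batching concurrent requests into a single model call so the accelerator stays busy. The toy sketch below illustrates only the queueing idea; `fake_model` is a hypothetical stand-in for a real batched forward pass, and this is not a description of Together's actual serving stack.

```python
# Toy request batcher: drain a queue of pending requests in fixed-size
# batches, the core idea behind high-throughput inference serving.
from collections import deque

def fake_model(batch: list[str]) -> list[str]:
    """Placeholder for a batched LLM forward pass on an accelerator."""
    return [f"echo:{p}" for p in batch]

def serve(requests: list[str], max_batch: int = 4) -> list[str]:
    """Process requests in batches of at most `max_batch`, preserving order."""
    queue = deque(requests)
    results = []
    while queue:
        batch = [queue.popleft() for _ in range(min(max_batch, len(queue)))]
        results.extend(fake_model(batch))  # one model call per batch
    return results

print(serve(["a", "b", "c", "d", "e"]))  # two model calls instead of five
```

Real inference engines go much further (continuous batching, KV-cache management, paged attention), but the queue-then-batch loop is the starting point.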

Requirements

- 5+ years of experience writing high-performance, well-tested, production-quality code
- Bachelor’s degree in computer science or equivalent industry experience
- Demonstrated experience building large-scale, fault-tolerant, distributed systems such as storage, search, and computation
- Expert-level programmer in one or more of Python, Go, Rust, or C/C++
- Experience implementing runtime inference services at scale, or similar
- Excellent understanding of low-level operating systems concepts, including multi-threading, memory management, networking and storage, and performance and scale
- GPU programming, NCCL, and CUDA knowledge a plus
- Experience with PyTorch or TensorFlow a plus

Responsibilities

- Design and build the production systems that power the Together Cloud inference and fine-tuning APIs, enabling reliability and performance at scale
- Partner with researchers, engineers, product managers, and designers to bring new features and research capabilities to the world
- Perform architecture and research work for AI workloads
- Analyze and improve the efficiency, scalability, and stability of various system resources
- Conduct design and code reviews
- Create services, tools & developer documentation
- Create testing frameworks for robustness and fault tolerance
- Participate in an on-call rotation to respond to critical incidents as needed

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance and other competitive benefits. The US base salary range for this full-time position is: $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.


Apply

Location: Palo Alto, CA - Hybrid


The era of pervasive AI has arrived. In this era, organizations will use generative AI to unlock hidden value in their data, accelerate processes, reduce costs, drive efficiency and innovation to fundamentally transform their businesses and operations at scale.

SambaNova Suite™ is the first full-stack, generative AI platform, from chip to model, optimized for enterprise and government organizations. Powered by the intelligent SN40L chip, the SambaNova Suite is a fully integrated platform, delivered on-premises or in the cloud, combined with state-of-the-art open-source models that can be easily and securely fine-tuned using customer data for greater accuracy. Once adapted with customer data, customers retain model ownership in perpetuity, so they can turn generative AI into one of their most valuable assets.

Job Summary

We are looking for a world-class engineering leader to guide a team of talented machine learning engineers and researchers driving the development and innovation of our vision technology. You must thrive in a fast-paced environment, working closely with cross-functional teams to optimize performance and drive velocity. Leveraging cutting-edge techniques, you will play a vital role in our overall success in deploying state-of-the-art AI capabilities around the globe.

Responsibilities
- Lead and mentor a high-performing team of machine learning engineers in a fast-paced environment, providing technical guidance, mentorship, and support to drive their professional growth and development.
- Oversee the rapid development and implementation of machine learning models, leveraging advanced algorithms and techniques to optimize performance.
- Collaborate closely with cross-functional teams, including product managers, software engineers, and data engineers, to deliver data-driven insights and recommendations that enhance our solutions in an agile environment.
- Stay at the forefront of industry trends, emerging technologies, and best practices in machine learning, vision, and MLOps. Apply this knowledge to drive innovation, meet tight deadlines, and maintain a competitive edge.
- Establish and maintain strong relationships with stakeholders, providing clear communication of technical concepts and findings to both technical and non-technical audiences.

Skills & Qualifications
- Master's or PhD in a quantitative field such as Data Science, Computer Science, Statistics, or a related discipline.
- 10+ years of experience in machine learning, with a focus on vision.
- 5+ years of proven success in technical leadership, delivering impactful projects across the organization.
- Strong expertise in machine learning algorithms and data analysis techniques.
- Proficiency in Python, with hands-on experience using machine learning libraries and frameworks such as PyTorch, TensorFlow, or JAX.
- Strong communication and collaboration skills, with the ability to effectively convey technical concepts to both technical and non-technical stakeholders in a fast-paced context.
- Experience and familiarity with production ML environments, including model release, evaluation, and monitoring.

Preferred Qualifications
- Track record of published ML papers and/or blogs.
- Track record of engagement with the open-source ML community.
- Experience with vision applications in AI for science, oil and gas, or medical imaging.
- Experience with vision and multi-modal foundation models such as Stable Diffusion, ViT, and CLIP.
- Experience with performance optimization of ML models.
- 2+ years of experience in a startup environment.


Apply

Our mission at Capital One is to create trustworthy, reliable and human-in-the-loop AI systems, changing banking for good. For years, Capital One has been leading the industry in using machine learning to create real-time, intelligent, automated customer experiences. From informing customers about unusual charges to answering their questions in real time, our applications of AI & ML are bringing humanity and simplicity to banking. Because of our investments in public cloud infrastructure and machine learning platforms, we are now uniquely positioned to harness the power of AI. We are committed to building world-class applied science and engineering teams and continue our industry leading capabilities with breakthrough product experiences and scalable, high-performance AI infrastructure. At Capital One, you will help bring the transformative power of emerging AI capabilities to reimagine how we serve our customers and businesses who have come to love the products and services we build.

We are looking for an experienced Lead Generative AI Engineer to help build and maintain APIs and SDKs to train, fine-tune and access AI models at scale. You will work as part of our Enterprise AI team and build systems that will enable our users to work with Large-Language Models (LLMs) and Foundation Models (FMs), using our public cloud infrastructure. You will work with a team of world-class AI engineers and researchers to design and implement key API products and services that enable real-time customer-facing applications.


Apply

Remote

What You’ll Do

-Remotely provision and manage large-scale HPC clusters for AI workloads (up to many thousands of nodes)
-Remotely install and configure operating systems, firmware, software, and networking on HPC clusters, both manually and using automation tools
-Troubleshoot and resolve HPC cluster issues, working closely with physical deployment teams on-site
-Provide context and details to an automation team to further automate the deployment process
-Provide clear and detailed requirements back to the HPC design team on gaps and improvement areas, specifically in the areas of simplification, stability, and operational efficiency
-Contribute to the creation and maintenance of Standard Operating Procedures
-Provide regular and well-communicated updates to project leads throughout each deployment
-Mentor and assist less-experienced team members
-Stay up-to-date on the latest HPC/AI technologies and best practices

You

-Have 10+ years of experience managing HPC clusters
-Have 10+ years of everyday Linux experience
-Have a strong understanding of HPC architecture (compute, networking, storage)
-Have an innate attention to detail
-Have experience with Bright Cluster Manager or similar cluster management tools
-Are an expert in configuring and troubleshooting:
  -SFP+ fiber, InfiniBand (IB), and 100 GbE network fabrics
  -Ethernet, switching, power infrastructure, GPUDirect, RDMA, NCCL, and Horovod environments
  -Linux-based compute nodes, firmware updates, driver installation
  -SLURM, Kubernetes, or other job scheduling systems
-Work well under deadlines and structured project plans
-Have excellent problem-solving and troubleshooting skills
-Have the flexibility to travel to our North American data centers as on-site needs arise or as part of training exercises
-Are able to work both independently and as part of a team

Nice to Have

-Experience with machine learning and deep learning frameworks (PyTorch, TensorFlow) and benchmarking tools (DeepSpeed, MLPerf)
-Experience with containerization technologies (Docker, Kubernetes)
-Experience working with the technologies that underpin our cloud business (GPU acceleration, virtualization, and cloud computing)
-Keen situational awareness in customer situations, employing diplomacy and tact
-Bachelor's degree in EE, CS, Physics, Mathematics, or equivalent work experience

About Lambda

-We offer generous cash & equity compensation
-Investors include Gradient Ventures, Google’s AI-focused venture fund
-We are experiencing extremely high demand for our systems, with quarter-over-quarter, year-over-year profitability
-Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
-We have a wildly talented team of 200, and growing fast
-Health, dental, and vision coverage for you and your dependents
-Commuter/Work from home stipends
-401k Plan with 2% company match
-Flexible Paid Time Off Plan that we all actually use

Salary Range Information

Based on market data and other factors, the salary range for this position is $170,000-$230,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.


Apply

US and Canada only

Cerebras' systems are designed with a singular focus on machine learning. Our processor is the Wafer Scale Engine (WSE), a single chip with performance equivalent to a cluster of GPUs, giving the user cluster-scale capability with the simplicity of programming a single device. Because of this programming simplicity, large model training can be scaled out using simple data parallelism to the performance of thousands of GPUs. ML practitioners can focus on their machine learning, rather than parallelizing and distributing their applications across many devices. The Cerebras hardware architecture is designed with unique capabilities including orders of magnitude higher memory bandwidth and unstructured sparsity acceleration, not accessible on traditional GPUs. With a rare combination of cutting-edge hardware and deep expertise in machine learning, we stand among the select few global organizations capable of conducting large-scale innovative deep learning research and developing novel ML algorithms not possible on traditional hardware.

About the role

Cerebras has senior and junior research scientist roles open, focused on the co-design and demonstration of novel state-of-the-art ML algorithms on this unique specialized architecture. We are working on research areas including advancing and scaling foundation models for natural language processing and multi-modal applications, new weight and activation sparsity algorithms, and novel efficient training techniques. A key responsibility of our group is to ensure that state-of-the-art techniques can be applied systematically across many important applications.

As part of the Core ML team, you will have the unique opportunity to research state-of-the-art models as part of a collaborative and close-knit team. We deliver important demos of Cerebras capability as well as publish our findings as ways to support and engage with the community. A key aspect of the senior role will also be to provide active guidance and mentorship to other talented and passionate scientists and engineers.

Research Directions

Our research focuses on improving state-of-the-art foundation models in NLP, computer vision, and multi-modal settings by studying many dimensions unique to the Cerebras architecture:

  • Scaling laws to predict and analyze large-scale training improvements: accuracy/loss, architecture scaling, and hyperparameter transfer
  • Sparse and low-precision training algorithms for reduced training time and increased accuracy. For instance, weight and activation sparsity, mixture-of-experts, and low-rank adaptation
  • Optimizers, initializers, normalizers to improve training dynamics and efficiency
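For background on the first research direction above, scaling-law studies typically fit validation loss as a power law in model and data size. A common parametric form from the literature (shown here as general context from published scaling-law work, not as a Cerebras-specific result) is:

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

where $N$ is the number of parameters, $D$ the number of training tokens, $E$ the irreducible loss, and $A$, $B$, $\alpha$, $\beta$ are fitted constants. Hyperparameter-transfer work then asks how optimal settings, such as learning rate, shift as $N$ and $D$ grow.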

Responsibilities

  • Develop novel training algorithms that advance state-of-the-art in model quality and compute efficiency
  • Develop novel network architectures that address foundational challenges in language and multi-modal domains
  • Co-design ML algorithms that take advantage of existing unique Cerebras hardware advantages and collaborate with engineers to co-design next generation architectures
  • Design and run research experiments that show novel algorithms are efficient and robust
  • Analyze results to gain research insights, including training dynamics, gradient quality, and dataset preprocessing techniques
  • Publish and present research at leading machine learning conferences
  • Collaborate with engineers in co-design of the product to bring the research to customers

Requirements

  • Strong grasp of machine learning theory, fundamentals, linear algebra, and statistics
  • Experience with state-of-the-art models, such as GPT, LLaMA, DALL-E, PaLI, or Stable Diffusion
  • Experience with machine learning frameworks, such as TensorFlow and PyTorch.
  • Strong track record of research success through publications at top conferences or journals (e.g. ICLR, ICML, NeurIPS), or patents and patent applications

Apply