



MLSys 2024 Career Website

The MLSys 2024 conference is not accepting applications to post at this time.

Here we highlight career opportunities submitted by our Exhibitors and other top industry, academic, and non-profit leaders. We would like to thank each of our exhibitors for supporting MLSys 2024. Opportunities can be sorted by job category or location, and filtered by any other field using the search box. For information on how to post an opportunity, please visit the help page, linked in the navigation bar above.

Search Opportunities

San Francisco, CA


Together AI is looking for an ML Engineer who will develop systems and APIs that enable our customers to perform inference and fine-tune LLMs. Relevant experience includes implementing runtime systems that perform inference at scale using AI/ML models, from simple models up to the largest LLMs.

Requirements

  • 5+ years of experience writing high-performance, well-tested, production-quality code
  • Bachelor's degree in computer science or equivalent industry experience
  • Demonstrated experience building large-scale, fault-tolerant, distributed systems such as storage, search, and computation
  • Expert-level programmer in one or more of Python, Go, Rust, or C/C++
  • Experience implementing runtime inference services at scale, or similar
  • Excellent understanding of low-level operating systems concepts, including multi-threading, memory management, networking and storage, performance, and scale
  • GPU programming, NCCL, and CUDA knowledge a plus
  • Experience with PyTorch or TensorFlow a plus

Responsibilities

  • Design and build the production systems that power the Together Cloud inference and fine-tuning APIs, enabling reliability and performance at scale
  • Partner with researchers, engineers, product managers, and designers to bring new features and research capabilities to the world
  • Perform architecture and research work for AI workloads
  • Analyze and improve efficiency, scalability, and stability of various system resources
  • Conduct design and code reviews
  • Create services, tools & developer documentation
  • Create testing frameworks for robustness and fault-tolerance
  • Participate in an on-call rotation to respond to critical incidents as needed

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance and other competitive benefits. The US base salary range for this full-time position is: $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.


Apply

Seattle or Remote


OctoAI is a leading startup in the fast-paced generative AI market. Our mission is to empower businesses to build differentiated applications that delight customers with the latest generative AI features.

Our platform, OctoAI, delivers generative AI infrastructure to run, tune, and scale models that power AI applications. OctoAI makes models work for you by providing developers easy access to efficient AI infrastructure so they can run the models they choose, tune them for their specific use case, and scale from dev to production seamlessly. With the fastest foundation models on the market (including Llama-2, Stable Diffusion, and SDXL), integrated customization solutions, and world-class ML systems under the hood, developers can focus on building apps that wow their customers without becoming AI infrastructure experts.

Our team consists of experts in cloud services, infrastructure, machine learning systems, hardware, and compilers as well as an accomplished go-to-market team with diverse backgrounds. We have secured over $130M in venture capital funding and will continue to grow over the next year. We're based largely in Seattle but have a remote-first culture with people working all over the US and elsewhere in the world.

We dream big but execute with focus and believe in creativity, productivity, and a balanced life. We value diversity in all dimensions and are always looking for talented people to join our team!

Our MLSys Engineering team specializes in developing the most efficient and feature-packed engines for generative model deployment. This includes feature enablement and optimization for popular generative models such as Mixtral, Llama-2, Stable Diffusion, SDXL, SVD, and SD3, and thus requires a broad understanding of the various system layers, from the serving API down to the hardware level. We do this by building systems that innovate new techniques as well as by leveraging and contributing to open-source projects including TVM, MLC-LLM, vLLM, CUTLASS, and more.

We are seeking a highly skilled and experienced Machine Learning Systems Engineer to join our dynamic team. In this role, you will be responsible for contributing to the latest techniques and technologies in AI and machine learning.


Apply

Please use the link below to review all opportunities at Cerebras Systems. We are actively hiring across our Machine Learning, Software, Hardware, Systems, Manufacturing, and Product organizations.

Why Join Cerebras

People who are serious about software make their own hardware. At Cerebras we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:

  1. Build a breakthrough AI platform beyond the constraints of the GPU
  2. Publish and open source their cutting-edge AI research
  3. Work on one of the fastest AI supercomputers in the world
  4. Enjoy job stability with startup vitality
  5. Enjoy a simple, non-corporate work culture that respects individual beliefs

Read our blog: Five Reasons to Join Cerebras in 2024.

Apply today and become part of the forefront of groundbreaking advancements in AI.

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.


Apply

San Francisco, CA


As a Systems Research Engineer specialized in GPU Programming, you will play a crucial role in developing and optimizing GPU-accelerated kernels and algorithms for ML/AI applications. Working closely with the modeling and algorithm team, you will co-design GPU kernels and model architecture to enhance the performance and efficiency of our AI systems. Collaborating with the hardware and software teams, you will contribute to the co-design of efficient GPU architectures and programming models, leveraging your expertise in GPU programming and parallel computing. Your research skills will be vital in staying up-to-date with the latest advancements in GPU programming techniques, ensuring that our AI infrastructure remains at the forefront of innovation.

Requirements

  • Strong background in GPU programming and parallel computing, such as CUDA and/or Triton
  • Knowledge of ML/AI applications and models
  • Knowledge of performance profiling and optimization tools for GPU programming
  • Excellent problem-solving and analytical skills
  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Electrical Engineering, or equivalent practical experience

Responsibilities

  • Optimize and fine-tune GPU code to achieve better performance and scalability
  • Collaborate with cross-functional teams to integrate GPU-accelerated solutions into existing software systems
  • Stay up-to-date with the latest advancements in GPU programming techniques and technologies

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is: $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.


Apply

Please visit our careers page at the link below.


Apply

San Francisco, CA


As a Senior AI Infrastructure Engineer, you will be responsible for building the next generation, highly available, global, multi-cloud PaaS platform with open-source technologies to enable and accelerate Together AI’s rapid growth.

This system spans many diverse environments (Kubernetes, VMs, bare metal compute, and edge deployments) and provides a cohesive and reliable abstraction for running AI workloads in them. You will get to be a technology thought leader, evangelize new, cutting-edge technologies, and solve complex problems.

To be successful, you’ll need to be deeply technical and possess excellent communication, collaboration, and diplomacy skills. You have experience practicing infrastructure-as-code, including using tools like Terraform and Ansible. You have strong software development fundamentals and skills. In addition, you have strong systems knowledge and troubleshooting abilities.

Requirements

  • 5+ years of professional software development experience and proficiency in at least one backend programming language (Golang desired)
  • Demonstrated experience with high-performance or distributed cloud microservices architectures, ideally built and operated at global scale across multiple cloud providers such as AWS, Azure, or GCP
  • Excellent understanding of low-level operating systems concepts, including multi-threading, memory management, networking and storage, performance, and scale
  • Pragmatic, methodical, well-organized, detail-oriented, and self-starting
  • Experience with Kubernetes and containerization, VPNs, AI workloads, and blockchain-based protocols a plus
  • GPU programming, NCCL, and CUDA knowledge a plus
  • Experience with PyTorch or TensorFlow a plus
  • 5+ years of experience writing high-performance, well-tested, production-quality code

Responsibilities

  • Perform architecture and research work for decentralized AI workloads
  • Work on the core, open-source Together AI platform
  • Create services, tools, and developer documentation
  • Create testing frameworks for robustness and fault-tolerance

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is: $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.


Apply

Bay Area, California Only

Cerebras has developed a radically new chip and system to dramatically accelerate deep learning applications. Our system runs training and inference workloads orders of magnitude faster than contemporary machines, fundamentally changing the way ML researchers work and pursue AI innovation.

At Cerebras, we're proud to be among the few companies globally capable of training massive LLMs with over 100 billion parameters. We're active contributors to the open-source community, with millions of downloads of our models on Hugging Face. Our customers include national labs, global corporations across multiple industries, and top-tier healthcare systems. Recently, we announced a multi-year, multi-million-dollar partnership with Mayo Clinic, underscoring our commitment to transforming AI applications across various fields.

The Role

As the Cerebras ML Product Manager, you'll spearhead the transformation of AI across various industries by productizing critical machine learning (ML) use cases. Collaborating closely with Product leadership and ML research teams, you'll identify promising areas within the industry and research community, balancing business value and ML thought leadership. Your role involves translating abstract neural network requirements into actionable tasks for the Engineering team, establishing roadmaps, processes, success criteria, and feedback loops for product improvement. This position requires a blend of deep technical expertise in ML and deep learning concepts, familiarity with modern models, particularly in the Large Language Model (LLM) space, and a solid grasp of mathematical foundations. Ideal candidates will anticipate future trends in deep learning and understand connections across different neural network types and application domains.

Responsibilities

  • Understand deep learning use cases across industries through market analysis, research, and user studies
  • Develop and own the product roadmap for neural network architectures and ML methods on the Cerebras platform
  • Collaborate with end users to define market requirements for AI models
  • Define software requirements and priorities with engineering for ML network support
  • Establish success metrics for application enablement, articulating accuracy and performance expectations
  • Support Marketing, Product Marketing, and Sales by documenting features and defining ML user needs
  • Collaborate across teams to define product go-to-market strategy and expand user community
  • Clearly communicate roadmaps, priorities, experiments, and decisions

Requirements

  • Bachelor’s or Master’s degree in computer science, electrical engineering, physics, mathematics, a related scientific/engineering discipline, or equivalent practical experience
  • 3-10+ years product management experience, working directly with engineering teams, end users (enterprise data scientists/ML researchers), and senior product/business leaders
  • Strong fundamentals in machine learning/deep learning concepts, modern models, and the mathematical foundations behind them; understanding of how to apply deep learning models to relevant real-world applications and use cases
  • Experience working with a data science/ML stack, including TensorFlow and PyTorch
  • An entrepreneurial sense of ownership of overall team and product success, and the ability to make things happen around you. A bias towards getting things done, owning the solution, and driving problems to resolution
  • Outstanding presentation skills with a strong command of verbal and written communication

Preferred

  • Experience developing machine learning applications or building tools for machine learning application developers
  • Prior research publications in the machine learning/deep learning fields demonstrating deep understanding of the space

Apply

As a Capital One Machine Learning Engineer (MLE), you'll be part of an Agile team dedicated to productionizing machine learning applications and systems at scale. You’ll participate in the detailed technical design, development, and implementation of machine learning applications using existing and emerging technology platforms. You’ll focus on machine learning architectural design, develop and review model and application code, and ensure high availability and performance of our machine learning applications. You'll have the opportunity to continuously learn and apply the latest innovations and best practices in machine learning engineering.


Apply

Location: Palo Alto, CA - Hybrid


The era of pervasive AI has arrived. In this era, organizations will use generative AI to unlock hidden value in their data, accelerate processes, reduce costs, drive efficiency and innovation to fundamentally transform their businesses and operations at scale.

SambaNova Suite™ is the first full-stack, generative AI platform, from chip to model, optimized for enterprise and government organizations. Powered by the intelligent SN40L chip, the SambaNova Suite is a fully integrated platform, delivered on-premises or in the cloud, combined with state-of-the-art open-source models that can be easily and securely fine-tuned using customer data for greater accuracy. Once adapted with customer data, customers retain model ownership in perpetuity, so they can turn generative AI into one of their most valuable assets.

Working at SambaNova

This role presents a unique opportunity to shape the future of AI and the value it can unlock across every aspect of an organization’s business and operations, from strategic product pathfinding to large-scale production. We are excited to bring new talent on board to help democratize modern LLM capabilities in real-world use cases.

SambaNova is hiring a Principal Engineer for the Foundation LLM team.

Responsibilities

  • Design and implement large-scale data pipelines that feed billions of high-quality tokens into LLMs.
  • Continuously improve SambaNova’s LLM by exploring new ideas, including but not limited to new modeling techniques, prompt engineering, instruction tuning, and alignment.
  • Curate and crawl the necessary dataset to induce domain specificity.
  • Collaborate with product management and executive teams to develop a roadmap for continuous improvement of LLM and incorporate new capabilities.
  • Work closely with the product team and our customers to translate product requirements into requisite LLM capabilities.
  • Expand LLM capabilities into new languages and domains.
  • Develop applications on top of LLMs including but not limited to semantic search, summarization, conversational agents, etc.

Basic Qualifications

  • Bachelor's or Master's degree in engineering or science fields
  • 5-10 years of hands-on engineering experience in machine learning

Additional Required Qualifications

  • Experience with one or more deep learning frameworks such as TensorFlow, PyTorch, Caffe2, or Theano
  • A deep theoretical or empirical understanding of deep learning
  • Experience building and deploying machine learning models
  • Strong analytical and debugging skills
  • Experience with at least one of: Large Language Models, Multilingual Models, Semantic Search, Summarization, Data Pipelines, Domain Adaptation (finance, legal, or bio-medical), or conversational agents
  • Experience leading small teams
  • Experience in Python and/or C++

Preferred Qualifications

  • Experience working in a high-growth startup
  • A team player who demonstrates humility
  • Action-oriented with a focus on speed and results
  • Ability to thrive in a no-boundaries culture and make an impact on innovation


Apply

San Francisco, CA


As a Systems Research Engineer specialized in Machine Learning Systems, you will play a crucial role in researching and building the next generation AI platform at Together. Working closely with the modeling, algorithm, and engineering teams, you will design large-scale distributed training systems and a low-latency/high-throughput inference engine that serves a diverse, rapidly growing user base. Your research skills will be vital in staying up-to-date with the latest advancements in machine learning systems, ensuring that our AI infrastructure remains at the forefront of innovation.

Requirements

  • Strong background in machine learning systems, such as distributed training and efficient inference for large language models and diffusion models
  • Knowledge of ML/AI applications and models, especially foundation models such as large language models and diffusion models: how they are constructed and how they are used
  • Knowledge of system performance profiling and optimization tools for ML systems
  • Excellent problem-solving and analytical skills
  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Electrical Engineering, or equivalent practical experience

Responsibilities

  • Optimize and fine-tune the existing training and inference platform to achieve better performance and scalability
  • Collaborate with cross-functional teams to integrate cutting-edge research ideas into existing software systems
  • Develop your own ideas for optimizing the training and inference platforms and push the frontier of machine learning systems research
  • Stay up-to-date with the latest advancements in machine learning systems techniques and apply them to the Together platform

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is: $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.


Apply

Location: Palo Alto, CA - Hybrid


The era of pervasive AI has arrived. In this era, organizations will use generative AI to unlock hidden value in their data, accelerate processes, reduce costs, drive efficiency and innovation to fundamentally transform their businesses and operations at scale.

SambaNova Suite™ is the first full-stack, generative AI platform, from chip to model, optimized for enterprise and government organizations. Powered by the intelligent SN40L chip, the SambaNova Suite is a fully integrated platform, delivered on-premises or in the cloud, combined with state-of-the-art open-source models that can be easily and securely fine-tuned using customer data for greater accuracy. Once adapted with customer data, customers retain model ownership in perpetuity, so they can turn generative AI into one of their most valuable assets.

Job Summary

We are looking for a world-class engineering leader to guide a team of talented Machine Learning engineers and researchers driving the development and innovation of our vision technology. You must thrive in a fast-paced environment, where you'll work closely with cross-functional teams to optimize performance and drive velocity. Leveraging cutting-edge techniques, you will play a vital role in our overall success in deploying state-of-the-art AI capabilities around the globe.

Responsibilities

  • Lead and mentor a high-performing team of machine learning engineers in a fast-paced environment, providing technical guidance, mentorship, and support to drive their professional growth and development.
  • Oversee the rapid development and implementation of machine learning models, leveraging advanced algorithms and techniques to optimize performance.
  • Collaborate closely with cross-functional teams, including product managers, software engineers, and data engineers, to deliver data-driven insights and recommendations that enhance our solutions in an agile environment.
  • Stay at the forefront of industry trends, emerging technologies, and best practices in machine learning, vision, and MLOps. Apply this knowledge to drive innovation, meet tight deadlines, and maintain a competitive edge.
  • Establish and maintain strong relationships with stakeholders, providing clear communication of technical concepts and findings to both technical and non-technical audiences.

Skills & Qualifications

  • Master's or PhD in a quantitative field such as Data Science, Computer Science, Statistics, or a related discipline.
  • 10+ years of experience in Machine Learning, with a focus on vision.
  • 5+ years of proven success in technical leadership, delivering impactful projects across the organization.
  • Strong expertise in machine learning algorithms and data analysis techniques.
  • Proficiency in Python, with hands-on experience using machine learning libraries and frameworks such as PyTorch, TensorFlow, or JAX.
  • Strong communication and collaboration skills, with the ability to effectively convey technical concepts to both technical and non-technical stakeholders in a fast-paced context.
  • Experience and familiarity with production ML environments, including model release, evaluation, and monitoring.

Preferred Qualifications

  • Track record of published ML papers and/or blogs.
  • Track record of engagement with the open-source ML community.
  • Experience with vision applications in AI for Science, Oil and Gas, or medical imaging.
  • Experience with vision and multi-modal foundation models such as Stable Diffusion, ViT, and CLIP.
  • Experience with performance optimization of ML models.
  • 2+ years of experience in a startup environment.


Apply

d-Matrix has fundamentally changed the physics of memory-compute integration with our digital in-memory compute (DIMC) engine. The “holy grail” of AI compute has been to break through the memory wall and minimize data movement. We’ve achieved this with a first-of-its-kind DIMC engine. Having secured over $154M, including $110M in our Series B offering, d-Matrix is poised to scale generative inference acceleration for Large Language Models with our chiplet and in-memory compute approach. We are on track to deliver our first commercial product in 2024 and to meet the energy and performance demands of these Large Language Models. The company has 100+ employees across Silicon Valley, Sydney, and Bengaluru.

Our pedigree comes from companies like Microsoft, Broadcom, Inphi, Intel, Texas Instruments, Lucent, MIPS, and Wave Computing. Our past successes include building chips for all of the global cloud hyperscalers (Amazon, Facebook, Google, Microsoft, Alibaba, and Tencent) along with enterprise and mobile operators like China Mobile, Cisco, Nokia, Ciena, Reliance Jio, Verizon, and AT&T. We are recognized leaders in the mixed-signal and DSP connectivity space, now applying our skills to next-generation AI.

Location:

Hybrid, working onsite at our Santa Clara, CA headquarters 3 days per week.

The role: Software Engineer, Staff - Kernels

What you will do:

The role requires you to be part of the team that productizes the SW stack for our AI compute engine. As part of the Software team, you will be responsible for the development, enhancement, and maintenance of software kernels for next-generation AI hardware. You have experience building software kernels for hardware architectures and a very strong understanding of how to map algorithms to a given architecture, including mapping computational graphs generated by AI frameworks to the underlying hardware. You have worked across all aspects of the full-stack toolchain and understand the nuances of optimizing and trading off various aspects of hardware-software co-design. You are able to build and scale software deliverables in a tight development window. You will work with a team of compiler experts to build out the compiler infrastructure, working closely with other software (ML, Systems) and hardware (mixed signal, DSP, CPU) experts in the company.

What you will bring:

Minimum:

MS or PhD in Computer Engineering, Math, Physics or related degree with 5+ years of industry experience.

Strong grasp of computer architecture, data structures, system software, and machine learning fundamentals. 

Proficient in C/C++ and Python development in Linux environment and using standard development tools. 

Experience implementing algorithms in high level languages such as C/C++, Python. 

Experience implementing algorithms for specialized hardware such as FPGAs, DSPs, GPUs, and AI accelerators using libraries such as CUDA.

Experience in implementing operators commonly used in ML workloads - GEMMs, Convolutions, BLAS, SIMD operators for operations like softmax, layer normalization, pooling etc.

Experience with development for embedded SIMD vector processors such as Tensilica. 

Self-motivated team player with a strong sense of ownership and leadership. 

Preferred:

Prior startup, small team or incubation experience. 

Experience with ML frameworks such as TensorFlow and/or PyTorch.

Experience working with ML compilers and algorithms, such as MLIR, LLVM, TVM, Glow, etc.

Experience with a deep learning framework (such as PyTorch, Tensorflow) and ML models for CV, NLP, or Recommendation. 

Work experience at a cloud provider or AI compute / sub-system company.
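The operator list in the minimum qualifications mentions softmax, layer normalization, and pooling among the building blocks a kernels engineer implements. As a rough illustration only (plain Python reference code, not d-Matrix kernel code), numerically stable versions of two of these operators look like:

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating so large logits don't overflow.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(xs, eps=1e-5):
    # Normalize the feature vector to zero mean and unit variance.
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]
```

A hardware kernel would implement the same math with tiling, vectorization, and fused memory passes; simple references like these are mainly useful as correctness oracles when testing such kernels.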


Apply

US and Canada only

Cerebras' systems are designed with a singular focus on machine learning. Our processor is the Wafer Scale Engine (WSE), a single chip with performance equivalent to a cluster of GPUs, giving the user cluster-scale capability with the simplicity of programming a single device. Because of this programming simplicity, large model training can be scaled out using simple data parallelism to the performance of thousands of GPUs. ML practitioners can focus on their machine learning, rather than parallelizing and distributing their applications across many devices. The Cerebras hardware architecture is designed with unique capabilities including orders of magnitude higher memory bandwidth and unstructured sparsity acceleration, not accessible on traditional GPUs. With a rare combination of cutting-edge hardware and deep expertise in machine learning, we stand among the select few global organizations capable of conducting large-scale innovative deep learning research and developing novel ML algorithms not possible on traditional hardware.
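The data-parallel scaling described above can be sketched in miniature: each replica computes a gradient on its own shard of the batch, the gradients are averaged (the all-reduce step), and one update is applied, which matches a single large-batch step on one device. This is a generic illustration in plain Python, not Cerebras code; `mse_grad` is a toy loss gradient invented for the example.

```python
def mse_grad(weights, shard):
    # Toy gradient: mean squared error between the weight vector and each example.
    n = len(shard)
    return [sum(2 * (w - ex[i]) for ex in shard) / n for i, w in enumerate(weights)]

def data_parallel_step(weights, shards, lr):
    # Each replica computes a local gradient on its own shard of the batch.
    grads = [mse_grad(weights, shard) for shard in shards]
    # Average the local gradients (the all-reduce), then apply one update.
    avg = [sum(g[i] for g in grads) / len(grads) for i in range(len(weights))]
    return [w - lr * g for w, g in zip(weights, avg)]
```

With equal-sized shards, this produces exactly the same update as one full-batch step on a single device, which is why data parallelism is "simple" from the practitioner's point of view.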

About the role

Cerebras has senior and junior research scientist roles open with focus on co-design and demonstration of novel state-of-the-art ML algorithms with this unique specialized architecture. We are working on research areas including advancing and scaling foundation models for natural language processing and multi-modal applications, new weight and activation sparsity algorithms, and novel efficient training techniques. A key responsibility of our group is to ensure that state-of-the-art techniques can be applied systematically across many important applications.

As part of the Core ML team, you will have the unique opportunity to research state-of-the-art models as part of a collaborative and close-knit team. We deliver important demos of Cerebras capability as well as publish our findings as ways to support and engage with the community. A key aspect of the senior role will also be to provide active guidance and mentorship to other talented and passionate scientists and engineers.

Research Directions

Our research focuses on improving state-of-the-art foundation models in NLP, computer vision, and multi-modal settings by studying many dimensions unique to the Cerebras architecture:

  • Scaling laws to predict and analyze large-scale training improvements: accuracy/loss, architecture scaling, and hyperparameter transfer
  • Sparse and low-precision training algorithms for reduced training time and increased accuracy. For instance, weight and activation sparsity, mixture-of-experts, and low-rank adaptation
  • Optimizers, initializers, normalizers to improve training dynamics and efficiency
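As one illustration of the low-rank adaptation direction listed above, the sketch below shows the core LoRA idea in plain NumPy: a frozen weight matrix plus a trainable low-rank update (the function name and `alpha` scaling convention are illustrative, not Cerebras-specific):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """Forward pass with a low-rank adapter: y = x @ (W + (alpha/r) * A @ B).

    W (d_in x d_out) stays frozen; only A (d_in x r) and B (r x d_out)
    are trained, so the adapter adds r * (d_in + d_out) parameters
    instead of d_in * d_out.
    """
    r = A.shape[1]
    return x @ W + (alpha / r) * (x @ A) @ B
```

With a small rank r, the adapter's parameter count and gradient traffic are a tiny fraction of the dense layer's, which is what makes the technique attractive for efficient fine-tuning.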

Responsibilities

  • Develop novel training algorithms that advance state-of-the-art in model quality and compute efficiency
  • Develop novel network architectures that address foundational challenges in language and multi-modal domains
  • Co-design ML algorithms that take advantage of existing unique Cerebras hardware advantages and collaborate with engineers to co-design next generation architectures
  • Design and run research experiments that show novel algorithms are efficient and robust
  • Analyze results to gain research insights, including training dynamics, gradient quality, and dataset preprocessing techniques
  • Publish and present research at leading machine learning conferences
  • Collaborate with engineers in co-design of the product to bring the research to customers

Requirements

  • Strong grasp of machine learning theory, fundamentals, linear algebra, and statistics
  • Experience with state-of-the-art models, such as GPT, LLaMA, DALL-E, PaLI, or Stable Diffusion
  • Experience with machine learning frameworks, such as TensorFlow and PyTorch.
  • Strong track record of relevant research success through relevant publications at top conferences or journals (e.g. ICLR, ICML, NeurIPS), or patents and patent applications

Apply

Do you love building and pioneering in the technology space? Do you enjoy solving complex business problems in a fast-paced, collaborative, inclusive, and iterative delivery environment? At Capital One, you'll be part of a big group of makers, breakers, doers and disruptors, who solve real problems and meet real customer needs. We are seeking Full Stack Software Engineers who are passionate about marrying data with emerging technologies. As a Capital One Software Engineer, you’ll have the opportunity to be on the forefront of driving a major transformation within Capital One.


Apply

Seattle or Remote

OctoAI is a leading startup in the fast-paced generative AI market. Our mission is to empower businesses to build differentiated applications that delight customers with the latest generative AI features.

Our platform, OctoAI, delivers generative AI infrastructure to run, tune, and scale models that power AI applications. OctoAI makes models work for you by providing developers easy access to efficient AI infrastructure so they can run the models they choose, tune them for their specific use case, and scale from dev to production seamlessly. With the fastest foundation models on the market (including Llama-2, Stable Diffusion, and SDXL), integrated customization solutions, and world-class ML systems under the hood, developers can focus on building apps that wow their customers without becoming AI infrastructure experts.

Our team consists of experts in cloud services, infrastructure, machine learning systems, hardware, and compilers as well as an accomplished go-to-market team with diverse backgrounds. We have secured over $130M in venture capital funding and will continue to grow over the next year. We're based largely in Seattle but have a remote-first culture with people working all over the US and elsewhere in the world.

We dream big but execute with focus and believe in creativity, productivity, and a balanced life. We value diversity in all dimensions and are always looking for talented people to join our team!

Our Automation team specializes in developing the most efficient engine for generative model deployment. We work on optimizations ranging from fine-grained GPU kernel tuning to system-level techniques such as continuous batching.
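Continuous batching, mentioned above, admits and evicts sequences every decode step rather than waiting for an entire static batch to finish. A toy scheduler loop sketching the idea (the names and step model are illustrative, not OctoAI's implementation):

```python
from collections import deque

def continuous_batching(requests, max_batch=4, max_steps=100):
    """Toy continuous-batching loop over (request_id, tokens_to_generate).

    Finished sequences are evicted the moment they complete and waiting
    requests fill the freed slots, so short requests never wait for the
    longest sequence in the batch (as they would under static batching).
    """
    waiting = deque(requests)
    active = {}            # request_id -> tokens still to generate
    completed = []
    for _ in range(max_steps):
        # Admit waiting requests into any free batch slots.
        while waiting and len(active) < max_batch:
            rid, n = waiting.popleft()
            active[rid] = n
        if not active:
            break
        # One decode step for every active sequence.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                completed.append(rid)   # evict immediately, freeing a slot
                del active[rid]
    return completed
```

In a real serving engine the "decode step" is a batched GPU forward pass, and the scheduler must also manage KV-cache memory per sequence; the loop above only captures the admit/evict policy.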

We are seeking a highly skilled and experienced Machine Learning Systems Engineer with experience in CUDA Kernel optimization to join our dynamic team. In this role, you will be responsible for driving significant advancements in GPU performance optimizations and contributing to cutting-edge projects in AI and machine learning.


Apply