



MLSys 2024 Career Website

The MLSys 2024 conference is not accepting applications to post at this time.

Here we highlight career opportunities submitted by our Exhibitors and other top industry, academic, and non-profit leaders. We would like to thank each of our exhibitors for supporting MLSys 2024. Opportunities can be sorted by job category or location, and filtered on any other field using the search box. For information on how to post an opportunity, please visit the help page, linked in the navigation bar above.

Search Opportunities

Please use the link below to review all opportunities at Cerebras Systems. We are actively hiring across our Machine Learning, Software, Hardware, Systems, Manufacturing, and Product organizations.

Why Join Cerebras

People who are serious about software make their own hardware. At Cerebras, we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:

  1. Build a breakthrough AI platform beyond the constraints of the GPU
  2. Publish and open source their cutting-edge AI research
  3. Work on one of the fastest AI supercomputers in the world
  4. Enjoy job stability with startup vitality
  5. Enjoy a simple, non-corporate work culture that respects individual beliefs

Read our blog: Five Reasons to Join Cerebras in 2024.

Apply today and join the forefront of groundbreaking advancements in AI.

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.


Apply

San Francisco, CA


As an AI Researcher, you will build next-generation open models, both large language models and computer vision models such as diffusion models, using the computation and software infrastructure at Together. You will work closely with the data engineering team to develop the recipe for building open models that push the frontier, and with the algorithm and engineering teams to make your models widely available to everyone. You will also interact with customers to help them in their journey of training, using, and improving their AI applications using open models. Your research skills will be vital in staying up-to-date with the latest advancements in NLP and Computer Vision, ensuring that we stay at the cutting edge of open model innovations.

Requirements

  • Strong background in Natural Language Processing or Computer Vision
  • Experience building state-of-the-art models at large scale
  • Passion for contributing to the open model ecosystem and pushing the frontier of open models
  • Excellent problem-solving and analytical skills
  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Electrical Engineering, or equivalent practical experience

Responsibilities

  • Take advantage of the computational infrastructure of Together to create the best open models in their class
  • Understand and improve the full lifecycle of building open models; release and publish your insights (blogs, academic papers, etc.)
  • Collaborate with cross-functional teams to deploy your model and make it available to a wider community and customer base
  • Stay up-to-date with the latest advancements in NLP and Computer Vision

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is: $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.


Apply

Seattle or Remote


OctoAI is a leading startup in the fast-paced generative AI market. Our mission is to empower businesses to build differentiated applications that delight customers with the latest generative AI features.

Our platform, OctoAI, delivers generative AI infrastructure to run, tune, and scale models that power AI applications. OctoAI makes models work for you by providing developers easy access to efficient AI infrastructure so they can run the models they choose, tune them for their specific use case, and scale from dev to production seamlessly. With the fastest foundation models on the market (including Llama-2, Stable Diffusion, and SDXL), integrated customization solutions, and world-class ML systems under the hood, developers can focus on building apps that wow their customers without becoming AI infrastructure experts.

Our team consists of experts in cloud services, infrastructure, machine learning systems, hardware, and compilers as well as an accomplished go-to-market team with diverse backgrounds. We have secured over $130M in venture capital funding and will continue to grow over the next year. We're based largely in Seattle but have a remote-first culture with people working all over the US and elsewhere in the world.

We dream big but execute with focus and believe in creativity, productivity, and a balanced life. We value diversity in all dimensions and are always looking for talented people to join our team!

Our MLSys Engineering team specializes in developing the most efficient and feature-packed engines for generative model deployment. This includes feature enablement and optimization for popular generative models such as Mixtral, Llama-2, Stable Diffusion, SDXL, SVD, and SD3, and thus requires a broad understanding of the system at various layers, from the serving API down to the hardware. We do this by building systems that innovate new techniques, as well as leveraging and contributing to open source projects including TVM, MLC-LLM, vLLM, CUTLASS, and more.

We are seeking a highly skilled and experienced Machine Learning Systems Engineer to join our dynamic team. In this role, you will be responsible for contributing to the latest techniques and technologies in AI and machine learning.


Apply

UAE locals or any candidate willing to relocate

Cerebras has developed a radically new chip and system to dramatically accelerate deep learning applications. Our system runs training and inference workloads orders of magnitude faster than contemporary machines, fundamentally changing the way ML researchers work and pursue AI innovation.

We are innovating at every level of the stack – from chip, to microcode, to power delivery and cooling, to new algorithms and network architectures at the cutting edge of ML research. Our fully-integrated system delivers unprecedented performance because it is built from the ground up for deep learning workloads.

About the role

As an applied machine learning engineer, you will take today’s state-of-the-art solutions in various verticals and adapt them to run on the new Cerebras system architecture. You will get to see how deep learning is being applied to some of the world’s most difficult problems today and help ML researchers in these fields to innovate more rapidly and in ways that are not currently possible on other hardware systems.

Responsibilities

  • Familiarity with state-of-the-art transformer architectures for language and vision models.
  • Bring up new state-of-the-art models on the Cerebras system and validate their function.
  • Train models to convergence and tune hyper-parameters.
  • Optimize model code to run efficiently on the Cerebras system.
  • Explore new model architectures that take advantage of Cerebras' unique capabilities.
  • Develop new approaches for solving real-world AI problems across various domains.

Requirements

  • Master's or PhD in Computer Science or a related field
  • Familiarity with JAX/TensorFlow/PyTorch
  • Good understanding of how to define custom layers and back-propagate through them (a minimal sketch follows this list).
  • Experience with transformer deep learning models
  • Experience in a vertical such as computer vision or language modeling
  • Experience with Large Language Models such as the GPT family, Llama, and BLOOM.
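
To make the custom-layer requirement concrete, here is a minimal sketch in PyTorch (one of the frameworks named above) of a layer with a hand-written backward pass. The ScaledReLU op, shapes, and values are invented for illustration and are not part of the Cerebras stack.

import torch

class ScaledReLU(torch.autograd.Function):
    """Toy custom op: y = alpha * relu(x), with an explicit backward."""

    @staticmethod
    def forward(ctx, x, alpha):
        ctx.save_for_backward(x)
        ctx.alpha = alpha
        return alpha * x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # dy/dx = alpha where x > 0, else 0; the float alpha gets no gradient.
        return grad_out * ctx.alpha * (x > 0).to(grad_out.dtype), None

x = torch.randn(4, requires_grad=True)
ScaledReLU.apply(x, 2.0).sum().backward()
print(x.grad)  # 2.0 where x > 0, else 0.0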

Apply

Bengaluru, Karnataka, India

Cerebras has developed a radically new chip and system to dramatically accelerate deep learning applications. Our system runs training and inference workloads orders of magnitude faster than contemporary machines, fundamentally changing the way ML researchers work and pursue AI innovation.

We are innovating at every level of the stack – from chip, to microcode, to power delivery and cooling, to new algorithms and network architectures at the cutting edge of ML research. Our fully-integrated system delivers unprecedented performance because it is built from the ground up for deep learning workloads.

About the role

The AppliedML team is seeking a senior technical leader to spearhead new initiatives on Generative AI solutions. In this role, you will lead a team of research and software engineers to plan, develop, and deliver end-to-end solutions trained on massive supercomputers. These projects may be part of our customer collaborations or open-source initiatives. These solutions will be trained on some of the largest systems, using unique datasets we have developed in partnership with our diverse collaborators. You will plan and design experiments, execute them using Cerebras' unique workflow, and share the findings with internal stakeholders and external partners.

Responsibilities

  • Lead the technical exploration – from framing the problem statement and defining the option space to evaluating the options in a data-driven way that identifies the final approach
  • Design experiments to test the different hypotheses, analyze output to distill the learnings, and use them to adjust the project direction
  • Keep up with the state-of-the-art in Generative AI – efficient training recipes, model architecture, alignment, and instruction tuning, among others
  • Influence and mentor a distributed team of engineers
  • Integrate and enhance the latest research in model compression, including sparsity and quantization, to achieve super-linear scaling in model performance and accuracy (a toy quantization sketch follows this list)
  • Achieve breakthrough efficiency by co-designing hardware capabilities, model architecture, and training/deployment recipes
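
As a toy illustration of the quantization item above: symmetric per-tensor int8 post-training quantization in a few lines of NumPy. This is a sketch under invented assumptions, not Cerebras' compression recipe.

import numpy as np

def int8_quantize(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-12)  # avoid divide-by-zero
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = int8_quantize(w)
print("max abs error:", np.abs(w - scale * q.astype(np.float32)).max())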

Requirements

  • MS in Computer Science, Statistics, or related fields
  • 2+ years of experience providing technical leadership for a moderate-size team
  • Hands-on experience with training DL models for speech, language, vision, or a combination of them (multi-modal)
  • Experience with being the technical lead of a feature or project from conception through productization
  • Experience operating in a self-directed environment with multiple stakeholders
  • Experience working with other leaders to define strategic roadmaps
  • Proven track record of clearly articulating the findings to a broad audience with varying technical familiarity with the subject matter

Preferred

  • Ph.D. in Computer Science, Statistics, or related fields
  • Publications in top conferences such as NeurIPS, ICML, and CVPR, among others
  • Track record of building impactful features through open source or productization
  • People management experience

Apply

Seattle or Remote

OctoAI is a leading startup in the fast-paced generative AI market. Our mission is to empower businesses to build differentiated applications that delight customers with the latest generative AI features.

Our platform, OctoAI, delivers generative AI infrastructure to run, tune, and scale models that power AI applications. OctoAI makes models work for you by providing developers easy access to efficient AI infrastructure so they can run the models they choose, tune them for their specific use case, and scale from dev to production seamlessly. With the fastest foundation models on the market (including Llama-2, Stable Diffusion, and SDXL), integrated customization solutions, and world-class ML systems under the hood, developers can focus on building apps that wow their customers without becoming AI infrastructure experts.

Our team consists of experts in cloud services, infrastructure, machine learning systems, hardware, and compilers as well as an accomplished go-to-market team with diverse backgrounds. We have secured over $130M in venture capital funding and will continue to grow over the next year. We're based largely in Seattle but have a remote-first culture with people working all over the US and elsewhere in the world.

We dream big but execute with focus and believe in creativity, productivity, and a balanced life. We value diversity in all dimensions and are always looking for talented people to join our team!

Our Automation team specializes in developing the most efficient engine for generative model deployment. We concentrate on enhancements from detailed GPU kernel adjustments to broader system-level optimizations, including continuous batching.
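
To make "continuous batching" concrete, here is a minimal scheduling sketch: new requests join the running batch as soon as slots free up, instead of waiting for the whole batch to drain. The request lengths and the stubbed decode step are invented for illustration; this is not OctoAI's engine.

from collections import deque

def continuous_batching(num_requests, max_batch=4):
    """Toy continuous-batching loop with a stubbed per-step decode."""
    waiting = deque(range(num_requests))
    active = {}  # request id -> tokens still to generate
    while waiting or active:
        # Admit waiting requests into any free batch slots at every step.
        while waiting and len(active) < max_batch:
            rid = waiting.popleft()
            active[rid] = (rid % 5) + 1  # invented output length per request
        # One decode step for the whole batch (a model forward would go here).
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:  # finished: its slot frees up immediately
                del active[rid]
                yield rid

print(list(continuous_batching(10)))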

We are seeking a highly skilled and experienced Machine Learning Systems Engineer with experience in CUDA Kernel optimization to join our dynamic team. In this role, you will be responsible for driving significant advancements in GPU performance optimizations and contributing to cutting-edge projects in AI and machine learning.


Apply

Our mission at Capital One is to create trustworthy, reliable and human-in-the-loop AI systems, changing banking for good. For years, Capital One has been leading the industry in using machine learning to create real-time, intelligent, automated customer experiences. From informing customers about unusual charges to answering their questions in real time, our applications of AI & ML are bringing humanity and simplicity to banking. Because of our investments in public cloud infrastructure and machine learning platforms, we are now uniquely positioned to harness the power of AI. We are committed to building world-class applied science and engineering teams and to continuing our industry-leading capabilities with breakthrough product experiences and scalable, high-performance AI infrastructure. At Capital One, you will help bring the transformative power of emerging AI capabilities to reimagine how we serve our customers and businesses who have come to love the products and services we build.

We are looking for an experienced Director, AI Platforms to help us build the foundations of our enterprise AI capabilities. In this role you will work on developing generic platform services to support applications powered by Generative AI. You will develop SDKs and APIs for building agents and information retrieval, and for delivering models as a service to power generative AI workflows such as optimizing LLMs via RAG.

Additionally, you will manage end-to-end coordination with operations, oversee the creation of high-quality curated datasets and the productionization of models, and work with applied research and product teams to identify and prioritize ongoing and upcoming services.
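
As a hedged sketch of what "optimizing LLMs via RAG" involves: retrieve the documents most similar to the question and pack them into the prompt before generation. The embed() below is a stand-in bag-of-words model and the documents are invented; a production system would use a real embedding model and vector store, and none of this is a Capital One API.

import math
from collections import Counter

def embed(text):
    """Stand-in for a real embedding model: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(c * b[t] for t, c in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def rag_prompt(question, corpus, k=2):
    """Retrieve the k most similar documents and pack them into the prompt."""
    q = embed(question)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = ["Card disputes are filed in the mobile app.",
        "Wire transfers settle within one business day.",
        "Branch hours vary by location."]
print(rag_prompt("How do I dispute a charge?", docs))  # then sent to an LLM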


Apply

Location: Palo Alto, CA - Hybrid


The era of pervasive AI has arrived. In this era, organizations will use generative AI to unlock hidden value in their data, accelerate processes, reduce costs, drive efficiency and innovation to fundamentally transform their businesses and operations at scale.

SambaNova Suite™ is the first full-stack, generative AI platform, from chip to model, optimized for enterprise and government organizations. Powered by the intelligent SN40L chip, the SambaNova Suite is a fully integrated platform, delivered on-premises or in the cloud, combined with state-of-the-art open-source models that can be easily and securely fine-tuned using customer data for greater accuracy. Once adapted with customer data, customers retain model ownership in perpetuity, so they can turn generative AI into one of their most valuable assets.

The Runtime team at SambaNova is a seasoned engineering team with a proven track record of delivering cutting-edge system software solutions for AI and machine learning applications in the enterprise & commercial landscape.

Runtime is responsible for the lowest levels of the SambaNova stack, above the hardware. We handle all phases of software infrastructure to enable the higher-level apps, including:

  • OS interface/integration
  • Data model manipulation for scaling
  • Networking/communication intra and inter node
  • Orchestration of partitioned workloads
  • Error monitoring and general care and feeding of the hardware.

We build a high-performance, distributed, and scalable software execution environment for the SambaNova DataScale and SambaSuite platforms to support data-flow applications, e.g. ML training and inference, data-processing operations like ETL, and HPC applications.

We are searching for experienced software engineers to work on all parts of the Runtime infrastructure, including drivers, kernel modules, and userspace libraries. You will participate in building, testing, and deploying next-generation high-performance compute systems for AI applications at scale. We expect candidates to have a strong background in programming, building and testing software in distributed systems, and performance tuning of large-scale systems, along with good teamwork and planning skills.

Likely Work Responsibilities

  • Build and enhance infrastructure for high performance ML training and inference
  • System software (drivers and kernel) support for the next generation silicon
  • User-facing tools (analysis, job management, profiling, etc.) for DataScale systems
  • Virtualization, for isolation and ease of use in multi-tenant environments
  • Collaborate with other software teams: ML, Compiler, DevOps

Preferred Skills & Qualifications

  • Experience with operating systems, kernel-space drivers, and user-space libraries
  • Experience with communication fabrics, such as RDMA, PCIe, InfiniBand, and RoCE
  • Experience with software bring-up for custom hardware
  • Good communication skills and enthusiasm for helping colleagues


Apply

Location: Palo Alto, CA - Hybrid


The era of pervasive AI has arrived. In this era, organizations will use generative AI to unlock hidden value in their data, accelerate processes, reduce costs, drive efficiency and innovation to fundamentally transform their businesses and operations at scale.

SambaNova Suite™ is the first full-stack, generative AI platform, from chip to model, optimized for enterprise and government organizations. Powered by the intelligent SN40L chip, the SambaNova Suite is a fully integrated platform, delivered on-premises or in the cloud, combined with state-of-the-art open-source models that can be easily and securely fine-tuned using customer data for greater accuracy. Once adapted with customer data, customers retain model ownership in perpetuity, so they can turn generative AI into one of their most valuable assets.

Working at SambaNova

This role presents a unique opportunity to shape the future of AI and the value it can unlock across every aspect of an organization’s business and operations, from strategic product pathfinding to large-scale production. We are excited to bring talented people on board to push toward democratizing modern LLM capabilities in real-world use cases.

SambaNova is hiring a Principal Engineer for the Foundation LLM team.

Responsibilities

  • Design and implement large-scale data pipelines that feed billions of high-quality tokens into LLMs (a toy pipeline stage is sketched after this list).
  • Continuously improve SambaNova’s LLM by exploring new ideas, including but not limited to new modeling techniques, prompt engineering, instruction tuning, and alignment.
  • Curate and crawl the datasets necessary to induce domain specificity.
  • Collaborate with product management and executive teams to develop a roadmap for continuous improvement of LLM and incorporate new capabilities.
  • Work closely with the product team and our customers to translate product requirements into requisite LLM capabilities.
  • Expand LLM capabilities into new languages and domains.
  • Develop applications on top of LLMs including but not limited to semantic search, summarization, conversational agents, etc.
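
A hedged sketch of the kind of pipeline stage the first responsibility above refers to: exact-duplicate removal plus a crude length filter over a document stream. The heuristics and threshold are invented, not SambaNova's recipe; real pipelines add fuzzy dedup (e.g. MinHash) and learned quality classifiers, sharded over billions of documents.

import hashlib

def clean_stream(docs, min_words=20):
    """Toy pre-training data stage: normalize, length-filter, exact-dedup."""
    seen = set()
    for doc in docs:
        text = " ".join(doc.split())       # normalize whitespace
        if len(text.split()) < min_words:  # crude quality heuristic
            continue
        digest = hashlib.sha256(text.encode()).hexdigest()
        if digest in seen:                 # drop exact duplicates
            continue
        seen.add(digest)
        yield text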

Basic Qualifications

  • Bachelor's or Master's degree in engineering or science fields
  • 5-10 years of hands-on engineering experience in machine learning

Additional Required Qualifications

  • Experience with one or more deep learning frameworks like TensorFlow, PyTorch, Caffe2, or Theano
  • A deep theoretical or empirical understanding of deep learning
  • Experience building and deploying machine learning models
  • Strong analytical and debugging skills
  • Experience with at least one of: Large Language Models, Multilingual Models, Semantic Search, Summarization, Data Pipelines, Domain Adaptation (finance, legal, or bio-medical), or conversational agents
  • Experience leading small teams
  • Experience in Python and/or C++

Preferred Qualifications

  • Experience working in a high-growth startup
  • A team player who demonstrates humility
  • Action-oriented with a focus on speed & results
  • Ability to thrive in a no-boundaries culture & make an impact on innovation


Apply

d-Matrix has fundamentally changed the physics of memory-compute integration with our digital in-memory compute (DIMC) engine. The “holy grail” of AI compute has been to break through the memory wall to minimize data movements. We’ve achieved this with a first-of-its-kind DIMC engine. Having secured over $154M, including $110M in our Series B offering, d-Matrix is poised to advance Large Language Models, scaling generative inference acceleration with our chiplet and in-memory compute approach. We are on track to deliver our first commercial product in 2024, positioned to meet the energy and performance demands of these Large Language Models. The company has 100+ employees across Silicon Valley, Sydney and Bengaluru.

Our pedigree comes from companies like Microsoft, Broadcom, Inphi, Intel, Texas Instruments, Lucent, MIPS and Wave Computing. Our past successes include building chips for all the cloud hyperscalers globally - Amazon, Facebook, Google, Microsoft, Alibaba, and Tencent - along with enterprise and mobile operators like China Mobile, Cisco, Nokia, Ciena, Reliance Jio, Verizon, and AT&T. We are recognized leaders in the mixed-signal and DSP connectivity space, now applying our skills to next-generation AI.

Location:

Hybrid, working onsite at our Santa Clara, CA headquarters or San Diego, CA location 3 days per week.

What You Will Do:

The Machine Learning Team is responsible for the R&D of core algorithm-hardware co-design capabilities in d-Matrix's end-to-end solution. You will be joining a team of exceptional people enthusiastic about researching and developing state-of-the-art efficient deep learning techniques tailored for d-Matrix's AI compute engine. You will also have the opportunity to collaborate with top academic labs and to help customers optimize and deploy workloads for real-world AI applications on our systems.

• Design, implement and evaluate efficient deep neural network architectures and algorithms for d-Matrix's AI compute engine.

• Engage and collaborate with internal and external ML researchers to meet R&D goals. 

• Engage and collaborate with SW team to meet stack development milestones. 

• Conduct research to guide hardware design. 

• Develop and maintain tools for high-level simulation and research. 

• Port customer workloads, optimize them for deployment, generate reference implementations and evaluate performance. 

• Report and present progress in a timely and effective manner.

• Contribute to the publication of papers and intellectual property.

What You Will Bring:

• Master's degree in Computer Science, Electrical and Computer Engineering, or a related technical discipline with 3+ years of industry experience, PhD preferred with 1+ year of industry experience. 

• High proficiency with major deep learning frameworks: PyTorch is a must. 

• High proficiency in algorithm analysis, data structures, and Python programming is a must.

Desired:

• Proficiency with C/C++ programming is preferred. 

• Proficiency with GPU CUDA programming is preferred.

• Deep, wide and current knowledge in machine learning and modern deep learning is preferred.

• Experience in real-world data science projects in an industry setting is preferred. 

• Experience with efficient deep learning is preferred: quantization, sparsity, distillation. 

• Experience with specialized HW accelerator systems for deep neural networks is preferred.

• Passionate about AI and thriving in a fast-paced and dynamic startup culture.


Apply

Please use this link to explore open positions across all departments at Lambda.

We encourage you to share this link with anyone you know who is also searching!


Apply

d-Matrix has fundamentally changed the physics of memory-compute integration with our digital in-memory compute (DIMC) engine. The “holy grail” of AI compute has been to break through the memory wall to minimize data movements. We’ve achieved this with a first-of-its-kind DIMC engine. Having secured over $154M, including $110M in our Series B offering, d-Matrix is poised to advance Large Language Models, scaling generative inference acceleration with our chiplet and in-memory compute approach. We are on track to deliver our first commercial product in 2024, positioned to meet the energy and performance demands of these Large Language Models. The company has 100+ employees across Silicon Valley, Sydney and Bengaluru.

Our pedigree comes from companies like Microsoft, Broadcom, Inphi, Intel, Texas Instruments, Lucent, MIPS and Wave Computing. Our past successes include building chips for all the cloud hyperscalers globally - Amazon, Facebook, Google, Microsoft, Alibaba, and Tencent - along with enterprise and mobile operators like China Mobile, Cisco, Nokia, Ciena, Reliance Jio, Verizon, and AT&T. We are recognized leaders in the mixed-signal and DSP connectivity space, now applying our skills to next-generation AI.

Location:

Hybrid, working onsite at our Santa Clara, CA headquarters 3 days per week.

The role: Software Engineer, Staff - Kernels

What you will do:

As part of the Software team, you will help productize the SW stack for our AI compute engine and be responsible for the development, enhancement, and maintenance of software kernels for next-generation AI hardware. You have experience building software kernels for hardware architectures and a very strong understanding of how to map algorithms to a target architecture, including the computational graphs generated by AI frameworks. You have worked across all aspects of the full-stack toolchain and understand the nuances of optimizing and trading off various aspects of hardware-software co-design. You are able to build and scale software deliverables in a tight development window. You will work with a team of compiler experts to build out the compiler infrastructure, working closely with other software (ML, Systems) and hardware (mixed signal, DSP, CPU) experts in the company.

What you will bring:

Minimum:

MS or PhD in Computer Engineering, Math, Physics or related degree with 5+ years of industry experience.

Strong grasp of computer architecture, data structures, system software, and machine learning fundamentals. 

Proficient in C/C++ and Python development in Linux environment and using standard development tools. 

Experience implementing algorithms in high-level languages such as C/C++ and Python.

Experience implementing algorithms for specialized hardware such as FPGAs, DSPs, GPUs, and AI accelerators, using libraries such as CUDA.

Experience implementing operators commonly used in ML workloads - GEMMs, convolutions, BLAS, and SIMD operators for operations like softmax, layer normalization, and pooling (reference NumPy versions of two of these follow this list).

Experience with development for embedded SIMD vector processors such as Tensilica. 

Self-motivated team player with a strong sense of ownership and leadership. 
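
For reference, plain NumPy versions of two of the operators named above (softmax and layer normalization), written as the math a kernel implements before being mapped to a vector ISA. This is illustrative only, not d-Matrix code.

import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean / unit variance (affine params omitted).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.randn(2, 8).astype(np.float32)
print(softmax(x).sum(axis=-1))      # each row sums to 1
print(layer_norm(x).mean(axis=-1))  # each row mean is ~0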

Preferred:

Prior startup, small team or incubation experience. 

Experience with ML frameworks such as TensorFlow and/or PyTorch.

Experience working with ML compilers and algorithms, such as MLIR, LLVM, TVM, Glow, etc.

Experience with a deep learning framework (such as PyTorch, Tensorflow) and ML models for CV, NLP, or Recommendation. 

Work experience at a cloud provider or AI compute / sub-system company.


Apply

San Francisco, CA


As a Systems Research Engineer specialized in Machine Learning Systems, you will play a crucial role in researching and building the next generation AI platform at Together. Working closely with the modeling, algorithm, and engineering teams, you will design large-scale distributed training systems and a low-latency/high-throughput inference engine that serves a diverse, rapidly growing user base. Your research skills will be vital in staying up-to-date with the latest advancements in machine learning systems, ensuring that our AI infrastructure remains at the forefront of innovation.
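
By way of illustration, a toy view of the data-parallel core of distributed training: each worker computes gradients on its shard, and the update uses their average, as an all-reduce would. The linear model, data, and learning rate are invented; this is a sketch, not Together's training stack.

import numpy as np

def grad_mse(w, X, y):
    """Gradient of mean-squared error for a linear model y ~ X @ w."""
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(256, 8)), rng.normal(size=256)
w = np.zeros(8)
shards = 4  # stand-ins for 4 workers
for step in range(100):
    Xs, ys = np.array_split(X, shards), np.array_split(y, shards)
    grads = [grad_mse(w, xi, yi) for xi, yi in zip(Xs, ys)]
    w -= 0.1 * np.mean(grads, axis=0)  # the "all-reduce": average gradients
print("final loss:", float(np.mean((X @ w - y) ** 2)))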

Requirements

  • Strong background in machine learning systems, such as distributed learning and efficient inference for large language models and diffusion models
  • Knowledge of ML/AI applications and models, especially foundation models such as large language models and diffusion models: how they are constructed and how they are used
  • Knowledge of system performance profiling and optimization tools for ML systems
  • Excellent problem-solving and analytical skills
  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Electrical Engineering, or equivalent practical experience

Responsibilities

  • Optimize and fine-tune the existing training and inference platform to achieve better performance and scalability
  • Collaborate with cross-functional teams to integrate cutting-edge research ideas into existing software systems
  • Develop your own ideas for optimizing the training and inference platforms and push the frontier of machine learning systems research
  • Stay up-to-date with the latest advancements in machine learning systems techniques and apply many of them to the Together platform

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is: $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.


Apply

San Francisco, CA


Together AI is looking for an AI Engineer who will develop systems and APIs that enable our customers to perform inference and fine-tune LLMs, and will integrate those APIs into third-party AI toolchains such as LangChain. Relevant experience includes building developer tools used and loved by developers around the world.

Requirements

  • 5+ years of experience writing large-scale AI developer tools or similar
  • Bachelor’s degree in computer science or equivalent industry experience
  • Expert-level programmer in one or more of Python, Go, Rust, or C/C++
  • Experience integrating with AI inference and fine-tuning APIs or similar
  • GPU programming, NCCL, and CUDA knowledge a plus
  • Experience with PyTorch or TensorFlow a plus

Responsibilities

  • Design and build the production systems that power the Together Cloud inference and fine-tuning APIs, enabling reliability and performance at scale
  • Integrate the Together Cloud inference and fine-tuning APIs with third-party AI toolchains such as LangChain
  • Partner with researchers, engineers, product managers, and designers to bring new features and research capabilities to the world
  • Perform architecture and research work for AI workloads
  • Analyze and improve the efficiency, scalability, and stability of various system resources
  • Conduct design and code reviews
  • Create services, tools, and developer documentation
  • Create testing frameworks for robustness and fault-tolerance
  • Participate in an on-call rotation to respond to critical incidents as needed

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance and other competitive benefits. The US base salary range for this full-time position is: $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.


Apply

San Francisco, CA


As an AI Researcher, you will push the frontier of foundation model research and make it a reality in products. You will develop novel architectures, system optimizations, optimization algorithms, and data-centric optimizations that go beyond the state of the art. As a team, we have been pushing on all these fronts (e.g., Hyena, FlashAttention, FlexGen, and RedPajama). You will also work closely with the machine learning systems, NLP/CV, and engineering teams for inspiration on research problems and to jointly work on solutions to practical challenges. You will also interact with customers to help them in their journey of training, using, and improving their AI applications using open models. Your research skills will be vital in staying up-to-date with the latest advancements in machine learning, ensuring that we stay at the cutting edge of open model innovations.

Requirements

  • Strong background in Machine Learning
  • Experience building state-of-the-art models at large scale
  • Experience developing algorithms in areas such as optimization, model architecture, and data-centric optimizations
  • Passion for contributing to the open model ecosystem and pushing the frontier of open models
  • Excellent problem-solving and analytical skills
  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Electrical Engineering, or a related field

Responsibilities

  • Develop novel architectures, system optimizations, optimization algorithms, and data-centric optimizations that significantly improve over the state of the art
  • Take advantage of the computational infrastructure of Together to create the best open models in their class
  • Understand and improve the full lifecycle of building open models; release and publish your insights (blogs, academic papers, etc.)
  • Collaborate with cross-functional teams to deploy your models and make them available to a wider community and customer base
  • Stay up-to-date with the latest advancements in machine learning

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance and other competitive benefits. The US base salary range for this full-time position is: $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.


Apply