Stability.ai is hiring a Machine Learning Operations Engineer

About Stability: 

Stability.ai is a community- and mission-driven, open-source artificial intelligence company that cares deeply about real-world implications and applications. Our most significant advances grow from our diversity in working across multiple teams and disciplines. We are unafraid to go against established norms and explore creativity. We are motivated to generate breakthrough ideas and convert them into tangible solutions. Our vibrant communities consist of experts, leaders and partners across the globe who are developing cutting-edge open AI models for Image, Language, Audio, Video, 3D and Biology.

About the role: 

We are looking for a Machine Learning Operations Engineer to develop, deploy and maintain systems for advanced machine learning, including data ingestion and transformation, labeling, experimentation, distributed training, deployment, monitoring and management. You will help build out, modify, upgrade and maintain our end-to-end machine learning platforms, and enforce isolation and security between customer environments. You’ll work closely with service partners and the wider team to ensure that models move from research to production in a maintainable, repeatable way.

Responsibilities:  

  • Implement training and inference pipelines in collaboration with a wide range of stakeholders and cross-functional teams. 
  • Productionize deep learning models while ensuring that business SLAs, including security requirements, are met.
  • Build tooling and pipelining abstractions to allow other team members to focus on experimentation while empowering self-service workflows to deploy and serve models reliably and consistently.
  • Build, deploy, modify, and upgrade end-to-end MLOps platforms that cover all aspects of advanced ingestion, labeling, training, deployment and management of models.
  • Assist the data team with interfaces between the data platform and MLOps platforms.
  • Provide infrastructure and tooling to make building and training models faster, easier, and more repeatable.

 

Qualifications: 

  • 3+ years of experience in DevOps, IT and/or MLOps
  • Strong programming knowledge in Python and/or Go
  • Strong experience with GPUs for machine learning or other high-performance computing
  • Hands-on experience with ML frameworks, tools and libraries
  • Well-versed in data structures, data modeling, and database management systems, as well as object and file storage systems
  • Experience with defining infrastructure as code
  • Experience with model validation, model training, and other aspects of evaluating an ML system 
  • Experience with continuous integration and continuous deployment tooling
  • Experience building traditional code tests, such as unit and integration tests
  • Experience with Git, containers, networking, deployment and automation


Apply Now

Mention aiml.careers when you apply so they know you're a genuine candidate.

Location: Remote
