Flip

Senior Machine Learning Engineer - Machine Learning Infrastructure

Posted 18 Days Ago
Remote
Mid level
Design and optimize machine learning infrastructure for deployment, monitoring, and maintenance. Collaborate with teams to enhance performance and ensure reliability in production environments.
The summary above was generated by AI

Location: NYC-based or remote within the US


About Flip.shop:

Welcome to Flip.shop, where innovation meets the social commerce revolution! Fresh off our Series C funding round, we've raised $144 million, propelling our valuation to an impressive $1.05 billion. We’re redefining the shopping experience by giving consumers a voice in a space dominated by tech giants. Join us on this exhilarating journey where your technical skills will play a pivotal role in shaping the future of social commerce!


Why Join Us?

At Flip.shop, you’ll be at the forefront of innovation in social commerce. This isn’t just a job; it’s a chance to build infrastructure that empowers our AI-driven platform to scale and deliver personalized shopping experiences. You will partner directly with, and learn from, engineers and scientists who joined us from leading big-tech companies.

If you thrive in a fast-paced, collaborative environment where you can develop high-performance systems, we want to hear from you!


Role Overview:

We are seeking a Senior Machine Learning Engineer - Machine Learning Infrastructure to design, build, and optimize the infrastructure that powers our machine learning systems. You’ll ensure the efficient deployment, scaling, and monitoring of machine learning models, and will help streamline the development lifecycle. This role offers the opportunity to create scalable, production-level systems that support real-time recommendations and drive business growth.

Responsibilities:

  • Infrastructure Development: Design and implement scalable infrastructure for deploying, monitoring, and maintaining machine learning models in production environments. Design and implement machine learning systems for feeds, ads, and search ranking models.
  • Training Infrastructure: Optimize the serving and training infrastructure of machine learning models.
  • Model Training: Enhance the workflows for model training and serving, data pipelines, storage systems, and resource management within multi-tenant machine learning systems.
  • Tooling & Automation: Build tools to automate workflows for model training, testing, and deployment, ensuring that machine learning models can move quickly from development to production.
  • Performance Optimization: Ensure the infrastructure supports high-performance model inference at scale, with a focus on minimizing latency and maximizing throughput.
  • Collaboration: Work closely with data scientists, machine learning engineers, and DevOps teams to create seamless integration between development and production environments.
  • Monitoring & Maintenance: Build robust monitoring systems to track model performance and infrastructure health, ensuring reliability and uptime of machine learning services.
  • Security & Compliance: Implement best practices in infrastructure security, data privacy, and compliance, particularly when handling sensitive user data.

Requirements:

  • Education: Bachelor's degree or higher in Computer Science or a related field, with 3+ years of experience in building scalable systems.
  • Technical Skills: Proficiency in one or two programming languages (C/C++, Golang) within a Linux environment.
  • Solid understanding of GPU hardware architecture, GPU software stack (CUDA, cuDNN), and experience in GPU performance analysis.
  • Experience in deep model inference/training, debugging, and tuning.
  • ML Workflow Knowledge: Familiarity with mainstream machine learning frameworks (e.g., TensorFlow, PyTorch, MXNet).
  • Familiarity with MLOps practices.
  • Experience with big data frameworks (e.g., Spark, Hadoop, Flink) and resource management and task scheduling for large-scale distributed systems.
  • Open-source: Experience in using or designing open-source machine learning lifecycle management systems like TFX.

Key Skills

  • Excellent logical analysis and problem-solving skills with the ability to abstract and decompose complex business logic.
  • Strong sense of responsibility, good learning ability, communication skills, and self-motivation, with the ability to respond and act quickly.
  • Good documentation habits, with workflow and technical documentation written and updated promptly.

Why You’ll Love Working Here:

At Flip.shop, you’ll have the opportunity to build the backbone of our AI-driven platform, working on cutting-edge infrastructure that powers personalized shopping experiences for millions of users. Your work will directly contribute to scaling our machine learning systems, ensuring they run efficiently in a high-performance production environment. This is your chance to have a lasting impact and help Flip.shop shape the future of social commerce.


Ready to Build the Future?

If you're passionate about building scalable infrastructure and driving innovation in machine learning at scale, join us at Flip.shop! Let’s redefine the future of online shopping together.


Compensation & Benefits:

Base salary and total compensation will vary based on factors including but not limited to location, experience, and performance. Please note the base salary is just one component of the company’s total rewards package for exempt employees. Other rewards may include equity, bonuses, long-term incentives, a PTO policy, and other progressive benefits.

Top Skills

C/C++
CUDA
cuDNN
Flink
Go
Hadoop
Linux
MXNet
PyTorch
Spark
TensorFlow


