Hey Aaron, I work on Cortex, a tool for continuously deploying models as HTTP endpoints on AWS. Under the hood we use Kubernetes instead of Lambda to avoid cold starts, to allow more flexibility in customizing compute and memory usage (e.g. running inference on GPUs), and to support spot instances. Could you clarify your comment about editing config files? Is it still a problem if the configuration is declarative and tracked in git? I'd love to hear your feedback! (GitHub: https://github.com/cortexlabs/cortex | website: https://cortex.dev/)
Sure, I'm thinking about the development lifecycle in terms of what actions data scientists have to take to get a model deployed. Any time the process has a branch (i.e., you need to change this file whenever something changes elsewhere), I know I'm going to forget to do it.
If we were to use Cortex, we would likely wrap the creation of cortex.yaml in a function and call it when we save our models. We do something similar right now and store the metadata in JSON files for later deployment. I love tracking config in git too.
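Roughly, something like this (a minimal sketch; the metadata fields and the cortex.yaml keys shown are placeholders, not the real Cortex schema):

    import json

    def save_model_with_config(model_name, model_path, meta_path="meta.json"):
        # Store the metadata we already track for later deployment.
        meta = {"name": model_name, "model": model_path}
        with open(meta_path, "w") as f:
            json.dump(meta, f)

        # Emit the deployment config in the same step, so cortex.yaml
        # can never drift out of sync with the saved model.
        config = "- name: {name}\n  model: {model}\n".format(**meta)
        with open("cortex.yaml", "w") as f:
            f.write(config)

    save_model_with_config("classifier", "s3://my-bucket/models/classifier")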
That makes sense. Programmatically updating cortex.yaml is a common use case, especially when you're thinking about continuous deployment. We also have a Python client that can replace the cortex.yaml file (https://www.cortex.dev/deployments/python-client).
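Something along these lines, hypothetically (the method names and spec keys here are illustrative rather than the actual client API; see the linked docs for the real interface):

    import cortex

    # Illustrative only -- consult the Python client docs linked above
    # for the real method names and deployment spec keys.
    cx = cortex.client("aws")
    cx.deploy({
        "name": "classifier",
        "model": "s3://my-bucket/models/classifier",
    })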
From the MLflow Models docs: "An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools—for example, real-time serving through a REST API or batch inference on Apache Spark. The format defines a convention that lets you save a model in different “flavors” that can be understood by different downstream tools."
Cortex is what they are referring to as a downstream tool for real-time serving through a REST API. In other words, MLflow helps with model management and packaging, whereas Cortex is a platform for running real-time inference at scale. We are working on supporting more model packaging formats and I think it's a good idea to support the MLflow format as well.
Each model is loaded into a Docker container, along with any Python packages and request handling code. The cluster runs on EKS in your AWS account. Cortex applies the declarative configuration from 'cortex.yaml' every time you run 'cortex deploy', so the containers don't change unless you run 'cortex deploy' again with updated configuration. This post goes into more detail about some of our design decisions: https://towardsdatascience.com/inference-at-scale-49bc222b3a...
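For reference, a minimal 'cortex.yaml' looks roughly like this (a sketch only; the exact keys are in our docs, and the S3 path is made up):

    # Illustrative sketch -- see the Cortex docs for the real schema.
    - name: classifier
      model: s3://my-bucket/models/classifier
      compute:
        cpu: 1
        gpu: 1
        mem: 2G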
Contributor here - Cortex supports TensorFlow SavedModels in addition to ONNX. PyTorch support is on the roadmap. Do you have specific frameworks in mind that you would like Cortex to support?
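In the meantime, one common path is exporting a PyTorch model to ONNX and serving that (a minimal sketch; the model and input shape are placeholders):

    import torch
    import torchvision

    # Placeholder model and input shape -- substitute your own.
    model = torchvision.models.resnet18(pretrained=True)
    model.eval()
    dummy_input = torch.randn(1, 3, 224, 224)

    # Export to ONNX, which Cortex can serve directly.
    torch.onnx.export(model, dummy_input, "model.onnx")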
My understanding is that Seldon and Kubeflow are more geared towards infrastructure engineers. Our goal is to hide the infrastructure tooling so that Kubernetes, Docker, or AWS expertise isn't required. Cortex installs with one command, models are deployed with minimal declarative configuration, autoscaling works by default, and you don't need to build Docker images or manage a registry.
Thanks for the feedback! We aren't trying to invent another infrastructure provisioning language, and I agree that Terraform would be the right choice if that were the case. Our YAML is closer to the configuration of deployment tools like Netlify or CircleCI. We use CloudFormation and Kubernetes under the hood, but our goal is to provide a much higher-level abstraction for data scientists and ML engineers.
Not entirely. The abstractions for infrastructure deployment are very different from those of a CircleCI configuration YAML.
The declaration of deployment state is a very big, hard problem that has had millions of collective man-hours spent on it over decades. I urge you not to think of it as simple configuration.
In fact, it's so hard that AWS had to build the CDK on top of TypeScript, as an alternative to the CloudFormation templates it already had.
The Terraform provider idea is interesting; I'll think about it more carefully. Almost all of our deployment configuration under the hood is done with Kubernetes (which is focused on the declaration of deployment state). We modeled our configuration after Kubernetes for that reason, and we want to go beyond low-level infrastructure configuration by allowing users to configure prediction tracking, model retraining thresholds, and other ML-specific features using the same declarative paradigm, in the same configuration files.
Well, the CDK actually produces CloudFormation templates. Sorry, but I always feel the urge to jump in when people claim Terraform should be used instead of CloudFormation out of personal preference. If you're AWS-native and already using CloudFormation, I see no reason to switch. CloudFormation provides a ton of functionality out of the box, and Amazon handles it for you. Rollbacks alone are a huge reason one might want to use it over Terraform.