cortex

module

Version v0.25.0, published Dec 23, 2020. License: Apache-2.0

Repository

github.com/cortexlabs/cortex

README

Run inference at scale

Cortex is an open source platform for large-scale inference workloads.

Model serving infrastructure

Supports deploying TensorFlow, PyTorch, sklearn and other models as realtime or batch APIs.
Ensures high availability with availability zones and automated instance restarts.
Runs inference on on-demand instances or spot instances with on-demand backups.
Autoscales to handle production workloads with support for overprovisioning.

Configure a cluster

# cluster.yaml

region: us-east-1
instance_type: g4dn.xlarge
min_instances: 10
max_instances: 100
spot: true
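The fields in cluster.yaml carry a few simple constraints that can be checked before running `cortex cluster up`. The sketch below is a hypothetical pre-flight check: the `ClusterConfig` dataclass and `validate` function are illustrative names, and the rules (for example, that `min_instances` must not exceed `max_instances`) are assumptions, not Cortex's own validation logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterConfig:
    """Hypothetical mirror of the cluster.yaml fields shown above."""
    region: str
    instance_type: str
    min_instances: int
    max_instances: int
    spot: bool = False


def validate(cfg: ClusterConfig) -> List[str]:
    """Return a list of problems; an empty list means the config looks sane.

    These checks are illustrative assumptions, not Cortex's validation rules.
    """
    problems = []
    if cfg.min_instances < 0:
        problems.append("min_instances must be >= 0")
    if cfg.max_instances < 1:
        problems.append("max_instances must be >= 1")
    if cfg.min_instances > cfg.max_instances:
        problems.append("min_instances must not exceed max_instances")
    if not cfg.region:
        problems.append("region is required")
    return problems


if __name__ == "__main__":
    # Same values as the cluster.yaml example above.
    cfg = ClusterConfig(region="us-east-1", instance_type="g4dn.xlarge",
                        min_instances=10, max_instances=100, spot=True)
    print(validate(cfg) or "ok")
```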

Spin up the cluster in your AWS or GCP account

$ cortex cluster up --config cluster.yaml

￮ configuring autoscaling ✓
￮ configuring networking ✓
￮ configuring logging ✓

cortex is ready!

Reproducible deployments

Package dependencies, code, and configuration for reproducible deployments.
Configure compute, autoscaling, and networking for each API.
Integrate with your data science platform or CI/CD system.
Deploy custom Docker images or use the pre-built defaults.

Define an API

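class PythonPredictor:
  def __init__(self, config):
    # load the model once, when the API starts up
    from transformers import pipeline

    self.model = pipeline(task="text-generation")

  def predict(self, payload):
    # payload is the parsed JSON request body, e.g. {"text": "hello world"}
    return self.model(payload["text"])[0]

# pip packages to install for this API
requirements = ["tensorflow", "transformers"]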
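Because the predictor is a plain Python class (a constructor that takes `config` and a `predict` method that takes the request payload), it can be exercised locally before deploying. The sketch below assumes a hypothetical `model` keyword argument added purely for testing, plus a `FakePipeline` stub that mimics the output shape of a text-generation pipeline (a list of dicts with a `generated_text` key), so nothing is downloaded.

```python
class FakePipeline:
    """Hypothetical stand-in for transformers.pipeline(task="text-generation").

    A real text-generation pipeline returns a list of dicts with a
    "generated_text" key; this stub mimics that shape without loading a model.
    """

    def __call__(self, text):
        return [{"generated_text": text + " ..."}]


class PythonPredictor:
    def __init__(self, config, model=None):
        # `model` is an assumed injection point for local testing only;
        # without it, the real pipeline is built as in the README version.
        if model is None:
            from transformers import pipeline
            model = pipeline(task="text-generation")
        self.model = model

    def predict(self, payload):
        # payload is the parsed JSON request body, e.g. {"text": "hello world"}
        return self.model(payload["text"])[0]


if __name__ == "__main__":
    predictor = PythonPredictor(config={}, model=FakePipeline())
    print(predictor.predict({"text": "hello world"}))
    # {'generated_text': 'hello world ...'}
```

Since `model` defaults to `None`, `PythonPredictor(config)` still builds the real pipeline; whether the Cortex runtime tolerates the extra keyword argument is an assumption worth checking before deploying this variant.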

Configure an API

api_spec = {
  "name": "text-generator",
  "kind": "RealtimeAPI",
  "compute": {
    "gpu": 1,
    "mem": "8Gi"
  },
  "autoscaling": {
    "min_replicas": 1,
    "max_replicas": 10
  }
}
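The `compute` and `autoscaling` fields above imply a few constraints: memory appears to be written as a Kubernetes-style quantity such as `8Gi` (8 × 2^30 bytes), and `min_replicas` should not exceed `max_replicas`. A minimal sketch of a hypothetical client-side check follows; `parse_mem` and `check_api_spec` are illustrative names, only binary suffixes are handled, and the default values are assumptions rather than Cortex's documented defaults.

```python
import re

# Binary suffixes as used in Kubernetes resource quantities (Ki, Mi, Gi, Ti).
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_mem(quantity: str) -> int:
    """Convert a quantity like "8Gi" to bytes (binary suffixes only)."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)?", quantity)
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    number, suffix = match.groups()
    # A bare number (no suffix) is interpreted as bytes.
    return int(number) * _BINARY_SUFFIXES.get(suffix, 1)


def check_api_spec(spec: dict) -> None:
    """Raise ValueError if the hypothetical constraints below are violated."""
    scaling = spec.get("autoscaling", {})
    min_r = scaling.get("min_replicas", 1)      # assumed default
    max_r = scaling.get("max_replicas", min_r)  # assumed default
    if not 0 <= min_r <= max_r:
        raise ValueError(
            f"expected 0 <= min_replicas <= max_replicas, got {min_r}, {max_r}")
    compute = spec.get("compute", {})
    if "mem" in compute:
        parse_mem(compute["mem"])  # raises on malformed quantities
    if compute.get("gpu", 0) < 0:
        raise ValueError("gpu must be non-negative")
```

With the `api_spec` above, `check_api_spec(api_spec)` returns without raising, and `parse_mem("8Gi")` gives 8589934592 bytes.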

Scalable machine learning APIs

Scale to handle production workloads with request-based autoscaling.
Stream performance metrics and logs to any monitoring tool.
Serve many models efficiently with multi-model caching.
Use rolling updates to update APIs without downtime.
Configure traffic splitting for A/B testing.
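At their core, request-based autoscaling and traffic splitting can be expressed as small calculations. The sketch below shows one common formulation, not Cortex's actual implementation: `desired_replicas` sizes the deployment so that each replica handles roughly `target_concurrency` in-flight requests, clamped to the `min_replicas`/`max_replicas` bounds from the API spec, and `pick_backend` routes each request to one of several APIs with probability proportional to a hypothetical weight.

```python
import math
import random


def desired_replicas(in_flight: int, target_concurrency: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Replicas needed so each handles ~target_concurrency in-flight requests,
    clamped to the configured bounds."""
    if target_concurrency <= 0:
        raise ValueError("target_concurrency must be positive")
    raw = math.ceil(in_flight / target_concurrency)
    return max(min_replicas, min(max_replicas, raw))


def pick_backend(weights: dict, rng: random.Random) -> str:
    """Choose a backend API name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    print(desired_replicas(25, 4, 1, 10))
    rng = random.Random(42)
    # Hypothetical 80/20 A/B split between two API versions.
    print(pick_backend({"text-generator-a": 80, "text-generator-b": 20}, rng))
```

For example, 25 in-flight requests with a target of 4 per replica gives ceil(25 / 4) = 7 replicas, which falls within the 1 to 10 range configured in `api_spec`.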

Deploy to your cluster

import cortex

cx = cortex.client("aws")
cx.create_api(api_spec, predictor=PythonPredictor, requirements=requirements)

# creating https://example.com/text-generator

Consume your API

$ curl https://example.com/text-generator -X POST -H "Content-Type: application/json" -d '{"text": "hello world"}'
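For clients written in Python rather than shell, the same request can be sent with the standard library. This is a minimal sketch, assuming the API responds with a JSON body (consistent with a predictor that returns a dict, as `PythonPredictor.predict` does); `build_request` and `query` are hypothetical helper names, and example.com is the placeholder endpoint from the deploy output.

```python
import json
import urllib.request


def build_request(url: str, text: str) -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query(url: str, text: str, timeout: float = 30.0):
    """Send the request and decode the (assumed) JSON response."""
    with urllib.request.urlopen(build_request(url, text), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `query("https://example.com/text-generator", "hello world")` sends the request; replace the placeholder URL with your API's actual endpoint.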

Directories

cli
  cluster
  cmd
  local
  types/cliconfig
  types/flags
dev
pkg
  consts
  lib/archive
  lib/aws
  lib/cache
  lib/cast
  lib/configreader
  lib/console
  lib/cron
  lib/debug
  lib/docker
  lib/errors
  lib/exit
  lib/files
  lib/gcp
  lib/hash
  lib/json
  lib/k8s
  lib/maps
  lib/math
  lib/msgpack
  lib/parallel
  lib/pointer
  lib/print
  lib/prompt
  lib/random
  lib/regex
  lib/sets/strset
  lib/sets/strset/threadsafe
  lib/slices
  lib/strings
  lib/table
  lib/telemetry
  lib/time
  lib/urls
  operator
  operator/config
  operator/endpoints
  operator/operator
  operator/resources
  operator/resources/batchapi
  operator/resources/realtimeapi
  operator/resources/trafficsplitter
  operator/schema
  types
  types/clusterconfig
  types/clusterstate
  types/metrics
  types/spec
  types/status
  types/userconfig
