A faster way to ship your models to production

pip install bentoml

BentoML is compatible across machine learning frameworks and standardizes ML model packaging and management for your team.

Tutorial

BentoML로 online model serving을 하는 방법을 설명

scikit-learn
Iris dataset

Bento는 inference requests, 배포를 위한 docker container image를 HTTP or gRPC를 사용해서 쉽게 서빙이 가능

Setup for the tutorial

Google Colab 에서도 가능
아래 Docker로 부터 tutorial notebook 실행도 가능

  
docker run -it --rm -p 8888:8888 -p 3000:3000 -p 3001:3001 bentoml/quickstart:latest

Local 개발 환경에서도 가능
- bentoml/examples를 다운 받아서

Saving Models with BentoML

  
import bentoml

from sklearn import svm
from sklearn import datasets

# Load training data set
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Train the model
clf = svm.SVC(gamma='scale')
clf.fit(X, y)

# Save model to the BentoML local model store
saved_model = bentoml.sklearn.save_model("iris_clf", clf)
print(f"Model saved: {saved_model}")

# Model saved: Model(tag="iris_clf:zy3dfgxzqkjrlgxi")

  
model = bentoml.sklearn.load_model("iris_clf:2uo5fkgxj27exuqj")

# Alternatively, use `latest` to find the newest version
model = bentoml.sklearn.load_model("iris_clf:latest")

Framework 별로 가이드를 확인
pretrained 모델이 이미 있다면 Preparing Models를 참고
저장된 모델을 관리하기 위해서 Managing Models을 참고

Creating a Service

Services는 BentoML의 Core components
$ bentoml serve service:svc --reload로 아래 코드를 실행

  
 # service.py
 import numpy as np
 import bentoml
 from bentoml.io import NumpyNdarray

 iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

 svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

 @svc.api(input=NumpyNdarray(), output=NumpyNdarray())
 def classify(input_series: np.ndarray) -> np.ndarray:
     result = iris_clf_runner.predict.run(input_series)
     return result

요청은 아래 코드로

  
import requests

requests.post(
   "http://127.0.0.1:3000/classify",
   headers={"content-type": "application/json"},
   data="[[5.9, 3, 5.1, 1.8]]",
).text

Using Models in a Service

bentoml.sklearn.get은 모델 저장소에 저장된 모델에 대한 refernce를 생성하고, to_runner는 모델에서 Runner 인스턴스를 생성
Runner abstraction은 inference computation을 schedule을 어떻게 하는지, 추론 호출을 동적으로 일괄 처리하는 방법 및 사용 가능한 모든 하드웨어 리소를 더 잘 활용하는 방법 측면에서 BentoServer에 더 많은 유연성을 제공

  
import bentoml

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
iris_clf_runner.init_local()
iris_clf_runner.predict.run([[5.9, 3., 5.1, 1.8]])

Custom Runners 사용하기 위해서 Runners, Adaptive Batching을 참고

Service API and IO Descriptor

svc.api decorator를 bentoml.Service의 object’s APIs list에 추가
input, output 파라미터는 IO Descriptor object, API 함수의 expected input/output types을 지정하고 HTTP endpoints를 생성
IO Descriptor object는 schema 또는 shape을 사용하여 입출력 유효성 검사가 가능
API function 내에서는 모든 비즈니스 로직, feature fetching, and feature transformation code를 정의할 수 잇음
Model inference 호출은 runner objects를 통해 직접 이루어지며, service object를 생성할때 `bentoml.Service(name=.., runners=[..]) 를 호출에 전달
BentoML은 sync and async endpoints를 제공
- performance sensitive use cases에서는 async API를 대신 고려할 수 있음
  - IO intense workloads가 있는 작업
  - composing multiple models

  
# async로 작성한
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def classify(input_series: np.ndarray) -> np.ndarray:
   result = await iris_clf_runner.predict.async_run(input_series)
   return result

Building a Bento 🍱

서비스 정의가 완료되면 모델과 서비스를 bento로 빌드
Bento는 서비스를 위한 distribution format.
서비스를 실행하는 데 필요한 모든 소스코드, 모델 파일 및 종속성을 포함하는 독립형 archive
To build a Bento
- bentofile.yaml

  
service: "service:svc"  # Same as the argument passed to `bentoml serve`
labels:
   owner: bentoml-team
   stage: dev
include:
- "*.py"  # A pattern for matching which files to include in the bento
python:
   packages:  # Additional pip packages required by the service
   - scikit-learn
   - pandas

production에서 serving을 위한 준비가 끝났다. bnetoml server CLI command를 사용해서 가능

bentoml serve iris_classifier:latest --production

Bento
- BentoML에서 deployment 단위
- 모델 배포 워크플로우에서 추적해야 하는 가장 중요한 artifacts
BentoML은 Bentos를 managing하고 moving 하기 위한 CLI 명령과 API를 제공

Generate Docker Image

bentoml containerize CLI를 통해 production deployment를 위한 Bento로 부터 Docker image를 생성
Docker image TAG는 bento TAG와 동일하게 기본값으로 설정

  
bentoml containerize iris_classifier:latest
bentoml containerize --platform=linux/amd64 iris_classifier:latest # Applie Silicon

BentoServer를 실행

  
 docker run -it --rm -p 3000:3000 iris_classifier:6otbsmxzq6lwbgxi serve --production

Deploying Bentos

BentoML 저장된 모델 포맷, 서비스 API 정의 및 Bento build process를 표준화하여 ML 팀을 위한 다양한 배포 옵션을 제공
health check and prometheus metrics
distributed tracing
logging
Integrations
- Airflow, Apache Flink, Arize AI, MLflow, Spark
Deploying Bento
- Bento 컨테이너 생성하고 커스텀하게 배포
- Yatai
- Bentoct

Reference

bentoML

BentoML