Quality Control for Machine Learning Observability and Monitoring
What is it?
Scouter is a developer-first monitoring toolkit for machine learning workflows (data, models, GenAI workflows, and more). It is designed to be easy to use, flexible, performant, and extensible, allowing you to customize it to fit your specific needs. It's built on top of the Rust programming language and uses Postgres as its primary data store.
Check our current roadmap and tasks
Developer-First Experience
- Zero-friction Integration - Drop into existing ML workflows in minutes
- Type-safe by Design - Rust in the back, Python in the front. Catch errors before they hit production
- Minimal Dependency Overhead - One dependency for monitoring. No need to install multiple libraries
- Standardized Patterns - Out-of-the-box, easy-to-use patterns for common monitoring tasks
- Integrations - Works out of the box with any Python API framework. Integrations for event-driven workflows (Kafka and RabbitMQ)
Production Ready
- High-Performance Server - Built with Rust and Axum for speed, reliability, and concurrency
- Cloud-Ready - Native support for AWS, GCP, and Azure
- Modular Design - Use what you need, leave what you don't
- Alerting and Monitoring - Built-in alerting integrations with Slack and OpsGenie to notify you and your team when an alert is triggered
- Data Retention - Built-in data retention policies to keep your database clean and performant
Scouter is written in Rust and is exposed via a Python API built with PyO3.
Quick Start
Scouter follows a client and server architecture: the client is a lightweight library that can be dropped into any Python application, while the server is a Rust-based service (set up separately) that handles the heavy lifting of data collection, storage, and querying.
Install Scouter
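A minimal install sketch, assuming the Python client is published on PyPI as scouter-ml (verify the exact package name in the project's installation docs):

pip install scouter-ml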
Population Stability Index (PSI) Example - Client
import numpy as np
import pandas as pd

from scouter.client import ScouterClient  # Get the scouter client in order to interact with the server
from scouter.drift import Drifter, PsiDriftConfig


def generate_data() -> pd.DataFrame:
    """Create a fake data frame for testing"""
    n = 10_000

    X_train = np.random.normal(-4, 2.0, size=(n, 4))

    col_names = []
    for i in range(0, X_train.shape[1]):
        col_names.append(f"col_{i}")

    X = pd.DataFrame(X_train, columns=col_names)

    return X


if __name__ == "__main__":
    # Drifter class for creating drift profiles
    drifter = Drifter()
    client = ScouterClient()

    # get fake data
    data = generate_data()

    # Create a PSI config (monitor one of the generated columns)
    psi_config = PsiDriftConfig(
        name="test",
        space="test",
        version="0.0.1",
        features_to_monitor=["col_0"],
    )

    # Create drift profile
    psi_profile = drifter.create_drift_profile(data, psi_config)

    # register drift profile
    client.register_profile(psi_profile)
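For context, PSI compares the binned distribution of a baseline (e.g. training) sample against current data and sums the divergence across bins. The sketch below is a generic NumPy illustration of the metric, not Scouter's internal implementation; the function name and binning choices are assumptions made for the example.

import numpy as np


def population_stability_index(baseline: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Generic PSI: sum((actual% - expected%) * ln(actual% / expected%)) over shared bins."""
    # Derive bin edges from the baseline so both samples are bucketed the same way
    edges = np.histogram_bin_edges(baseline, bins=bins)
    expected, _ = np.histogram(baseline, bins=edges)
    actual, _ = np.histogram(current, bins=edges)

    # Convert counts to proportions and clip to avoid division by zero / log(0)
    eps = 1e-6
    expected = np.clip(expected / expected.sum(), eps, None)
    actual = np.clip(actual / actual.sum(), eps, None)

    return float(np.sum((actual - expected) * np.log(actual / expected)))


baseline = np.random.normal(-4, 2.0, size=10_000)
current = np.random.normal(-3.5, 2.0, size=10_000)
print(population_stability_index(baseline, current))

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift, though thresholds should be tuned per feature.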
Custom Metric Example - Client
import numpy as np
import pandas as pd

from scouter.alert import AlertThreshold  # adjust this import path if AlertThreshold lives elsewhere in your version
from scouter.client import ScouterClient  # Get the scouter client in order to interact with the server
from scouter.drift import (
    CustomDriftProfile,
    CustomMetric,
    CustomMetricDriftConfig,
    Drifter,
)


def generate_data() -> pd.DataFrame:
    """Create a fake data frame for testing"""
    n = 10_000

    X_train = np.random.normal(-4, 2.0, size=(n, 4))

    col_names = []
    for i in range(0, X_train.shape[1]):
        col_names.append(f"col_{i}")

    X = pd.DataFrame(X_train, columns=col_names)

    return X


if __name__ == "__main__":
    # Drifter class for creating drift profiles
    drifter = Drifter()
    client = ScouterClient()

    # get fake data
    data = generate_data()

    # Create a custom config
    custom_config = CustomMetricDriftConfig(
        name="test",
        space="test",
        version="0.0.1",
    )

    # Create drift profile
    custom_profile = CustomDriftProfile(
        config=custom_config,
        metrics=[
            CustomMetric(
                name="mae",
                value=10,
                alert_threshold=AlertThreshold.Above,  # any value above 10 will trigger an alert
            ),
        ],
    )

    # register drift profile
    client.register_profile(custom_profile)
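The value of 10 above is a placeholder baseline for the mae metric. In practice you would typically derive the baseline from held-out predictions before registering the profile; the sketch below is illustrative only, and its variables are assumptions rather than part of Scouter's API.

import numpy as np

# Illustrative only: compute a baseline MAE from held-out labels and predictions,
# then use it as the CustomMetric value when building the profile.
y_true = np.random.normal(0, 1, size=1_000)
y_pred = y_true + np.random.normal(0, 0.5, size=1_000)

baseline_mae = float(np.mean(np.abs(y_true - y_pred)))
print(baseline_mae)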