Welcome to Koality
Koality is a Python library for data quality checks and monitoring based on DuckDB. It provides a flexible framework for validating data in tables, detecting anomalies, and ensuring data quality across your data pipelines.
Features
- Multiple Check Types: Null ratio, regex matching, value sets, duplicates, counts, match rates, and more
- YAML Configuration: Define checks declaratively in YAML files with global defaults and check bundles
- DuckDB Backend: Fast, in-process analytical database for executing checks
- CLI Interface: Run checks from the command line with the koality command
- Extensible: Easy to add custom check types by extending base classes
- Result Persistence: Store check results in database tables for historical tracking
Quick Start
from koality import CheckExecutor
from koality.models import Config
from pydantic_yaml import parse_yaml_raw_as

# Load configuration from YAML
with open("checks.yaml") as f:
    config = parse_yaml_raw_as(Config, f.read())

# Execute checks
executor = CheckExecutor(config)
results = executor()

# Check for failures
if executor.check_failed:
    print(executor.get_failed_checks_msg())
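In a pipeline or CI job you will usually want a failed check to fail the run as well. The following is a minimal sketch of one way to do that. It relies only on the `check_failed` attribute and `get_failed_checks_msg()` method shown above; the helper name `exit_code_for` is hypothetical, and the `SimpleNamespace` object is a stand-in so the sketch runs without a database, not a real `CheckExecutor`.

```python
import io
import sys
from types import SimpleNamespace


def exit_code_for(executor, stream=sys.stderr):
    """Return 1 (and print the failure message) if any check failed, else 0.

    `executor` is expected to expose `check_failed` and
    `get_failed_checks_msg()`, as in the Quick Start snippet.
    """
    if executor.check_failed:
        print(executor.get_failed_checks_msg(), file=stream)
        return 1
    return 0


# Stand-in object for illustration only (assumption: mimics the two
# members used above; it is not part of Koality).
fake = SimpleNamespace(
    check_failed=True,
    get_failed_checks_msg=lambda: "1 check failed",
)
buf = io.StringIO()
code = exit_code_for(fake, stream=buf)
```

With a real executor, the last step of a script could then be `sys.exit(exit_code_for(executor))`.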
Installation
pip install koality
Or with uv:
uv add koality
Available Checks
| Check | Description |
|---|---|
| NullRatioCheck | Validates the ratio of null values in a column |
| RegexMatchCheck | Checks if values match a regex pattern |
| ValuesInSetCheck | Validates values are within an allowed set |
| RollingValuesInSetCheck | Rolling window version of ValuesInSetCheck |
| DuplicateCheck | Detects duplicate values in a column |
| CountCheck | Validates row counts or distinct counts |
| OccurrenceCheck | Checks min/max occurrence of values |
| MatchRateCheck | Validates join match rates between tables |
| RelCountChangeCheck | Detects relative count changes over time |
| IqrOutlierCheck | Detects outliers using interquartile range |
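To give a feel for the idea behind an interquartile-range check, here is a small plain-Python sketch of the standard IQR rule: a value counts as an outlier if it lies more than k times the IQR below the first quartile or above the third. This is not Koality's implementation, which runs its checks in DuckDB; the function name `iqr_outliers`, the multiplier `k=1.5`, the quantile method, and the sample data are all assumptions made for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical daily row counts with one suspicious spike.
daily_row_counts = [100, 102, 98, 101, 99, 103, 100, 250]
outliers = iqr_outliers(daily_row_counts)
print(outliers)  # [250]
```

For this sample, Q1 is 99.75 and Q3 is 102.25, so the accepted range is 96 to 106 and only the spike of 250 falls outside it.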
Documentation
- Getting Started - Installation and first steps
- Configuration - YAML configuration reference
- API Reference - Full API documentation