Welcome to Koality

Koality is a Python library for data quality checks and monitoring based on DuckDB. It provides a flexible framework for validating data in tables, detecting anomalies, and ensuring data quality across your data pipelines.

Features

  • Multiple Check Types: Null ratio, regex matching, value sets, duplicates, counts, match rates, and more
  • YAML Configuration: Define checks declaratively in YAML files with global defaults and check bundles
  • DuckDB Backend: Fast, in-process analytical database for executing checks
  • CLI: Run checks from the command line with the koality command
  • Extensible: Add custom check types by extending the base classes
  • Result Persistence: Store check results in database tables for historical tracking

Quick Start

from koality import CheckExecutor
from koality.models import Config
from pydantic_yaml import parse_yaml_raw_as

# Load configuration from YAML
with open("checks.yaml") as f:
    config = parse_yaml_raw_as(Config, f.read())

# Execute checks
executor = CheckExecutor(config)
results = executor()

# Check for failures
if executor.check_failed:
    print(executor.get_failed_checks_msg())

Installation

pip install koality

Or with uv:

uv add koality

Available Checks

Check                    Description
NullRatioCheck           Validates the ratio of null values in a column
RegexMatchCheck          Checks if values match a regex pattern
ValuesInSetCheck         Validates values are within an allowed set
RollingValuesInSetCheck  Rolling window version of ValuesInSetCheck
DuplicateCheck           Detects duplicate values in a column
CountCheck               Validates row counts or distinct counts
OccurrenceCheck          Checks min/max occurrence of values
MatchRateCheck           Validates join match rates between tables
RelCountChangeCheck      Detects relative count changes over time
IqrOutlierCheck          Detects outliers using interquartile range
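
To make a few of these descriptions concrete, here is a plain-Python sketch of the quantities that a null-ratio check, an IQR outlier check, and a relative count change check typically compute. It is not Koality's implementation (Koality runs its checks on DuckDB); the function names, the k = 1.5 fence multiplier, and the quartile method are assumptions chosen for illustration.

```python
from statistics import quantiles


def null_ratio(values):
    """Fraction of entries that are None (0.0 for an empty column)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR].

    Quartiles come from statistics.quantiles with its default
    ("exclusive") method; other tools may interpolate differently.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def rel_count_change(previous, current):
    """Relative change in row count between two snapshots."""
    return (current - previous) / previous


print(null_ratio([1, None, 3, None]))                # 0.5
print(iqr_outliers([10, 11, 12, 13, 14, 15, 100]))   # [100]
print(rel_count_change(1000, 800))                   # -0.2
```

A check of this kind would then compare the computed value against a configured threshold, for example failing when the null ratio exceeds an allowed maximum.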

Documentation