# Getting Started
This guide will help you get started with Koality for data quality monitoring.
## Installation

### Using pip

```bash
pip install koality
```

### Using uv

```bash
uv add koality
```
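To confirm the installation, you can import the package. A minimal sketch; the `__version__` attribute is an assumption used only for illustration and may not exist in every release:

```python
# Quick sanity check that the package is importable after installation.
import koality

# `__version__` is an assumption used for illustration; fall back if absent.
print("koality imported, version:", getattr(koality, "__version__", "unknown"))
```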
## Supported Databases
Koality uses DuckDB as its query engine. Currently supported:
- DuckDB (in-memory) - Fully supported
- Google Cloud BigQuery - Fully supported via DuckDB's BigQuery extension
Other databases may already work out of the box; additional databases may be supported in future releases by extending the `execute_query` function. Contributions are welcome!
## Basic Usage

### 1. Create a Configuration File

Create a YAML configuration file (`checks.yaml`) that defines your data quality checks:
```yaml
name: my_dqm_checks
database_setup: |
  ATTACH 'my_database.duckdb' AS my_db;
database_accessor: my_db
defaults:
  result_table: dqm_results
  filters:
    partition_date:
      column: date
      value: "2024-01-01"
      type: date
check_bundles:
  - name: orders_checks
    defaults:
      table: orders
    checks:
      - check_type: NullRatioCheck
        check_column: order_id
        lower_threshold: 0.0
        upper_threshold: 0.0
      - check_type: CountCheck
        check_column: "*"
        lower_threshold: 1000
        upper_threshold: 1000000
```
#### Understanding `database_setup` and `database_accessor`

- `database_setup`: SQL commands executed when the `CheckExecutor` initializes. Use this to install extensions, attach databases, and configure connections.
- `database_accessor`: The alias/name of your attached database. Koality uses this to identify the database type and route queries appropriately.
Example for Google Cloud BigQuery:
```yaml
name: bigquery_checks
database_setup: |
  INSTALL bigquery;
  LOAD bigquery;
  ATTACH 'project=my-gcp-project' AS bq (TYPE bigquery, READ_ONLY);
database_accessor: bq
```
When using BigQuery, Koality automatically uses the BigQuery Jobs API for reads (which supports views) and `bigquery_execute()` for writes.
### 2. Run Checks Programmatically
```python
from koality import CheckExecutor
from koality.models import Config
from pydantic_yaml import parse_yaml_raw_as

# Load configuration
with open("checks.yaml") as f:
    config = parse_yaml_raw_as(Config, f.read())

# Create executor and run checks
executor = CheckExecutor(config)
results = executor()

# Handle results
if executor.check_failed:
    print("Some checks failed:")
    print(executor.get_failed_checks_msg())
else:
    print("All checks passed!")
```
### 3. Run Checks via CLI

```bash
koality run --config_path checks.yaml
```
#### Override Configuration at Runtime

Use the `--overwrites` (`-o`) option to override configuration values without modifying the YAML file:
```bash
# Override a filter value (e.g., run for a specific date instead of "yesterday")
koality run --config_path checks.yaml -o defaults.filters.partition_date=2023-06-15

# Override multiple values
koality run --config_path checks.yaml -o defaults.filters.partition_date=2023-06-15 -o defaults.filters.shop_id=SHOP02

# Override other defaults
koality run --config_path checks.yaml -o defaults.identifier_format=column_name -o defaults.monitor_only=true

# Override filter fields (column, operator, etc.)
koality run --config_path checks.yaml -o defaults.filters.partition_date.column=OTHER_DATE_COL

# Override at bundle level (only affects that bundle)
koality run --config_path checks.yaml -o check_bundles.orders_checks.filters.partition_date=2023-06-15

# Override at check level (only affects specific check by index)
koality run --config_path checks.yaml -o check_bundles.orders_checks.0.table=orders_archive
```
Overwrites applied to global defaults propagate to all checks automatically.
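Because overwrites are plain command-line arguments, they are easy to generate from a scheduler or wrapper script. A minimal sketch, assuming the `koality` CLI is on your PATH and that yesterday's date should be used for the `partition_date` filter:

```python
# Invoke the koality CLI with a dynamically computed partition date.
import subprocess
from datetime import date, timedelta

yesterday = (date.today() - timedelta(days=1)).isoformat()

subprocess.run(
    [
        "koality", "run",
        "--config_path", "checks.yaml",
        "-o", f"defaults.filters.partition_date={yesterday}",
    ],
    check=True,  # surface a non-zero exit status as an exception
)
```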
### 4. Validate Configuration

Validate your configuration file without executing checks:

```bash
koality validate --config_path checks.yaml
```
### 5. Print Resolved Configuration

View the fully resolved configuration (after default propagation) in different formats:

```bash
# YAML format (default)
koality print --config_path checks.yaml

# JSON format
koality print --config_path checks.yaml --format json

# Pydantic model representation
koality print --config_path checks.yaml --format model

# Custom indentation
koality print --config_path checks.yaml --format yaml --indent 4

# Preview configuration with overwrites applied
koality print --config_path checks.yaml -o partition_date=2023-06-15
```
Using `koality print` with overwrites is useful for verifying that your overrides are applied correctly before running checks.
## Understanding Check Results
Each check returns a result dictionary with the following fields:
| Field | Description |
|---|---|
| `DATE` | The date the check was run for |
| `METRIC_NAME` | Name of the metric/check |
| `IDENTIFIER` | Identifier value (format depends on the `identifier_format` setting) |
| `TABLE` | Table being checked |
| `COLUMN` | Column being checked |
| `VALUE` | Actual value measured |
| `LOWER_THRESHOLD` | Lower threshold for passing |
| `UPPER_THRESHOLD` | Upper threshold for passing |
| `RESULT` | `SUCCESS`, `FAIL`, `MONITOR_ONLY`, or `ERROR` |
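Results are presumably persisted to the configured `result_table` (here `dqm_results`), so you can also query them after a run. A minimal sketch, assuming the results land in the attached DuckDB file from the example configuration and use the field names above:

```python
# Query persisted results; assumes they are written to the `dqm_results`
# table inside the attached DuckDB file from the example configuration.
import duckdb

con = duckdb.connect("my_database.duckdb", read_only=True)

# Show only failed checks, using the RESULT field documented above.
con.sql("SELECT * FROM dqm_results WHERE RESULT = 'FAIL'").show()
```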
## Next Steps
- Learn about Configuration options
- Explore available Checks
- See the API Reference for detailed documentation