pixaris.metrics package
Submodules
pixaris.metrics.base module
pixaris.metrics.iou module
- class pixaris.metrics.iou.IoUMetric(reference_images: Iterable[Image])
Bases: BaseMetric
- calculate(generated_images: Iterable[Image]) → dict
Calculate the Intersection over Union (IoU) for a list of generated images.
- Parameters:
generated_images (Iterable[Image]) – A list of generated images.
- Returns:
A dictionary containing a single entry, “iou”, with the average IoU score.
- Return type:
dict
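A minimal usage sketch (hypothetical; the file names and the printed score are illustrative, not part of the package docs):

    from PIL import Image
    from pixaris.metrics.iou import IoUMetric

    # Illustrative file names; any PIL images work.
    reference_masks = [Image.open("ref_mask_1.png"), Image.open("ref_mask_2.png")]
    generated_masks = [Image.open("gen_mask_1.png"), Image.open("gen_mask_2.png")]

    metric = IoUMetric(reference_images=reference_masks)
    print(metric.calculate(generated_masks))  # e.g. {'iou': 0.87}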
pixaris.metrics.llm module
- class pixaris.metrics.llm.BaseLLMMetric(prompt: str, sample_size: int = 3, **reference_images: list[Image])
Bases: BaseMetric
BaseLLMMetric is a base class for metrics that use a Gemini large language model (LLM) to evaluate images.
- Parameters:
prompt (str) – The prompt string for the LLM. The prompt must specify what is to be evaluated and request JSON-formatted output, e.g. ‘{“base_llm_metric”: x}’ where x is the score; a single response can also contain multiple scores.
sample_size (int, optional) – The number of times to call the LLM for the same image. Defaults to 3.
reference_images (dict[str, list[Image]]) – A dictionary of reference images.
- calculate(evaluation_images: list[Image]) → dict
Calculate the LLM metrics for a list of evaluation images.
- Parameters:
evaluation_images (list[Image]) – A list of evaluation images.
- Returns:
A dictionary containing the LLM metrics for the evaluation images.
- Return type:
dict
- Raises:
ValueError – If the number of evaluation images does not match the number of reference images.
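A sketch of constructing a custom LLM metric. The prompt wording, the JSON key “realism”, and the keyword name object_images are free choices for this illustration, not part of the documented API:

    from PIL import Image
    from pixaris.metrics.llm import BaseLLMMetric

    prompt = (
        "Rate how realistic the object in the image looks, from 0 to 1. "
        'Respond with JSON only, formatted as {"realism": x} where x is the score.'
    )
    metric = BaseLLMMetric(
        prompt,
        sample_size=3,  # query the LLM three times per image and aggregate
        object_images=[Image.open("object_1.png")],  # reference images via **kwargs
    )
    scores = metric.calculate([Image.open("generated_1.png")])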
- class pixaris.metrics.llm.ErrorLLMMetric
Bases: BaseLLMMetric
ErrorLLMMetric is a class inheriting from BaseLLMMetric that uses a Gemini LLM to find errors in the generated images, such as missing objects, incorrect colors, or other visual artifacts. It does not require any reference images. The metric ranges from 0 (many errors) to 1 (no errors).
- calculate(evaluation_images: list[Image]) → dict
Calculate the Error LLM metrics for a list of evaluation images.
- Parameters:
evaluation_images (list[Image]) – A list of evaluation images.
- Returns:
A dictionary containing the LLM metrics for the evaluation images.
- Return type:
dict
- Raises:
ValueError – If the number of evaluation images does not match the number of reference images.
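A minimal usage sketch (file names are illustrative); no reference images are needed:

    from PIL import Image
    from pixaris.metrics.llm import ErrorLLMMetric

    metric = ErrorLLMMetric()
    scores = metric.calculate([Image.open("gen_1.png"), Image.open("gen_2.png")])
    # Scores close to 1 indicate few visual errors; close to 0, many.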
- class pixaris.metrics.llm.SimilarityLLMMetric(reference_images: list[Image])
Bases: BaseLLMMetric
SimilarityLLMMetric is a class inheriting from BaseLLMMetric that uses a Gemini LLM to evaluate the similarity between images. It compares the objects in the evaluation images against the objects in a set of reference images and calculates a rough estimate of how similar they are. The metric ranges from 0 (not similar) to 1 (very similar).
- Parameters:
reference_images (list[Image]) – A list of reference images to compare against.
- calculate(evaluation_images: list[Image]) → dict
Calculate the LLM metrics for a list of evaluation images.
- Parameters:
evaluation_images (list[Image]) – A list of evaluation images.
- Returns:
A dictionary containing the LLM metrics for the evaluation images.
- Return type:
dict
- Raises:
ValueError – If the number of evaluation images does not match the number of reference images.
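A minimal usage sketch (file names are illustrative). The reference and evaluation lists must have the same length, otherwise calculate raises a ValueError:

    from PIL import Image
    from pixaris.metrics.llm import SimilarityLLMMetric

    reference_images = [Image.open("ref_1.png"), Image.open("ref_2.png")]
    metric = SimilarityLLMMetric(reference_images=reference_images)
    scores = metric.calculate([Image.open("gen_1.png"), Image.open("gen_2.png")])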
- class pixaris.metrics.llm.StyleLLMMetric(**reference_images: list[Image])
Bases: BaseLLMMetric
StyleLLMMetric is a class inheriting from BaseLLMMetric that uses a Gemini LLM to evaluate the style of images. It compares the style of the generated images against one or more reference style images. The metric ranges from 0 (poor style match) to 1 (excellent style match).
- Parameters:
reference_images – Keyword arguments (**kwargs) mapping names to lists of reference images to compare against.
- Example:
style_images = [image1, image2]
object_images = [image3, image4]
This will compare image1 and image3 to the first evaluation image, and image2 and image4 to the second evaluation image.
- calculate(evaluation_images: list[Image]) → dict
Calculate the Style LLM metrics for a list of evaluation images.
- Parameters:
evaluation_images (list[Image]) – A list of evaluation images.
- Returns:
A dictionary containing the LLM metrics for the evaluation images.
- Return type:
dict
- Raises:
ValueError – If the number of evaluation images does not match the number of reference images.
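A sketch following the pairing described in the example above (file names are illustrative):

    from PIL import Image
    from pixaris.metrics.llm import StyleLLMMetric

    style_images = [Image.open("style_1.png"), Image.open("style_2.png")]
    object_images = [Image.open("object_1.png"), Image.open("object_2.png")]

    # Each keyword argument is one list of reference images; list entries are
    # matched index-wise against the evaluation images.
    metric = StyleLLMMetric(style_images=style_images, object_images=object_images)
    scores = metric.calculate([Image.open("gen_1.png"), Image.open("gen_2.png")])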
pixaris.metrics.luminescence module
- class pixaris.metrics.luminescence.LuminescenceComparisonByMaskMetric(mask_images: Iterable[Image])
Bases: BaseMetric
Calculate the luminescence difference between the masked part and the unmasked part of an image. The metric is calculated as the absolute difference between the average luminescence of the masked part and the unmasked part of the image. The result is a number between 0 and 1. 1 is the best possible score (minimal difference in luminescence of masked and unmasked part) and 0 is the worst score (maximal difference).
- calculate(generated_images: Iterable[Image]) → dict
Calculate the luminescence score of a list of generated images. For each image we calculate the average luminescence of the masked part and the unmasked part, and return the absolute difference between them. Luminescence is a number between 0 and 1, so the result is also a number between 0 and 1. We invert the differences so that 1 is the best score (minimal difference in luminescence between the masked and unmasked parts) and 0 the worst (maximal difference).
- Parameters:
generated_images (Iterable[Image]) – A list of generated images.
- Returns:
A dictionary containing a single entry, “luminescence_difference”, with the average luminescence difference score.
- Return type:
dict
- class pixaris.metrics.luminescence.LuminescenceWithoutMaskMetric
Bases: BaseMetric
Calculates mean and variance of the luminescence of the image.
- calculate(generated_images: Iterable[Image]) → dict
Calculate the luminescence score of a list of generated images. For each image we calculate the mean and variance of the luminescence and return the averages across all images. The results are two numbers between 0 and 1.
- Parameters:
generated_images (Iterable[Image]) – A list of generated images.
- Returns:
A dictionary containing the luminescence statistics (mean and variance).
- Return type:
dict
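A combined usage sketch for both luminescence metrics (file names and the printed score are illustrative):

    from PIL import Image
    from pixaris.metrics.luminescence import (
        LuminescenceComparisonByMaskMetric,
        LuminescenceWithoutMaskMetric,
    )

    generated_images = [Image.open("gen_1.png"), Image.open("gen_2.png")]
    mask_images = [Image.open("mask_1.png"), Image.open("mask_2.png")]

    # Masked comparison: 1 means masked and unmasked regions are equally bright.
    masked = LuminescenceComparisonByMaskMetric(mask_images=mask_images)
    print(masked.calculate(generated_images))  # e.g. {'luminescence_difference': 0.93}

    # Mask-free variant: mean and variance of the luminescence.
    print(LuminescenceWithoutMaskMetric().calculate(generated_images))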
pixaris.metrics.prompts module
pixaris.metrics.saturation module
- class pixaris.metrics.saturation.SaturationComparisonByMaskMetric(mask_images: Iterable[Image])
Bases: BaseMetric
Calculate the saturation difference between the masked part and the unmasked part of an image. The metric is calculated as the absolute difference between the average saturation of the masked part and the unmasked part of the image. The result is a number between 0 and 1 with 1 being the best possible score and 0 being the worst score.
- calculate(generated_images: Iterable[Image]) → dict
Calculate the saturation score of a list of generated images. For each image we calculate the average saturation of the masked part and the unmasked part, and return the absolute difference between them.
- Parameters:
generated_images (Iterable[Image]) – A list of generated images.
- Returns:
A dictionary containing a single entry, “saturation_difference”, with the average saturation difference score.
- Return type:
dict
- class pixaris.metrics.saturation.SaturationWithoutMaskMetric
Bases: BaseMetric
Calculates mean and variance of the saturation of the image.
- calculate(generated_images: Iterable[Image]) → dict
Calculate the saturation score of a list of generated images. For each image we calculate the mean and variance of the saturation and return the averages across all images. The results are two numbers between 0 and 1.
- Parameters:
generated_images (Iterable[Image]) – A list of generated images.
- Returns:
A dictionary containing the saturation statistics (mean and variance).
- Return type:
dict
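The saturation metrics mirror the luminescence ones; a combined sketch (file names and the printed score are illustrative):

    from PIL import Image
    from pixaris.metrics.saturation import (
        SaturationComparisonByMaskMetric,
        SaturationWithoutMaskMetric,
    )

    generated_images = [Image.open("gen_1.png"), Image.open("gen_2.png")]
    mask_images = [Image.open("mask_1.png"), Image.open("mask_2.png")]

    masked = SaturationComparisonByMaskMetric(mask_images=mask_images)
    print(masked.calculate(generated_images))  # e.g. {'saturation_difference': 0.88}

    print(SaturationWithoutMaskMetric().calculate(generated_images))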
pixaris.metrics.utils module
- pixaris.metrics.utils.dict_mean(input_dict_list: Iterable[dict]) → dict
Calculate the mean value for each key in a list of dictionaries.
- Parameters:
input_dict_list (Iterable[dict]) – A list of dictionaries with the same keys.
- Returns:
A dictionary with the mean values for each key.
- Return type:
dict
- Raises:
ValueError – If the input list is empty or if the dictionaries have different keys.
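A small sketch of the expected behaviour (values are illustrative):

    from pixaris.metrics.utils import dict_mean

    per_image_scores = [
        {"iou": 0.8, "luminescence_difference": 0.9},
        {"iou": 0.6, "luminescence_difference": 0.7},
    ]
    print(dict_mean(per_image_scores))
    # {'iou': 0.7, 'luminescence_difference': 0.8} (up to float rounding)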
- pixaris.metrics.utils.normalize_image(image: PIL.Image.Image, max_size=(1024, 1024)) → PIL.Image.Image
Normalize the given image by placing it on a white background, scaling it while preserving aspect ratio, and returning the resulting image.
- Parameters:
image (PIL.Image.Image) – The input image to be normalized.
max_size (tuple[int, int], optional) – The maximum size of the output image, defaults to (1024, 1024).
- Returns:
The normalized image with a white background.
- Return type:
PIL.Image.Image
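A minimal sketch (the file names are illustrative):

    from PIL import Image
    from pixaris.metrics.utils import normalize_image

    image = Image.open("input.png")
    normalized = normalize_image(image, max_size=(1024, 1024))
    # The image is scaled to fit within 1024x1024 with its aspect ratio
    # preserved and placed on a white background.
    normalized.save("normalized.png")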