Summaries for Timing

A gauge holds only the last value it was set to, so how can we time events and measure latency?

The answer is to use a summary. It tracks both the total time taken by events and how many events there were:

import time
import random
from prometheus_client import start_http_server
from prometheus_client import Summary

function_latency = Summary("my_function_latency_seconds",
    "Latency of my_function", unit="seconds")

def my_function():
    with function_latency.time():
        time.sleep(random.random())

if __name__ == "__main__":
    start_http_server(8000)
    while True:
        my_function()
        time.sleep(1)

Here the time() context manager is used to time some code. It can also be used as a function decorator.
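As a minimal sketch of the decorator form, the same kind of summary can wrap a whole function. The names my_decorated_function and decorated_latency are hypothetical, and the separate CollectorRegistry is an assumption made only so the example stays isolated from the default registry and can be inspected directly:

```python
import random
import time

from prometheus_client import CollectorRegistry, Summary

# Assumption for this sketch: a dedicated registry, so the example does not
# clash with metrics already registered on the default registry.
registry = CollectorRegistry()

decorated_latency = Summary(
    "my_decorated_function_latency_seconds",
    "Latency of my_decorated_function",
    registry=registry,
)

# time() used as a decorator: every call to the function is timed.
@decorated_latency.time()
def my_decorated_function():
    # Short random sleep standing in for real work (hypothetical workload).
    time.sleep(random.random() / 100)

# Each call adds one observation to the summary.
my_decorated_function()
my_decorated_function()

# Read the two samples back from the registry.
count = registry.get_sample_value("my_decorated_function_latency_seconds_count")
total = registry.get_sample_value("my_decorated_function_latency_seconds_sum")
print(count, total)
```

The decorator suits cases where the whole function should be timed. The context manager is the better fit when only part of a function is of interest.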

The metrics output will include my_function_latency_seconds_sum and my_function_latency_seconds_count. From these, the average latency in seconds over the last minute can be calculated in PromQL with the expression rate(my_function_latency_seconds_sum[1m]) / rate(my_function_latency_seconds_count[1m]).
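To make the arithmetic behind that expression concrete, here is a rough sketch in plain Python. It assumes two hypothetical scrapes of the _sum and _count series taken 60 seconds apart, and it ignores counter resets and the extrapolation that rate() performs. The function name and the sample values are illustrative only:

```python
def average_latency(sum_start, count_start, sum_end, count_end, window_seconds=60.0):
    """Approximate rate(sum[window]) / rate(count[window]) from two scrapes."""
    # Per-second increase of the total time spent in events.
    sum_rate = (sum_end - sum_start) / window_seconds
    # Per-second increase of the number of events.
    count_rate = (count_end - count_start) / window_seconds
    if count_rate == 0:
        # No events in the window: the division is 0/0, which PromQL reports as NaN.
        return float("nan")
    return sum_rate / count_rate

# Hypothetical samples: 60 events took 30 seconds in total during the window.
result = average_latency(sum_start=12.0, count_start=30, sum_end=42.0, count_end=90)
print(result)  # 0.5 seconds per event on average
```

Because both rates are taken over the same window, the window length cancels out, and the result is simply the time spent divided by the number of events in that window.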
