Summaries for Timing
A gauge can only hold the last value it was set to, so how can we time events and measure latency?
The answer is to use a summary. It tracks both the total time taken by events and how many events there were:
```java
package io.robustperception.java_examples;

import io.prometheus.client.Summary;
import io.prometheus.client.hotspot.DefaultExports;
import io.prometheus.client.exporter.HTTPServer;
import java.util.Random;

public class JavaExample {
  // Summary tracking how long myFunction takes, exposed as my_function_latency_seconds.
  static final Summary functionLatency = Summary.build()
      .name("my_function_latency").unit("seconds")
      .help("Latency of my function").register();

  static void myFunction() throws Exception {
    Summary.Timer requestTimer = functionLatency.startTimer();
    try {
      // Simulate up to one second of work.
      Thread.sleep(new Random().nextInt(1000));
    } finally {
      // Record the elapsed time even if the work above throws.
      requestTimer.observeDuration();
    }
  }

  public static void main(String[] args) throws Exception {
    // Also export the standard JVM metrics.
    DefaultExports.initialize();
    // Serve the metrics over HTTP on port 8000.
    HTTPServer server = new HTTPServer(8000);
    while (true) {
      myFunction();
      Thread.sleep(1000);
    }
  }
}
```
Here startTimer() is called when you want to start timing, and observeDuration() when you want to stop. The try..finally ensures the duration is still recorded if the timed code throws an exception.
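To make concrete what the summary records, the following dependency-free sketch mimics the startTimer()/observeDuration() pattern with a hypothetical MiniSummary class. It is an illustration under simplifying assumptions, not the Prometheus client's implementation: it keeps only a running sum and count, and it shows that the finally block still records the duration when the timed code throws.

```java
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical, simplified stand-in for a summary: it only keeps a running
// sum of observed values and a count of observations (no quantiles, no export).
class MiniSummary {
    private final DoubleAdder sum = new DoubleAdder();
    private final LongAdder count = new LongAdder();

    void observe(double value) {
        sum.add(value);
        count.increment();
    }

    double sum() { return sum.sum(); }

    long count() { return count.sum(); }

    // Remember the start time now; observeDuration() records the elapsed seconds later.
    Timer startTimer() { return new Timer(this, System.nanoTime()); }

    static final class Timer {
        private final MiniSummary summary;
        private final long startNanos;

        Timer(MiniSummary summary, long startNanos) {
            this.summary = summary;
            this.startNanos = startNanos;
        }

        double observeDuration() {
            double seconds = (System.nanoTime() - startNanos) / 1e9;
            summary.observe(seconds);
            return seconds;
        }
    }
}

public class MiniSummaryDemo {
    static final MiniSummary latency = new MiniSummary();

    static void work(boolean fail) {
        MiniSummary.Timer timer = latency.startTimer();
        try {
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        } finally {
            // Runs whether or not the body threw, so every call is timed and counted.
            timer.observeDuration();
        }
    }

    public static void main(String[] args) {
        work(false);
        try {
            work(true);
        } catch (IllegalStateException expected) {
            // The failure propagates, but its duration was still observed.
        }
        System.out.println("count = " + latency.count()); // prints "count = 2"
        System.out.println("sum = " + latency.sum() + " seconds");
    }
}
```

The sum and count here play the same roles as the _sum and _count series a real summary exposes.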
The metrics output will include my_function_latency_seconds_sum and my_function_latency_seconds_count, and from these the average latency in seconds over the last minute can be calculated with the PromQL expression rate(my_function_latency_seconds_sum[1m]) / rate(my_function_latency_seconds_count[1m]).
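The ratio works because the per-second factors cancel: dividing the rate of the sum by the rate of the count leaves the average time per event. The sketch below reproduces that arithmetic from two samples of each counter, using a hypothetical AverageLatency class and made-up sample values. It is a simplification: PromQL's rate() also handles counter resets and extrapolates to the edges of the range, which this code does not.

```java
// Simplified illustration of
//   rate(my_function_latency_seconds_sum[1m]) / rate(my_function_latency_seconds_count[1m])
// using two samples of each counter taken `seconds` apart. Unlike PromQL's rate(),
// this ignores counter resets and does no extrapolation to the window edges.
public class AverageLatency {
    // Per-second increase of a counter between two samples.
    static double simpleRate(double earlier, double later, double seconds) {
        return (later - earlier) / seconds;
    }

    // Average seconds per event over the interval. The division by `seconds`
    // cancels out, so mathematically this is (increase in sum) / (increase in count).
    // With no events it is 0.0 / 0.0, i.e. NaN, as in PromQL.
    static double averageLatency(double sum0, double sum1,
                                 double count0, double count1, double seconds) {
        return simpleRate(sum0, sum1, seconds) / simpleRate(count0, count1, seconds);
    }

    public static void main(String[] args) {
        // Made-up samples one minute apart: 30 more seconds of latency over 60 more events.
        double avg = averageLatency(100.0, 130.0, 200, 260, 60);
        System.out.println("average latency: " + avg + " s"); // prints "average latency: 0.5 s"
    }
}
```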