1.Overview

We support two ways to query metrics:

  • Query metrics from the StarRocks data warehouse
  • Query metrics directly from Prometheus

2.StarRocks Metric

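StarRocks metric queries can be implemented the same way as the Prometheus metric queries described in section 3. The only difference is that each metric is defined by a SQL statement instead of a PromQL expression, and the query is sent through the StarRocks API instead of to Prometheus.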
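As a rough illustration of swapping PromQL for SQL, the Python sketch below keeps a StarRocks config with the same Product ID -> Metric Name layout as the Prometheus config in section 3.1 and returns a parameterized SQL statement. It assumes StarRocks is reached over its MySQL-compatible protocol with a DB-API driver that uses `%s` placeholders (such as pymysql); the table `freeleaps_request_stats`, its columns, and `build_starrocks_query` are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical StarRocks counterpart of the PromQL config:
# product ID -> metric name -> SQL statement with two time-range placeholders.
# The table and column names are illustrative assumptions, not a real schema.
STARROCKS_METRICS_CONFIG: dict[str, dict[str, str]] = {
    "freeleaps": {
        "latency_ms": (
            "SELECT bucket_time AS date, AVG(latency_ms) AS value "
            "FROM freeleaps_request_stats "
            "WHERE bucket_time >= %s AND bucket_time < %s "
            "GROUP BY bucket_time ORDER BY bucket_time"
        ),
    },
}


def build_starrocks_query(
    product_id: str, metric_name: str, start: datetime, end: datetime
) -> tuple[str, tuple[datetime, datetime]]:
    """Return (sql, params) ready for a DB-API cursor.execute(sql, params) call."""
    try:
        sql = STARROCKS_METRICS_CONFIG[product_id][metric_name]
    except KeyError:
        raise KeyError(f"no SQL configured for {product_id!r}/{metric_name!r}") from None
    if start >= end:
        raise ValueError("start must be earlier than end")
    # The time range is bound as parameters instead of being formatted into the SQL text.
    return sql, (start, end)


if __name__ == "__main__":
    sql, params = build_starrocks_query(
        "freeleaps",
        "latency_ms",
        datetime(2025, 9, 12, tzinfo=timezone.utc),
        datetime(2025, 9, 16, 1, tzinfo=timezone.utc),
    )
    print(sql)
    print(params)
```

Binding the time range as parameters keeps request input out of the SQL text, which matters once start and end come from API callers.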

3.Prometheus Metric

3.1.Metrics Config

Currently, metrics are configured in code. In the future, they will be configured through a database or another mechanism. The configuration is organized as: Product ID -> Metric Name -> Metric Query (PromQL).

{
  "freeleaps": {
    // Just for demo
    "cpu_usage": "100 - (avg by (instance) (irate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
    // Just for demo
    "memory_usage": "100 - ((node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100)",
    // Just for demo
    "disk_usage": "100 - ((node_filesystem_avail_bytes{mountpoint=\"/\"} / node_filesystem_size_bytes{mountpoint=\"/\"}) * 100)",
    "latency_ms": "1000*avg(freeleaps_notification_http_request_duration_seconds_sum{handler!=\"none\"} / freeleaps_notification_http_request_duration_seconds_count)",
    "reliability": "1 - sum(rate(freeleaps_notification_http_requests_total{status=\"5xx\"}[1m])) / sum(rate(freeleaps_notification_http_requests_total[1m]))"
  },
  "magicleaps": {}
}

To add a new metric, in principle we only need to add one configuration entry, provided that the metric already exists in Prometheus and can be queried directly with PromQL, without any additional processing in code.
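Assuming the service is written in Python, the in-code configuration could be a plain nested dictionary with two small lookup helpers. `METRICS_CONFIG`, `list_metrics` and `get_promql` are hypothetical names, and only two of the demo entries are copied here:

```python
from typing import Optional

# Minimal in-code registry mirroring the structure above:
# product ID -> metric name -> PromQL expression.
METRICS_CONFIG: dict[str, dict[str, str]] = {
    "freeleaps": {
        "cpu_usage": '100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)',
        "memory_usage": "100 - ((node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100)",
    },
    "magicleaps": {},
}


def list_metrics(product_id: str) -> list[str]:
    """Return the metric names configured for a product (empty if the product is unknown)."""
    return list(METRICS_CONFIG.get(product_id, {}))


def get_promql(product_id: str, metric_name: str) -> Optional[str]:
    """Return the PromQL expression for a metric, or None if it is not configured."""
    return METRICS_CONFIG.get(product_id, {}).get(metric_name)
```

With this layout, adding `disk_usage` under `METRICS_CONFIG["freeleaps"]` is enough for both helpers, and any endpoint built on them, to pick it up.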

3.2.API Design

3.2.1.List Available Metrics by Product ID

API: /api/metrics/prometheus/product/{product_id}/available-metrics

Method: GET

Request (path parameter):

product_id=freeleaps

Response:

{
  "product_id": "freeleaps",
  "available_metrics": [
    "cpu_usage",
    "memory_usage",
    "disk_usage",
    "latency_ms",
    "reliability"
  ],
  "total_count": 5,
  "description": "List of metrics with predefined PromQL queries for product 'freeleaps'"
}
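A minimal Python sketch of how this response could be assembled from the in-code config, assuming the config is a nested dictionary; `available_metrics_response` is a hypothetical name and the PromQL strings are shortened to placeholders:

```python
# Hypothetical in-code config; the PromQL strings are shortened to placeholders here.
METRICS_CONFIG: dict[str, dict[str, str]] = {
    "freeleaps": {
        "cpu_usage": "<promql>",
        "memory_usage": "<promql>",
        "disk_usage": "<promql>",
        "latency_ms": "<promql>",
        "reliability": "<promql>",
    },
    "magicleaps": {},
}


def available_metrics_response(product_id: str) -> dict:
    """Build the response body for GET .../product/{product_id}/available-metrics."""
    metrics = list(METRICS_CONFIG.get(product_id, {}))
    return {
        "product_id": product_id,
        "available_metrics": metrics,
        # total_count is derived from the list, so the two cannot disagree.
        "total_count": len(metrics),
        "description": f"List of metrics with predefined PromQL queries for product '{product_id}'",
    }
```

In this sketch an unknown product_id yields an empty list; returning HTTP 404 instead would be an equally reasonable design choice.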

3.2.2.Query Metric Info

API: /api/metrics/prometheus/product/{product_id}/metric/{metric_name}/info

Method: GET

Request (path parameters):

product_id=freeleaps
metric_name=cpu_usage

Response:

{
  "metric_info": {
    "product_id": "freeleaps",
    "metric_name": "cpu_usage",
    "promql_query": "100 - (avg by (instance) (irate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
    "description": "PromQL query for cpu_usage metric in product freeleaps"
  },
  "description": "Information about metric 'cpu_usage' in product 'freeleaps'"
}
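The info response is a direct lookup in the same configuration. A Python sketch under that assumption; `metric_info_response` is a hypothetical name, and returning `None` for an unknown metric (for the HTTP layer to turn into a 404) is an assumed convention:

```python
from typing import Optional

# Hypothetical in-code config holding the demo cpu_usage query from section 3.1.
METRICS_CONFIG: dict[str, dict[str, str]] = {
    "freeleaps": {
        "cpu_usage": '100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)',
    },
}


def metric_info_response(product_id: str, metric_name: str) -> Optional[dict]:
    """Build the body for GET .../metric/{metric_name}/info, or None if the metric is unknown."""
    promql = METRICS_CONFIG.get(product_id, {}).get(metric_name)
    if promql is None:
        return None  # an HTTP layer would typically map this to a 404
    return {
        "metric_info": {
            "product_id": product_id,
            "metric_name": metric_name,
            "promql_query": promql,
            "description": f"PromQL query for {metric_name} metric in product {product_id}",
        },
        "description": f"Information about metric '{metric_name}' in product '{product_id}'",
    }
```

When this dictionary is serialized with `json.dumps`, the inner quotes of the PromQL string appear as `\"idle\"`, as in the sample response.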

3.2.3.Query Metric Data

API: /api/metrics/prometheus/metrics_query

Method: GET

Request:

{
  "product_id": "freeleaps",
  "metric_name": "latency_ms",
  "start_time": "2025-09-12T00:00:00Z",
  "end_time": "2025-09-16T01:00:00Z",
  "step": "1h"  # Interval between data points in the query result
}
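The document does not say how this endpoint talks to Prometheus. One plausible mapping, sketched below in Python, is a call to Prometheus's range-query endpoint `/api/v1/query_range`, whose `query`, `start`, `end` and `step` parameters line up with the request fields once `metric_name` has been resolved to its PromQL expression. `build_query_range_url`, `max_points` and the base URL are hypothetical.

```python
from datetime import datetime
from urllib.parse import urlencode


def _parse_rfc3339(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def build_query_range_url(
    prometheus_url: str, promql: str, start_time: str, end_time: str, step: str
) -> str:
    """Build a Prometheus range-query URL (GET /api/v1/query_range)."""
    if _parse_rfc3339(end_time) <= _parse_rfc3339(start_time):
        raise ValueError("end_time must be later than start_time")
    # Prometheus accepts RFC 3339 timestamps for start/end and durations such as "1h" for step.
    params = {"query": promql, "start": start_time, "end": end_time, "step": step}
    return f"{prometheus_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


def max_points(start_time: str, end_time: str, step_seconds: int) -> int:
    """Upper bound on samples per series: one at start, then one every step up to end."""
    span = _parse_rfc3339(end_time) - _parse_rfc3339(start_time)
    return int(span.total_seconds() // step_seconds) + 1
```

For the sample request, `max_points("2025-09-12T00:00:00Z", "2025-09-16T01:00:00Z", 3600)` gives 98 (97 hourly steps plus the starting point), which agrees with `total_points` in the response; gaps in the underlying data can make the actual count smaller.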

Response:

{
  "metric_name": "latency_ms",
  "data_points": [
    {
      "date": "2025-09-12T08:00:00Z",
      "value": 41.37141507698155,
      "labels": {} # Optional: Prometheus series labels, for debugging only
    },
    {
      "date": "2025-09-12T09:00:00Z",
      "value": 41.371992733188385,
      "labels": {}
    },
    {
      "date": "2025-09-12T10:00:00Z",
      "value": 41.37792878125675,
      "labels": {}
    },
    {
      "date": "2025-09-12T11:00:00Z",
      "value": 41.37297490632533,
      "labels": {}
    },
    ...
    {
      "date": "2025-09-16T08:00:00Z",
      "value": 40.72491916149973,
      "labels": {}
    },
    {
      "date": "2025-09-16T09:00:00Z",
      "value": 40.72186597550194,
      "labels": {}
    }
  ],
  "total_points": 98,
  "time_range": {
    "start": "2025-09-12T00:00:00Z",
    "end": "2025-09-16T01:00:00Z"
  },
  "step": "1h"
}
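Assuming the endpoint forwards the request to Prometheus's range-query API, the raw reply is a `matrix` result: a list of series, each with a `metric` label map and `values` pairs of `[unix_seconds, "value_string"]`. The Python sketch below flattens that into the `data_points` shape shown above; `to_data_points` and `metrics_query_response` are hypothetical names.

```python
from datetime import datetime, timezone


def to_data_points(prom_response: dict) -> list[dict]:
    """Flatten a Prometheus range-query (matrix) reply into data_points entries."""
    if prom_response.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {prom_response.get('error')}")
    data = prom_response["data"]
    if data["resultType"] != "matrix":
        raise ValueError(f"expected a matrix result, got {data['resultType']!r}")
    points = []
    for series in data["result"]:
        labels = series.get("metric", {})
        for timestamp, value in series["values"]:
            points.append(
                {
                    # Unix seconds -> RFC 3339 in UTC, so the "Z" suffix is accurate.
                    "date": datetime.fromtimestamp(float(timestamp), tz=timezone.utc).strftime(
                        "%Y-%m-%dT%H:%M:%SZ"
                    ),
                    # Prometheus encodes sample values as strings ("NaN" and "+Inf" included).
                    "value": float(value),
                    "labels": labels,
                }
            )
    # Fixed-width timestamps sort chronologically as strings.
    points.sort(key=lambda point: point["date"])
    return points


def metrics_query_response(
    metric_name: str, prom_response: dict, start_time: str, end_time: str, step: str
) -> dict:
    """Wrap the converted points in the metrics_query response envelope."""
    points = to_data_points(prom_response)
    return {
        "metric_name": metric_name,
        "data_points": points,
        "total_points": len(points),
        "time_range": {"start": start_time, "end": end_time},
        "step": step,
    }
```

Queries such as the `latency_ms` example aggregate with `avg(...)` and therefore return a single series with no labels, which is why `labels` is empty in the sample; for multi-series queries, the labels are what tell the points apart.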

4.Universal Metrics

In the future, we can build an abstraction layer on top of StarRocks Metrics and Prometheus Metrics that unifies metric queries across both data sources.
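One way such a layer could look, sketched in Python under the assumption that both sources are wrapped behind a common interface: each backend exposes the same two operations as the Prometheus API above (list metrics, query data), and a router picks the backend that defines the requested metric. `MetricsBackend`, `StaticBackend`, `UniversalMetrics` and the first-registered-wins rule are all hypothetical design choices.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class DataPoint:
    date: str
    value: float


class MetricsBackend(Protocol):
    """Operations each data source (Prometheus, StarRocks, ...) would implement."""

    def available_metrics(self, product_id: str) -> list[str]: ...

    def query(
        self, product_id: str, metric_name: str, start: str, end: str, step: str
    ) -> list[DataPoint]: ...


class StaticBackend:
    """Stand-in backend serving canned points; a real one would call Prometheus or StarRocks."""

    def __init__(self, data: dict[str, dict[str, list[DataPoint]]]) -> None:
        self._data = data

    def available_metrics(self, product_id: str) -> list[str]:
        return list(self._data.get(product_id, {}))

    def query(self, product_id: str, metric_name: str, start: str, end: str, step: str) -> list[DataPoint]:
        # Canned data ignores the time range and step.
        return self._data[product_id][metric_name]


class UniversalMetrics:
    """Routes each (product, metric) pair to the first registered backend that defines it."""

    def __init__(self, backends: dict[str, MetricsBackend]) -> None:
        self._backends = backends  # insertion order = lookup priority

    def available_metrics(self, product_id: str) -> dict[str, str]:
        """Map each metric name to the name of the backend that will serve it."""
        sources: dict[str, str] = {}
        for source_name, backend in self._backends.items():
            for metric in backend.available_metrics(product_id):
                sources.setdefault(metric, source_name)
        return sources

    def query(self, product_id: str, metric_name: str, start: str, end: str, step: str) -> list[DataPoint]:
        source_name = self.available_metrics(product_id).get(metric_name)
        if source_name is None:
            raise KeyError(f"metric {metric_name!r} is not defined for product {product_id!r}")
        return self._backends[source_name].query(product_id, metric_name, start, end, step)
```

In this sketch the API layer only ever talks to `UniversalMetrics`, so moving a metric from one data source to the other would be a registration change rather than an API change.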