Merge pull request 'merge dev to master(only metrics)' (#58) from dev into master

Reviewed-on: freeleaps/freeleaps-service-hub#58
This commit is contained in:
icecheng 2025-09-19 08:16:19 +00:00
commit 16409e31bb
47 changed files with 2461 additions and 0 deletions

84
apps/metrics/.gitignore vendored Normal file
View File

@ -0,0 +1,84 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# Virtual environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# Logs
logs/
*.log
# OS
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
# Test coverage
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Jupyter Notebook
.ipynb_checkpoints
# pyenv
.python-version
# Environments
.env.local
.env.development.local
.env.test.local
.env.production.local

31
apps/metrics/Dockerfile Normal file
View File

@ -0,0 +1,31 @@
# download image here: https://docker.aityp.com/image/docker.io/python:3.12-slim
FROM python:3.12-slim
# Set working directory
WORKDIR /app
# Copy requirements file
COPY requirements.txt .
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# StarRocks settings
ENV STARROCKS_HOST: str = "freeleaps-starrocks-cluster-fe-service.freeleaps-data-platform.svc"
ENV STARROCKS_PORT: int = 9030
ENV STARROCKS_USER: str = "root"
ENV STARROCKS_PASSWORD: str = ""
ENV STARROCKS_DATABASE: str = "freeleaps"
# Prometheus settings
ENV PROMETHEUS_ENDPOINT: str = "http://localhost:9090"
# Expose port
EXPOSE 8009
# Start command
CMD ["uvicorn", "webapi.main:app", "--host", "0.0.0.0", "--port", "8009"]

207
apps/metrics/README.md Normal file
View File

@ -0,0 +1,207 @@
# 📊 Metrics Service
> A lightweight FastAPI microservice for user registration analytics and metrics
[![Python](https://img.shields.io/badge/Python-3.12+-blue.svg)](https://python.org)
[![FastAPI](https://img.shields.io/badge/FastAPI-0.114+-green.svg)](https://fastapi.tiangolo.com)
[![Docker](https://img.shields.io/badge/Docker-Ready-blue.svg)](https://docker.com)
The Metrics service provides real-time APIs for querying user registration data (via StarRocks) and querying monitoring metrics (via Prometheus).
## ✨ Features
### 📊 Registration Analytics (StarRocks)
- Date Range Query (start_date ~ end_date)
- Typed responses with Pydantic models
### 📈 Prometheus Metrics
- Predefined PromQL metrics per product
- Time-range queries and metric info discovery
### 🔧 Technical Features
- Data Models: Pydantic v2
- Async HTTP and robust error handling
- Structured logging
- Auto-generated Swagger/OpenAPI docs
## 📁 Project Structure
```
metrics/
├── backend/ # Business logic layer
│ ├── infra/ # Infrastructure components
│ │ └── external_service/
│ │ ├── prometheus_client.py
│ │ └── starrocks_client.py
│ ├── models/ # Data models
│ │ └── user_registration_models.py
│ └── services/ # Business services
│ ├── prometheus_metrics_service.py
│ └── registration_analytics_service.py
├── webapi/ # API layer
│ ├── routes/ # API endpoints
│ │ ├── metrics/ # Aggregated routers (prefix: /api/metrics)
│ │ ├── prometheus_metrics/
│ │ │ ├── __init__.py
│ │ │ ├── available_metrics.py
│ │ │ ├── metric_info.py
│ │ │ └── metrics_query.py
│ │ └── starrocks_metrics/
│ │ ├── __init__.py
│ │ ├── available_metrics.py
│ │ ├── metric_info.py
│ │ └── metrics_query.py
│ ├── bootstrap/
│ │ └── application.py
│ └── main.py
├── common/
├── requirements.txt
├── Dockerfile
├── local.env
└── README.md
```
## 🚀 Quick Start
### Prerequisites
- Python 3.12+ or Docker
- Access to StarRocks database (for registration analytics)
- Access to Prometheus (for monitoring metrics)
### 🐍 Python Setup
```bash
# 1. Install dependencies
pip install -r requirements.txt
# 2. Start the service
python3 -m uvicorn webapi.main:app --host 0.0.0.0 --port 8009 --reload
```
### 🐳 Docker Setup
```bash
# 1. Build image
docker build -t metrics:latest .
# 2. Run container (env from baked local.env)
docker run --rm -p 8009:8009 metrics:latest
# Alternatively: run with a custom env file
# (this overrides envs copied into the image)
docker run --rm -p 8009:8009 --env-file local.env metrics:latest
```
### 📖 Access Documentation
Visit `http://localhost:8009/docs` for interactive API documentation.
## 📊 API Endpoints
All endpoints are exposed under the API base prefix `/api/metrics`.
### StarRocks (Registration Analytics)
- POST `/api/metrics/starrocks/dru_query` — Query daily registered users time series for a date range
- GET `/api/metrics/starrocks/product/{product_id}/available-metrics` — List available StarRocks-backed metrics for the product
- GET `/api/metrics/starrocks/product/{product_id}/metric/{metric_name}/info` — Get metric info for the product
Example request:
```bash
curl -X POST "http://localhost:8009/api/metrics/starrocks/dru_query" \
-H "Content-Type: application/json" \
-d '{
"product_id": "freeleaps",
"start_date": "2024-09-10",
"end_date": "2024-09-20"
}'
```
### Prometheus
- POST `/api/metrics/prometheus/metrics_query` — Query metric time series by product/metric
- GET `/api/metrics/prometheus/product/{product_id}/available-metrics` — List available metrics for product
- GET `/api/metrics/prometheus/product/{product_id}/metric/{metric_name}/info` — Get metric info
## 🧪 Testing
```bash
# Health check
curl http://localhost:8009/
```
## ⚙️ Configuration
### Environment Variables
```bash
# Server Configuration
SERVICE_API_ACCESS_HOST=0.0.0.0
SERVICE_API_ACCESS_PORT=8009
# StarRocks Database
STARROCKS_HOST=freeleaps-starrocks-cluster-fe-service.freeleaps-data-platform.svc
STARROCKS_PORT=9030
STARROCKS_USER=root
STARROCKS_PASSWORD=
STARROCKS_DATABASE=freeleaps
# Logging
LOG_BASE_PATH=./logs
BACKEND_LOG_FILE_NAME=metrics
APPLICATION_ACTIVITY_LOG=metrics-activity
# Prometheus
PROMETHEUS_ENDPOINT=http://localhost:9090
```
> Tip: Copy `local.env` to `.env` and modify as needed for your environment.
### 🐳 Docker Deployment
```bash
# Build and run
docker build -t metrics:latest .
docker run --rm -p 8009:8009 metrics:latest
# Run with custom environment
docker run --rm -p 8009:8009 --env-file local.env metrics:latest
# Run in background
docker run -d --name metrics-service -p 8009:8009 metrics:latest
```
**Image Details:**
- Base: Python 3.12-slim
- Port: 8009
- Working Dir: `/app`
## 🔧 Development
```bash
# Setup development environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Run with auto-reload
python -m uvicorn webapi.main:app --reload
```
## 📝 API Documentation
- Swagger UI: `http://localhost:8009/docs`
- ReDoc: `http://localhost:8009/redoc`
- OpenAPI JSON: `http://localhost:8009/openapi.json`
## ⚠️ Important Notes
- Date format: `YYYY-MM-DD` (single-digit month/day also accepted by API)
- Default `product_id`: "freeleaps"
- Requires StarRocks database access and/or Prometheus endpoint
## 🐛 Troubleshooting
| Issue | Solution |
|-------|----------|
| Port in use | `docker stop $(docker ps -q --filter ancestor=metrics:latest)` |
| Import errors | Check dependencies: `pip install -r requirements.txt` |
| DB connection | Verify StarRocks cluster accessibility |
| Container issues | Check logs: `docker logs <container_id>` |
## 📄 License
Part of the Freeleaps platform.

1
apps/metrics/__init__.py Normal file
View File

@ -0,0 +1 @@
# Metrics Service

View File

@ -0,0 +1 @@
# Backend module

View File

View File

@ -0,0 +1,119 @@
import httpx
from typing import Dict, Any, Optional, Union
from datetime import datetime
import json
from fastapi import HTTPException
from common.config.app_settings import app_settings
from common.log.module_logger import ModuleLogger
class PrometheusClient:
"""
Async Prometheus client for querying metrics data using PromQL.
This client provides methods to:
- Query data using PromQL expressions
- Get all available metrics
- Get labels for specific metrics
- Query metric series with label filters
"""
def __init__(self, endpoint: Optional[str] = None):
"""
Initialize Prometheus client.
Args:
endpoint: Prometheus server endpoint. If None, uses PROMETHEUS_ENDPOINT from settings.
"""
self.module_logger = ModuleLogger(__file__)
self.endpoint = endpoint or app_settings.PROMETHEUS_ENDPOINT
self.base_url = f"{self.endpoint.rstrip('/')}/api/v1"
async def request(self, endpoint: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
"""
Make HTTP request to Prometheus API.
Args:
endpoint: API endpoint path
params: Query parameters
Returns:
JSON response data
Raises:
httpx.HTTPError: If request fails
ValueError: If response is not valid JSON
"""
url = f"{self.base_url}/{endpoint.lstrip('/')}"
try:
await self.module_logger.log_info(f"Making request to Prometheus: {url} with params: {params}")
async with httpx.AsyncClient(timeout=30.0) as client:
response = await client.get(url, params=params)
response.raise_for_status()
data = response.json()
if data.get("status") != "success":
error_msg = data.get('error', 'Unknown error')
await self.module_logger.log_error(f"Prometheus API error: {error_msg}")
raise HTTPException(status_code=400, detail=f"Prometheus API error: {error_msg}")
return data
except httpx.HTTPError as e:
await self.module_logger.log_error(f"HTTP error querying Prometheus: {e}")
raise HTTPException(status_code=502, detail=f"Failed to connect to Prometheus: {str(e)}")
except json.JSONDecodeError as e:
await self.module_logger.log_error(f"Invalid JSON response from Prometheus: {e}")
raise HTTPException(status_code=400, detail=f"Invalid response from Prometheus: {str(e)}")
async def query_range(
self,
query: str,
start: Union[str, datetime],
end: Union[str, datetime],
step: str = "15s"
) -> Dict[str, Any]:
"""
Execute a PromQL range query.
Args:
query: PromQL query string
start: Start time (RFC3339 string or datetime)
end: End time (RFC3339 string or datetime)
step: Query resolution step width (e.g., "15s", "1m", "1h")
Returns:
Range query result data
Example:
result = await client.query_range(
"up{job='prometheus'}",
start=datetime.now() - timedelta(hours=1),
end=datetime.now(),
step="1m"
)
"""
params = {
"query": query,
"step": step
}
# Convert datetime to RFC3339 string if needed
if isinstance(start, datetime):
if start.tzinfo is None:
params["start"] = start.isoformat() + "Z"
else:
params["start"] = start.isoformat()
else:
params["start"] = start
if isinstance(end, datetime):
if end.tzinfo is None:
params["end"] = end.isoformat() + "Z"
else:
params["end"] = end.isoformat()
else:
params["end"] = end
return await self.request("query_range", params)

View File

@ -0,0 +1,60 @@
import pymysql
from typing import List, Dict, Any, Optional
from datetime import date
from common.log.module_logger import ModuleLogger
from common.config.app_settings import app_settings
class StarRocksClient:
"""StarRocks database client for querying user registration data"""
def __init__(self):
self.host = app_settings.STARROCKS_HOST
self.port = app_settings.STARROCKS_PORT
self.user = app_settings.STARROCKS_USER
self.password = app_settings.STARROCKS_PASSWORD
self.database = app_settings.STARROCKS_DATABASE
self.connection = None
self.module_logger = ModuleLogger(__file__)
async def connect(self) -> bool:
"""Establish connection to StarRocks database"""
try:
self.connection = pymysql.connect(
host=self.host,
port=self.port,
user=self.user,
password=self.password,
database=self.database,
charset='utf8mb4',
autocommit=True
)
await self.module_logger.log_info(f"Successfully connected to StarRocks at {self.host}:{self.port}")
return True
except Exception as e:
await self.module_logger.log_error(f"Failed to connect to StarRocks: {e}")
return False
async def disconnect(self):
"""Close database connection"""
if self.connection:
self.connection.close()
self.connection = None
await self.module_logger.log_info("Disconnected from StarRocks")
async def execute_query(self, query: str, params: Optional[tuple] = None) -> List[Dict[str, Any]]:
"""Execute SQL query and return results"""
if not self.connection:
if not await self.connect():
raise Exception("Failed to connect to StarRocks database")
try:
with self.connection.cursor(pymysql.cursors.DictCursor) as cursor:
cursor.execute(query, params)
results = cursor.fetchall()
await self.module_logger.log_info(f"Query executed successfully, returned {len(results)} rows")
return results
except Exception as e:
await self.module_logger.log_error(f"Query execution failed: {e}")
raise e

View File

@ -0,0 +1,268 @@
from typing import Dict, List, Any, Optional, Union
from datetime import datetime, timedelta
from fastapi import HTTPException
from common.log.module_logger import ModuleLogger
from ..infra.external_service.prometheus_client import PrometheusClient
class PrometheusMetricsService:
"""
Service class for querying Prometheus metrics with predefined PromQL queries.
This service provides a high-level interface for querying metrics data
using predefined PromQL queries mapped to metric names.
"""
# Global dictionary mapping metric names to their corresponding PromQL queries
METRIC_PROMQL_MAP: Dict[str, str] = {
"freeleaps": {
# Just demo, No Usage
"cpu_usage": "100 - (avg by (instance) (irate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
# Just demo, No Usage
"memory_usage": "100 - ((node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100)",
# Just demo, No Usage
"disk_usage": "100 - ((node_filesystem_avail_bytes{mountpoint=\"/\"} / node_filesystem_size_bytes{mountpoint=\"/\"}) * 100)",
# Average response time for notification HTTP requests
"latency_ms": "1000*avg(freeleaps_notification_http_request_duration_seconds_sum{handler!=\"none\"} / freeleaps_notification_http_request_duration_seconds_count)",
# Error rate for 5xx HTTP status codes (stability metric)
"reliability": "1-sum(rate(freeleaps_notification_http_requests_total{status=\"5xx\"}[1m]))",
},
"magicleaps": {
}
}
def __init__(self, prometheus_endpoint: Optional[str] = None):
"""
Initialize PrometheusMetricsService.
Args:
prometheus_endpoint: Prometheus server endpoint. If None, uses default from settings.
"""
self.module_logger = ModuleLogger(__file__)
self.prometheus_client = PrometheusClient(prometheus_endpoint)
def get_available_metrics(self, product_id: Optional[str] = None) -> List[str]:
"""
Get list of available metric names that have predefined PromQL queries.
Args:
product_id: Optional product ID to filter metrics. If None, returns all metrics from all products.
Returns:
List of available metric names
"""
if product_id:
if product_id in self.METRIC_PROMQL_MAP:
return list(self.METRIC_PROMQL_MAP[product_id].keys())
else:
return []
else:
# Return all metrics from all products
all_metrics = []
for product_metrics in self.METRIC_PROMQL_MAP.values():
all_metrics.extend(product_metrics.keys())
return all_metrics
def get_available_products(self) -> List[str]:
"""
Get list of available product IDs.
Returns:
List of available product IDs
"""
return list(self.METRIC_PROMQL_MAP.keys())
async def query_metric_by_time_range(
self,
product_id: str,
metric_name: str,
start_time: Union[str, datetime],
end_time: Union[str, datetime],
step: str = "1m"
) -> List[Dict[str, Any]]:
"""
Query metric data for a specific time range.
Args:
product_id: Product ID to identify which product's metrics to query
metric_name: Name of the metric to query
start_time: Start time for the query (RFC3339 string or datetime)
end_time: End time for the query (RFC3339 string or datetime)
step: Query resolution step width (e.g., "1m", "5m", "1h")
Returns:
List of dictionaries with 'date' and 'value' keys
Raises:
ValueError: If product_id or metric_name is not found in the PromQL mapping
Exception: If Prometheus query fails
Example:
result = await service.query_metric_by_time_range(
"freeleaps",
"cpu_usage",
start_time=datetime.now() - timedelta(hours=1),
end_time=datetime.now(),
step="5m"
)
# Returns: [{"date": "2024-01-01T10:00:00Z", "value": 45.2}, ...]
"""
# Check if product_id exists in the mapping
if product_id not in self.METRIC_PROMQL_MAP:
available_products = ", ".join(self.get_available_products())
error_msg = f"Product '{product_id}' not found in PromQL mapping. Available products: {available_products}"
await self.module_logger.log_error(error_msg)
raise HTTPException(status_code=404, detail=error_msg)
# Check if metric name exists in the product's mapping
if metric_name not in self.METRIC_PROMQL_MAP[product_id]:
available_metrics = ", ".join(self.get_available_metrics(product_id))
error_msg = f"Metric '{metric_name}' not found in product '{product_id}' PromQL mapping. Available metrics: {available_metrics}"
await self.module_logger.log_error(error_msg)
raise HTTPException(status_code=404, detail=error_msg)
# Parse datetime strings if they are strings
if isinstance(start_time, str):
start_dt = datetime.fromisoformat(start_time.replace('Z', '+00:00'))
else:
start_dt = start_time
if isinstance(end_time, str):
end_dt = datetime.fromisoformat(end_time.replace('Z', '+00:00'))
else:
end_dt = end_time
# Validate time range
if start_dt >= end_dt:
raise HTTPException(
status_code=400,
detail="Start time must be before end time"
)
# Check time range is not too large (max 7 days for detailed queries)
time_diff = end_dt - start_dt
if time_diff > timedelta(days=7):
raise HTTPException(
status_code=400,
detail="Time range cannot exceed 7 days for detailed queries"
)
# Get the PromQL query for the metric
promql_query = self.METRIC_PROMQL_MAP[product_id][metric_name]
try:
await self.module_logger.log_info(
f"Querying metric '{metric_name}' from product '{product_id}' with PromQL: {promql_query}")
# Execute the range query
result = await self.prometheus_client.query_range(
query=promql_query,
start=start_dt,
end=end_dt,
step=step
)
# Parse the result and format it
formatted_data = self._format_query_result(result, metric_name)
await self.module_logger.log_info(
f"Successfully queried metric '{metric_name}' with {len(formatted_data)} data points")
return formatted_data
except Exception as e:
await self.module_logger.log_error(f"Failed to query metric '{metric_name}': {e}")
raise
def _format_query_result(self, prometheus_result: Dict[str, Any], metric_name: str) -> List[Dict[str, Any]]:
"""
Format Prometheus query result into the required format.
Args:
prometheus_result: Raw result from Prometheus API
metric_name: Name of the metric being queried
Returns:
List of dictionaries with 'date' and 'value' keys
"""
formatted_data = []
# Extract data from Prometheus result
data = prometheus_result.get("data", {})
result_type = data.get("resultType", "")
if result_type == "matrix":
# Handle range query results (matrix)
for series in data.get("result", []):
metric_labels = series.get("metric", {})
values = series.get("values", [])
for timestamp, value in values:
# Convert Unix timestamp to ISO format
date_str = datetime.fromtimestamp(timestamp).isoformat() + "Z"
formatted_data.append({
"date": date_str,
"value": float(value) if value != "NaN" else None,
"metric": metric_name,
"labels": metric_labels
})
elif result_type == "vector":
# Handle instant query results (vector)
for series in data.get("result", []):
metric_labels = series.get("metric", {})
timestamp = series.get("value", [None, None])[0]
value = series.get("value", [None, None])[1]
if timestamp and value:
date_str = datetime.fromtimestamp(timestamp).isoformat() + "Z"
formatted_data.append({
"date": date_str,
"value": float(value) if value != "NaN" else None,
"metric": metric_name,
"labels": metric_labels
})
# Sort by date
formatted_data.sort(key=lambda x: x["date"])
return formatted_data
async def get_metric_info(self, product_id: str, metric_name: str) -> Dict[str, Any]:
"""
Get information about a specific metric including its PromQL query.
Args:
product_id: Product ID to identify which product's metrics to query
metric_name: Name of the metric
Returns:
Dictionary containing metric information
Raises:
ValueError: If product_id or metric_name is not found in the PromQL mapping
"""
# Check if product_id exists in the mapping
if product_id not in self.METRIC_PROMQL_MAP:
available_products = ", ".join(self.get_available_products())
error_msg = f"Product '{product_id}' not found in PromQL mapping. Available products: {available_products}"
await self.module_logger.log_error(error_msg)
raise HTTPException(status_code=404, detail=error_msg)
# Check if metric name exists in the product's mapping
if metric_name not in self.METRIC_PROMQL_MAP[product_id]:
available_metrics = ", ".join(self.get_available_metrics(product_id))
error_msg = f"Metric '{metric_name}' not found in product '{product_id}' PromQL mapping. Available metrics: {available_metrics}"
await self.module_logger.log_error(error_msg)
raise HTTPException(status_code=404, detail=error_msg)
return {
"product_id": product_id,
"metric_name": metric_name,
"promql_query": self.METRIC_PROMQL_MAP[product_id][metric_name],
"description": f"PromQL query for {metric_name} metric in product {product_id}"
}

View File

@ -0,0 +1,268 @@
from typing import Dict, List, Any, Optional, Union
from datetime import datetime, timedelta, date
from fastapi import HTTPException
from common.log.module_logger import ModuleLogger
from ..infra.external_service.starrocks_client import StarRocksClient
class StarRocksMetricsService:
"""
Service class for querying StarRocks metrics with predefined SQL queries.
This service provides a high-level interface for querying metrics data
using predefined SQL queries mapped to metric names.
"""
# Global dictionary mapping metric names to their corresponding SQL queries
METRIC_SQL_MAP: Dict[str, Dict[str, str]] = {
"freeleaps": {
"daily_registered_users": """
SELECT
date_id,
product_id,
registered_cnt,
updated_at
FROM dws_daily_registered_users
WHERE date_id >= %s
AND date_id < %s
AND product_id = %s
ORDER BY date_id ASC
""",
},
"magicleaps": {
}
}
def __init__(self, starrocks_endpoint: Optional[str] = None):
"""
Initialize StarRocksMetricsService.
Args:
starrocks_endpoint: StarRocks server endpoint. If None, uses default from settings.
"""
self.module_logger = ModuleLogger(__file__)
self.starrocks_client = StarRocksClient()
def get_available_metrics(self, product_id: Optional[str] = None) -> List[str]:
"""
Get list of available metric names that have predefined SQL queries.
Args:
product_id: Optional product ID to filter metrics. If None, returns all metrics from all products.
Returns:
List of available metric names
"""
if product_id:
if product_id in self.METRIC_SQL_MAP:
return list(self.METRIC_SQL_MAP[product_id].keys())
else:
return []
else:
# Return all metrics from all products
all_metrics = []
for product_metrics in self.METRIC_SQL_MAP.values():
all_metrics.extend(product_metrics.keys())
return all_metrics
def get_available_products(self) -> List[str]:
"""
Get list of available product IDs.
Returns:
List of available product IDs
"""
return list(self.METRIC_SQL_MAP.keys())
async def query_metric_by_time_range(
self,
product_id: str,
metric_name: str,
start_date: Union[str, date],
end_date: Union[str, date]
) -> List[Dict[str, Any]]:
"""
Query metric data for a specific date range.
Args:
product_id: Product ID to identify which product's metrics to query
metric_name: Name of the metric to query
start_date: Start date for the query (ISO string or date)
end_date: End date for the query (ISO string or date)
Returns:
List of dictionaries with 'date' and 'value' keys
Raises:
ValueError: If product_id or metric_name is not found in the SQL mapping
Exception: If StarRocks query fails
Example:
result = await service.query_metric_by_time_range(
"freeleaps",
"daily_registered_users",
start_date=date.today() - timedelta(days=30),
end_date=date.today()
)
# Returns: [{"date": "2024-01-01", "value": 45, "labels": {...}}, ...]
"""
# Check if product_id exists in the mapping
if product_id not in self.METRIC_SQL_MAP:
available_products = ", ".join(self.get_available_products())
error_msg = f"Product '{product_id}' not found in SQL mapping. Available products: {available_products}"
await self.module_logger.log_error(error_msg)
raise HTTPException(status_code=404, detail=error_msg)
# Check if metric name exists in the product's mapping
if metric_name not in self.METRIC_SQL_MAP[product_id]:
available_metrics = ", ".join(self.get_available_metrics(product_id))
error_msg = f"Metric '{metric_name}' not found in product '{product_id}' SQL mapping. Available metrics: {available_metrics}"
await self.module_logger.log_error(error_msg)
raise HTTPException(status_code=404, detail=error_msg)
# Parse date strings if they are strings
if isinstance(start_date, str):
try:
start_dt = datetime.strptime(start_date, '%Y-%m-%d').date()
except ValueError:
raise HTTPException(
status_code=400,
detail="Invalid start_date format. Expected YYYY-MM-DD"
)
else:
start_dt = start_date
if isinstance(end_date, str):
try:
end_dt = datetime.strptime(end_date, '%Y-%m-%d').date()
except ValueError:
raise HTTPException(
status_code=400,
detail="Invalid end_date format. Expected YYYY-MM-DD"
)
else:
end_dt = end_date
# Validate date range
if start_dt >= end_dt:
raise HTTPException(
status_code=400,
detail="Start date must be before end date"
)
# Check date range is not too large (max 1 year)
time_diff = end_dt - start_dt
if time_diff > timedelta(days=365):
raise HTTPException(
status_code=400,
detail="Date range cannot exceed 1 year"
)
# Get the SQL query for the metric
sql_query = self.METRIC_SQL_MAP[product_id][metric_name]
try:
await self.module_logger.log_info(
f"Querying metric '{metric_name}' from product '{product_id}' from {start_dt} to {end_dt}")
# Execute the query
result = self.starrocks_client.execute_query(
query=sql_query,
params=(start_dt, end_dt, product_id)
)
# Parse the result and format it
formatted_data = self._format_query_result(result, metric_name, product_id)
await self.module_logger.log_info(
f"Successfully queried metric '{metric_name}' with {len(formatted_data)} data points")
return formatted_data
except Exception as e:
await self.module_logger.log_error(f"Failed to query metric '{metric_name}': {e}")
raise
def _format_query_result(self, starrocks_result: List[Dict[str, Any]], metric_name: str, product_id: str) -> List[Dict[str, Any]]:
"""
Format StarRocks query result into the required format.
Args:
starrocks_result: Raw result from StarRocks query
metric_name: Name of the metric being queried
product_id: Product ID for the metric
Returns:
List of dictionaries with 'date' and 'value' keys
"""
formatted_data = []
for row in starrocks_result:
# Format the date
date_value = row.get("date_id")
if date_value:
if isinstance(date_value, str):
date_str = date_value
else:
date_str = str(date_value)
else:
continue
# Get the value
value = row.get("registered_cnt", 0)
if value is None:
value = 0
# Create labels dictionary
labels = {
"product_id": row.get("product_id", product_id),
"metric_type": metric_name
}
formatted_data.append({
"date": date_str,
"value": int(value) if value is not None else 0,
"metric": metric_name,
"labels": labels
})
# Sort by date
formatted_data.sort(key=lambda x: x["date"])
return formatted_data
async def get_metric_info(self, product_id: str, metric_name: str) -> Dict[str, Any]:
"""
Get information about a specific metric including its SQL query.
Args:
product_id: Product ID to identify which product's metrics to query
metric_name: Name of the metric
Returns:
Dictionary containing metric information
Raises:
ValueError: If product_id or metric_name is not found in the SQL mapping
"""
# Check if product_id exists in the mapping
if product_id not in self.METRIC_SQL_MAP:
available_products = ", ".join(self.get_available_products())
error_msg = f"Product '{product_id}' not found in SQL mapping. Available products: {available_products}"
await self.module_logger.log_error(error_msg)
raise HTTPException(status_code=404, detail=error_msg)
# Check if metric name exists in the product's mapping
if metric_name not in self.METRIC_SQL_MAP[product_id]:
available_metrics = ", ".join(self.get_available_metrics(product_id))
error_msg = f"Metric '{metric_name}' not found in product '{product_id}' SQL mapping. Available metrics: {available_metrics}"
await self.module_logger.log_error(error_msg)
raise HTTPException(status_code=404, detail=error_msg)
return {
"product_id": product_id,
"metric_name": metric_name,
"sql_query": self.METRIC_SQL_MAP[product_id][metric_name].strip(),
"description": f"{metric_name} count from StarRocks table dws_{metric_name}"
}

View File

View File

View File

@ -0,0 +1,35 @@
from pydantic_settings import BaseSettings
from typing import Optional
class AppSettings(BaseSettings):
# Server settings
SERVER_HOST: str = "0.0.0.0"
SERVER_PORT: int = 8009
SERVICE_API_ACCESS_HOST: str = "0.0.0.0"
SERVICE_API_ACCESS_PORT: int = 8009
# Log settings
LOG_BASE_PATH: str = "./logs"
BACKEND_LOG_FILE_NAME: str = "freeleaps-metrics"
APPLICATION_ACTIVITY_LOG: str = "freeleaps-metrics-activity"
# StarRocks database settings
STARROCKS_HOST: str = "freeleaps-starrocks-cluster-fe-service.freeleaps-data-platform.svc"
STARROCKS_PORT: int = 9030
STARROCKS_USER: str = "root"
STARROCKS_PASSWORD: str = ""
STARROCKS_DATABASE: str = "freeleaps"
# Prometheus settings
PROMETHEUS_ENDPOINT: str = "http://localhost:9090"
METRICS_ENABLED: bool = False
PROBES_ENABLED: bool = True
class Config:
env_file = ".env"
app_settings = AppSettings()

View File

@ -0,0 +1,17 @@
import os
from dataclasses import dataclass
from .app_settings import app_settings
from .site_settings import site_settings
@dataclass
class LogSettings:
LOG_PATH_BASE: str = app_settings.LOG_BASE_PATH
LOG_RETENTION: str = os.environ.get("LOG_RETENTION", "30 days")
LOG_ROTATION: str = os.environ.get("LOG_ROTATION", "00:00") # midnight
MAX_BACKUP_FILES: int = int(os.environ.get("LOG_BACKUP_FILES", 5))
LOG_ROTATION_BYTES: int = int(os.environ.get("LOG_ROTATION_BYTES", 10 * 1024 * 1024)) # 10 MB
APP_NAME: str = site_settings.NAME
ENVIRONMENT: str = site_settings.ENV
log_settings = LogSettings()

View File

@ -0,0 +1,26 @@
import os
from pydantic_settings import BaseSettings
# NOTE: The values fall backs to your environment variables when not set here
class SiteSettings(BaseSettings):
NAME: str = "FREELEAPS-METRICS"
DEBUG: bool = True
ENV: str = "dev"
SERVER_HOST: str = "localhost"
SERVER_PORT: int = 9000
URL: str = "http://localhost"
TIME_ZONE: str = "UTC"
BASE_PATH: str = os.path.dirname(os.path.dirname((os.path.abspath(__file__))))
class Config:
env_file = ".devbase-webapi.env"
env_file_encoding = "utf-8"
site_settings = SiteSettings()

View File

View File

@ -0,0 +1,12 @@
from .base_logger import LoggerBase
from common.config.app_settings import app_settings
class ApplicationLogger(LoggerBase):
def __init__(self, application_activities: dict[str, any] = {}) -> None:
extra_fileds = {}
if application_activities:
extra_fileds.update(application_activities)
super().__init__(
logger_name=app_settings.APPLICATION_ACTIVITY_LOG,
extra_fileds=extra_fileds,
)

View File

@ -0,0 +1,136 @@
from loguru import logger as guru_logger
from common.config.log_settings import log_settings
from typing import Dict, Any, Optional
import socket
import json
import threading
import os
import sys
import inspect
import logging
from common.log.json_sink import JsonSink
class LoggerBase:
binded_loggers = {}
logger_lock = threading.Lock()
def __init__(self, logger_name: str, extra_fileds: dict[str, any]) -> None:
self.__logger_name = logger_name
self.extra_fileds = extra_fileds
with LoggerBase.logger_lock:
if self.__logger_name in LoggerBase.binded_loggers:
self.logger = LoggerBase.binded_loggers[self.__logger_name]
return
log_filename = f"{log_settings.LOG_PATH_BASE}/{self.__logger_name}.log"
log_level = "INFO"
rotation_bytes = int(log_settings.LOG_ROTATION_BYTES or 10 * 1024 * 1024)
guru_logger.remove()
file_sink = JsonSink(
log_file_path=log_filename,
rotation_size_bytes=rotation_bytes,
max_backup_files=log_settings.MAX_BACKUP_FILES
)
guru_logger.add(
sink=file_sink,
level=log_level,
filter=lambda record: record["extra"].get("topic") == self.__logger_name,
)
guru_logger.add(
sink=sys.stderr,
level=log_level,
format="{level} - {time:YYYY-MM-DD HH:mm:ss} - <{extra[log_file]}:{extra[log_line]}> - {extra[properties_str]} - {message}",
filter=lambda record: record["extra"].get("topic") == self.__logger_name,
)
host_name = socket.gethostname()
host_ip = socket.gethostbyname(host_name)
self.logger = guru_logger.bind(
topic=self.__logger_name,
host_ip=host_ip,
host_name=host_name,
app=log_settings.APP_NAME,
env=log_settings.ENVIRONMENT,
)
with LoggerBase.logger_lock:
LoggerBase.binded_loggers[self.__logger_name] = self.logger
def _get_log_context(self) -> dict:
frame = inspect.currentframe().f_back.f_back
filename = os.path.basename(frame.f_code.co_filename)
lineno = frame.f_lineno
return {"log_file": filename, "log_line": lineno}
def _prepare_properties(self, properties: Optional[Dict[str, Any]]) -> Dict[str, Any]:
props = {} if properties is None else properties.copy()
props_str = json.dumps(props, ensure_ascii=False) if props else "{}"
return props, props_str
async def log_event(self, sender_id: str, receiver_id: str, subject: str, event: str, properties: dict[str, any], text: str = "") -> None:
props, props_str = self._prepare_properties(properties)
context = self._get_log_context()
local_logger = self.logger.bind(sender_id=sender_id, receiver_id=receiver_id, subject=subject, event=event, properties=props, properties_str=props_str, **context)
local_logger.info(text)
async def log_exception(self, sender_id: str, receiver_id: str, subject: str, exception: Exception, text: str = "", properties: dict[str, any] = None) -> None:
props, props_str = self._prepare_properties(properties)
context = self._get_log_context()
local_logger = self.logger.bind(sender_id=sender_id, receiver_id=receiver_id, subject=subject, event="exception", properties=props, properties_str=props_str, exception=exception, **context)
local_logger.exception(text)
async def log_info(self, sender_id: str, receiver_id: str, subject: str, text: str = "", properties: dict[str, any] = None) -> None:
props, props_str = self._prepare_properties(properties)
context = self._get_log_context()
local_logger = self.logger.bind(sender_id=sender_id, receiver_id=receiver_id, subject=subject, event="information", properties=props, properties_str=props_str, **context)
local_logger.info(text)
async def log_warning(self, sender_id: str, receiver_id: str, subject: str, text: str = "", properties: dict[str, any] = None) -> None:
props, props_str = self._prepare_properties(properties)
context = self._get_log_context()
local_logger = self.logger.bind(sender_id=sender_id, receiver_id=receiver_id, subject=subject, event="warning", properties=props, properties_str=props_str, **context)
local_logger.warning(text)
async def log_error(self, sender_id: str, receiver_id: str, subject: str, text: str = "", properties: dict[str, any] = None) -> None:
props, props_str = self._prepare_properties(properties)
context = self._get_log_context()
local_logger = self.logger.bind(sender_id=sender_id, receiver_id=receiver_id, subject=subject, event="error", properties=props, properties_str=props_str, **context)
local_logger.error(text)
@staticmethod
def configure_uvicorn_logging():
print("📢 Setting up uvicorn logging interception...")
# Intercept logs from these loggers
intercept_loggers = ["uvicorn", "uvicorn.access", "uvicorn.error", "fastapi"]
class InterceptHandler(logging.Handler):
def emit(self, record):
level = (
guru_logger.level(record.levelname).name
if guru_logger.level(record.levelname, None)
else record.levelno
)
frame, depth = logging.currentframe(), 2
while frame.f_code.co_filename == logging.__file__:
frame = frame.f_back
depth += 1
guru_logger.opt(depth=depth, exception=record.exc_info).log(
level,
f"[{record.name}] {record.getMessage()}",
)
# Replace default handlers
logging.root.handlers.clear()
logging.root.setLevel(logging.INFO)
logging.root.handlers = [InterceptHandler()]
# Configure specific uvicorn loggers
for logger_name in intercept_loggers:
logging_logger = logging.getLogger(logger_name)
logging_logger.handlers.clear() # Remove default handlers
logging_logger.propagate = True # Ensure propagation through Loguru

View File

@ -0,0 +1,84 @@
import json
import datetime
import traceback
from pathlib import Path
class JsonSink:
def __init__(
self,
log_file_path: str,
rotation_size_bytes: int = 10 * 1024 * 1024,
max_backup_files: int = 5,
):
self.log_file_path = Path(log_file_path)
self.rotation_size = rotation_size_bytes
self.max_backup_files = max_backup_files
self._open_log_file()
def _open_log_file(self):
# ensure the parent directory exists
parent_dir = self.log_file_path.parent
if not parent_dir.exists():
parent_dir.mkdir(parents=True, exist_ok=True)
self.log_file = self.log_file_path.open("a", encoding="utf-8")
def _should_rotate(self) -> bool:
return self.log_file_path.exists() and self.log_file_path.stat().st_size >= self.rotation_size
def _rotate(self):
self.log_file.close()
timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
rotated_path = self.log_file_path.with_name(f"{self.log_file_path.stem}_{timestamp}{self.log_file_path.suffix}")
self.log_file_path.rename(rotated_path)
self._cleanup_old_backups()
self._open_log_file()
def _cleanup_old_backups(self):
parent = self.log_file_path.parent
stem = self.log_file_path.stem
suffix = self.log_file_path.suffix
backup_files = sorted(
parent.glob(f"{stem}_*{suffix}"),
key=lambda p: p.stat().st_mtime,
reverse=True,
)
for old_file in backup_files[self.max_backup_files:]:
try:
old_file.unlink()
except Exception as e:
print(f"Failed to delete old backup {old_file}: {e}")
def __call__(self, message):
record = message.record
if self._should_rotate():
self._rotate()
log_entry = {
"level": record["level"].name.lower(),
"timestamp": int(record["time"].timestamp() * 1000),
"text": record["message"],
"fields": record["extra"].get("properties", {}),
"context": {
"app": record["extra"].get("app"),
"env": record["extra"].get("env"),
"log_file": record["extra"].get("log_file"),
"log_line": record["extra"].get("log_line"),
"topic": record["extra"].get("topic"),
"sender_id": record["extra"].get("sender_id"),
"receiver_id": record["extra"].get("receiver_id"),
"subject": record["extra"].get("subject"),
"event": record["extra"].get("event"),
"host_ip": record["extra"].get("host_ip"),
"host_name": record["extra"].get("host_name"),
},
"stacktrace": None
}
if record["exception"]:
exc_type, exc_value, exc_tb = record["exception"]
log_entry["stacktrace"] = traceback.format_exception(exc_type, exc_value, exc_tb)
self.log_file.write(json.dumps(log_entry, ensure_ascii=False) + "\n")
self.log_file.flush()

View File

@ -0,0 +1,46 @@
from .application_logger import ApplicationLogger
class ModuleLogger(ApplicationLogger):
def __init__(self, sender_id: str) -> None:
super().__init__()
self.event_sender_id = sender_id
self.event_receiver_id = "ModuleLogger"
self.event_subject = "module"
async def log_exception(self, exception: Exception, text: str = "Exception", properties: dict[str, any] = None) -> None:
return await super().log_exception(
sender_id=self.event_sender_id,
receiver_id=self.event_receiver_id,
subject=self.event_subject,
exception=exception,
text=text,
properties=properties,
)
async def log_info(self, text: str, data: dict[str, any] = None) -> None:
return await super().log_info(
sender_id=self.event_sender_id,
receiver_id=self.event_receiver_id,
subject=self.event_subject,
text=text,
properties=data,
)
async def log_warning(self, text: str, data: dict[str, any] = None) -> None:
return await super().log_warning(
sender_id=self.event_sender_id,
receiver_id=self.event_receiver_id,
subject=self.event_subject,
text=text,
properties=data,
)
async def log_error(self, text: str, data: dict[str, any] = None) -> None:
return await super().log_error(
sender_id=self.event_sender_id,
receiver_id=self.event_receiver_id,
subject=self.event_subject,
text=text,
properties=data,
)

View File

@ -0,0 +1,140 @@
import logging
from enum import Enum
from typing import Optional, Callable, Tuple, Dict
import inspect
from datetime import datetime, timezone
# ProbeType is an Enum that defines the types of probes that can be registered.
class ProbeType(Enum):
LIVENESS = "liveness"
READINESS = "readiness"
STARTUP = "startup"
# ProbeResult is a class that represents the result of a probe check.
class ProbeResult:
def __init__(self, success: bool, message: str = "ok", data: Optional[dict] = None):
self.success = success
self.message = message
self.data = data or {}
def to_dict(self) -> dict:
return {
"success": self.success,
"message": self.message,
"data": self.data
}
# Probe is a class that represents a probe that can be registered.
class Probe:
def __init__(self, type: ProbeType, path: str, check_fn: Callable, name: Optional[str] = None):
self.type = type
self.path = path
self.check_fn = check_fn
self.name = name or f"{type.value}-{id(self)}"
async def execute(self) -> ProbeResult:
try:
result = self.check_fn()
if inspect.isawaitable(result):
result = await result
if isinstance(result, ProbeResult):
return result
elif isinstance(result, bool):
return ProbeResult(result, "ok" if result else "failed")
else:
return ProbeResult(True, "ok")
except Exception as e:
return ProbeResult(False, str(e))
# ProbeGroup is a class that represents a group of probes that can be checked together.
class ProbeGroup:
def __init__(self, path: str):
self.path = path
self.probes: Dict[str, Probe] = {}
def add_probe(self, probe: Probe):
self.probes[probe.name] = probe
async def check_all(self) -> Tuple[bool, dict]:
results = {}
all_success = True
for name, probe in self.probes.items():
result = await probe.execute()
results[name] = result.to_dict()
if not result.success:
all_success = False
return all_success, results
# FrameworkAdapter is an abstract class that defines the interface for framework-specific probe adapters.
class FrameworkAdapter:
async def handle_request(self, group: ProbeGroup):
all_success, results = await group.check_all()
status_code = 200 if all_success else 503
return {"status": "ok" if all_success else "failed", "payload": results, "timestamp": int(datetime.now(timezone.utc).timestamp())}, status_code
def register_route(self, path: str, handler: Callable):
raise NotImplementedError
# ProbeManager is a class that manages the registration of probes and their corresponding framework adapters.
class ProbeManager:
_default_paths = {
ProbeType.LIVENESS: "/_/livez",
ProbeType.READINESS: "/_/readyz",
ProbeType.STARTUP: "/_/healthz"
}
def __init__(self):
self.groups: Dict[str, ProbeGroup] = {}
self.adapters: Dict[str, FrameworkAdapter] = {}
self._startup_complete = False
def register_adapter(self, framework: str, adapter: FrameworkAdapter):
self.adapters[framework] = adapter
logging.info(f"Registered probe adapter ({adapter}) for framework: {framework}")
def register(
self,
type: ProbeType,
check_func: Optional[Callable] = None,
path: Optional[str] = None,
prefix: str = "",
name: Optional[str] = None,
frameworks: Optional[list] = None
):
path = path or self._default_paths.get(type, "/_/healthz")
if prefix:
path = f"{prefix}{path}"
if type == ProbeType.STARTUP and check_func is None:
check_func = self._default_startup_check
probe = Probe(type, path, check_func or (lambda: True), name)
if path not in self.groups:
self.groups[path] = ProbeGroup(path)
self.groups[path].add_probe(probe)
for framework in (frameworks or ["default"]):
self._register_route(framework, path)
logging.info(f"Registered {type.value} probe route ({path}) for framework: {framework}")
def _register_route(self, framework: str, path: str):
if framework not in self.adapters:
return
adapter = self.adapters[framework]
group = self.groups[path]
async def handler():
return await adapter.handle_request(group)
adapter.register_route(path, handler)
def _default_startup_check(self) -> bool:
return self._startup_complete
def mark_startup_complete(self):
self._startup_complete = True

View File

@ -0,0 +1,15 @@
from . import FrameworkAdapter
from fastapi.responses import JSONResponse
from typing import Callable
# FastAPIAdapter is a class that implements the FrameworkAdapter interface for FastAPI.
class FastAPIAdapter(FrameworkAdapter):
def __init__(self, app):
self.app = app
def register_route(self,path: str, handler: Callable):
async def wrapper():
data, status_code = await handler()
return JSONResponse(content=data, status_code=status_code)
self.app.add_api_route(path, wrapper, methods=["GET"])

299
apps/metrics/docs/design.md Normal file
View File

@ -0,0 +1,299 @@
# 1.Override
We support two ways to query metrics:
- Connect to StarRocks data warehouse and query metrics from it
- Query Prometheus directly and retrieve metrics from it
# 2.Starrocks Metric
We can implement StarRocks Metric queries similar to Prometheus Metric queries. The only difference is replacing PromQL with SQL and querying through StarRocks API.
## 2.1.Metrics Config
Currently, metrics are configured in code through the `StarRocksMetricsService.METRIC_SQL_MAP` dictionary. In the future, they will be configured through database or other methods.
Organization structure: Product ID -> Metric Name -> SQL Query
```python
METRIC_SQL_MAP: Dict[str, Dict[str, str]] = {
"freeleaps": {
"daily_registered_users": """
SELECT
date_id,
product_id,
registered_cnt,
updated_at
FROM dws_daily_registered_users
WHERE date_id >= %s
AND date_id <= %s
AND product_id = %s
ORDER BY date_id ASC
""",
},
"magicleaps": {
# Future metrics can be added here
}
}
```
## 2.2.API Design
### 2.2.1.Query Metrics by Product ID
API: `/api/metrics/starrocks/product/{product_id}/available-metrics`
Method: GET
Request:
```
product_id=freeleaps
```
Response:
```json
{
"product_id": "freeleaps",
"available_metrics": [
"daily_registered_users"
],
"total_count": 1,
"description": "List of StarRocks-backed metrics for product 'freeleaps'"
}
```
### 2.2.2.Query Metric Info
API: `/api/metrics/starrocks/product/{product_id}/metric/{metric_name}/info`
Method: GET
Request:
```
product_id=freeleaps
metric_name=daily_registered_users
```
Response:
```json
{
"metric_info": {
"product_id": "freeleaps",
"metric_name": "daily_registered_users",
"sql_query": "SELECT date_id, product_id, registered_cnt, updated_at FROM dws_daily_registered_users WHERE date_id >= %s AND date_id <= %s AND product_id = %s ORDER BY date_id ASC",
"description": "Daily registered users count from StarRocks table dws_daily_registered_users"
},
"description": "Information about StarRocks metric 'daily_registered_users' in product 'freeleaps'"
}
```
### 2.2.3.Query Metric Data
API: `/api/metrics/starrocks/metrics_query`
Method: POST
Request:
```json
{
"product_id": "freeleaps",
"metric_name": "daily_registered_users",
"start_date": "2024-09-10",
"end_date": "2024-09-20"
}
```
Response:
```json
{
"metric_name": "daily_registered_users",
"data_points": [
{
"date": "2024-09-10",
"value": 45,
"labels": {
"product_id": "freeleaps",
"metric_type": "daily_registered_users"
}
},
{
"date": "2024-09-11",
"value": 52,
"labels": {
"product_id": "freeleaps",
"metric_type": "daily_registered_users"
}
},
{
"date": "2024-09-12",
"value": 38,
"labels": {
"product_id": "freeleaps",
"metric_type": "daily_registered_users"
}
},
...
{
"date": "2024-09-19",
"value": 67,
"labels": {
"product_id": "freeleaps",
"metric_type": "daily_registered_users"
}
}
],
"total_points": 10,
"time_range": {
"start": "2024-09-10",
"end": "2024-09-19"
}
}
```
## 2.3.Technical Implementation
### 2.3.1.StarRocks Client
- Uses PyMySQL to connect to StarRocks database
- Supports parameterized queries for security
- Automatic connection management with context manager
- Error handling and logging
### 2.3.2.Data Format
- Date format: `YYYY-MM-DD`
- Values are returned as integers or floats
- Labels include product_id and metric_type for debugging
- Results are sorted by date in ascending order
### 2.3.3.Validation
- Date range validation (start_date < end_date)
- Maximum date range limit (1 year)
- Product ID and metric name validation against available mappings
- Input format validation for date strings
# 3.Prometheus Metric
## 3.1.Metrics Config
Currently, metrics are configured in code. In the future, they will be configured through database or other methods.
Organization structure: Product ID -> Metric Name -> Metric Query Method (PromQL)
```json
{
"freeleaps": {
// Just for demo
"cpu_usage": "100 - (avg by (instance) (irate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
// Just for demo
"memory_usage": "100 - ((node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100)",
// Just for demo
"disk_usage": "100 - ((node_filesystem_avail_bytes{mountpoint=\"/\"} / node_filesystem_size_bytes{mountpoint=\"/\"}) * 100)",
"latency_ms": "1000*avg(freeleaps_notification_http_request_duration_seconds_sum{handler!=\"none\"} / freeleaps_notification_http_request_duration_seconds_count)",
"reliability": "1-sum(rate(freeleaps_notification_http_requests_total{status=\"5xx\"}[1m]))"
},
"magicleaps": {}
}
```
If we want to add new metrics, theoretically we only need to add one configuration entry (provided that the metric exists in Prometheus and can be queried directly through PromQL without requiring any additional code processing)
## 3.2.API Design
### 3.2.1.Query Metrics by Product ID
API: `/api/metrics/prometheus/product/{product_id}/available-metrics`
Method: GET
Request:
```
product_id=freeleaps
```
Response:
```json
{
"product_id": "freeleaps",
"available_metrics": [
"cpu_usage",
"memory_usage",
"disk_usage",
"latency_ms",
"reliability"
],
"total_count": 5,
"description": "List of metrics with predefined PromQL queries for product 'freeleaps'"
}
```
### 3.2.2.Query Metric Info
API: `/api/metrics/prometheus/product/{product_id}/metric/{metric_name}/info`
Method: GET
Request:
```
product_id=freeleaps
metric_name=cpu_usage
```
Response:
```json
{
"metric_info": {
"product_id": "freeleaps",
"metric_name": "cpu_usage",
"promql_query": "100 - (avg by (instance) (irate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
"description": "PromQL query for cpu_usage metric in product freeleaps"
},
"description": "Information about metric 'cpu_usage' in product 'freeleaps'"
}
```
### 3.2.3.Query Metric Data
API: `/api/metrics/prometheus/metrics_query`
Method: GET
Request:
```
{
"product_id":"freeleaps",
"metric_name": "latency_ms",
"start_time": "2025-09-12T00:00:00Z",
"end_time": "2025-09-16T01:00:00Z",
"step":"1h" # Interval between data points in the query result
}
```
Response:
```json
{
"metric_name": "latency_ms",
"data_points": [
{
"date": "2025-09-12T08:00:00Z",
"value": 41.37141507698155,
"labels": {} # Optional: Additional labels for prometheus, Just for debugging
},
{
"date": "2025-09-12T09:00:00Z",
"value": 41.371992733188385,
"labels": {}
},
{
"date": "2025-09-12T10:00:00Z",
"value": 41.37792878125675,
"labels": {}
},
{
"date": "2025-09-12T11:00:00Z",
"value": 41.37297490632533,
"labels": {}
},
...
{
"date": "2025-09-16T08:00:00Z",
"value": 40.72491916149973,
"labels": {}
},
{
"date": "2025-09-16T09:00:00Z",
"value": 40.72186597550194,
"labels": {}
}
],
"total_points": 98,
"time_range": {
"start": "2025-09-12T00:00:00Z",
"end": "2025-09-16T01:00:00Z"
},
"step": "1h"
}
```
# 4.Universal Metrics
In the future, we can create an abstraction layer above StarRocks Metrics and Prometheus Metrics to unify metric queries from both data sources!

19
apps/metrics/local.env Normal file
View File

@ -0,0 +1,19 @@
# Local environment configuration for Metrics service
SERVER_HOST=0.0.0.0
SERVER_PORT=8009
SERVICE_API_ACCESS_PORT=8009
SERVICE_API_ACCESS_HOST=0.0.0.0
# starrocks settings
STARROCKS_HOST=freeleaps-starrocks-cluster-fe-service.freeleaps-data-platform.svc
STARROCKS_PORT=9030
STARROCKS_USER=root
STARROCKS_PASSWORD=""
STARROCKS_DATABASE=freeleaps
# log settings
LOG_BASE_PATH=./logs
BACKEND_LOG_FILE_NAME=metrics
APPLICATION_ACTIVITY_LOG=metrics-activity
PROMETHEUS_ENDPOINT=http://localhost:9090

View File

@ -0,0 +1,17 @@
fastapi==0.114.0
pydantic==2.9.2
loguru==0.7.2
uvicorn==0.23.2
beanie==1.21.0
pika==1.3.2
aio-pika
httpx
pydantic-settings
python-jose
passlib[bcrypt]
prometheus-fastapi-instrumentator==7.0.2
pytest==8.4.1
pytest-asyncio==0.21.2
pymysql==1.1.0
sqlalchemy==2.0.23
python-dotenv

View File

@ -0,0 +1 @@
# WebAPI module

View File

@ -0,0 +1,77 @@
import logging
from fastapi import FastAPI
from fastapi.openapi.utils import get_openapi
from common.config.app_settings import app_settings
from webapi.providers import exception_handler, common, probes, metrics, router
from webapi.providers.logger import register_logger
def create_app() -> FastAPI:
logging.info("App initializing")
app = FreeleapsMetricsApp()
register_logger()
register(app, exception_handler)
register(app, router)
register(app, common)
# Call the custom_openapi function to change the OpenAPI version
customize_openapi_security(app)
# Register probe APIs if enabled
if app_settings.PROBES_ENABLED:
register(app, probes)
# Register metrics APIs if enabled
if app_settings.METRICS_ENABLED:
register(app, metrics)
return app
# This function overrides the OpenAPI schema version to 3.0.0
def customize_openapi_security(app: FastAPI) -> None:
def custom_openapi():
if app.openapi_schema:
return app.openapi_schema
# Generate OpenAPI schema
openapi_schema = get_openapi(
title="FreeLeaps Metrics API",
version="3.1.0",
description="FreeLeaps Metrics API Documentation",
routes=app.routes,
)
# Ensure the components section exists in the OpenAPI schema
if "components" not in openapi_schema:
openapi_schema["components"] = {}
# Add security scheme to components
openapi_schema["components"]["securitySchemes"] = {
"bearerAuth": {"type": "http", "scheme": "bearer", "bearerFormat": "JWT"}
}
# Add security requirement globally
openapi_schema["security"] = [{"bearerAuth": []}]
app.openapi_schema = openapi_schema
return app.openapi_schema
app.openapi = custom_openapi
def register(app, provider):
logging.info(provider.__name__ + " registering")
provider.register(app)
def boot(app, provider):
logging.info(provider.__name__ + " booting")
provider.boot(app)
class FreeleapsMetricsApp(FastAPI):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)

View File

@ -0,0 +1,20 @@
from common.config.site_settings import site_settings
from fastapi.responses import RedirectResponse
import uvicorn
from webapi.bootstrap.application import create_app
app = create_app()
@app.get("/", status_code=301)
async def root():
"""
TODO: redirect client to /docs
"""
return RedirectResponse("docs")
if __name__ == "__main__":
uvicorn.run(
app="main:app", host=site_settings.SERVER_HOST, port=site_settings.SERVER_PORT
)

View File

@ -0,0 +1,31 @@
from fastapi.middleware.cors import CORSMiddleware
from common.config.site_settings import site_settings
def register(app):
app.debug = site_settings.DEBUG
app.title = site_settings.NAME
add_global_middleware(app)
# This hook ensures that a connection is opened to handle any queries
# generated by the request.
@app.on_event("startup")
async def startup():
pass
# This hook ensures that the connection is closed when we've finished
# processing the request.
@app.on_event("shutdown")
async def shutdown():
pass
def add_global_middleware(app):
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)

View File

@ -0,0 +1,39 @@
from fastapi import FastAPI, HTTPException
from fastapi.exceptions import RequestValidationError
from starlette.requests import Request
from starlette.responses import JSONResponse
from starlette.status import (
HTTP_400_BAD_REQUEST,
HTTP_401_UNAUTHORIZED,
HTTP_403_FORBIDDEN,
HTTP_404_NOT_FOUND,
HTTP_422_UNPROCESSABLE_ENTITY,
HTTP_500_INTERNAL_SERVER_ERROR,
)
async def custom_http_exception_handler(request: Request, exc: HTTPException):
return JSONResponse(
status_code=exc.status_code,
content={"error": exc.detail},
)
async def validation_exception_handler(request: Request, exc: RequestValidationError):
return JSONResponse(
status_code=HTTP_400_BAD_REQUEST,
content={"error": str(exc)},
)
async def exception_handler(request: Request, exc: Exception):
return JSONResponse(
status_code=HTTP_500_INTERNAL_SERVER_ERROR,
content={"error": str(exc)},
)
def register(app: FastAPI):
app.add_exception_handler(HTTPException, custom_http_exception_handler)
app.add_exception_handler(RequestValidationError, validation_exception_handler)
app.add_exception_handler(Exception, exception_handler)

View File

@ -0,0 +1,7 @@
from common.log.base_logger import LoggerBase
def register_logger():
print("📢 Setting up logging interception...")
LoggerBase.configure_uvicorn_logging()
print("✅ Logging interception complete. Logs are formatted and deduplicated!")

View File

@ -0,0 +1,16 @@
import logging
from prometheus_fastapi_instrumentator import Instrumentator
from common.config.app_settings import app_settings
def register(app):
instrumentator = (
Instrumentator().instrument(
app,
metric_namespace="freeleaps-mertics",
metric_subsystem=app_settings.APP_NAME)
)
@app.on_event("startup")
async def startup():
instrumentator.expose(app, endpoint="/api/_/metrics", should_gzip=True)
logging.info("Metrics endpoint exposed at /api/_/metrics")

View File

@ -0,0 +1,24 @@
from common.probes import ProbeManager, ProbeType
from common.probes.adapters import FastAPIAdapter
def register(app):
probes_manager = ProbeManager()
probes_manager.register_adapter("fastapi", FastAPIAdapter(app))
async def readiness_checker():
return {"success": True, "message": "Ready"}
probes_manager.register(
name="readiness",
prefix="/api",
type=ProbeType.READINESS,
check_func=readiness_checker,
frameworks=["fastapi"]
)
probes_manager.register(name="liveness", prefix="/api", type=ProbeType.LIVENESS, frameworks=["fastapi"])
probes_manager.register(name="startup", prefix="/api", type=ProbeType.STARTUP, frameworks=["fastapi"])
@app.on_event("startup")
async def mark_startup_complete():
probes_manager.mark_startup_complete()

View File

@ -0,0 +1,34 @@
from webapi.routes import api_router
from starlette import routing
def register(app):
app.include_router(
api_router,
prefix="/api",
tags=["api"],
dependencies=[],
responses={404: {"description": "no page found"}},
)
if app.debug:
for route in app.routes:
if not isinstance(route, routing.WebSocketRoute):
print(
{
"path": route.path,
"endpoint": route.endpoint,
"name": route.name,
"methods": route.methods,
}
)
else:
print(
{
"path": route.path,
"endpoint": route.endpoint,
"name": route.name,
"type": "web socket route",
}
)

View File

@ -0,0 +1,8 @@
from fastapi import APIRouter
from webapi.routes.starrocks_metrics import api_router as starrocks_metrics_router
from webapi.routes.prometheus_metrics import api_router as prometheus_metrics_router
api_router = APIRouter()
api_router.include_router(starrocks_metrics_router, prefix="/starrocks", tags=["starrocks-metrics"])
api_router.include_router(prometheus_metrics_router, prefix="/prometheus", tags=["prometheus-metrics"])

View File

@ -0,0 +1,9 @@
from fastapi import APIRouter
from .available_metrics import router as available_metrics_router
from .metrics_query import router as metrics_query_router
from .metric_info import router as metric_info_router
api_router = APIRouter()
api_router.include_router(available_metrics_router, tags=["prometheus-metrics"])
api_router.include_router(metrics_query_router, tags=["prometheus-metrics"])
api_router.include_router(metric_info_router, tags=["prometheus-metrics"])

View File

@ -0,0 +1,31 @@
from fastapi import APIRouter
from common.log.module_logger import ModuleLogger
from backend.services.prometheus_metrics_service import PrometheusMetricsService
router = APIRouter()
# Initialize service and logger
prometheus_service = PrometheusMetricsService()
module_logger = ModuleLogger(__file__)
@router.get("/prometheus/product/{product_id}/available-metrics")
async def get_available_metrics(product_id: str):
"""
Get list of available metrics for a specific product.
Args:
product_id: Product ID to get metrics for (required).
Returns a list of metric names that have predefined PromQL queries for the specified product.
"""
await module_logger.log_info(f"Getting available metrics list for product_id: {product_id}")
metrics = prometheus_service.get_available_metrics(product_id)
return {
"product_id": product_id,
"available_metrics": metrics,
"total_count": len(metrics),
"description": f"List of metrics with predefined PromQL queries for product '{product_id}'"
}

View File

@ -0,0 +1,32 @@
from fastapi import APIRouter, HTTPException
from common.log.module_logger import ModuleLogger
from backend.services.prometheus_metrics_service import PrometheusMetricsService
router = APIRouter()
# Initialize service and logger
prometheus_service = PrometheusMetricsService()
module_logger = ModuleLogger(__file__)
@router.get("/prometheus/product/{product_id}/metric/{metric_name}/info")
async def get_metric_info(
product_id: str,
metric_name: str
):
"""
Get information about a specific metric including its PromQL query.
Args:
product_id: Product ID to identify which product's metrics to query
metric_name: Name of the metric to get information for
"""
await module_logger.log_info(f"Getting info for metric '{metric_name}' from product '{product_id}'")
metric_info = await prometheus_service.get_metric_info(product_id, metric_name)
return {
"metric_info": metric_info,
"description": f"Information about metric '{metric_name}' in product '{product_id}'"
}

View File

@ -0,0 +1,83 @@
from fastapi import APIRouter
from typing import Optional, List, Dict, Any
from pydantic import BaseModel, Field
from common.log.module_logger import ModuleLogger
from backend.services.prometheus_metrics_service import PrometheusMetricsService
class MetricDataPoint(BaseModel):
"""Single data point in a time series."""
date: str = Field(..., description="Timestamp in ISO format")
value: Optional[float] = Field(None, description="Metric value")
labels: Optional[Dict[str, str]] = Field(None, description="Metric labels")
class MetricTimeSeriesResponse(BaseModel):
"""Response model for metric time series data."""
metric_name: str = Field(..., description="Name of the queried metric")
data_points: List[MetricDataPoint] = Field(..., description="List of data points")
total_points: int = Field(..., description="Total number of data points")
time_range: Dict[str, str] = Field(..., description="Start and end time of the query")
step: str = Field("1h", description="Query resolution step")
class MetricQueryRequest(BaseModel):
"""Request model for metric query."""
product_id: str = Field(..., description="Product ID to identify which product's metrics to query")
metric_name: str = Field(..., description="Name of the metric to query")
start_time: str = Field(..., description="Start time in ISO format or RFC3339")
end_time: str = Field(..., description="End time in ISO format or RFC3339")
step: str = Field("1h", description="Query resolution step (e.g., 1m, 5m, 1h)")
router = APIRouter()
# Initialize service and logger
prometheus_service = PrometheusMetricsService()
module_logger = ModuleLogger(__file__)
@router.post("/prometheus/metrics_query", response_model=MetricTimeSeriesResponse)
async def metrics_query(
request: MetricQueryRequest
):
"""
Query metrics time series data.
Returns XY curve data (time series) for the specified metric within the given time range.
"""
await module_logger.log_info(
f"Querying metric '{request.metric_name}' from product '{request.product_id}' from {request.start_time} to {request.end_time}")
# Query the metric data
data_points = await prometheus_service.query_metric_by_time_range(
product_id=request.product_id,
metric_name=request.metric_name,
start_time=request.start_time,
end_time=request.end_time,
step=request.step
)
# Format response
response = MetricTimeSeriesResponse(
metric_name=request.metric_name,
data_points=[
MetricDataPoint(
date=point["date"],
value=point["value"],
labels=point["labels"]
)
for point in data_points
],
total_points=len(data_points),
time_range={
"start": request.start_time,
"end": request.end_time
},
step=request.step
)
await module_logger.log_info(
f"Successfully queried metric '{request.metric_name}' with {len(data_points)} data points")
return response

View File

@ -0,0 +1,9 @@
from fastapi import APIRouter
from .metrics_query import router as metrics_query_router
from .available_metrics import router as available_metrics_router
from .metric_info import router as metric_info_router
api_router = APIRouter()
api_router.include_router(available_metrics_router, tags=["starrocks-metrics"])
api_router.include_router(metrics_query_router, tags=["starrocks-metrics"])
api_router.include_router(metric_info_router, tags=["starrocks-metrics"])

View File

@ -0,0 +1,41 @@
from fastapi import APIRouter, HTTPException
from common.log.module_logger import ModuleLogger
from backend.services.starrocks_metrics_service import StarRocksMetricsService
router = APIRouter()
# Initialize service and logger
starrocks_service = StarRocksMetricsService()
module_logger = ModuleLogger(__file__)
@router.get("/starrocks/product/{product_id}/available-metrics")
async def get_available_metrics(product_id: str):
"""
Get list of available StarRocks-backed metrics for a specific product.
Args:
product_id: Product ID to get metrics for (required).
Returns a list of metric names available via StarRocks for the specified product.
"""
await module_logger.log_info(
f"Getting StarRocks available metrics list for product_id: {product_id}"
)
metrics = starrocks_service.get_available_metrics(product_id)
if not metrics:
available_products = ", ".join(starrocks_service.get_available_products())
raise HTTPException(
status_code=404,
detail=f"Unknown product_id: {product_id}. Available products: {available_products}"
)
return {
"product_id": product_id,
"available_metrics": metrics,
"total_count": len(metrics),
"description": f"List of StarRocks-backed metrics for product '{product_id}'",
}

View File

@ -0,0 +1,34 @@
from fastapi import APIRouter, HTTPException
from common.log.module_logger import ModuleLogger
from backend.services.starrocks_metrics_service import StarRocksMetricsService
router = APIRouter()
# Initialize service and logger
starrocks_service = StarRocksMetricsService()
module_logger = ModuleLogger(__file__)
@router.get("/starrocks/product/{product_id}/metric/{metric_name}/info")
async def get_metric_info(
product_id: str,
metric_name: str
):
"""
Get information about a specific StarRocks-backed metric.
Args:
product_id: Product identifier for the product's data.
metric_name: Name of the StarRocks-backed metric.
"""
await module_logger.log_info(
f"Getting StarRocks metric info for metric '{metric_name}' from product '{product_id}'"
)
metric_info = await starrocks_service.get_metric_info(product_id, metric_name)
return {
"metric_info": metric_info,
"description": f"Information about StarRocks metric '{metric_name}' in product '{product_id}'",
}

View File

@ -0,0 +1,80 @@
from fastapi import APIRouter
from typing import Optional, List, Dict, Any, Union
from pydantic import BaseModel, Field
from datetime import date, datetime
from common.log.module_logger import ModuleLogger
from backend.services.starrocks_metrics_service import StarRocksMetricsService
class MetricDataPoint(BaseModel):
"""Single data point in metric time series."""
date: str = Field(..., description="Date in YYYY-MM-DD format")
value: Union[int, float] = Field(..., description="Metric value")
labels: Dict[str, Any] = Field(default_factory=dict, description="Metric labels")
class MetricTimeSeriesResponse(BaseModel):
"""Response model for metric time series data."""
metric_name: str = Field(..., description="Name of the queried metric")
data_points: List[MetricDataPoint] = Field(..., description="List of data points")
total_points: int = Field(..., description="Total number of data points")
time_range: Dict[str, str] = Field(..., description="Start and end date of the query")
class MetricQueryRequest(BaseModel):
"""Request model for metric query."""
product_id: str = Field(..., description="Product ID to identify which product's data to query")
metric_name: str = Field(..., description="Name of the metric to query")
start_date: str = Field(..., description="Start date in YYYY-MM-DD format")
end_date: str = Field(..., description="End date in YYYY-MM-DD format")
router = APIRouter()
# Initialize service and logger
starrocks_service = StarRocksMetricsService()
module_logger = ModuleLogger(__file__)
@router.post("/starrocks/metrics_query", response_model=MetricTimeSeriesResponse)
async def metrics_query(
request: MetricQueryRequest
):
"""
Query StarRocks metrics time series data.
Returns XY curve data (time series) for the specified metric within the given date range.
"""
await module_logger.log_info(
f"Querying metric '{request.metric_name}' from product '{request.product_id}' from {request.start_date} to {request.end_date}")
# Query the metric data
data_points = await starrocks_service.query_metric_by_time_range(
product_id=request.product_id,
metric_name=request.metric_name,
start_date=request.start_date,
end_date=request.end_date
)
# Format response
response = MetricTimeSeriesResponse(
metric_name=request.metric_name,
data_points=[
MetricDataPoint(
date=point["date"],
value=point["value"],
labels=point["labels"]
)
for point in data_points
],
total_points=len(data_points),
time_range={
"start": request.start_date,
"end": request.end_date
}
)
await module_logger.log_info(
f"Successfully queried metric '{request.metric_name}' with {len(data_points)} data points")
return response