Kibana Dashboards for ft_transcendence: Visualizing Project Data
Why Dashboards Matter
Dashboards are crucial for transforming raw log data into actionable insights. For our ft_transcendence project, well-designed Kibana dashboards help us:
- Monitor System Health: Quickly spot issues with any service
- Track User Experience: Understand how users interact with the application
- Identify Security Concerns: Detect unusual patterns or potential threats
- Optimize Performance: Pinpoint bottlenecks and inefficiencies
Dashboard Structure for ft_transcendence
We've organized our dashboards into three main categories:
1. Operations Dashboards
- System Overview: High-level health of all services
- Service-Specific: Detailed metrics for each service
- Error Tracking: Aggregated view of all errors
2. User Experience Dashboards
- API Performance: Response times and error rates
- User Sessions: Login patterns and session durations
- Feature Usage: Which parts of the app users engage with most
3. Security Dashboards
- Authentication Events: Logins, logouts, and failures
- Access Patterns: Unusual activity detection
- Error Clustering: Finding patterns in security-related errors
Creating Our Main Dashboard
Here's how we built our System Overview dashboard:
Step 1: Index Pattern Configuration
First, we created index patterns to access our data:
- In Kibana, go to Stack Management > Index Patterns
- Create patterns for each service:
  - django-*
  - nginx-*
  - nextjs-*
  - postgresql-*
  - redis-*
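The same patterns can also be created through a script rather than the UI. The sketch below only builds the requests and sends nothing. It assumes the Kibana 7.x saved objects endpoint `/api/saved_objects/index-pattern` and uses `@timestamp` as the time field because the queries in this document filter on it; the helper name `index_pattern_request` is hypothetical.

```python
import json

# Patterns listed in Step 1.
SERVICE_PATTERNS = ["django-*", "nginx-*", "nextjs-*", "postgresql-*", "redis-*"]


def index_pattern_request(title, time_field="@timestamp"):
    """Describe the HTTP request that would create one index pattern.

    Assumption: Kibana 7.x saved objects API. Nothing is sent from here.
    """
    return {
        "method": "POST",
        "path": "/api/saved_objects/index-pattern",
        # Kibana rejects API writes that lack a kbn-xsrf header.
        "headers": {"kbn-xsrf": "true", "Content-Type": "application/json"},
        "body": {"attributes": {"title": title, "timeFieldName": time_field}},
    }


if __name__ == "__main__":
    for pattern in SERVICE_PATTERNS:
        print(json.dumps(index_pattern_request(pattern)))
```

Keeping the request description separate from the HTTP call makes it easy to review the payloads before pointing any client at a live Kibana instance.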
Step 2: Building Visualizations
We created these key visualizations:
Service Health Status
The health value is numeric (1 when the service has at least one error, 0 otherwise) because bucket_script must return a number; map it to an error or healthy label in the visualization.
{
  "aggs": {
    "services": {
      "terms": {
        "field": "service.keyword",
        "size": 10
      },
      "aggs": {
        "errors": {
          "filter": {
            "bool": {
              "should": [
                { "term": { "log_level.keyword": "ERROR" } },
                { "term": { "log_level.keyword": "CRITICAL" } },
                { "range": { "http_status": { "gte": 500 } } }
              ]
            }
          }
        },
        "health": {
          "bucket_script": {
            "buckets_path": {
              "errors": "errors._count"
            },
            "script": "params.errors > 0 ? 1 : 0"
          }
        }
      }
    }
  }
}
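Status labels can also be derived outside Elasticsearch from the per-service error counts. The following is a minimal sketch that reads a response shaped like the output of the aggregation above; the sample response is hand-written for illustration and is not real project data.

```python
def service_health(response):
    """Label each service bucket 'error' or 'healthy' from its error count."""
    buckets = response["aggregations"]["services"]["buckets"]
    return {
        bucket["key"]: "error" if bucket["errors"]["doc_count"] > 0 else "healthy"
        for bucket in buckets
    }


# Hand-written sample in the shape Elasticsearch returns for this aggregation.
sample_response = {
    "aggregations": {
        "services": {
            "buckets": [
                {"key": "django", "doc_count": 120, "errors": {"doc_count": 3}},
                {"key": "nginx", "doc_count": 900, "errors": {"doc_count": 0}},
            ]
        }
    }
}

if __name__ == "__main__":
    print(service_health(sample_response))
```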
Request Volume Over Time
{
  "aggs": {
    "requests_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "30s"
      },
      "aggs": {
        "by_service": {
          "terms": {
            "field": "service.keyword",
            "size": 5
          }
        }
      }
    }
  }
}
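Because each histogram bucket covers 30 seconds, dividing a bucket's document count by 30 gives an average request rate per second. A small sketch of that conversion, assuming one log document per request and a hand-written sample response:

```python
BUCKET_SECONDS = 30  # matches "fixed_interval": "30s" in the aggregation


def requests_per_second(response):
    """Return (bucket label, {service: requests/second}) for each time bucket."""
    rows = []
    for bucket in response["aggregations"]["requests_over_time"]["buckets"]:
        rates = {
            service["key"]: service["doc_count"] / BUCKET_SECONDS
            for service in bucket["by_service"]["buckets"]
        }
        rows.append((bucket["key_as_string"], rates))
    return rows


# Hand-written sample in the shape of a date_histogram response.
sample_response = {
    "aggregations": {
        "requests_over_time": {
            "buckets": [
                {
                    "key_as_string": "2024-01-01T10:00:00.000Z",
                    "doc_count": 75,
                    "by_service": {
                        "buckets": [
                            {"key": "nginx", "doc_count": 60},
                            {"key": "django", "doc_count": 15},
                        ]
                    },
                }
            ]
        }
    }
}
```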
Error Distribution Pie Chart
{
  "aggs": {
    "error_types": {
      "terms": {
        "field": "error_type.keyword",
        "size": 10
      }
    }
  },
  "query": {
    "bool": {
      "must": [{ "term": { "log_level.keyword": "ERROR" } }]
    }
  }
}
Step 3: Assembling the Dashboard
We arranged visualizations in a logical flow:
- Top Section: Status indicators showing at-a-glance health
- Middle Section: Time-series graphs showing activity trends
- Bottom Section: Detailed tables for drilling down into specific issues
Service-Specific Dashboards
Django Dashboard Example
For our Django application, we focused on:
- View Performance: Response times by view function
- Database Queries: Query execution times and counts
- Error Traceback: Full error details with context
Key visualization for slow database queries:
{
  "aggs": {
    "over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1m"
      },
      "aggs": {
        "slow_queries": {
          "filter": {
            "bool": {
              "must": [
                { "exists": { "field": "query_time" } },
                { "range": { "query_time": { "gte": 100 } } }
              ]
            }
          }
        }
      }
    }
  }
}
Nginx Dashboard Example
For Nginx monitoring, we focus on:
- Request Rate: Requests per second
- Status Codes: Distribution of HTTP response codes
- Response Times: Latency percentiles
- Top URLs: Most frequently accessed endpoints
- Client IPs: Source of requests by geography
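Two of these panels, latency percentiles and status code distribution, could be backed by a query like the one built below. This is a sketch rather than our exact visualization: the latency field name `request_time` is a hypothetical placeholder that depends on the Nginx log format, while `http_status` is the field already used in the Service Health aggregation.

```python
def nginx_panel_query(latency_field="request_time", status_field="http_status"):
    """Build a search body with latency percentiles and status code counts.

    Assumption: latency_field is numeric and present in the nginx-* indices.
    """
    return {
        "size": 0,  # aggregations only, no raw documents
        "query": {
            "bool": {"filter": [{"range": {"@timestamp": {"gte": "now-15m"}}}]}
        },
        "aggs": {
            "latency_percentiles": {
                "percentiles": {"field": latency_field, "percents": [50, 95, 99]}
            },
            "status_codes": {"terms": {"field": status_field, "size": 10}},
        },
    }


if __name__ == "__main__":
    import json

    print(json.dumps(nginx_panel_query(), indent=2))
```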
User Activity Dashboard
This dashboard helps us understand user behavior:
- Session Flow Visualization: Shows how users navigate through the app
- User Retention: How often users return to the app
- Feature Adoption: Which features are used most
Example visualization for session duration (minutes between each user's first and last event in the selected time range):
{
  "aggs": {
    "users": {
      "terms": {
        "field": "user_id.keyword",
        "size": 100
      },
      "aggs": {
        "session_start": {
          "min": {
            "field": "@timestamp"
          }
        },
        "session_end": {
          "max": {
            "field": "@timestamp"
          }
        },
        "session_duration": {
          "bucket_script": {
            "buckets_path": {
              "start": "session_start",
              "end": "session_end"
            },
            "script": "(params.end - params.start) / 60000"
          }
        }
      }
    }
  }
}
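The aggregation above treats everything a user does within the selected time range as one session, so a morning visit and an afternoon visit would be counted as a single long session. If separate visits matter, one option is to split a user's events wherever there is a long pause. The sketch below illustrates that idea on plain timestamps; the 30-minute inactivity gap is an assumed value, not a setting from our stack.

```python
from datetime import datetime, timedelta


def split_sessions(timestamps, gap=timedelta(minutes=30)):
    """Group event timestamps into sessions separated by more than `gap`."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # still within the current session
        else:
            sessions.append([ts])  # pause was too long: start a new session
    return sessions


def session_minutes(session):
    """Duration of one session in minutes (first event to last event)."""
    return (session[-1] - session[0]).total_seconds() / 60


# Illustrative events for one user: a morning visit and an afternoon visit.
events = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 9, 10),
    datetime(2024, 1, 1, 14, 0),
    datetime(2024, 1, 1, 14, 5),
]
```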
Security Monitoring Dashboard
Our security dashboard focuses on:
- Authentication Events: Success/failure tracking
- Geographic Anomalies: Unusual access locations
- Rate Limiting Violations: Potential brute force attacks
- Permission Denials: Unauthorized access attempts
Example visualization for failed login attempts:
{
  "aggs": {
    "over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "5m"
      },
      "aggs": {
        "by_ip": {
          "terms": {
            "field": "client_ip.keyword",
            "size": 10
          }
        }
      }
    }
  },
  "query": {
    "bool": {
      "must": [{ "term": { "event.keyword": "login_failed" } }]
    }
  }
}
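The response to this query can be scanned for addresses that fail repeatedly within a single 5-minute window, which is one simple signal of a brute-force attempt. A minimal sketch follows; the threshold of 5 failures is an assumed value, and the sample response uses documentation-range IP addresses rather than real data.

```python
def suspicious_ips(response, threshold=5):
    """Return {ip: worst 5-minute failure count} for IPs at or above threshold."""
    flagged = {}
    for window in response["aggregations"]["over_time"]["buckets"]:
        for ip in window["by_ip"]["buckets"]:
            if ip["doc_count"] >= threshold:
                flagged[ip["key"]] = max(flagged.get(ip["key"], 0), ip["doc_count"])
    return flagged


# Hand-written sample in the shape of the failed-login aggregation response.
sample_response = {
    "aggregations": {
        "over_time": {
            "buckets": [
                {"by_ip": {"buckets": [
                    {"key": "203.0.113.7", "doc_count": 12},
                    {"key": "198.51.100.4", "doc_count": 1},
                ]}},
                {"by_ip": {"buckets": [
                    {"key": "203.0.113.7", "doc_count": 8},
                ]}},
            ]
        }
    }
}
```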
Setting Up Alerts
We've configured alerts to notify us of critical issues:
Error Rate Alert
{
  "trigger": {
    "schedule": {
      "interval": "5m"
    }
  },
  "input": {
    "search": {
      "indices": ["*"],
      "body": {
        "query": {
          "bool": {
            "must": [
              { "range": { "@timestamp": { "gte": "now-5m" } } },
              { "terms": { "log_level.keyword": ["ERROR", "CRITICAL"] } }
            ]
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 10
      }
    }
  },
  "actions": {
    "notify_team": {
      "webhook": {
        "method": "post",
        "url": "https://hooks.slack.com/services/YOUR_WEBHOOK",
        "headers": { "Content-Type": "application/json" },
        "body": "{\"text\": \"Error rate exceeded threshold: {{ctx.payload.hits.total}} errors in the last 5 minutes\"}"
      }
    }
  }
}
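Before enabling an alert like this, it can help to dry-run the threshold logic and the notification payload locally. The sketch below mirrors the condition (more than 10 matching documents) and builds the JSON body that Slack incoming webhooks expect, which is an object with a `text` field. The function names are hypothetical and nothing is sent.

```python
import json

ERROR_THRESHOLD = 10  # same value as the "gt": 10 condition


def should_alert(total_hits, threshold=ERROR_THRESHOLD):
    """True when the error count is strictly above the threshold."""
    return total_hits > threshold


def slack_body(total_hits):
    """JSON string for a Slack incoming webhook message."""
    message = (
        f"Error rate exceeded threshold: {total_hits} errors in the last 5 minutes"
    )
    return json.dumps({"text": message})


if __name__ == "__main__":
    hits = 12  # illustrative count
    if should_alert(hits):
        print(slack_body(hits))
```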
Performance Considerations
Since we're running in a resource-limited environment, we've optimized our dashboards:
- Time Range Control: Default to shorter time ranges (last 15 minutes)
- Reduced Refresh Rate: Default to manual refresh or 1-minute intervals
- Aggregation Over Raw Data: Use aggregations instead of raw document tables
- Limited Cardinality: Avoid visualizations with high cardinality fields
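At the query level, the first and third points translate into a bounded time filter and `"size": 0`, so Elasticsearch returns only aggregation results and no raw documents. A minimal sketch of a wrapper that applies both to any aggregation body; the helper name `lightweight_search` is hypothetical.

```python
def lightweight_search(aggs, window="now-15m"):
    """Wrap aggregations in a search body limited to a recent time window."""
    return {
        "size": 0,  # skip raw hits, return aggregations only
        "query": {
            "bool": {"filter": [{"range": {"@timestamp": {"gte": window}}}]}
        },
        "aggs": aggs,
    }


# Example: the error-type breakdown restricted to the last 15 minutes.
error_types = {"error_types": {"terms": {"field": "error_type.keyword", "size": 10}}}
body = lightweight_search(error_types)
```

Using a `filter` clause for the time range keeps it out of relevance scoring, which is unnecessary work for dashboard queries.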
Sharing Dashboards
To make dashboards accessible to team members:
- Export/Import: Share dashboard definitions as exported saved-object files (NDJSON in Kibana 7.x)
- Saved Object Management: Export and import through Stack Management > Saved Objects in the Kibana UI
- Version Control: Store dashboard definitions in Git
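Exports are easier to review in Git when key order is stable between runs. The sketch below assumes the export is NDJSON (one JSON object per line, as produced by Kibana 7.x saved object exports); it rewrites each line with sorted keys and lists the objects it contains. The sample text is hand-written, and object order is left untouched.

```python
import json


def _parse(ndjson_text):
    """Parse NDJSON text into a list of objects, skipping blank lines."""
    return [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]


def normalize_export(ndjson_text):
    """Re-serialize every line with sorted keys so diffs stay small."""
    return "\n".join(json.dumps(obj, sort_keys=True) for obj in _parse(ndjson_text)) + "\n"


def list_objects(ndjson_text):
    """Return (type, id) for each saved object; lines without both are skipped."""
    return [(o["type"], o["id"]) for o in _parse(ndjson_text) if "type" in o and "id" in o]


# Hand-written sample export: two saved objects plus a summary line.
sample_export = (
    '{"type": "visualization", "id": "request-rate", "attributes": {"title": "Request Rate"}}\n'
    '{"id": "system-overview", "type": "dashboard", "attributes": {"title": "System Overview"}}\n'
    '{"exportedCount": 2}\n'
)
```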
Dashboard Templates
We've created templates for quick setup of new dashboards:
{
  "attributes": {
    "title": "Service Dashboard Template",
    "hits": 0,
    "description": "Template for monitoring any service",
    "panelsJSON": "[{\"id\":\"request-rate\",\"type\":\"visualization\",\"panelIndex\":1,\"gridData\":{\"x\":0,\"y\":0,\"w\":24,\"h\":8},\"version\":\"7.10.0\"},{\"id\":\"error-rate\",\"type\":\"visualization\",\"panelIndex\":2,\"gridData\":{\"x\":24,\"y\":0,\"w\":24,\"h\":8},\"version\":\"7.10.0\"},{\"id\":\"response-time\",\"type\":\"visualization\",\"panelIndex\":3,\"gridData\":{\"x\":0,\"y\":8,\"w\":48,\"h\":8},\"version\":\"7.10.0\"}]"
  }
}
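Since `panelsJSON` is a JSON string embedded in JSON, layout mistakes are easy to miss when editing a template by hand. The sketch below decodes the string and checks that panels stay inside the grid and do not overlap. The 48-column grid width is an assumption inferred from the template above, where the full-width panel has `w` set to 48.

```python
import json

GRID_COLUMNS = 48  # assumed grid width, matching the full-width panel above


def _overlaps(a, b):
    """True when two gridData rectangles share any area."""
    return (
        a["x"] < b["x"] + b["w"] and b["x"] < a["x"] + a["w"]
        and a["y"] < b["y"] + b["h"] and b["y"] < a["y"] + a["h"]
    )


def layout_problems(panels_json):
    """Return human-readable layout problems found in a panelsJSON string."""
    panels = json.loads(panels_json)
    problems = []
    for panel in panels:
        grid = panel["gridData"]
        if grid["x"] < 0 or grid["x"] + grid["w"] > GRID_COLUMNS:
            problems.append(f"panel {panel['panelIndex']} exceeds the grid width")
    for i, first in enumerate(panels):
        for second in panels[i + 1:]:
            if _overlaps(first["gridData"], second["gridData"]):
                problems.append(
                    f"panels {first['panelIndex']} and {second['panelIndex']} overlap"
                )
    return problems


# Same layout as the template: two half-width panels above one full-width panel.
template_panels = json.dumps([
    {"panelIndex": 1, "gridData": {"x": 0, "y": 0, "w": 24, "h": 8}},
    {"panelIndex": 2, "gridData": {"x": 24, "y": 0, "w": 24, "h": 8}},
    {"panelIndex": 3, "gridData": {"x": 0, "y": 8, "w": 48, "h": 8}},
])
```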
Conclusion
Well-designed Kibana dashboards transform our raw log data into valuable insights for ft_transcendence. By taking the time to create effective visualizations and dashboards, we gain:
- Better understanding of system behavior
- Faster troubleshooting of issues
- Deeper insights into user activity
- Improved security monitoring
In our resource-constrained environment, these dashboards help us maximize the value of our observability data without requiring excessive resources.