4.3 KiB
4.3 KiB
Multi-Device Grafana Dashboard Setup
This directory contains Grafana dashboards configured to monitor multiple NixOS devices: mirai, tsuba, and ryu.
Dashboard Overview
1. Multi-Device System Metrics (multi-device-system-dashboard.json)
- Purpose: Compare system metrics across all devices on a single dashboard
- Features:
- Device filtering via template variable (can select one, multiple, or all devices)
- CPU usage comparison by device
- Memory usage comparison by device
- Disk usage comparison by device
- Network I/O comparison by device
- Service status overview
- System uptime tracking
- Load average trends
2. Device-Specific System Dashboard (device-specific-system-dashboard.json)
- Purpose: Detailed system monitoring for a single selected device
- Features:
- Device selection via template variable
- Detailed CPU, memory, and disk metrics
- Network and disk I/O operations
- System status indicators
- Load average trends (1m, 5m, 15m)
- Enhanced visualizations with thresholds and color coding
3. Multi-Device Process Monitoring (multi-device-processes-dashboard.json)
- Purpose: Monitor processes across all devices
- Features:
- Device filtering for process metrics
- Process resource usage table with device column
- CPU usage trends by process and device
- Memory usage (resident and virtual) by process
- Process I/O throughput monitoring
- Process count tracking per device
4. Legacy Dashboards
system-dashboard.json: Original single-device system dashboardprocesses-dashboard.json: Original single-device process dashboard
Device Labels
Each device is configured with specific labels for better organization:
- mirai:
device=mirai, type=server, arch=x86_64 - tsuba:
device=tsuba, type=server, arch=aarch64 - ryu:
device=ryu, type=desktop, arch=x86_64
Metrics Collection
Node Exporter (Port 9100)
Collects system-level metrics:
- CPU usage and load average
- Memory utilization
- Disk usage and I/O
- Network interface statistics
- SystemD service status
- System uptime
Process Exporter (Port 9256)
Collects process-level metrics:
- CPU usage per process
- Memory usage (resident and virtual)
- Process I/O operations
- Process count and lifecycle
Template Variables
Device Selection
All multi-device dashboards include a $device template variable that allows:
- Single device selection: Monitor one specific device
- Multiple device selection: Compare selected devices
- All devices: Monitor all available devices simultaneously
The variable automatically populates with available devices based on Prometheus metrics.
Usage Tips
- Quick Overview: Start with the multi-device system dashboard to get an overview of all devices
- Detailed Analysis: Use device-specific dashboards for in-depth analysis of a particular device
- Process Monitoring: Use the multi-device process dashboard to track resource-intensive processes across devices
- Filtering: Use the device template variable to focus on specific devices of interest
- Time Ranges: Adjust time ranges based on your monitoring needs (default: last 1 hour)
Customization
Adding New Devices
To add monitoring for new devices:
- Configure Prometheus exporters on the new device
- Add scrape targets to the Prometheus configuration in
grafana.nix - Include appropriate device labels
- The dashboards will automatically detect the new device
Modifying Queries
All dashboard queries use the device=~"$device" filter to support multi-device functionality. When customizing:
- Ensure queries include the device filter
- Use
{{device}}in legend formats to distinguish between devices - Consider device-specific thresholds if needed
Troubleshooting
Device Not Appearing
- Check if Prometheus can reach the device's exporters
- Verify the device label is correctly configured
- Check Prometheus targets at
prometheus.darksailor.dev/targets
Missing Metrics
- Ensure the required exporters are running on the target device
- Check firewall rules allow access to exporter ports (9100, 9256)
- Verify network connectivity between devices
Performance
- Limit the number of devices selected simultaneously for better performance
- Adjust refresh rates if dashboards become slow
- Consider shorter time ranges for detailed analysis