Overview
The monitoring system collects host, CPU, memory, disk, network, and Go runtime metrics using the gopsutil library. Results are cached for 2 seconds (StatsRefreshInterval) to reduce collection overhead.
Type Definition
type SystemStats struct {
	Hostname           string             `json:"hostname"`
	Platform           string             `json:"platform"`
	OS                 string             `json:"os"`
	KernelVersion      string             `json:"kernel_version"`
	CPUInfo            []CPUInfo          `json:"cpu_info"`
	MemoryInfo         MemoryInfo         `json:"memory_info"`
	DiskTotal          uint64             `json:"disk_total"`
	DiskUsed           uint64             `json:"disk_used"`
	DiskFree           uint64             `json:"disk_free"`
	RuntimeStats       RuntimeStats       `json:"runtime_stats"`
	ProcessStats       ProcessInfo        `json:"process_stats"`
	StartTime          time.Time          `json:"start_time"`
	UptimeSecs         int64              `json:"uptime_secs"`
	HasTempData        bool               `json:"has_temp_data"`
	NetworkInterfaces  []NetworkInterface `json:"network_interfaces"`
	NetworkConnections int                `json:"network_connections"`
	NetworkBytesSent   uint64             `json:"network_bytes_sent"`
	NetworkBytesRecv   uint64             `json:"network_bytes_recv"`
}
Location: core/monitoring/stats.go:18
Collection Functions
CollectSystemStats
func CollectSystemStats(ctx context.Context, startTime time.Time) (*SystemStats, error)
Gathers all system statistics with context support and caching.
Parameters:
ctx - Context for cancellation and timeouts
startTime - Application start time (for uptime calculation)
Returns:
*SystemStats - Complete system metrics
error - Aggregated errors from individual collectors (non-fatal)
Location: core/monitoring/stats.go:48
Caching: Results are cached for 2 seconds (StatsRefreshInterval)
Example:
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
stats, err := monitoring.CollectSystemStats(ctx, serverStartTime)
if err != nil {
	// Non-fatal: some metrics may be unavailable
	log.Printf("Warning: %v", err)
}
if stats != nil {
	// Guard against an empty CPU slice before indexing
	if len(stats.CPUInfo) > 0 {
		fmt.Printf("CPU Usage: %.1f%%\n", stats.CPUInfo[0].Usage)
	}
	fmt.Printf("Memory Used: %.1fGB\n", float64(stats.MemoryInfo.Used)/1e9)
}
CollectSystemStatsWithoutContext
func CollectSystemStatsWithoutContext(startTime time.Time) (*SystemStats, error)
Convenience wrapper that uses context.Background().
Location: core/monitoring/stats.go:158
CPU Metrics
CPUInfo
type CPUInfo struct {
	ModelName   string  `json:"model_name"`
	Cores       int32   `json:"cores"`
	Frequency   float64 `json:"frequency_mhz"` // MHz
	Usage       float64 `json:"usage"`         // percent
	Temperature float64 `json:"temperature"`   // °C
}
Location: core/monitoring/cpu.go:12
CollectCPUInfoWithContext
func CollectCPUInfoWithContext(ctx context.Context) ([]CPUInfo, error)
Gathers CPU information including usage and temperature.
Location: core/monitoring/cpu.go:21
Example:
cpuInfo, err := monitoring.CollectCPUInfoWithContext(ctx)
if err != nil {
	log.Printf("CPU metrics warning: %v", err)
}
for i, cpu := range cpuInfo {
	fmt.Printf("CPU %d: %s (%.1f MHz)\n", i, cpu.ModelName, cpu.Frequency)
	fmt.Printf(" Usage: %.1f%%\n", cpu.Usage)
	// A zero temperature is treated here as "no sensor data"
	if cpu.Temperature > 0 {
		fmt.Printf(" Temp: %.1f°C\n", cpu.Temperature)
	}
}
Temperature Detection
func IsCPUTemp(sensor string) bool
Identifies CPU temperature sensors by checking for common names:
- coretemp (Intel)
- k10temp (AMD)
- cpu_thermal, cpu-thermal
- cpu temperature
Location: core/monitoring/cpu.go:91
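The name check described above could look roughly like the following. This is only a sketch: the helper name isCPUTempSketch is hypothetical, and case-insensitive substring matching is an assumption, since the actual matching rules of IsCPUTemp are not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// cpuSensorNames holds the sensor names listed in this section.
var cpuSensorNames = []string{
	"coretemp", "k10temp", "cpu_thermal", "cpu-thermal", "cpu temperature",
}

// isCPUTempSketch reports whether a sensor key contains one of the known
// CPU sensor names. Lower-casing and substring matching are assumptions.
func isCPUTempSketch(sensor string) bool {
	s := strings.ToLower(sensor)
	for _, name := range cpuSensorNames {
		if strings.Contains(s, name) {
			return true
		}
	}
	return false
}

func main() {
	keys := []string{"coretemp_core0_input", "k10temp_tctl", "nvme_composite"}
	for _, key := range keys {
		fmt.Printf("%-22s cpu sensor: %v\n", key, isCPUTempSketch(key))
	}
}
```

Keeping the names in one slice makes it easy to add a platform-specific sensor without touching the matching logic.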
Memory Metrics
MemoryInfo
type MemoryInfo struct {
	Total       uint64  `json:"total"`
	Used        uint64  `json:"used"`
	Free        uint64  `json:"free"`
	UsedPercent float64 `json:"used_percent"`
	SwapTotal   uint64  `json:"swap_total"`
	SwapUsed    uint64  `json:"swap_used"`
	SwapPercent float64 `json:"swap_percent"`
}
Location: core/monitoring/memory.go:9
CollectMemoryInfoWithContext
func CollectMemoryInfoWithContext(ctx context.Context) (MemoryInfo, error)
Gathers memory and swap information.
Location: core/monitoring/memory.go:21
Example:
memInfo, err := monitoring.CollectMemoryInfoWithContext(ctx)
if err != nil {
	log.Printf("Memory metrics warning: %v", err)
}
fmt.Printf("Memory: %.1fGB / %.1fGB (%.1f%%)\n",
	float64(memInfo.Used)/1e9,
	float64(memInfo.Total)/1e9,
	memInfo.UsedPercent,
)
Network Metrics
NetworkInterface
type NetworkInterface struct {
	Name        string `json:"name"`
	IPAddress   string `json:"ip_address"`
	BytesSent   uint64 `json:"bytes_sent"`
	BytesRecv   uint64 `json:"bytes_recv"`
	PacketsSent uint64 `json:"packets_sent"`
	PacketsRecv uint64 `json:"packets_recv"`
}
Location: core/monitoring/network.go:11
NetworkStats
type NetworkStats struct {
	Interfaces      []NetworkInterface `json:"interfaces"`
	ConnectionCount int                `json:"connection_count"`
	TotalBytesSent  uint64             `json:"total_bytes_sent"`
	TotalBytesRecv  uint64             `json:"total_bytes_recv"`
}
Location: core/monitoring/network.go:20
CollectNetworkInfoWithContext
func CollectNetworkInfoWithContext(ctx context.Context) (NetworkStats, error)
Gathers network interface statistics and connection counts.
Features:
- Skips loopback interfaces
- Skips interfaces without addresses
- Extracts first non-local IPv4 address
- Counts all connection types
Location: core/monitoring/network.go:29
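To illustrate the filtering rules listed above, here is a minimal sketch built on the standard net package rather than on the collector itself. The helper firstNonLocalIPv4 is hypothetical, and reading "non-local" as "neither loopback nor link-local" is an assumption.

```go
package main

import (
	"fmt"
	"net"
)

// firstNonLocalIPv4 returns the first IPv4 address in cidrs that is neither
// loopback nor link-local, or "" when there is none. Entries that are not
// valid CIDR strings are skipped.
func firstNonLocalIPv4(cidrs []string) string {
	for _, c := range cidrs {
		ip, _, err := net.ParseCIDR(c)
		if err != nil {
			continue
		}
		v4 := ip.To4() // nil for IPv6 addresses
		if v4 == nil || v4.IsLoopback() || v4.IsLinkLocalUnicast() {
			continue
		}
		return v4.String()
	}
	return ""
}

func main() {
	ifaces, err := net.Interfaces()
	if err != nil {
		fmt.Println("cannot list interfaces:", err)
		return
	}
	for _, iface := range ifaces {
		// Skip loopback interfaces
		if iface.Flags&net.FlagLoopback != 0 {
			continue
		}
		// Skip interfaces without addresses
		addrs, err := iface.Addrs()
		if err != nil || len(addrs) == 0 {
			continue
		}
		cidrs := make([]string, 0, len(addrs))
		for _, a := range addrs {
			cidrs = append(cidrs, a.String())
		}
		fmt.Printf("%s -> %q\n", iface.Name, firstNonLocalIPv4(cidrs))
	}
}
```

An interface that carries only IPv6 addresses still passes the first two filters in this sketch and is printed with an empty address.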
Example:
netStats, err := monitoring.CollectNetworkInfoWithContext(ctx)
if err != nil {
	log.Printf("Network metrics warning: %v", err)
}
// formatBytes is a helper the caller must define; it is not part of this API
for _, iface := range netStats.Interfaces {
	fmt.Printf("%s (%s)\n", iface.Name, iface.IPAddress)
	fmt.Printf(" Sent: %s\n", formatBytes(iface.BytesSent))
	fmt.Printf(" Recv: %s\n", formatBytes(iface.BytesRecv))
}
fmt.Printf("Active connections: %d\n", netStats.ConnectionCount)
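The example above calls formatBytes, which this reference does not define. One possible implementation is sketched below; the name matches the example, but the body and the choice of decimal (SI) units, picked to agree with the 1e9 and 1e6 divisors used elsewhere on this page, are assumptions.

```go
package main

import "fmt"

// formatBytes renders a byte count with decimal (SI) units, e.g. "1.5 kB".
// Hypothetical helper: it is not part of the monitoring package.
func formatBytes(b uint64) string {
	const unit = 1000
	if b < unit {
		return fmt.Sprintf("%d B", b)
	}
	div, exp := uint64(unit), 0
	// Find the largest unit that keeps the value at or above 1.
	for n := b / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %cB", float64(b)/float64(div), "kMGTPE"[exp])
}

func main() {
	for _, n := range []uint64{999, 1500, 2_500_000_000} {
		fmt.Println(formatBytes(n))
	}
}
```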
Runtime Metrics
RuntimeStats
type RuntimeStats struct {
	GoVersion    string   `json:"go_version"`
	NumGoroutine int      `json:"num_goroutine"`
	NumCPU       int      `json:"num_cpu"`
	GOMAXPROCS   int      `json:"gomaxprocs"`
	MemStats     MemStats `json:"mem_stats"`
}
type MemStats struct {
	Alloc        uint64 `json:"alloc"`
	TotalAlloc   uint64 `json:"total_alloc"`
	Sys          uint64 `json:"sys"`
	NumGC        uint32 `json:"num_gc"`
	PauseTotalNs uint64 `json:"pause_total_ns"`
	HeapAlloc    uint64 `json:"heap_alloc"`
	HeapInuse    uint64 `json:"heap_inuse"`
}
Location: core/monitoring/runtime.go:12
CollectRuntimeStats
func CollectRuntimeStats() RuntimeStats
Gathers Go runtime statistics (goroutines, memory, GC).
Location: core/monitoring/runtime.go:36
Example:
// Named rt so the variable does not shadow the standard runtime package
rt := monitoring.CollectRuntimeStats()
fmt.Printf("Go Version: %s\n", rt.GoVersion)
fmt.Printf("Goroutines: %d\n", rt.NumGoroutine)
fmt.Printf("Heap Alloc: %.1fMB\n", float64(rt.MemStats.HeapAlloc)/1e6)
fmt.Printf("GC Runs: %d\n", rt.MemStats.NumGC)
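For readers curious where such values come from, the sketch below shows how fields like these can be read from Go's standard runtime package. It is not the implementation of CollectRuntimeStats; the types runtimeSnapshot and memSnapshot and the function collectRuntimeSketch are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"runtime"
)

// memSnapshot and runtimeSnapshot are illustrative stand-ins for the
// MemStats and RuntimeStats types documented above.
type memSnapshot struct {
	HeapAlloc uint64
	NumGC     uint32
}

type runtimeSnapshot struct {
	GoVersion    string
	NumGoroutine int
	NumCPU       int
	GOMAXPROCS   int
	Mem          memSnapshot
}

// collectRuntimeSketch reads the current values from the runtime package.
func collectRuntimeSketch() runtimeSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return runtimeSnapshot{
		GoVersion:    runtime.Version(),
		NumGoroutine: runtime.NumGoroutine(),
		NumCPU:       runtime.NumCPU(),
		GOMAXPROCS:   runtime.GOMAXPROCS(0), // 0 queries without changing the setting
		Mem:          memSnapshot{HeapAlloc: m.HeapAlloc, NumGC: m.NumGC},
	}
}

func main() {
	rs := collectRuntimeSketch()
	fmt.Printf("%s, %d goroutines, %d CPUs, heap %d bytes\n",
		rs.GoVersion, rs.NumGoroutine, rs.NumCPU, rs.Mem.HeapAlloc)
}
```

runtime.ReadMemStats briefly stops the world, which is one reason to avoid calling a collector like this on every request.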
Complete Examples
Health Check Endpoint
func healthHandler(e *core.RequestEvent) error {
	ctx, cancel := context.WithTimeout(e.Request.Context(), 3*time.Second)
	defer cancel()
	stats, err := monitoring.CollectSystemStats(ctx, serverStartTime)
	if err != nil {
		// Some metrics may be unavailable, but continue
		log.Printf("Metrics collection warning: %v", err)
	}
	// Defensive: report unavailable if no snapshot came back at all
	if stats == nil {
		return e.JSON(503, map[string]string{"status": "unavailable"})
	}
	// Report usage of the first CPU entry, or 0 when none was collected
	cpuUsage := 0.0
	if len(stats.CPUInfo) > 0 {
		cpuUsage = stats.CPUInfo[0].Usage
	}
	health := map[string]interface{}{
		"status":         "healthy",
		"uptime":         stats.UptimeSecs,
		"hostname":       stats.Hostname,
		"cpu_usage":      cpuUsage,
		"memory_percent": stats.MemoryInfo.UsedPercent,
		"goroutines":     stats.RuntimeStats.NumGoroutine,
	}
	return e.JSON(200, health)
}
Metrics Dashboard
func metricsHandler(e *core.RequestEvent) error {
	ctx, cancel := context.WithTimeout(e.Request.Context(), 5*time.Second)
	defer cancel()
	stats, err := monitoring.CollectSystemStats(ctx, serverStartTime)
	if err != nil {
		// Collector errors are non-fatal: log them and serve what was gathered
		log.Printf("Metrics collection warning: %v", err)
	}
	if stats == nil {
		return e.JSON(500, map[string]string{
			"error": "metrics unavailable",
		})
	}
	// Use the first CPU entry if present; indexing an empty slice would panic
	var cpu monitoring.CPUInfo
	if len(stats.CPUInfo) > 0 {
		cpu = stats.CPUInfo[0]
	}
	// Format for dashboard
	dashboard := map[string]interface{}{
		"system": map[string]interface{}{
			"hostname":       stats.Hostname,
			"platform":       stats.Platform,
			"os":             stats.OS,
			"kernel_version": stats.KernelVersion,
			"uptime_hours":   stats.UptimeSecs / 3600, // integer division: whole hours
		},
		"cpu": map[string]interface{}{
			"model":       cpu.ModelName,
			"cores":       cpu.Cores,
			"frequency":   cpu.Frequency,
			"usage":       cpu.Usage,
			"temperature": cpu.Temperature,
		},
		"memory": map[string]interface{}{
			"total_gb":     float64(stats.MemoryInfo.Total) / 1e9,
			"used_gb":      float64(stats.MemoryInfo.Used) / 1e9,
			"free_gb":      float64(stats.MemoryInfo.Free) / 1e9,
			"used_percent": stats.MemoryInfo.UsedPercent,
		},
		"disk": map[string]interface{}{
			"total_gb": float64(stats.DiskTotal) / 1e9,
			"used_gb":  float64(stats.DiskUsed) / 1e9,
			"free_gb":  float64(stats.DiskFree) / 1e9,
		},
		"network": map[string]interface{}{
			"interfaces":    stats.NetworkInterfaces,
			"connections":   stats.NetworkConnections,
			"total_sent_mb": float64(stats.NetworkBytesSent) / 1e6,
			"total_recv_mb": float64(stats.NetworkBytesRecv) / 1e6,
		},
		"runtime": map[string]interface{}{
			"go_version":    stats.RuntimeStats.GoVersion,
			"goroutines":    stats.RuntimeStats.NumGoroutine,
			"heap_alloc_mb": float64(stats.RuntimeStats.MemStats.HeapAlloc) / 1e6,
			"gc_runs":       stats.RuntimeStats.MemStats.NumGC,
		},
	}
	return e.JSON(200, dashboard)
}
Alerting System
func monitorSystemHealth(ctx context.Context) {
	ticker := time.NewTicker(1 * time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			stats, err := monitoring.CollectSystemStats(ctx, serverStartTime)
			if err != nil {
				// Collector errors are non-fatal: keep checking what was gathered
				log.Printf("Metrics collection warning: %v", err)
			}
			if stats == nil {
				continue
			}
			// Check CPU (sendAlert is an application-defined, printf-style helper)
			if len(stats.CPUInfo) > 0 && stats.CPUInfo[0].Usage > 80 {
				sendAlert("High CPU usage: %.1f%%", stats.CPUInfo[0].Usage)
			}
			// Check memory
			if stats.MemoryInfo.UsedPercent > 90 {
				sendAlert("High memory usage: %.1f%%", stats.MemoryInfo.UsedPercent)
			}
			// Check disk, skipping the ratio when the total is unknown (zero)
			if stats.DiskTotal > 0 {
				diskUsedPercent := float64(stats.DiskUsed) / float64(stats.DiskTotal) * 100
				if diskUsedPercent > 85 {
					sendAlert("High disk usage: %.1f%%", diskUsedPercent)
				}
			}
			// Check goroutines
			if stats.RuntimeStats.NumGoroutine > 10000 {
				sendAlert("High goroutine count: %d", stats.RuntimeStats.NumGoroutine)
			}
		}
	}
}
Caching
const StatsRefreshInterval = 2 * time.Second
var collector = &statsCollector{
	lastCollected: time.Time{},
	cachedStats:   nil,
}
Behavior:
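- Stats are cached for 2 seconds (StatsRefreshInterval)
- Safe for concurrent use (guarded by a sync.RWMutex)
- Double-checked locking avoids redundant collection when several callers find the cache stale at once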
Location: core/monitoring/stats.go:14
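The double-checked locking behavior listed above can be sketched as follows. This is an illustration under assumptions, not the code in stats.go: the type cachedCollector, its fields other than lastCollected, the snapshot type, and the injected now argument are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const refreshInterval = 2 * time.Second

// snapshot stands in for *SystemStats in this sketch.
type snapshot struct{ CollectedAt time.Time }

type cachedCollector struct {
	mu            sync.RWMutex
	lastCollected time.Time
	cached        *snapshot
	collect       func() *snapshot // the expensive collection step
}

// fresh reports whether the cached snapshot is still valid at time now.
// Callers must hold c.mu (read or write).
func (c *cachedCollector) fresh(now time.Time) bool {
	return c.cached != nil && now.Sub(c.lastCollected) < refreshInterval
}

// get returns the cached snapshot, refreshing it when it is stale.
func (c *cachedCollector) get(now time.Time) *snapshot {
	// First check: cheap read lock for the common cache-hit path.
	c.mu.RLock()
	if c.fresh(now) {
		s := c.cached
		c.mu.RUnlock()
		return s
	}
	c.mu.RUnlock()

	// Second check under the write lock: another goroutine may have
	// refreshed the cache while this one was waiting.
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.fresh(now) {
		return c.cached
	}
	c.cached = c.collect()
	c.lastCollected = now
	return c.cached
}

// runDemo performs three reads at 0s, 1s and 3s and returns how many
// times the collection step actually ran.
func runDemo() int {
	calls := 0
	c := &cachedCollector{collect: func() *snapshot {
		calls++
		return &snapshot{CollectedAt: time.Now()}
	}}
	start := time.Now()
	c.get(start)                      // cache empty: collects
	c.get(start.Add(time.Second))     // within 2s: served from cache
	c.get(start.Add(3 * time.Second)) // stale: collects again
	return calls
}

func main() {
	fmt.Println("collections:", runDemo()) // prints "collections: 2"
}
```

Passing the current time in as an argument is a choice made only to keep the demo deterministic; a production collector would more likely call time.Now() internally.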
Error Handling
CollectSystemStats returns a multi-error if individual collectors fail:
err = errors.Join(multiError...)
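This allows partial results even if some metrics are unavailable (e.g., temperature on systems without sensors). Because errors.Join returns nil when it is given no non-nil errors, a nil result means every collector succeeded.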
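The partial-result pattern described above can be sketched with errors.Join (available since Go 1.20). The collector functions, the partialStats type, and the error messages below are hypothetical; only the errors.Join call mirrors this section.

```go
package main

import (
	"errors"
	"fmt"
)

var errNoSensors = errors.New("no sensors found")

// partialStats stands in for SystemStats in this sketch.
type partialStats struct {
	Hostname    string
	Temperature float64
	HasTempData bool
}

// Hypothetical collectors: one succeeds, one fails.
func collectHostname() (string, error)     { return "example-host", nil }
func collectTemperature() (float64, error) { return 0, errNoSensors }

// collectAll runs every collector, keeps whatever succeeded, and joins
// the failures into a single error value.
func collectAll() (*partialStats, error) {
	stats := &partialStats{}
	var errs []error

	if host, err := collectHostname(); err != nil {
		errs = append(errs, fmt.Errorf("hostname: %w", err))
	} else {
		stats.Hostname = host
	}

	if temp, err := collectTemperature(); err != nil {
		errs = append(errs, fmt.Errorf("temperature: %w", err))
	} else {
		stats.Temperature, stats.HasTempData = temp, true
	}

	// errors.Join returns nil when errs is empty.
	return stats, errors.Join(errs...)
}

func main() {
	stats, err := collectAll()
	if err != nil {
		fmt.Println("warning:", err)
	}
	// errors.Is sees through the joined error to the underlying cause.
	fmt.Println("missing sensors:", errors.Is(err, errNoSensors))
	fmt.Println("hostname:", stats.Hostname, "has temp:", stats.HasTempData)
}
```

Callers that care about a particular failure can test for it with errors.Is or errors.As instead of parsing the message.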
Best Practices
- Use Contexts: Always pass contexts for timeout control
- Cache Results: Don’t collect metrics on every request - use the built-in cache or add your own
- Handle Partial Failures: Some metrics may be unavailable on certain systems
- Monitor Goroutines: A goroutine count that keeps growing often indicates a leak
- Set Thresholds: Define reasonable alert thresholds for your workload
- Format Units: Convert bytes to GB/MB for human readability
- Trend Analysis: Track metrics over time, not just current values