How to create log-based alerts for monitoring persistent disk utilization in Google Cloud

ManjuMJ
Staff


Problem statement: Monitoring persistent disk utilization using standard log metrics

Monitoring persistent disk utilization for compute instances in Google Cloud can be challenging using standard log metrics, especially when multiple disks are added to a compute instance.

It’s important to have a solution that can monitor disk space and schedule alerts based on log metrics so you can track the performance of your systems and quickly identify potential problems. 

How to solve this problem?

By default, every Google Cloud compute instance has a boot disk attached. Users can attach additional disks as needed to run application workloads and to store application configuration.

To measure disk utilization for the root volume and any additional volumes, you can run the df utility periodically and write the results to custom logs in Cloud Logging using the gcloud logging write command. Log-based alerting policies can then be configured on those custom logs, with notifications delivered through Cloud Monitoring.

Key components of this solution:

  • Google Cloud gcloud logging API
  • Custom monitoring service script for checking disk utilization
  • Log-based alerting policy in Google Cloud

Google Cloud gcloud logging API

Custom logging can be achieved using the gcloud logging API (the gcloud logging command group), which writes log entries to, and reads them from, Cloud Logging. Custom logs written this way can then drive log-based alerting policies.

Below are some samples of writing log entries with gcloud logging.

 

# To create a log entry in a given log, run:
gcloud logging write LOG_NAME "A simple entry"

# To create a high-severity log entry, run:
gcloud logging write LOG_NAME "Urgent message" --severity=ALERT
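The read side can be sketched in a similar way. This is a minimal sketch, assuming a placeholder project ID (my-project); build_log_filter is a hypothetical helper, not part of gcloud. It only assembles a Cloud Logging filter string, and the commented command shows how that string might be passed to gcloud logging read.

```shell
#!/bin/bash
# build_log_filter PROJECT LOG_NAME MIN_SEVERITY
# Hypothetical helper: prints a Cloud Logging filter for one log and a
# minimum severity. "my-project" below is a placeholder project ID.
build_log_filter() {
    printf 'logName="projects/%s/logs/%s" AND severity>=%s\n' "$1" "$2" "$3"
}

filter="$(build_log_filter my-project disk_mon_alert_logs ALERT)"
echo "$filter"

# With gcloud installed and authenticated, recent entries could be read with:
#   gcloud logging read "$filter" --limit=5 --format=json
```

Keeping the filter in one place makes it easier to reuse the same expression later when defining an alert condition.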

 

Custom monitoring service script

The custom monitoring service script collects disk utilization metrics on a periodic basis and writes them to Cloud Logging. The service accepts two parameters: a warning limit and an error limit. Depending on which limit a volume exceeds, its utilization is written to one of two custom logs, disk_mon_warning_logs or disk_mon_alert_logs.
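The two-limit decision can be sketched on its own as follows. This is an illustrative sketch, not the service script itself: classify_usage is a hypothetical helper, and the limits 61 and 75 are the example values used later in this post. Note that it compares integers with -gt; a string comparison would rank 100 below 75.

```shell
#!/bin/bash
# classify_usage PCT WARN ERR
# Hypothetical helper: prints the custom log a reading belongs to,
# or "none" when the reading does not exceed the warning limit.
classify_usage() {
    if [ "$1" -gt "$3" ]; then
        echo "disk_mon_alert_logs"
    elif [ "$1" -gt "$2" ]; then
        echo "disk_mon_warning_logs"
    else
        echo "none"
    fi
}

# Try a few readings against a warning limit of 61 and an error limit of 75.
for reading in 40 61 62 75 76 100; do
    echo "${reading}% -> $(classify_usage "$reading" 61 75)"
done
```

A reading exactly equal to a limit is treated as not exceeding it, which matches the "exceeds" wording of the thresholds.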

Log-based alerting policy in Google Cloud

Log-based alerting is a type of alerting that uses logs to detect and notify you of events that meet certain criteria. In Google Cloud, log-based alerting is provided by Cloud Monitoring.

Once log-based alerting is configured in Cloud Monitoring, Google Cloud will start monitoring the log stream for events that meet the condition you specified. If an event is detected, Google Cloud will send you an alert notification.

Steps to implement log-based alerts for monitoring persistent disk utilization in Google Cloud

Follow the steps below to implement log-based alerts for monitoring persistent disk utilization. 

Step 1: Prepare the monitoring script

Below is a sample monitoring script, which runs the df utility every five minutes. Only the root and additional volumes are kept from the df output.

The utilization value for each volume is compared against the configured warning and error limits. Depending on which limit is exceeded, an entry is written to disk_mon_warning_logs or disk_mon_alert_logs using the gcloud logging write command.
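The filtering of df output described above can be tried in isolation. This is a minimal sketch under stated assumptions: parse_df and sample_df are hypothetical helpers, and the sample figures are made up to stand in for real df -H output. The exclusion pattern mirrors the one used in the script.

```shell
#!/bin/bash
# parse_df reads df-style output on stdin and prints
# "used<TAB>percent<TAB>mountpoint" for each volume of interest,
# with the trailing % removed from the percentage.
parse_df() {
    grep -Ev 'boot|tmpfs|:|Filesystem' |
        awk '{ sub(/%$/, "", $5); print $3 "\t" $5 "\t" $6 }'
}

# Made-up sample standing in for real `df -H` output.
sample_df() {
    cat <<'EOF'
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        11G  2.1G  8.0G  21% /
tmpfs           2.1G     0  2.1G   0% /dev/shm
/dev/sda15      130M   12M  118M  10% /boot/efi
/dev/sdb        106G   85G   16G  85% /mnt/data
EOF
}

sample_df | parse_df
# On a VM, the live equivalent would be: df -H | parse_df
```

In this sample only the root volume and /mnt/data survive the filter. Mount points that contain spaces would not parse correctly with this field-based approach.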

Log in to the Compute Engine (GCE) VM instance as the root user and create a file with the details below.

Filename: diskUtilizationScript.sh
Path: /root/ 

 

#!/bin/bash

#=============================================
# Fetching the warning limit & error limit info
#=============================================

warningLimit="$1"
errorLimit="$2"
echo "Warning limit is set to: [${warningLimit}]%"
echo "Error limit is set to: [${errorLimit}]%"

#=============================================
# Fetching the project & hostname information
#=============================================

project_id=$(gcloud config list --format='text(core.project)' | sed "s/^.*: //g")
host_name=$(hostname)
echo "ProjectId is: ${project_id}"
echo "HostName is: ${host_name}"

while true
do
        # Keep only the root and additional volumes: used space, used %, mount point
        df -H | grep -Ev "boot|tmpfs|:|Filesystem" | awk '{print $3"\t"$5"\t"$6}' > fileinp.txt
        readarray -t my_array < fileinp.txt

        for line in "${my_array[@]}"; do
                read -r diskused pcntused mountpoint <<< "${line}"
                echo "$diskused--$pcntused -- $mountpoint"
                # Strip the trailing % so the value can be compared as an integer
                compValue="${pcntused%?}"
                if [[ $compValue -gt $warningLimit ]]; then
                        if [[ $compValue -gt $errorLimit ]]; then
                                JSON_STRING='{"Diskname":"'"$mountpoint"'","Diskspaceused":"'"$diskused"'","Usedpcnt":"'"$pcntused"'","Machine":"'"$host_name"'","Remarks":"'"Threshold is: $errorLimit% and current utilization is: $compValue%"'"}'
                                gcloud logging write "disk_mon_alert_logs" "${JSON_STRING}" --payload-type=json --severity=ALERT
                                echo "gcloud alert logging done"
                        else
                                JSON_STRING='{"Diskname":"'"$mountpoint"'","Diskspaceused":"'"$diskused"'","Usedpcnt":"'"$pcntused"'","Machine":"'"$host_name"'","Remarks":"'"Threshold is: $warningLimit% and current utilization is: $compValue%"'"}'
                                gcloud logging write "disk_mon_warning_logs" "${JSON_STRING}" --payload-type=json --severity=WARNING
                                echo "gcloud warning logging done"
                        fi
                fi
                # Flushing variables for the next iteration
                mountpoint=""
                diskused=""
                pcntused=""
        done
        echo "Execution Completed"

        # Wait five minutes before the next check
        sleep 300
done
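The JSON payload in the script is assembled by string concatenation, which breaks if a value ever contains a double quote or backslash. One possible alternative is sketched below; json_escape and build_payload are hypothetical helpers, vm-1 is a placeholder host name, and the keys are the same ones the script writes.

```shell
#!/bin/bash
# json_escape VALUE
# Escapes backslashes and double quotes so VALUE can sit inside a JSON string.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_payload MOUNTPOINT USED PERCENT MACHINE REMARKS
# Hypothetical helper: prints the payload with the keys used by the script.
build_payload() {
    printf '{"Diskname":"%s","Diskspaceused":"%s","Usedpcnt":"%s","Machine":"%s","Remarks":"%s"}\n' \
        "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")" \
        "$(json_escape "$4")" "$(json_escape "$5")"
}

build_payload "/mnt/data" "85G" "85%" "vm-1" \
    "Threshold is: 75% and current utilization is: 85%"

# The result could be passed to gcloud in the same way as in the script:
#   gcloud logging write disk_mon_alert_logs "$(build_payload ...)" \
#       --payload-type=json --severity=ALERT
```

This sketch only handles quotes and backslashes; control characters in values would need additional escaping.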

 

Step 2: Configure the script as system service

The monitoring script diskUtilizationScript.sh can be run as a systemd service. Follow the steps below to configure it. Once enabled, the service also starts automatically after a system restart.

  1. Keep the script (diskUtilizationScript.sh) in the /root directory.
  2. Give execute permission to the script.
     chmod +x /root/diskUtilizationScript.sh
  3. Create a service unit file for running the script in the following directory. You can give it any name, but it must end with the .service extension.
     sudo vim /etc/systemd/system/fs-monitor.service
  4. Paste the content below into the file. You can modify the warning limit (61) and error limit (75) as needed.
    [Unit]
    Description=FileSystem monitoring service
    Documentation=https://cloud.google.com/logging/docs/agent/ops-agent

    [Service]
    Type=simple
    User=root
    Group=root
    TimeoutStartSec=0
    Restart=on-failure
    RestartSec=30s
    #ExecStartPre=
    # systemd does not interpret shell redirection in ExecStart, so none is used here
    ExecStart=/root/diskUtilizationScript.sh 61 75
    SyslogIdentifier=Diskutilization
    #ExecStop=

    [Install]
    WantedBy=multi-user.target
  5. Save the file, reload systemd, then enable and start the service using the commands below. Enabling the service is what makes it start automatically after a reboot.
    sudo systemctl daemon-reload
    sudo systemctl enable fs-monitor.service
    sudo systemctl start fs-monitor.service
  6. Check the status of the service using the command below.
    sudo systemctl status fs-monitor.service
  7. Use the commands below to stop or restart the service as required.
    sudo systemctl stop fs-monitor.service
    sudo systemctl restart fs-monitor.service
  8. Use the commands below to check for service-related debug information as required. Which of these log files exists depends on the Linux distribution.
     grep -is "Diskutilization" /var/log/daemon.log
     grep -is "Diskutilization" /var/log/syslog
     grep -is "Diskutilization" /var/log/messages
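If the limits need to differ between machines, the unit file could be generated rather than edited by hand. The sketch below assumes a hypothetical helper, write_unit_file, and writes to a temporary file so it is safe to try anywhere; the values 70 and 85 are arbitrary examples. It deliberately omits shell redirection in ExecStart, which systemd does not interpret.

```shell
#!/bin/bash
# write_unit_file PATH WARN ERR
# Hypothetical helper: writes a trimmed-down version of the unit file
# from step 4 to PATH, with the two limits filled in.
write_unit_file() {
    cat > "$1" <<EOF
[Unit]
Description=FileSystem monitoring service

[Service]
Type=simple
User=root
Restart=on-failure
RestartSec=30s
ExecStart=/root/diskUtilizationScript.sh $2 $3
SyslogIdentifier=Diskutilization

[Install]
WantedBy=multi-user.target
EOF
}

# Demo: write to a temporary file instead of /etc/systemd/system.
unit_path="$(mktemp)"
write_unit_file "$unit_path" 70 85
grep '^ExecStart=' "$unit_path"

# On the VM, installing it for real would look something like:
#   sudo cp "$unit_path" /etc/systemd/system/fs-monitor.service
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now fs-monitor.service
```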

Step 3: Configure notification channels

To configure notification channels for your alerts, in the Google Cloud Console, navigate to Cloud Monitoring → Notification channels. You can choose from available notifications or create a new channel.

Configure notification channels in Google Cloud

For our use case, we'll create two new notification channels - "Email" and "SMS," for warning and error limit notifications. 

  1. To create an Email notification channel, click on "ADD NEW" and proceed with the required values.
    ManjuMJ_0-1690840052012.png
  2. To create an SMS notification channel, click on "ADD NEW" and proceed with the required values. 
    ManjuMJ_1-1690840310459.png
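The same two channels could likely also be created from the command line. The sketch below is an assumption-laden illustration rather than the method used in this post: it relies on the gcloud beta monitoring channels create command, the display names, email address, and phone number are placeholders, and run is a hypothetical wrapper that only prints each command unless DRY_RUN is set to 0. SMS channels may additionally require verification before they can be used.

```shell
#!/bin/bash
# run prints the command when DRY_RUN=1 (the default here) and executes it
# otherwise. Hypothetical wrapper so the sketch is safe to try anywhere.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Email channel for warning notifications; the address is a placeholder.
run gcloud beta monitoring channels create \
    --display-name="disk-warning-email" \
    --type=email \
    --channel-labels=email_address=oncall@example.com

# SMS channel for error notifications; the number is a placeholder.
run gcloud beta monitoring channels create \
    --display-name="disk-error-sms" \
    --type=sms \
    --channel-labels=number=+15550100
```

Running the sketch as is prints the two commands for review. Setting DRY_RUN=0 would execute them against the currently configured project.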

Step 4: Configure the log-based alerting policy

Navigate to the Google Cloud Console → Cloud Logging. Follow the below steps to configure the new log-based alerting policies.

  1. Click on "Stream logs"
    stream-logs.png

  2. Click "Log name," choose disk_mon_warning_logs, and click "Apply."
    log-name-disk-mon-warning.png

  3. Click "Create alert." 
    create-alert.png

  4. Fill in the required information and click "Save" to continue. 

    Provide the alert policy name and email content. 

    alert-details.png

    Select the log file name for the alert.

    log-file-name-alert.png

    Select the notification frequency and incident closure duration. 

    ManjuMJ_0-1690845054424.png

    Select the notification channel for alerting. 

    ManjuMJ_1-1690845075151.png

  5. Repeat the previous steps to configure the alert policy on the disk_mon_alert_logs log. Go to the Google Cloud Console → Cloud Logging. Click "Log name," choose disk_mon_alert_logs, and click "Apply."
    stream-logs-2.png

  6. Click "Create alert" and fill in the required information. 

    alert-details.png

Once all the above steps are completed, custom log-based alerting will be enabled in your Google Compute Engine VM instance. 
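For reference, a log-based alerting policy like the ones configured above can also be described as a JSON document. The sketch below is a hedged illustration, not an export of the policies from this post: log_alert_policy is a hypothetical helper, my-project and CHANNEL_ID are placeholders, and the 300s notification rate limit and 7-day auto-close are example values. The field names follow the Cloud Monitoring AlertPolicy resource as I understand it.

```shell
#!/bin/bash
# log_alert_policy NAME ESCAPED_FILTER CHANNEL
# Hypothetical helper: prints an AlertPolicy JSON document with a log-match
# condition. ESCAPED_FILTER must already have its double quotes escaped.
log_alert_policy() {
    cat <<EOF
{
  "displayName": "$1",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "Log match: $1",
      "conditionMatchedLog": { "filter": "$2" }
    }
  ],
  "alertStrategy": {
    "notificationRateLimit": { "period": "300s" },
    "autoClose": "604800s"
  },
  "notificationChannels": ["$3"]
}
EOF
}

# Demo: write the policy for the warning log to a temporary file.
policy_file="$(mktemp)"
log_alert_policy "Disk warning" \
    'logName=\"projects/my-project/logs/disk_mon_warning_logs\"' \
    "projects/my-project/notificationChannels/CHANNEL_ID" > "$policy_file"
cat "$policy_file"

# The file could then be passed to gcloud, for example:
#   gcloud alpha monitoring policies create --policy-from-file="$policy_file"
```

Keeping the policy as a file makes it possible to review and version the alert definition alongside the monitoring script.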

What is the outcome? 

After configuring these alerts, if the disk utilization for any volume exceeds 61% but does not exceed 75%, a warning alert will be triggered via the Email notification channel.

If the disk utilization for any volume exceeds 75%, then an error alert will be triggered via the SMS notification channel.

This solution enables Google Cloud users to configure custom alerting on disk space utilization, including on additional disks attached to compute instances. You can easily customize the limits and opt in to alerts via various notification channels based on criticality.

Google Cloud log-based alerting is a powerful tool that can help you to detect and respond to events that impact your Google Cloud infrastructure. By creating well-configured log-based alerts, you can improve the reliability and security of your applications.


Have questions? Please leave a comment below and someone from the Google Cloud team or Community will be happy to help.
