CloudWatch
Last updated
CloudWatch acts as a repository for metrics within AWS. A metric is a variable that changes over time. Metrics can come from AWS resources that integrate with CloudWatch, or you can publish your own custom metrics for monitoring.
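To make the idea of custom data concrete, here is a minimal sketch that builds a request for CloudWatch's PutMetricData API in the shape boto3's `put_metric_data` accepts. The namespace `MyApp`, the metric `QueueDepth`, and the `Environment` dimension are hypothetical. `publish` imports boto3 only when called and assumes AWS credentials and a Region are available.

```python
from datetime import datetime, timezone


def build_metric_request(namespace, name, value, unit, dimensions):
    """Build keyword arguments for CloudWatch PutMetricData (boto3 put_metric_data)."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": name,
                # Dimensions are name/value pairs that identify the metric's source.
                "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
                "Timestamp": datetime.now(timezone.utc),
                "Value": value,
                "Unit": unit,
            }
        ],
    }


def publish(request, region="us-east-1"):
    """Send the request to CloudWatch; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("cloudwatch", region_name=region)
    return client.put_metric_data(**request)


# Hypothetical custom metric: number of jobs waiting in an application queue.
request = build_metric_request("MyApp", "QueueDepth", 42, "Count", {"Environment": "dev"})
```

Calling `publish(request)` would send one data point; batching several entries in `MetricData` reduces the number of API calls.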
EC2 is one example of an AWS service that publishes metrics to CloudWatch. CloudWatch receives EC2 metrics such as CPU utilization (as a percentage) and network traffic received (in bytes). For a complete list of services that publish metrics to CloudWatch, see the CloudWatch documentation.
With your metrics stored in CloudWatch, AWS can calculate statistics, such as the average, minimum, maximum, or sum of a metric over a chosen period, to give you better insight into the performance of your resources. This information can be displayed in customizable dashboards in the console (we'll cover dashboards and alarms in more detail later in the lab). You can also configure alarms that trigger when a metric meets specific criteria, and set them to alert your team through Amazon SNS or to take automated actions within AWS.
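Statistics like these are computed over fixed periods. The following pure-Python sketch approximates that aggregation locally (it does not call AWS): it buckets made-up CPU utilization samples into 5-minute periods and reports values under the statistic names CloudWatch uses. `period_statistics` is a hypothetical helper, not part of any AWS SDK.

```python
from collections import defaultdict


def period_statistics(points, period=300):
    """Group (timestamp_seconds, value) points into fixed periods and summarize each.

    Uses the statistic names CloudWatch reports: SampleCount, Sum, Average,
    Minimum and Maximum.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each point to the start of its period (5-minute buckets by default).
        buckets[ts - ts % period].append(value)

    stats = {}
    for start in sorted(buckets):
        values = buckets[start]
        stats[start] = {
            "SampleCount": len(values),
            "Sum": sum(values),
            "Average": sum(values) / len(values),
            "Minimum": min(values),
            "Maximum": max(values),
        }
    return stats


# Made-up CPUUtilization samples (percent), one minute apart.
cpu = [(0, 10.0), (60, 20.0), (120, 30.0), (300, 80.0), (360, 60.0)]
result = period_statistics(cpu)
```

Against CloudWatch itself, the same kind of summary is requested with APIs such as `GetMetricStatistics` or `GetMetricData`, which also take a period and the statistics you want.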
Alongside its metric functionality, CloudWatch provides CloudWatch Logs, which stores logs from various services. Once the logs are in CloudWatch, you can view them directly in the AWS console and analyze them with the metric functionality. For example, you can create a metric filter that counts the log events containing a specific response, then trigger an alarm when a set number of these events arrive within a particular time period.
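The log-based alarm in that example can be simulated in plain Python: count the log events whose message contains a given response in each time window, then flag windows where the count reaches a threshold. In CloudWatch this is done with a metric filter (boto3's `put_metric_filter` on the `logs` client) plus an alarm; here the log lines, the `HTTP 500` pattern, the 60-second window, and the threshold are all hypothetical.

```python
from collections import Counter


def count_matches(events, pattern, window=60):
    """Count events whose message contains `pattern`, per window of `window` seconds.

    `events` is a list of (timestamp_seconds, message) tuples.
    """
    counts = Counter()
    for ts, message in events:
        if pattern in message:
            counts[ts - ts % window] += 1
    return counts


def breaching_windows(counts, threshold):
    """Return the start times of windows whose match count reaches the threshold."""
    return sorted(start for start, n in counts.items() if n >= threshold)


# Hypothetical access-log lines: (seconds since start, message).
logs = [
    (5, "GET /home HTTP 200"),
    (12, "GET /api HTTP 500"),
    (30, "GET /api HTTP 500"),
    (45, "GET /api HTTP 500"),
    (70, "GET /api HTTP 500"),
    (90, "GET /home HTTP 200"),
]
counts = count_matches(logs, "HTTP 500")
alerts = breaching_windows(counts, threshold=3)
```

Here the first minute contains three errors and would trigger the alarm, while the second minute contains only one.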
Dashboards: These customizable pages in the AWS console display a set of widgets. AWS generates some automatic dashboards for your whole account and for individual services, and you can also create your own. Widget types include graphs of metric values over time, alarm statuses, number widgets that display the latest value of a metric, and text widgets written in Markdown. You can also limit a dashboard to information about certain AWS services or Regions, and set up centralized dashboards that show data from multiple accounts.
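A dashboard is stored as a JSON body. The sketch below builds one with a CPU graph, a number widget, and a Markdown text widget for a single EC2 instance. The instance ID, Region, dashboard name, and titles are hypothetical; `publish_dashboard` would upload the body with boto3's `put_dashboard` and needs AWS credentials.

```python
import json

# Hypothetical instance ID and Region, used only for illustration.
INSTANCE_ID = "i-0123456789abcdef0"
REGION = "us-east-1"

cpu_metric = ["AWS/EC2", "CPUUtilization", "InstanceId", INSTANCE_ID]

dashboard = {
    "widgets": [
        {   # Line graph of average CPU utilization over time.
            "type": "metric", "x": 0, "y": 0, "width": 12, "height": 6,
            "properties": {
                "metrics": [cpu_metric], "view": "timeSeries", "stat": "Average",
                "period": 300, "region": REGION, "title": "CPU utilization",
            },
        },
        {   # Number widget showing only the latest value.
            "type": "metric", "x": 12, "y": 0, "width": 6, "height": 6,
            "properties": {
                "metrics": [cpu_metric], "view": "singleValue", "stat": "Average",
                "period": 300, "region": REGION, "title": "Current CPU",
            },
        },
        {   # Text widget written in Markdown.
            "type": "text", "x": 0, "y": 6, "width": 18, "height": 3,
            "properties": {"markdown": "## Web tier\nCPU for the web server instance."},
        },
    ]
}

dashboard_body = json.dumps(dashboard)


def publish_dashboard(name="WebTier"):
    """Upload the dashboard; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the JSON can be built without boto3

    client = boto3.client("cloudwatch", region_name=REGION)
    return client.put_dashboard(DashboardName=name, DashboardBody=dashboard_body)
```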
Metrics: Metrics are how CloudWatch handles data. A metric is a collection of time-ordered data points. Each data point's value is the information you're interested in, such as the percentage CPU utilization of an EC2 instance or the number of objects uploaded to your S3 bucket. The metric shows what these values were at different times, and graphing it reveals when the values peaked or dipped.
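That definition maps naturally onto a small data structure. This sketch uses a hypothetical `Metric` class (not an AWS type) that keeps its data points sorted by time and finds the peak and trough. It mirrors the fact that a CloudWatch metric is identified by its namespace, name, and dimensions; the `MyApp` namespace, bucket name, and upload counts are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    """A metric: an identity plus a time-ordered list of (timestamp, value) points."""
    namespace: str
    name: str
    dimensions: dict
    points: list = field(default_factory=list)

    def add(self, timestamp, value):
        self.points.append((timestamp, value))
        # Keep the series ordered by time even if points arrive out of order.
        self.points.sort(key=lambda p: p[0])

    def peak(self):
        """Data point with the highest value."""
        return max(self.points, key=lambda p: p[1])

    def trough(self):
        """Data point with the lowest value."""
        return min(self.points, key=lambda p: p[1])


uploads = Metric("MyApp", "ObjectsUploaded", {"Bucket": "example-bucket"})
for ts, value in [(120, 4), (0, 1), (60, 9), (180, 2)]:
    uploads.add(ts, value)
```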
Alarms: As metrics change over time, you may want to be notified if their values cross a specific threshold for a certain length of time. This is what alarms do. You give an alarm a metric to monitor, a threshold, and the number of periods to evaluate; if the metric breaches the threshold for those periods, the alarm's state changes (for example, from OK to ALARM). You can configure actions to run when the state changes, such as notifying an Amazon SNS topic. Alarm state changes are also sent to EventBridge, whose rules can route them to SNS for notifications or to Lambda for automatic remediation.
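The sketch below simulates how an alarm's state follows a metric, using made-up per-period CPU averages. Its parameters loosely mirror the `Threshold`, `EvaluationPeriods`, and `DatapointsToAlarm` settings of `PutMetricAlarm` with the `GreaterThanThreshold` comparison, but it is a simplification: real alarms also handle missing data and support other comparison operators.

```python
def evaluate_alarm(values, threshold, evaluation_periods=3, datapoints_to_alarm=3):
    """Return the alarm state after each period for a GreaterThanThreshold alarm.

    Simplified: the alarm is in ALARM when at least `datapoints_to_alarm` of the
    last `evaluation_periods` values exceed the threshold, and OK otherwise.
    Until enough periods have been seen it reports INSUFFICIENT_DATA.
    """
    states = []
    for i in range(len(values)):
        if i + 1 < evaluation_periods:
            states.append("INSUFFICIENT_DATA")
            continue
        window = values[i + 1 - evaluation_periods : i + 1]
        breaching = sum(1 for v in window if v > threshold)
        states.append("ALARM" if breaching >= datapoints_to_alarm else "OK")
    return states


def transitions(states):
    """List (period_index, old_state, new_state) for every state change."""
    return [
        (i, states[i - 1], states[i])
        for i in range(1, len(states))
        if states[i] != states[i - 1]
    ]


# Average CPU per 5-minute period (made-up values), alarm threshold 80%.
cpu = [50, 85, 90, 95, 70, 60]
states = evaluate_alarm(cpu, threshold=80)
changes = transitions(states)
```

Each entry in `changes` is the kind of state change on which an alarm action would run.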
Log groups and log streams: CloudWatch lets you store logs from different AWS services in one place, where they can be monitored and analyzed. This includes CloudTrail logs, custom application logs from EC2 instances, and logs from services such as Route 53. A log stream is a sequence of log events from the same source. Log streams that share the same settings (retention, access control, and so on) are grouped into a log group, which streamlines management because those settings are applied once at the group level.
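As a sketch of that hierarchy, the following builds CloudWatch Logs requests (boto3 `logs` client method names and parameters) that create a log group, set its retention once, create one stream per source, and send events to a stream. The group name `/myapp/web`, the stream names, and the 30-day retention are hypothetical; `send` needs boto3 and AWS credentials.

```python
import time


def log_setup_requests(group, streams, retention_days=30):
    """Return (API name, kwargs) pairs that create a log group, set its retention
    and create one log stream per source. Retention is set once on the group
    and applies to every stream inside it."""
    requests = [
        ("create_log_group", {"logGroupName": group}),
        ("put_retention_policy", {"logGroupName": group, "retentionInDays": retention_days}),
    ]
    for stream in streams:
        requests.append(("create_log_stream", {"logGroupName": group, "logStreamName": stream}))
    return requests


def log_event_request(group, stream, messages):
    """Build a put_log_events request; timestamps are milliseconds since the epoch."""
    now_ms = int(time.time() * 1000)
    return {
        "logGroupName": group,
        "logStreamName": stream,
        "logEvents": [{"timestamp": now_ms, "message": m} for m in messages],
    }


def send(requests, region="us-east-1"):
    """Execute the (API name, kwargs) pairs with boto3; requires AWS credentials."""
    import boto3  # imported lazily so the builders work without boto3

    client = boto3.client("logs", region_name=region)
    for api_name, kwargs in requests:
        getattr(client, api_name)(**kwargs)


# Hypothetical log group for a web app, with one stream per EC2 instance.
setup = log_setup_requests("/myapp/web", ["i-0aaa", "i-0bbb"])
event = log_event_request("/myapp/web", "i-0aaa", ["started", "ready"])
```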
Aggregation: Keeping all the data and logs from across your application in one place lets you diagnose problems more quickly. Take care when granting access to CloudWatch, however: because it aggregates data from many sources, blanket access may expose information that some users are not authorized to see.
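One way to follow that advice is to scope read access to a specific log group rather than granting `logs:*` on every resource. This hypothetical helper builds such an IAM policy. The account ID, Region, and log group name are placeholders, and users may need additional actions depending on how they browse logs in the console.

```python
import json


def read_only_log_policy(region, account_id, log_group):
    """Build an IAM policy that lets a user read one log group and nothing else."""
    arn = f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}:*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "logs:DescribeLogStreams",
                    "logs:GetLogEvents",
                    "logs:FilterLogEvents",
                ],
                # Scoped to a single log group instead of "Resource": "*".
                "Resource": arn,
            }
        ],
    }


# Hypothetical account ID, Region and log group.
policy = read_only_log_policy("us-east-1", "111122223333", "/myapp/web")
print(json.dumps(policy, indent=2))
```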
Automation: CloudWatch lets you react to security and operational events automatically. This can stop security incidents in their tracks instead of letting them progress until the team's working hours start or someone happens to notice.
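A common automation pattern is an EventBridge rule that matches CloudWatch alarm state changes and invokes a Lambda function. The event pattern below uses the `aws.cloudwatch` source and `CloudWatch Alarm State Change` detail type that CloudWatch emits. `matches` is a simplified local stand-in for EventBridge's matching (exact values only), and `handler` is a hypothetical Lambda function that returns a decision instead of calling AWS.

```python
# EventBridge event pattern matching CloudWatch alarms that enter the ALARM state.
ALARM_PATTERN = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {"state": {"value": ["ALARM"]}},
}


def matches(pattern, event):
    """Simplified EventBridge matching: every key in the pattern must be present,
    and each leaf value must be one of the pattern's allowed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


def handler(event, context=None):
    """Hypothetical Lambda target that decides on a remediation for the alarm."""
    detail = event["detail"]
    # A real function would call AWS here (for example, to restart a service).
    return {
        "alarm": detail["alarmName"],
        "action": "remediate",
        "from": detail["previousState"]["value"],
    }


sample_event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {
        "alarmName": "HighCPU",
        "state": {"value": "ALARM"},
        "previousState": {"value": "OK"},
    },
}
```

Filtering on `ALARM` in the rule keeps the function from running when the alarm returns to `OK`.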
Visibility: Being able to query and monitor logs and data from multiple sources greatly improves your ability to dig into issues with your application and find their true root causes.