Full-stack Observability and Monitoring on AWS

Dec 29, 2023

When running applications and offering online services, it is crucial to know, at any given time, whether the applications and systems are running as expected. Downtime can lead to significant financial losses, so it is paramount that every organization has a real-time view of its operations. AWS offers observability and monitoring tools that help organizations obtain the logs, metrics and traces they need to make decisions quickly. I watched a presentation on full-stack observability and application monitoring that Rich McDonough, a Senior World Wide Cloud Ops Specialist at AWS, gave at the AWS Summit SF 2022, and it offered a lot of insight into the topic. You can find the whole presentation below.

The following are the takeaways from the presentation:

Monitoring vs Observability

Monitoring is the set of activities that get information out of a system, such as tracing and logging. Observability, on the other hand, describes how well we can understand a system. It determines the metrics, logs and traces that we wish to collect from a system.

Pillars of Observability

There are three pillars of observability:

Logs. Logging data should be sequence-aware: entries should be received in the correct order, and it should not be possible to inject data into the log. This is what lets us recreate the series of events happening in our environment.
Metrics. Measures taken from systems, e.g. CPU utilization, disk usage, etc.
Traces. Application traces tell what an app is doing, how long it takes and how well it is doing it.
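
To make the "sequence-aware, no injected data" idea more concrete, here is a minimal sketch in Python. It is not how CloudWatch stores logs and it was not shown in the talk. It is a hypothetical, self-contained illustration in which every entry carries a sequence number and a hash of the previous entry, so that reordered or injected entries can be detected. The names LogEntry, append and is_intact are made up for this example.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    seq: int        # position in the stream, starting at 0
    message: str
    prev_hash: str  # digest of the previous entry ("" for the first one)
    digest: str     # digest of this entry's seq, message and prev_hash


def _digest(seq: int, message: str, prev_hash: str) -> str:
    # Hash a canonical JSON encoding of the fields that must not change.
    payload = json.dumps([seq, message, prev_hash]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append(log: list, message: str) -> None:
    """Append a message, chaining it to the entry before it."""
    prev_hash = log[-1].digest if log else ""
    seq = len(log)
    log.append(LogEntry(seq, message, prev_hash, _digest(seq, message, prev_hash)))


def is_intact(log: list) -> bool:
    """Return True if no entry was reordered, altered or injected."""
    prev_hash = ""
    for expected_seq, entry in enumerate(log):
        if entry.seq != expected_seq or entry.prev_hash != prev_hash:
            return False
        if entry.digest != _digest(entry.seq, entry.message, entry.prev_hash):
            return False
        prev_hash = entry.digest
    return True
```

A chain like this only detects tampering. Someone who can rewrite the whole log can also rebuild the chain, so in practice the latest digest would have to be kept somewhere the writer cannot modify.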

Components to build a full stack observability solution

Building an observability solution requires tools. We can use AWS cloud-native tools or managed open-source services. In this blog, we will focus on the AWS-native services.

Data is collected by collectors (agents). The CloudWatch agent collects logs and metrics from applications, servers, services and so on, while the X-Ray agent (the X-Ray daemon) collects application trace data.

The data is then fed into CloudWatch in the form of metrics, traces and logs which are then analyzed to give meaningful information to a user.
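
The CloudWatch agent handles host-level collection for you, but an application can also publish its own custom metrics. The sketch below, in Python, shows roughly what one datapoint looks like when sent through the CloudWatch PutMetricData API with boto3. The helper names build_metric_datum and publish, the "MyApp" namespace and the "CheckoutLatency" metric are assumptions made for illustration. Actually publishing needs boto3 installed and AWS credentials configured.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit, dimensions):
    """Build one datapoint in the shape PutMetricData expects."""
    return {
        "MetricName": name,
        # Dimensions identify which resource or service the value belongs to.
        "Dimensions": [
            {"Name": key, "Value": val} for key, val in sorted(dimensions.items())
        ],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,  # e.g. "Milliseconds", "Percent", "Count"
    }


def publish(namespace, data, client=None):
    """Send datapoints to CloudWatch (requires boto3 and AWS credentials)."""
    if client is None:
        import boto3  # imported lazily so the builder works without boto3

        client = boto3.client("cloudwatch")
    client.put_metric_data(Namespace=namespace, MetricData=data)


if __name__ == "__main__":
    datum = build_metric_datum(
        "CheckoutLatency", 182, "Milliseconds", {"Service": "checkout"}
    )
    print(datum)
    # publish("MyApp", [datum])  # uncomment once credentials are configured
```

Keeping the payload builder separate from the network call makes it easy to inspect or unit test what would be sent before any data reaches AWS.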

Full stack observability strategies

When evaluating your observability solution, you can choose to watch things from the Outside-in or from the Inside-out. The choice is entirely up to you and depends on your Service Level Objectives (SLOs) and what matters to your business.

The main question asked when analyzing observability solutions is: What does good look like?

When using the Outside-in strategy, we ask: what does good look like to end users? Typical Service Level Objectives (SLOs) in this strategy cover page load time, successful purchases, JS and HTML errors, conversion rates, new customer acquisition, new feature adoption rates and search engine traffic. These SLOs depend on end-user behavior and on what users observe in the application.
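
As a rough illustration of an Outside-in SLO, the Python sketch below checks what fraction of page loads finished within a time budget and compares it to a target. The 2000 ms budget, the 90% target and the sample values are invented for the example and do not come from the presentation.

```python
def slo_attainment(samples_ms, threshold_ms):
    """Fraction of page loads that completed within the threshold."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    good = sum(1 for sample in samples_ms if sample <= threshold_ms)
    return good / len(samples_ms)


def meets_slo(samples_ms, threshold_ms, target):
    """True if the share of fast page loads reaches the target (0.0 to 1.0)."""
    return slo_attainment(samples_ms, threshold_ms) >= target


if __name__ == "__main__":
    # Hypothetical page load times in milliseconds, as seen by end users.
    page_loads = [820, 950, 1100, 3400, 760, 1300, 900, 2100, 880, 990]
    print(slo_attainment(page_loads, threshold_ms=2000))          # 0.8
    print(meets_slo(page_loads, threshold_ms=2000, target=0.9))   # False
```

Here 8 of the 10 loads were under 2 seconds, so a 90% target is missed even though the typical load looks fast. That is the kind of gap an end-user view is meant to expose.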

When using the Inside-out strategy, we ask: what does good look like for your backend applications? Typical backend-focused SLOs cover slow SQL queries, integration health, container restarts, high or low CPU utilization, disk usage (IOPS), API response time, errors, faults and retries. These are internal-facing signals.
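
For an Inside-out signal such as API response time, a percentile is often more telling than an average, because a few slow requests can hide behind a healthy mean. The Python sketch below uses the nearest-rank method to compute a percentile and also derives an error rate. The function names and the sample numbers are assumptions for illustration only.

```python
def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not values:
        raise ValueError("need at least one value")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 gives the 1-based rank.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def error_rate(status_codes):
    """Share of responses that were server-side faults (HTTP 5xx)."""
    if not status_codes:
        raise ValueError("need at least one status code")
    faults = sum(1 for code in status_codes if 500 <= code <= 599)
    return faults / len(status_codes)


if __name__ == "__main__":
    # Hypothetical API response times in milliseconds.
    latencies = [120, 95, 110, 480, 105, 130, 98, 102, 115, 101]
    print(percentile(latencies, 50))  # 105
    print(percentile(latencies, 95))  # 480
    print(error_rate([200, 200, 503, 200, 404, 500, 200, 200, 200, 200]))  # 0.2
```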

You should observe what matters to your business. Your business objectives should shape your goals and your approach to observability. You should determine the signals you wish to receive from your workloads and which of them deserve alarms and notifications. Build a full-stack observability solution that reduces your mean time to resolution as far as possible.
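
Deciding what to alarm on usually means picking a threshold and deciding how many recent datapoints must breach it before anyone is notified. The Python sketch below is a simplified illustration of that "M out of N datapoints" idea. The state names mirror the ones CloudWatch alarms use (OK, ALARM, INSUFFICIENT_DATA), but the function itself is hypothetical and ignores details such as how real alarms treat missing data.

```python
def alarm_state(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Evaluate the most recent datapoints against a threshold.

    Returns "ALARM" when at least `datapoints_to_alarm` of the last
    `evaluation_periods` values are above `threshold`.
    """
    if datapoints_to_alarm > evaluation_periods:
        raise ValueError("datapoints_to_alarm cannot exceed evaluation_periods")
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


if __name__ == "__main__":
    # Hypothetical CPU utilization readings in percent, oldest first.
    cpu = [42, 55, 91, 88, 60, 93]
    print(alarm_state(cpu, threshold=85, evaluation_periods=5, datapoints_to_alarm=3))  # ALARM
    print(alarm_state(cpu, threshold=85, evaluation_periods=5, datapoints_to_alarm=4))  # OK
```

Requiring several breaching datapoints instead of one trades a slightly slower alert for fewer false alarms on short spikes.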

The best solution combines both worlds: it has both the Outside-in view and the Inside-out view. It should be inclusive enough to show what is happening in all environments and to assess all outcomes.

Amazon CloudWatch

The solution will be different for every organization. We need a service to help us analyze what is happening in all environments. Amazon CloudWatch comes in handy in the cloud-native environment. It is:

Highly resilient
Fault tolerant
Built for scale

Amazon CloudWatch integrates easily with most AWS services, which makes it a crucial element of observability on AWS.

Hybrid, Distributed and On-premises workloads

AWS also offers solutions for hybrid and on-premises workloads. Hybrid, distributed and on-premises workloads can consume the AWS monitoring and observability tools over the network on a pay-as-you-go model. Depending on the level of privacy you require, you can connect through a VPN, the public internet or AWS Direct Connect. The tools work in these environments just as they would in an AWS environment.

Conclusion

In conclusion, keeping a close eye on how our applications and systems perform is important for businesses, and AWS provides some handy tools for this job. Rich McDonough’s talk at the AWS Summit SF 2022 taught us that observability is about understanding what’s happening in our systems, not just getting data from them. The three key things to watch are logs, metrics, and traces, which help us know what’s going on and fix issues quickly. Amazon CloudWatch is like a superhero in this world, resilient and scalable, making it easy for us to keep an eye on things, especially in the cloud. Whether your workloads run in the cloud, in a hybrid setup, or on your own premises, AWS has tools to help you see what’s happening, and that’s crucial for making sure everything runs smoothly and businesses stay on track.


Written by Kevin Kiruri
