Elastic-Introduction-to-application-performance-monitoring
Introduction to APM
The what, why, and how.
elastic.co
Table of Contents
End-to-end visibility
APM agents
Analysis
Key terms
Distributed tracing
Spans
Transactions
Traces
Technical capabilities
Ease of use
Deployment options
Security
Pricing
About Elastic
The adoption of microservices architectures effectively changed the way applications were built and
maintained, offering continuous delivery and scale that was previously unimaginable. But the distributed
and polyglot (written using different languages/frameworks) nature of these architectures adds complexity.
Continuous delivery and automation principles enabled frequent code pushes, but they also made it more critical
to track performance impact, so that issues can be patched quickly or changes reverted if a
deployment doesn’t go as planned.
Cloud-native development principles allow teams to continuously deliver great digital experiences to their
users, react quickly to feedback, and pivot as needed. What users don’t see are all of these gears humming
away in the background, powering their every request. So how do teams keep an eye on these applications?
As humans, we can’t process this volume of data in a meaningful way. We need an organized view with a clear
path to insight. Enter APM.
Because of its direct ties to user experience, APM helps connect IT to business goals. The relationship might
not be exactly 1:1, but it has a positive correlation. By providing continuous insight into how your applications
are performing, APM allows you to be proactive, rather than reacting to issues after an onslaught of customer
complaints via your ticketing system or even social media.
End-to-end visibility
Once an application is instrumented, the resulting data tells you exactly what’s happening inside of it. APM
tracks transactions all the way through their journey, so every request and response is recorded and measured,
no matter how complex your architecture is.
Linking all of these traces together, APM gives you the complete picture of your app’s performance — from a
bird’s-eye view of your services (and how they are interacting) all the way down to code-level insights.
Latency issues occur when a response to a request takes longer than usual, for example when a user clicks to
see a product and the page takes more than a few seconds to load.
Errors signify that an unintended result has occurred: the request wasn’t completed successfully.
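As a rough illustration of these two signals, the sketch below summarizes a handful of request records into a median latency, a share of slow requests, and an error rate. The record fields, the two-second threshold, and the sample data are all assumptions made for this example and do not reflect any particular APM tool.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RequestRecord:
    """One observed request; the field names are illustrative only."""
    endpoint: str
    duration_ms: float
    succeeded: bool


def summarize(records, slow_threshold_ms=2000.0):
    """Return median latency, the share of slow requests, and the error rate."""
    if not records:
        raise ValueError("need at least one record")
    durations = [r.duration_ms for r in records]
    slow = sum(1 for d in durations if d > slow_threshold_ms)
    failed = sum(1 for r in records if not r.succeeded)
    return {
        "median_ms": median(durations),
        "slow_ratio": slow / len(records),
        "error_rate": failed / len(records),
    }


# Made-up sample data: one slow product page and one failed checkout.
records = [
    RequestRecord("/product/42", 120.0, True),
    RequestRecord("/product/42", 180.0, True),
    RequestRecord("/product/42", 3400.0, True),   # latency issue
    RequestRecord("/checkout", 95.0, False),      # error
]
summary = summarize(records)
print(summary)
```

Real tools typically report percentiles and trends over time rather than a single summary, but the underlying idea is the same: measure how long each request takes and whether it completed successfully.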
The end goal of any investigation is to fix issues. To do this effectively, teams perform root cause analysis
to determine exactly what component is causing the problem. Can we quickly identify which service is the
performance bottleneck? Can we pinpoint the exact function call or method responsible for this issue? With
APM, the answer to both questions is “yes.”
Businesses are using APM across stages to help teams build, QA, deploy, and monitor apps more efficiently. If
organizations use APM across many teams, they can improve processes and create more efficient workflows,
leading to more time for innovation (and less time spent putting out fires).
We’ve seen the benefits; now let’s dive into how APM works.
APM agents
An APM agent is a library, plugin, or extension that monitors the performance metrics we described above.
Depending on what you need to monitor (and the language it’s written in), you may need one or multiple agents.
Once you’ve identified everything that you’d like to monitor, you’ll deploy APM agents to each of these pieces.
While agents vary per vendor, most agents instrument your code, collect performance data, and then send your
data to a server or collector.
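The instrument, collect, and send cycle described above can be sketched in a few lines. This is a toy model under stated assumptions, not how any vendor's agent is implemented: the `ToyAgent` class, its method names, and the `send` callable (which stands in for the network call to a server or collector) are all hypothetical.

```python
import functools
import json
import time


class ToyAgent:
    """Hypothetical in-process agent: instruments, collects, then ships."""

    def __init__(self, service_name, send):
        self.service_name = service_name
        self._send = send      # callable standing in for the network call
        self._buffer = []      # events collected since the last flush

    def instrument(self, func):
        """Wrap func so that each call records its duration and outcome."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            outcome = "success"
            try:
                return func(*args, **kwargs)
            except Exception:
                outcome = "failure"
                raise
            finally:
                self._buffer.append({
                    "service": self.service_name,
                    "name": func.__name__,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                    "outcome": outcome,
                })
        return wrapper

    def flush(self):
        """Send everything collected so far as one JSON payload."""
        payload, self._buffer = self._buffer, []
        if payload:
            self._send(json.dumps(payload))
        return len(payload)


sent = []  # stands in for the APM server or collector
agent = ToyAgent("storefront", send=sent.append)


@agent.instrument
def load_product(product_id):
    if product_id < 0:
        raise ValueError("unknown product")
    return {"id": product_id}


load_product(42)
try:
    load_product(-1)
except ValueError:
    pass
flushed = agent.flush()
```

Production agents usually hook into frameworks and libraries automatically instead of requiring a decorator on each function, and they batch and send data in the background.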
[Figure: Applications on an edge machine, each instrumented with an APM agent, send data to the APM platform.]
You’ll likely want to configure things like environment names, sampling rates, instrumentations, and metrics to
help your teams easily identify and analyze the data that’s streaming into your APM tool. Generally this is done
through the tool’s UI or API, or directly with environment variables.
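To make the environment-variable route concrete, here is a minimal sketch that reads an environment name and a sampling rate, then uses the rate to decide which requests to record. The variable names `APM_ENVIRONMENT` and `APM_SAMPLE_RATE` are invented for this example; each vendor documents its own setting names and defaults.

```python
import os
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    environment: str
    sample_rate: float  # fraction of requests to record, from 0.0 to 1.0


def load_config(env=None):
    """Read settings from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    rate = float(env.get("APM_SAMPLE_RATE", "1.0"))
    if not 0.0 <= rate <= 1.0:
        raise ValueError("APM_SAMPLE_RATE must be between 0 and 1")
    return AgentConfig(
        environment=env.get("APM_ENVIRONMENT", "development"),
        sample_rate=rate,
    )


def should_sample(config, rng=random):
    """Keep roughly sample_rate of all requests."""
    return rng.random() < config.sample_rate


# A plain dict stands in for the process environment in this example.
config = load_config({"APM_ENVIRONMENT": "production", "APM_SAMPLE_RATE": "0.2"})
rng = random.Random(7)  # seeded so that the run is repeatable
kept = sum(should_sample(config, rng) for _ in range(10_000))
print(config.environment, kept)
```

A lower sampling rate reduces overhead and storage at the cost of recording fewer individual requests, which is why it is usually tuned per environment.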
Analysis
Once the performance data has been sent to the location of your choice, you’re ready to analyze the data. Most
tools have a UI that will help you identify errors, latency issues, and other anomalies that are impacting your
users.
This is a great place to start investigating reported issues or, even better, to catch issues before your
users are impacted by them. Identify precisely which services (down to the code level) are experiencing issues
to speed up root cause analysis and reduce mean time to resolution (MTTR).
Distributed tracing
By tracing all of the requests, from the initial web request to your frontend service to queries made to your
backend services, distributed tracing enables you to analyze performance throughout your microservices
architecture all in one view.
Distributed tracing makes it easy to spot bottlenecks by displaying complete events by service, and then by
each request within that service. Surfacing errors and other issues in an actionable manner at the code level
speeds up investigations and reduces MTTR.
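For requests to be stitched together across services, each hop has to pass along a shared trace identifier. One widely used convention is the W3C Trace Context `traceparent` header, which carries a version, a trace ID, the caller's span ID, and flags. The text above does not say which mechanism a given tool uses, so treat the following as a sketch of the general idea; the helper function names are invented for this example.

```python
import secrets


def new_traceparent():
    """Start a trace: version 00, random trace ID and span ID, sampled flag 01."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(incoming):
    """Keep the caller's trace ID but mint a new span ID for this hop."""
    version, trace_id, _parent_id, flags = incoming.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def trace_id_of(header):
    """Extract the trace ID, the second field of the header."""
    return header.split("-")[1]


# The frontend starts the trace; the backend continues it on its outgoing call.
frontend_headers = {"traceparent": new_traceparent()}
backend_headers = {"traceparent": child_traceparent(frontend_headers["traceparent"])}

same_trace = (
    trace_id_of(frontend_headers["traceparent"])
    == trace_id_of(backend_headers["traceparent"])
)
print(same_trace)
```

Because every service forwards the same trace ID, the backend can later group all of the recorded work for one user request into a single view.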
Spans
A span records a single activity and includes attributes such as:
• Start time
• Finish time
• A name
• A type
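The four attributes listed above map naturally onto a small record type. The sketch below is one possible shape, not the schema of any specific product; the example values for `name` and `type` are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    name: str      # what was done, e.g. "SELECT FROM products"
    type: str      # broad category, e.g. "db" or "external"
    start: float   # seconds on some reference clock
    finish: float  # seconds on the same clock

    def __post_init__(self):
        if self.finish < self.start:
            raise ValueError("finish must not precede start")

    @property
    def duration_ms(self):
        """Elapsed time derived from the start and finish timestamps."""
        return (self.finish - self.start) * 1000


span = Span(name="SELECT FROM products", type="db", start=10.0, finish=10.25)
print(span.duration_ms)
```

Storing start and finish times, instead of only a duration, is what lets a UI lay spans out on a timeline and show which ones overlap.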
Transactions
A transaction is a special type of span that describes an event and can itself contain multiple spans.
Examples include:
• A batch job
• A background job
Transactions have additional attributes associated with them, like data about the environment in which the
event is recorded.
Traces
You’ll see individual traces for complete actions like processing a payment, processing a completed order, and
updating shipping status.
When paired together with the logs and metrics from the application (and other aspects of your infrastructure),
traces provide complete visibility into your entire ecosystem.
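To show how a trace ties the pieces together, the following sketch groups flat span records by trace ID and then finds the service that accounts for the most time within one trace. The records are made-up data, and the example assumes that each duration is time spent in that service alone (no overlap with other spans), which real traces do not guarantee.

```python
from collections import defaultdict

# Flat span records as they might arrive from several agents (made-up data).
spans = [
    {"trace_id": "t1", "service": "frontend", "name": "GET /checkout", "duration_ms": 40},
    {"trace_id": "t1", "service": "payments", "name": "charge card", "duration_ms": 900},
    {"trace_id": "t1", "service": "orders", "name": "INSERT order", "duration_ms": 60},
    {"trace_id": "t2", "service": "frontend", "name": "GET /product", "duration_ms": 35},
    {"trace_id": "t2", "service": "catalog", "name": "SELECT product", "duration_ms": 20},
]


def group_by_trace(records):
    """Collect all span records that share a trace ID."""
    traces = defaultdict(list)
    for record in records:
        traces[record["trace_id"]].append(record)
    return dict(traces)


def slowest_service(trace):
    """Return (service, total_ms) for the service with the most recorded time."""
    totals = defaultdict(float)
    for record in trace:
        totals[record["service"]] += record["duration_ms"]
    return max(totals.items(), key=lambda item: item[1])


traces = group_by_trace(spans)
bottleneck = slowest_service(traces["t1"])
print(bottleneck)
```

In this made-up checkout trace the payments service dominates, which is the kind of answer a waterfall view gives at a glance.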
Technical capabilities
Create a checklist of required features with your specific technical needs in mind. Be thorough and granular
with your requirements. While not an exhaustive list, here are some areas to think about:
• Does it have visual tools like waterfall charts and dependency maps to look into the
performance of distributed apps?
• Does it provide a flexible (and fast) query language to enable ad hoc investigation?
• Does it offer a flexible visualization framework that goes beyond standard vendor-
provided dashboards?
• How quickly can you instrument an application and go from zero to insight?
Deployment options
It is important that the chosen APM tool supports your software consumption preferences.
• Do you prefer a SaaS option to reduce operating and administrative costs?
• Do you want an option that can help draw down annual committed spend on your chosen cloud provider
(AWS, Azure, Google Cloud, etc.)?
• Do you need a self-managed offering because cloud is not an option for cost or compliance reasons?
• Do you have a multi-cloud or hybrid strategy and want to run your APM solution closer to your workload to
reduce data transfer costs or latencies?
If you have existing apps instrumented using open source tooling or an open standard (for example, Jaeger or
OpenTelemetry), switching to a tool that supports it can speed up and simplify migration. Open standards also
help future-proof your investment.
• Is it built on a simple architecture? Or is there a patchwork foundation under the hood that will eventually
crack?
• Are there any limits on the volume (apps, metrics, queries, etc.) it can handle?
Security
The security review process should be a core part of your tool evaluation process. Be sure to consider the
following two angles:
1. The APM vendor’s commitment to security in how the tool is built and delivered
• Do the APM agents deployed in your applications need excessive privileges?
Pricing
Finally, it’s important to carefully weigh pricing options to make sure that your tool of choice doesn’t force you
to compromise on your visibility or monitoring goals. As with many of the other criteria, you’ll need to take both
current and future usage (and architecture) into account in the process. APM solutions are priced in widely
varying ways (by number of agents, number of hosts, hardware resources, etc.), and some vendors also impose
additional costs when you cross certain thresholds (for example, the number of containers or metrics). Here are
a few questions to ask about pricing models:
• Is the pricing model aligned with your business needs and architectural choices?
• How will the costs scale with planned growth and architectural evolution (e.g., monolith to microservices)?
• Is there a free tier? What is included in the free tier? Are there any usage limitations?
When you’re investigating an issue that’s impacting users, every second is valuable. With Elastic APM, your
performance data is saved as an index in Elasticsearch, enabling teams to search for and find bottlenecks in
real time. Elastic APM also features service maps powered by machine learning, custom alerting options, and
more so you can create better digital experiences for your users. Visit elastic.co/apm to learn more.
APM is one piece of the puzzle. Unify your logs, metrics, and APM traces in one platform for true observability
of your entire ecosystem. Visit elastic.co/observability to learn more.