Imagine you are in charge of maintaining and operating an application that has to deal with external services such as databases, remote APIs and so on.
The expected Service Level Objectives are mostly met and everything is hunky-dory. Unfortunately, your customers sometimes report errors: recurring failures, lost transactions, and error messages that are tricky to debug. You therefore dive deep into the logs produced by several systems and platforms.
After struggling and going caving in your application logs and beyond (i.e. the whole ecosystem), you decide to invest in a log aggregator.
Everything should be fine now, shouldn't it? Log messages should be aggregated, and error messages should be captured in order to produce alerts.
Yes, it should.
However, if you rely on a log aggregator for every observability concern, you will face unintended consequences. First, your application needs clean, correctly implemented logging. If your application handles heavy traffic, you will also have to worry about log volume and retention (by default, logs are written to files and then collected by the aggregator). Eventually, you have to manage all of this in each application while also correlating the transactions that travel across your platforms. At the end of the day, you discover that the required source code instrumentation is complex, fragile, and difficult to maintain.
If you want a refresher on a few logging basics (sometimes it is worth it ;-), you can check out this article from Nicolas Carlier. So how can we observe and correlate transactions properly, especially in distributed applications? In 2010, Google published Dapper, a tracing architecture for distributed systems.
Building on that, the Cloud Native Computing Foundation (CNCF) addresses this problem with OpenTracing and OpenTelemetry. I will show how to implement them in a Spring Cloud based application.
Distributed Tracing: some concepts
- Span: The primary building block of a distributed trace. It contains references to other spans, which makes it possible to visualize a complete trace across several systems.
- Trace: Traces in OpenTracing are defined implicitly by their Spans. In particular, a Trace can be seen as a directed acyclic graph (DAG) of Spans.
- Tag: Tags are key:value pairs that enable user-defined annotation of spans in order to query, filter, and comprehend trace data.
You can find more details in the specification.
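To make these three concepts more concrete, here is a minimal sketch in plain Java. It is a toy model, not the OpenTracing API: the `SketchSpan` and `TraceSketch` classes, the identifiers and the operation names are all hypothetical, and the tag keys only mimic common conventions. It shows that a trace has no object of its own (it is simply the set of spans sharing a trace id and linked by references) and that tags are what make spans searchable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these classes are hypothetical and are NOT the OpenTracing API.
final class TraceSketch {
    // A trace is defined implicitly: it is just the spans that share a trace id.
    static List<SketchSpan> sampleTrace() {
        SketchSpan root = new SketchSpan("trace-1", "span-1", "GET /orders", null)
                .tag("http.method", "GET");
        SketchSpan db = new SketchSpan("trace-1", "span-2", "select orders", root)
                .tag("db.type", "sql");
        SketchSpan api = new SketchSpan("trace-1", "span-3", "call payment API", root)
                .tag("error", "true");
        List<SketchSpan> spans = new ArrayList<>();
        spans.add(root);
        spans.add(db);
        spans.add(api);
        return spans;
    }

    // Tags make spans searchable: keep only the spans carrying a given key/value pair.
    static List<SketchSpan> withTag(List<SketchSpan> spans, String key, String value) {
        List<SketchSpan> result = new ArrayList<>();
        for (SketchSpan span : spans) {
            if (value.equals(span.tags.get(key))) {
                result.add(span);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<SketchSpan> spans = sampleTrace();
        for (SketchSpan span : spans) {
            System.out.println(span.path() + " " + span.tags);
        }
        System.out.println("Spans in error: " + withTag(spans, "error", "true").size());
    }
}

final class SketchSpan {
    final String traceId;       // shared by every span of the same trace
    final String spanId;
    final String operationName;
    final SketchSpan parent;    // reference to another span; null for the root span
    final Map<String, String> tags = new LinkedHashMap<>();

    SketchSpan(String traceId, String spanId, String operationName, SketchSpan parent) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.operationName = operationName;
        this.parent = parent;
    }

    // Adds a key:value annotation and returns the span to allow chaining.
    SketchSpan tag(String key, String value) {
        tags.put(key, value);
        return this;
    }

    // Follows the references back to the root to show where this span sits in the trace.
    String path() {
        return parent == null ? operationName : parent.path() + " > " + operationName;
    }
}
```

In a real application the tracer library creates and links the spans for you; the point here is only the shape of the data that a tracing backend receives.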
OpenTracing & OpenTelemetry
Both of these incubating projects are sponsored by the Cloud Native Computing Foundation (CNCF).
OpenTracing provides APIs and instrumentation for distributed tracing, while OpenTelemetry provides an observability framework for several languages and platforms.
One of their main advantages is that they are cloud oriented and agnostic of platforms and languages.
In addition, these tools are interesting because they do not rely directly on a log aggregation system, and data is sent asynchronously over UDP by default.
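To illustrate why asynchronous UDP reporting keeps tracing cheap for the application, here is a minimal sketch in plain Java. It is not how any real tracer client is implemented: the `UdpReporterSketch` class, the queue size and the text payload are assumptions made for the example (real clients use their own wire format). The idea it shows is that the request thread only drops a finished span into a queue, while a background thread sends datagrams without waiting for any acknowledgement.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical reporter, for illustration only.
final class UdpReporterSketch implements AutoCloseable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
    private final DatagramSocket socket;
    private final InetAddress host;
    private final int port;
    private final Thread sender;

    UdpReporterSketch(String host, int port) throws Exception {
        this.socket = new DatagramSocket();
        this.host = InetAddress.getByName(host);
        this.port = port;
        this.sender = new Thread(this::drain, "span-sender");
        this.sender.setDaemon(true);
        this.sender.start();
    }

    // Called on the request thread: never blocks, and drops the span if the queue is full.
    boolean report(String finishedSpan) {
        return queue.offer(finishedSpan);
    }

    // Runs on the background thread: one datagram per span, no acknowledgement expected.
    private void drain() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                byte[] payload = queue.take().getBytes(StandardCharsets.UTF_8);
                socket.send(new DatagramPacket(payload, payload.length, host, port));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            // Socket closed or send failed: stop quietly, the application is not affected.
        }
    }

    @Override
    public void close() {
        sender.interrupt();
        socket.close();
    }

    public static void main(String[] args) throws Exception {
        // 6831 is the UDP port used in the configuration shown in this article.
        try (UdpReporterSketch reporter = new UdpReporterSketch("localhost", 6831)) {
            System.out.println(reporter.report("span: GET /orders, 12 ms"));
            Thread.sleep(100); // give the background thread a moment before closing
        }
    }
}
```

The trade-off of this design is that spans can be lost when the queue is full or when nothing listens on the port, which is usually acceptable for tracing data and is the price of never slowing down a business transaction.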
Another point is data gathering and processing. To collect the data and monitor your distributed platform, you need a compatible dashboard solution.
With such a tool, you can search and analyse distributed transactions through all the gathered traces.
Implementing it in a Spring Cloud application
You can add this dependency in your project definition (either
Here is the Gradle configuration:
You can then configure the Jaeger URL in your
# Default values
opentracing:
  jaeger:
    udp-sender:
      host: localhost
      port: 6831
    enabled: true
By default, all the transaction data (spans, traces, tags) is gathered and sent to Jaeger. You can change this behaviour by setting these properties.
Finally, you can explore your data using Jaeger:
To sum up
We covered the basics and how to configure a distributed observability mechanism. Obviously, this feature does not replace a log aggregator; it complements one. It can help you analyse your transactions without intrusive changes to your code.