Hacking your way to Observability — Part 3

A distributed tracing quick start with Jaeger and OpenTelemetry

Last time, we used Prometheus Alertmanager to configure rules that send notifications via Slack when triggered. Alerts and notifications are great, but can metrics by themselves help you troubleshoot or explain a problem? This is where they fall short: metrics are good at telling you that something happened in a single instance, according to the thresholds you defined for their values, but as soon as you start working with a distributed system, metrics won't tell you the story of a request that goes through multiple components. With the microservices boom, systems are becoming more complex, and to understand pathological behavior we need to understand requests end to end. This is where distributed tracing helps: it captures the activities performed during a request, giving you the context missing from metrics and logs.

Tracing

A trace is a collection of spans, where each span is a record of an operation performed by a single service. A span has a name, start time, duration, context, and optional metadata that adds further detail. A trace lets you observe the journey of a request as it goes through all the services of a distributed system.
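To make the span and trace relationship concrete, here is a minimal sketch in plain JavaScript. The span records, their field names, and the `gateway` service are invented for illustration; they are not Jaeger's storage schema or the OpenTelemetry data model.

```javascript
// Hypothetical span records: each one describes a single operation in a single service.
const sampleSpans = [
  { traceId: "trace-1", spanId: "a", parentSpanId: null, name: "GET /sayHello/iroh", service: "gateway", startMs: 0, durationMs: 50 },
  { traceId: "trace-1", spanId: "b", parentSpanId: "a", name: "GET /sayHello/:name", service: "hello-service", startMs: 5, durationMs: 40 },
  { traceId: "trace-1", spanId: "c", parentSpanId: "b", name: "SELECT message", service: "hello-service", startMs: 10, durationMs: 20 },
];

// A trace is simply every span that shares the same trace ID.
function groupByTrace(spans) {
  const traces = new Map();
  for (const span of spans) {
    if (!traces.has(span.traceId)) traces.set(span.traceId, []);
    traces.get(span.traceId).push(span);
  }
  return traces;
}

// Rebuild the request's journey as an indented tree using the parent links.
function renderTrace(traceSpans) {
  const children = new Map();
  for (const span of traceSpans) {
    if (!children.has(span.parentSpanId)) children.set(span.parentSpanId, []);
    children.get(span.parentSpanId).push(span);
  }
  const lines = [];
  const walk = (parentId, depth) => {
    const kids = [...(children.get(parentId) || [])].sort((x, y) => x.startMs - y.startMs);
    for (const span of kids) {
      lines.push(`${"  ".repeat(depth)}${span.name} [${span.service}] ${span.durationMs}ms`);
      walk(span.spanId, depth + 1);
    }
  };
  walk(null, 0); // root spans have no parent
  return lines;
}

for (const [traceId, traceSpans] of groupByTrace(sampleSpans)) {
  console.log(traceId);
  console.log(renderTrace(traceSpans).join("\n"));
}
```

The parent link is what turns a flat list of records into a story: without it you would only know that three operations happened, not which one caused which.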

Instrumentation

You can manually instrument your application by coding the start and finish of spans in pieces of code that provide meaningful information to you. As an alternative, some frameworks offer automatic instrumentation, which saves time and reduces effort by avoiding the need to modify your codebase.

Several distributed tracing systems can collect and visualize this data, for example:

  • Zipkin: Initially developed by Twitter, based on Google's Dapper paper.
  • AWS X-Ray: AWS's distributed tracing system.
  • Google Cloud Trace: Distributed tracing system for Google Cloud (formerly Stackdriver Trace).
  • Azure Application Insights: A feature of Azure Monitor.

Jaeger

Jaeger is an open-source distributed tracing system initially developed by Uber. It is used for monitoring and troubleshooting microservices-based distributed systems. Its main components are:

  • Jaeger Agent: Network daemon that listens for spans sent over UDP.
  • Jaeger Collector: Receives the traces from the agents and runs them through a processing pipeline.
  • Storage: Component on which the traces are stored.
  • Jaeger Query: Service that retrieves traces from the storage and presents them on the UI.

Deploying Jaeger

There are many strategies to deploy Jaeger in Kubernetes:

  • Production: The components are deployed separately. The collector and query services are configured to work with Cassandra or Elasticsearch, with Elasticsearch recommended over Cassandra.
  • Streaming: Replicates the production strategy and adds Kafka, which sits between the collector and the storage to reduce pressure on the storage under high load.

The Jaeger Operator can be installed with Helm:

helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update
helm install jaeger jaegertracing/jaeger-operator -n observability

[Figure: Jaeger UI on port 30007]

Configuring OpenTelemetry

Fortunately, configuring OpenTelemetry is a straightforward task. First, you need to choose which instrumentation you are going to use and instantiate it.

  • For automatic instrumentation: use the Node SDK. Automatic instrumentation includes the OpenTelemetry API, so you can still create custom spans at any time.

To test the service, forward its port and call the endpoint:

kubectl port-forward service/hello-service-svc -n applications 8080
curl http://localhost:8080/sayHello/iroh # On a different terminal
Hey! this is iroh, here is my message to you: It is important to draw wisdom from many different places

[Figure: Jaeger search form]
[Figure: Jaeger's HTTP span]
[Figure: Jaeger's MySQL span]

Context Propagation

Before creating our custom spans, we first need to answer a question: how do you correlate the spans? For spans to be correlated, they must share some information, and that information is shared through the context. The context contains information that can be passed between functions within the same process (in-process propagation) and between different processes (inter-process propagation).
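As an illustration of inter-process propagation, the sketch below writes and reads a `traceparent` HTTP header in the W3C Trace Context format (`version-traceId-parentId-flags`), which is the format OpenTelemetry propagates by default. The `inject` and `extract` helpers are hypothetical names written for this example, and they only handle version `00`; in a real service the OpenTelemetry propagators do this for you.

```javascript
// Shape of a version-00 traceparent header: 2, 32, 16 and 2 lowercase hex characters.
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Caller side: serialize the current span's context into outgoing headers.
function inject(spanContext, headers = {}) {
  const flags = spanContext.sampled ? "01" : "00";
  headers["traceparent"] = `00-${spanContext.traceId}-${spanContext.spanId}-${flags}`;
  return headers;
}

// Callee side: recover the caller's context, or null if the header is missing or invalid.
// Node.js lowercases incoming header names, so a lowercase lookup is enough here.
function extract(headers) {
  const match = TRACEPARENT.exec(headers["traceparent"] || "");
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  if (version !== "00") return null; // simplification: this sketch ignores other versions
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null; // all-zero IDs are invalid
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Service A sends a request...
const outgoing = inject({
  traceId: "4bf92f3577b34da6a3ce929d0e0e4736",
  spanId: "00f067aa0ba902b7",
  sampled: true,
});

// ...and service B continues the same trace, using A's span as the parent.
const parent = extract(outgoing);
const childSpan = { traceId: parent.traceId, parentSpanId: parent.spanId, name: "handle request" };
console.log(outgoing.traceparent, childSpan);
```

Because the child keeps the caller's trace ID and records the caller's span ID as its parent, the tracing backend can later stitch both spans into a single trace.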

Manual Instrumentation

To create your first span, import the tracer from the file where we configured OpenTelemetry and call its startSpan method. The span needs a name, and optionally you can include custom attributes and the context. In this case, we retrieve the current context with context.active(); if there is an active context, the span is created within it. After you start a span, you run the code you want to measure, and then you must end the span.
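The start, work, end lifecycle can be sketched without any dependencies. The tracer below is a toy in-memory stand-in written for this example, not the OpenTelemetry API: `startSpan`, `withSpan`, and `finishedSpans` are hypothetical, and the parent is passed explicitly instead of being read from an active context. What it tries to show is the pattern of ending the span in a `finally` block so it is closed even when the work throws.

```javascript
// Toy in-memory tracer (an assumption for illustration, NOT the OpenTelemetry API).
const finishedSpans = [];
let nextId = 1;

function startSpan(name, { attributes = {}, parent = null } = {}) {
  const spanId = `span-${nextId++}`;
  return {
    name,
    spanId,
    // A child joins its parent's trace; a root span starts a new one.
    traceId: parent ? parent.traceId : `trace-${spanId}`,
    parentSpanId: parent ? parent.spanId : null,
    attributes: { ...attributes },
    startTime: Date.now(),
    endTime: null,
    setAttribute(key, value) {
      this.attributes[key] = value;
      return this;
    },
    end() {
      if (this.endTime !== null) return; // ending twice is a no-op
      this.endTime = Date.now();
      finishedSpans.push(this);
    },
  };
}

// Run synchronous work inside a span and always end it.
// An async version would need to await fn(span) before reaching finally.
function withSpan(name, options, fn) {
  const span = startSpan(name, options);
  try {
    return fn(span);
  } catch (err) {
    span.setAttribute("error", true);
    span.setAttribute("error.message", err.message);
    throw err;
  } finally {
    span.end();
  }
}

// A parent span with one nested child span.
const greeting = withSpan("buildMessage", { attributes: { name: "iroh" } }, (parentSpan) =>
  withSpan("formatMessage", { parent: parentSpan }, () => "Hey! this is iroh")
);
console.log(greeting, finishedSpans.map((s) => `${s.name} (parent: ${s.parentSpanId})`));
```

Wrapping the lifecycle in a helper keeps the business code free of repeated start and end calls, at the cost of one extra callback per traced operation.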

Conclusion

Distributed tracing may sound scary initially, but everything will make sense as soon as you get started. The OpenTelemetry community is also making things even easier for us, and the project is improving very fast. You just saw how easy it is to combine automatic and manual instrumentation to get the most out of your services' tracing data.
