Part of the series: Fun with OpenTelemetry .NET
You’ve hooked up OpenTelemetry .NET into your application and configured an exporter (Jaeger, Zipkin, OpenTelemetryProtocol, etc.) but you’re not seeing any of your data anywhere. You looked around for an exception but there’s nothing.
Is this hitting close to home? Don’t worry, it’s not you. This common situation is, surprisingly, very difficult to work through.
Why is that? The OpenTelemetry Specification is very specific and clear about how implementations should approach error handling:
OpenTelemetry implementations MUST NOT throw unhandled exceptions at run time.
For that reason, most errors (including those thrown during export) are swallowed up silently. This design is great for our process when it is doing real work, but it makes troubleshooting difficult (if not impossible) when things aren’t working as expected.
There is some good news, though, there are actually error messages and we can get at them!
Where do the errors go?
Each OpenTelemetry library (SDK, API, instrumentation, exporter, etc.) has an EventSource that it will write to. Here’s where the ZipkinExporter catches and logs an exception thrown during export:
Great! How do we capture the events?
Built-in Mechanism
The main OpenTelemetry library has a built-in mechanism for capturing and writing the internal events to disk. Details can be found in the OpenTelemetry SDK guide.
To turn it on you have to drop a special JSON configuration file next to your exe. Based on your configuration, OpenTelemetry .NET will write a .log file with all the events while capturing is turned on. You can even toggle it on or off while the application is running.
The problem I ran into recently was I couldn’t access the file system where my application was running. What can we do in this case?
Write internal OpenTelemetry events to logs
My application is writing JSON to stdout which is being captured and pumped into ELK automatically. If only I could direct the internal OpenTelemetry .NET events into the logging, I could use Kibana to query it like any other error in my application!
Introducing: Macross.OpenTelemetry.Extensions
Right now this library has one extension which will listen to the internal event sources and write them to an ILogger
. However you have your logging configured, files, stdout, console, whatever, the OpenTelemetry events will join them.
Easy enough to turn on, add this in your ConfigureServices
logic:
if (_Configuration.GetValue<bool>("LogOpenTelemetryEvents")) { services.AddOpenTelemetryEventLogging(); }
You will get output that looks something like this (it depends on your logging setup):
{ "TimestampUtc": "2021-01-27T06:35:57.1453202Z", "ThreadId": 7, "LogLevel": "Error", "GroupName": "OpenTelemetry", "CategoryName": "Macross.OpenTelemetry.OpenTelemetryEventListener", "Content": "Failed to export activities: '{0}'", "EventId": 1, "EventName": "FailedExport", "Payload": [ "System.Net.Http.HttpRequestException: No connection could be made because the target machine actively refused it. (localhost:9411)\r\n ---> System.Net.Sockets.SocketException (10061): No connection could be made because the target machine actively refused it.\r\n at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)\r\n at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)\r\n at System.Net.Sockets.Socket.<ConnectAsync>g__WaitForConnectWithCancellation|283_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)\r\n at System.Net.Http.HttpConnectionPool.DefaultConnectAsync(SocketsHttpConnectionContext context, CancellationToken cancellationToken)\r\n at System.Net.Http.ConnectHelper.ConnectAsync(Func`3 callback, DnsEndPoint endPoint, HttpRequestMessage requestMessage, CancellationToken cancellationToken)\r\n --- End of inner exception stack trace ---\r\n at System.Net.Http.ConnectHelper.ConnectAsync(Func`3 callback, DnsEndPoint endPoint, HttpRequestMessage requestMessage, CancellationToken cancellationToken)\r\n at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\r\n at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\r\n at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\r\n at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)\r\n at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\r\n at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)\r\n at System.Net.Http.HttpClient.SendAsyncCore(HttpRequestMessage request, HttpCompletionOption completionOption, Boolean async, Boolean emitTelemetryStartStop, CancellationToken cancellationToken)\r\n at OpenTelemetry.Exporter.Zipkin.ZipkinExporter.Export(Batch`1& batch)" ] }
The extension exposes some options to configure the event sources you want to listen to, the levels to capture, and you can change the format of the message if you are so inclined.
In that example I am turning on the event capturing using a flag in my settings, but that is totally optional. You could leave it on for all warnings, or whatever.
Notes:
- Right now the extension doesn’t support hot-reloading. Meaning, you have to bounce the application to turn it on or off. If anyone needs that drop a comment on here or on GitHub and I’ll add it.
- If you are trying to debug an issue exporting using the JaegerExporter watch out, there be dragons. Writing this blog I tried to trap an error using Jaeger and it wasn’t throwing. I think this is because the UDP socket just sends off the packets to nowhere? They might be stuck in a buffer which would eventually throw, I’m not sure. It’s on my TODO list to look at.
Excellent! “hot-reloading” is a very common ask, as restarting the app may destroy the active repro of some scenarios.
(Out-of-proc tools like Perfview can be on/off without affecting the target app.)