feat: Add support to send OTEL traces via OTLP#4899
feat: Add support to send OTEL traces via OTLP#4899jamescrosswell wants to merge 12 commits intomainfrom
Conversation
Semver Impact of This PR⚪ None (no version bump detected) 📋 Changelog PreviewThis is how your changes will appear in the changelog. Features ✨
Dependencies ⬆️Deps
Other
🤖 This preview updates automatically when you update the PR. |
|
There was a problem hiding this comment.
This is unfortunately a bit more complex for SDK users to initialise because the TracerProviderBuilder.AddOtlpExporter method has no override with an IServiceProvider parameter (so there's no way to read the DSN from the sentry options that get added to the service registry later on).
There was a problem hiding this comment.
Without adding an explicit reference to this package, we get an error because the other packages try to load types from a different assembly version. It's the workaround for this problem basically.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4899 +/- ##
==========================================
- Coverage 73.86% 73.55% -0.31%
==========================================
Files 496 497 +1
Lines 17927 17987 +60
Branches 3511 3527 +16
==========================================
- Hits 13241 13231 -10
- Misses 3826 3882 +56
- Partials 860 874 +14 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Odd that we're still seeing this after having moved to conventional commits. It doesn't look like the danger workflow is aware of our release.yml file or accounts for this in it's logic. @Flash0ver any ideas what we might be doing wrong? |
I'm not sure how to remove the Danger Changelog Missing automation off the top of my head ... I'll look into it tomorrow. |
src/Sentry/DynamicSamplingContext.cs
Outdated
There was a problem hiding this comment.
This is problematic... the alternative would be to drop the guard here:
sentry-dotnet/src/Sentry/DynamicSamplingContext.cs
Lines 34 to 38 in 6f8312d
...or to replace it with a log rather than throwing an exception. However then I think we risk sending invalid envelopes to Sentry and losing events if Sentry drops these.
I think this is the correct approach. The only code that we have which was using this was in the SentryMessageHandler implementations.
There is one small additional risk because it's also used by the GetBaggage method, which is part of the public SDK. I added some code to log a warning if that method is called when OTEL instrumentation is being used:
sentry-dotnet/src/Sentry/Internal/Hub.cs
Lines 313 to 328 in 6f8312d
And I added some info about the possible exception to the public XML docs for that method:
sentry-dotnet/src/Sentry/IHub.cs
Lines 73 to 75 in 6f8312d
It's not great, but I can't think of any better ways to handle it. @Flash0ver thoughts?
There was a problem hiding this comment.
Hmmm .... not sure if ArgumentOutOfRangeException is the correct exception here, as it's usually used when numeric values are greater than or less than expected ... maybe just ArgumentException ... but I'm not really answering the question, aren't I?!
Another thought that I'm having is that we're quite inconsistent with Argument validation:
At some locations we throw, e.g. ArgumentNullException, at other places we don't throw but only log via the DiagnosticLogger if enabled. But again, no solution to the problem.
Perhaps we could solve this issue with an Analyzer.
Throw at run-time, and warn at build-time for invocations of this method when Sentry.OpenTelemetry is referenced. This may cause some false positives in cases where Sentry.OpenTelemetry is referenced, but actually not initialized and the Instrumenter is not Instrumenter.OpenTelemetry at runtime. But it would be a bit of a "best effort" solution. At least there shouldn't be false negatives ... unless I am missing something.
(thought: it seems that every time I'm out of ideas, I suggest some Compiler Extension to fix it 😉)
There was a problem hiding this comment.
Follow-up thought:
I suppose the best solution would be to make this method actually not invocable when the is Instrumenter.OpenTelemetry. Well, if the option is not set at runtime, we can't just remove this method at build time.
Just for some brain-storming: Maybe we could re-direct the invocation - via an Interceptor - to another method if Sentry.OpenTelemetry is referenced ... but this solution has the same issue as the Analyzer from above: at build-time, we only know whether Sentry.OpenTelemetry is referenced, not if OpenTelemetry is actually initialized .. well, we could scan for the respective AddSentry/UseSentry invocation, but what if there are more than one Hubs ... we can't really draw the connection there unambiguously.
There was a problem hiding this comment.
we're quite inconsistent with Argument validation:
At some locations we throw, e.g.ArgumentNullException, at other places we don't throw but only log via theDiagnosticLoggerif enabled. But again, no solution to the problem.
I don't really want to throw an exception here... it's not by design. However:
- I can't change the signature for the public API (it has a public
GetBaggagemethod that returns a non-nullable type) - The only other option would be to suppress the error and return an empty
SentryIdrepresenting theTraceId... but then we risk Sentry dropping events.
Maybe, if the GetBaggage() method is called when OTEL tracing is enabled, we log a warning and simply return an emply baggage header? That way we avoid the possibility of our code erroring out (we never call this method if OTEL tracing is enabled... we can but an assert in there to ensure this) but we do make it really clear the method shouldn't be used with OTEL tracing?
Edit: I've gone ahead and done this... also added an Assert to the underlying methods in the DSC that throw the exception, to ensure we never call these in our own code when OTEL tracing is enabled.
There was a problem hiding this comment.
Hmmm .... not sure if
ArgumentOutOfRangeExceptionis the correct exception here, as it's usually used when numeric values are greater than or less than expected ... maybe justArgumentException... but I'm not really answering the question, aren't I?!
We could change that... I think the traceId is essentially a GUID and you get an out of range if this is empty (i.e. all zeros)... if you squint, a guid is kind of a number but you have to squint 😜 Not really related to this PR however, since that code was added ages ago. We can create a new issue to address this if you think it's worth doing.
|
Dang... Warning Needs a bump to the Otel package dependencies, including some (minor) breaking changes. |
| } | ||
|
|
||
| /// <inheritdoc /> | ||
| public override PropagationContext Extract<T>(PropagationContext context, T carrier, Func<T, string, IEnumerable<string>> getter) |
There was a problem hiding this comment.
@Flash0ver turns out I had to bring the bump to 1.10.0 forward in order to address this... and it looks like the bump introduces a breaking change - they changed the nullability on some of the methods in a base class that we're overriding. Unfortunately this is on a class that is also marked as public in the Sentry SDK (although in practise I think people are only going to be using this in order to pass their own propagator in to be wrapped... so unlikely that the change to the API contract will be more than a binary breaking change).
Still... bit of a pain.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
|
|
||
| public SentryId TraceId => Activity.Current?.TraceId.AsSentryId() ?? default; | ||
| public SpanId SpanId => Activity.Current?.SpanId.AsSentrySpanId() ?? default; | ||
| public SpanId? ParentSpanId => Activity.Current?.ParentSpanId.AsSentrySpanId(); |
There was a problem hiding this comment.
Root activity ParentSpanId returns zero instead of null
Medium Severity
OtelPropagationContext.ParentSpanId returns a non-null zero SpanId ("0000000000000000") for root activities instead of null. When Activity.Current is a root activity (no parent), Activity.ParentSpanId returns default(ActivitySpanId), which AsSentrySpanId() converts to a valid SpanId. The ?. operator only produces null when Activity.Current itself is null. This causes events captured during root OTEL activities to include a bogus parent_span_id in their trace context, inconsistent with SentryPropagationContext which correctly uses null for root spans.


Resolves: #4859