Remove Stats datastructure in favour of OTel #249
base: main
Conversation
Force-pushed from 525b8b3 to 1141f1c
Force-pushed from 1141f1c to c6f2f06
Overall the changes look good, thank you!
That said, the OTel metrics assertions are quite verbose and repetitive; it would be nice to sugar them up a bit to reduce the test code. I would recommend looking into expanding the docappendertest.AssertOTelMetrics helper to accept a metrics assertion map.
docappendertest.AssertOTelMetricsWithMap(t, rm.ScopeMetrics[0].Metrics, map[string]int64{
"elasticsearch.bulk_requests.count": 1,
"elasticsearch.indexer.created": 2,
"elasticsearch.indexer.destroyed": 3,
})
With such a helper, most if not all of the tests could be simmered down a bit.
Force-pushed from b450780 to 89c79b2
Force-pushed from 89c79b2 to e650c89
Force-pushed from e2de81c to f48cdad
Force-pushed from f48cdad to bd8e118
const N = 10
for i := 0; i < N; i++ {
	addMinimalDoc(t, indexer, "logs-foo-testing")
}
<-time.After(1 * time.Second)
For Reviewers: I have created this ticket to improve this wait using a channel: #253
Left a bunch of smaller questions in the first review.
@@ -274,7 +268,7 @@ func TestAppenderRetry(t *testing.T) {

	indexer, err := docappender.New(client, docappender.Config{
		FlushInterval: time.Minute,
-		FlushBytes:    750, // this is enough to flush after 9 documents
+		FlushBytes:    800, // this is enough to flush after 9 documents
Why does this value change?
Related to #249 (comment): I'm not sure how the doc size is calculated, but increasing this allows the tests to pass. cc @kruskall, since I believe he added this test. If it stays at 750, it triggers a flush after 9 docs on my Mac but after 8 in CI, causing the metrics below to be incorrect.
assert.True(t, exist)
switch status.AsString() {
case "Success":
	assert.Equal(t, int64(8), dp.Value)
Where is this number from?
Good Q. I should maybe add some comments, since this took me some time to figure out. I initially tested this by checking the actual values produced by Stats, then made sure the numbers make sense. Basically, the test sends 10 docs, 3 of which fail, hence the first Success above is 7. Then a retry happens on TooMany that should succeed, which leads to a Success of 8.
@@ -92,7 +92,7 @@ func TestAppender(t *testing.T) {

	rdr := sdkmetric.NewManualReader(sdkmetric.WithTemporalitySelector(
		func(ik sdkmetric.InstrumentKind) metricdata.Temporality {
-			return metricdata.DeltaTemporality
+			return metricdata.CumulativeTemporality
Why does this change to Cumulative?
Addressed in #249 (comment)
Additionally, a lot of the tests check the metrics before and after some event/condition. This pattern originated from using Stats, where one reads absolute values (Cumulative in OTel) and then compares them. However, we have an opportunity here to leverage DeltaTemporality and simplify the test structure. That will require some deeper thought and would really blow up this PR's diff, so I was thinking of creating a new issue to address it in a separate PR in the next iteration, once this one has been merged. cc @raultorrecilla
Closes #246
Remove the Stats datastructure and leverage the current OTel implementation. Updated unit tests.