feat: support APM Server intake API version 2 by watson · Pull Request #465 · elastic/apm-agent-nodejs

watson · 2018-07-23T09:03:52Z

Closes #356

Checklist

Implement code
Add tests
Fix TAV tests

watson · 2018-07-23T09:09:30Z

This PR should technically be ready for testing with the POC. The checklist above is meant for making it production ready.

roncohen · 2018-07-23T14:21:10Z

It would be great to get some benchmarks to see whether this implementation is better than the current intake protocol, for example from a memory usage perspective.

Qard · 2018-07-23T23:19:12Z

lib/agent.js

+        agent._apmServer.flush(cb)
+      })
+    } else {
+      process.nextTick(cb)


Can you log something here so we can tell when someone tries to send something without _apmServer set?

Qard · 2018-07-23T23:19:56Z

lib/agent.js

+  if (this._apmServer) {
+    this._apmServer.flush(cb)
+  } else {
+    process.nextTick(cb)


Again, log something so we know when _apmServer is not set.

Qard · 2018-07-23T23:58:33Z

lib/instrumentation/transaction.js

  })

+  Object.defineProperty(this, 'timestamp', {
+    get: function () {


Could use shorthand here. get () {
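For reference, a minimal sketch of the shorthand form suggested above. The `Transaction` constructor and the `_start` field here are hypothetical stand-ins, not the agent's actual implementation; only the `get () { ... }` syntax is the point.

```javascript
'use strict'

// Hypothetical stand-in for the real Transaction constructor.
function Transaction () {
  this._start = Date.now() // assumed field, for illustration only

  Object.defineProperty(this, 'timestamp', {
    enumerable: true,
    // ES2015 method shorthand, equivalent to `get: function () { ... }`
    get () {
      return new Date(this._start).toISOString()
    }
  })
}
```

Both forms behave the same; the shorthand only saves a few characters.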

Qard · 2018-07-24T00:00:42Z

lib/instrumentation/transaction.js


  if (this.ended) {
-    this._agent.logger.debug('transaction already ended - cannot build new span %o', {id: this.id})
+    this._agent.logger.debug('transaction already ended - cannot build new span %o', {id: this.id}) // TODO: Should this be supported in the new API?


When everything is fully streaming, I don't think there's any reason to prevent further spans. It's only useful without streaming to ensure the transaction doesn't stay open forever.

The reason why I'm hesitant with this is that we might have a transaction that kicks off some form of background job that gets contextually attached to the transaction that started it. Any spans that are created as a result of this would get associated with the transaction. This will make the span waterfall almost useless as it can suddenly cover a period of days or weeks. Am I being too paranoid?

That'd be user mode queuing though. We won't actually form that link unless we explicitly bind the callback. The logical continuation to the next job would be out-of-band from the request, it'd be triggered by the completion of another job in the queue.

I could see maybe time-based jobs naively using setTimeout for everything, and that maybe being an issue. But that's just not great design and we probably should be drawing attention to it in some way. 🤔

Yeah setTimeout and friends are probably the most common way to trigger this.

The question, however, is whether the spans happening after the transaction ends are relevant to the user at all.

@elastic/apm-agent-devs @roncohen @makwarth In the new intake API, we get the ability to record spans that start after the parent transaction ends. Is this something we want to do or should those just be ignored? I.e. are they adding any value to the user or are they just noise?

I.e. are they adding any value to the user or are they just noise?

I think that depends on how the application is making use of that. I'd rather default to not ignore than to ignore these spans.

If we don't ignore them by default, would it be a good idea to have a config option to ignore them?

It could even have 3 settings:

Ignore spans started after the transaction ends

Ignore spans started before the transaction ends, but ended after

Allow all spans

I understand you're worried about the case where people would set up a recurring periodic job from inside a transaction, because it would mean spans continue to happen in an endless stream. Two ideas:

We could experiment with the assumption that there are two categories of delayed execution: process.nextTick()/setImmediate() and setTimeout()/setInterval(). The first could indicate that the developer intended this to be part of the transaction, while the second would start a new transaction.

We could set a numerical limit on the number of spans that we will consider after a transaction has ended, or a wall clock timeout that limits how long we continue to "monitor" a transaction after it has ended.

We could also include a special escape hatch whereby calls to setTimeout or setInterval made after the transaction ends would not continue the context. That way, anything else in the code path will continue, but anything that could potentially be significantly delayed will not.
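The three-setting config option floated earlier in this thread could be sketched roughly as below. The policy names and the `shouldRecordSpan` helper are hypothetical and do not exist in the agent. The sketch also assumes the second setting drops any span that ends after the transaction has ended, which is one possible reading of the proposal.

```javascript
'use strict'

// Hypothetical policy names; none of these exist in the agent config.
const POLICIES = [
  'ignore-started-after-end',
  'ignore-ended-after-end',
  'allow-all'
]

// All three time arguments are millisecond timestamps.
function shouldRecordSpan (policy, transactionEnd, spanStart, spanEnd) {
  if (!POLICIES.includes(policy)) throw new Error('unknown policy: ' + policy)

  switch (policy) {
    case 'ignore-started-after-end':
      // Keep spans that began while the transaction was still open
      return spanStart <= transactionEnd
    case 'ignore-ended-after-end':
      // Assumption: stricter setting, keep only spans that also finished in time
      return spanEnd <= transactionEnd
    default:
      // 'allow-all'
      return true
  }
}
```

A span limit or wall clock timeout, as suggested above, could be layered on top of the same check.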

watson · 2018-07-24T09:56:05Z

To make the scope of this PR as small as possible, I've made it go to a different branch than master. This way we can merge it when we're ok with the current scope of it and add new v2 features in new PRs to the same branch. Then when we're ready to land support for v2 in master, we make a PR that merges the api-v2 branch into master.

watson · 2018-07-24T10:02:21Z

To help get this PR merged, I've created the meta issue #471 and moved all the non-essential todo items from this PR to that.


watson · 2018-07-31T14:07:54Z

jenkins, run tav tests please

Qard · 2018-07-31T21:20:38Z

lib/agent.js

+    this._apmServer.flush(cb)
+  } else {
+    this.logger.warn(new Error('cannot flush agent before it is started'))
+    process.nextTick(cb)


Would it be worthwhile for the flush callback to be able to receive that error?

I left it out as this only occurs if the agent isn't started. So from the outside I want the user to use the module in the same way no matter if the agent is started or not. And normally errors coming from the _apmServer are just logged, so I think it's best to maintain that behavior here as well.

Actually I just realized that the callback given to captureError will pass a similar error along in the callback: https://github.com/elastic/apm-agent-nodejs/pull/465/files#diff-7403da6ce2244c65d641fdfcc095be93R331

But I think maybe that should be avoided in captureError. If so the captureError callback will never be called with an error either.
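The behavior described in this thread (log the error, but call the callback without it) can be sketched as follows. The `Agent` constructor below is a hypothetical, stripped-down stand-in; only the `_apmServer`, `logger.warn` and `flush` names are taken from the diff.

```javascript
'use strict'

// Hypothetical minimal agent, for illustration only.
function Agent (logger) {
  this.logger = logger
  this._apmServer = null // assumed to be set once the agent is started
}

Agent.prototype.flush = function (cb) {
  if (this._apmServer) {
    this._apmServer.flush(cb)
  } else {
    // Log only. The callback is invoked without an error so callers can use
    // the module the same way whether or not the agent is started.
    this.logger.warn(new Error('cannot flush agent before it is started'))
    process.nextTick(cb)
  }
}
```

Deferring with `process.nextTick` keeps the callback asynchronous in both branches, so callers never observe it firing before `flush` returns.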

Qard · 2018-07-31T21:21:27Z

lib/config.js

  active: true,
  logLevel: 'info',
-  hostname: os.hostname(),
+  hostname: os.hostname(), // TODO: Should we just let the http client default to this?


Defaulting in the http client sounds reasonable to me.

Qard · 2018-07-31T21:35:58Z

lib/instrumentation/index.js

+    if (!payload) return agent.logger.debug('transaction ignored by filter %o', {id: transaction.id})
+    truncate.transaction(payload)
+    agent.logger.debug('sending transaction %o', {id: transaction.id})
+    if (agent._apmServer) agent._apmServer.sendTransaction(payload)


Is there any way for agent._apmServer to be null when this._started is true? 🤔

No you're right - if this._started === true, then we'll also have an agent._apmServer. Would that be a better check? The current check is guaranteed to always work, whereas the conditions of the other might change in the future right?

I just made that comment, because this check is already in an outer if block checking this._started, so it seemed redundant. Github cuts off the preview just after that line though. 😅

Oh yes... I'll remove it

Qard · 2018-07-31T22:18:33Z

lib/request.js

-      architecture: process.arch,
-      platform: process.platform
-    }
-  }


It seems like process, system and runtime stuff no longer exists? I don't see it in the http client PR either. Is that correct?

Yeah, I can see it's a bit confusing given the current state of all the in-progress dependencies.

As you can see here, this PR depends on a custom branch of the http client called v2-1 that currently lives on my fork: https://github.com/elastic/apm-agent-nodejs/pull/465/files#diff-b9cfc7f2cdf78a7f4b91a753d10865a2R74

In that branch the metadata responsibilities have been moved to the http client: https://github.com/watson/apm-nodejs-http-client/blob/4f02d7eda173b738a309fdca2e3464a1bbae5652/index.js#L293-L332

But there's no PR for this branch yet as I don't want to make the current http client PR too complicated, so once the current PR is merged, I'll open a new one for the v2-1 branch 😅

Just merged the first http client PR and created the next based on my v2-1 branch: elastic/apm-nodejs-http-client#6 - soon it should hopefully be less confusing

Ah. Gotcha. I was trying to match up the two PRs to make sure the functionality lined up between the two. 😅

Qard · 2018-07-31T22:22:34Z

test/_apm_server.js

-          client.request('transactions', {}, body, () => {})
+      req.pipe(zlib.createGunzip()).pipe(ndjson.parse()).on('data', function (data) {
+        if (req.method !== 'POST') throw new Error(`Unexpected HTTP method: ${req.method}`)
+        if (req.url !== '/v2/intake') throw new Error(`Unexpected HTTP url: ${req.url}`)


You could do the req field validation outside the pipe sequence and data handler. Also, the assert module might be cleaner here.

Qard · 2018-07-31T23:25:40Z

test/agent.js

-        t.equal(body.errors[0].exception.message, 'with callback')
+      .on('data-error', function (data) {
+        t.equal(data.exception.message, 'with callback')
+        t.end()


Does tape work correctly when you explicitly t.end() while using a t.plan(...)?

Yes, all it does when reaching the t.end is just fail if the number of assertions doesn't match what's planned. This is a way to validate our expectations of the async execution order. I.e. when we reach this point, we should have made the planned number of assertions.

Ah, cool. 👍

Qard · 2018-07-31T23:37:08Z

test/instrumentation/_agent.js

    agent._instrumentation = sharedInstrumentation
    agent._instrumentation.currentTransaction = null
-    agent._instrumentation._queue._clear()
+    agent._apmServer = mockClient(expected, cb || noop) // TODO: Expected will not work here


Is this comment still accurate? What's the issue?

Hmm I think you're right. I think this line is completely irrelevant and can just be removed. I'll update it

Qard · 2018-08-01T01:02:12Z

test/instrumentation/span.js

-    t.ok(payload.start > 0)
-    t.ok(payload.duration > 0)
-    assert.stacktrace(t, 'myTest3', __filename, payload.stacktrace, agent)
+    t.deepEqual(Object.keys(payload), ['transactionId', 'timestamp', 'name', 'type', 'start', 'duration'])


You don't think we need to validate the contents of the object here?

Qard · 2018-08-01T01:18:46Z

test/integration/no-sampling.js

-        var data = JSON.parse(Buffer.concat(buffers))
-        t.equal(data.transactions.length, 20, 'expect 20 transactions to be sent')
        t.end()
+        process.exit()


Does this process.exit() need to be here? I'd rather not have these hiding in the tests as they could cause confusion when adding more tests to the file in the future, if you don't notice them.

Qard · 2018-08-01T01:21:15Z

Finally managed to complete a review of it. Looks good, other than a couple more comments.

Qard · 2018-08-02T17:32:22Z

Approved for v2 branch. Any remaining issues can be fixed in another PR, before merging v2 to master.


watson added in progress breaking change intake api [zube]: In Progress labels Jul 23, 2018

watson self-assigned this Jul 23, 2018

watson requested a review from Qard July 23, 2018 09:03

watson mentioned this pull request Jul 23, 2018

Intake Protocol V2 POC #356

Closed

Qard reviewed Jul 23, 2018


Qard reviewed Jul 24, 2018


watson mentioned this pull request Jul 24, 2018

Add support for intake API v2 #471

Closed

21 tasks

watson force-pushed the v2-api branch from 8423307 to 785eb08 Compare July 31, 2018 13:51

feat: support APM Server intake API version 2

4427eda

Closes elastic#356

watson force-pushed the v2-api branch from 785eb08 to 4427eda Compare July 31, 2018 13:59

Qard reviewed Jul 31, 2018


Qard reviewed Aug 1, 2018


fix: address PR review comments

fee45aa

Qard approved these changes Aug 2, 2018


watson merged commit af1dd76 into elastic:api-v2 Aug 2, 2018

zube bot added [zube]: Done and removed [zube]: In Review labels Aug 2, 2018

watson deleted the v2-api branch August 2, 2018 18:09

watson added a commit that referenced this pull request Aug 2, 2018

feat: support APM Server intake API version 2 (#465)

810966c

Closes #356

watson added a commit that referenced this pull request Aug 6, 2018

feat: support APM Server intake API version 2 (#465)

d9a0b3d

Closes #356

watson added a commit that referenced this pull request Aug 9, 2018

feat: support APM Server intake API version 2 (#465)

af5f08f

Closes #356

watson added a commit that referenced this pull request Aug 10, 2018

feat: support APM Server intake API version 2 (#465)

f82afc3

Closes #356

watson added a commit that referenced this pull request Aug 29, 2018

feat: support APM Server intake API version 2 (#465)

cdd53a4

Closes #356

Qard pushed a commit to Qard/apm-agent-nodejs that referenced this pull request Aug 29, 2018

feat: support APM Server intake API version 2 (elastic#465)

e9cc3e1

Closes elastic#356

Qard pushed a commit to Qard/apm-agent-nodejs that referenced this pull request Aug 30, 2018

feat: support APM Server intake API version 2 (elastic#465)

9b06c86

Closes elastic#356

watson added a commit that referenced this pull request Aug 30, 2018

feat: support APM Server intake API version 2 (#465)

7e0c8bf

Closes #356

Qard pushed a commit to Qard/apm-agent-nodejs that referenced this pull request Aug 30, 2018

feat: support APM Server intake API version 2 (elastic#465)

af89cb4

Closes elastic#356

watson added a commit that referenced this pull request Sep 6, 2018

feat: support APM Server intake API version 2 (#465)

7d356ad

Closes #356

watson added a commit that referenced this pull request Sep 13, 2018

feat: support APM Server intake API version 2 (#465)

7575859

Closes #356

Qard pushed a commit to Qard/apm-agent-nodejs that referenced this pull request Sep 13, 2018

feat: support APM Server intake API version 2 (elastic#465)

6097731

Closes elastic#356

Qard pushed a commit to Qard/apm-agent-nodejs that referenced this pull request Sep 13, 2018

feat: support APM Server intake API version 2 (elastic#465)

91d1bd5

Closes elastic#356

watson added a commit that referenced this pull request Oct 19, 2018

feat: support APM Server intake API version 2 (#465)

bf190a3

Closes #356

watson added a commit to watson/apm-agent-nodejs that referenced this pull request Oct 27, 2018

feat: support APM Server intake API version 2 (elastic#465)

44187cb

Closes elastic#356

Qard pushed a commit that referenced this pull request Oct 31, 2018

feat: support APM Server intake API version 2 (#465)

eec21f3

Closes #356

watson added a commit to watson/apm-agent-nodejs that referenced this pull request Nov 6, 2018

feat: support APM Server intake API version 2 (elastic#465)

23ff370

Closes elastic#356

watson added a commit that referenced this pull request Nov 6, 2018

feat: support APM Server intake API version 2 (#465)

b26891d

Closes #356

Qard pushed a commit that referenced this pull request Nov 8, 2018

feat: support APM Server intake API version 2 (#465)

a629cc0

Closes #356

watson added a commit that referenced this pull request Nov 10, 2018

feat: support APM Server intake API version 2 (#465)

79fe510

Closes #356

watson added a commit to watson/apm-agent-nodejs that referenced this pull request Nov 13, 2018

feat: support APM Server intake API version 2 (elastic#465)

f6f226f

Closes elastic#356

watson added a commit to watson/apm-agent-nodejs that referenced this pull request Nov 13, 2018

feat: support APM Server intake API version 2 (elastic#465)

299d571

Closes elastic#356

watson added a commit to watson/apm-agent-nodejs that referenced this pull request Nov 13, 2018

feat: support APM Server intake API version 2 (elastic#465)

0bbf193

Closes elastic#356
