We have two JMX configurations, one in conf.d/jmx.d/conf.yaml:
init_config:
  is_jmx: true
  collect_default_metrics: true
instances:
  - name: riemann
    host: localhost
    port: 7199
and another in conf.d/kafka.d/conf.yaml:
init_config:
  is_jmx: true
  collect_default_metrics: true
instances:
  - name: riemann
    host: localhost
    port: 7199
    refresh_beans_initial: 60
Normally, metrics are collected for both of those instances (it is a single application), but sometimes after a deployment (application restart) we completely lose metrics for one of them; only restarting the Datadog Agent resolves the issue.
Using different names for the instances does seem to resolve the issue for us.
I suspect this might be because the instances that fail to fetch metrics are stored in a map keyed on instance name, so during an application restart one of the instances overwrites the other.
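To make the suspicion concrete, here is a minimal toy model of the collision. It is not taken from the agent's code: the function `track_broken_instances` and the names `riemann_jmx` and `riemann_kafka` are hypothetical, and the sketch only assumes that failed instances are kept in a map keyed on instance name.

```python
# Toy model only: this is NOT the agent's code. All names are hypothetical.

def track_broken_instances(instances):
    """Record instances awaiting reconnection in a dict keyed on instance name."""
    broken = {}
    for check, instance_name in instances:
        # A second instance with the same name replaces the first entry,
        # so only one of them would ever be retried.
        broken[instance_name] = check
    return broken


# Both checks use the name "riemann", as in the two configs above.
same_names = track_broken_instances([("jmx", "riemann"), ("kafka", "riemann")])

# Workaround: give each instance its own name.
unique_names = track_broken_instances(
    [("jmx", "riemann_jmx"), ("kafka", "riemann_kafka")]
)

print(len(same_names))    # one entry survives
print(len(unique_names))  # both entries survive
```

If the assumption holds, this would explain both symptoms: with identical names only one instance is remembered for reconnection after the restart, and with distinct names both are.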