78 changes: 51 additions & 27 deletions README.md
@@ -2,36 +2,43 @@


### CGROUP-METRICS
Node Module for reading [cgroup](https://www.kernel.org/doc/Documentation/cgroup-v1/) metrics. Reads from `/sys/fs/cgroup/`.
Node Module for reading [cgroup v1](https://www.kernel.org/doc/Documentation/cgroup-v1/) and [cgroup v2](https://docs.kernel.org/admin-guide/cgroup-v2.html) metrics. Reads from `/sys/fs/cgroup/`.
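
Since both hierarchies live under `/sys/fs/cgroup/`, a caller may want to know which one is mounted. The following is a minimal sketch that relies on the same `cgroup.controllers` check that `lib/utils.js` uses in this change. The helper name `detectCgroupVersion` and its `root` and `exists` parameters are hypothetical and exist only to keep the example testable; they are not part of the module's API.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of this module): returns 2 when the unified
// cgroup v2 hierarchy is mounted at `root`, otherwise 1.
// `exists` is injectable so the check can be exercised without a real cgroup mount.
function detectCgroupVersion(root = '/sys/fs/cgroup', exists = fs.existsSync) {
  // The root of a cgroup v2 hierarchy exposes a `cgroup.controllers` file.
  return exists(`${root}/cgroup.controllers`) ? 2 : 1;
}

console.log(`cgroup version: v${detectCgroupVersion()}`);
```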

### Memory Metrics:
[Memory](https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt) reads from path `/sys/fs/cgroup/memory/memory`:

Raw values:
- `stat.rss`: # of bytes of anonymous and swap cache memory
- `kmem.usage_in_bytes`: current kernel memory allocation
- `limit_in_bytes`: limit of memory usage
#### Raw values ([cgroup v1](https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt)):
- `memory.stat.rss`: Resident Set Size - anonymous and swap cache memory (bytes)
- `memory.kmem.usage_in_bytes`: Kernel memory usage (bytes)
- `memory.limit_in_bytes`: Memory limit for the cgroup (bytes)

#### Raw values ([cgroup v2](https://docs.kernel.org/admin-guide/cgroup-v2.html#memory)):
- `memory.stat.anon`: Anonymous memory usage (bytes)
- `memory.stat` (kernel fields): Kernel memory usage, computed as the sum of `kernel_stack + slab + percpu + sock + vmalloc` (bytes)
- `memory.max`: Memory limit for the cgroup (bytes, or "max" for unlimited)
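
As an illustration of how the cgroup v2 values above can be derived from the text of `memory.stat`, here is a small sketch. The function names `parseStat` and `kernelBytes` and the sample text are made up for this example; the field list mirrors the one given above.

```javascript
// Parse "key value" lines (the flat-keyed layout of memory.stat) into an object.
// Lines without a numeric value are skipped.
function parseStat(text) {
  const stats = {};
  for (const line of text.split('\n')) {
    const [key, value] = line.trim().split(/\s+/);
    const parsed = parseInt(value, 10);
    if (key && !Number.isNaN(parsed)) stats[key] = parsed;
  }
  return stats;
}

// Fields summed to approximate kernel memory usage; missing fields count as 0.
const KERNEL_FIELDS = ['kernel_stack', 'slab', 'percpu', 'sock', 'vmalloc'];

function kernelBytes(stats) {
  return KERNEL_FIELDS.reduce((sum, field) => sum + (stats[field] || 0), 0);
}

// Made-up sample content, not taken from a real container.
const sample = 'anon 4096\nfile 8192\nkernel_stack 100\nslab 200\nsock 50\n';
const stats = parseStat(sample);
console.log(stats.anon, kernelBytes(stats)); // 4096 350
```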

#### Calculated values:
- `containerUsage()`: Total memory usage (combines anonymous memory + kernel memory)
- `containerUsagePercentage()`: Memory usage as percentage of limit
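
The arithmetic behind the calculated values can be sketched as follows. This is not the module's implementation: `parseLimit` and `usagePercentage` are hypothetical names, treating an unlimited cgroup as 0% is an assumption, and whether the real `containerUsagePercentage()` returns a fraction or a value scaled by 100 is not shown here (the sketch scales by 100).

```javascript
// Parse the contents of a limit file. cgroup v2's memory.max holds the
// literal string "max" when no limit is set, so map that to Infinity.
function parseLimit(raw) {
  const text = raw.trim();
  return text === 'max' ? Infinity : parseInt(text, 10);
}

// usage = anonymous (or rss) bytes + kernel bytes, reported relative to the limit.
function usagePercentage(anonBytes, kernelBytes, limitBytes) {
  const usage = anonBytes + kernelBytes;
  // Assumption: with no finite limit there is nothing to compare against.
  if (!Number.isFinite(limitBytes) || limitBytes <= 0) return 0;
  return (usage / limitBytes) * 100;
}

console.log(usagePercentage(300, 100, parseLimit('1600\n'))); // 25
console.log(usagePercentage(300, 100, parseLimit('max\n')));  // 0
```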

Calculated values:
- `containerUsage()`: `stats.rss` + `kmem.usage_in_bytes`
- `containerUsagePercentage()`:`stats.rss` + `kmem.usage_in_bytes` / `limit_in_bytes`

### CPU Metrics:

Raw CPU values:
[CPU](https://www.kernel.org/doc/Documentation/cgroup-v1/cpuacct.txt) reads from path `/sys/fs/cgroup/`:
#### Raw values ([cgroup v1](https://www.kernel.org/doc/Documentation/cgroup-v1/cpuacct.txt)):
- `cpuacct.usage`: Total CPU time consumed by all tasks (nanoseconds)
- `cpuacct.stat.user`: CPU time spent in user mode (`USER_HZ` ticks)
- `cpuacct.stat.system`: CPU time spent in kernel mode (`USER_HZ` ticks)

- `cpuacct.usage`: total CPU time (in nanoseconds) since the start of the container obtained by this cgroup (CPU time obtained by all the tasks) in the system
- `cpuacct.stat`: reports the user and system CPU time consumed by all tasks in this cgroup (including tasks lower in the hierarchy)
- `user`: CPU time (in nanoseconds) spent by tasks of the cgroup in user mode
- `system`: CPU time (in nanoseconds) spent by tasks of the cgroup in kernel mode
- `timestamp`: timestamp of when the measurement was taken
#### Raw values ([cgroup v2](https://docs.kernel.org/admin-guide/cgroup-v2.html#cpu)):
- `cpu.stat.usage_usec`: Total CPU time consumed by all tasks (microseconds)
- `cpu.stat.user_usec`: CPU time spent in user mode (microseconds)
- `cpu.stat.system_usec`: CPU time spent in kernel mode (microseconds)
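
Because cgroup v2 reports microseconds while cgroup v1 reports nanoseconds (`cpuacct.usage`) or `USER_HZ` ticks (`cpuacct.stat`), the values need converting before they can be compared. A minimal sketch of the two conversions, mirroring the ones in `lib/utils.js` in this change, is shown below. `USER_HZ = 100` is an assumption (it is the common Linux default), and the helper names are hypothetical.

```javascript
// Assumed tick rate; 100 is the usual Linux default, but it is not guaranteed.
const USER_HZ = 100;

// Microseconds -> nanoseconds, to line up with v1 cpuacct.usage.
function usecToNanos(usec) {
  return usec * 1000;
}

// Microseconds -> USER_HZ ticks, to line up with v1 cpuacct.stat.
function usecToTicks(usec) {
  return Math.round(usec / (1000000 / USER_HZ));
}

console.log(usecToNanos(2500));    // 2500000
console.log(usecToTicks(1234567)); // 123
```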

Both calls will return an object containing one or more `CpuMetric` objects for a specific cpu task:
- `cpuNanosSinceContainerStart`: total CPU time (in nanoseconds) since the start of the container obtained by this cgroup in the system
#### Returned CPU metrics format:
All CPU metrics include:
- `cpuNanosSinceContainerStart`: total CPU time since container start
- `timestamp`: timestamp of when the measurement was taken

Calculated CPU values:
#### Calculated CPU values:
- `calculateUsage`: takes two instances of calls to `cpuacct.usage` or `cpuacct.stat` and returns the calculated usage as a percentage of CPU time:
` (second time since container start - first time since container start) / total time`
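
The formula above can be sketched as follows. This is an illustration rather than the module's `calculateUsage`: the name `cpuUsagePercent` is hypothetical, and it assumes the counter is in nanoseconds (as for `cpuacct.usage`) and that `timestamp` is in milliseconds (as produced by `Date.now()` in `lib/utils.js`).

```javascript
// Percentage of wall-clock time spent on CPU between two samples.
// Each sample: { cpuNanosSinceContainerStart: <ns>, timestamp: <ms since epoch> }.
function cpuUsagePercent(first, second) {
  const cpuDeltaNanos =
    second.cpuNanosSinceContainerStart - first.cpuNanosSinceContainerStart;
  const wallDeltaNanos = (second.timestamp - first.timestamp) * 1e6; // ms -> ns
  if (wallDeltaNanos <= 0) {
    throw new Error('samples must be taken at increasing timestamps');
  }
  return (cpuDeltaNanos / wallDeltaNanos) * 100;
}

// Made-up samples taken one second apart, with 0.5 s of CPU time consumed.
const first = { cpuNanosSinceContainerStart: 1000000000, timestamp: 1000 };
const second = { cpuNanosSinceContainerStart: 1500000000, timestamp: 2000 };
console.log(cpuUsagePercent(first, second)); // 50
```

A result above 100 is possible when the container runs on more than one CPU.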

@@ -113,23 +120,40 @@ console.log(`Memory usage in the container: ${metrics["memory.containerUsage"]}`

### Error Handling

If there is no container running or there is an issue reading the file path, the function call will error something like this:
#### File System Errors

If there is no container running or cgroup files are missing:
```
Error: Error reading file /sys/fs/cgroup/memory/memory.stat. ENOENT: no such file or directory, open '/sys/fs/cgroup/memory/memory.stat'
```

If cgroup files are empty:
```
Error: Error reading file /sys/fs/cgroup/memory/memory.stat, Message: ENOENT: no such file or directory, open '/sys/fs/cgroup/memory/memory.stat'
Error: Error reading file /sys/fs/cgroup/memory/memory.stat. File is empty: /sys/fs/cgroup/memory/memory.stat.
```

If one of the files is empty, it will return an error like this:
#### Data Validation Errors

For malformed memory metrics:
```
Error: One or more metrics are malformed. rss: 1234, kmemUsage: NaN
```

For malformed CPU metrics:
```
Error: Error reading file /sys/fs/cgroup/memory/memory.stat, Message: File is empty
Error: Error reading file /sys/fs/cgroup/cpuacct/cpuacct.stat. Malformed cpuacct.stat file: invalid CPU fields
```

If a file is malformed, it will return an error like this:
#### cgroup v2 Specific Errors

For malformed cgroup v2 memory data:
```
Error: One or more metrics are malformed. containerUsage: 1234, limit: NaN
Error: Malformed memory.stat file: invalid anon field
```
Or:

For malformed cgroup v2 CPU data:
```
Error reading file /sys/fs/cgroup/cpuacct/cpuacct.stat, Message: Cannot read property 'split' of undefined
Error: Malformed cpu.stat file: invalid usage_usec field
```
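
One way a caller might tell the file system errors apart from the validation errors is to match on the message text shown above. The sketch below assumes exactly that: `readMetricsOrNull` is a hypothetical wrapper, and `readMetrics` stands in for whichever call reads the metrics. Matching on message strings is fragile and is used here only because the examples above show no error codes.

```javascript
// Returns the metrics, or null when the cgroup files are missing or empty
// (for example, when not running inside a container). Other errors,
// such as malformed data, are rethrown.
function readMetricsOrNull(readMetrics) {
  try {
    return readMetrics();
  } catch (err) {
    if (/ENOENT|File is empty/.test(err.message)) {
      return null;
    }
    throw err;
  }
}

// Simulated failure using the message format shown above.
const result = readMetricsOrNull(() => {
  throw new Error(
    'Error reading file /sys/fs/cgroup/memory/memory.stat. ENOENT: no such file or directory'
  );
});
console.log(result); // null
```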

### Contributing
139 changes: 108 additions & 31 deletions lib/utils.js
@@ -15,49 +15,116 @@ const fs = require('fs');
const flat = require('flat');

/**
* Reads metrics from `/sys/fs/cgroup/`
* @param {String} metric What metric to read from `/sys/fs/cgroup/`
* @returns metric value (could be object or number)
* Read cgroup metrics (supports both v1 and v2)
* @param {string} metric Metric path to read (using cgroup v1 format) e.g., 'memory/memory.stat'
* @returns {number|Object} Parsed metric value
*/
function readandFormatMetric(metric) {
// Check whether cgroup v2 is enabled
const isV2 = fs.existsSync('/sys/fs/cgroup/cgroup.controllers');

// Map v1 paths to v2 paths
let filePath = metric;
if (isV2) {
const pathMap = {
'memory/memory.stat': 'memory.stat',
'memory/memory.kmem.usage_in_bytes': 'memory.stat',
'memory/memory.limit_in_bytes': 'memory.max',
'cpuacct/cpuacct.usage': 'cpu.stat',
'cpuacct/cpuacct.stat': 'cpu.stat'
};


nit: since these are hardcoded, should we update the README to document how to add more metrics in the future?

filePath = pathMap[metric] || metric;
}

try {
const data = fs.readFileSync(`/sys/fs/cgroup/${metric}`).toString();
// check file is not empty
if (data.length === 0) {
throw Error("File is empty");
}
const data = fs.readFileSync(`/sys/fs/cgroup/${filePath}`).toString();
if (data.length === 0) throw new Error(`File is empty: /sys/fs/cgroup/${filePath}.`);

if (metric === 'memory/memory.stat') {
// parse rss
const rss = data.split('\n')[1].split(' ')[1];
return(parseInt(rss, 10));
if (isV2) {
const stats = parseKeyValueData(data);
if (!isValidNumber(stats.anon)) {
throw new Error('Malformed memory.stat file: invalid anon field');
}
return stats.anon;
}
// Handle cgroup v1
const stats = parseKeyValueData(data);
if (!isValidNumber(stats.rss)) {
throw new Error('Malformed memory.stat file: invalid rss field');
}
return stats.rss;
}

if (metric === 'memory/memory.kmem.usage_in_bytes') {
if (isV2) {
const stats = parseKeyValueData(data);
return ['kernel_stack', 'slab', 'percpu', 'sock', 'vmalloc']
.reduce((sum, field) => sum + (stats[field] || 0), 0);
}
// Handle cgroup v1
return parseInt(data.trim(), 10);
}
if (metric.includes('cpuacct')) {
const timestamp = getTimestamp();
if (metric.includes('stat')) {
const user = data.split('\n')[0].split(' ')[1];
const system = data.split('\n')[1].split(' ')[1];

if (metric === 'memory/memory.limit_in_bytes') {
// v1 and v2 share the same parsing; note that v2 memory.max may contain "max" (no limit), which parseInt turns into NaN
return parseInt(data.trim(), 10);
}

if (metric === 'cpuacct/cpuacct.usage') {
if (isV2) {
const stats = parseKeyValueData(data);
if (!isValidNumber(stats.usage_usec)) {
throw new Error('Malformed cpu.stat file: invalid usage_usec field');
}
return {
user: {
cpuNanosSinceContainerStart: parseInt(user, 10),
timestamp: timestamp
},
system: {
cpuNanosSinceContainerStart: parseInt(system, 10),
timestamp: timestamp
}
cpuNanosSinceContainerStart: stats.usage_usec * 1000, // μs -> ns
timestamp: Date.now()
};
}
// Handle cgroup v1
const timestamp = Date.now();
return { cpuNanosSinceContainerStart: parseInt(data.trim(), 10), timestamp };
}

if (metric === 'cpuacct/cpuacct.stat') {
if (isV2) {
const stats = parseKeyValueData(data);
if (!isValidNumber(stats.user_usec) || !isValidNumber(stats.system_usec)) {
throw new Error('Malformed cpu.stat file: invalid CPU fields');
}
const timestamp = Date.now();
// Convert microseconds to USER_HZ to match v1
// https://docs.kernel.org/admin-guide/cgroup-v1/cpuacct.html
const USER_HZ = 100;
const userTicks = Math.round(stats.user_usec / (1000000 / USER_HZ));
const systemTicks = Math.round(stats.system_usec / (1000000 / USER_HZ));

return {
user: { cpuNanosSinceContainerStart: userTicks, timestamp },
system: { cpuNanosSinceContainerStart: systemTicks, timestamp }
};
}
return {
cpuNanosSinceContainerStart: parseInt(data.trim(), 10),
timestamp: timestamp
// Handle cgroup v1
const timestamp = Date.now();
const stats = parseKeyValueData(data);
if (!isValidNumber(stats.user) || !isValidNumber(stats.system)) {
throw new Error('Malformed cpuacct.stat file: invalid CPU fields');
}
return {
user: { cpuNanosSinceContainerStart: stats.user, timestamp },
system: { cpuNanosSinceContainerStart: stats.system, timestamp }
};
}
return parseInt(data.trim(), 10);
} catch (e) {
throw Error(`Error reading file /sys/fs/cgroup/${metric}, Message: ${e.message || e}`)
throw new Error(`Error reading file /sys/fs/cgroup/${filePath}. ${e.message || e}`);
}
}

function isValidNumber(value) {
return typeof(value) === "number" && !isNaN(value);
}

function formatMetrics(metrics, flatten) {
if (flatten) {
return flat(metrics);
@@ -66,8 +133,18 @@ function formatMetrics(metrics, flatten) {
return metrics;
}

function getTimestamp() {
return Date.now();
function parseKeyValueData(data) {
const stats = {};
data.split('\n').filter(line => line.trim()).forEach(line => {
const parts = line.trim().split(/\s+/);
if (parts.length >= 2) {
const value = parseInt(parts[1], 10);
if (!isNaN(value)) {
stats[parts[0]] = value;
}
}
});
return stats;
}

module.exports = {