cephtrace is a project that delivers various eBPF-based ceph tracing tools. These tools can trace different ceph components dynamically, without restarting or reconfiguring any of the ceph-related services. Currently radostrace and osdtrace have been implemented.
These tools provide detailed insight into per-IO performance and help to quickly identify potential performance bottlenecks.
To start:
git clone https://github.com/taodd/cephtrace
cd cephtrace
git submodule update --init --recursive
On a Debian or Ubuntu based system, use the following apt command to install the build dependencies. On a system with a different package manager, the equivalent packages will need to be installed with that distribution's tooling:
sudo apt-get install g++ clang libelf-dev libc6-dev-i386 libdw-dev
Build the binaries:
cd cephtrace
make
It is possible to build the binaries on a different machine and then transfer them to the target host, as long as the target host runs the same versions of the underlying packages as the build machine.
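For example, one way to confirm that the relevant ceph client library versions match before copying the binaries over (a minimal sketch; the exact package list, the target-host name, and the assumption that make leaves the binaries in the repository root are illustrative, not documented by the project):
# Run on both the build machine and the target host, then compare the output
dpkg-query -W -f='${Package} ${Version}\n' librados2 librbd1 ceph-common
# If the versions match, copy the freshly built binaries to the target host
scp radostrace osdtrace ubuntu@target-host:~/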
Debug symbols are required for these tools to work, and each tool needs a different debug symbol package. For Ubuntu, debug symbols can now be fetched automatically from a debuginfod server.
Please install the libdebuginfod package first:
sudo apt-get install libdebuginfod-dev
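If automatic fetching does not find a server, debuginfod clients generally honour the DEBUGINFOD_URLS environment variable; pointing it at Ubuntu's public debuginfod server is a common setup (a hedged sketch based on general debuginfod behaviour, not on project documentation):
# Export the server URL, then run the tool with the environment preserved across sudo
export DEBUGINFOD_URLS="https://debuginfod.ubuntu.com"
sudo -E ./radostrace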
You can also manually install the debug packages in case debuginfod isn't working; please refer to Getting dbgsym.ddeb Packages (a sketch of enabling the ddebs archive follows the list below). These are the required debug packages for each tool:
- For radostrace: sudo apt-get install ceph-common-dbgsym librbd1-dbgsym librados2-dbgsym
- For osdtrace: sudo apt-get install ceph-osd-dbgsym
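Installing -dbgsym packages usually requires the Ubuntu ddebs archive to be enabled first; the standard steps look roughly like the following (general Ubuntu procedure rather than anything cephtrace-specific, so verify against the Ubuntu debug symbol documentation):
# Add the ddebs archive for the running release, trust its signing key, then refresh the package index
echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse" | sudo tee /etc/apt/sources.list.d/ddebs.list
sudo apt-get install ubuntu-dbgsym-keyring
sudo apt-get update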
If tracing from a compute-only node that carries only rbd client connections, the librbd1-dbgsym and librados2-dbgsym packages are sufficient to use radostrace.
radostrace can trace any librados-based ceph client, including any vm with an rbd volume attached, rgw, cinder, glance, and so on.
Below is an example of tracing a vm that is doing 4K random reads on an rbd volume.
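For reference, a comparable workload could be generated inside the guest with fio; this is a hypothetical sketch that assumes the attached rbd volume appears as /dev/vdb in the vm:
# 4K random reads against the raw block device for 30 seconds
fio --name=randread --filename=/dev/vdb --rw=randread --bs=4k --direct=1 --runtime=30 --time_based
On the hypervisor, radostrace then shows output like this: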
:~$ sudo ./radostrace
pid client tid pool pg acting w/r size latency object[ops][offset,length]
19015 34206 419357 2 1e [1,11,121,77,0] W 0 887 rbd_header.374de3730ad0[watch ]
19015 34206 419358 2 1e [1,11,121,77,0] W 0 8561 rbd_header.374de3730ad0[call ]
19015 34206 419359 2 39 [0,121,11,77,1] R 4096 1240 rbd_data.374de3730ad0.0000000000000000[read ][0, 4096]
19015 34206 419360 2 39 [0,121,11,77,1] R 4096 1705 rbd_data.374de3730ad0.0000000000000000[read ][4096, 4096]
19015 34206 419361 2 39 [0,121,11,77,1] R 4096 1334 rbd_data.374de3730ad0.0000000000000000[read ][12288, 4096]
19015 34206 419362 2 2b [77,11,1,0,121] R 4096 2180 rbd_data.374de3730ad0.00000000000000ff[read ][4128768, 4096]
19015 34206 419363 2 2b [77,11,1,0,121] R 4096 857 rbd_data.374de3730ad0.00000000000000ff[read ][4186112, 4096]
19015 34206 419364 2 2b [77,11,1,0,121] R 4096 717 rbd_data.374de3730ad0.00000000000000ff[read ][4190208, 4096]
19015 34206 419365 2 2b [77,11,1,0,121] R 4096 499 rbd_data.374de3730ad0.00000000000000ff[read ][4059136, 4096]
19015 34206 419366 2 2b [77,11,1,0,121] R 4096 1315 rbd_data.374de3730ad0.00000000000000ff[read ][4161536, 4096]
...
...
Each row represents one IO sent from the client to the ceph cluster. The columns are explained below:
- pid: the ceph client process id
- client: the ceph client global id, a unique number identifying the client
- tid: the operation id
- pool: the pool id the operation is sent to
- pg: the pg id the operation is sent to; pool.pg is the pgid we usually refer to
- acting: the OSD acting set this operation is sent to
- w/r: whether this operation is a write or a read
- size: the write/read size of this operation in bytes
- latency: the latency of this request in microseconds
- object[ops][offset,length]: the object name, the detailed osd op name, and the op's offset and length
For example, the row with tid 419359 above is a 4096-byte read at offset 0 of object rbd_data.374de3730ad0.0000000000000000 in pool 2, served by the acting set [0,121,11,77,1] with a latency of 1240 microseconds.
To trace on the OSD side, run osdtrace:
:~$ sudo ./osdtrace -x
- These tools have been tested on Ubuntu with the 5.15 and 6.8 kernels; other platforms have not been tested.
- They have not yet been tested with containerized processes.
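Since these are eBPF tools, it may also be worth confirming that the running kernel exposes BTF type information, which libbpf-based tools typically rely on; this check is a general eBPF convention and an assumption on our part, not something stated by the project:
# A kernel built with BTF exposes its type information here
ls /sys/kernel/btf/vmlinux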