@markhpc
@markhpc markhpc commented Feb 5, 2017

Uncoordinated executions of getput in which no rank is specified result in every copy using the default rank of 0. When bynode or byproc container naming is used, this causes every copy to target the same set of containers. This PR amends the naming scheme so that when a rank isn't specified, the fully qualified hostname is used instead. A side benefit is that it becomes easier to scan a list of containers and spot per-host anomalies in write statistics.
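
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names, the `-` separator, and the `proc` argument are hypothetical, and the only behavior taken from the description is "use the rank if one was given, otherwise the fully qualified hostname".

```python
import socket


def container_suffix(rank=None, hostname=None):
    """Return the per-copy part of a container name.

    Hypothetical helper: an explicit rank wins; otherwise fall back to the
    fully qualified hostname so uncoordinated copies do not collide.
    """
    if rank is not None:
        return str(rank)
    # hostname can be injected for testing; by default ask the OS.
    return hostname or socket.getfqdn()


def container_name(base, ctype, rank=None, proc=0, hostname=None):
    """Build a container name for the shared, bynode, or byproc schemes.

    The exact name layout here is an assumption made for illustration.
    """
    if ctype == "shared":
        return base
    suffix = container_suffix(rank, hostname)
    if ctype == "bynode":
        return f"{base}-{suffix}"
    if ctype == "byproc":
        return f"{base}-{suffix}-{proc}"
    raise ValueError(f"unknown container type: {ctype}")
```

With no rank given, two hosts now produce different bynode names, whereas a default rank of 0 would have mapped both to the same container.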

Signed-off-by: Mark Nelson <mnelson@redhat.com>

@markseger
Owner

I think this is worthy of more discussion. First and foremost, I don't think you should ever run uncoordinated copies of getput, and quite honestly I can't think of any situation where you would want to. The point of gpmulti is to coordinate multiple copies and it does a very nice job of it. However, I don't recommend running gpmulti either. ;) That's really the job of gpsuite, which runs gpmulti AND takes its output and reduces each pass to a merged, single line of output. I've been using gpsuite for many years and have run as many as 64 parallel copies of getput for over 2000 parallel threads. I also don't see any reason why one couldn't run even more.

My utility gpsum expects gpsuite output and reduces it to an even denser format. Here's an example of what it looks like for a single object size. When there are more, you get more columns and the headers line up better ;) The Clt column is the number of clients that ran the tests and Prc is the number of processes per client, so the Totl column is the product of the two:

```
*** IOPS ***    *** put ***  *** get ***
Totl  Clt  Prc           1k           1k
   1    1    1        98.46       182.89
  16    4    4      1224.08      2244.29
  32    4    8      1875.70      4264.36
  64    4   16      3087.64      7300.27
  96    4   24      3710.17      8636.94
 128    4   32      4497.56      9844.99
 192    4   48      5361.17     12649.99
 256    4   64      6257.89     14380.43
 320    4   80      6481.22     16177.29
 384    4   96      6734.29     17433.67
 448    4  112      6896.92     18658.28
 512    4  128      6943.08     18932.47
```
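
To make the relationship between the columns concrete, here is a small sketch that recomputes the total process count and derives IOPS per process for a few of the rows above. The `summarize` helper is hypothetical and is not part of gpsum; the numbers are copied from the table.

```python
# (clients, procs_per_client, put_iops, get_iops), copied from the table above
rows = [
    (1, 1, 98.46, 182.89),
    (4, 4, 1224.08, 2244.29),
    (4, 32, 4497.56, 9844.99),
    (4, 128, 6943.08, 18932.47),
]


def summarize(rows):
    """Return (total_procs, put_iops_per_proc, get_iops_per_proc) per row."""
    out = []
    for clients, procs, put_iops, get_iops in rows:
        total = clients * procs  # the Totl column
        out.append((total, put_iops / total, get_iops / total))
    return out


for total, put_per, get_per in summarize(rows):
    print(f"{total:>4} {put_per:8.2f} {get_per:8.2f}")
```

Per-process IOPS falling as the total grows is one quick way to see where aggregate throughput starts to level off.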

-mark

@markhpc
Author

markhpc commented Feb 5, 2017

Hi Mark,

I've wrapped getput as a benchmark in CBT, a framework that deploys a Ceph cluster (and eventually Swift and Gluster), runs a set of parametrically defined benchmarks against it, and runs background tracing and monitoring tools like collectl for each test. It was designed to rapidly and repeatedly deploy Ceph clusters and analyze the performance of tests on them using tools like perf, blktrace, collectl, valgrind, etc.

In some ways CBT is similar to gpsuite, though it's more benchmark agnostic and also dumber in its coordination capabilities. Initially it was primarily designed to automate rados bench tests, though it now supports a number of other benchmarks and tests including fio, cosbench, cephtestrados, and now getput.

CBT relies on pdsh for executing commands remotely, so it would be a bit annoying to pass a rank to each getput process. I could work around the issue by putting the output of `hostname -f` in the container prefix, but this seems like a better idea, and it also ensures that if multiple users run getput on the same cluster at the same time using the default rank, they don't try to write into the same buckets.

@markseger
Owner

markseger commented Feb 6, 2017 via email
