getput: Make bynode use the FQDN if no rank is given. #4

markhpc · 2017-02-05T13:51:08Z

Uncoordinated executions of getput where no rank is specified result in every copy using a default rank of 0. When bynode or byproc container naming is used, this causes each copy to target the same set of containers. This PR amends the naming scheme so that when rank isn't specified, the fully qualified hostname is used instead. A side benefit of this PR is that it is easier to look at a list of containers and eyeball per-host anomalies in write statistics.

Signed-off-by: Mark Nelson <mnelson@redhat.com>
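
A minimal sketch of the naming change described above, not the code in this PR. The helper `container_name`, its parameters, and the `base-tag` name layout are hypothetical and chosen only for illustration; the one idea taken from the description is the fallback to the fully qualified hostname when no rank is given.

```python
import socket


def container_name(base, policy, rank=None, proc=0):
    """Hypothetical helper: build a container name for a naming policy.

    The name layout used here is an assumption, not getput's actual format.
    """
    # Use the explicit rank when one is given; otherwise fall back to the
    # FQDN so uncoordinated copies on different hosts do not collide.
    tag = str(rank) if rank is not None else socket.getfqdn()
    if policy == "bynode":
        return f"{base}-{tag}"
    if policy == "byproc":
        return f"{base}-{tag}-{proc}"
    return base  # shared: every copy uses the same container


if __name__ == "__main__":
    print(container_name("bench", "bynode"))          # FQDN-based name
    print(container_name("bench", "byproc", rank=2))  # rank-based name
```

With an explicit rank the names stay as they were; without one, two hosts running the same command end up with distinct containers.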

markseger · 2017-02-05T14:25:21Z

I think this is worthy of more discussion. First and foremost, I don't think you should ever run uncoordinated copies of getput, and quite honestly I can't think of any situation where you would want to. The point of gpmulti is to coordinate multiple copies and it does a very nice job of it. However, I don't recommend running gpmulti directly either. ;) That's really the job of gpsuite, which runs gpmulti AND takes its output and reduces each pass to a merged, single line of output. I've been using gpsuite for many years and have run as many as 64 parallel copies of getput for over 2000 parallel threads. I also don't see any reason why one couldn't run even more.

My utility gpsum expects gpsuite output and reduces it to an even denser format. Here's an example of what it looks like for a single object size. When there are more, you get more columns and the headers line up better ;) The CLT column is the number of clients that ran the tests and PRC is the number of processes/client, so the TOTL column is the product of the 2:

```
*** IOPS ***   *** put ***   *** get ***
Totl Clt Prc            1k            1k
   1   1   1         98.46        182.89
  16   4   4       1224.08       2244.29
  32   4   8       1875.70       4264.36
  64   4  16       3087.64       7300.27
  96   4  24       3710.17       8636.94
 128   4  32       4497.56       9844.99
 192   4  48       5361.17      12649.99
 256   4  64       6257.89      14380.43
 320   4  80       6481.22      16177.29
 384   4  96       6734.29      17433.67
 448   4 112       6896.92      18658.28
 512   4 128       6943.08      18932.47
```
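
As a small illustration of how the columns relate, the sketch below copies the rows from the table above, checks that Totl is the product of Clt and Prc, and derives per-process IOPS. It is not part of gpsum; the function name `per_process` is hypothetical.

```python
# Rows copied from the table above:
# (Totl, Clt, Prc, put IOPS, get IOPS) for 1k objects.
ROWS = [
    (1, 1, 1, 98.46, 182.89),
    (16, 4, 4, 1224.08, 2244.29),
    (32, 4, 8, 1875.70, 4264.36),
    (64, 4, 16, 3087.64, 7300.27),
    (96, 4, 24, 3710.17, 8636.94),
    (128, 4, 32, 4497.56, 9844.99),
    (192, 4, 48, 5361.17, 12649.99),
    (256, 4, 64, 6257.89, 14380.43),
    (320, 4, 80, 6481.22, 16177.29),
    (384, 4, 96, 6734.29, 17433.67),
    (448, 4, 112, 6896.92, 18658.28),
    (512, 4, 128, 6943.08, 18932.47),
]


def per_process(rows):
    """Return (Totl, put IOPS per process, get IOPS per process) per row."""
    result = []
    for totl, clt, prc, put_iops, get_iops in rows:
        # Totl is the number of clients times the processes per client.
        assert totl == clt * prc
        result.append((totl, put_iops / totl, get_iops / totl))
    return result


if __name__ == "__main__":
    for totl, put_pp, get_pp in per_process(ROWS):
        print(f"{totl:4d} procs: {put_pp:7.2f} put/s, {get_pp:7.2f} get/s per process")
```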

-mark

markhpc · 2017-02-05T22:15:03Z

Hi Mark,

I've wrapped getput as a benchmark in CBT, which is a framework that will deploy a ceph cluster (and eventually swift and gluster), then run a set of parametrically defined benchmarks against it and run background tracing and monitoring tools like collectl for each test. It was designed to rapidly and repeatedly deploy ceph clusters and analyze performance of tests on them using tools like perf, blktrace, collectl, valgrind, etc.

In some ways CBT is similar to gpsuite, though different in that it's more benchmark agnostic but also dumber in its coordination capabilities. Initially it was primarily designed to automate rados bench tests, though it now supports a number of other benchmarks and tests including fio, cosbench, cephtestrados, and now getput.

CBT relies on pdsh for executing commands remotely, so it would be a bit annoying to pass a rank to each getput process. I could work around the issue by putting the output of `hostname -f` in the container prefix, but this seems like a better idea and also ensures that if multiple users run getput on the same cluster at the same time using the default rank, they don't try to write into the same buckets.

markseger · 2017-02-06T18:21:04Z

I took a close look at what you're doing, and while I do get it and am happy to support your efforts, I'm thinking we may need to do things a little differently, perhaps just adding another switch. Here's the reason: the rank has multiple uses depending on what you're doing. For example, if you want to create random objects, a unique object number is created using the rank. There's also a mechanism that allows one to bypass the load balancers and talk directly to swift proxy nodes, essentially doing a round-robin that takes into account a thread count and rank. I didn't study it super closely, though those 2 did jump out at me.

It also turns out I've never actually used by-node in any of my testing and always use by-process or shared. Might ceph testing have the need for these as well?

In any event, I'm thinking of having some way to simply say to use the fqdn instead of rank in the container name for both bynode and byprocess. Then, if someone does choose that and also says they want to use --rank, --proxies or generate random object numbers, I can generate an error message that says those options are mutually exclusive with what-ever-switch-we-want-to-use. In fact, if I later find other cases that are mutually exclusive, I can just add future tests for them as well. I think all I'd need to do is change:

    if not options.use-fqdn:
        bynode_val = get_fqhostname()
        options.rank = 0

and then when building names use either bynode_val OR rank.

Hmm, what about --use-fqdn or --fqdn-rank? --use-fqdn-rank or --use-fqdn-for-rank would be more explicit but also getting pretty wordy too ;) wadda ya think?

So I assume from all this you are finding getput useful? I'm also assuming that outside of your testing environment I could still use the old fashioned way of using gpsuite ;). Have you tried that at all? Basically it lets you define a set of tests with different names and just run them by saying "gpsuite --suite foo", and it takes care of the parallelism and merging of results. It also allows you to run scripts between tests, so if you want to pull collectl logs or even do some analysis of them you can do that too. We had one case where we wanted to measure total CPU usage by all the swift processes per test and were able to do that pretty easily too.
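
The switch-plus-validation idea above could look roughly like the sketch below. It is an illustration only: it uses `argparse` for self-containment rather than whatever option parser getput actually uses, `--use-fqdn` is just one of the candidate names floated above, and `parse_args` and `name_tag` are hypothetical helpers. It also assumes the switch, when set, selects the FQDN, and it leaves out the random-object-number case.

```python
import argparse
import socket


def parse_args(argv):
    """Parse options and reject combinations that conflict with --use-fqdn."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--rank", type=int, default=None)
    parser.add_argument("--proxies", default=None)
    # Hypothetical switch name; the thread has not settled on one.
    parser.add_argument("--use-fqdn", action="store_true")
    opts = parser.parse_args(argv)

    if opts.use_fqdn:
        # Collect every option that was set and conflicts with the switch.
        # Further mutually exclusive options can be added to this tuple later.
        checks = (("--rank", opts.rank), ("--proxies", opts.proxies))
        conflicts = [name for name, value in checks if value is not None]
        if conflicts:
            # parser.error() prints the message and exits with status 2.
            parser.error("--use-fqdn is mutually exclusive with " + ", ".join(conflicts))
    return opts


def name_tag(opts):
    """Return the string used in container names: the FQDN or the rank."""
    if opts.use_fqdn:
        return socket.getfqdn()
    return str(opts.rank or 0)  # default rank is 0 when none is given


if __name__ == "__main__":
    print(name_tag(parse_args(["--use-fqdn"])))
    print(name_tag(parse_args(["--rank", "3"])))
```

Keeping the rank untouched when the switch is off means the other uses of rank (random object numbers, proxy round-robin) behave as before.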

-mark

Make bynode use the FQDN if no rank is given.

f0b5732

Signed-off-by: Mark Nelson <mnelson@redhat.com>
