getput: Make bynode use the FQDN if no rank is given. #4
base: master
Signed-off-by: Mark Nelson <mnelson@redhat.com>
I think this is worthy of more discussion. First and foremost, I don't think you should ever run uncoordinated copies of getput, and quite honestly I can't think of any situation where you would want to. The point of gpmulti is to coordinate multiple copies, and it does a very nice job of it. However, I don't recommend running gpmulti directly either ;) That's really the job of gpsuite, which runs gpmulti AND takes its output and reduces each pass to a single merged line of output. I've been using gpsuite for many years and have run as many as 64 parallel copies of getput for over 2000 parallel threads, and I don't see any reason why one couldn't run even more.

My utility gpsum also expects gpsuite output and reduces it to an even denser format. Here's an example of what it looks like for a single object size. When there are more sizes, you get more columns and the headers line up better ;) The CLT column is the number of clients that ran the tests and PRC is the number of processes per client, so the TOTL column is the product of the two:

-mark
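A minimal Python sketch of the reduction described above: it collapses one pass of per-process results into a single row, where TOTL = CLT * PRC. The `ProcResult` fields, the choice to sum IOPS and MB/s across processes, and all names here are hypothetical illustrations, not gpsuite's or gpsum's actual code or output format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ProcResult:
    """One getput process's numbers for a single test pass (hypothetical fields)."""
    client: str        # host the process ran on
    iops: float        # operations/sec reported by this process
    mb_per_sec: float  # throughput reported by this process


def merge_pass(results):
    """Reduce one pass's per-process results to a single summary row.

    CLT  = number of distinct clients
    PRC  = processes per client (must be the same on every client)
    TOTL = CLT * PRC, the total number of processes
    Throughput numbers are summed across all processes.
    """
    per_client = Counter(r.client for r in results)
    counts = set(per_client.values())
    if len(counts) != 1:
        raise ValueError("need a non-empty pass with the same process count on every client")
    clt = len(per_client)
    prc = counts.pop()
    return {
        "CLT": clt,
        "PRC": prc,
        "TOTL": clt * prc,
        "IOPS": sum(r.iops for r in results),
        "MB/s": sum(r.mb_per_sec for r in results),
    }


# Example: 2 clients x 4 processes each, so TOTL should be 8.
rows = [ProcResult(f"client{c}", 100.0, 10.0) for c in range(2) for _ in range(4)]
print(merge_pass(rows))
```

Rejecting passes with uneven per-client process counts keeps PRC well defined; a real tool might instead report a per-client breakdown.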
I took a close look at what you're doing, and while I do get it and am happy
to support your efforts, I'm thinking we may need to do things a little
differently, perhaps just adding another switch. Here's the reason:
the rank has multiple uses depending on what you're doing. For example, if
you want to create random objects, a unique object number is created using
the rank.
There's also a mechanism that allows one to bypass the load balancers and talk
directly to the swift proxy nodes, essentially doing round-robin
selection that takes into account the thread count and rank.
I didn't study it super closely, though those 2 did jump out at me. It also
turns out I've never actually used bynode in any of my testing and always
use byprocess or shared. Might ceph testing have a need for these as
well?
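To make those two uses concrete, here is a hypothetical Python sketch of how a rank could feed both a unique object number and a round-robin choice of proxy node. The formulas, function names and parameters are assumptions for illustration; getput's real code isn't shown in this thread. It also shows why two copies left at the default rank of 0 would collide.

```python
def global_thread_id(rank: int, thread: int, threads_per_proc: int) -> int:
    """Flatten (rank, thread) into one id; unique only if every process has a unique rank."""
    if not 0 <= thread < threads_per_proc:
        raise ValueError("thread index out of range")
    return rank * threads_per_proc + thread


def unique_object_number(rank, thread, seq, threads_per_proc, objs_per_thread):
    """Hypothetical numbering: each (rank, thread) owns a disjoint block of object numbers."""
    if not 0 <= seq < objs_per_thread:
        raise ValueError("seq out of range")
    return global_thread_id(rank, thread, threads_per_proc) * objs_per_thread + seq


def pick_proxy(proxies, rank, thread, threads_per_proc):
    """Hypothetical round-robin: spread threads across proxy nodes, skipping the load balancer."""
    return proxies[global_thread_id(rank, thread, threads_per_proc) % len(proxies)]


proxies = ["proxy1", "proxy2", "proxy3"]
# Two uncoordinated copies that both default to rank 0 collide on both counts.
a = (unique_object_number(0, 1, 5, 4, 100), pick_proxy(proxies, 0, 1, 4))
b = (unique_object_number(0, 1, 5, 4, 100), pick_proxy(proxies, 0, 1, 4))
print(a == b)  # True: same object number and same proxy
```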
In any event, I'm thinking of having some way to simply say to use
the FQDN instead of the rank in the container name, for both bynode and
byprocess. Then, if someone chooses that and also says they want to
use --rank, --proxies or random object numbers, I can generate an
error message saying those options are mutually exclusive with
whatever-switch-we-want-to-use. In fact, if I later find other cases that
are mutually exclusive, I can just add future tests for them as well. I
think all I'd need to do is change:

    # proposed switch (name not settled): put the FQDN in container names instead of the rank
    if options.use_fqdn:
        bynode_val = get_fqhostname()
        options.rank = 0

and then, when building names, use either bynode_val OR rank.
Hmm, what about --use-fqdn or --fqdn-rank? --use-fqdn-rank or
--use-fqdn-for-rank would be more explicit but is also getting pretty wordy
too ;)
wadda ya think?
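A minimal sketch of the mutual-exclusion check being proposed, using Python's argparse and `--use-fqdn`, one of the candidate names floated here. Only `--rank` and `--proxies` come from this thread; `--random-objects` is a hypothetical stand-in for "generate random object numbers", and none of this is getput's actual option handling.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="getput-sketch")
    parser.add_argument("--rank", type=int, default=None)
    parser.add_argument("--proxies", default=None, help="comma-separated proxy nodes")
    # Hypothetical flag standing in for "generate random object numbers".
    parser.add_argument("--random-objects", action="store_true")
    # One of the candidate names from this thread; not a real getput option.
    parser.add_argument("--use-fqdn", action="store_true")
    opts = parser.parse_args(argv)

    if opts.use_fqdn:
        # Collect every option that relies on a unique rank.
        conflicts = [flag for flag, used in (
            ("--rank", opts.rank is not None),
            ("--proxies", opts.proxies is not None),
            ("--random-objects", opts.random_objects),
        ) if used]
        if conflicts:
            # parser.error() prints the message and exits with status 2.
            parser.error("--use-fqdn is mutually exclusive with " + ", ".join(conflicts))
    return opts


opts = parse_args(["--use-fqdn"])
print(opts.use_fqdn)  # True
```

argparse's `add_mutually_exclusive_group` would make every member exclusive with every other one, which would also forbid combining `--rank` with `--proxies`. A manual check after parsing limits the restriction to the new switch, and later conflicts can be appended to the tuple.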
So I assume from all this you are finding getput useful? I'm also assuming
that outside of your testing environment I could still use the old-fashioned way
of running gpsuite ;) Have you tried that at all? Basically, it lets you
define a set of tests with different names and just run them by saying
"gpsuite --suite foo", and it takes care of the parallelism and merging of
results. It also allows you to run scripts between tests, so if you want to
pull collectl logs or even do some analysis of them, you can do that too. We
had one case where we wanted to measure total CPU usage by all the swift
processes per test and were able to do that pretty easily too.
-mark
On Sun, Feb 5, 2017 at 5:15 PM, Mark Nelson <***@***.***> wrote:
Hi Mark,
I've wrapped getput as a benchmark in CBT, which is a framework that will
deploy a ceph cluster (and eventually swift and gluster), then run a set of
parametrically defined benchmarks against it and run background tracing and
monitoring tools like collectl for each test. It was designed to rapidly and
repeatedly deploy ceph clusters and analyze the performance of tests on them
using tools like perf, blktrace, collectl, valgrind, etc.
In some ways CBT is similar to gpsuite, though it differs in that it's more
benchmark-agnostic but also dumber in its coordination capabilities.
Initially it was primarily designed to automate rados bench tests, though
it now supports a number of other benchmarks and tests, including fio,
cosbench, cephtestrados, and now getput.
CBT relies on pdsh for executing commands remotely, so it would be a bit
annoying to pass a rank to each getput process. I could work around the
issue by putting the output of hostname -f in the container prefix, but this
seems like a better idea, and it also ensures that if multiple users run getput
on the same cluster at the same time using the default rank, they don't try to
write into the same buckets.
Uncoordinated executions of getput where no rank is specified result in every copy using the default rank of 0. When bynode or byproc container naming is used, this causes every copy to target the same set of containers. This PR amends the naming scheme so that when no rank is specified, the fully qualified hostname is used instead. A side benefit is that it becomes easier to look at a list of containers and spot per-host anomalies in the write statistics.
Signed-off-by: Mark Nelson mnelson@redhat.com
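A minimal Python sketch of the fallback rule this PR describes: an explicit rank wins, otherwise the host's FQDN goes into the container name. The name formats (`prefix-token` for bynode, `prefix-token-proc` for byproc) and the function itself are hypothetical, not getput's actual naming code, and only the two modes the PR mentions are covered. `socket.getfqdn()` is the standard-library way to get the FQDN; the patch itself may obtain it differently.

```python
import socket
from typing import Optional


def container_name(prefix: str, mode: str, rank: Optional[int] = None,
                   proc: int = 0, fqdn: Optional[str] = None) -> str:
    """Hypothetical container naming that follows this PR's fallback rule.

    An explicit rank wins; with no rank, the host's fully qualified domain
    name is used, so uncoordinated copies on different hosts get distinct
    containers. ``fqdn`` can be passed in for testing; by default it comes
    from socket.getfqdn().
    """
    if rank is not None:
        token = str(rank)
    else:
        token = fqdn if fqdn is not None else socket.getfqdn()
    if mode == "bynode":
        return f"{prefix}-{token}"
    if mode == "byproc":
        return f"{prefix}-{token}-{proc}"
    raise ValueError(f"unsupported naming mode in this sketch: {mode!r}")


# Two hosts, no rank given: previously both would have used rank 0.
print(container_name("gp", "bynode", fqdn="node1.example.com"))  # gp-node1.example.com
print(container_name("gp", "bynode", fqdn="node2.example.com"))  # gp-node2.example.com
```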