feat: add additional sections to "Manage Slurm" how-to #123
reference/glossary.md
Outdated
| `sackd` | ||
| Slurm Auth and Credential Kiosk daemon. Typically used to provide cluster login nodes. |
This does group all the Slurm services but the glossary is no longer alphabetically ordered. My preference would be to keep the alphabetical ordering as I'd expect users to be looking up single terms in the glossary rather than reading the whole page, so proximity to related terms is less important.
Having a "Related terms:" section under "Resources:" may be an idea.
We can go back to things being in alphabetical order. Related terms: seems like a nice idea, but I think it would get repetitive quite quickly since everything is inter-related.
I've found using the Glossary in tandem with the {term} directive helpful in existing how-tos since we can link to the term and the reader can click on it if they want an expanded definition.
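For illustration, here is a minimal sketch of that pattern, assuming the docs are built with Sphinx and MyST-Parser (which the {term} syntax suggests). The glossary layout and the how-to sentence are hypothetical and not taken from this PR; only the `sackd` definition comes from the glossary diff above.

````markdown
<!-- reference/glossary.md (hypothetical layout using the glossary directive) -->
```{glossary}
sackd
  Slurm Auth and Credential Kiosk daemon. Typically used to provide cluster login nodes.
```

<!-- In a how-to page (hypothetical sentence); the role renders as a link to the entry above -->
This step requires a node running {term}`sackd`.
````

With this setup the how-to stays short, and a reader who needs the expanded definition can click through to the glossary entry.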
AshleyCliff
left a comment
Looking good! I've left some comments and a couple nitpick edits.
| the different components and services of your Slurm deployment. | ||
| ## Managing the Slurm controller | ||
Can we add a short intro to this section to break up the headers - maybe a quick description of what the slurm controller is, the corresponding charm, and what processes it controls?
Hmm... I disagree here. I feel like we'd be adding superfluous content here for the sake of styling rather than actually adding useful information that belongs within this how-to.
We shouldn't include a description of what the Slurm controller is or what the charm is doing here because that kind of information belongs in Explanation documentation. Most people responsible for managing Slurm will already know what the Slurm controller is, and they don't want to be distracted by an explanation of what the Slurm controller is or which charm happens to be operating it. They're trying to get down to business.
Instead, we should have a dedicated explanation page that explains the differences between upstream and our charms, like Charmed Kubeflow does, for those who want to understand Charmed HPC on a deeper level; however, it's the Slurm documentation's job to explain what Slurm is, not ours. The requirement for this how-to should be to quickly provide operators with instructions on how to manage Slurm in their Charmed HPC cluster, not to explain what Slurm is or how the deployment is composed.
Short contextual descriptions within a how-to are fine; we just don't want to spend pages and pages of text in a how-to providing information that's not necessary at the time. I was thinking more of a quick intro statement, similar to the one at the top of the page, that's specific to that section, and similarly for the nodes and partitions section (I tried to draft an example but failed).
I agree a dedicated explanation page going through the details of the differences would be good to have and that we don't need to explain the controller itself.
I believe that using our glossary terms in tandem with the {term} directive is how we can provide more information to users that need additional context, but avoid clogging up the how-to with additional information that's not directly related to the task the reader wants to complete.
I'm more or less avoidant of quick intro statements for sections because I find it's quite easy to just end up repeating yourself. They're sort of like:
## Manage the Slurm controller
This section provides instructions on how to manage Slurm controller.
You can just infer from the header that "hey, the stuff in this section is probably about managing the Slurm controller". Being explicit isn't really necessary here imho unless there's data showing that we need to be more explicit. We had some quick section intros in the "Integrate with COS" how-to that I removed because they were just redundant.
Signed-off-by: Jason C. Nucciarone <nuccitheboss@ubuntu.com>
5c7b7a7 to
4396aee
@AshleyCliff ready for round 2. I applied your suggestions for the admonition updates and purging "users' jobs". To your other comments, I'm not convinced that we need additional filler after the "Managing the Slurm controller" and "Managing compute nodes and partitions" headers. While the styling might make it feel like we're missing content, I don't think we should add short blurbs like "what's the Slurm controller" or an explanation of the charm that's behind the service's operation because I feel information like that belongs in a different location outside of the how-to. Also, regarding your comment about adding a "Next Steps" section to the end of this how-to, I added a discussion item for next week's docs jam session. It's out of scope for this PR, but I could see a "Related Info" section being helpful once we have more content in "Reference" and "Explanation"; however, I'd like to see if users visiting our docs are left wanting for this additional information.
AshleyCliff
left a comment
One missed "users'" to purge and then we're good to go.
We can continue the discussion around intro statements for new sections but I don't consider it a blocking issue for getting this PR and section through.
Signed-off-by: Jason C. Nucciarone <nuccitheboss@ubuntu.com>
4396aee to
b172e7b
@AshleyCliff purged the last users' jobs! We should be good to go here. We can revisit the section intro statements once we expand the "Manage Slurm" how-to to include even more things; however, my preference is to not have them since they usually just end up repeating information that can be inferred from the header. There were a couple section intros in the "Integrate with COS" how-to that I purged because they didn't really add important information when scrutinized.
Merging on behalf of @AshleyCliff since her power is out ⚡
Summary of Changes
This PR adds additional sections to the "Manage Slurm" how-to, and it adds more terms to the Glossary.
The added sections add documentation for:
- `set-node-state` action.
- `set-node-config` action.
- `default_node_state` and `default_node_reason` configuration options.

Key things to note:
- `default_node_state` and `default_node_reason` are not documented at https://charmhub.io/slurmd/configurations yet because we're still working through code reviews on feat(slurmd): implement `default-node-state` and `default-node-reason` slurm-charms#177.

Related Issues, PRs, and Discussions
- `set-node-state` action and related configuration options #112