Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions hammers/jackhammer
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ import time
import os
import threading
from typing import List, Literal
from ocs.ocs_client import OCSClient


class TermColors:
Expand Down Expand Up @@ -50,6 +51,7 @@ with open(sys_config_file, 'r') as stream:
sys_config = yaml.safe_load(stream)

use_hostmanager: bool = sys_config.get('use_hostmanager', False)
hm_instance_id: str = sys_config.get('hostmanager_instance_id')
Comment on lines 53 to +54
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my original issue on this:

This may require adding the host manager instance-id to sys_config.yaml, or maybe just assuming there is only one host manager (which there should be now, in all cases on site).

What I was trying to suggest here is that having use_hostmanager and hm_instance_id feels redundant. If hm_instance_id is present, we'll be using the HM. And we never want to provide an hm_instance_id and set use_hostmanager = False.

Instead I'd suggest something like this:

Suggested change
use_hostmanager: bool = sys_config.get('use_hostmanager', False)
hm_instance_id: str = sys_config.get('hostmanager_instance_id')
hm_instance_id: str = sys_config.get('hostmanager_instance_id')
if hm_instance_id:
use_hostmanager: bool = True
else:
if 'use_hostmanager' in sys_config:
print("'use_hostmanager' config is deprecated. Provide " +
"'hostmanager_instance_id' or remove from sys_config.yml.")
use_hostmanager: bool = sys_config.get('use_hostmanager', False)

docker_compose_cmd: str = sys_config.get('docker_compose_command', 'docker compose')

def get_pysmurf_controller_docker_service(slot: int) -> str:
Expand Down Expand Up @@ -404,6 +406,11 @@ def hammer_func(args):
reboot = not (args.no_reboot)
all_slots = len(slots) == len(sys_config['slot_order'])

if use_hostmanager:
cprint("Bringing down all agents via hostmanager", style=TermColors.HEADER)
hm = OCSClient(hm_instance_id)
hm.update(requests=[('all', 'down')])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do this for just the requested slots? I could imagine it would be disruptive in some cases to restart the det-controller instance of all slots when you are only hammering one of them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can change it to only bring down the controllers for the slots of interest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this is a good idea, and matches the behavior that jackhammer used to have before it was updated for use with Host Manager.

As currently written, jackhammer will also bring down the monitor, suprsync, and det-crate agents.


Comment on lines +409 to +413
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would at least move this to after the prompt for "Are you sure (y/n)?". Hammering and then backing out with a 'n' response would result in everything on the ocs side of things being offline otherwise.

Following the conversation in this issue -- I think you should move it a bit further down even, below the dump_docker_logs() line so that we get the agent logs too.

If my theory on the logs being too long is correct, then maybe we just limit the log dump to the past X lines via the -n X flag?

reboot_str = "hard" if reboot else "soft"
cmd = input(f"You are {reboot_str}-resetting slots {slots}. "
"Are you sure (y/n)? ")
Expand Down Expand Up @@ -484,6 +491,10 @@ def hammer_func(args):
setup_smurfs(slots)
print("Finished configuring pysmurf!")

if use_hostmanager:
cprint("Bringing up all agents via hostmanager", style=TermColors.HEADER)
hm.update(requests=[('all', 'up')])

# Enters into an ipython notebook for the first specified slot
cprint(f"Entering pysmurf slot {slots[0]}", style=TermColors.HEADER)
enter_pysmurf(slots[0], agg=args.agg)
Expand Down