Conversation
- Fix executables call with an empty env When the environment is emptied, it is not guaranteed that "ls" or "sleep" are in the PATH of the new shell. Use /usr/bin/env "cmd" to ensure the system version of this command is used. - Fix all the python tests related to register_instance - Fix all libmuscle/python tests
Both for initialization and monitoring itself
|
Closes #312 |
LourensVeen
left a comment
There was a problem hiding this comment.
Okay, I've left a bunch of comments, but I also still have some second thoughts whether it's good to merge this with the profiling events. It may be better to have a separate table (instance, rank, hostname, pid) in the ProfileStore, and another one with (hostname, pid, time_start, time_stop, cpu, mem_max), and then you can join that together and/or with the other data in all sorts of creative ways when analysing the results.
Let me sleep on that, and maybe discuss tomorrow?
| def _register_instance( | ||
| self, instance_id: str, locations: List[str], | ||
| ports: List[List[str]], version: str = '') -> Any: | ||
| ports: List[List[str]], pid: int, hostname: str, version: str = '') -> Any: |
There was a problem hiding this comment.
Instances that use MPI will have a hostname/pid for each rank. How would that be passed here?
There was a problem hiding this comment.
Only the process calling "register_instance" will be monitored, if we want to also monitor all the other MPI processes, then I think that the following needs to be done:
- The MPI Process that calls register_instance must provide all PIDS and hostnames
- We need to call this method for each provided PID/Hostname Pair
I think the backend is flexible enough to have Many PIDs for a single Instance ID
libmuscle/python/libmuscle/native_instantiator/test/test_process_manager.py
Outdated
Show resolved
Hide resolved
| e.stop_time.nanoseconds, port_name, port_operator, | ||
| e.port_length, e.slot, e.message_number, e.message_size, | ||
| e.message_timestamp) | ||
| e.message_timestamp, e.cpu_percent, e.memory_usage) |
There was a problem hiding this comment.
I'm guessing that both cpu_percent and memory_usage are averages? Or is memory_usage a maximum? That would probably actually be more useful.
libmuscle/python/libmuscle/manager/test/test_profile_database.py
Outdated
Show resolved
Hide resolved
libmuscle/python/libmuscle/native_instantiator/agent/map_client.py
Outdated
Show resolved
Hide resolved
| cur.execute("COMMIT") | ||
| cur.close() | ||
|
|
||
| def add_event( |
There was a problem hiding this comment.
Why a separate function? Can't we just call add_events(instance_id, [event]) directly instead?
There was a problem hiding this comment.
I made it because a single agent might submit for various instances, which with add_events means you have to pre-sort, or submit with arrays of length 1. but either way is fine I think.
libmuscle/python/libmuscle/native_instantiator/agent_manager.py
Outdated
Show resolved
Hide resolved
libmuscle/python/libmuscle/native_instantiator/native_instantiator.py
Outdated
Show resolved
Hide resolved
|
Oh, and could you merge develop into your branch? I've fixed the CI, so then we can see what it says. |
…cified, use the existing env (default behaviour)
b9d4647 to
4e07f29
Compare
…d monitor asynchronous
…isfy threading tests
This PR implements a way to get the node agents (if applicable) to monitor each instance that registers itself in the instance registry
The monitoring is quite basic and can be extended, it is logged with a DEBUG level to the main log.
After this is merged, probably there should also be a way to easily extract this info from the database