Description
We have written the startup script for the TransferQueue + DataSystem backend as follows:
```python
from omegaconf import OmegaConf

# TransferQueueController, TransferQueueClient and process_zmq_server_info
# come from the TransferQueue package.


class Trainer:
    def __init__(self, config: dict):
        self.config = config
        self._initialize_transferqueue()

    def _initialize_transferqueue(self):
        # 1. Initialize TransferQueueController (single controller only)
        self.tq_controller = TransferQueueController.remote()

        # 2. Prepare necessary information of the controller
        self.tq_controller_info = process_zmq_server_info(self.tq_controller)

        # Note: a new DictConfig must be created with allow_objects=True to keep
        # the ZMQServerInfo instance intact. Otherwise it is flattened to a dict.
        tq_config = OmegaConf.create({}, flags={"allow_objects": True})
        tq_config.controller_info = self.tq_controller_info
        self.config = OmegaConf.merge(tq_config, self.config)

        # 3. Create TransferQueueClient
        self.tq_client = TransferQueueClient(
            client_id="Trainer",
            controller_info=self.tq_controller_info,
        )

        # 4. Connect to DataSystem
        self.tq_client.initialize_storage_manager(
            manager_type=self.config["manager_type"], config=self.config
        )
        return self.tq_client
```

We found that TransferQueueClient requires controller_info during initialization and holds on to it, yet StorageManager also needs controller_info to be passed in when it is initialized.
From the user's perspective, the relationship between StorageManager and the controller may not be obvious. Users might forget to add controller_info to the configuration they pass in, which results in a ValueError. Since the client already holds controller_info, perhaps we should tolerate a missing value instead of raising an exception right away.
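A minimal sketch of what the tolerant behavior could look like, assuming the client fills in its own controller_info whenever the passed configuration lacks one. `with_controller_info` and `SketchClient` are hypothetical stand-ins written for illustration, not the actual TransferQueue classes, and plain dicts are used instead of a DictConfig to keep the example self-contained:

```python
from typing import Any, Mapping, Optional


def with_controller_info(config: Optional[Mapping[str, Any]], controller_info: Any) -> dict:
    """Return a copy of `config` that is guaranteed to carry controller_info.

    A value supplied explicitly by the user wins; otherwise the client's copy is used.
    """
    merged = dict(config or {})
    if merged.get("controller_info") is None:
        merged["controller_info"] = controller_info
    return merged


class SketchClient:
    """Hypothetical stand-in for TransferQueueClient (assumption, not the real class)."""

    def __init__(self, client_id: str, controller_info: Any):
        self.client_id = client_id
        self.controller_info = controller_info
        self.manager_type = None
        self.storage_config = None

    def initialize_storage_manager(self, manager_type: str, config=None) -> dict:
        # Fall back to the controller_info the client already holds
        # instead of raising ValueError when the config omits it.
        self.manager_type = manager_type
        self.storage_config = with_controller_info(config, self.controller_info)
        return self.storage_config


# Usage: the user forgets controller_info, and initialization still succeeds.
client = SketchClient("Trainer", controller_info={"address": "placeholder"})
storage_config = client.initialize_storage_manager(
    manager_type="placeholder_manager",
    config={"manager_type": "placeholder_manager"},
)
```

With this approach the caller's configuration is copied rather than mutated, and an explicitly provided controller_info still takes precedence. If the configuration is a DictConfig, the allow_objects caveat from the startup script would still apply when injecting the value.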