- mlagents.trainers.trainer.on_policy_trainer
- mlagents.trainers.trainer.off_policy_trainer
- mlagents.trainers.trainer.rl_trainer
- mlagents.trainers.trainer.trainer
- mlagents.trainers.settings
class OnPolicyTrainer(RLTrainer)
The base class for on-policy trainers; the PPO trainer is an implementation of this class for the PPO algorithm.
| __init__(behavior_name: str, reward_buff_cap: int, trainer_settings: TrainerSettings, training: bool, load: bool, seed: int, artifact_path: str)
Responsible for collecting experiences and training an on-policy model.
Arguments:
- behavior_name: The name of the behavior associated with the trainer config.
- reward_buff_cap: Max reward history to track in the reward buffer.
- trainer_settings: The parameters for the trainer.
- training: Whether the trainer is set for training.
- load: Whether the model should be loaded.
- seed: The seed the model will be initialized with.
- artifact_path: The directory within which to store artifacts from this trainer.
| add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None
Adds a policy to the trainer.
Arguments:
- parsed_behavior_id: Behavior identifiers that the policy should belong to.
- policy: Policy to associate with name_behavior_id.
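A minimal construction sketch, assuming default hyperparameters; the subclass name and argument values below are illustrative, not part of this API:

```python
from mlagents.trainers.settings import TrainerSettings

# All-default hyperparameters; in practice these come from the YAML config.
settings = TrainerSettings()

# A concrete subclass (e.g. the PPO trainer) would be constructed roughly as:
# trainer = SomeOnPolicyTrainer(      # hypothetical subclass name
#     behavior_name="3DBall",
#     reward_buff_cap=100,
#     trainer_settings=settings,
#     training=True,
#     load=False,
#     seed=0,
#     artifact_path="results/3DBall",
# )
```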
class OffPolicyTrainer(RLTrainer)
The base class for off-policy trainers; the SAC trainer is an implementation of this class for the SAC algorithm, with support for discrete actions and recurrent networks.
| __init__(behavior_name: str, reward_buff_cap: int, trainer_settings: TrainerSettings, training: bool, load: bool, seed: int, artifact_path: str)
Responsible for collecting experiences and training an off-policy model.
Arguments:
- behavior_name: The name of the behavior associated with the trainer config.
- reward_buff_cap: Max reward history to track in the reward buffer.
- trainer_settings: The parameters for the trainer.
- training: Whether the trainer is set for training.
- load: Whether the model should be loaded.
- seed: The seed the model will be initialized with.
- artifact_path: The directory within which to store artifacts from this trainer.
| save_model() -> None
Saves the final training model. Overrides the default implementation to also save the replay buffer.
| save_replay_buffer() -> None
Saves the training buffer's update buffer to a pickle file.
| load_replay_buffer() -> None
Loads the last saved replay buffer from a file.
| add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None
Adds a policy to the trainer.
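A hedged sketch of a resumed off-policy run, with `trainer` standing in for any concrete OffPolicyTrainer instance:

```python
def resume_off_policy(trainer) -> None:
    """Hedged sketch: restore saved experiences, train, then checkpoint."""
    trainer.load_replay_buffer()   # reads the pickled update buffer from disk
    while trainer.should_still_train:
        trainer.advance()
    trainer.save_model()           # per the override above, also saves the buffer
```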
class RLTrainer(Trainer)
The base class for trainers that use reward signals.
| end_episode() -> None
A signal that the episode has ended. The buffer must be reset. Called only when the Academy resets.
| @abc.abstractmethod
| create_optimizer() -> TorchOptimizer
Creates an Optimizer object.
| save_model() -> None
Saves the policy associated with this trainer.
| advance() -> None
Steps the trainer, taking in trajectories and updating if ready. Will block and wait briefly if there are no trajectories.
class Trainer(abc.ABC)
The base class for all trainers in mlagents.trainers.
| __init__(brain_name: str, trainer_settings: TrainerSettings, training: bool, load: bool, artifact_path: str, reward_buff_cap: int = 1)
Responsible for collecting experiences and training a neural network model.
Arguments:
- brain_name: Brain name of the brain to be trained.
- trainer_settings: The parameters for the trainer.
- training: Whether the trainer is set for training.
- load: Whether the model should be loaded.
- artifact_path: The directory within which to store artifacts from this trainer.
- reward_buff_cap: Max reward history to track in the reward buffer.
| @property
| stats_reporter()
Returns the stats reporter associated with this Trainer.
| @property
| parameters() -> TrainerSettings
Returns the parameters of this trainer.
| @property
| get_max_steps() -> int
Returns the maximum number of steps, used to know when the trainer should be stopped.
Returns:
The maximum number of steps of the trainer
| @property
| get_step() -> int
Returns the number of steps the trainer has performed.
Returns:
The step count of the trainer.
| @property
| threaded() -> bool
Whether or not to run the trainer in a thread. True allows the trainer to update the policy while the environment is taking steps. Set to False to enforce strict on-policy updates (i.e., don't update the policy while the environment is stepping).
| @property
| should_still_train() -> bool
Returns whether or not the trainer should train. A trainer could stop training if it wasn't training to begin with, or if max_steps has been reached.
| @property
| reward_buffer() -> Deque[float]
Returns the reward buffer. The reward buffer contains the cumulative rewards of the most recent episodes completed by agents using this trainer.
Returns:
The reward buffer.
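A small illustrative helper that reads the properties above from any concrete Trainer instance; note that get_step and get_max_steps are properties despite their names:

```python
def report_progress(trainer) -> None:
    # get_step and get_max_steps are properties, not methods, despite the names.
    frac = trainer.get_step / max(trainer.get_max_steps, 1)
    rewards = list(trainer.reward_buffer)  # recent episodes' cumulative rewards
    mean_reward = sum(rewards) / len(rewards) if rewards else float("nan")
    print(
        f"step {trainer.get_step}/{trainer.get_max_steps} ({frac:.0%}), "
        f"mean recent reward {mean_reward:.2f}"
    )
```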
| @abc.abstractmethod
| save_model() -> None
Saves model file(s) for the policy or policies associated with this trainer.
| @abc.abstractmethod
| end_episode()
A signal that the episode has ended. The buffer must be reset. Called only when the Academy resets.
| @abc.abstractmethod
| create_policy(parsed_behavior_id: BehaviorIdentifiers, behavior_spec: BehaviorSpec) -> Policy
Creates a Policy object.
| @abc.abstractmethod
| add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None
Adds a policy to the trainer.
| get_policy(name_behavior_id: str) -> Policy
Gets the policy associated with name_behavior_id.
Arguments:
- name_behavior_id: Fully qualified behavior name.
Returns:
Policy associated with name_behavior_id
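A hedged sketch of the create/add/get policy flow, roughly mirroring what the trainer controller does; the BehaviorIdentifiers import path and the "?team=0" naming convention are assumptions:

```python
from mlagents.trainers.behavior_id_utils import BehaviorIdentifiers  # assumed path

def attach_policy(trainer, behavior_spec, name_behavior_id: str = "3DBall?team=0"):
    # Parse the fully qualified behavior name into its components.
    parsed_id = BehaviorIdentifiers.from_name_behavior_id(name_behavior_id)
    policy = trainer.create_policy(parsed_id, behavior_spec)
    trainer.add_policy(parsed_id, policy)
    return trainer.get_policy(name_behavior_id)
```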
| @abc.abstractmethod
| advance() -> None
Advances the trainer. Typically this means grabbing trajectories from all subscribed trajectory queues (self.trajectory_queues), updating a policy using the steps in them, and, if needed, pushing a new policy onto the right policy queues (self.policy_queues).
| publish_policy_queue(policy_queue: AgentManagerQueue[Policy]) -> None
Adds a policy queue to the list of queues to publish to when this Trainer makes a policy update.
Arguments:
- policy_queue: Policy queue to publish to.
| subscribe_trajectory_queue(trajectory_queue: AgentManagerQueue[Trajectory]) -> None
Adds a trajectory queue to the list of queues from which the trainer ingests Trajectories.
Arguments:
- trajectory_queue: Trajectory queue to read from.
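A hedged sketch of wiring the queues and stepping a trainer, approximating what the trainer controller does internally; the AgentManagerQueue import path is an assumption:

```python
from mlagents.trainers.agent_processor import AgentManagerQueue  # assumed path

def run_trainer(trainer, name_behavior_id: str = "3DBall?team=0") -> None:
    policy_queue = AgentManagerQueue(name_behavior_id)
    trajectory_queue = AgentManagerQueue(name_behavior_id)
    trainer.publish_policy_queue(policy_queue)            # updated policies go out
    trainer.subscribe_trajectory_queue(trajectory_queue)  # trajectories come in
    while trainer.should_still_train:
        trainer.advance()  # consume trajectories, update, maybe publish a policy
```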
deep_update_dict(d: Dict, update_d: Mapping) -> None
Similar to dict.update(), but works for nested dicts of dicts as well.
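A quick illustration of the nested-merge behavior, with made-up config keys:

```python
from mlagents.trainers.settings import deep_update_dict

d = {"hyperparameters": {"learning_rate": 3.0e-4, "batch_size": 128}}
deep_update_dict(d, {"hyperparameters": {"batch_size": 256}})

# Nested keys are merged rather than replaced wholesale, unlike dict.update():
assert d == {"hyperparameters": {"learning_rate": 3.0e-4, "batch_size": 256}}
```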
@attr.s(auto_attribs=True)
class RewardSignalSettings()
| @staticmethod
| structure(d: Mapping, t: type) -> Any
Helper method to structure a Dict of RewardSignalSettings classes. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure(). This is needed to handle the special Enum selection of RewardSignalSettings classes.
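A hedged sketch of the registration the docstring describes; ML-Agents performs an equivalent registration itself, and the exact Dict key type here is an assumption:

```python
from typing import Dict

import cattr

from mlagents.trainers.settings import RewardSignalSettings, RewardSignalType

cattr.register_structure_hook(
    Dict[RewardSignalType, RewardSignalSettings], RewardSignalSettings.structure
)
# The "extrinsic" key selects the settings class via the RewardSignalType enum.
reward_signals = cattr.structure(
    {"extrinsic": {"gamma": 0.99, "strength": 1.0}},
    Dict[RewardSignalType, RewardSignalSettings],
)
```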
@attr.s(auto_attribs=True)
class ParameterRandomizationSettings(abc.ABC)
| __str__() -> str
Helper method to output sampler stats to console.
| @staticmethod
| structure(d: Union[Mapping, float], t: type) -> "ParameterRandomizationSettings"
Helper method to structure a ParameterRandomizationSettings class. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure(). This is needed to handle the special Enum selection of ParameterRandomizationSettings classes.
| @staticmethod
| unstructure(d: "ParameterRandomizationSettings") -> Mapping
Helper method to unstructure a ParameterRandomizationSettings class. Meant to be registered with cattr.register_unstructure_hook() and called with cattr.unstructure().
| @abc.abstractmethod
| apply(key: str, env_channel: EnvironmentParametersChannel) -> None
Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the appropriate sampler type set method.
Arguments:
- key: The environment parameter to be sampled.
- env_channel: The EnvironmentParametersChannel used to communicate sampler settings to the environment.
@attr.s(auto_attribs=True)
class ConstantSettings(ParameterRandomizationSettings)
| __str__() -> str
Helper method to output sampler stats to console.
| apply(key: str, env_channel: EnvironmentParametersChannel) -> None
Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the constant sampler type set method.
Arguments:
- key: The environment parameter to be sampled.
- env_channel: The EnvironmentParametersChannel used to communicate sampler settings to the environment.
@attr.s(auto_attribs=True)
class UniformSettings(ParameterRandomizationSettings)
| __str__() -> str
Helper method to output sampler stats to console.
| apply(key: str, env_channel: EnvironmentParametersChannel) -> None
Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the uniform sampler type set method.
Arguments:
- key: The environment parameter to be sampled.
- env_channel: The EnvironmentParametersChannel used to communicate sampler settings to the environment.
@attr.s(auto_attribs=True)
class GaussianSettings(ParameterRandomizationSettings)
| __str__() -> str
Helper method to output sampler stats to console.
| apply(key: str, env_channel: EnvironmentParametersChannel) -> None
Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the Gaussian sampler type set method.
Arguments:
- key: The environment parameter to be sampled.
- env_channel: The EnvironmentParametersChannel used to communicate sampler settings to the environment.
@attr.s(auto_attribs=True)
class MultiRangeUniformSettings(ParameterRandomizationSettings)
| __str__() -> str
Helper method to output sampler stats to console.
| apply(key: str, env_channel: EnvironmentParametersChannel) -> None
Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the multi-range uniform sampler type set method.
Arguments:
- key: The environment parameter to be sampled.
- env_channel: The EnvironmentParametersChannel used to communicate sampler settings to the environment.
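A hedged sketch tying the samplers above together: constructing a uniform sampler and sending it to the environment. The UniformSettings field names and the EnvironmentParametersChannel import path are assumptions:

```python
from mlagents_envs.side_channel.environment_parameters_channel import (
    EnvironmentParametersChannel,
)

from mlagents.trainers.settings import UniformSettings

channel = EnvironmentParametersChannel()  # registered with the env at creation
sampler = UniformSettings(min_value=7.0, max_value=12.0)  # field names assumed
print(sampler)                     # __str__ reports the sampler stats
sampler.apply("gravity", channel)  # calls the uniform sampler set method
```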
@attr.s(auto_attribs=True)
class CompletionCriteriaSettings()
CompletionCriteriaSettings contains the information needed to figure out if the next lesson must start.
| need_increment(progress: float, reward_buffer: List[float], smoothing: float) -> Tuple[bool, float]
Given measures, returns a boolean indicating whether the lesson needs to change now, and a float corresponding to the new smoothed value.
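A hedged wrapper showing how a curriculum loop might call need_increment; `criteria` stands in for a CompletionCriteriaSettings instance:

```python
from typing import List, Tuple

def check_lesson(
    criteria, progress: float, rewards: List[float], smoothed: float
) -> Tuple[bool, float]:
    # Returns (advance_to_next_lesson, new_smoothed_value); the smoothed value
    # is carried between successive checks when signal smoothing is enabled.
    return criteria.need_increment(progress, rewards, smoothed)
```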
@attr.s(auto_attribs=True)
class Lesson()
Gathers the data of one lesson for one environment parameter, including its name, the condition that must be fulfilled for the lesson to be completed, and a sampler for the environment parameter. If completion_criteria is None, then this is the last lesson in the curriculum.
@attr.s(auto_attribs=True)
class EnvironmentParameterSettings()
EnvironmentParameterSettings is an ordered list of lessons for one environment parameter.
| @staticmethod
| structure(d: Mapping, t: type) -> Dict[str, "EnvironmentParameterSettings"]
Helper method to structure a Dict of EnvironmentParameterSettings classes. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure().
@attr.s(auto_attribs=True)
class TrainerSettings(ExportableSettings)
| @staticmethod
| structure(d: Mapping, t: type) -> Any
Helper method to structure a TrainerSettings class. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure().
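A hedged sketch of structuring one behavior's raw config (e.g. a section of a YAML file) into TrainerSettings; the exact keys accepted depend on the installed ML-Agents version:

```python
import cattr

from mlagents.trainers.settings import TrainerSettings

cattr.register_structure_hook(TrainerSettings, TrainerSettings.structure)

raw = {
    "trainer_type": "ppo",
    "max_steps": 500000,
    "hyperparameters": {"batch_size": 1024, "learning_rate": 3.0e-4},
}
# The hook types the hyperparameters section according to trainer_type.
settings = cattr.structure(raw, TrainerSettings)
```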
@attr.s(auto_attribs=True)
class CheckpointSettings()
| prioritize_resume_init() -> None
Prioritizes explicit command-line resume/init options over conflicting YAML options. If both resume and init are set in the same place, resume is used.
@attr.s(auto_attribs=True)
class RunOptions(ExportableSettings)
| @staticmethod
| from_argparse(args: argparse.Namespace) -> "RunOptions"
Takes an argparse.Namespace as specified in parse_command_line, loads input configuration files from file paths, and converts to a RunOptions instance.
Arguments:
- args: Collection of command-line parameters passed to mlagents-learn.
Returns:
RunOptions representing the passed-in arguments, with trainer config, curriculum, and sampler configs loaded from files.
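A hedged usage sketch, assuming parse_command_line is importable from mlagents.trainers.learn (it parses argv and calls from_argparse internally):

```python
from mlagents.trainers.learn import parse_command_line  # assumed location

# Equivalent to: mlagents-learn config/ppo/3DBall.yaml --run-id my_run
options = parse_command_line(["config/ppo/3DBall.yaml", "--run-id", "my_run"])
print(options.checkpoint_settings.run_id)  # "my_run"
```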