diff --git a/README.md b/README.md index 00bf842..231085a 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,24 @@ -# RoboDesk +# RoboDesk with A Diverse Set of Distractors -[![PyPI](https://img.shields.io/pypi/v/robodesk.svg)](https://pypi.python.org/pypi/robodesk/#history) +A Multi-Task Reinforcement Learning Benchmark with A Diverse Set of Distractors -A Multi-Task Reinforcement Learning Benchmark + -![Robodesk Banner](https://i.imgur.com/1qp1SUh.gif) +This repository contains a version of [RoboDesk](https://github.com/google-research/robodesk) that supports a rich set of challenging distractors, including camera and lighting noise, and even natural videos playing in the scene. Each distractor can be individually switched on or off. When they are all off (default), the environment behaves identically to the original RoboDesk. Support for these distractors was added by [Tongzhou Wang](https://ssnl.github.io/). -If you find this open source release useful, please reference in your paper: +This environment is designed to advance learning with noisy observations and rewards, a step beyond controlled toy environments and towards more realistic cases. Here is a (likely incomplete) list of projects using this environment: ++ [Denoised MDPs: Learning World Models Better Than The World Itself](https://ssnl.github.io/denoised_mdp). ICML 2022. 
+ +If you find this open source release useful, please reference the following entries in your paper: ``` +@misc{wang2022robodeskdistractor, + author = {Tongzhou Wang}, + title = {RoboDesk with A Diverse Set of Distractors}, + year = {2022}, + howpublished = {\url{https://github.com/SsnL/robodesk}}, +} + @misc{kannan2021robodesk, author = {Harini Kannan and Danijar Hafner and Chelsea Finn and Dumitru Erhan}, title = {RoboDesk: A Multi-Task Reinforcement Learning Benchmark}, @@ -23,11 +33,14 @@ If you find this open source release useful, please reference in your paper: - **Complexity:** The high-dimensional image inputs contain objects of different shapes and colors, whose initial positions are randomized to avoid naive memorization and require learning algorithms to generalize. - **Robustness:** We carefully designed and tested RoboDesk to ensure fast and stable physics simulation. This avoids objects from intersecting, getting stuck, or quickly flying away, a common problem with some existing environments. - **Lightweight:** RoboDesk comes as a self-contained Python package with few dependencies. The source code is clean and pragmatic, making it a useful blueprint for creating new MuJoCo environments. +- **Distractors:** RoboDesk provides an easy-to-use API to turn on various distractors, i.e., environment noises. Available distractors include + - Shaky and flickering *environment headlights*; + - Shaky *camera movements*; + - *TV* playing natural videos, with hue controlled by buttons; + - Noisy *button sensor*, affecting the indicator lights on desks and TV hue. 
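
The `distractors` argument accepts the string `"all"`, the string `"none"`, or an explicit set. Below is a minimal self-contained sketch of how these values normalize to a set of enabled distractors; the helper name `resolve_distractors` is only illustrative (not part of the package API), but the constructors perform an equivalent check internally:

```python
# Illustrative sketch only: mirrors the normalization of the `distractors`
# argument done inside the RoboDesk / RoboDeskWithTV constructors.
AVAILABLE_DISTRACTORS = {'button', 'env_light', 'camera', 'tv'}

def resolve_distractors(distractors, available=frozenset(AVAILABLE_DISTRACTORS)):
    """Normalize `distractors` to an explicit set of enabled distractors."""
    if distractors == 'all':
        return set(available)
    if distractors == 'none':
        return set()
    if isinstance(distractors, set) and available.issuperset(distractors):
        return set(distractors)
    raise ValueError(
        f'`distractors` must be "all", "none", or a subset of {set(available)}')

print(resolve_distractors('none'))      # empty set: identical to original RoboDesk
print(resolve_distractors({'camera'}))  # only camera shake enabled
```

Passing anything else raises an error, mirroring the assertion performed by the constructors.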
## Training Agents -Installation: `pip3 install -U robodesk` - The environment follows the [OpenAI Gym][gym] interface: ```py @@ -48,9 +61,11 @@ while not done: ## Tasks -![Robodesk Tasks](https://i.imgur.com/OwTT2pk.gif) +Crucially, the `tv_green_hue` task gives reward based on a distractor (how green the TV image is), but its optimal strategy is agnostic to the distractor state (i.e., simply always pushing the green button). -The behaviors above were learned using the [Dreamer](https://github.com/danijar/dreamer) agent. These policies have been learned from scratch and only from pixels, not proprioceptive states. +For an environment with **all** distractors and `tv_green_hue`, here is an example of a signal-noise factorization identified by a [Denoised MDP](https://ssnl.github.io/denoised_mdp) model: + +https://user-images.githubusercontent.com/5674597/172927710-84c805dd-4326-4064-9079-237c26102812.mp4 | Task | Description | | :-------- | :---------- | @@ -63,14 +78,29 @@ The behaviors above were learned using the [Dreamer](https://github.com/danijar/ | `flat_block_in_shelf` | Push the green flat block into the shelf, navigating around the other blocks. | | `lift_upright_block` | Grasp the blue upright block and lift it above the table. | | `lift_ball` | Grasp the magenta ball and lift it above the table. | +| `tv_green_hue` | Push the green button to shift the TV hue toward green. | ## Environment Details ### Constructor +Two entry points are available: ++ `robodesk.RoboDesk`: A regular RoboDesk environment with a top-down camera view of the desk. ++ `robodesk.RoboDeskWithTV`: A RoboDesk environment with a TV placed in the scene, and a farther camera view looking at three desks and the TV. The leftmost desk is where the robot operates. 
+ ```py -robodesk.RoboDesk(task='open_slide', reward='dense', action_repeat=1, episode_length=500, image_size=64) +# Regular environment +robodesk.RoboDesk(task='open_slide', reward='dense', action_repeat=1, + episode_length=500, image_size=64) + +# Environment with noisy camera +robodesk.RoboDesk(task='open_slide', reward='dense', action_repeat=1, + episode_length=500, image_size=64, distractors={'camera'}) + +# Environment with a TV in scene and all distractors turned on +robodesk.RoboDeskWithTV(task='open_slide', reward='dense', action_repeat=1, + episode_length=500, image_size=64, distractors="all") ``` | Parameter | Description | @@ -80,6 +110,8 @@ robodesk.RoboDesk(task='open_slide', reward='dense', action_repeat=1, episode_le | `action_repeat` | Reduces the control frequency by applying each action multiple times. This is faster than using an environment wrapper because only the needed images are rendered. | | `episode_length` | Time limit for the episode, can be `None`. | | `image_size` | Size of the image observations in pixels, used for both height and width. | +| `distractors` | Configures distractors. `"all"` turns on all distractors, `"none"` turns off all of them.
Can also be a subset of all available distractors: `{'camera', 'env_light', 'button'}` for `RoboDesk`; and `{'camera', 'env_light', 'button', 'tv'}` for `RoboDeskWithTV`. | +| `tv_video_file_pattern`
(Only for `RoboDeskWithTV`) | Glob pattern that specifies the TV video files to use. Requires [`scikit-video`](http://www.scikit-video.org/stable/) to be installed. | ### Reward diff --git a/animation.gif b/animation.gif new file mode 100644 index 0000000..891a081 Binary files /dev/null and b/animation.gif differ diff --git a/robodesk/__init__.py b/robodesk/__init__.py index cecab33..ec6edfd 100644 --- a/robodesk/__init__.py +++ b/robodesk/__init__.py @@ -1 +1,3 @@ -from .robodesk import RoboDesk +from .robodesk import RoboDesk, RoboDeskWithTV + +__all__ = ['RoboDesk', 'RoboDeskWithTV'] diff --git a/robodesk/assets/desk.xml b/robodesk/assets/desk.xml index f8f5652..2c4a740 100644 --- a/robodesk/assets/desk.xml +++ b/robodesk/assets/desk.xml @@ -172,27 +172,49 @@ - + + + + + + - + + + - + + + diff --git a/robodesk/assets/desks_with_tv.xml b/robodesk/assets/desks_with_tv.xml new file mode 100644 index 0000000..61a6e6d --- /dev/null +++ b/robodesk/assets/desks_with_tv.xml @@ -0,0 +1,488 @@ + + + + + diff --git a/robodesk/robodesk.py b/robodesk/robodesk.py index 7c95c42..3ba8eb3 100644 --- a/robodesk/robodesk.py +++ b/robodesk/robodesk.py @@ -5,6 +5,7 @@ from __future__ import print_function import os +import glob from dm_control import mujoco from dm_control.utils import inverse_kinematics @@ -12,16 +13,85 @@ import numpy as np from PIL import Image - -class RoboDesk(gym.Env): - """Multi-task manipulation environment.""" +from .utils import CameraSpec, NumPyRNGWrapper, EnvLightManager, TVManager, ButtonManager + + +class RoboDeskBase(gym.Env): + r""" + Multi-task manipulation environment. + + Common Arguments:: + + task (str): Task of the environment, defining the reward function. + Default: "open_slide". + reward (str): Type of the reward, also affecting the reward function. + Choices: "dense", "sparse", "success". Default: "dense". + action_repeat (int): Default: 1. + episode_length (int): Default: 500. + image_size (int): Default: 64. 
+ + The following arguments are useful for advanced use cases and customization of the + environment, including settings for various distractors and noises. If you are + looking for a simple-to-use API for a noisy environment with distractors, see + `RoboDesk` or `RoboDeskWithTV`, which expose the `distractors` argument. + + Distractors / Noises Arguments:: + + [ Environment Lighting ] + env_light_noise_strength (float): Controls the level of environment light + jittering and flickering. This tends to + lead to a dimmer scene, so adjusting + `headlight_brightness` might be needed. + Should be within [0, 1]. Default: 0. + headlight_brightness (float): Adjusts the headlight level. Useful when the + rendering is darker than ideal, e.g., due to + noisy lighting. Default: 0.4. + + [ Button Sensor ] + button_sensor_noise_strength (float): Controls the level of noise in the button + sensor reading. Many elements depend + on the amount the button is pressed (e.g., + hue of the TV). Turning this on makes the + button reading noisy. Notably this + affects the brightness of the lights, + and the TV hue that is controlled by + the button. With this turned on (value > 0) + their brightness will reflect the + noisy sensor reading (rather than + having binary on vs. off states). + Should be within [0, 1]. Default: 0. + + [ Camera Location ] + camera_noise_strength (float): Controls the level of camera shaking. + Should be within [0, 1]. Default: 0. + + [ TV ] + tv_video_file_pattern (str): A glob pattern that selects videos to be played + on a TV in the environment. If `None`, the TV will + not show any video. Default: None. 
+ """ + + MODEL_PATH: str + CAMERA_SPEC: CameraSpec def __init__(self, task='open_slide', reward='dense', action_repeat=1, - episode_length=500, image_size=64): + episode_length=500, image_size=64, + headlight_brightness=0.4, + button_sensor_noise_strength=0, + env_light_noise_strength=0, + camera_noise_strength=0, + tv_video_file_pattern=None): assert reward in ('dense', 'sparse', 'success'), reward - model_path = os.path.join(os.path.dirname(__file__), 'assets/desk.xml') - self.physics = mujoco.Physics.from_xml_path(model_path) + # model = os.path.join(os.path.dirname(__file__), 'assets/desks_with_tv.xml') + self.physics = mujoco.Physics.from_xml_path(self.MODEL_PATH) + + # Adjust headlight + self.physics.model.vis.headlight.ambient[:] = headlight_brightness * 0.25 # default 0.4 -> 0.1 + self.physics.model.vis.headlight.diffuse[:] = headlight_brightness # default 0.4 -> 0.4 + self.physics.model.vis.headlight.specular[:] = headlight_brightness * 0.8 + 0.18 # default 0.4 -> 0.5 + + # Create a copy for IK self.physics_copy = self.physics.copy(share_model=True) self.physics_copy.data.qpos[:] = self.physics.data.qpos[:] @@ -35,6 +105,25 @@ def __init__(self, task='open_slide', reward='dense', action_repeat=1, self.reward = reward self.success = None + # RNG + # [env, button, camera, env_light, tv_init, tv_run] + seed, button_seed, cam_seed, env_light_seed, tv_seed = NumPyRNGWrapper.split_seed(seed=None, n=5) + self.np_rng = NumPyRNGWrapper(seed) + + # Managers of specific elements + button_manager = ButtonManager(self.physics, button_sensor_noise_strength, button_seed) + self.elem_managers = dict( + camera=self.CAMERA_SPEC.get_camera_manager(self.physics, camera_noise_strength, cam_seed), + button=button_manager, + env_light=EnvLightManager(self.physics, swing_scale=env_light_noise_strength, + flicker_scale=env_light_noise_strength, seed=env_light_seed), + tv=TVManager(self.physics, tv_video_file_pattern, button_manager, tv_seed) + ) + + # Noises + 
self.button_sensor_noise_strength = button_sensor_noise_strength + self.tv_video_file_pattern = tv_video_file_pattern + # Action space self.end_effector_scale = 0.01 self.wrist_scale = 0.02 @@ -44,6 +133,7 @@ def __init__(self, task='open_slide', reward='dense', action_repeat=1, self.action_repeat = action_repeat self.num_steps = 0 self.episode_length = episode_length + assert episode_length % action_repeat == 0, "episode_length must be divisible by action_repeat" self.original_pos = {} self.previous_z_angle = None @@ -87,6 +177,7 @@ def __init__(self, task='open_slide', reward='dense', action_repeat=1, 'ball', reward_type)), 'lift_flat_block': (lambda reward_type: self._lift_block( 'flat_block', reward_type)), + 'tv_green_hue': (lambda reward_type: self._tv_hue('green', reward_type)) } self.core_tasks = list(self.reward_functions)[0:12] @@ -94,6 +185,22 @@ def __init__(self, task='open_slide', reward='dense', action_repeat=1, self.task = task # pylint: enable=g-long-lambda + def seed(self, seed=None): + seed, button_seed, cam_seed, env_light_seed, tv_seed = NumPyRNGWrapper.split_seed(seed, 5) + self.np_rng.seed(seed) + self.elem_managers['button'].seed(button_seed) + self.elem_managers['camera'].seed(cam_seed) + self.elem_managers['env_light'].seed(env_light_seed) + self.elem_managers['tv'].seed(tv_seed) + + def get_random_state(self): + return (self.np_rng.get_random_state(), {k: m.get_random_state() for k, m in self.elem_managers.items()}) + + def set_random_state(self, random_state): + self.np_rng.set_random_state(random_state[0]) + for k, m in self.elem_managers.items(): + m.set_random_state(random_state[1][k]) + @property def action_space(self): return gym.spaces.Box(-np.ones(self.action_dim), np.ones(self.action_dim)) @@ -112,25 +219,12 @@ def observation_space(self): return gym.spaces.Dict(spaces) def render(self, mode='rgb_array', resize=True): - params = {'distance': 1.8, 'azimuth': 90, 'elevation': -60, - 'crop_box': (16.75, 25.0, 105.0, 88.75), 'size': 
120} - camera = mujoco.Camera( - physics=self.physics, height=params['size'], - width=params['size'], camera_id=-1) - camera._render_camera.distance = params['distance'] # pylint: disable=protected-access - camera._render_camera.azimuth = params['azimuth'] # pylint: disable=protected-access - camera._render_camera.elevation = params['elevation'] # pylint: disable=protected-access - camera._render_camera.lookat[:] = [0, 0.535, 1.1] # pylint: disable=protected-access - - image = camera.render(depth=False, segmentation=False) - camera._scene.free() # pylint: disable=protected-access - - if resize: - image = Image.fromarray(image).crop(box=params['crop_box']) - image = image.resize([self.image_size, self.image_size], - resample=Image.ANTIALIAS) - image = np.asarray(image) - return image + for m in self.elem_managers.values(): + m.pre_render() + return self.elem_managers['camera'].render( + render_size=min(480, int(self.image_size / 64 * 120)), + image_size=self.image_size if resize else None, + ) def _ik(self, pos): out = inverse_kinematics.qpos_from_site_pose( @@ -176,6 +270,9 @@ def step(self, action): self.physics.step() self.physics_copy.data.qpos[:] = self.physics.data.qpos[:] + for m in self.elem_managers.values(): + m.step() + if self.reward == 'dense': total_reward += self._get_task_reward(self.task, 'dense_reward') elif self.reward == 'sparse': @@ -194,12 +291,13 @@ def step(self, action): done = True else: done = False + return self._get_obs(), total_reward, done, {'discount': 1.0} def _get_init_robot_pos(self): init_joint_pose = np.array( [-0.30, -0.4, 0.28, -2.5, 0.13, 1.87, 0.91, 0.01, 0.01]) - init_joint_pose += 0.15 * np.random.uniform( + init_joint_pose += 0.15 * self.np_rng.uniform( low=self.physics.model.actuator_ctrlrange[:self.num_joints, 0], high=self.physics.model.actuator_ctrlrange[:self.num_joints, 1]) return init_joint_pose @@ -212,22 +310,26 @@ def reset(self): self.physics.reset() # Randomize object positions. 
- self.physics.named.data.qpos['drawer_joint'] -= 0.10 * np.random.random() - self.physics.named.data.qpos['slide_joint'] += 0.20 * np.random.random() + self.physics.named.data.qpos['drawer_joint'] -= 0.10 * self.np_rng.random() + self.physics.named.data.qpos['slide_joint'] += 0.20 * self.np_rng.random() - self.physics.named.data.qpos['flat_block'][0] += 0.3 * np.random.random() - self.physics.named.data.qpos['flat_block'][1] += 0.07 * np.random.random() - self.physics.named.data.qpos['ball'][0] += 0.48 * np.random.random() - self.physics.named.data.qpos['ball'][1] += 0.08 * np.random.random() + self.physics.named.data.qpos['flat_block'][0] += 0.3 * self.np_rng.random() + self.physics.named.data.qpos['flat_block'][1] += 0.07 * self.np_rng.random() + self.physics.named.data.qpos['ball'][0] += 0.48 * self.np_rng.random() + self.physics.named.data.qpos['ball'][1] += 0.08 * self.np_rng.random() self.physics.named.data.qpos['upright_block'][0] += ( - 0.3 * np.random.random() + 0.05) + 0.3 * self.np_rng.random() + 0.05) self.physics.named.data.qpos['upright_block'][1] += ( - 0.05 * np.random.random()) + 0.05 * self.np_rng.random()) # Set robot position. self.physics.data.qpos[:self.num_joints] = self._get_init_robot_pos() self.physics.data.qvel[:self.num_joints] = np.zeros(9) + # Reset managers + for m in self.elem_managers.values(): + m.reset() + # Relax object intersections. 
self.physics.forward() @@ -239,6 +341,7 @@ def reset(self): self.original_pos['flat_block'] = self.physics.named.data.xpos['flat_block'] self.drawer_opened = False + return self._get_obs() def _did_not_move(self, block_name): @@ -372,6 +475,22 @@ def _lift_block(self, block_name, reward_type='dense_reward'): threshold = success_criteria[block_name] return 1 * (self.physics.named.data.xpos[block_name][2] > threshold) + def _tv_hue(self, color, reward_type='dense_reward'): + tv_manager: TVManager = self.elem_managers['tv'] + assert tv_manager.tv_enabled, "TV-based reward can only be used when TV is enabled" + cidx = {'red': 0, 'green': 1, 'blue': 2}[color] + + self.elem_managers['tv'].ensure_texure_updated() + tv_reward = self.elem_managers['tv'].tv_tex[..., cidx].mean() / 255 + if reward_type == 'success': + return tv_reward + elif reward_type == 'dense_reward': + dist_reward = self._get_dist_reward( + self.physics.named.data.xpos[color + '_button']) + press_button = ( + self.physics.named.data.qpos[color + '_light'][0] < -0.00453) + return 0.5 * tv_reward + 0.25 * dist_reward + 0.25 * press_button + def _get_task_reward(self, task, reward_type): reward = self.reward_functions[task](reward_type) reward = max(0, min(1, reward)) @@ -384,3 +503,94 @@ def _get_obs(self): 'end_effector': self.physics.named.data.site_xpos['end_effector'], 'qpos_objects': self.physics.data.qvel[self.num_joints:].copy(), 'qvel_objects': self.physics.data.qvel[self.num_joints:].copy()} + + +class RoboDesk(RoboDeskBase): + r""" + Multi-task manipulation environment. + + Arguments:: + + task (str): Task of the environment, defining the reward function. + Default: "open_slide". + reward (str): Type of the reward, also affecting the reward function. + Choices: "dense", "sparse", "success". Default: "dense". + action_repeat (int): Default: 1. + episode_length (int): Default: 500. + image_size (int): Default: 64. 
+ distractors (str or set): Subset of ``{'button', 'env_light', 'camera'}``, + specifying which noise distractors are enabled. + String "all" means all of them, and string "none" + means the empty set. Default: "none". + """ + + MODEL_PATH = os.path.join(os.path.dirname(__file__), 'assets/desk.xml') + CAMERA_SPEC = CameraSpec() + AVAILABLE_DISTRACTORS = {'button', 'env_light', 'camera'} + + def __init__(self, task='open_slide', reward='dense', action_repeat=1, + episode_length=500, image_size=64, distractors='none'): + if distractors == 'all': + distractors = set(self.AVAILABLE_DISTRACTORS) + elif distractors == 'none': + distractors = set() + assert isinstance(distractors, set) and self.AVAILABLE_DISTRACTORS.issuperset(distractors), \ + ('RoboDesk supports the `distractors` argument being "all", "none", or a set ' + f'containing elements of {self.AVAILABLE_DISTRACTORS}') + super().__init__(task, reward, action_repeat, episode_length, image_size, + button_sensor_noise_strength=1 if 'button' in distractors else 0, + env_light_noise_strength=0.7 if 'env_light' in distractors else 0, + headlight_brightness=0.9 if 'env_light' in distractors else 0.4, + camera_noise_strength=1 if 'camera' in distractors else 0) + + +class RoboDeskWithTV(RoboDeskBase): + r""" + Multi-task manipulation environment with TV. Different from `RoboDesk`, this + scene contains a TV in addition to the desk. The TV can sequentially play + videos from disk (see ``tv_video_file_pattern`` argument). + + Arguments:: + + task (str): Task of the environment, defining the reward function. + Default: "open_slide". + reward (str): Type of the reward, also affecting the reward function. + Choices: "dense", "sparse", "success". Default: "dense". + action_repeat (int): Default: 1. + episode_length (int): Default: 500. + image_size (int): Default: 96 (higher than default RoboDesk due to the farther camera view). 
+ distractors (str or set): Subset of ``{'button', 'env_light', 'camera', 'tv'}``, + specifying which noise distractors are enabled. + String "all" means all of them, and string "none" + means the empty set. Default: "none". + tv_video_file_pattern (str): A glob pattern that selects videos to be played + on a TV in the environment. If `None`, the TV will + not show any video. Default: None. + """ + + MODEL_PATH = os.path.join(os.path.dirname(__file__), 'assets/desks_with_tv.xml') + CAMERA_SPEC = CameraSpec( + elevation_offset=13, + distance_offset=1.15, + lookat_offset=np.array([0.875, 0.4, 0.875]), + cropbox_for_render_size_120=np.array([0, 25.892, 120, 120]), + ) + AVAILABLE_DISTRACTORS = {'button', 'env_light', 'camera', 'tv'} + + def __init__(self, task='open_slide', reward='dense', action_repeat=1, + episode_length=500, image_size=96, distractors='none', + tv_video_file_pattern=None): + if distractors == 'all': + distractors = set(self.AVAILABLE_DISTRACTORS) + elif distractors == 'none': + distractors = set() + assert isinstance(distractors, set) and self.AVAILABLE_DISTRACTORS.issuperset(distractors), \ + ('RoboDeskWithTV supports the `distractors` argument being "all", "none", or a set ' + f'containing elements of {self.AVAILABLE_DISTRACTORS}') + super().__init__(task, reward, action_repeat, episode_length, image_size, + tv_video_file_pattern=tv_video_file_pattern if 'tv' in distractors else None, + button_sensor_noise_strength=1 if 'button' in distractors else 0, + env_light_noise_strength=1 if 'env_light' in distractors else 0, + headlight_brightness=0.9 if 'env_light' in distractors else 0.4, + camera_noise_strength=1 if 'camera' in distractors else 0) diff --git a/robodesk/utils.py b/robodesk/utils.py new file mode 100644 index 0000000..280d9fd --- /dev/null +++ b/robodesk/utils.py @@ -0,0 +1,606 @@ +import glob + +import cv2 +import numpy as np +from PIL import Image + +from dm_control import mujoco +from dm_control.mujoco.engine import Pose + + +class 
NumPyRNGWrapper(object): + r""" + A wrapper for managing the old and new NumPy random APIs. + """ + + def __init__(self, seed): + self.seed(seed) + + def seed(self, seed=None): + # seed: Union[int, np.random.SeedSequence, None] + if hasattr(np.random, 'Generator'): + self.rng = np.random.Generator(np.random.PCG64(seed)) + else: + # Older NumPy without the Generator API. + self.rng = np.random.RandomState(seed=seed) + + def uniform(self, low=0.0, high=1.0, size=None): + return self.rng.uniform(low=low, high=high, size=size) + + def random(self): + return self.rng.random() + + def integers(self, *args, **kwargs): + if hasattr(np.random, 'Generator'): + return self.rng.integers(*args, **kwargs) + else: + return self.rng.randint(*args, **kwargs) + + def normal(self, loc=0, scale=1, size=None): + return self.rng.normal(loc=loc, scale=scale, size=size) + + def permutation(self, seq): + return self.rng.permutation(seq) + + def get_random_state(self): + if hasattr(np.random, 'Generator'): + return self.rng.bit_generator.state + else: + return self.rng.get_state() + + def set_random_state(self, random_state): + if hasattr(np.random, 'Generator'): + self.rng.bit_generator.state = random_state + else: + self.rng.set_state(random_state) + + @staticmethod + def split_seed(seed, n: int): + # Splits a seed into n seeds + if hasattr(np.random, 'Generator'): + if not isinstance(seed, np.random.SeedSequence): + seed = np.random.SeedSequence(seed) + seeds = seed.spawn(n) + else: + seeds = np.random.RandomState(seed=seed).randint(2 ** 32, size=n).tolist() + return seeds + + +# Get camera loc from a dm_control.mujoco.engine.Pose +def pose2from(pose): + x, y, z = pose.lookat + flat_distance = pose.distance * np.cos(pose.elevation * np.pi / 180) + lookfrom = np.array([ + x - flat_distance * np.cos(pose.azimuth * np.pi / 180), + y - flat_distance * np.sin(pose.azimuth * np.pi / 180), + z - pose.distance * np.sin(pose.elevation * np.pi / 180), + ]) + return lookfrom + + +# Construct a dm_control.mujoco.engine.Pose from camera loc 
and target loc +def from2pose(lookfrom, lookat): + at2from = lookat - lookfrom + distance = np.linalg.norm(at2from) + elevation = np.arcsin(at2from[-1] / distance) * 180 / np.pi + azimuth = np.arctan2(at2from[1], at2from[0]) * 180 / np.pi + return Pose( + lookat=lookat, + distance=distance, + azimuth=azimuth, + elevation=elevation, + ) + + +class SmoothRandomWalker(object): + r""" + A simple implementation of a smooth random N-d walk. In practice, only the + `pullr` argument needs choosing; it roughly sets the range of the walk. The + entire process is described below for completeness. + + + The random walk tracks (location, velocity, acceleration). At each step, + + + Acceleration update: + + A random N-d acceleration change (with random direction) is sampled and + applied, simulating random forces. + + If the location is `> pullr` distance away from the origin, a pulling + acceleration `-pull * location` is used instead. + + + Velocity update: + + Additively affected by the acceleration, and then decayed by `vdecay`. + + + Location update: + + Additively affected by the velocity. + + Finally, `loc_scale` optionally scales the location before yielding it. + + `time_scale` roughly tunes the simulation speed. 
+ """ + def __init__(self, dim=3, pull=0.1, pullr=0.075, randastddev=0.005, + vdecay=0.9, loc_scale=1, time_scale=0.5, *, np_rng: NumPyRNGWrapper): + self.dim = dim + self.pull = pull + self.pullr = pullr + self.randastddev = randastddev + self.vdecay = vdecay + self.loc_scale = loc_scale + self.time_scale = time_scale + self.np_rng = np_rng + + def __iter__(self): + if self.loc_scale == 0: + while True: + yield 0 + else: + loc = self.np_rng.normal(size=[self.dim], scale=self.pullr / np.sqrt(self.dim)) + v = self.np_rng.normal(size=[self.dim], scale=self.randastddev) + while True: + yield loc * self.loc_scale + a = self.np_rng.normal(size=[self.dim], scale=self.randastddev) + if np.linalg.norm(loc) >= self.pullr: + a = -self.pull * loc + v += a * self.time_scale + v = v * (self.vdecay ** self.time_scale) + loc = loc + v * self.time_scale + + +class CameraSpec(object): + r""" + Represents a camera at a specific location aimed at a specific direction. + This is specified via offsets with the default camera location (`default_pose`). + + The camera can be noisily jittered, controlled via `jitter_scale` argument. + + NB: Jittering is applied at the default camera location, before applying + offsets so that the same `jitter_scale` roughly translates to similar + amounts of noise regardless of the offset. 
+ """ + default_pose: Pose = Pose( + elevation=-60, + distance=1.8, + azimuth=90, + lookat=np.array([0, 0.535, 1.1]), + ) + default_lookfrom: np.ndarray = pose2from(default_pose) + + offset_pose: Pose + cropbox_for_render_size_120: np.ndarray + + def __init__(self, elevation_offset=0, distance_offset=0, azimuth_offset=0, + lookat_offset=np.array((0, 0, 0)), + cropbox_for_render_size_120=np.array((16.75, 25.0, 105.0, 88.75))): + self.offset_pose = Pose( + elevation=elevation_offset, + distance=distance_offset, + azimuth=azimuth_offset, + lookat=lookat_offset, + ) + self.cropbox_for_render_size_120 = cropbox_for_render_size_120 + + def get_camera_manager(self, physics, jitter_scale, np_rng): + return CameraManager(self, physics, jitter_scale, np_rng) + + +class EnvElementManager(object): + r""" + Managing a single element of the environment. + """ + + def reset(self): + r""" + Called in env.reset() + """ + pass + + def step(self): + r""" + Called after each physics step. I.e., if action_repeat = N, this gets + called N times in env.step(...), *before* any rendering or reward + computation. + """ + pass + + def pre_render(self): + r""" + Called in env.render(...) right before rendering. + """ + pass + + def seed(self, seed): + pass + + def get_random_state(self): + pass + + def set_random_state(self, random_state): + pass + + + +class CameraManager(EnvElementManager): + r""" + Handles noisy camera. 
+ """ + + def __init__(self, camera_spec: 'CameraSpec', physics, jitter_scale, seed): + self.camera_spec = camera_spec + self.physics = physics + self.np_rng = NumPyRNGWrapper(seed) + self.jitter_scale = jitter_scale + self.reset() + + def seed(self, seed): + self.np_rng.seed(seed) + + def get_random_state(self): + return self.np_rng.get_random_state() + + def set_random_state(self, random_state): + self.np_rng.set_random_state(random_state) + + def reset(self): + self.jitter_iter = dict( + lookfrom=iter(SmoothRandomWalker(loc_scale=0.3 * self.jitter_scale, + np_rng=self.np_rng)), + lookat=iter(SmoothRandomWalker(loc_scale=0.65 * self.jitter_scale, + np_rng=self.np_rng)), + ) + self.step() + + def step(self): + self.jitter_amount = {k: next(v) for k, v in self.jitter_iter.items()} + + def render(self, render_size, image_size=None): + # Apply jittering + pose = from2pose( + self.camera_spec.default_lookfrom + self.jitter_amount['lookfrom'], + self.camera_spec.default_pose.lookat + self.jitter_amount['lookat'], + ) + # Apply offset + pose = Pose( + elevation=pose.elevation + self.camera_spec.offset_pose.elevation, + distance=pose.distance + self.camera_spec.offset_pose.distance, + azimuth=pose.azimuth + self.camera_spec.offset_pose.azimuth, + lookat=pose.lookat + self.camera_spec.offset_pose.lookat, + ) + # Apply pose to camera + camera = mujoco.MovableCamera( + physics=self.physics, height=render_size, width=render_size) + camera.set_pose(*pose) + # Render + image = camera.render(depth=False, segmentation=False) + camera._scene.free() + if image_size is not None: + cropbox = self.camera_spec.cropbox_for_render_size_120 * render_size / 120 + image = Image.fromarray(image).crop(box=cropbox) + image = image.resize([image_size, image_size], resample=Image.LANCZOS) + image = np.asarray(image) + return image + + +class EnvLightManager(EnvElementManager): + r""" + Helps adjusting env lights (total 3) according to flickering and jittering/swinging. 
+    """
+
+    # Radius of light swing
+    SWING_RADIUS = np.array([0.5, 1 / np.sqrt(2), 1])
+    # Each time step, apply an angular velocity sampled from N(mu, sigma).clamp(0, 0.8)
+    SWING_ANGULAR_SPEED_MEAN = np.array([np.sqrt(5), 1, 1 / np.sqrt(3)]) / 30
+    SWING_ANGULAR_SPEED_STDDEV = np.array([np.sqrt(5), 1, 1 / np.sqrt(3)]) / 60
+    SWING_ANGULAR_SPEED_MAX = 0.8
+
+    # Flickering is a brightness adjustment, modeled as a sine curve plus noise:
+    #     brightness <- sin( t * period + t0 ) * magnitude + offset + N(0, sigma)
+    #     brightness <- clamp(brightness, [0, 1])
+    FLICKER_PERIOD = np.array([1, np.sqrt(8), np.sqrt(28)]) / 15
+    FLICKER_SINE_MAGNITUDE = np.array([0.3, 0.3, 0.3])
+    FLICKER_SINE_OFFSET = -0.15
+    FLICKER_NOISE_STDDEV = np.array([0.15, 0.15, 0.15])
+    FLICKER_BRIGHTNESS_MAX = 1
+
+    def __init__(self, physics, swing_scale, flicker_scale, seed):
+        self.physics = physics
+        self.swing_scale = swing_scale
+        self.flicker_scale = flicker_scale
+        self.np_rng = NumPyRNGWrapper(seed)
+        self.initial_diffuse = self.physics.model.light_diffuse[:, 0].copy()
+        self.initial_pos = self.physics.model.light_pos.copy()
+        self.initial_sq_dist = (self.initial_pos ** 2).sum(-1)
+        self.reset()
+
+    def seed(self, seed):
+        self.np_rng.seed(seed)
+
+    def get_random_state(self):
+        return self.np_rng.get_random_state()
+
+    def set_random_state(self, random_state):
+        self.np_rng.set_random_state(random_state)
+
+    def reset(self):
+        self.swing_angle = self.np_rng.normal(size=3)
+        self.flicker_period_offset = self.np_rng.normal(size=3)
+        self.num_time_steps = 0
+        self.mujoco_outdated = True
+
+    def step(self):
+        self.mujoco_outdated = True
+        # Track timestep for flicker computation
+        self.num_time_steps += 1
+        # Swing
+        swing_angular_speed = self.np_rng.normal(
+            loc=self.SWING_ANGULAR_SPEED_MEAN,
+            scale=self.SWING_ANGULAR_SPEED_STDDEV,
+        )
+        swing_angular_speed = np.clip(swing_angular_speed, 0, self.SWING_ANGULAR_SPEED_MAX)
+        self.swing_angle += swing_angular_speed
+        self.swing_angle %= 2 * np.pi
+
+    def pre_render(self):
+        if not self.mujoco_outdated:
+            return
+        # Emit to mujoco
+        t = self.num_time_steps
+
+        # Swing
+        theta = self.swing_angle
+        radius = self.SWING_RADIUS * self.swing_scale  # scale here
+        loc_delta = np.stack([np.sin(theta), np.cos(theta)], axis=-1) * radius[:, None]
+        self.physics.model.light_pos[:, :2] = loc_delta + self.initial_pos[:, :2]
+        curr_sq_dist = (self.physics.model.light_pos ** 2).sum(-1)
+        light_dir = -self.physics.model.light_pos / (curr_sq_dist ** 0.5)[:, None]
+        self.physics.model.light_dir[:] = light_dir
+
+        # Flicker, i.e., brightness
+        br_mean = np.sin(
+            self.flicker_period_offset + t * self.FLICKER_PERIOD
+        ) * self.FLICKER_SINE_MAGNITUDE + self.FLICKER_SINE_OFFSET
+        br = self.np_rng.normal(loc=br_mean, scale=self.FLICKER_NOISE_STDDEV)
+        br = np.clip(br, 0, self.FLICKER_BRIGHTNESS_MAX)
+        # Swinging further away -> dimmer
+        distance_sq_ratio = self.initial_sq_dist / curr_sq_dist
+        br = br * distance_sq_ratio
+        # Apply flicker scaling here
+        br = self.flicker_scale * br + (1 - self.flicker_scale) * self.initial_diffuse
+        for ii in range(3):
+            self.physics.model.light_diffuse[ii, :] = br[ii]
+
+        self.mujoco_outdated = False
+
+
+class ButtonManager(EnvElementManager):
+    r"""
+    Button noise is an additive quantity that gets applied to the normal button
+    readings. The noise follows a diffusion-like process:
+
+        # A hidden diffusion process, initialized from N(0, 1)
+        corr_noise_0 ~ N(0, 1)
+        corr_noise_t = alpha * corr_noise_{t-1} + (1 - alpha) * N(0, 1)
+
+        # The actual additive noise is that plus a small i.i.d. noise
+        noise_t = corr_noise_scale * corr_noise_t + iid_noise_scale * N(0, 1)
+    """
+    CORR_NOISE_DIFFUSE_ALPHA = 0.96 ** 0.5
+
+    def __init__(self, physics, noise_scale, seed):
+        self.physics = physics
+        self.noise_scale = noise_scale
+        if self.noise_scale != 0:
+            # Turn off the original noiseless light mechanism.
+            # We will use the alternative one that supports noise.
+            # See NOTE [ Button to Light Mechanism ] in desk.xml.
+            for c in ['red', 'green', 'blue']:
+                physics.named.model.geom_rgba[f'{c}_light_rise_cylinder'][-1] = 0
+        self.corr_noise_scale = noise_scale * 0.295
+        self.iid_noise_scale = noise_scale * 0.0125
+        self.np_rng = NumPyRNGWrapper(seed)
+        self.reset()
+
+    def seed(self, seed):
+        self.np_rng.seed(seed)
+
+    def get_random_state(self):
+        return self.np_rng.get_random_state()
+
+    def set_random_state(self, random_state):
+        self.np_rng.set_random_state(random_state)
+
+    def get_normalized_button(self):
+        r"""
+        Returns the normalized "how much is it pressed" reading for each button.
+
+        0 means not pressed at all.
+        1 means fully pressed.
+
+        In the noiseless setting (based on the original RoboDesk xml), a
+        reading value > 0.9 is interpreted as "button being pressed", which
+        turns on the corresponding light with alpha 0.4. This logic is
+        entirely handled by mujoco (and specified in the xml).
+
+        In noisy environments, the lights are more than binary and have
+        brightness (alpha) levels. The linear mapping from button reading to
+        light brightness is coded in `_update_lights_based_noisy_button()`.
+
+        See also NOTE [ Button to Light Mechanism ] in desk.xml.
+        """
+        if self.button_reading_outdated:
+            # Get button joint positions from the negated light joint positions
+            button_jnt = -np.asarray([
+                self.physics.named.data.qpos['red_light'][0],
+                self.physics.named.data.qpos['green_light'][0],
+                self.physics.named.data.qpos['blue_light'][0],
+            ], dtype=np.float32)
+            # Convert joint position to a normalized reading via a linear transform
+            self.button_reading = button_jnt / 0.005 + self.noise
+            self.button_reading_outdated = False
+        return self.button_reading
+
+    def _update_button_noises(self, corr_noise):
+        assert self.noise_scale != 0
+        self.corr_noise = corr_noise
+        self.noise = self.corr_noise * self.corr_noise_scale + self.np_rng.normal(size=(3,)) * self.iid_noise_scale
+        self.button_reading_outdated = True
+
+    def _update_lights_based_noisy_button(self):
+        assert self.noise_scale != 0
+        # Update lights based on the (noisy) reading: a linear transform from
+        # button reading to light alpha. The noiseless 0.9 threshold leads to
+        # very dim lights, so use an adjusted bias.
+        light_alpha = (self.get_normalized_button() - 0.12) * 6 + 0.4
+        light_alpha = np.clip(light_alpha, -1, 1)
+        for ci, c in enumerate(['red', 'green', 'blue']):
+            a = light_alpha[ci].item()
+            self.physics.named.model.geom_rgba[f'{c}_light_overlay'][-1] = max(a, 0)
+            self.physics.named.model.geom_rgba[f'{c}_light_neg_overlay'][-1] = max(-a, 0)
+
+    def reset(self):
+        self.button_reading_outdated = True
+        if self.noise_scale == 0:
+            self.noise = 0
+        else:
+            self._update_button_noises(self.np_rng.normal(size=(3,)))
+            self._update_lights_based_noisy_button()
+
+    def step(self):
+        if self.noise_scale != 0:
+            self._update_button_noises(
+                self.corr_noise * self.CORR_NOISE_DIFFUSE_ALPHA
+                + self.np_rng.normal(size=(3,)) * (1 - self.CORR_NOISE_DIFFUSE_ALPHA)
+            )  # diffuse
+            self._update_lights_based_noisy_button()
+
+
+from .video_source import RandomVideoSource, ConcatRollingImageSource
+
+
+class TVManager(EnvElementManager):
+    r"""
+    NOTE [ TV Frame Sampling ]
+
+    Unlike other elements of the environment, the TV (when enabled) consumes
+    RNG at initialization, randomly sampling the video files to load. As a
+    consequence, after initialization, this specific environment's TV can only
+    display the loaded files (regardless of `.seed/reset()` calls), and
+    different seeds at initialization lead to different sets of videos being
+    played (again regardless of `.seed/reset()` calls).
+
+    While this is a bit contrary to the common assumption that calling
+    `env.seed(s); env.reset()` should always make `env` behave the same
+    afterwards, this is done for three reasons:
+
+    1. Natural video noise is meant to be ignored. Having the training and
+       testing environments use different sets of videos better tests whether
+       the agent's noise removal generalizes (i.e., whether the agent has
+       simply overfit to the training videos).
+
+    2. This follows the behavior of many RL codebases for learning with
+       distractors. Notably, [1] introduces noisy video backgrounds and
+       utilizes this strategy (probably for the above reason).
+
+    3. Loading videos is expensive. Doing so at every `.seed/reset()` is not
+       practical.
+
+    [1] Zhang, Amy, et al. "Learning invariant representations for
+        reinforcement learning without reconstruction." ICLR 2021.
+ """ + + def __init__(self, physics, video_file_pattern, button_manager: ButtonManager, seed, + button_controls_hue=np.array([0., 1., 0.])): # by default, only let green button control TV green hue + self.physics = physics + self.video_file_pattern = video_file_pattern + self.tv_enabled = video_file_pattern is not None + if self.video_file_pattern is None: + return + + # Get the mujoco texture for TV + assert 'tv_texture' in physics.named.model.tex_adr.axes.row.names, 'Model should have tv_texture' + tex_adr = physics.named.model.tex_adr['tv_texture'] + tex_w = physics.named.model.tex_width['tv_texture'] + tex_h = physics.named.model.tex_height['tv_texture'] + size = tex_w * tex_h * 3 + full_tex = physics.named.model.tex_rgb[tex_adr:tex_adr + size].reshape(tex_h, tex_w, 3) + front_h0 = tex_h * 3 // 6 + front_ht = tex_h * 4 // 6 + full_tex[:front_h0] = 0 + full_tex[front_ht:] = 0 + self.tv_tex = full_tex[front_h0:front_ht] + self.tv_texid = physics.named.model.mat_texid['tv_material'] + + # Fetch the TV video files + tv_video_files = sorted(glob.glob(video_file_pattern)) + self.tv_source = ConcatRollingImageSource( + [RandomVideoSource((60, 60), tv_video_files) for _ in range(2)]) + self.tv_source.seed(seed) + + # Button effects + self.button_manager = button_manager + self.button_controls_hue = button_controls_hue + + self.reset() + + def seed(self, seed): + if self.video_file_pattern is None: + return + self.tv_source.seed(seed) + + def get_random_state(self): + if self.video_file_pattern is None: + return None + return self.tv_source.get_random_state() + + def set_random_state(self, random_state): + if self.video_file_pattern is None: + assert random_state is None + return + self.tv_source.set_random_state(random_state) + + def reset(self): + if self.video_file_pattern is None: + return + self.tv_source.reset() + self.texure_outdated = True # whether the CPU mujoco model needs updating + self.mujoco_outdated = True # whether the GPU mujoco model needs updating + 
+ def step(self): + if self.video_file_pattern is None: + return + self.tv_source.step() + self.texure_outdated = True + + def pre_render(self): + if self.video_file_pattern is None: + return + self.ensure_mujoco_updated() + + def ensure_texure_updated(self): + if self.video_file_pattern is None: + return + if self.texure_outdated: + img = self.tv_source.get_image() + + # adjust based on button: + # not pressed -> 0 hue change + # fully pressed -> 0.75 hue change + lerp_w = self.button_manager.get_normalized_button() * self.button_controls_hue * 0.75 + lerp_w = np.clip(lerp_w, 0, 1) + img = np.round(img * (1 - lerp_w) + 255 * lerp_w).astype(np.uint8) + + cv2.resize( + img, + self.tv_tex.shape[:2][::-1], + dst=self.tv_tex, + interpolation=cv2.INTER_NEAREST, + ) + self.mujoco_outdated = True + self.texure_outdated = False + + def ensure_mujoco_updated(self): + if self.video_file_pattern is None: + return + self.ensure_texure_updated() + if self.mujoco_outdated: + from dm_control.mujoco.wrapper.mjbindings import mjlib + # push updated tex to GPU + with self.physics.contexts.gl.make_current() as ctx: + ctx.call( + mjlib.mjr_uploadTexture, + self.physics.model.ptr, + self.physics.contexts.mujoco.ptr, + self.tv_texid, + ) + self.mujoco_outdated = False + diff --git a/robodesk/video_source.py b/robodesk/video_source.py new file mode 100644 index 0000000..3d0b5e5 --- /dev/null +++ b/robodesk/video_source.py @@ -0,0 +1,146 @@ +import numpy as np +import cv2 + +from .utils import NumPyRNGWrapper + + +class RandomVideoSource(object): + r""" + Class that manages a list of video frames collected from video files. The + default contrast change and sharpening values are chosen to make video more + legible when rendered in RoboDesk environment. + + attr:`contrast_change` and :attr:`sharpen` processes the loaded frames, + usually for the purpose of better visual clarity. 
+    """
+    def __init__(self, shape, video_files, num_frames=1000,
+                 interpolation=cv2.INTER_NEAREST, contrast_change=20, sharpen=0.5):
+        self.height, self.width = shape
+        self.init_np_rng, self.np_rng = [NumPyRNGWrapper(s) for s in NumPyRNGWrapper.split_seed(None, 2)]
+        # Record the initial state so `get_random_state()` also works before
+        # any frames are loaded.
+        self.init_random_state_before_loading = self.init_np_rng.get_random_state()
+        self.video_files = video_files
+        self.num_frames = num_frames
+        self.interpolation = interpolation
+        self.contrast_change = contrast_change
+        self.sharpen = sharpen
+        self.frames_loaded = False
+        self.reset()
+
+    def seed(self, seed):
+        init_np_rng, np_rng = [NumPyRNGWrapper(s) for s in NumPyRNGWrapper.split_seed(seed, 2)]
+        return self.set_random_state((init_np_rng.get_random_state(), np_rng.get_random_state()))  # go through the checks
+
+    def get_random_state(self):
+        return (self.init_random_state_before_loading, self.np_rng.get_random_state())
+
+    def set_random_state(self, random_state):
+        if self.frames_loaded:
+            # If frames have been loaded, we have consumed `init_np_rng`. Since
+            # the set of loaded frames depends on the `init_np_rng` state, any
+            # future random state must contain the same `init_np_rng` state
+            # that was used to load the frames; otherwise it would be
+            # inconsistent, as a different `init_np_rng` state would have
+            # loaded a different set of frames.
+            assert np.array_equiv(
+                np.asarray(self.init_random_state_before_loading, dtype=object),
+                np.asarray(random_state[0], dtype=object),
+            ), (
+                "Incompatible random state for a RandomVideoSource that has already loaded frames. "
+                "Either set the seed / random state before loading frames (i.e., before first rendering) or "
+                "use the same seed / random state that this RandomVideoSource used when loading frames."
+ ) + else: + # If not loaded yet, okay to use any rng + self.init_np_rng.set_random_state(random_state[0]) + self.init_random_state_before_loading = self.init_np_rng.get_random_state() + self.np_rng.set_random_state(random_state[1]) + + def load_frames_if_needed(self): + if self.frames_loaded: + return + self.init_random_state_before_loading = self.init_np_rng.get_random_state() + self.frames = np.empty( + (self.num_frames, self.height, self.width, 3), dtype=np.uint8) + frame_ii = 0 + try: + import skvideo.io + except ImportError: + raise RuntimeError('TV videos requires scikit-video') + while True: + for video_f in self.init_np_rng.permutation(self.video_files): + video = skvideo.io.vread(video_f) + for video_ii in range(video.shape[0]): + self.frames[frame_ii] = self.process_frame(video[video_ii]) + frame_ii += 1 + if frame_ii == self.num_frames: + self.frames_loaded = True + return # always exit here + if frame_ii == 0: + raise RuntimeError('Given videos do not have any frame') + + def process_frame(self, frame): + frame = cv2.resize( + frame, (self.width, self.height), interpolation=self.interpolation) + if self.contrast_change != 0: + contrast_delta = max(-127, min(127, float(self.contrast_change))) + f = 131 * (contrast_delta + 127) / (127 * (131 - contrast_delta)) + alpha_c = f + gamma_c = 127 * (1 - f) + frame = cv2.addWeighted(frame, alpha_c, frame, 0, gamma_c) + if self.sharpen != 0: + s = float(self.sharpen) + kernel = np.array([ + [0, -s, 0], + [-s, 1 + 4 * s, -s], + [0, -s, 0], + ]) + frame = cv2.filter2D(src=frame, ddepth=-1, kernel=kernel) + return frame + + def reset(self): + self.frame_ii = self.np_rng.integers(0, self.num_frames) + + def get_image(self): + self.load_frames_if_needed() + return self.frames[self.frame_ii % self.num_frames] + + def step(self): + self.frame_ii += 1 + + + +class ConcatRollingImageSource(object): + def __init__(self, sources, axis=1): + assert len(sources) > 0 + self.sources = sources + self.axis = axis + 
self.roll_offset = 0 + + def seed(self, seed): + for source, s in zip(self.sources, NumPyRNGWrapper.split_seed(seed, len(self.sources))): + source.seed(s) + + def get_random_state(self): + return [source.get_random_state() for source in self.sources] + + def set_random_state(self, random_state): + for source, source_random_state in zip(self.sources, random_state): + source.set_random_state(source_random_state) + + def get_image(self): + image: np.ndarray = np.concatenate( + [s.get_image() for s in self.sources], + axis=self.axis, + ) + if self.roll_offset != 0: + image = np.roll(image, axis=self.axis, shift=self.roll_offset) + return image + + def step(self): + self.roll_offset += 1 + for s in self.sources: + s.step() + + def reset(self): + self.roll_offset = 0 + for s in self.sources: + s.reset()
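As a sanity check on the flicker model documented in `LightManager` above (`brightness <- sin(t * period + t0) * magnitude + offset + N(0, sigma)`, clamped to `[0, 1]`), the formula can be sketched in isolation. This is a minimal standalone illustration, not part of the patch; the function name `flicker_brightness` and its signature are hypothetical, and the constants below mirror `FLICKER_SINE_MAGNITUDE` / `FLICKER_SINE_OFFSET`:

```python
import numpy as np

def flicker_brightness(t, period, t0, magnitude, offset, sigma, rng):
    # Sine-wave base brightness plus Gaussian noise, clamped to the
    # valid [0, 1] brightness range, as in LightManager's flicker comment.
    b = np.sin(t * period + t0) * magnitude + offset + rng.normal(scale=sigma)
    return float(np.clip(b, 0.0, 1.0))

rng = np.random.default_rng(0)
# With sigma=0 the output is deterministic; at the sine peak (t = pi/2,
# period = 1, t0 = 0) the brightness is magnitude + offset.
peak = flicker_brightness(np.pi / 2, 1.0, 0.0, 0.3, -0.15, 0.0, rng)
```

Note how the negative `FLICKER_SINE_OFFSET` (-0.15) combined with the clamp means the lights spend part of each sine cycle fully dark, which is what produces the visible flicker.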