Conversation
ding/policy/__init__.py
from .ppo import PPOPolicy, PPOPGPolicy, PPOOffPolicy
from .sac import SACPolicy, DiscreteSACPolicy, SQILSACPolicy
from .cql import CQLPolicy, DiscreteCQLPolicy
from .qtransformer import QtransformerPolicy
from ding.entry import serial_pipeline_offline
from ding.config import read_config
from pathlib import Path
from ding.model.template.qtransformer import QTransformer
Import from the secondary directory, such as:
from ding.model import QTransformer
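For context, a minimal sketch of the re-export that makes this import work; the exact contents of ding/model/__init__.py are an assumption here:

# Hypothetical excerpt of ding/model/__init__.py: re-export the template model
# so downstream code can simply write `from ding.model import QTransformer`.
from .template.qtransformer import QTransformer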
alpha=0.2,
discount_factor_gamma=0.9,
min_reward = 0.1,
auto_alpha=False,
Remove unused fields like these.
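As an illustration only (exactly which of these fields the policy reads is not visible in this thread), the cleanup amounts to deleting the dead entries from the config dict:

# Hypothetical cleaned-up fragment: keep only fields the policy consumes,
# assuming auto_alpha and min_reward are the unused entries being flagged.
learn=dict(
    alpha=0.2,
    discount_factor_gamma=0.9,
),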
ding/policy/qtransformer.py
update_type='momentum',
update_kwargs={'theta': self._cfg.learn.target_theta}
)
self._low = np.array(self._cfg.other["low"])
We don't need low and high here; we always assume the action value range in the policy is [-1, 1].
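A minimal sketch of that convention (function and argument names are illustrative, not DI-engine API): the policy always works in [-1, 1] and the environment side rescales into its own bounds:

import numpy as np

def scale_action(action, low, high):
    # Map a policy action in [-1, 1] to the environment's [low, high] range.
    return low + (np.asarray(action) + 1.0) * 0.5 * (np.asarray(high) - np.asarray(low))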
cuda=True,
model=dict(
    num_actions = 3,
    action_bins = 256,
This action_bins field is not used in the policy.
ding/policy/qtransformer.py
selected = t.gather(-1, indices)
return rearrange(selected, '... 1 -> ...')

def _discretize_action(self, actions):
We can optimize this for loop:

# Build a (num_actions, num_bins) grid of bin centers, then pick the nearest
# bin for every action dimension with a single vectorized argmin.
action_values = np.linspace(-1, 1, 8)[np.newaxis, ...].repeat(4, 0)
action_values = torch.as_tensor(action_values).to(self._device)
diff = (actions.unsqueeze(-1) - action_values.unsqueeze(0)) ** 2
indices = diff.argmin(-1)
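For reference, the inverse lookup (bin indices back to continuous values in [-1, 1]) can reuse the same grid; this is a sketch under the bin and dimension counts in the snippet above:

# Illustrative inverse of the vectorized discretization: gather the selected
# bin centers back into continuous actions.
batch_size = indices.shape[0]
continuous = torch.gather(action_values.expand(batch_size, -1, -1), -1, indices.unsqueeze(-1)).squeeze(-1)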
ding/policy/qtransformer.py
actions = data['action']

# get q
num_timesteps, device = states.shape[1], states.device
Use self._device, which is a default member variable of Policy.
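A minimal sketch of the suggested change, keeping the variable names from the diff above:

# Rely on the policy's configured device instead of inferring it from the data tensor.
num_timesteps, device = states.shape[1], self._device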
ding/policy/qtransformer.py
import torch
import torch.nn.functional as F
from torch.distributions import Normal, Independent
from ema_pytorch import EMA
Remove unused third-party libraries.
ding/policy/qtransformer.py
from pathlib import Path
from functools import partial
from contextlib import nullcontext
ding/policy/qtransformer.py
from torchtyping import TensorType

from einops import rearrange, repeat, pack, unpack
ding/policy/qtransformer.py
from einops import rearrange, repeat, pack, unpack
from einops.layers.torch import Rearrange

from beartype import beartype
We will not use beartype to validate runtime types in the current version, so remove it in this PR.
ding/model/template/qtransformer.py
@@ -0,0 +1,753 @@
from random import random
from functools import partial, cache
functools.cache is a new feature in Python 3.9; for compatibility, you should implement it as follows:

try:
    from functools import cache  # only available in Python >= 3.9
except ImportError:
    from functools import lru_cache
    cache = lru_cache(maxsize=None)
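A quick usage sketch of the shim (the decorated helper below is hypothetical):

@cache
def _bin_centers(num_bins):
    # Memoize the discretization grid; works the same with functools.cache
    # and with the lru_cache(maxsize=None) fallback.
    return tuple(-1.0 + 2.0 * i / (num_bins - 1) for i in range(num_bins))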
Description
Related Issue
TODO
Check List