Rationale for training on only 10% of the buffer? #7

When defining the batch size in training_go.py, you comment: 'To avoid overfitting, we want to make sure the agent only sees ~10% of samples in the replay over one checkpoint.' 'That is, batch_size * ckpt_interval <= replay_capacity * 0.1'. Can you expand on this choice? Intuitively, training on a small sample of the buffer would seem to foster overfitting rather than prevent it, wouldn't it? Could you explain this choice in more detail, please? :)
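To make sure I am reading the constraint correctly, here is a minimal sketch of the arithmetic as I understand it. The function names and the example numbers (a capacity of 1,000,000 and a checkpoint interval of 1,000) are hypothetical and are not taken from training_go.py.

```python
def max_batch_size(replay_capacity: int, ckpt_interval: int, fraction: float = 0.1) -> int:
    """Largest batch_size satisfying
    batch_size * ckpt_interval <= replay_capacity * fraction."""
    # Total number of samples that may be drawn during one checkpoint interval.
    sample_budget = int(replay_capacity * fraction)
    return sample_budget // ckpt_interval


def sampled_fraction(batch_size: int, ckpt_interval: int, replay_capacity: int) -> float:
    """Samples drawn over one checkpoint interval, as a fraction of replay capacity.
    This counts draws, not distinct samples, so it is an upper bound on coverage."""
    return batch_size * ckpt_interval / replay_capacity


# Hypothetical numbers, for illustration only.
capacity = 1_000_000
interval = 1_000
batch = max_batch_size(capacity, interval)
print(batch)                                        # 100
print(sampled_fraction(batch, interval, capacity))  # 0.1
```

If this reading is right, the rule caps how many draws are made from the buffer between two checkpoints, rather than limiting which part of the buffer is eligible for sampling.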
