Conversation


@avesus commented Jun 22, 2024

A simple 4-layer generative Transformer with 16 attention heads.
One tweak over LLaMA is in the embedding matrix: its pseudo-inverse is computed and used as the unembedding, and gradients are backpropagated through it.
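A minimal sketch of what such a pseudo-inverse unembedding could look like, assuming a PyTorch-style model (the class, names, and shapes here are illustrative assumptions, not this PR's actual code):

```python
import torch
import torch.nn as nn

class PinvUnembed(nn.Module):
    """Embedding layer whose unembedding is the Moore-Penrose
    pseudo-inverse of the embedding matrix (hypothetical sketch
    of the tweak described above)."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.embed(token_ids)                    # (..., dim)

    def unembed(self, hidden: torch.Tensor) -> torch.Tensor:
        # pinv of the (vocab, dim) weight has shape (dim, vocab);
        # torch.linalg.pinv is differentiable, so the loss
        # backpropagates into the embedding matrix through this
        # projection as well as through the forward lookup.
        w_pinv = torch.linalg.pinv(self.embed.weight)   # (dim, vocab)
        return hidden @ w_pinv                          # (..., vocab) logits

# Usage: project final hidden states to vocabulary logits.
layer = PinvUnembed(vocab_size=32000, dim=512)
h = torch.randn(2, 16, 512)        # (batch, seq, dim) hidden states
logits = layer.unembed(h)          # (2, 16, 32000)
```

Tying the unembedding to the embedding this way keeps a single learned vocabulary matrix while still letting the output projection receive gradient updates.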

After 50,000 training iterations it generates wild tiny stories.

This PR adds an example of inference parameters and training (including an example run of 50,000 iterations).
