From e7e79b283a344279d15f7761a525d5da222074e7 Mon Sep 17 00:00:00 2001 From: Brian Cannard Date: Sat, 22 Jun 2024 08:55:15 -0700 Subject: [PATCH] 0.8M parameters model (16,000 times smaller than Vicuna-13B) training over 50,000 iterations makes wild stories --- train-dali.sh | 227 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 227 insertions(+) create mode 100644 train-dali.sh diff --git a/train-dali.sh b/train-dali.sh new file mode 100644 index 000000000..8f0d9a041 --- /dev/null +++ b/train-dali.sh @@ -0,0 +1,227 @@ +### Running: + +brian@Brians-MacBook-Pro llama2.c % python3 sample.py --max_new_tokens=10240 --top_k=200 --temperature=0.8 --checkpoint=./outminimagic23/ckpt.pt --tokenizer=./data/tok361.model --start='' +Overriding: max_new_tokens = 10240 +Overriding: top_k = 200 +Overriding: temperature = 0.8 +Overriding: checkpoint = ./outminimagic23/ckpt.pt +Overriding: tokenizer = ./data/tok361.model +Overriding: start = +ModelArgs(dim=128, n_layers=4, n_heads=16, n_kv_heads=16, vocab_size=361, hidden_dim=None, multiple_of=1, norm_eps=1e-05, max_seq_len=128, dropout=0.15) +tok_embeddings.weight torch.Size([361, 128]) +layers.0.attention.wq.weight torch.Size([128, 128]) +layers.0.attention.wk.weight torch.Size([128, 128]) +layers.0.attention.wv.weight torch.Size([128, 128]) +layers.0.attention.wo.weight torch.Size([128, 128]) +layers.0.feed_forward.w1.weight torch.Size([341, 128]) +layers.0.feed_forward.w1.bias torch.Size([341]) +layers.0.feed_forward.w2.weight torch.Size([128, 341]) +layers.0.feed_forward.w3.weight torch.Size([341, 128]) +layers.0.attention_norm.weight torch.Size([128]) +layers.0.ffn_norm.weight torch.Size([128]) +layers.1.attention.wq.weight torch.Size([128, 128]) +layers.1.attention.wk.weight torch.Size([128, 128]) +layers.1.attention.wv.weight torch.Size([128, 128]) +layers.1.attention.wo.weight torch.Size([128, 128]) +layers.1.feed_forward.w1.weight torch.Size([341, 128]) +layers.1.feed_forward.w1.bias torch.Size([341]) 
+layers.1.feed_forward.w2.weight torch.Size([128, 341]) +layers.1.feed_forward.w3.weight torch.Size([341, 128]) +layers.1.attention_norm.weight torch.Size([128]) +layers.1.ffn_norm.weight torch.Size([128]) +layers.2.attention.wq.weight torch.Size([128, 128]) +layers.2.attention.wk.weight torch.Size([128, 128]) +layers.2.attention.wv.weight torch.Size([128, 128]) +layers.2.attention.wo.weight torch.Size([128, 128]) +layers.2.feed_forward.w1.weight torch.Size([341, 128]) +layers.2.feed_forward.w1.bias torch.Size([341]) +layers.2.feed_forward.w2.weight torch.Size([128, 341]) +layers.2.feed_forward.w3.weight torch.Size([341, 128]) +layers.2.attention_norm.weight torch.Size([128]) +layers.2.ffn_norm.weight torch.Size([128]) +layers.3.attention.wq.weight torch.Size([128, 128]) +layers.3.attention.wk.weight torch.Size([128, 128]) +layers.3.attention.wv.weight torch.Size([128, 128]) +layers.3.attention.wo.weight torch.Size([128, 128]) +layers.3.feed_forward.w1.weight torch.Size([341, 128]) +layers.3.feed_forward.w1.bias torch.Size([341]) +layers.3.feed_forward.w2.weight torch.Size([128, 341]) +layers.3.feed_forward.w3.weight torch.Size([341, 128]) +layers.3.attention_norm.weight torch.Size([128]) +layers.3.ffn_norm.weight torch.Size([128]) +norm.weight torch.Size([128]) +output.weight torch.Size([361, 128]) + Once upon a time, there was a nice girl named Lily. She loved to play outside and play loud sounds. One day, she went to the park to play with her friends in the park. They saw a tree and opened the box. The box was so happy! Lily did not know what to do. +But then, a man came to her friend, Timmy, and asked him to climb him. The cabin was weck to eat out and took a nap. The yummy bird couldn't see them again. Timmy learned that sometimes we promise to be my friend, and so you have a secret to be brave. Once upon a time, there was a little girl named Lily. She loved to play outside with her friend, Timmy. One day, they went to the park to play. 
Lily saw a big, round room and said, "Hi. Do you want to give me hug your ball?" +Lily's mom said, "I like this dog. His mom said no. It makes you just go and play with it." They do not like the turtle and they carried the snake. +They see a small toy. They did not want to share. They took the door to the tall room. They got out and said, "I'm sorry, Tim. I don't feel better. I will come back to his book." The tuna was sad and said, "I made a soup for you." +Sue looked at the pot and said, "Yes, you can cry! My tail is so big," Sue said. Sue said, "Don't worry, Tom. I was a little bit careful and brave." Tom did not listen. He wanted to play with the armchair, but he did not know what to do. +They went to the bed with their table. They were sad and sad. But then, a big dog came to the park. They saw the bear saw the shield, something unexpected happened. The big dog started to chew and a big box. The dog was not scary. The dog accidentally found the ball outside and started to move. After a while, the cat found a big puddle on the ground. The ball was very sad and did not know what to do. The dog could not play with the cat. +Suddenly, the ball started to flower. The cat fell down the cat. The cat came closer, and the little girl was scared. She was so happy to help. She felt happy and thanked the cat. From that day on, Lily played on the swings and remembered the cat. Once upon a time, there was a little bird named Blue. Blue loved to play and drive his favorite dog. One day, Blue saw a big red man with lots of statues. The little dog saw a big tree. He wanted to play with the dog, but he was too hard. The dog ran to the dog and decided to draw a big, dirty ball in a tree. The boy said, "No, I'm sorry, Max. You're a good idea and danced like the stack." +Max took a big ball of the cat with the ball. "Hello, Kitty! What are you doing?" Lily asked. She asked her mommy if she could lose her ball, but she didn't know what to do. 
+Her mom said, "Let's go home and show it to me." +Lily played with her amazing eyes and her friends were happy to play with her friends. They all had a new friend together and the stranger. Once upon a time, there was a little girl named Lily. She loved to play outside and eat carrots. One day, she saw a big box of nuts in the park. She was very scared because she didn't know what to do. +Lily noticed that even though it was coming a treat to eat it and didn't want to do what to do. She told her friend, "Thank you, car. I'll help you get the drawer." +Lily felt like the monster to the cat. She learned that it's important to always stay in the air and have fun. She knew that the candy was so happy! Once upon a time, in a small bush, there was a little girl named Lily. She loved to play outside and explore the forest. One day, she met a big tree near her house. She wanted to eat it, but it was too heavy. She felt sad and didn't know what to do. +As the day of the lion, Lucy went to the beach and saw a big tree. The bear had a big room on it. They were proud of her friends. They said, "Tom, I am sorry. I was a good help." +Tom decided to help. He started to play with the toy all day. He wanted to listen to his mom and said, "No, let's race. It's still playing on the soft soft fence." +The next day, Timmy went to the park to work. He saw a little bird that he had never rested to fly again. He thought about it was a great time. +The moral of the story is that it is too big and pretty!" +Lily and her friends went to her to a friend Tom and took their hands. They learned that it's important to be careful with things. They might still make sure to take a bath. The cloud is so naughty and cheerful. They are happy. +"Thank you, Lily. This mess is the vendor. It is not real. You know me where things is mine. It is not just a big tree. It is so big!" Lily says. She says. +"Yes, it is a store," Mom says. "Now we do not take a nap. She makes a pretty blanket. 
Go away because they have many friends." They play together and play together. They both and have fun. +One day, the bear saw a small plane with a screw. The apple belonged to the ceiling and it was hungry. The boy had a special toy man in his home. He was so happy to have a friend to play with his toy doll. +One day, a small turtle was playing outside in the snow. The sun was shining because he had to make a big string. It was a toy car in the dirt. The cushion was so happy! The egg was happy and said, "Thank you, but be careful and need to eat the toy for you." +The dog looked around and saw that he had never tried to make a seat in his toy. It was so excited! He put it in his hand and the day was so happy that he swam away with his mom. +Timmy and his toy didn't want to help but he kept his beetle. They both closed his eyes and slept at the top. Timmy was surprised and said, "Don't worry, it's not good to take care of you." +Jen and Max went to the cabinet- and the zoo said, "That's better than you and that's not for it." +Anna was surprised and decided to help the birds. She pulled the bird away and brought the bird to the bird. The bird was so happy! The bird and the bird were surprised too. They became best friends and played together every day. Once upon a time, there was a little girl named Lily. She loved to help her mommy and daddy. One day, her mommy told her that she had to ask for help. Lily thought she loved to show her mommy how to clean up her toys and run back to her daddy to her friends. Lily was so happy because she had a new friend to play. From that day on, Lily always remembered to be more careful next time. Once upon a time, there was a little girl named Lily. She had a favorite toy that she loved to play with her toys. One day, her mom gave her a big ball to her toy to buy some food. Lily laughed and said, "Be careful, but I want to clean up too. But when I was scared, it would be fun to make the monster feel happy." 
+The next day, a mommy came into the room and saw the clouds. The cow said, "How are you sad?" The mommy said, "I'm sorry, Timmy. It's a special phone for the storm to fix it." +Timmy smiled and said, "Yes, you must turn around and wouldn't have to leave you up to the door." +The moral of the story is that it's important to be more kind and helpful and keep just keeping good as a play for you. Once upon a time, there was a little girl named Lily. She loved to play outside and explore the world around her neighborhood. One day, she saw a boy named Tim. The boy was playing outside with her doll when she saw a cloud on the ground. She saw a brave box of pointers. The brown door said, "Hi, I am Tom. Do you want to play with me?" Sue did not know what to do. She tried to push her books and showed her the doll. +Tom said, "Oh no, the kitchen, that was the best friend, a big storm." Tom was so happy that he had lost his mom's head. He thanked the children and said, "Thank you for the children to my home." Tom was sad and said, "It's a special pepper. You was excited to take listen to my sweet place." Tom ran away from his mom and said, "Sure, let's find some cake!" Sue did not listen to his mom and dad. She told him he was a scarf swimming away. He said, "I don't know why I am sorry, but I will have to have a storm is or delicate and far away to be there to leave something that we can use the hidden to make me small and cozy. He looked up and saw a small cat with a birds, but he said no. The cat was sad and ran to find him. The squirrel was sad, but his toy cat had an idea. +Tim thought about a white cat, "That is not so brilliant to play!" The cat said, "No, thank you!" The duck said, "Yes, let's play with the toy car." The dog slipped and said, "Yes, Max!" Max and Sue saw that the toy car was the new toy car. They told the dog that they needed to help Max. They ran to the car. The dog was not strong. They all played together and had fun. 
They were all afraid, but never started to meet there. They found a big dog. They had so much fun. The dog and the dog were happy and played together in the kitchen. The dog wanted to do it for the dog. They all went outside to play. They decided to share the toy cars. +The dog wanted to see what was for, but they were not too late. They were so happy to have a lot of fly. The dog reached and brought from the shelf. The dog said, "You did something different. It said yes, and you can come back home." The cat laughed and hugged her. They both smiled and had an idea. They started to rain. They followed their new doll and the dog asked him what was wrong. +"Wow!" Dad said. "What do you do?" +"No, let's play with it!" Tim said. +"Maybe Max is not the dog," said his mom. +Tim was sad. He could not find his paw. He ran to his room and found a new toy car. Max said, "That's a special monster! He grabbed the man and put it on his adventure. Max was happy. Max said, "Thank, you're so kind, Max. I will enjoy the house on the counter to hide in the box." Once upon a time, there was a little girl named Lily. She loved to play with her toys and her friends. One day, she saw a little girl named Lily who wanted to play with her toys. The girl said, "Ow, let's go to the park!" +The squirrel went to the park to play. Her friends was very happy. They played together all day long and had lots of fun. They all laughed and played together every day. Once upon a time, there was a little girl named Lily. She loved to play in her backyard. One day, she went to the park to play outside in the park. She saw a small box in the park, but she wanted to play with it with it. 
+Lily said, "Mom, you need to eat something funny + + + +### Training: + +llama2.c % python3 train.py \ + --out_dir="outminimagic19" \ + --batch_size=64 \ + --max_seq_len=128 \ + --gradient_accumulation_steps=1 \ + --vocab_source="custom" \ + --vocab_size=361 \ + --dim=128 \ + --n_layers=4 \ + --n_heads=16 \ + --n_kv_heads=16 \ + --multiple_of=1 \ + --learning_rate=3e-4 \ + --dropout=0.15 \ + --weight_decay=0.1 \ + --max_iters=100000 \ + --beta2=0.99 \ + --warmup_iters=2500 \ + --eval_interval=5000 \ + --eval_iters=100 \ + --compile=False \ + --device=cpu +Overriding: out_dir = outminimagic19 +Overriding: batch_size = 64 +Overriding: max_seq_len = 128 +Overriding: gradient_accumulation_steps = 1 +Overriding: vocab_source = custom +Overriding: vocab_size = 361 +Overriding: dim = 128 +Overriding: n_layers = 4 +Overriding: n_heads = 16 +Overriding: n_kv_heads = 16 +Overriding: multiple_of = 1 +Overriding: learning_rate = 0.0003 +Overriding: dropout = 0.15 +Overriding: weight_decay = 0.1 +Overriding: max_iters = 100000 +Overriding: beta2 = 0.99 +Overriding: warmup_iters = 2500 +Overriding: eval_interval = 5000 +Overriding: eval_iters = 100 +Overriding: compile = False +Overriding: device = cpu +tokens per iteration will be: 8,192 +breaks down as: 1 grad accum steps * 1 processes * 64 batch size * 128 max seq len +Initializing a new model from scratch +num decayed parameter tensors: 29, with 832,128 parameters +num non-decayed parameter tensors: 13, with 2,516 parameters +using fused AdamW: False +Created a PretokDataset with rng seed 42 +Created a PretokDataset with rng seed 42 +Created a PretokDataset with rng seed 42 +step 0: train loss 13.0864, val loss 13.0707 +... 
+705 | loss 2.0334 | lr 8.460000e-05 | 1492.70ms | mfu 0.01% +706 | loss 1.9905 | lr 8.472000e-05 | 1494.75ms | mfu 0.01% +707 | loss 2.0143 | lr 8.484000e-05 | 1506.09ms | mfu 0.01% +708 | loss 2.0368 | lr 8.496000e-05 | 1485.97ms | mfu 0.01% +709 | loss 2.0127 | lr 8.508000e-05 | 1484.27ms | mfu 0.01% +710 | loss 2.0345 | lr 8.520000e-05 | 1510.76ms | mfu 0.01% +711 | loss 1.9817 | lr 8.532000e-05 | 1520.79ms | mfu 0.01% +712 | loss 1.9925 | lr 8.544000e-05 | 1526.97ms | mfu 0.01% +713 | loss 2.0007 | lr 8.556000e-05 | 1499.91ms | mfu 0.01% +714 | loss 1.9873 | lr 8.568000e-05 | 1512.84ms | mfu 0.01% +715 | loss 2.0048 | lr 8.580000e-05 | 1511.11ms | mfu 0.01% +716 | loss 2.0225 | lr 8.592000e-05 | 1493.29ms | mfu 0.01% +717 | loss 1.9932 | lr 8.604000e-05 | 1440.74ms | mfu 0.01% +718 | loss 1.9723 | lr 8.616000e-05 | 1429.70ms | mfu 0.01% +719 | loss 2.0064 | lr 8.628000e-05 | 1417.85ms | mfu 0.01% +720 | loss 2.0453 | lr 8.640000e-05 | 1421.58ms | mfu 0.01% +721 | loss 1.9991 | lr 8.652000e-05 | 1422.54ms | mfu 0.01% +722 | loss 1.9957 | lr 8.664000e-05 | 1410.20ms | mfu 0.01% +723 | loss 1.9658 | lr 8.676000e-05 | 1442.83ms | mfu 0.01% +724 | loss 1.9801 | lr 8.688000e-05 | 1535.14ms | mfu 0.01% +725 | loss 1.9765 | lr 8.700000e-05 | 1417.81ms | mfu 0.01% +726 | loss 1.9835 | lr 8.712000e-05 | 1420.28ms | mfu 0.01% +727 | loss 2.0033 | lr 8.724000e-05 | 1410.68ms | mfu 0.01% +728 | loss 1.9774 | lr 8.736000e-05 | 1417.51ms | mfu 0.01% +729 | loss 1.9833 | lr 8.748000e-05 | 1415.66ms | mfu 0.01% +730 | loss 1.9985 | lr 8.760000e-05 | 1414.78ms | mfu 0.01% +731 | loss 2.0255 | lr 8.772000e-05 | 1496.85ms | mfu 0.01% +732 | loss 1.9592 | lr 8.784000e-05 | 1413.75ms | mfu 0.01% +733 | loss 1.9893 | lr 8.796000e-05 | 1434.57ms | mfu 0.01% +734 | loss 1.9545 | lr 8.808000e-05 | 1530.36ms | mfu 0.01% +735 | loss 1.9857 | lr 8.820000e-05 | 1685.52ms | mfu 0.01% +736 | loss 1.9825 | lr 8.832000e-05 | 1449.32ms | mfu 0.01% +737 | loss 1.9974 | lr 8.844000e-05 | 
1430.09ms | mfu 0.01% +738 | loss 1.9856 | lr 8.856000e-05 | 1417.67ms | mfu 0.01% +739 | loss 2.0039 | lr 8.868000e-05 | 1436.28ms | mfu 0.01% +740 | loss 1.9652 | lr 8.880000e-05 | 1419.71ms | mfu 0.01% +741 | loss 1.9629 | lr 8.892000e-05 | 1409.32ms | mfu 0.01% +742 | loss 2.0012 | lr 8.904000e-05 | 1416.17ms | mfu 0.01% +743 | loss 1.9590 | lr 8.916000e-05 | 1419.33ms | mfu 0.01% +744 | loss 1.9578 | lr 8.928000e-05 | 1429.98ms | mfu 0.01% +745 | loss 1.9379 | lr 8.940000e-05 | 1434.93ms | mfu 0.01% +746 | loss 1.9645 | lr 8.952000e-05 | 1414.13ms | mfu 0.01% +747 | loss 2.0026 | lr 8.964000e-05 | 1414.19ms | mfu 0.01% +... +53098 | loss 0.7980 | lr 1.410735e-04 | 1353.33ms | mfu 0.01% +53099 | loss 0.7813 | lr 1.410687e-04 | 1338.73ms | mfu 0.01% +53100 | loss 0.7909 | lr 1.410638e-04 | 1440.78ms | mfu 0.01% +53101 | loss 0.8017 | lr 1.410590e-04 | 1357.53ms | mfu 0.01% +53102 | loss 0.7951 | lr 1.410542e-04 | 1354.06ms | mfu 0.01% +53103 | loss 0.8079 | lr 1.410494e-04 | 1360.68ms | mfu 0.01% +53104 | loss 0.8031 | lr 1.410445e-04 | 1387.29ms | mfu 0.01% +53105 | loss 0.8064 | lr 1.410397e-04 | 1375.44ms | mfu 0.01% +53106 | loss 0.8625 | lr 1.410349e-04 | 1369.82ms | mfu 0.01% +53107 | loss 0.7961 | lr 1.410301e-04 | 1375.93ms | mfu 0.01% +53108 | loss 0.8355 | lr 1.410252e-04 | 1370.07ms | mfu 0.01% +53109 | loss 0.7853 | lr 1.410204e-04 | 1395.70ms | mfu 0.01% +53110 | loss 0.7997 | lr 1.410156e-04 | 1380.16ms | mfu 0.01% +53111 | loss 0.7759 | lr 1.410108e-04 | 1396.00ms | mfu 0.01% +53112 | loss 0.7474 | lr 1.410059e-04 | 1385.05ms | mfu 0.01% +53113 | loss 0.7681 | lr 1.410011e-04 | 1397.83ms | mfu 0.01% +53114 | loss 0.8033 | lr 1.409963e-04 | 1680.10ms | mfu 0.01% +53115 | loss 0.7722 | lr 1.409915e-04 | 1669.39ms | mfu 0.01% +53116 | loss 0.7766 | lr 1.409866e-04 | 1356.32ms | mfu 0.01% +53117 | loss 0.7859 | lr 1.409818e-04 | 1349.66ms | mfu 0.01% +53118 | loss 0.7967 | lr 1.409770e-04 | 1343.45ms | mfu 0.01% +53119 | loss 0.8224 | lr 1.409722e-04 
| 1395.09ms | mfu 0.01% +53120 | loss 0.7804 | lr 1.409673e-04 | 1416.06ms | mfu 0.01% +53121 | loss 0.7978 | lr 1.409625e-04 | 1365.17ms | mfu 0.01% +53122 | loss 0.7935 | lr 1.409577e-04 | 1396.14ms | mfu 0.01% +53123 | loss 0.8094 | lr 1.409529e-04 | 1381.31ms | mfu 0.01% +53124 | loss 0.7966 | lr 1.409480e-04 | 1383.27ms | mfu 0.01% +53125 | loss 0.7698 | lr 1.409432e-04 | 1400.36ms | mfu 0.01% +53126 | loss 0.8198 | lr 1.409384e-04 | 1376.63ms | mfu 0.01% +53127 | loss 0.7761 | lr 1.409336e-04 | 1529.66ms | mfu 0.01% +53128 | loss 0.8134 | lr 1.409288e-04 | 1510.99ms | mfu 0.01% +53129 | loss 0.7929 | lr 1.409239e-04 | 1421.13ms | mfu 0.01% +53130 | loss 0.8412 | lr 1.409191e-04 | 1456.67ms | mfu 0.01% +
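The figures in the training log can be cross-checked with a few lines of arithmetic. The sketch below recomputes the parameter counts from the tensor shapes printed by `sample.py`, the tokens per iteration, and the learning rate at two logged steps. It is not taken from `train.py`. It assumes that `tok_embeddings.weight` and `output.weight` are tied (counted once), that the schedule is linear warmup followed by cosine decay, and that the decay runs to `max_iters` with a minimum learning rate of 0. None of those three points is stated in the logs; they are the assumptions under which the numbers reproduce. The function names `count_params` and `get_lr` are chosen for illustration.

```python
import math

# Values copied from the logs above.
dim, n_layers, vocab_size = 128, 4, 361
hidden_dim = 341  # feed_forward.w1.weight is printed as [341, 128]
batch_size, max_seq_len, grad_accum = 64, 128, 1
learning_rate, warmup_iters, max_iters = 3e-4, 2500, 100_000

# Assumptions, not shown in the logs: decay ends at max_iters, floor is 0.
lr_decay_iters, min_lr = max_iters, 0.0


def count_params():
    """Return (decayed, non_decayed) parameter counts from the printed shapes."""
    attention = 4 * dim * dim   # wq, wk, wv, wo
    ffn = 3 * hidden_dim * dim  # w1, w2, w3 weight matrices
    # Assumption: the embedding and output matrices share storage, so the
    # [361, 128] matrix is counted once.
    decayed = vocab_size * dim + n_layers * (attention + ffn)
    # Per layer: two norm vectors plus the w1 bias; then the final norm.
    non_decayed = n_layers * (2 * dim + hidden_dim) + dim
    return decayed, non_decayed


def get_lr(it):
    """Linear warmup, then cosine decay (assumed schedule)."""
    if it < warmup_iters:
        return learning_rate * it / warmup_iters
    if it > lr_decay_iters:
        return min_lr
    ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))
    return min_lr + coeff * (learning_rate - min_lr)


decayed, non_decayed = count_params()
tokens_per_iter = grad_accum * 1 * batch_size * max_seq_len  # 1 process

print(decayed, non_decayed, decayed + non_decayed)
print(tokens_per_iter)
print(f"{get_lr(705):.6e}  {get_lr(53098):.6e}")
```

Under these assumptions the script gives 832,128 decayed and 2,516 non-decayed parameters (834,644 in total, the "0.8M" of the commit subject) and 8,192 tokens per iteration, matching the log. The computed learning rates can be compared against the `lr` column at steps 705 and 53098.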