From 4be783224823db9bfd8bd7409b0cc4c2c7ba5e6a Mon Sep 17 00:00:00 2001
From: wwwisman <120352666+wisman-tccr@users.noreply.github.com>
Date: Sun, 27 Jul 2025 21:11:03 +0800
Subject: [PATCH 1/2] Update README.md

---
 smallthinker/README.md | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)
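
Note on the `sparse` bullet this patch adds to the README: the idea can be
illustrated with a minimal Python sketch. It assumes a gated feed-forward
expert of the form down(relu(gate(x)) * up(x)). The function names, the weight
layout, and the toy values are hypothetical and are not taken from the
SmallThinker kernels. The lm_head predictor is not covered here.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def dense_ffn(x, w_gate, w_up, w_down):
    """Reference path: computes every hidden unit, even those ReLU zeroes out."""
    gate = [max(0.0, dot(row, x)) for row in w_gate]
    up = [dot(row, x) for row in w_up]
    hidden = [g * u for g, u in zip(gate, up)]
    return [dot(row, hidden) for row in w_down]


def sparse_ffn(x, w_gate, w_up, w_down):
    """Skips UP rows and DOWN columns wherever the ReLU'd GATE output is zero."""
    gate = [max(0.0, dot(row, x)) for row in w_gate]
    active = [i for i, g in enumerate(gate) if g > 0.0]
    out = [0.0] * len(w_down)
    for i in active:
        # Only active hidden units pay for the UP dot product.
        h = gate[i] * dot(w_up[i], x)
        # Accumulate column i of the DOWN matrix scaled by the hidden value.
        for j, row in enumerate(w_down):
            out[j] += row[i] * h
    return out, len(active)


# Toy example (hypothetical values): 2 inputs, 4 hidden units, 2 outputs.
x = [1.0, -2.0]
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
w_up = [[3.0, 1.0], [1.0, 1.0], [0.0, 3.0], [1.0, -1.0]]
w_down = [[1.0, 2.0, 3.0, 4.0], [0.5, 1.0, 1.0, 1.0]]

dense_out = dense_ffn(x, w_gate, w_up, w_down)
sparse_out, n_active = sparse_ffn(x, w_gate, w_up, w_down)
print(dense_out, sparse_out, n_active)
```

In this toy case only one of the four hidden units survives the ReLU, so the
sparse path evaluates a single UP row and a single DOWN column while matching
the dense result.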

diff --git a/smallthinker/README.md b/smallthinker/README.md
index d868c4c2..4a23a804 100644
--- a/smallthinker/README.md
+++ b/smallthinker/README.md
@@ -13,26 +13,28 @@ https://github.com/user-attachments/assets/cefd466e-3b1f-47a9-8dc3-f1cf5119045e
 ### SmallThinker 21B 
 | Model                            | Memory(GiB)         | i9 14900 | 1+13 8ge4 | rk3588 (16G) | Raspberry PI 5 |
 |--------------------------------------|---------------------|----------|-----------|--------------|----------------|
-| SmallThinker 21B+sparse              | 11.47               | 30.19    | 23.03     | 10.84        | 6.61           |
-| SmallThinker 21B+sparse +limited memory | limit 8G         | 20.30     | 15.50        | 8.56     | -              |
+| SmallThinker 21B (sparse)            | 11.47               | 30.19    | 23.03     | 10.84        | 6.61           |
+| SmallThinker 21B (sparse + limited memory) | limit 8G            | 20.30    | 15.50     | 8.56         | -              |
 | Qwen3 30B A3B                        | 16.20               | 33.52    | 20.18     | 9.07         | -              |
-| Qwen3 30B A3Blimited memory          | limit 8G            | 10.11     | 0.18         | 6.32     | -              |
+| Qwen3 30B A3B (limited memory)       | limit 8G            | 10.11    | 0.18      | 6.32         | -              |
 | Gemma 3n E2B                         | 1G, theoretically   | 36.88    | 27.06     | 12.50        | 6.66           |
 | Gemma 3n E4B                         | 2G, theoretically   | 21.93    | 16.58     | 7.37         | 4.01           |
 
 ### SmallThinker 4B 
 | Model                                         | Memory(GiB)         | i9 14900 | 1+13 8gen4 | rk3588 (16G) | rk3576 | Raspberry PI 5 | RDK X5 | rk3566 |
 |-----------------------------------------------|---------------------|----------|------------|--------------|--------|----------------|--------|--------|
-| SmallThinker 4B+sparse ffn +sparse lm_head    | 2.24                | 108.17   | 78.99      | 39.76        | 15.10  | 28.77          | 7.23   | 6.33   |
-| SmallThinker 4B+sparse ffn +sparse lm_head+limited memory | limit 1G           | 29.99    | 20.91      | 15.04        | 2.60   | 0.75           | 0.67   | 0.74   |
+| SmallThinker 4B (sparse)                      | 2.24                | 108.17   | 78.99      | 39.76        | 15.10  | 28.77          | 7.23   | 6.33   |
+| SmallThinker 4B (sparse + limited memory) | limit 1G           | 29.99    | 20.91      | 15.04        | 2.60   | 0.75           | 0.67   | 0.74   |
 | Qwen3 0.6B                                    | 0.6                 | 148.56   | 94.91      | 45.93        | 15.29  | 27.44          | 13.32  | 9.76   |
 | Qwen3 1.7B                                    | 1.3                 | 62.24    | 41.00      | 20.29        | 6.09   | 11.08          | 6.35   | 4.15   |
-| Qwen3 1.7B limited memory                     | limit 1G            | 2.66     | 1.09       | 1.00         | 0.47   | -              | -      | 0.11   |
+| Qwen3 1.7B (limited memory)                   | limit 1G            | 2.66     | 1.09       | 1.00         | 0.47   | -              | -      | 0.11   |
 | Gemma3n E2B                                   | 1G, theoretically   | 36.88    | 27.06      | 12.50        | 3.80   | 6.66           | 3.46   | 2.45   |
 
 
 
-Note：i9 14900、1+13 8ge4 use 4 threads，others use the number of threads that  can achieve the maximum speed 
+Note：
+- i9 14900, 1+13 8ge4 use 4 threads, others use the number of threads that can achieve the maximum speed
+- sparse: refers to leveraging the sparsity induced by the ReLU activation function to skip certain computations during the UP/DOWN calculation of each expert based on the GATE output, as well as using a predictor to perform sparse computation when calculating the lm_head
 
 ## Setup
 1. init submodule：
@@ -45,17 +47,21 @@ git submodule update --init --recursive
 ```bash
 sudo apt install clang-21 mold
 ```
-3. cd smallthinker before compiling
+3. Install the required Python packages
+```bash
+pip install -r requirements.txt
+```
+4. cd smallthinker before compiling
 ```bash
 cd smallthinker
 ```
-
+Note: Compilation, model conversion, and other related operations must be performed in the `smallthinker` directory.
 
 ## Convert Model
 ```bash
 python3 convert_hf_to_gguf.py /path/to/safetensors_model --outtype f16 --outfile /path/to/gguf_fp16 --transpose-down all
 
-./build_x86/bin/llama-quantize --pure /path/to/gguf_fp16  /path/to/gguf_q4_0 Q4_0  8
+./build/bin/llama-quantize --pure /path/to/gguf_fp16  /path/to/gguf_q4_0 Q4_0  8
 ```
 Note:lm_head sparsity is not included. If needed, please merge model_lm_head.pt into the safetensors file before executing the above commands, or directly download the GGUF file we provide.
 ## x86 Compile

From 7ca0b2cdba7e8df46b83c961e3e45c466da86fb8 Mon Sep 17 00:00:00 2001
From: wwwisman <120352666+wisman-tccr@users.noreply.github.com>
Date: Sun, 27 Jul 2025 21:13:34 +0800
Subject: [PATCH 2/2] Update README.md

---
 smallthinker/README.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/smallthinker/README.md b/smallthinker/README.md
index 4a23a804..d77d121e 100644
--- a/smallthinker/README.md
+++ b/smallthinker/README.md
@@ -33,7 +33,6 @@ https://github.com/user-attachments/assets/cefd466e-3b1f-47a9-8dc3-f1cf5119045e
 
 
 Note：
-- i9 14900, 1+13 8ge4 use 4 threads, others use the number of threads that can achieve the maximum speed
 - sparse: refers to leveraging the sparsity induced by the ReLU activation function to skip certain computations during the UP/DOWN calculation of each expert based on the GATE output, as well as using a predictor to perform sparse computation when calculating the lm_head
 
 ## Setup