Qwen 3 1.7B Offline tool calling Android #165
Open

vkkhare wants to merge 11 commits into NimbleEdge:main from vkkhare:tokenizers
Commits (11):
- c2e00db  initiate qwen 1.7 Agent scripts
- c7c425c  add onnx tests and lfm models (vkkhare)
- d61c475  Merge branch 'main' into tokenizers (vkkhare)
- 92bae73  # This is a combination of 5 commits. (vkkhare)
- c274f0c  Redo deliteai.dev website (#163) (jpuneet)
- b42f0c6  Upgrade Python version in GitHub workflows (#166) (jpuneet)
- 5c5e4e8  correct mac build
- d342cad  udpate tokenizers submodule
- 53c8aa5  Merge branch 'main' into tokenizers (vkkhare)
- 7882b83  Merge branch 'main' into tokenizers (vkkhare)
- c04611d  Merge branch 'main' into tokenizers (vkkhare)
.gitignore

```diff
@@ -6,3 +6,5 @@ third_party/runtime/
 !third_party/runtime/CMakeLists.txt
 __pycache__/
 .pytest_cache/
+**/NimbleSDK
+models/**/data
```
.gitmodules

```diff
@@ -0,0 +1,4 @@
+[submodule "third_party/tokenizers-cpp"]
+	path = third_party/tokenizers-cpp
+	url = https://github.com/NimbleEdge/tokenizers-cpp.git
```
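Since this PR adds a new submodule, anyone building the repo after this change will presumably need to fetch it first, e.g. with `git submodule update --init --recursive`.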
coreruntime/delitepy/library_stubs/src_template/delitepy/tokenizers/__init__.py (new file, 152 additions, 0 deletions)
```python
# SPDX-FileCopyrightText: (C) 2025 DeliteAI Authors
#
# SPDX-License-Identifier: Apache-2.0

"""Package delitepy.tokenizers for tokenizer functionality."""

from typing import List, Union
from delitepy.nimblenet.tensor import Tensor


def from_pretrained(model_name_or_path: str) -> str:
    """Load a pre-trained tokenizer from HuggingFace Hub or local file.

    Args:
        model_name_or_path: Path to tokenizer.json file or HuggingFace model name

    Returns:
        Tokenizer handle (opaque string identifier)

    Example:
        >>> tokenizer = tokenizers.from_pretrained("bert-base-uncased")
        >>> tokenizer = tokenizers.from_pretrained("/path/to/tokenizer.json")
    """
    pass


def from_file(file_path: str) -> str:
    """Load a tokenizer from a file path.

    Args:
        file_path: Path to tokenizer.json or .model file

    Returns:
        Tokenizer handle (opaque string identifier)

    Example:
        >>> tokenizer = tokenizers.from_file("tokenizer.json")
        >>> tokenizer = tokenizers.from_file("model.spm")
    """
    pass


def from_json(json_str: str) -> str:
    """Create a tokenizer from a JSON string.

    Args:
        json_str: JSON string containing tokenizer configuration

    Returns:
        Tokenizer handle (opaque string identifier)

    Example:
        >>> json_config = '{"model": {...}, "normalizer": {...}}'
        >>> tokenizer = tokenizers.from_json(json_config)
    """
    pass


def from_sentencepiece(model_path: str) -> str:
    """Load a SentencePiece tokenizer from a .model file.

    Args:
        model_path: Path to SentencePiece .model file

    Returns:
        Tokenizer handle (opaque string identifier)

    Example:
        >>> tokenizer = tokenizers.from_sentencepiece("tokenizer.model")
    """
    pass


def encode(tokenizer: str, text: str) -> Tensor:
    """Encode text into token IDs.

    Args:
        tokenizer: Tokenizer handle from from_pretrained/from_file/etc.
        text: Text to encode

    Returns:
        Tensor containing token IDs (INT32)

    Example:
        >>> tokenizer = tokenizers.from_pretrained("bert-base-uncased")
        >>> token_ids = tokenizers.encode(tokenizer, "Hello world!")
        >>> print(token_ids.shape)  # [num_tokens]
    """
    pass


def decode(tokenizer: str, token_ids: Tensor) -> str:
    """Decode token IDs back to text.

    Args:
        tokenizer: Tokenizer handle
        token_ids: Tensor containing token IDs (INT32)

    Returns:
        Decoded text string

    Example:
        >>> tokenizer = tokenizers.from_pretrained("bert-base-uncased")
        >>> token_ids = tokenizers.encode(tokenizer, "Hello world!")
        >>> text = tokenizers.decode(tokenizer, token_ids)
        >>> print(text)  # "Hello world!"
    """
    pass


def get_vocab_size(tokenizer: str) -> int:
    """Get the vocabulary size of the tokenizer.

    Args:
        tokenizer: Tokenizer handle

    Returns:
        Size of the vocabulary

    Example:
        >>> tokenizer = tokenizers.from_pretrained("bert-base-uncased")
        >>> vocab_size = tokenizers.get_vocab_size(tokenizer)
        >>> print(vocab_size)  # 30522
    """
    pass


def token_to_id(tokenizer: str, token: str) -> int:
    """Convert a token string to its ID.

    Args:
        tokenizer: Tokenizer handle
        token: Token string

    Returns:
        Token ID, or -1 if token not found

    Example:
        >>> tokenizer = tokenizers.from_pretrained("bert-base-uncased")
        >>> token_id = tokenizers.token_to_id(tokenizer, "[CLS]")
        >>> print(token_id)  # 101
    """
    pass


def id_to_token(tokenizer: str, token_id: int) -> str:
    """Convert a token ID to its string representation.

    Args:
        tokenizer: Tokenizer handle
        token_id: Token ID

    Returns:
        Token string, or empty string if ID not found

    Example:
        >>> tokenizer = tokenizers.from_pretrained("bert-base-uncased")
        >>> token = tokenizers.id_to_token(tokenizer, 101)
        >>> print(token)  # "[CLS]"
    """
    pass
```
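Taken together, the stubs define a handle-based API: every `from_*` loader returns an opaque string handle, which the remaining functions take as their first argument. Below is a minimal usage sketch tying the pieces together; it uses only the functions documented above, but the import path, the tokenizer file name, and the `<|im_end|>` special token are illustrative assumptions, not something this PR specifies.

```python
# Hypothetical usage sketch for the tokenizers API stubbed above.
# Assumptions: the import path, the tokenizer file name, and the
# "<|im_end|>" special token (Qwen-style chat stop token).
from delitepy import tokenizers

# Loaders return an opaque string handle, not a tokenizer object.
tok = tokenizers.from_file("qwen3/tokenizer.json")

# Round-trip a prompt: encode to an INT32 tensor, then decode back to text.
token_ids = tokenizers.encode(tok, "What's the weather in Paris?")
print(token_ids.shape)                    # [num_tokens]
print(tokenizers.decode(tok, token_ids))  # "What's the weather in Paris?"

# Vocabulary introspection, e.g. to locate a chat/tool-calling stop token.
print(tokenizers.get_vocab_size(tok))
stop_id = tokenizers.token_to_id(tok, "<|im_end|>")  # -1 if not in the vocab
if stop_id != -1:
    print(tokenizers.id_to_token(tok, stop_id))
```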
Review comment: `cp -R` is the portable form, compared to `cp -r`.

Reply: I agree