
Added support for SalamandraTA 7B instructed #10

Merged
carlosep93 merged 3 commits into projecte-aina:main from carlosep93:main
Apr 22, 2025

@carlosep93
Contributor

This PR adds support for the SalamandraTA 7B Instruct models. The main changes are:

  • Added a new model type, salamandra_instruct, intended to support both the current 7B model and a future 2B model.
  • Modified the config files so that the API downloads SalamandraTA 7B.

How to run the API:

python main.py --load MULTI-MULTI-salamandra_instruct     

Here, MULTI-MULTI-salamandra_instruct means: multilingual source, multilingual target, and the salamandra_instruct model type.
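The key format can be illustrated with a small parser. This is only a sketch, assuming the first two hyphen-separated fields are always the source and target and everything after them is the model type; `parse_load_key` and `LoadKey` are hypothetical names, not the project's actual code in main.py.

```python
from typing import NamedTuple


class LoadKey(NamedTuple):
    src: str         # source side (MULTI = multilingual)
    tgt: str         # target side (MULTI = multilingual)
    model_type: str  # e.g. salamandra_instruct


def parse_load_key(key: str) -> LoadKey:
    """Split a key such as 'MULTI-MULTI-salamandra_instruct' (hypothetical helper)."""
    # maxsplit=2 keeps any further hyphens inside the model type intact
    parts = key.split("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected SRC-TGT-MODEL_TYPE, got {key!r}")
    return LoadKey(*parts)


print(parse_load_key("MULTI-MULTI-salamandra_instruct"))
# → LoadKey(src='MULTI', tgt='MULTI', model_type='salamandra_instruct')
```

Splitting with a maximum of two splits means a model type containing hyphens would still be kept whole, while underscores (as in salamandra_instruct) need no special handling.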

To send a request to the model:

curl --location --request POST 'http://0.0.0.0:8000/api/v1/translate' --header 'Content-Type: application/json' --data-raw '{"src":"ca", "tgt":"es", "text":"Això és una prova."}'
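For clients written in Python, the same request can be sent with the standard library alone. This is a minimal sketch: the URL is copied from the curl example, `build_translate_request` and `translate` are hypothetical helper names, and because the response schema is not described in this PR, the sketch returns the raw response body instead of guessing field names.

```python
import json
import urllib.request

# Host, port and path copied from the curl example; adjust for your deployment.
API_URL = "http://0.0.0.0:8000/api/v1/translate"


def build_translate_request(src: str, tgt: str, text: str,
                            url: str = API_URL) -> urllib.request.Request:
    """Build the same JSON POST request as the curl example (hypothetical helper)."""
    payload = json.dumps({"src": src, "tgt": tgt, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(src: str, tgt: str, text: str,
              url: str = API_URL, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    req = build_translate_request(src, tgt, text, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the server started via `python main.py --load MULTI-MULTI-salamandra_instruct`, calling `print(translate("ca", "es", "Això és una prova."))` should print whatever JSON the API returns.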

is_model_loaded, is_tokenizer_loaded = False, False

def translator(src_texts, src, tgt):
    print(lang_map)

nit: remove print

def translator(src_texts, src, tgt):
    print(lang_map)
    if lang_map:
        src = lang_map.get(src) if src in lang_map else src

question: do we want to use the provided src value if it is not in the language map?

Contributor Author

I believe this part comes from the bilingual models. Without the source language, the API cannot choose which model to use.
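The lookup discussed in this thread maps the request's `src` through `lang_map` when an entry exists and otherwise passes the raw value through unchanged. A minimal sketch of that fallback follows; `resolve_lang` is a hypothetical name, and the map contents are placeholders, since the real `lang_map` is not shown in this diff.

```python
from typing import Mapping, Optional


def resolve_lang(code: str, lang_map: Optional[Mapping[str, str]]) -> str:
    """Return lang_map[code] if present, else code unchanged.

    Same behavior as: lang_map.get(code) if code in lang_map else code
    """
    if not lang_map:
        # Empty or missing map: keep the value from the request as-is.
        return code
    # dict.get with a default expresses the same fallback in a single call.
    return lang_map.get(code, code)


# Hypothetical map contents, for illustration only.
example_map = {"ca": "Catalan", "es": "Spanish"}
print(resolve_lang("ca", example_map))  # Catalan
print(resolve_lang("xx", example_map))  # xx (not in the map, passed through)
```

Passing unknown codes through, rather than raising, keeps the bilingual code path working, since that path needs the original source value to select a model.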

    generated_text = tokenizer.decode(outputs[0, input_length:], skip_special_tokens=True)
    return generated_text

    return [salamandra_inst_translator(text, src, tgt, max_length=400)

question: what will the src value be if there is no lang_map?

Contributor Author

By default it would be "es" (Spanish), if I'm not mistaken.

carlosep93 merged commit 5159b95 into projecte-aina:main on Apr 22, 2025
1 check passed