Conversation
I was just downloading it and will give it a try soon.

@ggerganov Any idea what the deal is with this? Getting it locally too, not just on CI.

```
/usr/bin/ld: ../../bin/libllama.so.0.0.1: undefined reference to `bool llama_model_loader::get_key_or_arr<float, 512ul>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::array<float, 512ul>&, unsigned int, bool)'
```
```cpp
        case LLM_ARCH_STEP35:
            {
                ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps);

                hparams.swa_type = LLAMA_SWA_TYPE_STANDARD;

                // MoE + SWA parameters
                ml.get_key(LLM_KV_EXPERT_FEED_FORWARD_LENGTH,        hparams.n_ff_exp);
                ml.get_key(LLM_KV_EXPERT_SHARED_FEED_FORWARD_LENGTH, hparams.n_ff_shexp,           false);
                ml.get_key(LLM_KV_EXPERT_GATING_FUNC,                hparams.expert_gating_func,   false);
                ml.get_key(LLM_KV_EXPERT_WEIGHTS_SCALE,              hparams.expert_weights_scale, false);
                ml.get_key(LLM_KV_EXPERT_WEIGHTS_NORM,               hparams.expert_weights_norm,  false);

                // Step35 uses sigmoid gating by default (if not set in GGUF)
                if (hparams.expert_gating_func == LLAMA_EXPERT_GATING_FUNC_TYPE_NONE) {
                    hparams.expert_gating_func = LLAMA_EXPERT_GATING_FUNC_TYPE_SIGMOID;
                }

                ml.get_key(LLM_KV_ATTENTION_SLIDING_WINDOW, hparams.n_swa);
                ml.get_key_or_arr(LLM_KV_ATTENTION_SLIDING_WINDOW_PATTERN, hparams.swa_layers,           hparams.n_layer);
                ml.get_key_or_arr(LLM_KV_ROPE_DIMENSION_COUNT_PER_LAYER,   hparams.rope_dim_per_layer,   hparams.n_layer);
                ml.get_key_or_arr(LLM_KV_SWIGLU_LIMITS,                    hparams.swiglu_limits,        hparams.n_layer);
                ml.get_key_or_arr(LLM_KV_SWIGLU_LIMITS_SHARED,             hparams.swiglu_limits_shared, hparams.n_layer);

                // Optional: Step35-only gating for applying rope scaling (HF: yarn_only_types).
                // Default is 3 (apply on all layers) if the key is absent.
                ml.get_key(
                    format("%s.rope.scaling.apply_mask", ml.get_arch_name().c_str()),
                    hparams.rope_scaling_apply_mask,
                    false
                );

                hparams.has_rope_freq_base_per_layer = ml.get_key_or_arr(
                    format("%s.rope.freq_base_per_layer", ml.get_arch_name().c_str()),
                    hparams.rope_freq_base_per_layer,
                    hparams.n_layer,
                    false
                );

                type = LLM_TYPE_UNKNOWN;
            } break;
```
Maybe one of these does not compile correctly. On Mac it builds OK, btw.
@pwilkin Thanks a lot for putting this together so quickly, and for taking the time to extract the changes into a clean PR; we really appreciate the interest and effort around step3.5. We're the authors of the step3.5 model, and we already have an official llama.cpp integration prepared, which we're planning to submit very shortly (likely tomorrow). Since the upcoming PR will be the reference implementation we'll be maintaining going forward, we'd prefer to have that land as the primary integration. For that reason, we were hoping this PR could be held off from merging for now.
```cpp
                ml.get_key(
                    format("%s.rope.scaling.apply_mask", ml.get_arch_name().c_str()),
                    hparams.rope_scaling_apply_mask,
                    false
                );
```
I think you will need something like this (`const char *` instead of `std::string`):

```cpp
template bool llama_model_loader::get_key_or_arr<float, 512>(const char *, std::array<float, 512> & result, uint32_t n, bool required);
```
But I think it's better to just put this as a `LLM_KV_*` key, like all the other metadata keys.
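For illustration, a rough sketch of that `LLM_KV_*` route, with a hypothetical `LLM_KV_ROPE_SCALING_APPLY_MASK` entry (the name is an assumption, not from the PR) wired up the same way as the existing `%s.rope.scaling.*` keys in llama-arch:

```cpp
// llama-arch.h: hypothetical new entry in enum llm_kv, next to the other rope keys
LLM_KV_ROPE_SCALING_APPLY_MASK,

// llama-arch.cpp: corresponding format string in LLM_KV_NAMES
{ LLM_KV_ROPE_SCALING_APPLY_MASK, "%s.rope.scaling.apply_mask" },

// llama-model.cpp: the load then goes through the enum overload of get_key,
// avoiding the ad-hoc format(...) call entirely
ml.get_key(LLM_KV_ROPE_SCALING_APPLY_MASK, hparams.rope_scaling_apply_mask, false);
```

Going through the enum overload would also sidestep the `std::string` instantiation the linker was complaining about.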
```python
        apply_mask |= 0x1
        if "sliding_attention" in yarn_only_types:
            apply_mask |= 0x2
        self.gguf_writer.add_uint32(f"{self.gguf_writer.arch}.rope.scaling.apply_mask", int(apply_mask))
```
Better to add this as a `self.gguf_writer.add_*` function.
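For reference, a minimal sketch of such a helper, assuming a hypothetical `add_rope_scaling_apply_mask` method and `Keys.Rope.SCALING_APPLY_MASK` constant (neither exists in gguf-py today), modeled on the existing `add_rope_scaling_*` writers:

```python
# gguf-py/gguf/constants.py: hypothetical key constant, next to the other rope.scaling keys
class Rope:
    SCALING_APPLY_MASK = "{arch}.rope.scaling.apply_mask"

# gguf-py/gguf/gguf_writer.py: hypothetical helper following the existing add_* pattern
def add_rope_scaling_apply_mask(self, value: int) -> None:
    self.add_uint32(Keys.Rope.SCALING_APPLY_MASK.format(arch=self.arch), value)
```

The conversion code above would then collapse to `self.gguf_writer.add_rope_scaling_apply_mask(int(apply_mask))`.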
```python
            raise ValueError(f"Unprocessed experts: {experts}")


@ModelBase.register("Step35ForCausalLM")
```
Should this be:

```python
@ModelBase.register("Step35ForCausalLM")
@ModelBase.register("Step3p5ForCausalLM")
```
Closing as duplicate of #19283 |
The model creators put up a llama.cpp fork, but they did it as a bare directory upload, and the llama.cpp version it is based on is quite far outdated, so I attempted to pull a clean PR from their code. Not guaranteeing it's all correct, but I'm putting it up since there's probably going to be lots of interest in the new model.