Incomplete fallback in libaz for CPUs without AVX2 #264

@Yorwba

I was curious how well SmallThinker would run on a low-end laptop CPU, so I followed the instructions in the SmallThinker README for compiling on x86, but got the following compilation error:

smallthinker/powerinfer/libaz/az/cpu/vec_dot.cpp:381:31: error: use of undeclared identifier 'mul_sum_i8_pairs'
  381 |         const __m128i i32_0 = mul_sum_i8_pairs(bx_0, by_0);

This seems to be because mul_sum_i8_pairs is used in the SSSE3 fallback branch
https://github.com/SJTU-IPADS/PowerInfer/blob/d3ebd7c5666348cf43c22f0d62dfbc9a763cffb8/smallthinker/powerinfer/libaz/az/cpu/vec_dot.cpp#L358-L381
even though it is only defined when AVX2 is available
https://github.com/SJTU-IPADS/PowerInfer/blob/d3ebd7c5666348cf43c22f0d62dfbc9a763cffb8/smallthinker/powerinfer/libaz/az/cpu/vec_dot.cpp#L12-L17
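
For illustration, here is a minimal sketch of what a 128-bit `mul_sum_i8_pairs` could look like if it were defined for the SSSE3 case as well. This is not the project's code: the `__m128i` signature is inferred from the call site in the error message, the body uses a common sign/`maddubs`/`madd` idiom, and the wrapper `mul_sum_i8_pairs_16` is a hypothetical name added only so the helper can be checked against a scalar loop. It assumes GCC/Clang-style feature macros (`__SSSE3__`) and that no input byte is -128.

```cpp
#include <cstdint>

#if defined(__SSSE3__)
#include <tmmintrin.h>  // SSSE3 intrinsics: _mm_sign_epi8, _mm_maddubs_epi16

// Hypothetical SSSE3 variant: multiplies 16 pairs of signed 8-bit values and
// returns four 32-bit sums, one per group of four adjacent products.
// Assumes no input byte is -128 (its absolute value does not fit in int8).
static inline __m128i mul_sum_i8_pairs(const __m128i x, const __m128i y) {
    const __m128i ax  = _mm_sign_epi8(x, x);        // |x|, since maddubs needs an unsigned operand
    const __m128i sy  = _mm_sign_epi8(y, x);        // y with the sign of x applied
    const __m128i dot = _mm_maddubs_epi16(ax, sy);  // 8 x int16: sums of adjacent product pairs
    return _mm_madd_epi16(dot, _mm_set1_epi16(1));  // 4 x int32: sums of adjacent int16 pairs
}
#endif

// Hypothetical wrapper used only to exercise the helper; falls back to a
// plain scalar loop when SSSE3 is not enabled at compile time.
static inline void mul_sum_i8_pairs_16(const int8_t x[16], const int8_t y[16],
                                       int32_t out[4]) {
#if defined(__SSSE3__)
    const __m128i vx = _mm_loadu_si128(reinterpret_cast<const __m128i*>(x));
    const __m128i vy = _mm_loadu_si128(reinterpret_cast<const __m128i*>(y));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), mul_sum_i8_pairs(vx, vy));
#else
    for (int i = 0; i < 4; ++i) {
        out[i] = 0;
        for (int j = 0; j < 4; ++j) {
            out[i] += int32_t(x[4 * i + j]) * int32_t(y[4 * i + j]);
        }
    }
#endif
}
```

Whether this matches the semantics the AVX2 definition in `vec_dot.cpp` expects would need to be confirmed against the linked lines.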

A similar issue is likely to affect the AVX branch, since it uses sum_i16_pairs_float,
https://github.com/SJTU-IPADS/PowerInfer/blob/d3ebd7c5666348cf43c22f0d62dfbc9a763cffb8/smallthinker/powerinfer/libaz/az/cpu/vec_dot.cpp#L351
which is also only defined for the AVX2 case.

I would fully understand if you don't want to support CPUs without certain vector intrinsics, but it would be nice to have this indicated explicitly.
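
As a rough sketch of what an explicit indication could look like, a compile-time check could report the detected instruction set and fail with a readable message instead of an undeclared-identifier error. Everything named here (`SimdLevel`, `az_simd_level`, `AZ_REQUIRE_AVX2`) is hypothetical and not part of libaz; the feature macros are the ones GCC and Clang define.

```cpp
// Hypothetical names throughout; only the __AVX2__/__AVX__/__SSSE3__ macros
// are real (predefined by GCC and Clang according to the -m flags in use).
enum class SimdLevel { Scalar = 0, SSSE3 = 1, AVX = 2, AVX2 = 3 };

// Reports the highest SIMD level enabled for this translation unit.
constexpr SimdLevel az_simd_level() {
#if defined(__AVX2__)
    return SimdLevel::AVX2;
#elif defined(__AVX__)
    return SimdLevel::AVX;
#elif defined(__SSSE3__)
    return SimdLevel::SSSE3;
#else
    return SimdLevel::Scalar;
#endif
}

// Opt-in guard: when the (hypothetical) AZ_REQUIRE_AVX2 macro is defined,
// building without AVX2 stops here with an explicit message.
#if defined(AZ_REQUIRE_AVX2)
static_assert(az_simd_level() == SimdLevel::AVX2,
              "This build requires AVX2; CPUs without AVX2 are not supported.");
#endif
```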

For reference, my CPU:

$ lscpu | grep "Model name\|Flags"
  • Model name: Intel(R) Pentium(R) Silver N6000 @ 1.10GHz
  • Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm 3dnowprefetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust smep erms rdt_a rdseed smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req vnmi umip waitpkg gfni rdpid movdiri movdir64b md_clear flush_l1d arch_capabilities
