update master branch from embench-2.0-branch #211
Open
lesteral wants to merge 49 commits into embench:master from
This change removes all floating-point operations from the benchmark,
and reduces the size of the x86 executable to 57k. It also enables
the use of deeper trees (max_depth increased from 4 to 5), which
slightly increases the complexity of the benchmark. Overall
accuracy on the 8x8 downscaled MNIST dataset is 95.82%.
Update xgboost benchmark to use uint8-quantized weights
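To illustrate what integer-only inference over uint8-quantized weights can look like, here is a minimal sketch of walking one decision tree without any floating-point operations. The node layout, the names `sketch_node`, `sketch_eval_tree` and `sketch_tree`, and the sample values are all hypothetical and are not taken from the benchmark source.

```c
#include <stdint.h>

/* Hypothetical node layout; the real benchmark's data structures may differ. */
typedef struct
{
  uint8_t is_leaf;    /* non-zero for a leaf node */
  uint8_t feature;    /* index of the input value to test */
  uint8_t threshold;  /* go left if input < threshold */
  uint8_t left;       /* index of the left child */
  uint8_t right;      /* index of the right child */
  uint8_t weight;     /* quantized leaf weight (leaves only) */
} sketch_node;

/* A tiny illustrative tree: one split on input 0, two leaves. */
static const sketch_node sketch_tree[3] = {
  { 0, 0, 128, 1, 2, 0 },
  { 1, 0, 0, 0, 0, 20 },
  { 1, 0, 0, 0, 0, 200 },
};

/* Walk the tree from the root using only 8-bit integer comparisons and
   return the quantized weight of the leaf that is reached. */
uint8_t
sketch_eval_tree (const sketch_node *nodes, const uint8_t *x)
{
  uint8_t i = 0;

  while (!nodes[i].is_leaf)
    i = (x[nodes[i].feature] < nodes[i].threshold) ? nodes[i].left
                                                   : nodes[i].right;
  return nodes[i].weight;
}
```

In a scheme like this, per-class scores would be accumulated across trees in a wider integer type and compared directly, so no dequantization back to floating point is needed to pick the winning class.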
If we call exit, we end up pulling in the C standard library.
* support/beebsc.c: Use assert_beebs rather than assert with init_heap_beebs.
* support/beebsc.h: Rewrite assert_beebs to not use exit.
Signed-off-by: Jeremy Bennett <jeremy.bennett@embecosm.com>
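As a rough illustration of an assertion that avoids exit, the sketch below routes failures to a small handler instead of a library call. The names `SKETCH_ASSERT`, `sketch_assert_fail` and `sketch_failure_count` are hypothetical, and this is not the actual `assert_beebs` from support/beebsc.h; a bare-metal port might spin forever or trap in the handler, whereas this version only counts failures so that it can be exercised on a host.

```c
/* Hypothetical names throughout; not the code from support/beebsc.h. */
static volatile unsigned sketch_assert_failures;

/* Called when a check fails.  A bare-metal port might spin forever or
   execute a trap instruction here; counting keeps the sketch testable. */
static void
sketch_assert_fail (void)
{
  sketch_assert_failures++;
}

/* Assertion macro that needs neither exit nor any other libc function. */
#define SKETCH_ASSERT(expr) \
  do { if (!(expr)) sketch_assert_fail (); } while (0)

/* Number of failed checks seen so far. */
unsigned
sketch_failure_count (void)
{
  return sketch_assert_failures;
}
```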
We separate out the CPU_MHZ into its two roles. The first uses GLOBAL_SCALE_FACTOR to scale the benchmarks when building so each runs in around 4 seconds. The second is to work out the Embench score per MHz.
We now scale the benchmarks with two nested loops, one for the LOCAL_SCALE_FACTOR and one for the GLOBAL_SCALE_FACTOR. This allows us to not overflow the loop count on 8/16-bit architectures, while being able to scale up to modern big fast machines.
We adjust LOCAL_SCALE_FACTOR values for the benchmarks kept from Embench IoT 1.0 to take account of improvements in compiler performance.
* baseline-data/speed.json: Updated for Embench 2.0.
* benchmark_speed.py: Script updated for new GLOBAL_SCALE_FACTOR; remove parallel execution; new options to generate MD and CSV output; generate total and per MHz scores for relative results.
* doc/README.md: Updated to document GLOBAL_SCALE_FACTOR.
* examples/arm/stm32f4-discovery/README.md: Updated to use GLOBAL_SCALE_FACTOR.
* pylib/embench_core.py: Add MD and CSV to class output_format; move stats output functions to benchmark_speed.py.
* pylib/run_stm32f4-discovery.py: Move --cpu_mhz to benchmark_speed.py, pass args to functions.
* sconstruct.py: Add --gsf option and help text, remove trailing whitespace.
* src/aha-mont64/mont64.c: Use LOCAL_SCALE_FACTOR and GLOBAL_SCALE_FACTOR in nested loop to scale performance.
* src/crc32/crc_32.c: Likewise.
* src/depthconv/depthconv.c: Likewise.
* src/edn/libedn.c: Likewise.
* src/huffbench/libhuffbench.c: Likewise.
* src/matmult-int/matmult-int.c: Likewise.
* src/md5sum/md5.c: Likewise.
* src/nettle-aes/nettle-aes.c: Likewise.
* src/nettle-sha256/nettle-sha256.c: Likewise.
* src/nsichneu/libnsichneu.c: Likewise.
* src/picojpeg/picojpeg_test.c: Likewise.
* src/qrduino/qrtest.c: Likewise.
* src/sglib-combined/combined.c: Likewise.
* src/slre/libslre.c: Likewise.
* src/statemate/libstatemate.c: Likewise.
* src/tarfind/tarfind.c: Likewise.
* src/ud/libud.c: Likewise.
* src/wikisort/libwikisort.c: Likewise.
* src/xgboost/testbench.c: Likewise.
Signed-off-by: Jeremy Bennett <jeremy.bennett@embecosm.com>
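The nested-loop scaling described in this commit can be sketched as follows. This is a minimal illustration rather than the code in the benchmark sources: the names `sketch_run_scaled` and `sketch_benchmark_body` are hypothetical, the scale factors are passed as arguments rather than taken from the build-time LOCAL_SCALE_FACTOR and GLOBAL_SCALE_FACTOR macros, and which loop is the outer one is an assumption.

```c
#include <stdint.h>

/* Hypothetical stand-in for one iteration of a benchmark's work. */
static volatile uint32_t sketch_work_done;

static void
sketch_benchmark_body (void)
{
  sketch_work_done++;
}

/* Run the body local_scale_factor * global_scale_factor times.  Each loop
   counter is only 16 bits wide, so neither overflows even when the total
   iteration count exceeds 65535.  */
uint32_t
sketch_run_scaled (uint16_t local_scale_factor, uint16_t global_scale_factor)
{
  uint16_t g;
  uint16_t l;

  sketch_work_done = 0;
  for (g = 0; g < global_scale_factor; g++)
    for (l = 0; l < local_scale_factor; l++)
      sketch_benchmark_body ();

  return sketch_work_done;
}
```

With a local factor of 1000 and a global factor of 100, the body runs 100000 times, more than a single 16-bit counter could hold, while each individual counter stays well within range.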
* sconstruct.py: Set up the environment from the parent process.
Signed-off-by: Jeremy Bennett <jeremy.bennett@embecosm.com>
The previous data fell foul of the scons config not importing the environment, so it was in fact generated with the system GCC 13.2. This commit has the correct data for GCC 14.1, and adjusts local scale factors accordingly.
* baseline-data/speed.json: Updated data for GCC 14.1.
* src/aha-mont64/mont64.c: Adjust LOCAL_SCALE_FACTOR.
* src/edn/libedn.c: Likewise.
* src/huffbench/libhuffbench.c: Likewise.
* src/matmult-int/matmult-int.c: Likewise.
* src/md5sum/md5.c: Likewise.
* src/nettle-aes/nettle-aes.c: Likewise.
* src/nettle-sha256/nettle-sha256.c: Likewise.
* src/sglib-combined/combined.c: Likewise.
* src/sglib-combined/sglib.h: Likewise, also replace assert by assert_beebs throughout.
* src/slre/libslre.c: Adjust LOCAL_SCALE_FACTOR.
* src/statemate/libstatemate.c: Likewise.
* src/tarfind/tarfind.c: Likewise.
* src/ud/libud.c: Likewise.
* src/wikisort/libwikisort.c: Likewise.
Signed-off-by: Jeremy Bennett <jeremy.bennett@embecosm.com>
* baseline-data/size.json: Updated values for Embench 2.0.
* benchmark_size.py: Extend to measure BSS separately, add CSV and Markdown output formats, generate statistics for relative runs.
Signed-off-by: Jeremy Bennett <jeremy.bennett@embecosm.com>
* benchmark_speed.py (benchmark_speed): Ensure res is set before use.
* pylib/run_stm32f4-discovery.py: Add dictionary of exported functions.
Signed-off-by: Jeremy Bennett <jeremy.bennett@embecosm.com>
We have updated the defaults to be based on garbage collection of
unused sections. The baseline data for speed is from a run configured
with:
scons --config-dir=examples/arm/stm32f4-discovery/ \
cc=arm-none-eabi-gcc \
cflags='-O2 -mcpu=cortex-m4 -mthumb -mfloat-abi=soft -ffunction-sections -fdata-sections' \
ldflags='-O2 -Wl,--gc-sections -mcpu=cortex-m4 -mthumb -mfloat-abi=soft -T${CONFIG_DIR}/STM32F407IGHX_FLASH.ld -L${CONFIG_DIR} -static -nostartfiles' \
user_libs='m startup' gsf=16
with results collected using:
./benchmark_speed.py --target-module run_stm32f4-discovery \
--gdb-command gdb-multiarch --cpu-mhz 16 --gsf 16 --absolute \
--baseline-output
The baseline for size is from a run configured with:
scons --config-dir=examples/arm/stm32f4-discovery/ cc=arm-none-eabi-gcc \
cflags='-Os -ffunction-sections -fdata-sections -mcpu=cortex-m4 -mfloat-abi=soft -mthumb ' \
ldflags='-Os -Wl,--gc-sections -mcpu=cortex-m4 -mfloat-abi=soft -mthumb -T${CONFIG_DIR}/STM32F407IGHX_FLASH.ld -L${CONFIG_DIR} -static -nostartfiles' \
user_libs='m startup' gsf=1
with results collected using:
./benchmark_size.py --absolute --baseline-output
* baseline-data/size.json: Update data.
* baseline-data/speed.json: Likewise.
Signed-off-by: Jeremy Bennett <jeremy.bennett@embecosm.com>
This is a read-through to clarify wording and ensure consistency for Embench 2.0 and its Arm reference board.
* README.md: Updated for Embench 2.0.
* doc/Makefile: Correct spelling of hunspell dictionary.
* doc/README.md: Updated for Embench 2.0.
* doc/custom.wordlist: Add new words needed for updated documentation.
* examples/arm/stm32f4-discovery/README.md: Updated for Embench 2.0.
Signed-off-by: Jeremy Bennett <jeremy.bennett@embecosm.com>
* examples/riscv32/cv32e40pv2fpga/README.md: Created.
* examples/riscv32/cv32e40pv2fpga/boardsupport.c: Created.
* examples/riscv32/cv32e40pv2fpga/boardsupport.h: Created.
* examples/riscv32/cv32e40pv2fpga/link.ld: Created.
* examples/riscv32/cv32e40pv2fpga/openocd-nexys-hs2.cfg: Created.
* examples/riscv32/cv32e40pv2fpga/unilink.ld: Created.
Signed-off-by: Jeremy Bennett <jeremy.bennett@embecosm.com>
@jeremybennett - Can you please update the master branch from the embench-2.0-branch, as per this PR?
Thanks & regards, Lester