Integrate Loop Rolling and add BERT example #57
Draft: jsmonson wants to merge 123 commits into develop from dev/joshmonson/add-loop-rolling
Conversation
* Initial commit
* finn flow: pass absolute path names to finn
* Added scripts for roofline analysis
* Making the output save in the current directory
* Release v0.2.0: enable 4 bits
* Bringing up a branch that is just the plugin framework for the BERT ops that have been added
* Initial cleanup script. Performs some simplification and does some surgery to remove the Dropout layer. For some reason the IdentityOps are not being removed
* Added a simple input arg
* Moving to bert_build
* Added a transformation to reorder the inputs so that the remove-IdentityOp transformation is effective
* Initial cut, laying the groundwork for a plugin-based shuffle convert_to_hw operator
* Getting stubs up for the shuffle op and starting to populate some
* Cleanup and some more asserts to check that the permutation list and shapes match up
* Initial helper functions for shuffle work
* Adding the input_generator for the cases where the inner dimension is not migrating
* Adding the latest version of the ONNX model and combining the cleanup and bringup scripts into a single build script with multiple steps
* Added the infer QuantSoftMax to the pipecleaner build script, renamed the Brevitas script
* First cut at shuffle specialise layer
* Registering Shuffle_hls
* Added convert step that is currently skipped
* Added a step that attempts to specialise layers on the pipecleaner model
* Using fpgapart from the config instead
* Fixed model
* Adding some streamlining steps to the build flow, which are passing through on the modified input model
* Initial commit
* finnbrainsmith integration
* Added a simple README for now
* Fixing typo, thanks @auphelia
* Initial build shuffle tests up
* Populating member functions for getting the dtype and instream/outstream width for HLS generation
* Adding the loop_coeffs to the attribute types dict
* Needed to give nodes unique names to start generating hardware
* Adding a custom HLSBackend where the Tcl generation is overridden so that we can include the hlsextension directory
* Fixing some portname issues in the generated HLS code
* IP successfully building
* Added cppsim support, passed suspiciously easily
* Added some temporary stop-gaps with brainsmith_templates so that we can support vector inputs before they appear in finn/dev
* Fixing loop bound/coefficient zipping ordering
* Reshaping now happening properly and avoiding the cppsim segfault
* Removing IPgen step... for now
* Adding testing from PyTorch for the shuffles
* cppsim from PyTorch to hw is passing
* Ramping up testing for all the shuffle types
* Removing redundant reshape in testing
* First cut at rtlsim support for shuffles
* First shuffle RTLSim tests passing
* Cleaning up the test a little
* Cleaning up the InferShuffle transformation
* Shuffle cppsim codegen cleanup
* Fixing a bug with the shape of the output when a reshape was present
* Needed to increase the liveness threshold to get all the rtlsims to pass
* Bigger bump needed?
* [BugFix] Fixed issue with using the old Brevitas API for quant_act_scale
* Was including the file from the location
* Using the plugin's template now
* Removing a test that doesn't make sense anymore
* Removing INT16 for now, focusing testing on INT8 for the EoY goal
* Adding the latest Brevitas BERT build script and starting work on the cleanup scripts
* Datatype name fix
* cppsim integration
* Fixing issues with the decapitation step
* Added model tail removal custom step
* Cleaning up the cleanup script
* Removing redundant cleanup step
* Adding an endtoend script and updating the README
* Ensuring hashes and branches are consistent in the README
* Added a minimal initial endtoend test
* Test fixed
* Added a switch to the end2end test to attempt IP generation (this is currently failing)
* Extended the test to track how many ops have been successfully specialised, and what percentage
* Have the end2end test export a JSON dashboard file instead for tracking progress
* Refactoring the endtoend test a bit to use fixtures and track progress through the build process
* Updated testing to track various bits
* RTLSim for QuantSoftMax
* Removing prepare_rtlsim stub
* QuantSoftMax RTLSim bugfixes (working now)
* Fix issue of passing datatypes instead of datatype strings
* Adding template types to the treereduction operation
* cppsim compiling; for the half type it required some casting that I was not quite sure about
* Ensure that the context array is np.float32
* Getting stuff working with the latest changes
* Clean up remove-head and add streamlining steps
* Add streamlining steps for softmax
* Add gather to crop
* Fixing linker library paths and include directories for 2024.2 compatibility
* Cleanup
* Tracking individual steps now with fixture dependencies; also added the ability to dump data to the dashboard JSON file
* Refactored testing so that each step in the build flow is a separate pytest fixture. If we want to add a test at any point in the build flow, we can just pass the step fixture in as an argument and the cached build at that specific point will be picked up (see the fixture sketch after this commit list)
* Starting to bring in the default steps
* Generate a test for each step added automatically
* Trying as much of the default flow as possible
* Removing tests that don't make sense right now
* Fixing the custom steps
* Remove call to default convert_to_hw
* Reverting back to old specialise layers
* Need dataflow partition; comment out for now
* Removing duplication of the custom steps for BERT and duplicated scripts
* Updating endtoend script to include some of the default steps
* Commenting out the last few steps for now
* Add a check at the end to see if HLS synth went okay
* Dashboard JSON data update
* Cleaning up the custom steps
* Docstring explanations of the custom_steps required for BERT; also cleaned up the flow a bit
* Bringing up validation testing of some of the steps
* Adding a Python execution model for the shuffle
* Added a small validation function that, when a test fails, examines the contexts and shows what is the same and what differs
* Silly mistake with the shuffle execute: it was not writing the result back into the context but was returning it (see the execute_node sketch after this commit list)
* Elemwise integration
* Adding a UINT8 testcase, which is the same as the BERT model
* Increasing the timeout on softmax tests
* Changing paths to match the new 2024.2 directory structure
* Keep things float32 for now
* Fixing a case issue on the SIMD attribute allowed the compilation to go further
* Boilerplate prepare_rtlsim is okay now; removing the overridden version
* Input INT8, 2024.2 update
* FuncLayerNorm bugfix and FLOAT32 testcase
* "exec_mode" fix and code cleanup
* Merge feature/plugin/layernorm_stf
* Support multiple lines
* Added a template parameter to enable/disable the quant stage at the end of the softmax
* Adjusting the nodeattr for shuffle so that it is compatible with the set_target_fps transformation
* QuantSoftMax nodeattr compatibility with the set_target_fps transformation
* Adding nodeattr so that layernorm is compatible with set_target_fps transformations
* simd to SIMD
* Non-Quant softmax passing cppsim
* Validation is having a lot more success with HWSoftMax than with QuantSoftMax
* Reintroducing some essential streamlining steps; validation looking a lot better
* Endtoend up, without fps_target yet
* Integer cycles to stop an issue in set_fifo_depths
* Using the V80 part number for the softmax tests
* Fix for the issue causing the stitched rtlsim stall
* Setting a reasonable fps target for initial pipecleaning
* Fix for inferring the datatypes in the shuffle node, thanks @auphelia
* Adding some configuration files for the BERT end2end flow
* Added some expected input and output npy files
* Removing start step
* Adding correct expected output
* Adding an RTLSim node-by-node test to the pytests; adjusting the configuration for a default build flow
* Adding more rtlsim-based testing to the end2end pytests
* Saving the context of the node-by-node runs under a different dir name
* Generate a reference IO each time, due to randomly generated weights in the Brevitas script
* Adding a custom step that generates the reference IO for each run for validation
* The SIMD parameter for shuffles in testing is now properly being set; some tests are now failing cppsim and need fixing
* Not every loop coeff should be divided by SIMD
* Fixed the shuffle SIMD issue
* Making more command-line arguments available for the parameter sweeping for the bert_build demo scripts
* Whoops, left in a note
* Removing the custom debugging steps from the build flow
* Adding an example bash script to sweep over some parameters
* Added a simple script to print the results of the param sweep
* Cleaning up to remove a C++17 warning
* Tidying up comments/warnings for demos
* Using board instead of fpga_part
* Making the output look a bit neater
* Removing unused validation steps
* Fix param sweep
* Slight tweak to the example param sweep script
* Adding a makefile and configs for some single-layer and three-layer configurations
* We have some large FIFOs in these builds that need to be split
* Updating the Brevitas model as per @nfraser's suggestion
* Fix circular make dependency
* Works using later QONNX changes
* New FIFO depth configurations for the three layers; the folding configuration might not match the main plugin version, though
* Added new preconfigured designs for the latest Brevitas changes
* Adding license file headers
* Updating to the correct link in the setup instructions
* Tidying up QuantSoftMax/SoftMax
* Cleaning up utils and testing
* Cleaning up endtoend pytesting
* Adding back in the bitwidth option for the parameter sweep with the new model generation
* Added a parameter for changing the sequence length
* Skipping the LN test for now
* Changed the artifact naming convention a little
* Remove extraneous implementation of QuantizeLayerNormalization
* Added a script to generate a config (pre FIFO depth sizing) for a particular folding configuration as we explore the DSE side of the BERT build
* Added a makefile recipe for a maximum-folding three-layer design for passing to the RW team
* Adjusting the number of layers on the design
* Manually control the FIFO depth stage, instead of setting it, if a param file is present
* Need to come up with better arg naming for parameters; maybe just enforce longargs?
* Makefile recipes use the generation script for various SIMD/PE configurations rather than prebaking them
---------
Co-authored-by: aziz bahri <azizb@amd.com>
Co-authored-by: azizb-xlnx <48930381+azizb-xlnx@users.noreply.github.com>
Co-authored-by: root <root@TAFK>
Co-authored-by: Thomas Keller <thomaskeller@microsoft.com>
Co-authored-by: auphelia <jakobapk@web.de>
Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
Co-authored-by: jsmonson <jsmonson@gmail.com>
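The per-step pytest fixture refactor described above works roughly like the sketch below: each build step becomes a module-scoped fixture, so any test can request the flow at a given point and reuse the cached result. Step and file names here are hypothetical, not the real bert_build step names.

```python
import pytest
from qonnx.core.modelwrapper import ModelWrapper

# Hypothetical step functions; the real ones live in the BERT build flow.
from bert_build_steps import step_cleanup, step_streamline, step_convert_to_hw


@pytest.fixture(scope="module")
def initial_model():
    # Assumed starting point: the ONNX model exported by the Brevitas script.
    return ModelWrapper("bert_quant.onnx")


@pytest.fixture(scope="module")
def cleaned_model(initial_model):
    # Each fixture runs one build step; module scope caches the result so
    # later fixtures and tests pick up the build at that specific point.
    return step_cleanup(initial_model)


@pytest.fixture(scope="module")
def streamlined_model(cleaned_model):
    return step_streamline(cleaned_model)


@pytest.fixture(scope="module")
def hw_model(streamlined_model):
    return step_convert_to_hw(streamlined_model)


def test_no_identity_ops_after_streamlining(streamlined_model):
    # A test at any point in the flow just requests the matching fixture.
    assert all(n.op_type != "Identity" for n in streamlined_model.graph.node)
```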
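The shuffle execute bug is worth spelling out: in FINN/QONNX, a custom op's `execute_node` receives an execution context dict and must write its outputs into it; a returned value is silently ignored. A minimal sketch of the fix, with an illustrative transpose standing in for the real Shuffle implementation (the `perm` nodeattr name is assumed; in the real code `self.onnx_node` and `get_nodeattr` come from the HWCustomOp base class):

```python
import numpy as np


class ShuffleSketch:
    """Illustrative stand-in for the real Shuffle custom op."""

    def execute_node(self, context, graph):
        node = self.onnx_node
        inp = context[node.input[0]]
        # "perm" is an assumed nodeattr holding the permutation list.
        perm = self.get_nodeattr("perm")
        result = np.transpose(inp, axes=perm)
        # The bug: `return result` dropped the output, because callers only
        # read the context dict. The fix writes the result back into it
        # (kept float32, matching the "keep things float32 for now" commit):
        context[node.output[0]] = result.astype(np.float32)
```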
* Added extra arguments to reflect the latest change in finn/custom/transformer that enables you to override the number of inferences that the FIFO depth sizing stage performs
* Fixing the recipes and simplifying
* Improvements to SoftMax hardware efficiency, and adding support for ap_float<W,I> datatypes
* Fixes and compiler integration for the new SoftMax
* Fixing license header
…es on three layer designs (#9)
* Adding a check to make sure that we don't accidentally set SIMD for shuffleB yet; also updated the config generation so that we do not accidentally set the wrong shuffle in later layers
* Cleaning up the build scripts a little, thanks @auphelia
* Moving the constraining of shuffle parameters and pumpedCompute to temporary custom transformations so that they are more reliable
* Removing the temporary check and relying on the custom pass for now until the parallel transpose op comes online
* Fixed the return type of the custom transformations
* Added cycle testing to softmax test script. Implemented cycle-testing code, which compares the layer's rtlsim cycles with its expected cycles (found using QONNX's ModelWrapper.analysis); see the sketch after this commit message. Copied from https://github.com/Xilinx/finn/blob/00bf8279f2ed20500f3046b395b24c08c8c82325/tests/fpgadataflow/test_fpgadataflow_fmpadding.py
* Updated cycles test op type, imported exp_cycles_per_layer
  - The rtlsim cycles test for the softmax custom op was failing due to the incorrect op type string being used ("FMPadding" instead of "HWSoftmax").
  - The FINN method, exp_cycles_per_layer, was not imported, causing the test to fail.
* Implemented cycles test for Shuffle custom op
  - Added a test to test_fpgadataflow_shuffle.py which compares the Shuffle node's expected cycles with the rtlsim's output cycles.
  - Ran this test; it currently fails. The expected cycles (12288) do not fall within a tolerance of 10 of the rtlsim cycles (23475).
* Implemented alternate LayerNorm test script
  - The existing LayerNorm test is incomplete and doesn't execute. To bridge the gap in testing, a new test was written based on other custom-operation tests.
  - The new test, test_fpga_dataflow_layernorm_hw_custom_op(), is in the same file as the old test.
  - The cppsim version of the test currently passes. The rtlsim version fails due to the expected cycles (456) not matching the simulated cycles (63516). Testing was done using the [ifm_dim0-rtlsim-INT9-simd4-hls] configuration.
* Removed rtlsim_trace from LayerNorm, updated comments. Implemented reviewer-suggested changes:
  - Removed the rtlsim_trace attribute from the test's LayerNorm node.
  - Updated comments: in construct_onnx_model()'s header comment, changed "Finn" -> "FINN" and added info about the LayerNorm's Scale and Bias tensors; in test_fpga_dataflow_layernorm_hw_custom_op()'s header comment, explained that this test is missing the inferred eltwise operations.
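The cycle check follows the standard FINN pattern from the linked fmpadding test. Roughly, assuming `model` is a ModelWrapper that has already been through an rtlsim run (the "HWSoftmax" op type string is taken from the commit above):

```python
import numpy as np
from qonnx.custom_op.registry import getCustomOp
from finn.analysis.fpgadataflow.exp_cycles_per_layer import exp_cycles_per_layer

# After rtlsim, each fpgadataflow node records its measured cycle count in
# the "cycles_rtlsim" nodeattr; exp_cycles_per_layer computes the analytical
# expectation for every node in the model.
node = model.get_nodes_by_op_type("HWSoftmax")[0]
inst = getCustomOp(node)
cycles_rtlsim = inst.get_nodeattr("cycles_rtlsim")
exp_cycles_dict = model.analysis(exp_cycles_per_layer)
exp_cycles = exp_cycles_dict[node.name]
# The fmpadding test allows a small absolute tolerance between the
# analytical estimate and the measured rtlsim cycle count.
assert np.isclose(exp_cycles, cycles_rtlsim, atol=10)
assert exp_cycles != 0
```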
…flow (#15)
* Removing the accidentally included start step in the endtoend flow
* Restoring the default bitwidth to 8
Co-authored-by: Thomas Keller <thomaskeller@microsoft.com>
* Include the reference IO as part of the metadata handover
* Typo fix
* Created OpTest class for abstracting CustomOp tests
  - This class helps reduce shared boilerplate code between tests for custom FINN ops.
  - The OpTest class is designed to be inherited by custom test classes. These custom test classes inherit pre-written, commonly used tests and helper functions that make writing tests easier.
  - An example of a test designed using OpTest can be found at the end of `./test/fpgadataflow/test_fpgadataflow_layernorm.py`.
  - While functional, the class is still a work in progress, and more functionality will be added in alignment with the needs of the engineers who use it. (A rough sketch of the pattern follows this commit message.)
* Applied linting
  - Applied linting using black's default settings.
* Created target_fpga fixture, removed prints, added SIMD ids
  - Target FPGA, as used by the model_specialise fixture, is now a fixture which can be overridden by a test class.
  - Removed print statements in op_test.py that were used for debugging.
  - Added IDs to TestLayerNorm's SIMD parameters. Pytest now displays SIMD1, SIMD2, SIMD4 instead of 1, 2, 4. More human-readable!
* Implemented reviewer suggestions, new 'target_node' fixture, improved typing
  - Implemented @STFleming's suggestions: the `exec_mode` comparisons at lines 65 and 68 now use `==` instead of `is`; the reference to `LayerNorm` in the comment at line 173 has been removed; `apply_transforms()` no longer uses an `assert` and instead raises a `RuntimeError`.
  - Implemented a new fixture, `target_node()`. This fixture returns an integer specifying the index in the model of the node we're testing. This means a model can contain nodes/layers other than the one we want to test.
  - Improved typing consistency throughout `op_test.py`: `input_tensors()` and `apply_transforms()` were missing parameter type hints.
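A rough reconstruction of the OpTest pattern described above. The fixture names (`target_fpga`, `target_node`) and the SIMD ids follow the commit messages, but the bodies, the default part string, and the example subclass are assumptions, not the actual op_test.py:

```python
import pytest
from qonnx.core.modelwrapper import ModelWrapper


class OpTest:
    """Sketch of a base class for custom-op tests; subclasses override fixtures."""

    @pytest.fixture
    def target_fpga(self):
        # Assumed default part string; a test class can override this fixture.
        return "xcv80-lsva4737-2MHP-e-S"

    @pytest.fixture
    def target_node(self):
        # Index of the node under test, so the test model may contain
        # other nodes/layers around the one being verified.
        return 0

    def apply_transforms(self, model: ModelWrapper, transforms: list) -> ModelWrapper:
        # Raises rather than asserts, per the reviewer suggestion above.
        for t in transforms:
            model = model.transform(t)
            if not isinstance(model, ModelWrapper):
                raise RuntimeError(f"{type(t).__name__} did not return a ModelWrapper")
        return model


class TestLayerNorm(OpTest):
    # Human-readable SIMD ids, as described in the commit above.
    @pytest.fixture(params=[1, 2, 4], ids=["SIMD1", "SIMD2", "SIMD4"])
    def simd(self, request):
        return request.param
```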
* Formatting bert_build as a job
* Further iteration/brainstorming
* Initial FINN docker transplant
* Adding deps to gitignore
* [Deps] Restructure python github repo installs (#8) Co-authored-by: auphelia <jakobapk@web.de>
* Initial docker structuring for BrainSmith
* Entrypoint path bugfix
* [Docker] Enable interactive mode for docker container (#10)
* Added model profiling scripts
* Hotpatch to remove pyverilator
* Normalize line endings in SUPPORT.md
* finnbrainsmith --> brainsmith/finnlib paths
* Tools folder restructure
* Fix gen_bert paths & name in expand_norms
* Custom QONNX branch to fix is_finn
* Removed old QuantLayerNorm func
* Initial job runner structuring
* Job structure v0, structure for profiling improvements
* Updated readme
* Template path fix
* Unused import and formatting cleanup
* FP IP import fix
* Docker updates for pyxsi
* Pyxsi path fix
* Onnx path + linting fixes
* Removed finnlib, moving up subfolders
* Moved run_job to core for consistency
* Linting cleanup
* Updated README
* Added RTL placeholder
* Typo & gitignore fixes
* Updated finnlib to brainsmith in tests
* bert_steps path fix in tests
* Fix punctuation in README instructions
* Update LICENSE: Brainsmith name fix Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>
* Update LICENSE: Brainsmith name fix 2 Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>
* Update README.md: typo fix Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>
* Brainsmith name fix Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>
* Update brainsmith/tools/README.md: Brainsmith name fix Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>
* Update docker/entrypoint.sh: Brainsmith name fix Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>
* Update docker/entrypoint.sh: Brainsmith name fix Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>
* Removed exec from fetch_repos
* Copyright typo fix
---------
Co-authored-by: Thomas Keller <thomaskeller@microsoft.com>
Co-authored-by: auphelia <jakobapk@web.de>
Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>
* Add custom onnxscript branch
* Add TODO for reconciling onnxscript dependencies
---------
Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
Co-authored-by: Thomas Keller <tkeller787@gmail.com>
* Initial attempt at docker build action
* Added branch name to action
* PR & weekly tests for dev/ci-actions
* Added self-hosted runner
* Adjusted runs-on label
* Path fix
* Added debug to orient pwd
* Added pytest keyword through run-docker.sh
* Fixed license path
* Updated upload-artifacts to v4
* Reorganize bert demo for github action
* Updated run-docker CLI args
* Added e2e test to actions
* Removed build artifacts
* Fix ci.yml run-docker statement
* Removed "push" trigger
* Merge with develop changes and add num workers env variable
* Re-added push trigger for testing
* Fix merge
* Temporarily disabled docker and pytest for e2e validation
* Fix BSMITH_BUILD_DIR env variable
* Remove push trigger, since the PR trigger is sufficient
* Remove testing branches and triggers for PR
* Remove auto-gen docs
* Delete demos/bert/configs/l1_simd12_pe8.json (removed extraneous config from test)
---------
Co-authored-by: Ubuntu <azureuser@brainsmith-dev2.woh15gx5mv0exiu0m5xe0hjytg.dx.internal.cloudapp.net>
* Add custom onnxscript branch
* Fix torch error
* Re-add TODO
---------
Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
* Fix formatting with Copilot
* Fix dynamic matmul config when sizing is not divisible by 3
---------
Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
…me (#31)
* Fix argparse arg that could never be false (see the sketch after this commit message)
* Update fifosizing arg in hw compiler to match the new argument name
---------
Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
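"An argparse arg that could never be false" is usually the classic `type=bool` pitfall: argparse passes the raw command-line string through `bool()`, and every non-empty string, including "False", is truthy. Assuming that was the issue here (the flag name below is illustrative, loosely based on the fifosizing arg mentioned above), a minimal sketch of the usual fix:

```python
import argparse

parser = argparse.ArgumentParser()

# Buggy version: bool("False") == True, so the flag can never be
# disabled from the command line:
#   parser.add_argument("--fifosizing", type=bool, default=True)

# Fix: parse the string explicitly. (Python 3.9+ also offers
# argparse.BooleanOptionalAction for this.)
parser.add_argument(
    "--fifosizing",
    type=lambda s: s.lower() in ("1", "true", "yes"),
    default=True,
)

args = parser.parse_args(["--fifosizing", "false"])
print(args.fifosizing)  # False
```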
Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
* Added cleanup steps and job
* Made num_default_worker an env variable
Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
…metadata_fixes Metadata fixes for trained model
… into dev/sfleming/trainedbert_mlo
update loop_body_hierarchy to a list of lists
…ific MLO information required for shell bringup
…were not tested in the brainsmith environment
…rainsmith into dev/joshmonson/add-loop-rolling
Waiting for the FINN pull request to complete before moving to a full PR.