
Conversation

@jsmonson
Contributor

No description provided.

microsoft-github-operations bot and others added 30 commits October 8, 2024 00:07
* Initial commit

* finn flow: pass absolute path names to finn

* Added scripts for roofline analysis

* Making the output save in the current directory

* release v0.2.0

Enable 4 bits

* Bringing up a branch that is just the plugin framework for the BERT ops that have been added

* Initial cleanup script. Performs some simplification and does some surgery to remove the Dropout layer. For some reason the IdentityOps are not being removed

* Added a simple input arg

* Moving to bert_build

* Added a transformation to reorder the inputs so that the remove IdentityOP transformation is effective.

* Initial cut and laying the groundwork for plugin-based shuffle convert_to_hw operator

* Getting stubs up for shuffle op and starting to populate some

* Cleanup and some more asserts to check permutation list and shapes match up

* Initial helper functions for shuffle work

* Adding the input_generator for the cases where the inner dimension is not migrating.

* Adding latest version of the onnx model and combining cleanup and bringup scripts into a single build script with multiple steps.

* Added the infer QuantSoftMax to the pipecleaner build script, renamed the brevitas script

* First cut at shuffle specialise layer

* Registering Shuffle_hls

* Added convert step that is currently skipped

* Added a step that attempts to specialise layers on the pipecleaner model

* Using fpgapart from the config instead

* fixed model

* adding some streamlining steps to the build flow which are passing through on the modified input model

* Initial commit

* finnbrainsmith integration

* Added a simple README for now

* fixing typo, thanks @auphelia

* Initial build shuffle tests up

* populating member functions for getting the dtype and instream/outstream width for HLS generation

* Adding the loop_coeffs to the attribute types dict

* Needed to give nodes unique names to start generating hardware

* Adding a custom HLSBackend where the tcl generation is overridden so that we can include the hlsextension directory

* Fixing some portname issues in the generated HLS code

* IP successfully building

* Added cppsim support, passed suspiciously easily

* Added some temporary stop-gaps with a brainsmith_templates so that we can support vector inputs before they appear in finn/dev

* Fixing loop bound/coefficient zipping ordering

* Reshaping now happening properly and avoiding cppsim segfault

* removing IPgen step... for now...

* Adding testing from pytorch for the shuffles

* cppsim from pytorch to hw is passing

* Ramping up testing for all the shuffle types

* Removing redundant reshape in testing

* First cut at rtlsim support for shuffles

* First shuffle RTLSim tests passing

* cleaning up the test a little

* Cleaning up the InferShuffle transformation

* shuffle cppsim codegen cleanup

* fixing bug with shape of output when a reshape was present

* Needed to increase the liveness threshold to get all the rtlsims to pass

* Bigger bump needed?

* [BugFix] Fixed issue with using old Brevitas API for quant_act_scale.

* Was including the file from the location

* Using the plugin's template now

* Removing test that doesn't make sense anymore

* Removing INT16 for now, focusing testing on INT8 for the EoY goal

* Adding the latest Brevitas bert build script and starting work on the cleanup scripts

* Datatype name fix

* cppsim integration

* Fixing issues with the decapitation step

* Added model tail removal custom step

* Cleaning up the cleanup script

* Removing redundant cleanup step

* Adding an endtoend script and updating the README

* Ensuring hashes and branches are consistent in the README

* Added a minimal initial endtoend test

* test fixed

* Added a switch to end2end test to attempt IP generation (this is currently failing)

* Extended the test to track how many ops have been successfully specialised and what percentage

* Have the end2end test export a json dashboard file instead for tracking progress.

* refactoring the endtoend test a bit to use fixtures and track progress through the build process

* Updated testing to track various bits

* RTLSim for QuantSoftMax

* Removing prepare_rtlsim stub

* QuantSoftMax RTLSim bugfixes (working now)

* fix issue of passing datatypes instead of datatype strings

* Adding template types to the treereduction operation

* cppsim compiling; for the half type it required some casting that I was not quite sure about.

* ensure that the context array is np.float32

* Getting stuff working with the latest changes

* Clean up remove head and add streamlining steps

* Add streamlining steps for softmax

* add gather to crop

* Fixing linker library paths and include directories for 2024.2 compatibility

* Cleanup

* tracking individual steps now with fixture dependencies, also added the ability to dump data to the dashboard JSON file

* Refactored testing so that each step in the build flow is a separate pytest fixture. If we want to add a test at any point in the build flow, we can just pass the step fixture in as an argument and the cached build at that specific point will be picked up
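
  A minimal sketch of that fixture-chaining idea (the helper and fixture names here are illustrative, not the actual Brainsmith test code): each build step is a module-scoped fixture that depends on the previous one, so requesting any step's fixture picks up the cached build at that point.

  ```python
  import pytest

  # Hypothetical step helpers standing in for the real build-step functions.
  def run_cleanup_step(build_dir):
      return {"stage": "cleanup", "build_dir": str(build_dir)}

  def run_streamlining_step(model):
      return {**model, "stage": "streamline"}

  def run_specialise_layers_step(model):
      return {**model, "stage": "specialise"}

  @pytest.fixture(scope="module")
  def cleanup_model(tmp_path_factory):
      build_dir = tmp_path_factory.mktemp("build")
      return run_cleanup_step(build_dir)

  @pytest.fixture(scope="module")
  def streamlined_model(cleanup_model):
      return run_streamlining_step(cleanup_model)

  @pytest.fixture(scope="module")
  def specialised_model(streamlined_model):
      return run_specialise_layers_step(streamlined_model)

  def test_after_specialise(specialised_model):
      # A test hooks into any point of the flow by requesting that step's fixture;
      # module-scoped fixtures are cached, so earlier steps run only once.
      assert specialised_model["stage"] == "specialise"
  ```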

* Starting to bring in the default steps

* Generate a test for each step added automatically

* Trying as much of the default flow as possible

* removing tests that don't make sense right now

* fixing the custom steps

* Remove call to default convert_to_hw

* Reverting back to old specialise layers

* need dataflow partition, comment out for now

* Removing duplication of the custom steps for BERT and duplicated scripts

* updating endtoend script to include some of the default steps

* commenting out the last few steps for now

* Add a check at the end to see if hls synth went okay

* dashboard json data update

* Cleaning up the custom steps

* Docstring explanations of the custom_steps required for BERT; also cleaned up the flow a bit

* bringing up validation testing of some of the steps

* Adding python execution model for the shuffle

* Added a small validation function that, when a test fails, will examine the contexts and show what is the same and what differs

* Silly mistake with the shuffle execute, it was not writing the result back into the context but was returning it

* Elemwise integration

* Adding UINT8 testcase which is the same as the BERT model

* Increasing the timeout on softmax tests

* Changing paths to match new 2024.2 directory structure

* keep things float32 for now

* Fixing case issue on SIMD attribute allowed the compilation to go further

* boilerplate prepare_rtlsim is okay now, removing overridden version

* Input int8, 2024.2 update

* FuncLayerNorm bugfix and FLOAT32 testcase

* "exec_mode" fix and code cleanup

* Merge feature/plugin/layernorm_stf

* support multiple lines

* Added template parameter to enable/disable the quant stage at the end of the softmax

* Adjusting the nodeattr for shuffle so that it is compatible with the set_target_fps transformation

* QuantSoftMax nodeattr compatibility with set_fps_target transformation

* Adding nodeattr so that layernorm is compatible with set_target_fps transformations

* simd to SIMD

* Non Quant softmax passing cppsim

* Validation is having a lot more success with HWSoftMax rather than QuantSoftMax

* reintroducing some essential streamlining steps, validation looking a lot better

* Endtoend up without fps_target yet

* integer cycles to stop issue in set_fifo_depths

* Using the v80 part number for the softmax tests

* Fix for the issue causing the stitched rtl sim stall

* Setting reasonable fps target for initial pipecleaning

* Fix for inferring the datatypes in the shuffle node, thanks @auphelia

* Adding some configuration files for the bert end2end flow

* Added some expected input and output npy files

* Removing start step

* Adding correct expected output

* Adding an RTLSim node-by-node test to the pytests. Adjusting the configuration for a default build flow.

* Adding more rtlsim based testing to the end2end pytests

* Saving the context of the node-by-node runs under a different dir name

* generate a reference IO each time due to randomly generated weights in brevitas script

* Adding a custom step that generates the reference IO for each run for validation

* SIMD parameter for shuffles in testing is now properly being set, some tests are now failing cppsim and need fixing

* Not every loop coeff should be divided by simd

* Fixed the shuffle SIMD issue

* Making more command line arguments available for the parameter sweeping for the bert_build demo scripts

* Whoops, left in a note

* Removing the custom debugging steps from the build flow

* Adding an example bash script to sweep over some parameters.

* Added a simple script to print the results of param sweep

* Cleaning up to remove c++17 warning

* Tidying up comments / warnings for demos

* Using board instead of fpga_part

* Making the output look a bit neater

* Removing unused validation steps

* fix param sweep

* Slight tweak to example param sweep script

* Adding a makefile and configs for some single layer and three layer configurations.

* We have some large fifos in these builds that need to be split.

* Updating the Brevitas model as per @nfraser suggestion

* Fix circular make dependency

* Works using later qonnx changes

* New FIFO depth configurations for the three layers, folding configuration might not match the main plugin version though.

* Added new preconfigured designs for latest brevitas changes.

* Adding license file headers

* updating to correct link in setup instructions

* Tidying up QuantSoftMax/SoftMax

* Cleaning up utils and testing

* Cleaning up endtoend pytesting

* Adding back in the bitwidth option for the parameter sweep with the new model generation

* Added a parameter for changing the sequence length

* Skipping LN test for now

* Changed the artifact naming convention a little

* Remove extraneous implementation of QuantizeLayerNormalization

* Added a script to generate a config (pre FIFO depth sizing) for a particular folding configuration as we explore the DSE side of the Bert build

* Added a makefile recipe for a maximum folding three layer design for passing to RW team

* Adjusting number of layers on the design

* Manually control the fifo depth stage instead of setting it if a param file is present

* Need to come up with better arg naming for parameters, maybe just enforce longargs?

* Makefile recipes use the generation script for various SIMD/PE configurations rather than prebaking them

---------

Co-authored-by: aziz bahri <azizb@amd.com>
Co-authored-by: azizb-xlnx <48930381+azizb-xlnx@users.noreply.github.com>
Co-authored-by: root <root@TAFK>
Co-authored-by: Thomas Keller <thomaskeller@microsoft.com>
Co-authored-by: auphelia <jakobapk@web.de>
Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
Co-authored-by: jsmonson <jsmonson@gmail.com>
* Added extra arguments to reflect the latest change in finn/custom/transformer that enables you to override the number of inferences that the fifo depth sizing stage performs.

* Fixing the recipes and simplifying
* Improvements to SoftMax hardware efficiency and also adding support for ap_float<W,I> datatypes.

* Fixes and compiler integration for new SoftMax

* fixing license header
…es on three layer designs (#9)

* Adding check to make sure that we don't accidentally set SIMD for shuffleB yet, also updated the config generation so that we do not accidentally set the wrong shuffle in later layers

* Cleaning up the build scripts a little thanks @auphelia

* Moving the constraining of shuffle parameters and pumpedCompute to temporary custom transformations so that they are more reliable

* Removing the temporary check and relying on the custom pass for now until the parallel transpose op comes online

* Fixed the return type of the custom transformations
* Added cycle testing to softmax test script
Implemented cycle testing code, which compares the layer's rtlsim cycles with its expected cycles (found using QONNX's ModelWrapper.analysis).
Copied from https://github.com/Xilinx/finn/blob/00bf8279f2ed20500f3046b395b24c08c8c82325/tests/fpgadataflow/test_fpgadataflow_fmpadding.py

* Updated cycles test op type, imported exp_cycles_per_layer
- The rtlsim cycles test for the softmax custom op was failing due to the incorrect op type string being used ("FMPadding" instead of "HWSoftmax").
- The FINN method, exp_cycles_per_layer, was not imported, causing the test to fail.
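
A hedged sketch of that cycle check, following the pattern of the linked FMPadding test (the helper name and tolerance are illustrative; FINN fills in the `cycles_rtlsim` node attribute after rtlsim):

```python
import numpy as np
from qonnx.custom_op.registry import getCustomOp
from finn.analysis.fpgadataflow.exp_cycles_per_layer import exp_cycles_per_layer

def check_rtlsim_cycles(model, op_type="HWSoftmax", tolerance=10):
    # Expected cycles per layer, keyed by node name, via ModelWrapper.analysis().
    exp_cycles_dict = model.analysis(exp_cycles_per_layer)
    node = model.get_nodes_by_op_type(op_type)[0]
    # Cycle count recorded on the node during rtlsim.
    cycles_rtlsim = getCustomOp(node).get_nodeattr("cycles_rtlsim")
    exp_cycles = exp_cycles_dict[node.name]
    assert np.isclose(exp_cycles, cycles_rtlsim, atol=tolerance)
    assert exp_cycles != 0
```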

* Implemented cycles test for Shuffle custom op
- Implemented test to test_fpgadataflow_shuffle.py which compares the Shuffle node's expected cycles with the rtlsim's outputted cycles.
- Ran this test, it currently fails. The expected cycles (12288) do not fall within a tolerance of 10 of the rtlsim cycles (23475).

* Implemented alternate LayerNorm test script
- The existing LayerNorm test is incomplete, and doesn't execute. To bridge the gap in testing, a new test was written based on other custom operations tests.
- The new test, test_fpga_dataflow_layernorm_hw_custom_op(), is in the same file as the old test.
- The cppsim version of the test currently passes. The rtlsim version fails due to the expected cycles (456) not matching the simulated cycles (63516). Testing was done using the [ifm_dim0-rtlsim-INT9-simd4-hls] configuration.

* Removed rtlsim_trace from LayerNorm, updated comments
Implemented reviewer suggested changes:
- Removed rtlsim_trace attribute from the test's LayerNorm node.
- Updated comments:
  - In construct_onnx_model()'s header comment, changed "Finn" -> "FINN", added info about the LayerNorm's Scale and Bias tensors.
  - In test_fpga_dataflow_layernorm_hw_custom_op()'s header comment, explained that this test is missing the inferred eltwise operations.
…flow (#15)

* Removing the accidentally included startstep in the endtoend flow

* Restoring the default to 8 for bitwidth
Co-authored-by: Thomas Keller <thomaskeller@microsoft.com>
* Include the reference IO as part of the metadata handover

* typo fix

* Created OpTest class for abstracting CustomOp tests
- This class helps reduce shared boilerplate code between tests for custom FINN ops.
- The OpTest class is designed to be inherited by custom test classes. These custom test classes will inherit pre-written, commonly used tests and helper functions that make writing tests easier.
- An example of a test designed using OpTest can be found at the end of `./test/fpgadataflow/test_fpgadataflow_layernorm.py`.
- While functional, the class is still a work in progress, and more functionality will be added in alignment with the needs of the engineers who use it.
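
An illustrative sketch of the OpTest pattern (not the actual op_test.py; the class body, fixture defaults, and part strings here are assumptions):

```python
import pytest

class OpTest:
    """Base class holding boilerplate shared by CustomOp tests."""

    @pytest.fixture
    def target_fpga(self):
        # Default part string; test classes can override this fixture.
        return "xczu7ev-ffvc1156-2-e"  # example part, not necessarily the one used

    def apply_transforms(self, model, transforms):
        # Shared helper: apply a list of QONNX/FINN transformations in order.
        for transform in transforms:
            model = model.transform(transform)
        return model

class TestLayerNorm(OpTest):
    # Inherits the common tests and helpers; only op-specific pieces live here.
    @pytest.fixture(params=[1, 2, 4], ids=["SIMD1", "SIMD2", "SIMD4"])
    def simd(self, request):
        return request.param
```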

* Applied linting
- Applied linting using black's default settings.

* Created target_fpga fixture, removed prints, added SIMD ids
- Target FPGA, as used by the model_specialise fixture, is now a fixture, which can be overridden by a test class.
- Removed print statements in op_test.py that were used for debugging
- Added IDs to TestLayerNorms SIMD parameters. Pytest now displays SIMD1, SIMD2, SIMD4, instead of 1, 2, 4. More human-readable!

* Implemented reviewer suggestions, new 'target_node' fixture, improved typing
- Implemented @STFleming 's suggestions:
  - The `exec_mode` comparisons at lines 65 and 68 now use `==` instead of `is`.
  - The reference to `LayerNorm` in the comment at line 173 has been removed.
  - `apply_transforms()` no longer uses an `assert`, instead it raises a `RuntimeError`.
- Implemented a new fixture, `target_node()`. This fixture returns an integer specifying the index in the model of the node we're testing. This means a model can contain nodes/layers other than the one we want to test.
- Improved typing consistency throughout 'op_test.py': `input_tensors()` and `apply_transforms()` were missing parameter type hints.
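
A small sketch of the `target_node` idea (illustrative; it builds on the OpTest sketch above, and the `model` fixture is assumed to be provided by OpTest):

```python
import pytest

class TestLayerNormWithEltwise(OpTest):  # OpTest as sketched earlier
    @pytest.fixture
    def target_node(self):
        # Index of the node under test; the model may contain other nodes too,
        # e.g. node 0 could be a preceding eltwise op and node 1 the LayerNorm.
        return 1

    def test_op_type(self, model, target_node):
        # 'model' is assumed to be a ModelWrapper fixture built by the base class.
        assert model.graph.node[target_node].op_type == "LayerNorm"
```
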
* Formatting bert_build as a job

* Further iteration/brainstorming

* Initial FINN docker transplant

* Adding deps to git ignore

* [Deps] Restructure python github repo installs (#8)

Co-authored-by: auphelia <jakobapk@web.de>

* Initial docker structuring for BrainSmith

* entrypoint path bugfix

* [Docker] Enable interactive mode for docker container (#10)

* Added model profiling scripts

* Hotpatch to remove pyverilator

* Normalize line endings in SUPPORT.md

* finnbrainsmith --> brainsmith/finnlib paths

* Tools folder restructure

* Fix gen_bert paths & name in expand_norms

* Custom QONNX branch to fix is_finn

* Removed old QuantLayerNorm func

* Initial job runner structuring

* Job structure v0, structure for profiling improvements

* Updated readme

* Template path fix

* Unused import and formatting cleanup

* FP IP import fix

* Docker updates for pyxsi

* Pyxsi path fix

* Onnx path + linting fixes

* Removed finnlib, moving up sub folders

* Moved run_job to core for consistency

* Linting cleanup

* Updated README

* Added RTL placeholder

* Typo & gitignore fixes

* Updated finnlib to brainsmith in tests

* bert_steps path fix in tests

* Fix punctuation in README instructions.

* Update LICENSE: Brainsmith name fix

Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>

* Update LICENSE: Brainsmith name fix 2

Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>

* Update README.md - typo fix

Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>

* Brainsmith name fix

Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>

* Update brainsmith/tools/README.md: Brainsmith name fix

Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>

* Update docker/entrypoint.sh: Brainsmith name fix

Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>

* Update docker/entrypoint.sh: Brainsmith name fix

Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>

* Removed exec from fetch_repos

* Copyright typo fix

---------

Co-authored-by: Thomas Keller <thomaskeller@microsoft.com>
Co-authored-by: auphelia <jakobapk@web.de>
Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>
* add custom onnxscript branch

* Add TODO for reconciling onnxscript dependencies

---------

Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
Co-authored-by: Thomas Keller <tkeller787@gmail.com>
* Initial attempt at docker build action

* Added branch name to action

* PR & weekly tests for dev/ci-actions

* Added self-hosted runner

* Adjusted runs-on label

* path fix

* Added debug to orient pwd

* Added pytest keyword through run-docker.sh

* Fixed license path

* Updated upload-artifacts to v4

* Reorganize bert demo for github action

* Updated run-docker CLI args

* Added e2e test to actions

* Removed build artifacts

* Fix ci.yml run-docker statement

* Removed "push" trigger

* Merge with develop changes and add num workers env variable

* Re-added push trigger for testing

* Fix merge

* Temporarily disabled docker and pytest for e2e validation

* Fix BSMITH_BUILD_DIR env variable

* Remove push trigger, since PR trigger is sufficient

* Remove testing branches and triggers for PR

* Remove auto-gen docs

* Delete demos/bert/configs/l1_simd12_pe8.json

Removed extraneous config from test

---------

Co-authored-by: Ubuntu <azureuser@brainsmith-dev2.woh15gx5mv0exiu0m5xe0hjytg.dx.internal.cloudapp.net>
* add custom onnxscript branch

* fix torch error

* readd todo

---------

Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
* fix formatting with copilot

* fix dynamic matmul config when sizing is not divisible by 3

---------

Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
…me (#31)

* fix argparse arg that could never be false

* update fifosizing arg in hw compiler to match new argument name

---------

Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
* Added cleanup steps and job

* Made num_default_worker env variable
Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
jsmonson and others added 23 commits June 2, 2025 09:12
* set to a fixed commit #

* moved up to previous latest commit

---------

Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
* Debugging ckpt 0

* Functional parser

* Organized docs

* Fix interface docs name

* Functional interface, broke parser, debugging

* Debug ckpt 0

* Debug ckpt 1 -- functional width parsing

* Debug ckpt 2

* rtl_parser test suite passing

* All pytests passing

* parser.py audit

* Refactoring parser.py

* Removed old tests

* Organized docs & logs

* Cleanup interface files

* Added license header to tests

* Updated readme

* Improved docstrings, combined interface-types+data

* Updated readme

* Add md type to convo log

* Initial RTL template generation

* HKG test passes

* Improve AXI detection resiliency

* Debug ckpt 0

* Functional RTL Template generation

* Initial structure

* Initial debug ckpt

* Cleanup & streamlining pragma & interface code

* test_rtl_parser core

* Partial interface refactor

* rtl_parser test suite fully passing

* Begin HWCOp implementation

* Fix onnxscript dependencies

* Removed test artifacts

* RTL parser readme & comment cleanup, initial layout detector

* Test file cleanup

* RTL parser test suite clean-up & refactor

* Cleaned up placeholders

* Consolidated LLM artifacts to docs/rtl_parser

* Cleaned up old examples

* Removed duplicate, outdated test

* Removed layout files, fixed license headers

* Added HKG readme
* set to a fixed commit #

* add bert large single layer test

* moved up to previous latest commit

* reduce folding config

* update folding parameter to account for absence of pTranspose

* add bi-weekly bert-large single layer ci test.

---------

Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
Comprehensive modernization of Brainsmith's Docker workflow and GitHub Actions CI system, introducing persistent container management and modular action architecture that reduces code duplication by 75% while achieving 73% performance improvements for multi-command workflows.
* Convert pyxsi commands to finnxsi

* [bert-folding] Adjust folding script to correctly set SIMD and PE for dynamic MVAUs

* [Deps] Reset qonnx url to main repo

* Reset finn commit to custom/transformer
  Complete architectural overhaul introducing:
  - Plugin system for extensible kernels, transforms, and build steps
  - Blueprint YAML interface for declarative design space configuration
  - Segment-based execution tree for efficient DSE with computation reuse
  - Unified `smithy` CLI replacing run-docker.sh with improved container management
  - Reorganized module structure: custom_op → kernels, transformation → transforms
  - New core modules for design parsing, DSE runners, and framework adapters
  - Modular GitHub Actions workflows replacing monolithic CI

  Breaking changes:
  - Module paths changed (e.g., brainsmith.custom_op → brainsmith.kernels)
  - Docker workflow now uses ./smithy instead of ./run-docker.sh
  - Configuration format migrated from Python to YAML blueprints
  - Renamed demos/ to examples/ following standard conventions

  This refactor establishes the foundation for planned features including
  multi-layer offload, parallelized tree execution, and automated kernel
  integration while maintaining backward compatibility for the BERT demo.
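
  The module-path breaking change can be illustrated with a hedged before/after import sketch (only the package renames are stated above; the `Shuffle`/`InferShuffle` symbol names and file layout are assumptions):

  ```python
  # Before the refactor (old layout):
  #   from brainsmith.custom_op.shuffle import Shuffle
  #   from brainsmith.transformation.infer_shuffle import InferShuffle

  # After the refactor (custom_op -> kernels, transformation -> transforms):
  from brainsmith.kernels.shuffle import Shuffle
  from brainsmith.transforms.infer_shuffle import InferShuffle
  ```
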
 ## Brainsmith v0.1.0a1 - Initial Closed Alpha Release

### Overview
Initial pre-release of Brainsmith, an advanced FPGA accelerator design automation framework co-developed by Microsoft and AMD. This release enables automated conversion of PyTorch neural networks to optimized FPGA implementations with intelligent design space exploration.

  ### Key Features
  - **Plugin Architecture**: Extensible framework for custom kernels, transforms, and build steps
  - **Blueprint System**: YAML-based declarative configuration with inheritance support
  - **Segment-based DSE**: Efficient design space exploration with computation reuse
  - **End-to-End BERT Demo**: Complete workflow from PyTorch to synthesizable RTL

  ### Major Components Added
  - BERT-Large support with dynamic matmul configurations
  - Continuous integration testing framework
  - Custom ONNX operators for Brainsmith domain
  - Docker-based development environment with Vivado 2024.2

  ### Known Limitations (Pre-Release)
  - FIFO sizing performance (>90% runtime) - optimization planned
  - Multi-layer offload for larger models - coming soon
  - Parallelized DSE execution - in development

  ### Breaking Changes
  This is a pre-release with no compatibility guarantees. The API and blueprint schema may change in future releases.

  ### Documentation
  - Complete blueprint schema documentation
  - BERT acceleration example with step-by-step guide
  - Plugin development guidelines
  - Docker-based quickstart guide

  ---
  *Note: This is a closed alpha release for partner evaluation and feedback. Production use is not recommended.*
Co-authored-by: auphelia <jakobapk@web.de>
Co-authored-by: Joshua Monson <joshmonson@microsoft.com>
* File transfer from experimental/hwkg

* Remove old hwkg, fix transform imports

* feat(dataflow): add InterfaceType and shape expression types

- Move InterfaceType enum from kernel_integrator to dataflow as fundamental type
- Add ShapeExpr and ShapeSpec type aliases for unified shape expressions
- Maintains all protocol and helper methods from original implementation
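
A hedged sketch of what that commit describes (the real definitions live in brainsmith.core.dataflow.types, per the follow-up commit below; the enum members and alias contents here are assumptions):

```python
from enum import Enum, auto
from typing import Tuple, Union

class InterfaceType(Enum):
    # Assumed members; the point is that the enum is a fundamental dataflow type.
    INPUT = auto()
    OUTPUT = auto()
    WEIGHT = auto()

# A shape element is either a fixed size or a symbolic expression string,
# e.g. (128, "SIMD") or ("C // SIMD", "SIMD").
ShapeExpr = Union[int, str]
ShapeSpec = Tuple[ShapeExpr, ...]
```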

* refactor(kernel_integrator): use InterfaceType from dataflow

- Update all imports to use InterfaceType from brainsmith.core.dataflow.types
- Remove InterfaceType definition from kernel_integrator/data.py
- Fix BaseDataType import to come from qonnx
- All tests passing with no circular dependencies

* feat(kernel_integrator): implement new type structure

- Create types/ directory with modular type system
- Implement core types: PortDirection, DatatypeSpec, DimensionSpec
- Implement RTL types: Port, Parameter, ParsedModule, ValidationResult
- Implement metadata types: InterfaceMetadata, KernelMetadata
- Implement generation types: GeneratedFile, GenerationContext, GenerationResult
- Implement binding types: IOSpec, AttributeBinding, CodegenBinding
- Implement config types with validation and path helpers
- All types use dataflow InterfaceType and ShapeSpec for consistency
- No circular dependencies in new type structure

* feat(kernel_integrator): complete Phase 2 type structure implementation

- Add missing types to match original rtl_data.py (ProtocolValidationResult)
- Update Parameter class with all original fields (param_type, template_param_name)
- Update Port class to match original (width as string, description field)
- Update PortGroup class with interface_type and proper structure
- Create rtl_data.py as compatibility shim with deprecation warning
- All parser integration tests passing (23/23)
- Next: Phase 3 - Migrate existing code to use new types

* refactor(kernel_integrator): complete Phase 3 - remove compatibility shim

- Update all imports to use new type modules directly
- Remove rtl_data.py compatibility shim completely
- Update all test imports to new type locations
- Fix remaining imports from data.py to use dataflow types
- Add GenerationValidationResult to generation types
- All parser integration tests passing (23/23)
- Zero imports from old rtl_data or data modules

Arete.

* refactor(kernel_integrator): restructure type system with unified constraint model

    Replace fragmented type modules (config, data, metadata, core) with
    streamlined constraint_builder and converters modules. Update all imports
    and tests to use new type structure. Add comprehensive API reference and
    migration guide documentation.

* Update demo1 for type refactor, remove old demos

* refactor(kernel_integrator): remove parameter whitelist system

  - Delete parameter_config module and whitelist defaults
  - Simplify Parameter class with inline validation
  - Remove whitelist checks from context generation
  - Clean up migration guides and recommendations
  - Remove parameter resolution utilities

* Remove outdated functions

* Remove unused Parameter fields

* refactor(kernel_integrator): remove codegen binding layer and simplify context generation

  - Remove codegen_binding.py and codegen_binding_generator.py
  - Simplify context_generator.py to use KernelMetadata directly
  - Add categorized interface properties to KernelMetadata
  - Update parameter handling throughout the codebase
  - Streamline template context generation

* v2 generator system with direct template rendering

  Add new v2 generators that render templates directly from KernelMetadata
  without intermediate transformations. Includes base generator, HW custom op,
  RTL backend, and RTL wrapper generators with comprehensive test coverage.

* simplify template context by passing metadata directly

    - Add computed properties to KernelMetadata for common access patterns
    - Update all v2 generators to pass only kernel_metadata to templates
    - Modify templates to access metadata properties directly
    - Remove redundant variable extraction logic from generators
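
A rough sketch of the "pass metadata directly" change (names and fields are assumptions; the real KernelMetadata is richer):

```python
from dataclasses import dataclass, field
from jinja2 import Template

@dataclass
class KernelMetadata:
    name: str
    interfaces: list = field(default_factory=list)

    @property
    def input_interfaces(self):
        # Computed property for a common access pattern, so generators no longer
        # pre-extract these variables before rendering.
        return [i for i in self.interfaces if getattr(i, "direction", None) == "input"]

def render(template: Template, kernel_metadata: KernelMetadata) -> str:
    # v2 generators hand the whole metadata object to the template; the template
    # accesses properties directly, e.g. {{ kernel_metadata.input_interfaces }}.
    return template.render(kernel_metadata=kernel_metadata)
```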

* refactor: replace interface_scanner with kernel_builder and enhance module extraction

- Add new KernelBuilder class to coordinate module extraction and interface building
- Extend ModuleExtractor with direct KernelMetadata building capabilities
- Move interface scanning functionality into InterfaceBuilder
- Simplify pragma handling and remove unused rules functionality
- Update tests to reflect architectural changes
- Remove type annotations from KernelMetadata fields

* refactor: simplify metadata structures and consolidate type system

  - Remove unused DatatypeParameters and parameter linkage fields from metadata
  - Add MutableMapping interface to InterfaceMetadata and PortGroup
  - Consolidate Direction enums and remove duplicate PortDirection
  - Simplify interface builder by removing complex parameter tracking
  - Clean up protocol validation and reduce overall code complexity

* refactor: unify parameter types and consolidate interface building

  - Remove separate InterfaceBuilder module and integrate into KernelBuilder
  - Rename interface metadata classes (Interface -> InterfaceMetadata subclasses)
  - Simplify ParsedModule to contain structured data instead of validation results
  - Remove unused validation types (ValidationError, ValidationResult, PortGroup)
  - Consolidate protocol scanning and interface building into single workflow

* refactor: simplify parameter representation and pragma application

  - Remove source_detail, interface_name, and category from Parameter class
  - Add kernel_value field for ALIAS nodeattr names and DERIVED expressions
  - Change pragma application from interface-level to kernel-level
  - Add linked_parameters list to KernelMetadata for DERIVED parameters
  - Simplify nodeattr_name property to use kernel_value when available

* refactor: consolidate parameter linking, remove old artifacts

  - Remove 9 obsolete analysis and documentation files from _artifacts
  - Restructure parameter_linker.py to use modular pattern-based approach
  - Add DatatypeParameters dataclass to metadata.py for structured parameter access
  - Update pragma handlers and parser to work with consolidated parameter system
  - Fix import paths and remove references to deleted DatatypeMetadata
  - Update tests to match new parameter handling structure

* feat: add direct generation mode with simplified template system

  - Add direct_autohwcustomop generator for streamlined code generation
  - Add three new templates: autohwcustomop_direct, hw_custom_op-EXPERIMENT, rtl_wrapper_direct
  - Update parser to remove unused source_name parameter from kernel_builder
  - Add has_any() method to DatatypeParameters for checking parameter presence
  - Add needs_nodeattr property to Parameter for determining node attribute requirements
  - Reorganize AXIStreamMetadata field ordering for clarity

* refactor: complete kernel_integrator architecture simplification

  - Remove multi-file generator system in favor of single generator.py
  - Delete 20+ generator/type modules and complex inheritance hierarchies
  - Consolidate types into metadata.py and rtl_parser/types.py
  - Rename templates for clarity (autohwcustomop_direct -> auto_hw_custom_op)
  - Move metadata.py to top level from types/ subdirectory
  - Update all imports and module references throughout rtl_parser

* refactor: simplify kernel_integrator and add dataflow documentation

  - Remove outdated artifacts and analysis files
  - Simplify kernel_integrator module structure
  - Add validation and info modes to CLI
  - Add RTL parser pragma reference documentation

* refactor: test KI w/ thresholding and add inference transform gen

  - Add new test suite with infer_transform, node_comparison, and rtl_generation tests
  - Implement FINN patch functions for threshold operations
  - Add inference-time transform module for thresholding
  - Update kernel_integrator with enhanced pragma parsing and metadata generation
  - Add infer_transform.py.j2 template for generating inference transforms
  - Refactor RTL templates to support new parameter handling
  - Add smithy script logging enhancements, added "kernel" flag

* refactor: move example kernels to dedicated examples directory

  - Remove fmpadding, mvu, and thresholding kernels from brainsmith/kernels
  - Create examples/kernel_integrator with thresholding as reference implementation
  - Add source pragma support to RTL parser for external file references
  - Update templates to support modular kernel integration
  - Remove kernel-specific inference transforms from transforms/core
  - Add comparison tests between FINN and Brainsmith implementations

* Cleanup docs and outdated tests

* Removed temp demos

* [Deps] Update and fix finn and qonnx deps (#50)

* refactor: consolidate dataflow interfaces and archive outdated docs

  - Extract base interface from dataflow base module
  - Update input/output interfaces to use new base interface
  - Move all legacy documentation to archive directory
  - Replace fragmented kernel docs with unified hardware_kernels_and_dataflow_modeling guide
  - Add new dataflow diagrams illustrating chunking and kernel architecture
  - Remove deprecated infer_auto_hw_custom_op transform
  - Update examples to use new dataflow interface structure

* Remove wip docs

* Remove deprecated core import

---------

Co-authored-by: auphelia <56755897+auphelia@users.noreply.github.com>
Bring develop back in line with main
# Add core integration test suite and documentation improvements

## Summary
- Adds integration tests for core DSE functionality with fixture-based architecture
- Enhances documentation with summary/guidance readme and additional docs
- Simplifies BERT demo with blueprint-based configuration
- Fixes critical bugs in parser API and plugin registry

## Core Changes

### Bug Fixes & API Improvements
- Refactored `BlueprintParser` class to standalone `parse_blueprint()` function (see the sketch after this list)
- Fixed plugin registry lazy loading issues in `find()` and `all()` methods
- Corrected DSE tree node counting and depth calculation to include root node
- Enhanced registry reset to properly clear initialization state
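
For illustration, the parser refactor might look roughly like the following hedged sketch (the signature, return type, the `extends` inheritance key, and the example path are assumptions):

```python
import yaml

def parse_blueprint(path: str) -> dict:
    """Load a blueprint YAML file and resolve single-parent inheritance."""
    with open(path) as f:
        blueprint = yaml.safe_load(f) or {}
    parent_path = blueprint.pop("extends", None)  # assumed inheritance key
    if parent_path:
        merged = parse_blueprint(parent_path)
        merged.update(blueprint)  # child values override the parent's
        blueprint = merged
    return blueprint

# Usage, replacing something like BlueprintParser(path).parse():
# design_space = parse_blueprint("examples/bert/bert_quicktest.yaml")
```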

### Testing Infrastructure
- Integration tests for blueprint parsing with inheritance and step operations
- DSE execution tests validating tree construction and artifact sharing
- Plugin system tests covering registration, discovery, and framework integration
- Mock plugin ecosystem with test kernels, transforms, and build steps

### Documentation
- Dataflow modeling guide explaining kernel abstractions and tiling strategies
- Plugin registry documentation with examples and best practices
- BERT example README with quickstart instructions
- Improved design space exploration docs with clearer examples

### BERT Example Improvements
- Replaced CLI arguments with blueprint-driven configuration
- Added `bert_quicktest.yaml` for rapid testing (single layer, minimal folding)
- Quicktest blueprint extends base with test-specific overrides
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
