Feature/generalized-data-width-converter #144
Open
+128
−0
This is the companion PR to the soon-to-be-opened PR from the https://github.com/lstasytis/finn/tree/feature/generalized-datawidthconverter branch. It introduces a new variant of the StreamingDataWidthConverter_Batch (DWC), called StreamingDataWidthConverterGeneralized_Batch, which should eventually completely replace the old StreamingDataWidthConverter_Batch function in finn-hlslib.
The new DWC has two key improvements over the previous HLS version:
a.) When the input and output stream widths are incompatible with the RTL variant (neither width is divisible by the other), the node no longer creates an intermediate buffer whose size equals the least common multiple (LCM) of the two widths.
Instead, a single intermediate buffer of size input width + output width is always generated.
The intermediate buffer therefore never becomes enormous because of an extremely large LCM between the widths, and the node is limited to a single module instantiation instead of 3. As a result, an intermediate AXI-Stream data bus wider than 4K bits, which would otherwise have broken HLS, is no longer generated (unless the input and output stream widths together exceed 4K bits).
b.) The node supports zero-padding and cropping of the tail end of the transactions passing through it. This allows arbitrary padding of nodes in FINN, which relaxes folding factor constraints.
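To make the buffer-size difference in a.) concrete, here is a minimal Python sketch comparing the two sizing rules. The function names and the example widths are illustrative only and do not come from finn-hlslib. Reading "4K" as 4096 bits is also an assumption.

```python
from math import gcd

# Assumption: "4K" in the description is read as 4096 bits.
MAX_BUS_WIDTH_BITS = 4096


def lcm_buffer_width(in_width: int, out_width: int) -> int:
    """Intermediate buffer width of the old HLS DWC for incompatible widths (LCM)."""
    return in_width * out_width // gcd(in_width, out_width)


def generalized_buffer_width(in_width: int, out_width: int) -> int:
    """Intermediate buffer width of the generalized DWC (input + output width)."""
    return in_width + out_width


# Hypothetical width pairs where neither width divides the other.
for in_w, out_w in [(24, 35), (96, 125)]:
    old = lcm_buffer_width(in_w, out_w)
    new = generalized_buffer_width(in_w, out_w)
    print(
        f"in={in_w} out={out_w}: LCM buffer={old} bits "
        f"(over limit: {old > MAX_BUS_WIDTH_BITS}), "
        f"generalized buffer={new} bits (over limit: {new > MAX_BUS_WIDTH_BITS})"
    )
```

For the second pair the LCM-based buffer would be 12000 bits wide, while the generalized buffer stays at 221 bits.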
Architecture:
The node uses an intermediate shift-register-based buffer of size input width + output width. Input words are written into the buffer at a position given by an offset variable, which tracks how many elements the buffer currently holds. The output stream is tied to the right-most output-width bits of the buffer. The node also tracks the total number of input and output words to be processed in a single transaction, both fixed at compile time: if the input words run out first, it shifts in zeroes (padding), and if the output word count is reached first, it stops writing to the output stream (cropping).
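The behaviour described above can be sketched as a small bit-level model in Python. This is not the HLS implementation. It is a software approximation that assumes "right-most" means the least significant bits, that the offset counts bits, and that a transaction ends once the requested number of output words has been written. The function name and its parameters are hypothetical.

```python
def dwc_generalized_model(words_in, in_width, out_width, num_out_words):
    """Software model of one DWC transaction (illustrative, not the HLS code).

    words_in:      list of input words, each in_width bits wide
    num_out_words: number of output words to produce in this transaction
    """
    buf_width = in_width + out_width
    in_mask = (1 << in_width) - 1
    out_mask = (1 << out_width) - 1

    buf = 0      # intermediate buffer, modelled as an integer
    offset = 0   # number of valid bits currently held in the buffer
    words_out = []
    inputs = iter(words_in)

    while len(words_out) < num_out_words:
        if offset >= out_width:
            # Enough bits buffered: emit the right-most out_width bits
            # and shift the buffer down.
            words_out.append(buf & out_mask)
            buf >>= out_width
            offset -= out_width
        else:
            # Fetch the next input word; once inputs run out, shift in
            # zeroes instead (padding).
            word = next(inputs, 0)
            buf |= (word & in_mask) << offset
            offset += in_width
            assert offset <= buf_width  # buffer can never overflow

    # Any input words left unread at this point are dropped (cropping).
    return words_out


# Exact fit: three 4-bit words (12 bits) become two 6-bit words.
print(dwc_generalized_model([0x1, 0x2, 0x3], 4, 6, 2))
# Padding: only 8 input bits for 12 output bits, the tail is zero-filled.
print(dwc_generalized_model([0x1, 0x2], 4, 6, 2))
# Cropping: 12 input bits but only one 6-bit output word is written.
print(dwc_generalized_model([0xF, 0xF, 0xF], 4, 6, 1))
```

Because an input word is only fetched while fewer than out_width bits are buffered, the fill level after a write is always below in_width + out_width, which is why a buffer of that size is sufficient in this model.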
Downsides:
The architecture does not produce efficient HLS code: multiplexing the input stream into the intermediate buffer leads to massive LUT use, because HLS instantiates a general IP core for the task. The node is more LUT-efficient than the old HLS variant only when the intermediate buffer produced by the old DWC is 3-4x larger than the sum of the input and output stream widths.
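As a rough illustration of the 3-4x rule of thumb above, the ratio between the two buffer sizes can be computed for a given width pair. This is only a sketch: the helper name is hypothetical, the threshold is taken from the text rather than from measurements, and actual LUT counts depend on the synthesis results.

```python
from math import gcd


def buffer_size_ratio(in_width: int, out_width: int) -> float:
    """Ratio of the old LCM-sized buffer to the generalized (in + out) buffer."""
    lcm = in_width * out_width // gcd(in_width, out_width)
    return lcm / (in_width + out_width)


# Hypothetical width pairs; a ratio well above 3-4 suggests the generalized
# DWC may be the more LUT-efficient choice, per the rule of thumb above.
for in_w, out_w in [(8, 12), (24, 35)]:
    print(f"in={in_w} out={out_w}: ratio={buffer_size_ratio(in_w, out_w):.2f}")
```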
Improvements to be made:
An RTL variant for the DWC should eventually be pushed to finn-rtllib, at which point the old DWC can be retired entirely in favor of this current architecture.
Use of padding functionality:
Introducing padding to FINN nodes is extremely error-prone and should be done carefully. The recommendation is to use the new generalized folding optimizer from the following branch: https://github.com/lstasytis/finn/tree/feature/set-folding-optimizer and to allow it to use padding by setting the folding_maximum_padding dataflow builder argument to a value greater than 0. The SetFolding() transformation then relaxes the stream shape restrictions on the assumption that DWCs will perform the padding, so the DWCs inserted by the InsertDWC transformation may perform padding.
For a breakdown of padding restrictions in FINN nodes, refer to the code in the new SetFolding() transformation in the aforementioned branch.