Analog Model Training Taking Longer in AIHWKIT, Is This Normal? #695
Unanswered
adnanrana88 asked this question in General
Replies: 1 comment 1 reply
-
Hi,
I am working with a 2-layer MLP and using IBM AIHWKIT for hardware-aware (HWA) training. When I train the digital model in PyTorch, I reach around 98% train accuracy and 91% validation accuracy in about 150 epochs.
However, after converting to an analog model, it takes nearly 1000 epochs to reach similar accuracy. Occasionally HWA training needs fewer epochs, but in general it takes significantly longer.
• My RPU configuration appears correct, both for loading the digital weights and for HWA training.
• Is this longer training time typical for HWA? Does training directly in analog (without starting from a digital model) make a difference, or would it be the more correct approach than converting a digital model?
Also, is it always necessary to start with a digital model before converting to analog, or can I train directly in analog from the start? What would you recommend?
Thank you for your suggestions!
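For concreteness, here is a minimal sketch of the digital-to-analog workflow in question; the layer sizes, learning rate, and the bare `InferenceRPUConfig` are illustrative assumptions, not the exact configuration from this thread:

```python
from torch import nn

from aihwkit.nn.conversion import convert_to_analog
from aihwkit.optim import AnalogSGD
from aihwkit.simulator.configs import InferenceRPUConfig

# Hypothetical 2-layer MLP; input/hidden/output sizes are placeholders.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# ... train `model` digitally in plain PyTorch first ...

# Convert the trained digital model to an analog one for HWA training.
rpu_config = InferenceRPUConfig()  # placeholder; a real run would set noise, clipping, etc.
analog_model = convert_to_analog(model, rpu_config)

# Analog layers need an analog-aware optimizer so the tiles receive the updates.
optimizer = AnalogSGD(analog_model.parameters(), lr=0.05)
optimizer.regroup_param_groups(analog_model)
```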
-
Can you share the configuration you used for the analog HWA training? This is not normal behavior; it might be an issue with the way you configured the experiment. Please look at the example we used: https://github.com/IBM/aihwkit/blob/master/examples/06_lenet5_hardware_aware.py
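For reference, the linked example configures hardware-aware training along these lines; this is a sketch following `06_lenet5_hardware_aware.py`, with illustrative values, and the exact import paths may differ across aihwkit versions:

```python
from aihwkit.inference import PCMLikeNoiseModel
from aihwkit.simulator.configs import InferenceRPUConfig
from aihwkit.simulator.configs.utils import WeightNoiseType

# HWA training config: inject non-idealities during the forward pass so the
# network learns weights that are robust to them.
rpu_config = InferenceRPUConfig()
rpu_config.forward.out_res = -1.0  # turn off output (ADC) discretization
rpu_config.forward.w_noise_type = WeightNoiseType.ADDITIVE_CONSTANT
rpu_config.forward.w_noise = 0.02  # constant additive weight noise
rpu_config.noise_model = PCMLikeNoiseModel(g_max=25.0)  # PCM-like read/drift noise at inference
```

Because such a configuration deliberately perturbs the forward pass during training, some slowdown in convergence relative to purely digital training is plausible, but a gap of 150 versus 1000 epochs is a reason to compare the configuration against this example.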