Analog Model Training Taking Longer in AIHWKIT, Is This Normal? #695
Unanswered
adnanrana88 asked this question in General
Replies: 1 comment 1 reply
-
Hi,
I am working with a 2-layer MLP and using IBM AIHWKIT for hardware-aware (HWA) training. When I train the digital model in PyTorch, I reach around 98% train accuracy and 91% validation accuracy in about 150 epochs.
However, after converting to an analog model, it takes nearly 1000 epochs to reach similar accuracy. Occasionally HWA training needs fewer epochs, but in general it takes significantly longer.
• My RPU configuration appears correct, both for loading the digital weights and for HWA training.
• Is this longer training time typical for HWA? Does training directly in analog (without starting from a digital model) make a difference, or would it be the more correct approach than converting a digital model?
Also, is it always necessary to start with a digital model before converting to analog, or can I train directly in analog from the start? What would you recommend?
Thank you for your suggestions!
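For concreteness, here is a minimal sketch of the digital-to-analog workflow in question; the layer sizes, learning rate, and the bare `InferenceRPUConfig` are illustrative assumptions, not the exact configuration from this thread:

```python
from torch import nn

from aihwkit.nn.conversion import convert_to_analog
from aihwkit.optim import AnalogSGD
from aihwkit.simulator.configs import InferenceRPUConfig

# Hypothetical 2-layer MLP; input/hidden/output sizes are placeholders.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# ... train `model` digitally in plain PyTorch first ...

# Convert the trained digital model to an analog one for HWA training.
rpu_config = InferenceRPUConfig()  # placeholder; a real run would set noise, clipping, etc.
analog_model = convert_to_analog(model, rpu_config)

# Analog layers need an analog-aware optimizer so the tiles receive the updates.
optimizer = AnalogSGD(analog_model.parameters(), lr=0.05)
optimizer.regroup_param_groups(analog_model)
```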
-
Can you share the configuration you used for the analog HWA training? This is not normal behavior; it might be an issue with the way you configured the experiment. Please look at the example we used: https://github.com/IBM/aihwkit/blob/master/examples/06_lenet5_hardware_aware.py
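For reference, the linked example configures hardware-aware training along these lines; this is a sketch following `06_lenet5_hardware_aware.py`, with illustrative values, and the exact import paths may differ across aihwkit versions:

```python
from aihwkit.inference import PCMLikeNoiseModel
from aihwkit.simulator.configs import InferenceRPUConfig
from aihwkit.simulator.configs.utils import WeightNoiseType

# HWA training config: inject non-idealities during the forward pass so the
# network learns weights that are robust to them.
rpu_config = InferenceRPUConfig()
rpu_config.forward.out_res = -1.0  # turn off output (ADC) discretization
rpu_config.forward.w_noise_type = WeightNoiseType.ADDITIVE_CONSTANT
rpu_config.forward.w_noise = 0.02  # constant additive weight noise
rpu_config.noise_model = PCMLikeNoiseModel(g_max=25.0)  # PCM-like read/drift noise at inference
```

Because such a configuration deliberately perturbs the forward pass during training, some slowdown in convergence relative to purely digital training is plausible, but a gap of 150 versus 1000 epochs is a reason to compare the configuration against this example.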