[onert] Fix loss value difference #13736
Comments
The loss issue occurs with SGD + CCE as well as Adam + CCE.
That's sad news 😢
SGD + CCE
Epoch 1/5
100/100 [==============================] - 0s 813us/step - loss: 7.7064 - categorical_accuracy: 0.2010
Epoch 2/5
100/100 [==============================] - 0s 728us/step - loss: 8.7290 - categorical_accuracy: 0.2060
Epoch 3/5
100/100 [==============================] - 0s 695us/step - loss: 9.1582 - categorical_accuracy: 0.1720
Epoch 4/5
100/100 [==============================] - 0s 736us/step - loss: 9.2479 - categorical_accuracy: 0.1420
Epoch 5/5
100/100 [==============================] - 0s 727us/step - loss: 9.1316 - categorical_accuracy: 0.1240
==========================
Total time: 0.5966
$ ./Product/x86_64-linux.release/out/bin/onert_train --loss 2 --optimizer 1 --loss_reduction_type 1 --learning_rate 0.001 --batch_size 10 --num_of_trainable_ops -1 --load_expected:raw out/train.output.1000.bin --load_input:raw out/train.input.1000.bin --metric 0 model.circle
Model Filename model.circle
== training parameter ==
- learning_rate = 0.001
- batch_size = 10
- loss_info = {loss = categorical crossentropy, reduction = sum over batch size}
- optimizer = sgd
- num_of_trainable_ops = -1
========================
Epoch 1/5 - time: 0.488ms/step - loss: [0] nan - categorical_accuracy: [0] 0.0860
Epoch 2/5 - time: 0.460ms/step - loss: [0] nan - categorical_accuracy: [0] 0.0860
Epoch 3/5 - time: 0.527ms/step - loss: [0] nan - categorical_accuracy: [0] 0.0860
Epoch 4/5 - time: 0.437ms/step - loss: [0] nan - categorical_accuracy: [0] 0.0860
Epoch 5/5 - time: 0.452ms/step - loss: [0] nan - categorical_accuracy: [0] 0.0860
===================================
MODEL_LOAD takes 0.3240 ms
PREPARE takes 2.0710 ms
EXECUTE takes 241.4750 ms
- Epoch 1 takes 48.8490 ms
- Epoch 2 takes 46.0450 ms
- Epoch 3 takes 52.7080 ms
- Epoch 4 takes 43.6880 ms
- Epoch 5 takes 45.1500 ms
===================================
Adam + CCE
Epoch 1/5
100/100 [==============================] - 0s 919us/step - loss: 9.4015 - categorical_accuracy: 0.1940
Epoch 2/5
100/100 [==============================] - 0s 866us/step - loss: 9.6065 - categorical_accuracy: 0.2080
Epoch 3/5
100/100 [==============================] - 0s 880us/step - loss: 9.6064 - categorical_accuracy: 0.2090
Epoch 4/5
100/100 [==============================] - 0s 868us/step - loss: 9.6064 - categorical_accuracy: 0.2090
Epoch 5/5
100/100 [==============================] - 0s 844us/step - loss: 9.6064 - categorical_accuracy: 0.2090
==========================
Total time: 0.6848
$ ./Product/x86_64-linux.release/out/bin/onert_train --loss 2 --optimizer 2 --loss_reduction_type 1 --learning_rate 0.001 --batch_size 10 --num_of_trainable_ops -1 --load_expected:raw out/train.output.1000.bin --load_input:raw out/train.input.1000.bin --metric 0 model.circle
Model Filename model.circle
== training parameter ==
- learning_rate = 0.001
- batch_size = 10
- loss_info = {loss = categorical crossentropy, reduction = sum over batch size}
- optimizer = adam
- num_of_trainable_ops = -1
========================
Epoch 1/5 - time: 0.701ms/step - loss: [0] nan - categorical_accuracy: [0] 0.0920
Epoch 2/5 - time: 0.608ms/step - loss: [0] nan - categorical_accuracy: [0] 0.0920
Epoch 3/5 - time: 0.610ms/step - loss: [0] nan - categorical_accuracy: [0] 0.0920
Epoch 4/5 - time: 0.612ms/step - loss: [0] nan - categorical_accuracy: [0] 0.0920
Epoch 5/5 - time: 0.602ms/step - loss: [0] nan - categorical_accuracy: [0] 0.0920
===================================
MODEL_LOAD takes 0.8570 ms
PREPARE takes 4.3150 ms
EXECUTE takes 320.9320 ms
- Epoch 1 takes 70.1430 ms
- Epoch 2 takes 60.7580 ms
- Epoch 3 takes 60.9560 ms
- Epoch 4 takes 61.2370 ms
- Epoch 5 takes 60.1600 ms
===================================
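A plausible explanation for the nan losses above (my own sketch, not the actual onert kernel): categorical cross entropy takes log(prediction), which is only defined for positive values. When the model has no softmax, its raw outputs can be negative, and log of a negative number is nan. The helper name below is mine, for illustration:

```python
import numpy as np

def cce_without_softmax(y_true, y_pred):
    # Categorical cross entropy applied directly to the model output:
    # -sum(y_true * log(y_pred)). This is only valid when y_pred is a
    # probability distribution; raw layer outputs may be negative.
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([[0.0, 1.0, 0.0]])

# Probabilities (e.g. after softmax): the loss is finite.
probs = np.array([[0.2, 0.7, 0.1]])
print(cce_without_softmax(y_true, probs))

# Raw outputs straight from the last layer: log(-0.4) is nan,
# so the whole loss becomes nan.
raw = np.array([[1.3, -0.4, 2.1]])
print(cce_without_softmax(y_true, raw))
```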
SGD + CCE
Epoch 1/5
100/100 [==============================] - 0s 781us/step - loss: 2.3559 - categorical_accuracy: 0.2210
Epoch 2/5
100/100 [==============================] - 0s 714us/step - loss: 1.8817 - categorical_accuracy: 0.3970
Epoch 3/5
100/100 [==============================] - 0s 720us/step - loss: 1.6417 - categorical_accuracy: 0.4920
Epoch 4/5
100/100 [==============================] - 0s 652us/step - loss: 1.4734 - categorical_accuracy: 0.5630
Epoch 5/5
100/100 [==============================] - 0s 611us/step - loss: 1.3445 - categorical_accuracy: 0.6000
==========================
Total time: 0.5747
$ ./Product/x86_64-linux.release/out/bin/onert_train --loss 2 --optimizer 1 --loss_reduction_type 1 --learning_rate 0.001 --batch_size 10 --num_of_trainable_ops -1 --load_expected:raw out/train.output.1000.bin --load_input:raw out/train.input.1000.bin --metric 0 model.circle
Model Filename model.circle
== training parameter ==
- learning_rate = 0.001
- batch_size = 10
- loss_info = {loss = categorical crossentropy, reduction = sum over batch size}
- optimizer = sgd
- num_of_trainable_ops = -1
========================
Epoch 1/5 - time: 0.480ms/step - loss: [0] 1.5273 - categorical_accuracy: [0] 0.1010
Epoch 2/5 - time: 0.480ms/step - loss: [0] 0.9308 - categorical_accuracy: [0] 0.1010
Epoch 3/5 - time: 0.463ms/step - loss: [0] 0.7696 - categorical_accuracy: [0] 0.1010
Epoch 4/5 - time: 0.436ms/step - loss: [0] 0.6822 - categorical_accuracy: [0] 0.1010
Epoch 5/5 - time: 0.451ms/step - loss: [0] 0.6233 - categorical_accuracy: [0] 0.1010
===================================
MODEL_LOAD takes 0.3980 ms
PREPARE takes 2.2420 ms
EXECUTE takes 235.7700 ms
- Epoch 1 takes 47.9630 ms
- Epoch 2 takes 48.0220 ms
- Epoch 3 takes 46.3070 ms
- Epoch 4 takes 43.6420 ms
- Epoch 5 takes 45.0810 ms
===================================
Adam + CCE
Epoch 1/5
100/100 [==============================] - 0s 901us/step - loss: 1.0923 - categorical_accuracy: 0.5950
Epoch 2/5
100/100 [==============================] - 0s 831us/step - loss: 0.6341 - categorical_accuracy: 0.7810
Epoch 3/5
100/100 [==============================] - 0s 823us/step - loss: 0.5396 - categorical_accuracy: 0.8160
Epoch 4/5
100/100 [==============================] - 0s 812us/step - loss: 0.4943 - categorical_accuracy: 0.8130
Epoch 5/5
100/100 [==============================] - 0s 859us/step - loss: 0.4609 - categorical_accuracy: 0.8230
==========================
Total time: 0.6680
$ ./Product/x86_64-linux.release/out/bin/onert_train --loss 2 --optimizer 2 --loss_reduction_type 1 --learning_rate 0.001 --batch_size 10 --num_of_trainable_ops -1 --load_expected:raw out/train.output.1000.bin --load_input:raw out/train.input.1000.bin --metric 0 model.circle
Model Filename model.circle
== training parameter ==
- learning_rate = 0.001
- batch_size = 10
- loss_info = {loss = categorical crossentropy, reduction = sum over batch size}
- optimizer = adam
- num_of_trainable_ops = -1
========================
Epoch 1/5 - time: 0.627ms/step - loss: [0] 1.0886 - categorical_accuracy: [0] 0.0850
Epoch 2/5 - time: 0.612ms/step - loss: [0] 0.6423 - categorical_accuracy: [0] 0.0850
Epoch 3/5 - time: 0.618ms/step - loss: [0] 0.5578 - categorical_accuracy: [0] 0.0850
Epoch 4/5 - time: 0.605ms/step - loss: [0] 0.5004 - categorical_accuracy: [0] 0.0850
Epoch 5/5 - time: 0.601ms/step - loss: [0] 0.4618 - categorical_accuracy: [0] 0.0850
===================================
MODEL_LOAD takes 0.3410 ms
PREPARE takes 2.4010 ms
EXECUTE takes 312.7880 ms
- Epoch 1 takes 62.7390 ms
- Epoch 2 takes 61.1770 ms
- Epoch 3 takes 61.8160 ms
- Epoch 4 takes 60.4690 ms
- Epoch 5 takes 60.0820 ms
===================================
In the draft, the model without softmax trains well once a kernel that runs softmax internally is introduced.
Epoch 1/5
1000/1000 [==============================] - 1s 516us/step - loss: 1.0897 - categorical_accuracy: 0.6000
Epoch 2/5
1000/1000 [==============================] - 1s 503us/step - loss: 0.6761 - categorical_accuracy: 0.7450
Epoch 3/5
1000/1000 [==============================] - 1s 500us/step - loss: 0.5629 - categorical_accuracy: 0.7920
Epoch 4/5
1000/1000 [==============================] - 1s 503us/step - loss: 0.4945 - categorical_accuracy: 0.8270
Epoch 5/5
1000/1000 [==============================] - 1s 504us/step - loss: 0.4447 - categorical_accuracy: 0.8380
==========================
Total time: 2.7670
$ ./Product/x86_64-linux.release/out/bin/onert_train --loss 2 --optimizer 2 --loss_reduction_type 1 --learning_rate 0.001 --batch_size 1 --num_of_trainable_ops -1 --load_expected:raw out/train.output.1000.bin --load_input:raw out/train.input.1000.bin --metric 0 model.circle
Model Filename model.circle
== training parameter ==
- learning_rate = 0.001
- batch_size = 1
- loss_info = {loss = categorical crossentropy, reduction = sum over batch size}
- optimizer = adam
- num_of_trainable_ops = -1
========================
Epoch 1/5 - time: 0.047ms/step - loss: [0] 1.0897 - categorical_accuracy: [0] 0.0950
Epoch 2/5 - time: 0.044ms/step - loss: [0] 0.6761 - categorical_accuracy: [0] 0.0950
Epoch 3/5 - time: 0.044ms/step - loss: [0] 0.5628 - categorical_accuracy: [0] 0.0950
Epoch 4/5 - time: 0.044ms/step - loss: [0] 0.4945 - categorical_accuracy: [0] 0.0950
Epoch 5/5 - time: 0.050ms/step - loss: [0] 0.4447 - categorical_accuracy: [0] 0.0950
===================================
MODEL_LOAD takes 0.1380 ms
PREPARE takes 1.4940 ms
EXECUTE takes 257.7260 ms
- Epoch 1 takes 46.8360 ms
- Epoch 2 takes 43.6010 ms
- Epoch 3 takes 44.4730 ms
- Epoch 4 takes 44.1270 ms
- Epoch 5 takes 49.8270 ms
===================================
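The kernel that runs softmax internally is presumably a fused softmax-plus-cross-entropy operation. A minimal, numerically stable sketch of that idea (my own NumPy reconstruction, not the draft's actual kernel): using log-softmax with a max shift keeps exp() from overflowing and never takes the log of zero or of a negative number, so the loss stays finite even for raw logits.

```python
import numpy as np

def softmax_cross_entropy(y_true, logits):
    # Fused softmax + categorical cross entropy on raw logits.
    # Shift by the row max before exponentiating for numerical stability,
    # then compute log-softmax directly instead of log(softmax(...)).
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))
    return -np.sum(y_true * log_probs, axis=-1)

y_true = np.array([[0.0, 1.0, 0.0]])
logits = np.array([[1.3, -0.4, 2.1]])

# Finite even though one logit is negative.
print(softmax_cross_entropy(y_true, logits))
```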
After applying #13944, the loss values match the TensorFlow results when using MSE.
Epoch 1/5
100/100 [==============================] - 0s 545us/step - loss: 0.2282 - categorical_accuracy: 0.1270
Epoch 2/5
100/100 [==============================] - 0s 598us/step - loss: 0.1866 - categorical_accuracy: 0.1560
Epoch 3/5
100/100 [==============================] - 0s 552us/step - loss: 0.1752 - categorical_accuracy: 0.1720
Epoch 4/5
100/100 [==============================] - 0s 478us/step - loss: 0.1663 - categorical_accuracy: 0.1850
Epoch 5/5
100/100 [==============================] - 0s 483us/step - loss: 0.1590 - categorical_accuracy: 0.1910
==========================
Total time: 0.5044
$ ./Product/x86_64-linux.debug/out/bin/onert_train --loss 1 --optimizer 1 --loss_reduction_type 1 --learning_rate 0.001 --batch_size 10 --num_of_trainable_ops -1 --load_expected:raw out/train.output.1000.bin --load_input:raw out/train.input.1000.bin --metric 0 model.circle
Model Filename model.circle
== training parameter ==
- learning_rate = 0.001
- batch_size = 10
- loss_info = {loss = mean squared error, reduction = sum over batch size}
- optimizer = sgd
- num_of_trainable_ops = -1
========================
Epoch 1/5 - time: 1.354ms/step - loss: [0] 0.2282 - categorical_accuracy: [0] 0.1040
Epoch 2/5 - time: 1.242ms/step - loss: [0] 0.1866 - categorical_accuracy: [0] 0.1040
Epoch 3/5 - time: 1.292ms/step - loss: [0] 0.1752 - categorical_accuracy: [0] 0.1040
Epoch 4/5 - time: 1.295ms/step - loss: [0] 0.1663 - categorical_accuracy: [0] 0.1040
Epoch 5/5 - time: 1.290ms/step - loss: [0] 0.1590 - categorical_accuracy: [0] 0.1040
===================================
MODEL_LOAD takes 1.4580 ms
PREPARE takes 18.8870 ms
EXECUTE takes 661.0470 ms
- Epoch 1 takes 135.3970 ms
- Epoch 2 takes 124.2470 ms
- Epoch 3 takes 129.2360 ms
- Epoch 4 takes 129.4980 ms
- Epoch 5 takes 128.9610 ms
===================================
Epoch 1/5
100/100 [==============================] - 0s 653us/step - loss: 1.7125 - categorical_accuracy: 0.3630
Epoch 2/5
100/100 [==============================] - 0s 604us/step - loss: 1.0092 - categorical_accuracy: 0.5570
Epoch 3/5
100/100 [==============================] - 0s 586us/step - loss: 0.8466 - categorical_accuracy: 0.6260
Epoch 4/5
100/100 [==============================] - 0s 579us/step - loss: 0.7470 - categorical_accuracy: 0.6750
Epoch 5/5
100/100 [==============================] - 0s 599us/step - loss: 0.6754 - categorical_accuracy: 0.7040
==========================
Total time: 0.5538
$ ./Product/x86_64-linux.debug/out/bin/onert_train --loss 1 --optimizer 2 --loss_reduction_type 2 --learning_rate 0.001 --batch_size 10 --num_of_trainable_ops -1 --load_expected:raw out/train.output.1000.bin --load_input:raw out/train.input.1000.bin --metric 0 model.circle
Model Filename model.circle
== training parameter ==
- learning_rate = 0.001
- batch_size = 10
- loss_info = {loss = mean squared error, reduction = sum}
- optimizer = adam
- num_of_trainable_ops = -1
========================
Epoch 1/5 - time: 1.519ms/step - loss: [0] 1.7125 - categorical_accuracy: [0] 0.0890
Epoch 2/5 - time: 1.500ms/step - loss: [0] 1.0092 - categorical_accuracy: [0] 0.0890
Epoch 3/5 - time: 2.238ms/step - loss: [0] 0.8466 - categorical_accuracy: [0] 0.0890
Epoch 4/5 - time: 1.749ms/step - loss: [0] 0.7470 - categorical_accuracy: [0] 0.0890
Epoch 5/5 - time: 1.540ms/step - loss: [0] 0.6754 - categorical_accuracy: [0] 0.0890
===================================
MODEL_LOAD takes 1.5080 ms
PREPARE takes 15.0090 ms
EXECUTE takes 875.6840 ms
- Epoch 1 takes 151.8760 ms
- Epoch 2 takes 150.0400 ms
- Epoch 3 takes 223.8330 ms
- Epoch 4 takes 174.8910 ms
- Epoch 5 takes 154.0180 ms
===================================
To apply normalization (softmax) automatically in categorical cross entropy, we need to handle the case where the sum of the labels is not 1.
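One concrete reason the label sum matters (a sketch of my own, not code from this issue): the well-known shortcut gradient of fused softmax + categorical cross entropy with respect to the logits, softmax(z) - y, only holds when sum(y) = 1. In general, for L = -sum_i y_i log(softmax(z)_i), the gradient is softmax(z) * sum(y) - y. A numerical check:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def cce(y, z):
    # Categorical cross entropy on logits via softmax.
    return -np.sum(y * np.log(softmax(z)))

z = np.array([1.3, -0.4, 2.1])
y = np.array([0.0, 0.5, 0.0])  # label sum is 0.5, not 1

# Central-difference numeric gradient of the loss w.r.t. each logit.
eps = 1e-6
num_grad = np.array([
    (cce(y, z + eps * np.eye(3)[j]) - cce(y, z - eps * np.eye(3)[j])) / (2 * eps)
    for j in range(3)
])

p = softmax(z)
print(np.allclose(num_grad, p * np.sum(y) - y, atol=1e-5))  # True: general formula
print(np.allclose(num_grad, p - y, atol=1e-5))              # False: shortcut breaks
```

This is why a kernel that silently assumes sum(y) = 1 can produce losses and gradients that diverge from TensorFlow when labels are not one-hot.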
What
Let's fix the loss value differences from TensorFlow in some cases.
Why
We found out that loss values differed from TensorFlow when using combinations such as the Adam optimizer with categorical cross entropy in branching models such as the model below:
Since MobileNet v2 also has branches, its loss values differ as well.
From @jyoungyun
Required tasks
- CategoricalCrossentropy
- batch_size != 1, using the reduction type sum and the loss type mse
- batch_size != 1, using the reduction type sum_over_batch_size and CategoricalCrossentropy
- Draft #13934
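For context on the two reduction types named in the tasks: they differ only by a factor of the batch size. A hypothetical sketch of the distinction (function name mine, not onert's API):

```python
import numpy as np

def mse_per_sample(y_true, y_pred):
    # Mean squared error per sample (mean over the feature axis).
    return np.mean((y_true - y_pred) ** 2, axis=-1)

y_true = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_pred = np.array([[0.9, 0.2], [0.1, 0.7], [0.8, 0.6]])

per_sample = mse_per_sample(y_true, y_pred)
batch_size = y_true.shape[0]

loss_sum = np.sum(per_sample)                # reduction type: sum
loss_sobs = np.sum(per_sample) / batch_size  # reduction type: sum_over_batch_size

# The two results differ exactly by the batch size factor.
print(loss_sum, loss_sobs)
```

This factor matters when comparing against TensorFlow, whose Keras losses default to sum-over-batch-size reduction.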