tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed #53

MHX1203 opened this issue May 1, 2022 · 3 comments


MHX1203 commented May 1, 2022

The following is the training log.

dnnlib: Running training.training_loop.training_loop() on localhost...
GPU available:  True
GPU devices:  /device:GPU:0
>>>>> Create Session
Dataset directory:  .
Streaming data using training.dataset.TFRecordDataset...
tfrecord_dir:  .\custom-images
Dataset shape = [1, 512, 512]
Dynamic range = [0, 255]
Label size    = 0
Constructing networks...

G                             Params    OutputShape         WeightShape     
---                           ---       ---                 ---             
latents_in                    -         (?, 512)            -               
labels_in                     -         (?, 0)              -               
lod                           -         ()                  -               
dlatent_avg                   -         (512,)              -               
G_mapping/latents_in          -         (?, 512)            -               
G_mapping/labels_in           -         (?, 0)              -               
G_mapping/PixelNorm           -         (?, 512)            -               
G_mapping/Dense0              262656    (?, 512)            (512, 512)      
G_mapping/Dense1              262656    (?, 512)            (512, 512)      
G_mapping/Dense2              262656    (?, 512)            (512, 512)      
G_mapping/Dense3              262656    (?, 512)            (512, 512)      
G_mapping/Dense4              262656    (?, 512)            (512, 512)      
G_mapping/Dense5              262656    (?, 512)            (512, 512)      
G_mapping/Dense6              262656    (?, 512)            (512, 512)      
G_mapping/Dense7              4202496   (?, 8192)           (512, 8192)     
G_mapping/Reshape             -         (?, 16, 512)        -               
G_mapping/dlatents_out        -         (?, 16, 512)        -               
Truncation                    -         (?, 16, 512)        -               
G_synthesis/dlatents_in       -         (?, 16, 512)        -               
G_synthesis/4x4/Const         534528    (?, 512, 4, 4)      (512,)          
G_synthesis/4x4/Conv          2885632   (?, 512, 4, 4)      (3, 3, 512, 512)
G_synthesis/ToRGB_lod7        513       (?, 1, 4, 4)        (1, 1, 512, 1)  
G_synthesis/8x8/Conv0_up      2885632   (?, 512, 8, 8)      (3, 3, 512, 512)
G_synthesis/8x8/Conv1         2885632   (?, 512, 8, 8)      (3, 3, 512, 512)
G_synthesis/ToRGB_lod6        513       (?, 1, 8, 8)        (1, 1, 512, 1)  
G_synthesis/Upscale2D         -         (?, 1, 8, 8)        -               
G_synthesis/Grow_lod6         -         (?, 1, 8, 8)        -               
G_synthesis/16x16/Conv0_up    2885632   (?, 512, 16, 16)    (3, 3, 512, 512)
G_synthesis/16x16/Conv1       2885632   (?, 512, 16, 16)    (3, 3, 512, 512)
G_synthesis/ToRGB_lod5        513       (?, 1, 16, 16)      (1, 1, 512, 1)  
G_synthesis/Upscale2D_1       -         (?, 1, 16, 16)      -               
G_synthesis/Grow_lod5         -         (?, 1, 16, 16)      -               
G_synthesis/32x32/Conv0_up    2885632   (?, 512, 32, 32)    (3, 3, 512, 512)
G_synthesis/32x32/Conv1       2885632   (?, 512, 32, 32)    (3, 3, 512, 512)
G_synthesis/ToRGB_lod4        513       (?, 1, 32, 32)      (1, 1, 512, 1)  
G_synthesis/Upscale2D_2       -         (?, 1, 32, 32)      -               
G_synthesis/Grow_lod4         -         (?, 1, 32, 32)      -               
G_synthesis/64x64/Conv0_up    1442816   (?, 256, 64, 64)    (3, 3, 512, 256)
G_synthesis/64x64/Conv1       852992    (?, 256, 64, 64)    (3, 3, 256, 256)
G_synthesis/ToRGB_lod3        257       (?, 1, 64, 64)      (1, 1, 256, 1)  
G_synthesis/Upscale2D_3       -         (?, 1, 64, 64)      -               
G_synthesis/Grow_lod3         -         (?, 1, 64, 64)      -               
G_synthesis/128x128/Conv0_up  426496    (?, 128, 128, 128)  (3, 3, 256, 128)
G_synthesis/128x128/Conv1     279040    (?, 128, 128, 128)  (3, 3, 128, 128)
G_synthesis/ToRGB_lod2        129       (?, 1, 128, 128)    (1, 1, 128, 1)  
G_synthesis/Upscale2D_4       -         (?, 1, 128, 128)    -               
G_synthesis/Grow_lod2         -         (?, 1, 128, 128)    -               
G_synthesis/256x256/Conv0_up  139520    (?, 64, 256, 256)   (3, 3, 128, 64) 
G_synthesis/256x256/Conv1     102656    (?, 64, 256, 256)   (3, 3, 64, 64)  
G_synthesis/ToRGB_lod1        65        (?, 1, 256, 256)    (1, 1, 64, 1)   
G_synthesis/Upscale2D_5       -         (?, 1, 256, 256)    -               
G_synthesis/Grow_lod1         -         (?, 1, 256, 256)    -               
G_synthesis/512x512/Conv0_up  51328     (?, 32, 512, 512)   (3, 3, 64, 32)  
G_synthesis/512x512/Conv1     42112     (?, 32, 512, 512)   (3, 3, 32, 32)  
G_synthesis/ToRGB_lod0        33        (?, 1, 512, 512)    (1, 1, 32, 1)   
G_synthesis/Upscale2D_6       -         (?, 1, 512, 512)    -               
G_synthesis/Grow_lod0         -         (?, 1, 512, 512)    -               
G_synthesis/images_out        -         (?, 1, 512, 512)    -               
G_synthesis/lod               -         ()                  -               
G_synthesis/noise0            -         (1, 1, 4, 4)        -               
G_synthesis/noise1            -         (1, 1, 4, 4)        -               
G_synthesis/noise2            -         (1, 1, 8, 8)        -               
G_synthesis/noise3            -         (1, 1, 8, 8)        -               
G_synthesis/noise4            -         (1, 1, 16, 16)      -               
G_synthesis/noise5            -         (1, 1, 16, 16)      -               
G_synthesis/noise6            -         (1, 1, 32, 32)      -               
G_synthesis/noise7            -         (1, 1, 32, 32)      -               
G_synthesis/noise8            -         (1, 1, 64, 64)      -               
G_synthesis/noise9            -         (1, 1, 64, 64)      -               
G_synthesis/noise10           -         (1, 1, 128, 128)    -               
G_synthesis/noise11           -         (1, 1, 128, 128)    -               
G_synthesis/noise12           -         (1, 1, 256, 256)    -               
G_synthesis/noise13           -         (1, 1, 256, 256)    -               
G_synthesis/noise14           -         (1, 1, 512, 512)    -               
G_synthesis/noise15           -         (1, 1, 512, 512)    -               
images_out                    -         (?, 1, 512, 512)    -               
---                           ---       ---                 ---             
Total                         30114536                                      

D                    Params    OutputShape         WeightShape     
---                  ---       ---                 ---             
images_in            -         (?, 1, 512, 512)    -               
labels_in            -         (?, 0)              -               
lod                  -         ()                  -               
FromRGB_lod0         64        (?, 32, 512, 512)   (1, 1, 1, 32)   
512x512/Conv0        9248      (?, 32, 512, 512)   (3, 3, 32, 32)  
512x512/Conv1_down   18496     (?, 64, 256, 256)   (3, 3, 32, 64)  
Downscale2D          -         (?, 1, 256, 256)    -               
FromRGB_lod1         128       (?, 64, 256, 256)   (1, 1, 1, 64)   
Grow_lod0            -         (?, 64, 256, 256)   -               
256x256/Conv0        36928     (?, 64, 256, 256)   (3, 3, 64, 64)  
256x256/Conv1_down   73856     (?, 128, 128, 128)  (3, 3, 64, 128) 
Downscale2D_1        -         (?, 1, 128, 128)    -               
FromRGB_lod2         256       (?, 128, 128, 128)  (1, 1, 1, 128)  
Grow_lod1            -         (?, 128, 128, 128)  -               
128x128/Conv0        147584    (?, 128, 128, 128)  (3, 3, 128, 128)
128x128/Conv1_down   295168    (?, 256, 64, 64)    (3, 3, 128, 256)
Downscale2D_2        -         (?, 1, 64, 64)      -               
FromRGB_lod3         512       (?, 256, 64, 64)    (1, 1, 1, 256)  
Grow_lod2            -         (?, 256, 64, 64)    -               
64x64/Conv0          590080    (?, 256, 64, 64)    (3, 3, 256, 256)
64x64/Conv1_down     1180160   (?, 512, 32, 32)    (3, 3, 256, 512)
Downscale2D_3        -         (?, 1, 32, 32)      -               
FromRGB_lod4         1024      (?, 512, 32, 32)    (1, 1, 1, 512)  
Grow_lod3            -         (?, 512, 32, 32)    -               
32x32/Conv0          2359808   (?, 512, 32, 32)    (3, 3, 512, 512)
32x32/Conv1_down     2359808   (?, 512, 16, 16)    (3, 3, 512, 512)
Downscale2D_4        -         (?, 1, 16, 16)      -               
FromRGB_lod5         1024      (?, 512, 16, 16)    (1, 1, 1, 512)  
Grow_lod4            -         (?, 512, 16, 16)    -               
16x16/Conv0          2359808   (?, 512, 16, 16)    (3, 3, 512, 512)
16x16/Conv1_down     2359808   (?, 512, 8, 8)      (3, 3, 512, 512)
Downscale2D_5        -         (?, 1, 8, 8)        -               
FromRGB_lod6         1024      (?, 512, 8, 8)      (1, 1, 1, 512)  
Grow_lod5            -         (?, 512, 8, 8)      -               
8x8/Conv0            2359808   (?, 512, 8, 8)      (3, 3, 512, 512)
8x8/Conv1_down       2359808   (?, 512, 4, 4)      (3, 3, 512, 512)
Downscale2D_6        -         (?, 1, 4, 4)        -               
FromRGB_lod7         1024      (?, 512, 4, 4)      (1, 1, 1, 512)  
Grow_lod6            -         (?, 512, 4, 4)      -               
4x4/MinibatchStddev  -         (?, 513, 4, 4)      -               
4x4/Conv             2364416   (?, 512, 4, 4)      (3, 3, 513, 512)
4x4/Dense0           4194816   (?, 512)            (8192, 512)     
4x4/Dense1           513       (?, 1)              (512, 1)        
scores_out           -         (?, 1)              -               
---                  ---       ---                 ---             
Total                23075169                                      

Building TensorFlow graph...
Setting up snapshot image grid...
Setting up run dir...

Traceback (most recent call last):
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\client\", line 1334, in _do_call
    return fn(*args)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\client\", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\client\", line 1407, in _call_tf_sessionrun
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(16, 8192), b.shape=(16, 512), m=8192, n=512, k=16
	 [[{{node GPU0/TrainD_grad/gradients/GPU0/D_loss/D_1/4x4/Dense0/MatMul_grad/MatMul_1}} = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](GPU0/D_loss/D_1/4x4/Dense0/Reshape, GPU0/TrainD_grad/gradients/GPU0/D_loss/D_1/4x4/Dense0/add_grad/Reshape)]]
	 [[{{node TrainD/ApplyGrads0/UpdateWeights/cond/pred_id/_1585}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_24297_TrainD/ApplyGrads0/UpdateWeights/cond/pred_id", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 193, in <module>
  File "", line 188, in main
  File "D:\dy\idinvert\dnnlib\submission\", line 290, in submit_run
  File "D:\dy\idinvert\dnnlib\submission\", line 242, in run_wrapper
    util.call_func_by_name(func_name=submit_config.run_func_name, submit_config=submit_config, **submit_config.run_func_kwargs)
  File "D:\dy\idinvert\dnnlib\", line 257, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "D:\dy\idinvert\training\", line 231, in training_loop
    [D_train_op, Gs_update_op], {lod_in: sched.lod, lrate_in: sched.D_lrate, minibatch_in: sched.minibatch})
  File "D:\dy\idinvert\dnnlib\tflib\", line 26, in run
    return tf.get_default_session().run(*args, **kwargs)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\client\", line 929, in run
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\client\", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\client\", line 1328, in _do_run
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\client\", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(16, 8192), b.shape=(16, 512), m=8192, n=512, k=16
	 [[node GPU0/TrainD_grad/gradients/GPU0/D_loss/D_1/4x4/Dense0/MatMul_grad/MatMul_1 (defined at D:\dy\idinvert\dnnlib\tflib\  = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](GPU0/D_loss/D_1/4x4/Dense0/Reshape, GPU0/TrainD_grad/gradients/GPU0/D_loss/D_1/4x4/Dense0/add_grad/Reshape)]]
	 [[{{node TrainD/ApplyGrads0/UpdateWeights/cond/pred_id/_1585}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_24297_TrainD/ApplyGrads0/UpdateWeights/cond/pred_id", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'GPU0/TrainD_grad/gradients/GPU0/D_loss/D_1/4x4/Dense0/MatMul_grad/MatMul_1', defined at:
  File "", line 193, in <module>
  File "", line 188, in main
  File "D:\dy\idinvert\dnnlib\submission\", line 290, in submit_run
  File "D:\dy\idinvert\dnnlib\submission\", line 242, in run_wrapper
    util.call_func_by_name(func_name=submit_config.run_func_name, submit_config=submit_config, **submit_config.run_func_kwargs)
  File "D:\dy\idinvert\dnnlib\", line 257, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "D:\dy\idinvert\training\", line 184, in training_loop
    D_opt.register_gradients(tf.reduce_mean(D_loss), D_gpu.trainables)
  File "D:\dy\idinvert\dnnlib\tflib\", line 98, in register_gradients
    grads = self._dev_opt[dev].compute_gradients(loss, trainable_vars, gate_gradients=tf.train.Optimizer.GATE_NONE)  # disable gating to reduce memory usage
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\training\", line 519, in compute_gradients
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\ops\", line 630, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\ops\", line 814, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\ops\", line 408, in _MaybeCompile
    return grad_fn()  # Exit early
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\ops\", line 814, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\ops\", line 1131, in _MatMulGrad
    grad_b = gen_math_ops.mat_mul(a, grad, transpose_a=True)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\ops\", line 4560, in mat_mul
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\framework\", line 787, in _apply_op_helper
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\util\", line 488, in new_func
    return func(*args, **kwargs)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\framework\", line 3274, in create_op
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\framework\", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

...which was originally created as op 'GPU0/D_loss/D_1/4x4/Dense0/MatMul', defined at:
  File "", line 193, in <module>
[elided 3 identical lines from previous traceback]
  File "D:\dy\idinvert\dnnlib\", line 257, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "D:\dy\idinvert\training\", line 182, in training_loop
    D_loss = dnnlib.util.call_func_by_name(G=G_gpu, D=D_gpu, opt=D_opt, training_set=training_set, minibatch_size=minibatch_split, reals=reals, labels=labels, **D_loss_args)
  File "D:\dy\idinvert\dnnlib\", line 257, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "D:\dy\idinvert\training\", line 154, in D_logistic_simplegp
    fake_scores_out = fp32(D.get_output_for(fake_images_out, labels, is_training=True))
  File "D:\dy\idinvert\dnnlib\tflib\", line 222, in get_output_for
    out_expr = self._build_func(*final_inputs, **build_kwargs)
  File "D:\dy\idinvert\training\", line 654, in D_basic
    scores_out = grow(2, resolution_log2 - 2)
  File "D:\dy\idinvert\training\", line 651, in grow
    x = block(x(), res); y = lambda: x
  File "D:\dy\idinvert\training\", line 619, in block
    x = act(apply_bias(dense(x, fmaps=nf(res-2), gain=gain, use_wscale=use_wscale)))
  File "D:\dy\idinvert\training\", line 159, in dense
    return tf.matmul(x, w)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\ops\", line 2057, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\ops\", line 4560, in mat_mul
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\framework\", line 787, in _apply_op_helper
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\util\", line 488, in new_func
    return func(*args, **kwargs)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\framework\", line 3274, in create_op

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(16, 8192), b.shape=(16, 512), m=8192, n=512, k=16
	 [[node GPU0/TrainD_grad/gradients/GPU0/D_loss/D_1/4x4/Dense0/MatMul_grad/MatMul_1 (defined at D:\dy\idinvert\dnnlib\tflib\  = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](GPU0/D_loss/D_1/4x4/Dense0/Reshape, GPU0/TrainD_grad/gradients/GPU0/D_loss/D_1/4x4/Dense0/add_grad/Reshape)]]
	 [[{{node TrainD/ApplyGrads0/UpdateWeights/cond/pred_id/_1585}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_24297_TrainD/ApplyGrads0/UpdateWeights/cond/pred_id", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Is this problem caused by a large batch_size? Even when I reduce the batch_size, the problem still occurs.
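
One way to read the error message: the failing node is the weight gradient of D's 4x4/Dense0 layer, computed as a matrix product with `transpose_a=true`. The sketch below redoes the shape arithmetic from the log in plain Python. The helper name `gemm_dims` is made up for illustration and is not part of TensorFlow. It suggests that the batch size only enters as the inner dimension `k`, so the size of the result does not shrink when the batch size is reduced. This is consistent with the observation above, but it does not by itself identify the cause.

```python
def gemm_dims(a_shape, b_shape, transpose_a=False, transpose_b=False):
    """Return (m, n, k) for C = op(A) @ op(B), where C has shape (m, n)."""
    a_rows, a_cols = a_shape
    b_rows, b_cols = b_shape
    # Transposing a matrix swaps which axis is the contracted (inner) one.
    m, k_a = (a_cols, a_rows) if transpose_a else (a_rows, a_cols)
    k_b, n = (b_cols, b_rows) if transpose_b else (b_rows, b_cols)
    if k_a != k_b:
        raise ValueError("inner dimensions differ: %d vs %d" % (k_a, k_b))
    return m, n, k_a


# Shapes copied from the error message (batch of 16 per GPU).
m, n, k = gemm_dims((16, 8192), (16, 512), transpose_a=True)

# Same product with a hypothetical smaller batch of 4.
m_small, n_small, k_small = gemm_dims((4, 8192), (4, 512), transpose_a=True)

# The result is the Dense0 weight gradient; its size ignores the batch size.
weight_grad_bytes = m * n * 4  # float32 takes 4 bytes per element

# Cross-check against the D table: Dense0 params = weights + biases.
dense0_params = m * n + n

if __name__ == "__main__":
    print("m, n, k:", (m, n, k))
    print("weight gradient MiB:", weight_grad_bytes / 2**20)
    print("Dense0 params:", dense0_params)
```

The values of `m`, `n`, and `k` agree with those printed in the error, and `dense0_params` agrees with the 4194816 listed for 4x4/Dense0 in the D table.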

You can try on the images with the resolution of 256x256 and see if the problem still happens.
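
As a rough guide to what changing the resolution affects, the sketch below recomputes the feature-map counts per resolution. It assumes the usual rule `nf(stage) = min(fmap_base / 2**stage, fmap_max)` with `fmap_base=8192` and `fmap_max=512`. These values are not stated in the thread; they are inferred because they reproduce the channel counts in the tables above. Under that assumption, a 256x256 run drops the 512x512 blocks, which lowers overall memory use, while the 4x4/Dense0 layer named in the error keeps the same weight shape.

```python
import math


def nf(stage, fmap_base=8192, fmap_decay=1.0, fmap_max=512):
    """Feature maps at a given stage (defaults are assumed, see lead-in)."""
    return min(int(fmap_base / (2.0 ** (stage * fmap_decay))), fmap_max)


def fmaps_by_resolution(resolution):
    """Map each block resolution (e.g. 512, 256, ..., 4) to its channel count."""
    res_log2 = int(math.log2(resolution))
    return {2 ** r: nf(r - 1) for r in range(res_log2, 1, -1)}


def dense0_weight_shape():
    """Weight shape of D's 4x4/Dense0: flattened 4x4 conv output -> nf(0)."""
    res = 2  # the final D block always runs at 4x4, i.e. log2(4) = 2
    return (nf(res - 1) * 4 * 4, nf(res - 2))


if __name__ == "__main__":
    for resolution in (512, 256):
        print(resolution, fmaps_by_resolution(resolution))
    print("Dense0 weight shape:", dense0_weight_shape())
```

If the 256x256 run fails at the same node, that would point away from the high-resolution layers as the cause.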

MHX1203 commented May 15, 2022

The problem still occurs.

Your environment may be causing it. I found some possible solutions, such as here and here; see if they help.
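
A minimal sketch for collecting environment details before trying fixes, assuming an NVIDIA GPU with `nvidia-smi` on the PATH. The function names are made up for illustration. It reports the GPU name, driver version, and memory already in use, which helps check whether another process is holding GPU memory and whether the driver matches the CUDA version your TensorFlow build expects. It is a diagnostic only and is not taken from the linked solutions.

```python
import shutil
import subprocess

# Standard nvidia-smi query; prints one CSV row per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def _to_int(text):
    """Convert a field to int, or None if nvidia-smi printed e.g. [N/A]."""
    try:
        return int(text)
    except ValueError:
        return None


def parse_gpu_rows(csv_text):
    """Parse the CSV text produced by QUERY into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma inside the GPU name is harmless.
        name, driver, used, total = [f.strip() for f in line.rsplit(",", 3)]
        rows.append({
            "name": name,
            "driver": driver,
            "used_mib": _to_int(used),
            "total_mib": _to_int(total),
        })
    return rows


def query_gpus():
    """Return parsed GPU rows, or None when nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return None
    output = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_gpu_rows(output)


if __name__ == "__main__":
    gpus = query_gpus()
    if gpus is None:
        print("nvidia-smi not found on PATH")
    else:
        for gpu in gpus:
            print(gpu)
```

Run it once before starting training: a large `used_mib` at that point means something else already occupies the GPU.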
