RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`
Verbose error log
0%| | 0/128 [00:00<?, ?it/s]../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [73,0,0], thread: [96,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
[... same assertion repeated for threads [97,0,0] through [127,0,0] of block [73,0,0] ...]
0%| | 0/128 [00:00<?, ?it/s]
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[6], line 1
----> 1 baseline_cache = ortho.create_activation_cache(baseline,N=len(baseline))
      2 eeyore_cache = ortho.create_activation_cache(eeyored_toks,N=len(eeyored_toks))
File ~/Desktop/AbliterationExperiments/abliterator/abliterator.py:613, in ModelAbliterator.create_activation_cache(self, toks, N, batch_size, last_indices, measure_refusal, stop_at_layer)
    611 z_label = [] if measure_refusal > 1 else None
    612 for i in tqdm(range(0,min(N,len(toks)),batch_size)):
--> 613     logits,cache = self.run_with_cache(toks[i:min(i+batch_size,len(toks))],max_new_tokens=measure_refusal,stop_at_layer=stop_at_layer)
    614     if measure_refusal > 1:
    615         z_label.extend(self.measure_scores_from_logits(logits,measure_refusal)[0])
File ~/Desktop/AbliterationExperiments/abliterator/abliterator.py:396, in ModelAbliterator.run_with_cache(self, names_filter, incl_bwd, device, remove_batch_dim, reset_hooks_end, clear_contexts, fwd_hooks, max_new_tokens, *model_args, **model_kwargs)
    392 max_new_tokens = 1
    394 with self.model.hooks(fwd_hooks=fwd_hooks, bwd_hooks=bwd, reset_hooks_end=reset_hooks_end, clear_contexts=clear_contexts):
    395     #model_out = self.model(*model_args,**model_kwargs)
--> 396     model_out,toks = self.generate_logits(*model_args,max_tokens_generated=max_new_tokens, **model_kwargs)
    397     if incl_bwd:
    398         model_out.backward()
File ~/Desktop/AbliterationExperiments/abliterator/abliterator.py:317, in ModelAbliterator.generate_logits(self, toks, drop_refusals, stop_at_eos, max_tokens_generated, *args, **kwargs)
    315 generating = [i for i in range(toks.shape[0])]
    316 for i in range(max_tokens_generated):
--> 317     logits = self.model(all_toks[generating, :-max_tokens_generated + i],*args,**kwargs)
    318     next_tokens = logits[:,-1,:].argmax(dim=-1).to('cpu')
    319     all_toks[generating,-max_tokens_generated+i] = next_tokens
File ~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
   1530     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1531 else:
-> 1532     return self._call_impl(*args, **kwargs)

File ~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
   1536 # If we don't have any hooks, we want to skip the rest of the logic in
   1537 # this function, and just call forward.
   1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1539         or _global_backward_pre_hooks or _global_backward_hooks
   1540         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541     return forward_call(*args, **kwargs)
   1543 try:
   1544     result = None
File ~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/HookedTransformer.py:550, in HookedTransformer.forward(self, input, return_type, loss_per_token, prepend_bos, padding_side, start_at_layer, tokens, shortformer_pos_embed, attention_mask, stop_at_layer, past_kv_cache)
    545 if shortformer_pos_embed is not None:
    546     shortformer_pos_embed = shortformer_pos_embed.to(
    547         devices.get_device_for_block_index(i, self.cfg)
    548     )
--> 550 residual = block(
    551     residual,
    552     # Cache contains a list of HookedTransformerKeyValueCache objects, one for each
    553     # block
    554     past_kv_cache_entry=past_kv_cache[i] if past_kv_cache is not None else None,
    555     shortformer_pos_embed=shortformer_pos_embed,
    556     attention_mask=attention_mask,
    557 )  # [batch, pos, d_model]
    559 if stop_at_layer is not None:
    560     # When we stop at an early layer, we end here rather than doing further computation
    561     return residual
File ~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
   1530     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1531 else:
-> 1532     return self._call_impl(*args, **kwargs)

File ~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
   1536 # If we don't have any hooks, we want to skip the rest of the logic in
   1537 # this function, and just call forward.
   1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1539         or _global_backward_pre_hooks or _global_backward_hooks
   1540         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541     return forward_call(*args, **kwargs)
   1543 try:
   1544     result = None
File ~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/transformer_block.py:159, in TransformerBlock.forward(self, resid_pre, shortformer_pos_embed, past_kv_cache_entry, attention_mask)
    152 key_input = attn_in
    153 value_input = attn_in
    155 attn_out = (
    156     # hook the residual stream states that are used to calculate the
    157     # queries, keys and values, independently.
    158     # Then take the layer norm of these inputs, and pass these to the attention module.
--> 159     self.attn(
    160         query_input=self.ln1(query_input)
    161         + (0.0 if shortformer_pos_embed is None else shortformer_pos_embed),
    162         key_input=self.ln1(key_input)
    163         + (0.0 if shortformer_pos_embed is None else shortformer_pos_embed),
    164         value_input=self.ln1(value_input),
    165         past_kv_cache_entry=past_kv_cache_entry,
    166         attention_mask=attention_mask,
    167     )
    168 )  # [batch, pos, d_model]
    169 if self.cfg.use_normalization_before_and_after:
    170     # If we use LayerNorm both before and after, then apply the second LN after the layer
    171     # and before the hook. We do it before the hook so hook_attn_out captures "that which
    172     # is added to the residual stream"
    173     attn_out = self.ln1_post(attn_out)
File ~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
   1530     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1531 else:
-> 1532     return self._call_impl(*args, **kwargs)

File ~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
   1536 # If we don't have any hooks, we want to skip the rest of the logic in
   1537 # this function, and just call forward.
   1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1539         or _global_backward_pre_hooks or _global_backward_hooks
   1540         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541     return forward_call(*args, **kwargs)
   1543 try:
   1544     result = None
File ~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/abstract_attention.py:216, in AbstractAttention.forward(self, query_input, key_input, value_input, past_kv_cache_entry, additive_attention_mask, attention_mask, position_bias)
    213 q = q.to(torch.float32)
    214 k = k.to(torch.float32)
--> 216 attn_scores = self.calculate_attention_scores(
    217     q, k
    218 )  # [batch, head_index, query_pos, key_pos]
    220 if self.cfg.positional_embedding_type == "alibi":
    221     query_ctx = attn_scores.size(-2)
File ~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/grouped_query_attention.py:153, in GroupedQueryAttention.calculate_attention_scores(self, q, k)
[142](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/grouped_query_attention.py:142) """Calculate attention scores from Q and the unexpanded K matrix.
[143](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/grouped_query_attention.py:143) K will be expaned from [batch, pos, n_key_value_head, d_head] to [batch, pos, n_query_heads, d_head] using torch.repeat_interleave.
[144](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/grouped_query_attention.py:144)
(...)
[150](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/grouped_query_attention.py:150) Float[torch.Tensor, "batch head_index query_pos key_pos"]: The attention scores.
[151](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/grouped_query_attention.py:151) """
[152](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/grouped_query_attention.py:152) k = torch.repeat_interleave(k, dim=2, repeats=self.repeat_kv_heads)
--> [153](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/grouped_query_attention.py:153) return super().calculate_attention_scores(q, k)
File ~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/abstract_attention.py:409, in AbstractAttention.calculate_attention_scores(self, q, k)
[403](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/abstract_attention.py:403) q_ = einops.rearrange(
[404](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/abstract_attention.py:404) q, "batch query_pos head_index d_head -> batch head_index query_pos d_head"
[405](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/abstract_attention.py:405) )
[406](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/abstract_attention.py:406) k_ = einops.rearrange(
[407](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/abstract_attention.py:407) k, "batch key_pos head_index d_head -> batch head_index d_head key_pos"
[408](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/abstract_attention.py:408) )
--> [409](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/abstract_attention.py:409) attn_scores = q_ @ k_ / self.attn_scale
[410](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/abstract_attention.py:410) if self.cfg.attn_scores_soft_cap > 0:
[411](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/abstract_attention.py:411) attn_scores = self.cfg.attn_scores_soft_cap * F.tanh(
[412](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/abstract_attention.py:412) attn_scores / self.cfg.attn_scores_soft_cap
[413](https://file+.vscode-resource.vscode-cdn.net/home/server_runner/Desktop/AbliterationExperiments/abliterator/~/anaconda3/envs/abliteratorENV2/lib/python3.11/site-packages/transformer_lens/components/abstract_attention.py:413) )
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`
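Note that the verbose log shows `IndexKernel.cu` "index out of bounds" assertions before the cuBLAS call fails, so the matmul error may only be a downstream symptom: CUDA errors are reported asynchronously, and the Python traceback can point at a later op than the one that actually failed. A minimal way to make the traceback point at the real failing op (a sketch; `CUDA_LAUNCH_BLOCKING` is a standard PyTorch/CUDA environment variable that forces synchronous launches, at some speed cost) is to set it before any CUDA work happens in the process:

```python
import os

# Must be set before the first CUDA call in the process; forces kernels to
# launch synchronously so an error surfaces at the op that raised it rather
# than at a later, unrelated call such as the cuBLAS matmul above.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

Alternatively, the same variable can be exported in the shell before launching Jupyter, which avoids any ordering issues inside the notebook.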
I have an Ubuntu 22.04 rig with two 3090-class GPUs (an RTX 3090 and an RTX 3090 Ti). The nvidia-smi output is below:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:01:00.0 On | N/A |
| 56% 69C P2 126W / 350W | 9744MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 Ti On | 00000000:06:00.0 Off | Off |
| 0% 39C P8 16W / 480W | 10116MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1699 G /usr/lib/xorg/Xorg 199MiB |
| 0 N/A N/A 1878 G /usr/bin/gnome-shell 72MiB |
| 0 N/A N/A 2482 G ...erProcess --variations-seed-version 127MiB |
| 0 N/A N/A 3354 G ...irefox/4539/usr/lib/firefox/firefox 175MiB |
| 0 N/A N/A 6859 C ...da3/envs/abliteratorENV2/bin/python 9138MiB |
| 1 N/A N/A 1699 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 6859 C ...da3/envs/abliteratorENV2/bin/python 10098MiB |
+---------------------------------------------------------------------------------------+
I am running abliterator inside a conda environment. The CUDA version that PyTorch in my environment was built against (12.1) differs from the system CUDA version (12.3). Could this be the source of the error?
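For what it's worth, that kind of mismatch is normally harmless: the driver-reported CUDA version from nvidia-smi (12.3 here) only needs to be at least as new as the toolkit PyTorch was compiled against (12.1), since the driver is backward compatible. A quick sanity check of what the environment actually sees (a sketch, run inside the conda environment):

```python
import torch

# Toolkit version PyTorch was compiled against, e.g. "12.1" (None on CPU-only builds);
# this only needs to be <= the driver's CUDA version reported by nvidia-smi.
print(torch.version.cuda)
# Whether the CUDA runtime initialized, and how many devices it sees.
print(torch.cuda.is_available())
print(torch.cuda.device_count())
```

If this prints "12.1", True, and 2, the version mismatch alone is unlikely to be the cause, and the index-out-of-bounds assertions in the verbose log are a better lead.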
I am attempting to use the following notebook
https://huggingface.co/failspy/Llama-3-8B-Instruct-MopeyMule/blob/main/MopeyMule-Induce-Melancholy.ipynb
I receive the error shown above when I run parts of this notebook.