When attempting to initialize the process group on a TPU pod, a TPU error occurs: ConnectionRefusedError: [Errno 111] Connection refused.
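For context, Python raises `ConnectionRefusedError` with errno 111 (`ECONNREFUSED` on Linux) when a TCP connection attempt reaches the target host but no process is listening on that port. This report does not say which address or port the `xla://` initialization tries to reach. The helper below, `probe`, is therefore a hypothetical diagnostic sketch: the host and port are placeholders you would replace with your own rendezvous endpoint.

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Attempt one TCP connection and classify the result.

    Hypothetical diagnostic helper: pass whatever rendezvous host/port your
    own setup uses; this report does not say which one xla:// contacts.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # errno 111 (ECONNREFUSED) on Linux: host answered, nothing listening.
        return "refused"
    except OSError:
        # Timeouts, unreachable hosts, name-resolution failures, etc.
        return "unreachable"


if __name__ == "__main__":
    # Local demonstration: listen on an ephemeral port, probe, then stop listening.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen()
    port = server.getsockname()[1]
    print(probe("127.0.0.1", port))  # "open" while the listener exists
    server.close()
    print(probe("127.0.0.1", port))  # typically "refused" on Linux once closed
```

A "refused" result on the real endpoint would suggest the peer expected to host the rendezvous was not listening yet (or at all) when the connection was attempted.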
To Reproduce
Steps to reproduce the behavior:
1. Set Up TPU Pod: Ensure that your TPU pod is properly configured and active.
2. Install Dependencies: Ensure torch >= v2.4.1 and torch_xla >= v2.4.0 are installed.
3. Run the Following Script:
import torch.distributed as dist
import torch_xla.runtime as xr
import torch_xla.distributed.xla_backend  # Import to register the `xla://` init_method

xr.use_spmd()
dist.init_process_group("gloo", init_method="xla://")
Expected Behavior
The script should run without errors, initializing the process group correctly on the TPU pod.
Environment
XLA Backend: TPU
Torch XLA Version: 2.4.0
Torch Version: 2.4.1 or greater
Additional Context
The issue does not occur with torch and torch_xla versions <= 2.4.0.
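Taken together with the Environment section, the report describes a failing combination of torch >= 2.4.1 with torch_xla >= 2.4.0, while torch and torch_xla <= 2.4.0 work. The sketch below flags whether an installed environment falls in that range. It assumes the reported boundary is exact (the report does not pin it down more precisely), and `parse_version` and `in_reported_failing_range` are hypothetical helper names.

```python
import re


def parse_version(text: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from strings like '2.4.1' or '2.4.1+cu121'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component (e.g. '2.4') is treated as 0.
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def in_reported_failing_range(torch_version: str, xla_version: str) -> bool:
    """True if the pair matches the combination reported as failing:
    torch >= 2.4.1 together with torch_xla >= 2.4.0 (assumed boundary)."""
    return (parse_version(torch_version) >= (2, 4, 1)
            and parse_version(xla_version) >= (2, 4, 0))


if __name__ == "__main__":
    try:
        import torch
        import torch_xla
        print(in_reported_failing_range(torch.__version__, torch_xla.__version__))
    except ImportError:
        print("torch / torch_xla not installed")
```

Comparing plain integer tuples keeps the sketch dependency-free; local suffixes such as `+cu121` and dev tags are simply ignored.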