Restore shelljob in go-criu #150

Open
mihkeltiks opened this issue Sep 30, 2023 · 9 comments

@mihkeltiks

I am trying to figure out how to restore shell jobs in go-criu. Inside my Go code, I launch a child process and dump it. When I try to restore it, I get the following error every time:
restore failed: operation failed (msg:Error (criu/tty.c:991): tty: Don't have tty to inherit session from, aborting err:0)
As far as I understand, it is trying to inherit the shell session that the Go program is running in, so it fails. Am I missing a simple way to do this? I am setting the shelljob parameter on both dump and restore, and I have also tried setting setsid on the child process, but that doesn't change anything. Thanks!

@snprajwal
Member

Are you using the -j/--shell-job flag while invoking CRIU? Also, could you please share the debug logs? You can generate them with the -v4 -o dump.log flags.

@mihkeltiks
Author

Yes. I have been trying different options, but these are essentially the ones I use for dump and restore, respectively. I have attached the dump and restore logs.

 // Dump options
 opts := &rpc.CriuOpts{
     Pid:          proto.Int32(int32(pid)),
     ImagesDirFd:  proto.Int32(int32(img.Fd())),
     LogLevel:     proto.Int32(4),
     ShellJob:     proto.Bool(true),
     LogToStderr:  proto.Bool(true),
     LeaveRunning: proto.Bool(true),
     LogFile:      proto.String("dump.log"),
 }

 // Restore options
 opts := &rpc.CriuOpts{
     ImagesDirFd: proto.Int32(int32(img.Fd())),
     LogLevel:    proto.Int32(4),
     ShellJob:    proto.Bool(true),
     LogFile:     proto.String("restore.log"),
 }

The dump works fine and I can restore it later manually from the command line, but the restore in Go fails with the error mentioned above. To clarify, I am trying to restore it from within the same Go program. I have attached the parent and child code too. Thanks for the quick response!
logs_zipped.zip

@snprajwal
Member

snprajwal commented Oct 2, 2023

When you start a child process from within the Go code, it does not directly have access to the stdin/stdout/stderr of the TTY from which you are running the main program. You are explicitly setting the "current" stdout/stderr for the child program in the following lines:

// newparent.go:149
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

When CRIU is restoring the child process, these file descriptors are not recognised as a terminal/TTY since the child never had direct access to it in the first place (at least, I think this is how it works, my explanation might be wrong).
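To see what the parent's standard streams actually are at the point where the child is started, a quick check along these lines may help. It is only a heuristic sketch: it reports whether a descriptor is a character device, which is also true for non-terminals such as /dev/null, and it is not what CRIU itself checks. The function names `looksLikeTerminal` and `pipeLooksLikeTerminal` are made up for this example.

```go
package main

import (
	"fmt"
	"os"
)

// looksLikeTerminal reports whether f is a character device.
// Heuristic only: terminals are character devices, but so is /dev/null.
func looksLikeTerminal(f *os.File) (bool, error) {
	fi, err := f.Stat()
	if err != nil {
		return false, err
	}
	return fi.Mode()&os.ModeCharDevice != 0, nil
}

// pipeLooksLikeTerminal runs the same check on the write end of a fresh
// pipe, which is never a character device.
func pipeLooksLikeTerminal() (bool, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return false, err
	}
	defer r.Close()
	defer w.Close()
	return looksLikeTerminal(w)
}

func main() {
	for _, f := range []*os.File{os.Stdin, os.Stdout, os.Stderr} {
		ok, err := looksLikeTerminal(f)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", f.Name(), err)
			continue
		}
		fmt.Printf("%s: character device = %v\n", f.Name(), ok)
	}
}
```

Running the parent from an interactive shell and again with its output redirected to a file should show the difference.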

One way to make this work is by using a pipe to connect the stdin/stdout/stderr of the parent process to the child process.

// newparent.go:149
	_, err := cmd.StdinPipe()
	// Handle error
	_, err = cmd.StdoutPipe()
	// Handle error
	_, err = cmd.StderrPipe()
	// Handle error

Making this change locally allowed me to successfully restore the child from within the parent process, like how you're trying to do.
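For reference, a fuller version of the pipe setup could look like the sketch below. It uses only os/exec from the standard library; the `child` type and the `startWithPipes` and `echoThroughChild` helpers are hypothetical names, and `cat` merely stands in for the real child binary. It shows the Go side only and does not verify how CRIU treats these pipes on dump or restore.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os/exec"
)

// child bundles a started process with the parent's ends of its stdio pipes.
type child struct {
	cmd    *exec.Cmd
	stdin  io.WriteCloser
	stdout io.ReadCloser
	stderr io.ReadCloser
}

// startWithPipes starts name with stdin, stdout and stderr connected to
// pipes, so the child does not inherit the parent's terminal descriptors.
func startWithPipes(name string, args ...string) (*child, error) {
	cmd := exec.Command(name, args...)
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return nil, fmt.Errorf("stdin pipe: %w", err)
	}
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return nil, fmt.Errorf("stdout pipe: %w", err)
	}
	stderr, err := cmd.StderrPipe()
	if err != nil {
		return nil, fmt.Errorf("stderr pipe: %w", err)
	}
	if err := cmd.Start(); err != nil {
		return nil, fmt.Errorf("start %s: %w", name, err)
	}
	return &child{cmd: cmd, stdin: stdin, stdout: stdout, stderr: stderr}, nil
}

// echoThroughChild sends msg through "cat" to show the pipes are wired up.
func echoThroughChild(msg string) (string, error) {
	c, err := startWithPipes("cat")
	if err != nil {
		return "", err
	}
	// Drain stderr concurrently so the child can never block on it.
	drained := make(chan struct{})
	go func() {
		defer close(drained)
		io.Copy(io.Discard, c.stderr)
	}()
	if _, err := io.WriteString(c.stdin, msg); err != nil {
		return "", err
	}
	c.stdin.Close() // EOF lets cat exit
	out, err := io.ReadAll(c.stdout)
	if err != nil {
		return "", err
	}
	<-drained
	// Wait must only be called after all reads from the pipes are done.
	if err := c.cmd.Wait(); err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := echoThroughChild("hello from the parent\n")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(out)
}
```

For a long-running child that is going to be dumped, the parent would keep the `child` value around instead of closing stdin right away.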

@adrianreber
Member

Not sure if this is related, but crun uses a callback to get an FD from the restored process from CRIU.

https://github.com/containers/crun/blob/dd52246b02b374330a6a747d57da9a8f326d7cba/src/libcrun/criu.c#L183

I do not think the go interface exposes this callback. I am also not sure it is related. But it reminded me a bit about this request here. Maybe it helps.

@rst0git
Member

rst0git commented Oct 3, 2023

runc also implements something similar using the orphan-pts-master hook.

@mihkeltiks
Author

mihkeltiks commented Oct 3, 2023

Thanks, it seems to work. I suspected it was either some Go technicality I didn't know about or the --external option for CRIU, but the documentation isn't very user-friendly. I have another question, though, about the output. I ended up with a startBinary function like this:

func startBinary(target ...string) (*exec.Cmd, bytes.Buffer) {
	cmd := exec.Command(target[0], target[1:]...)

	var stdBuffer bytes.Buffer
	mw := io.MultiWriter(os.Stdout, &stdBuffer)

	cmd.Stdout = mw
	cmd.Stderr = mw

	cmd.SysProcAttr = &syscall.SysProcAttr{
		Ptrace: true,
		Setsid: true,
	}

	if err := cmd.Start(); err != nil {
		log.Fatal("Failed to start child process:", err)
	}

	go func() {
		log.Println(stdBuffer.String())
	}()

	return cmd, stdBuffer
}

This way I get the output on the terminal as it arrives, but it doesn't continue after the restore. Is there some trick for this too?
EDIT: It looks like the process stays in state tsl after ATTACH and CONT; TracerPid seems unchanged, though.
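Independently of the restore behaviour, two Go details in the function above are worth noting: the goroutine calls stdBuffer.String() only once, and returning bytes.Buffer by value hands the caller a copy that does not see later writes. A possible variant is sketched below. The names `syncBuffer`, `startCaptured` and `captureOnce` are made up for this example, and the Ptrace and Setsid attributes are left out so the sketch runs anywhere. It does not address what happens to the output after a CRIU restore.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"log"
	"os"
	"os/exec"
	"sync"
)

// syncBuffer is a bytes.Buffer guarded by a mutex, so one goroutine can read
// it while os/exec's copy goroutine is still writing to it.
type syncBuffer struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

func (b *syncBuffer) Write(p []byte) (int, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf.Write(p)
}

func (b *syncBuffer) String() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf.String()
}

// startCaptured starts target, mirroring its output to mirror while also
// collecting it in a buffer that is returned by pointer, not by value.
func startCaptured(mirror io.Writer, target ...string) (*exec.Cmd, *syncBuffer, error) {
	if len(target) == 0 {
		return nil, nil, errors.New("no target given")
	}
	cmd := exec.Command(target[0], target[1:]...)
	captured := &syncBuffer{}
	mw := io.MultiWriter(mirror, captured)
	cmd.Stdout = mw
	cmd.Stderr = mw
	if err := cmd.Start(); err != nil {
		return nil, nil, fmt.Errorf("start child: %w", err)
	}
	return cmd, captured, nil
}

// captureOnce runs target to completion and returns everything it printed.
func captureOnce(target ...string) (string, error) {
	cmd, captured, err := startCaptured(io.Discard, target...)
	if err != nil {
		return "", err
	}
	if err := cmd.Wait(); err != nil {
		return "", err
	}
	return captured.String(), nil
}

func main() {
	cmd, captured, err := startCaptured(os.Stdout, "sh", "-c", "echo one; echo two")
	if err != nil {
		log.Fatal(err)
	}
	if err := cmd.Wait(); err != nil {
		log.Fatal(err)
	}
	log.Printf("captured %d bytes", len(captured.String()))
}
```

The MultiWriter already echoes output as it arrives, so no extra logging goroutine is needed; the returned pointer can be read at any time.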

@mihkeltiks
Author

Hi. I restructured the code a bit and realized that the restored process, although CRIU reports a successful restore, is never actually restored properly. When SIGSTOP is sent to the process before the dump, it revives and lives until a CONT or two are sent to it. When I don't send the SIGSTOP, on the other hand, CRIU reports a successful restore, but the process dies instantly, even though it cannot be exiting on its own. I am really at a loss here. I am fairly sure it has to do with the pipes, but I am running out of ideas. Do you have any idea why this could be?
problem.zip

@MY-Chen2000

MY-Chen2000 commented Feb 15, 2024

I ran into the same problem. I created a simple looper process from a C++ program that prints numbers to the terminal. Dumping worked fine, but when I tried to restore the process, the same error occurred. I can restore it manually with the criu command line, but not through go-criu.

@mihkeltiks
Author

So I managed to work around this by starting a TTY in Go and launching the criu restore from there manually. There is probably a better way, with external sockets or something similar, but I didn't have the capacity to find it at the time.

	import "github.com/creack/pty"

	cmd := exec.Command("/usr/local/sbin/criu", "restore", "-v4", "-o", "restore.log", "-j", "--tcp-established", "-D", checkpointDir)

	f, err := pty.Start(cmd)
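The command construction from this workaround can be kept in a small helper, sketched below with the standard library only. The helper name `restoreCommand` and the example paths are assumptions for illustration. The flags are the ones quoted in this thread, and the resulting command would then be handed to pty.Start from github.com/creack/pty as in the workaround.

```go
package main

import (
	"fmt"
	"os/exec"
)

// restoreCommand builds the criu restore invocation quoted in the thread.
// criuPath, imagesDir and logFile are parameters of this sketch.
func restoreCommand(criuPath, imagesDir, logFile string) *exec.Cmd {
	return exec.Command(criuPath,
		"restore",
		"-v4",
		"-o", logFile,
		// -j is the short form of --shell-job.
		"-j",
		"--tcp-established",
		"-D", imagesDir,
	)
}

func main() {
	cmd := restoreCommand("/usr/local/sbin/criu", "/tmp/checkpoint", "restore.log")
	// Only print the arguments here; the workaround in the thread starts
	// the command on a new pseudo-terminal instead.
	fmt.Println(cmd.Args)
}
```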
