Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy progress - feature request #839

Open
deitch opened this issue Oct 21, 2024 · 5 comments
Open

Copy progress - feature request #839

deitch opened this issue Oct 21, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@deitch
Copy link
Contributor

deitch commented Oct 21, 2024

As discussed in this comment (with thanks to @shizhMSFT for shepherding it through).

Summary

  • Request: Add support for ongoing progress updates when running oras.Copy()
  • Context: if the downloaded blobs/manifests are large (and sometimes even if not), being able to know status updates, and that things are not stuck, is very useful
  • Boundaries: this should not be concerned with UI or other aspects, except, potentially, for oras CLI, and that is out of scope at this time. The entire interface should be one that lets updates be sent by code to some consumer, which handles UI (if any)

Proposed design

The signature for oras.Copy includes oras.CopyOptions as the last parameter. If this were variadic, I would suggest adding another WithProgressUpdate() or similar, but we use a single CopyOptions, so I propose adding another property: CopyOptions.Progress. The type depends on the design choice.

There are two ways to do this:

  • function: CopyOptions.Progress func(Update). With each update, Copy() (or its underlying functions) would call the passed function, passing it the Update (see below).
  • channel: CopyOptions.Progress chan<- Update. With each update Copy() would send an Update to the channel

If Progress is nil, then this functionality is bypassed.

Despite my linked PR using channels, I have no strong preference for channel over function.

Preventing Blocking

With both functions and channels, there is a concern that it might block. In principle, I think that if someone calls Copy() and passes it a blocking function or unbuffered channel, that is their issue. However, it can cause us headaches to support them.

I would consider having each call that sends to the channel or calls func() to be in a separate short-lived goroutine, which calls fund or sends to the channel, wrapped in a timeout.

Frequency of Update

My initial design is to rely on the underlying io.CopyN(). Whatever we use for that, we use for writing updates. However, that can overwhelm if the default io.Copy() is used. If I recall correctly, io.Copy() defaults to 32K. With a 100MB blob, that is ~3000 updates. That may or may not be good.

However we control the update frequency, I think it should be byte-based, not time-based. I.e. "updates every x KB" instead of "updates every y seconds." That is more useful, and also easier to implement.

In terms of controlling the update frequency, the simplest way is CopyOption.ProgresssFrequency uint. If it is 0, stick to the default.

An alternative is to have CopyOption.Progress be a struct with both the channel/func (whichever is chosen) and an update frequency property.

A third method - and probably the simplest - is not to control it at all, but rather have it be part of CopyOption.Progress. Our call Copy() calls that / sends to channel, and it buffers as often as it wants. This is the simplest, but is subject to making our "blocking control", i.e. goroutines, being overwhelmed.

Open to ideas.

Structure of update message

The oras.Update should be simple and contain only 2 properties:

type Update struct {
   Copied int64
   Descriptor descriptor
}

The descriptor is important for 2 reasons:

  1. To know the total expected size (which lets the consumer calculate the percentage complete)
  2. To know what downloading blob/manifest this is for. The func can be called / channel can receive messages multiple times, even in parallel. This lets the consumer know exactly what the update is for.

Sample

Channel:

ch := make(chan oras.Update, 1000)
opts := oras.CopyOptions{
    ProgressChannel: ch        
}
desc, err := oras.Copy(ctx, src, tagName, dst, tagName, opts)
go func(ch <- chan CopyUpdate) {
    for msg := range ch {
        fmt.Printf("copied %d bytes out of %d total for %s\n", msg.Copied, msg.Descriptor.Size, msg.Descriptor.Digest)
    }
}(ch)

Func:

f := func(msg oras.Update) {
        fmt.Printf("copied %d bytes out of %d total for %s\n", msg.Copied, msg.Descriptor.Size, msg.Descriptor.Digest)
}
opts := oras.CopyOptions{
    ProgressHandler: f
}
desc, err := oras.Copy(ctx, src, tagName, dst, tagName, opts)
@shizhMSFT
Copy link
Contributor

How about a type ProgressManager interface in CopyOptions?

@deitch
Copy link
Contributor Author

deitch commented Oct 21, 2024

How about a type ProgressManager interface in CopyOptions?

What does it look like? How is it better than a channel or a func?

Maybe a practical example would help me understand it?

@shizhMSFT
Copy link
Contributor

How about a type ProgressManager interface in CopyOptions?

What does it look like? How is it better than a channel or a func?

Maybe a practical example would help me understand it?

Something like https://pkg.go.dev/oras.land/oras@v1.2.0/cmd/oras/internal/display/status/progress#Manager

type Manager interface {
	Add() (Status, error)
	Close() error
}

type Status chan *status

// status is used as message to update progress view.
type status struct {
	done        bool // done is true when the end time is set
	prompt      string
	descriptor  ocispec.Descriptor
	offset      int64
	total       humanize.Bytes
	speedWindow *speedWindow

	startTime time.Time
	endTime   time.Time
	mark      spinner
	lock      sync.Mutex
}

The oras CLI combines the status update and view model in a single structure.

Maybe we can improve a bit by separating them. /cc @qweeah

@deitch
Copy link
Contributor Author

deitch commented Oct 21, 2024

I truly don't remember that part. 😁

I could see having an err error and a done bool property, not sure I would want much else.

What is the Manager used for? Is that just so that a UI-implementing side (like oras CLI) can manage it? If so, I would keep it separate. Simplest possible method (channel or func call) passed to Copy(), let the consuming side worry about what it does with it.

If we want to add progress to oras CLI, it becomes a consuming party, I think I would handle it as a separate addition in a separate PR.

@FeynmanZhou FeynmanZhou added the enhancement New feature or request label Oct 21, 2024
@deitch
Copy link
Contributor Author

deitch commented Oct 25, 2024

I thought about this a bit more over the last few days. I still am having a hard time seeing the benefit of a ProgressManager interface over just having a func or chan. I am trying to keep it as simple as possible, which to me looks like:

 := func(msg oras.Update) {
        fmt.Printf("copied %d bytes out of %d total for %s\n", msg.Copied, msg.Descriptor.Size, msg.Descriptor.Digest)
}
opts := oras.CopyOptions{
    ProgressHandler: f
}
desc, err := oras.Copy(ctx, src, tagName, dst, tagName, opts)

With each copy of data, it just calls f(update).

I have been thinking a bit more about the whole "how do we call the function without blocking". I had suggested earlier:

I would consider having each call that sends to the channel or calls func() to be in a separate short-lived goroutine, which calls fund or sends to the channel, wrapped in a timeout.

but I like this less. I am worried about a proliferation of goroutines. If we have frequent updates, that could get out of control.

Here is a possible alternative.

  1. When we call Copy(), if the handler is not nil, we spin up a chan Update and one goroutine.
  2. With each update, the main thread (from inside Copy()) sends an Update on the channel. The goroutine we spun up receives that update and then calls the handler function.
  3. Since we create the channel, we control the channel size, we can ensure it is well buffered. We also can have the main Copy() thread put select default on sending to the channel, so we do not have to worry about it.

This is less of an issue if we pass a channel in the options, instead of a function, but I do think the option is cleaner.

Thoughts @shizhMSFT ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants