Design-proposal: KubeVirt DRA design proposal #293

Closed
wants to merge 1 commit into from


rthallisey
Contributor

@rthallisey rthallisey commented May 17, 2024

DRA (Dynamic Resource Allocation) design proposal.

What this PR does / why we need it:

KubeVirt DRA integration will allow VM users fine-grained control of devices. This is important for many use-cases:

  • Full Device Advertisement
    In the current device-plugin model, GPUs can only be advertised in one mode (Passthrough, MIG, vGPU) for the full lifecycle of the device. This model doesn’t work when an end-user has the ability to choose which type of device they need - PT, vGPU, or MIG.
    Advertising full devices is a common use case in ephemeral workloads such as HPC, ML, and cloud gaming, which require on-demand driver switches on the GPU based on the type of workload requested.
  • Pluggable CPU Management
    Similar to the GPU, the CPU can be logically separated and handed out to workloads. Conceptually, the use case is the same as for GPUs: allow a plugin to handle the specialized allocation of the CPU. Memory management needs the same treatment. We should allow full platform management for specialized vertical solutions.

DRA is not a device-plugin replacement - it solves more problems.

Release note:

None

Signed-off-by: Ryan Hallisey <rhallisey@nvidia.com>
@kubevirt-bot kubevirt-bot added the dco-signoff: yes Indicates the PR's author has DCO signed all their commits. label May 17, 2024
@kubevirt-bot kubevirt-bot requested a review from jobbler May 17, 2024 15:27
@kubevirt-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign fabiand for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rthallisey
Contributor Author

cc @varunrsekar

@alaypatel07

/label sig-api

@kubevirt-bot

@alaypatel07: The label(s) /label sig-api cannot be applied. These labels are supported: good-first-issue. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/label sig-api

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@maiqueb
Contributor

maiqueb commented May 17, 2024

/sig api
/cc

@kubevirt-bot kubevirt-bot requested a review from maiqueb May 17, 2024 15:48
@kubevirt-bot kubevirt-bot added the sig/api Denotes an issue or PR that relates to changes in api. label May 17, 2024
## Definition of Users
A user is a person who wants to attach a device to a VM.

## User Stories

Additional use case:

  • As a user, I would like to use the DRA driver for certain devices and device plugins for others.

to exist in Kubernetes, but DRA will offer vendors more control over the device topology.

## Motivation
DRA adoption is important for KubeVirt so that vendors can expect the same

Currently, DRA vendors design their drivers using CDI on the assumption that the requesting pod will consume the device (say, a GPU) directly. In the existing KubeVirt architecture, the virt-launcher pod keeps a minimal security profile: even though it requests devices from device plugins, it gets only partial access to them, just enough to pass them along to the libvirt domain.

I'm wondering whether introducing DRA drivers that allow full access to the devices from the virt-launcher pod would break KubeVirt's security posture.

Comment on lines +131 to +135
spec:
  resourceClaims:
  - name: a100-40C
    source:
      resourceClaimTemplateName: a100-40C-claim-template

Specifically for GPUs, this is the spec I had in mind to stay in sync with the existing architecture and to address a use case such as:

As a user, I would like to use the DRA driver for certain devices and device plugins for others.

---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-cirros
  name: vm-cirros
spec:
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-cirros
    spec:
      domain:
        devices:
          gpus:
          - name: gpu0
            resourceClaim:
              name: a100-40C
              source:
                resourceClaimTemplateName: a100-40C-claim-template
---
apiVersion: v1
kind: Pod
metadata:
  name: virt-launcher-cirros
spec:
  containers:
  - name: virt-launcher
    image: virt-launcher
    resources:
      claims:
      - name: a100-40C
  resourceClaims:
  - name: a100-40C
    source:
      resourceClaimTemplateName: a100-40C-claim-template

The idea is that, for a GPU device, if the resourceClaim field is set, KubeVirt knows the device is to be provisioned by a DRA driver and applies the claim to the virt-launcher pod spec. If the deviceName is set instead, KubeVirt knows the device is to be provisioned by a device plugin and sets resources.requests[<deviceName>].

The same rationale can be applied to the other device types from https://pkg.go.dev/kubevirt.io/api/core/v1#DomainSpec that would need DRA integration.
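
The selection rule described above can be sketched as follows. This is a minimal illustration and not KubeVirt's implementation: the `Gpu` dataclass, its field names, and the `build_launcher_pod_spec` helper are hypothetical and only mirror the YAML in this comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gpu:
    """Hypothetical stand-in for a VMI GPU entry; not the KubeVirt API type."""
    name: str
    device_name: Optional[str] = None                   # device-plugin resource name
    claim_name: Optional[str] = None                    # DRA claim name (assumed field)
    resource_claim_template_name: Optional[str] = None  # DRA template (assumed field)


def build_launcher_pod_spec(gpus):
    """Translate GPU entries into a virt-launcher pod spec fragment (sketch)."""
    container = {"name": "virt-launcher", "image": "virt-launcher", "resources": {}}
    pod_spec = {"containers": [container]}
    for gpu in gpus:
        uses_dra = gpu.claim_name is not None
        uses_plugin = gpu.device_name is not None
        if uses_dra == uses_plugin:
            raise ValueError(
                f"gpu {gpu.name!r}: set exactly one of resourceClaim or deviceName"
            )
        if uses_dra:
            # DRA path: reference the claim from the container and declare it on the pod.
            container["resources"].setdefault("claims", []).append({"name": gpu.claim_name})
            pod_spec.setdefault("resourceClaims", []).append({
                "name": gpu.claim_name,
                "source": {"resourceClaimTemplateName": gpu.resource_claim_template_name},
            })
        else:
            # Device-plugin path: count one unit of the named resource per GPU entry.
            requests = container["resources"].setdefault("requests", {})
            requests[gpu.device_name] = requests.get(gpu.device_name, 0) + 1
    return pod_spec
```

Passing one entry with a claim and one with a device name (for instance a made-up resource such as `example.com/gpu`) yields a pod fragment that carries both `resources.claims` and `resources.requests`, which is the mixed DRA and device-plugin use case raised earlier in this review.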

@alicefr
Member

alicefr commented May 21, 2024

/cc

@kubevirt-bot kubevirt-bot requested a review from alicefr May 21, 2024 08:00
@alicefr
Member

alicefr commented May 21, 2024

@rthallisey one thing that is difficult for me to model with DRA is how we pass the device information. Today, there is an implicit API/mechanism between the device plugin and KubeVirt: the device information is passed through environment variables. As a result, KubeVirt cannot work out of the box with just any device plugin.
Here, we kind of have the same problem. Do you have any ideas on how we could further formalize the mechanism for passing the device information?
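
For readers unfamiliar with that implicit contract, it can be pictured as a naming convention on environment variables. The sketch below assumes a convention in which the device plugin exports a variable whose name is a prefix plus the upper-cased resource name (with `/` and `.` replaced by `_`) and whose value is a comma-separated list of device addresses. The `PCI_RESOURCE` prefix, the helper names, and the example resource name are assumptions for illustration, not a specification of KubeVirt's behavior.

```python
import os


def resource_name_to_env_var(prefix, resource_name):
    """Derive an env var name from a resource name (assumed convention)."""
    normalized = resource_name.upper().replace("/", "_").replace(".", "_")
    return f"{prefix}_{normalized}"


def read_device_addresses(resource_name, prefix="PCI_RESOURCE", environ=None):
    """Return the device addresses advertised for a resource, or [] if none."""
    environ = os.environ if environ is None else environ
    raw = environ.get(resource_name_to_env_var(prefix, resource_name), "")
    # Tolerate stray whitespace and empty items in the comma-separated list.
    return [addr.strip() for addr in raw.split(",") if addr.strip()]
```

Because nothing enforces such a convention, a plugin that uses a different variable name or value format would be invisible to the consumer, which is the interoperability gap the comment points at.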

resourceClaims:
- name: rtx4090
  source:
    resourceClaimTemplateName: rtx4090-claim-template
Member

@alicefr alicefr May 21, 2024

Do we want to reference a claim here or the claim template?

Member

We have to have both.
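
Supporting both could look like the sketch below, which accepts either an existing claim or a claim template and emits the matching pod-level `source` entry. The function name and its parameters are hypothetical, and `resourceClaimName` is assumed to be the counterpart of the `resourceClaimTemplateName` key used in the proposal.

```python
def claim_source(resource_claim_name=None, resource_claim_template_name=None):
    """Build the `source` mapping for a pod resourceClaims entry (sketch)."""
    has_claim = resource_claim_name is not None
    has_template = resource_claim_template_name is not None
    if has_claim == has_template:
        raise ValueError("set exactly one of a claim name or a claim template name")
    if has_claim:
        # Reference a pre-created claim, which can be shared across pods.
        return {"resourceClaimName": resource_claim_name}
    # Reference a template, from which a per-pod claim is generated.
    return {"resourceClaimTemplateName": resource_claim_template_name}
```

Rejecting the case where both or neither are set keeps the two modes mutually exclusive per entry while still letting a single VM mix them across devices.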

@iholder101
Contributor

/sig compute
/cc
FYI @jean-edouard

@fabiand
Member

fabiand commented Jun 3, 2024

Hey. Does it make sense to split this proposal into two:

  1. Proposal/Design for how DRA resources can be consumed by KubeVirt. This is mostly API integration and then plumbing.
  2. Proposal/Design for how KubeVirt becomes a DRA provider for its own resources. This is mostly about how we can change or extend the DP work to expose DRA resources instead of DP resources.

Thoughts?

@alicefr @vladikr

@rthallisey
Contributor Author

@alicefr

@rthallisey one thing that is difficult for me to model with DRA is how we pass the device information. Today, there is an implicit API/mechanism between the device plugin and KubeVirt: the device information is passed through environment variables. As a result, KubeVirt cannot work out of the box with just any device plugin.
Here, we kind of have the same problem. Do you have any ideas on how we could further formalize the mechanism for passing the device information?

@varunrsekar is discussing this with upstream DRA folks. We're working on a path to formalize device info in vendor plugins.

@kubevirt-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@kubevirt-bot kubevirt-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 29, 2024
@aburdenthehand
Contributor

@rthallisey @varunrsekar Are we still going ahead with this?

@lyarwood
Member

lyarwood commented Oct 2, 2024

/cc

@iholder101 iholder101 mentioned this pull request Oct 8, 2024
8 tasks
@rthallisey
Contributor Author

@aburdenthehand yes

@rthallisey
Contributor Author

Discussion has moved to #331

@rthallisey rthallisey closed this Oct 8, 2024
Labels
dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. sig/api Denotes an issue or PR that relates to changes in api. sig/compute size/L

10 participants