Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YUNIKORN-2933] Don't add duplicated taskGroup to app #928

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

XbaoWu
Copy link

@XbaoWu XbaoWu commented Oct 17, 2024

What is this PR for?

I think that before properly handling ph tasks with different resource specifications within the same task group, it is necessary to avoid having two task-group with the same name but different resource specifications in the app.

What type of PR is it?

  • - Bug Fix
  • - Improvement
  • - Feature
  • - Documentation
  • - Hot Fix
  • - Refactoring

Todos

  • - Task

What is the Jira issue?

Issue associated with this PR:https://issues.apache.org/jira/browse/YUNIKORN-2933

How should this be tested?

  1. Submit a taskgroup with the same name, for example:
apiVersion: batch/v1
kind: Job
metadata:
  name: gang-scheduling-job-example-1
  namespace: [namespace Name]
spec:
  completions: 2
  parallelism: 2
  template:
    metadata:
      labels:
        app: sleep
        applicationId: "gang-scheduling-job-example-1"
        queue: root.gang-example
      annotations:
        yunikorn.apache.org/task-group-name: task-group-example
        yunikorn.apache.org/task-groups: |-
          [{
              "name": "task-group-example",
              "minMember": 1,
              "minResource": {
                "cpu": "1",
                "memory": "50M"
              }
          },{
              "name": "task-group-example",
              "minMember": 2,
              "minResource": {
                "cpu": "1",
                "memory": "100M"
              }
          }]
    spec:
      schedulerName: yunikorn
      restartPolicy: Never
      containers:
        - name: sleep30
          image: "busybox:stable"
          imagePullPolicy: IfNotPresent
          command: ["sleep", "300"]
          resources:
            limits:
              cpu: "1"
              memory: "100M"
  1. Observe whether only 2 ph Pods are created on K8S (kubectl -n [namespace Name] get pod -w)
NAME                                  READY   STATUS    RESTARTS   AGE
gang-scheduling-job-example-1-84ld5   0/1     Pending   0          13s
gang-scheduling-job-example-1-j9rpd   0/1     Pending   0          13s
tg-gang-scheduling-job-example--task-group-example-eu770s0adb   0/1     Pending   0          17s
tg-gang-scheduling-job-example--task-group-example-qaoxan96yt   0/1     Pending   0          17s
tg-gang-scheduling-job-example--task-group-example-eu770s0adb   0/1     Pending   0          18s
tg-gang-scheduling-job-example--task-group-example-qaoxan96yt   0/1     Pending   0          18s
tg-gang-scheduling-job-example--task-group-example-eu770s0adb   0/1     ContainerCreating   0          19s
tg-gang-scheduling-job-example--task-group-example-qaoxan96yt   0/1     ContainerCreating   0          19s
tg-gang-scheduling-job-example--task-group-example-eu770s0adb   0/1     Terminating         0          19s
tg-gang-scheduling-job-example--task-group-example-qaoxan96yt   0/1     Terminating         0          20s
tg-gang-scheduling-job-example--task-group-example-eu770s0adb   1/1     Terminating         0          20s
tg-gang-scheduling-job-example--task-group-example-qaoxan96yt   1/1     Terminating         0          21s
tg-gang-scheduling-job-example--task-group-example-eu770s0adb   0/1     Terminating         0          21s
gang-scheduling-job-example-1-84ld5                             0/1     Pending             0          49s
tg-gang-scheduling-job-example--task-group-example-eu770s0adb   0/1     Terminating         0          21s
tg-gang-scheduling-job-example--task-group-example-eu770s0adb   0/1     Terminating         0          21s
gang-scheduling-job-example-1-84ld5                             0/1     ContainerCreating   0          49s
tg-gang-scheduling-job-example--task-group-example-qaoxan96yt   0/1     Terminating         0          22s
gang-scheduling-job-example-1-j9rpd                             0/1     Pending             0          50s
tg-gang-scheduling-job-example--task-group-example-qaoxan96yt   0/1     Terminating         0          22s
tg-gang-scheduling-job-example--task-group-example-qaoxan96yt   0/1     Terminating         0          22s
gang-scheduling-job-example-1-j9rpd                             0/1     ContainerCreating   0          50s
gang-scheduling-job-example-1-84ld5                             1/1     Running             0          51s
gang-scheduling-job-example-1-j9rpd                             1/1     Running             0          52s

Screenshots (if appropriate)

Questions:

  • - The licenses files need update.
  • - There is breaking changes for older versions.
  • - It needs documentation.

@pbacsko pbacsko changed the title Don't add duplicated taskGroup to app [YUNIKORN-2933] Don't add duplicated taskGroup to app Oct 17, 2024
Copy link

codecov bot commented Oct 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.12%. Comparing base (753e4f8) to head (e8e5407).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #928      +/-   ##
==========================================
- Coverage   68.62%   68.12%   -0.51%     
==========================================
  Files          70       70              
  Lines        7675     9219    +1544     
==========================================
+ Hits         5267     6280    +1013     
- Misses       2202     2732     +530     
- Partials      206      207       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@pbacsko pbacsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits

@@ -80,7 +81,7 @@ func NewApplication(appID, queueName, user string, groups []string, tags map[str
taskMap: taskMap,
tags: tags,
sm: newAppState(),
taskGroups: make([]TaskGroup, 0),
taskGroups: taskGroups,
Copy link
Contributor

@pbacsko pbacsko Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

taskGroups: make(map[string]*TaskGroup)

@@ -71,6 +71,7 @@ func (app *Application) String() string {

func NewApplication(appID, queueName, user string, groups []string, tags map[string]string, scheduler api.SchedulerAPI) *Application {
taskMap := make(map[string]*Task)
taskGroups := make(map[string]*TaskGroup)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it have to be a pointer? Probably not.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it have to be a pointer? Probably not.

Thank you for your reply, I agree with you.

for _, taskGroup := range app.taskGroups {
app.placeholderAsk = common.Add(app.placeholderAsk, common.GetTGResource(taskGroup.MinResource, int64(taskGroup.MinMember)))
for _, tg := range taskGroups {
taskGroup := tg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the map is string[TaskGroup] then this is not needed I believe.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the map is string[TaskGroup] then this is not needed I believe.

Yes, if the map is string[TaskGroup], it really doesn 't need to be done here.

Comment on lines +173 to +175
log.Log(log.ShimCacheApplication).Warn("duplicate task-group within the task-groups",
zap.String("appID", app.applicationID),
zap.String("groupName", taskGroup.Name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure you have an unit test which covers this branch.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure you have an unit test which covers this branch.

Well yeah, that 's a good addition. I will provide the unit test which covers this branch.

Comment on lines 192 to 199

taskGroups := make([]TaskGroup, 0)
if len(app.taskGroups) > 0 {
for _, taskGroup := range app.taskGroups {
taskGroups = append(taskGroups, *taskGroup)
}
}
return taskGroups
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

if len(app.taskGroups) > 0 {
  taskGroups := make([]TaskGroup, 0, len(app.taskGroups))
  for _, taskGroup := range app.taskGroups {
    taskGroups := append(taskGroups , *taskGroup) 
  }
  return taskGroups
}
return nil


@wilfred-s
Copy link
Contributor

We cannot just eat the duplicates and not provide feedback to the customer via an event or something on the original pod.
See the comment in YUNIKORN-2517. Before making any code changes we need to define how we want to behave.

Without specifying and documenting the behaviour I am against making this change.

Copy link
Contributor

@wilfred-s wilfred-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 without properly defining behaviour firts.

@XbaoWu
Copy link
Author

XbaoWu commented Oct 18, 2024

We cannot just eat the duplicates and not provide feedback to the customer via an event or something on the original pod. See the comment in YUNIKORN-2517. Before making any code changes we need to define how we want to behave.

Without specifying and documenting the behaviour I am against making this change.

Thank you for your reply. In PR, I do ignore prompting submitter about our behavior, which is really important. I will add this part of the description in the code and feedback to the submitter via an event on the original pod.

@pbacsko
Copy link
Contributor

pbacsko commented Oct 18, 2024

We cannot just eat the duplicates and not provide feedback to the customer via an event or something on the original pod. See the comment in YUNIKORN-2517. Before making any code changes we need to define how we want to behave.

Without specifying and documenting the behaviour I am against making this change.

+1 I forgot to say this. And also let's add some info to the docs on the site about how this is intended to work and warn the users that they should pay attention to the task group definition. We might even reject pods with incorrect taskgroup data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants