
Auth: Fix missing snapshots and backups from storage pool used-by URLs #14324

Draft: wants to merge 19 commits into main


markylaing
Contributor

The underlying cause of this bug was that the general filtering of used-by URLs assumes the can_view entitlement exists for every entity type. That is a fair assumption, but it did not hold for instance or storage volume snapshots and backups.
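To illustrate the failure mode, here is a minimal Go sketch, assuming a filter that treats any failed permission check as "not viewable". The table `definesCanView`, the helper `checkCanView`, and the `usedByURL` type are hypothetical and are not LXD's actual implementation.

```go
package main

import "fmt"

// definesCanView is a hypothetical table of entity types whose authorization
// model defines can_view. Snapshot and backup types are left out on purpose
// to mirror the state before this PR.
var definesCanView = map[string]bool{
	"instance":       true,
	"storage_volume": true,
}

// checkCanView stands in for an authorizer check (hypothetical). It fails
// when the entity type has no can_view entitlement.
func checkCanView(entityType string) error {
	if !definesCanView[entityType] {
		return fmt.Errorf("can_view is not defined for entity type %q", entityType)
	}

	return nil
}

// usedByURL pairs a used-by URL with its entity type.
type usedByURL struct {
	entityType string
	url        string
}

// filterUsedBy keeps only the URLs that pass the can_view check. Any error is
// treated as "not viewable", so URLs of entity types without can_view vanish
// silently instead of surfacing an error.
func filterUsedBy(in []usedByURL) []string {
	var out []string
	for _, u := range in {
		if checkCanView(u.entityType) == nil {
			out = append(out, u.url)
		}
	}

	return out
}

func main() {
	urls := []usedByURL{
		{"storage_volume", "/1.0/storage-pools/default/volumes/custom/vol1"},
		{"storage_volume_snapshot", "/1.0/storage-pools/default/volumes/custom/vol1/snapshots/snap0"},
	}

	// Only the volume survives; the snapshot is dropped.
	fmt.Println(filterUsedBy(urls))
}
```

Because the error is swallowed rather than reported, the symptom is simply a missing entry in used-by, which matches the linked issue.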

To fix this, four new entity types have been added to the authorization model:

  • instance_backup
  • instance_snapshot
  • storage_volume_backup
  • storage_volume_snapshot

Each has associated entitlements:

  • can_edit
  • can_view
  • can_delete

It is still not possible to grant these entitlements via the API. Instead, they are granted via can_manage_snapshots or can_manage_backups on the associated instance or storage volume.
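A minimal Go sketch of the inheritance described above, as a lookup from a snapshot or backup entitlement to the parent entitlement that grants it. This is a simplified reading of the description; the real OpenFGA model may grant these entitlements through additional relations, and `grantingEntitlement`, `childGrants`, and `inheritedFrom` are hypothetical names.

```go
package main

import "fmt"

// inheritedFrom describes which parent entitlement grants an entitlement on a
// snapshot or backup (hypothetical type).
type inheritedFrom struct {
	parentType        string
	parentEntitlement string
}

// childGrants is a simplified reading of the PR description: can_view,
// can_edit, and can_delete on a snapshot or backup all come from the parent's
// can_manage_snapshots or can_manage_backups entitlement.
var childGrants = map[string]inheritedFrom{
	"instance_snapshot":       {"instance", "can_manage_snapshots"},
	"instance_backup":         {"instance", "can_manage_backups"},
	"storage_volume_snapshot": {"storage_volume", "can_manage_snapshots"},
	"storage_volume_backup":   {"storage_volume", "can_manage_backups"},
}

// grantingEntitlement returns the parent entitlement that grants entitlement
// on an entity of childType.
func grantingEntitlement(childType, entitlement string) (inheritedFrom, error) {
	switch entitlement {
	case "can_view", "can_edit", "can_delete":
	default:
		return inheritedFrom{}, fmt.Errorf("unknown entitlement %q", entitlement)
	}

	g, ok := childGrants[childType]
	if !ok {
		return inheritedFrom{}, fmt.Errorf("%q is not a snapshot or backup entity type", childType)
	}

	return g, nil
}

func main() {
	g, err := grantingEntitlement("storage_volume_backup", "can_view")
	if err != nil {
		panic(err)
	}

	fmt.Printf("%s on the parent %s\n", g.parentEntitlement, g.parentType)
}
```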

The OpenFGADatastore implementation has been updated to handle the instance and storage_volume relations between a parent and its snapshots/backups.

Closes #14291

markylaing added the Bug (Confirmed to be a bug) label on Oct 22, 2024
markylaing added this to the lxd-6.2 milestone on Oct 22, 2024
markylaing self-assigned this on Oct 22, 2024
github-actions bot added the Documentation (Documentation needs updating) label on Oct 22, 2024

Heads up @mionaalex - the "Documentation" label was applied to this issue.

@markylaing
Contributor Author

CC @mas-who @edlerd

@tomponline
Member

tests are sad

```diff
@@ -149,6 +150,40 @@ func (o *openfgaStore) Read(ctx context.Context, s string, key *openfgav1.TupleK
},
}

case "instance":
```
Member


can we use the entity type constants in this switch statement to provide a link between the entity type definition and the logic here?

Contributor Author


Yes, we can. I didn't do this originally because these are relations in the model, not necessarily entity types. We have used the same names in our model, though, and I think it makes sense to always do so, so I'll change it.
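A minimal Go sketch of the suggested change, with the switch keyed on entity type constants rather than string literals. `EntityType`, `TypeInstance`, `TypeStorageVolume`, and `parentEntityType` are hypothetical stand-ins for LXD's own constants and datastore code.

```go
package main

import "fmt"

// EntityType and its constants are hypothetical stand-ins for the entity
// type constants the reviewer refers to.
type EntityType string

const (
	TypeInstance      EntityType = "instance"
	TypeStorageVolume EntityType = "storage_volume"
)

// parentEntityType maps the relation name found in a tuple to the parent
// entity type. This only works because the model names these relations after
// the entity types, which is the convention agreed in this thread.
func parentEntityType(relation string) (EntityType, error) {
	switch EntityType(relation) {
	case TypeInstance:
		return TypeInstance, nil
	case TypeStorageVolume:
		return TypeStorageVolume, nil
	default:
		return "", fmt.Errorf("unsupported parent relation %q", relation)
	}
}

func main() {
	t, err := parentEntityType("storage_volume")
	fmt.Println(t, err)
}
```

Using the constants means a "find usages" on `TypeInstance` also turns up this switch, which is the link the review asks for.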

Member


It's useful to see where they are used.

@markylaing
Contributor Author

markylaing commented Oct 23, 2024

@tomponline tests are mostly green except for one: https://github.com/canonical/lxd/actions/runs/11463430112/job/31913802942#step:12:52804

I'm not certain why this is failing, as it doesn't seem to have anything to do with this PR. It may be related to #14315, since `lxc profile assign` calls `PUT /1.0/instances/{name}`, which does some `Profile.ToAPI` work. I also don't understand why it only failed with the `dir` storage backend.

Edit: Note that this also doesn't fail locally. I'll have to get a tmate session running.

@tomponline
Member

tomponline commented Oct 23, 2024

> I'm not certain why this is failing as it doesn't seem to have anything to do with this PR. It is potentially related to #14315 since lxc profile assign calls PUT /1.0/instances/{name} which does some Profile.ToAPI work. I also don't understand why it only failed with the dir storage backend.

@hamistao please can you check this out, thanks

Seems like a panic.

@markylaing
Contributor Author

@tomponline @hamistao The CI passed on the third attempt. I'll investigate a bit more though as I don't want to introduce any races, especially when they may be causing a panic.

@markylaing
Contributor Author

> @tomponline @hamistao The CI passed on the third attempt. I'll investigate a bit more though as I don't want to introduce any races, especially when they may be causing a panic.

I've been investigating this for an hour or so with no progress. It would be very useful to surface panics in the test logs. I'm trying to figure out a way to do this.

@tomponline
Member

> @tomponline @hamistao The CI passed on the third attempt. I'll investigate a bit more though as I don't want to introduce any races, especially when they may be causing a panic.
>
> I've been investigating this for an hour or so with no progress. It would be very useful to surface panics in the test logs. I'm trying to figure out a way to do this.

Did you identify which commit introduced it yet?

Did you try reverting the earlier profiles PR?

@markylaing
Contributor Author

> Did you identify which commit introduced it yet?
>
> Did you try reverting the earlier profiles PR?

Because it's intermittent, I didn't think reverting the profiles PR would tell me very much (I'll need to figure out where that panic is occurring in either case). I've added a commit to check LXD logs for panics. It's failing on the standalone tests but not on the cluster tests, which is a bit odd. Still investigating.
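A minimal Go sketch of a log panic check, assuming the daemon log contains Go's standard unrecovered-panic output (a line starting with `panic: ` followed by a goroutine trace). `findPanics` and its `maxTrace` parameter are hypothetical; the checker added in this PR lives in the test suite and may work differently.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// findPanics returns each "panic: " line in a daemon log together with up to
// maxTrace lines that follow it (normally the goroutine trace). Lines consumed
// as trace are not rescanned for further panics.
func findPanics(log string, maxTrace int) []string {
	var found []string
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "panic: ") {
			continue
		}

		block := []string{line}
		for i := 0; i < maxTrace && sc.Scan(); i++ {
			block = append(block, sc.Text())
		}

		found = append(found, strings.Join(block, "\n"))
	}

	return found
}

func main() {
	log := "level=info msg=\"Starting\"\npanic: runtime error: invalid memory address or nil pointer dereference\n\ngoroutine 1 [running]:\nmain.main()"
	for _, p := range findPanics(log, 3) {
		fmt.Println(p)
	}
}
```

Printing the trace lines next to the panic message is what makes the failure location obvious in CI output.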

@markylaing
Contributor Author

I've re-run the test 8 times now and the panic only occurred on the first two runs. I've added a PR to handle panics a bit more cleanly in the future (#14346). If it happens again it should be obvious where it occurred.

@markylaing
Contributor Author

Of course it fails again as soon as I move the panic checker work into another PR 🤦

Signed-off-by: Mark Laing <mark.laing@canonical.com>
This runs the panic checker against all currently running LXD daemons.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
This commit reverts any changes made to the current directory in
any test suites.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Adds instance and storage volume snapshots and backups to the OpenFGA
model. These entitlements cannot be assigned to identities, service
accounts, or group members. Instead they are inherited from the parent
instance or volume.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
…d backups.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
The auth.ValidateEntitlement function validates all entitlements that
can be granted via the API. Since the new entitlements on snapshots and
backups cannot be granted via the API, this check fails for them.

The OpenFGA server will itself return an error if a query is invalid
according to its own understanding of the authorization model.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
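A minimal Go sketch of why an API-level validation step rejects the new entitlements when placed in front of every permission query. `grantable` and `validateGrantable` are hypothetical stand-ins for `auth.ValidateEntitlement`, and the entitlement table is trimmed down for illustration.

```go
package main

import "fmt"

// grantable is a hypothetical, trimmed-down table of entitlements that can be
// granted via the API. The snapshot and backup entity types are absent
// because their entitlements are only ever inherited from the parent.
var grantable = map[string]map[string]bool{
	"instance": {
		"can_view": true, "can_edit": true, "can_delete": true,
		"can_manage_snapshots": true, "can_manage_backups": true,
	},
	"storage_volume": {
		"can_view": true, "can_edit": true, "can_delete": true,
		"can_manage_snapshots": true, "can_manage_backups": true,
	},
}

// validateGrantable plays the role of an API-level validation step. Indexing
// a missing entity type yields a nil map, which reports false.
func validateGrantable(entityType, entitlement string) error {
	if !grantable[entityType][entitlement] {
		return fmt.Errorf("entitlement %q cannot be granted on entity type %q", entitlement, entityType)
	}

	return nil
}

func main() {
	// A legitimate check on a snapshot is rejected by this validation, which
	// is why it should not guard every query.
	fmt.Println(validateGrantable("instance_snapshot", "can_view"))
	fmt.Println(validateGrantable("instance", "can_view"))
}
```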
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Previously the only entities that had inherited relations were project and
server. Now that we are linking instances and storage volumes to their
snapshots and backups, the OpenFGADatastore implementation needs to handle
these relations.

On Read, we can connect a snapshot or backup to its parent instance or
storage volume using the information stored in its URL. For example, the
storage volume backup URL:

/1.0/storage-pools/default/volumes/custom/vol1/backups/backup1?project=project1

is related to its parent:

/1.0/storage-pools/default/volumes/custom/vol1?project=project1

via the `storage_volume` relation.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
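A minimal Go sketch of deriving the parent URL on Read, assuming snapshot and backup URLs always end in `/snapshots/{name}` or `/backups/{name}` and that the query string (for example `project`) carries over unchanged. `parentURL` is a hypothetical helper, not the function used in the PR.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parentURL returns the URL of the instance or storage volume that owns the
// given snapshot or backup URL, keeping the original query string.
func parentURL(child string) (string, error) {
	u, err := url.Parse(child)
	if err != nil {
		return "", err
	}

	parts := strings.Split(strings.TrimSuffix(u.Path, "/"), "/")
	if len(parts) < 3 {
		return "", fmt.Errorf("URL %q is too short to have a parent", child)
	}

	// The second-to-last segment must name the child collection.
	collection := parts[len(parts)-2]
	if collection != "snapshots" && collection != "backups" {
		return "", fmt.Errorf("URL %q is not a snapshot or backup URL", child)
	}

	// Drop ".../{collection}/{name}" and let net/url re-encode the path.
	u.Path = strings.Join(parts[:len(parts)-2], "/")
	u.RawPath = ""

	return u.String(), nil
}

func main() {
	p, err := parentURL("/1.0/storage-pools/default/volumes/custom/vol1/backups/backup1?project=project1")
	if err != nil {
		panic(err)
	}

	fmt.Println(p)
}
```

The same trimming works for instance snapshots and backups, for example `/1.0/instances/c1/snapshots/snap0` maps to `/1.0/instances/c1`.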
…tartingWithUser.

Previously the only entities that had inherited relations were project and
server. Now that we are linking instances and storage volumes to their
snapshots and backups, the OpenFGADatastore implementation needs to handle
these relations.

On ReadStartingWithUser, the function needs to return all backups or snapshots that
are related to a parent instance or storage volume. This is used in the `ListObjects`
call to the OpenFGA server, which is used by `(auth.Authorizer).GetPermissionChecker`.

To do this, I naively query for all snapshots or backups in the project and
filter out those that don't have the correct parent. This keeps the implementation
simple and makes use of `GetEntityURLs`, which performs as few queries as possible.
Further optimisation may be needed.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
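A minimal Go sketch of the filtering step on ReadStartingWithUser, assuming the candidate URLs have already been fetched for the project (for example via `GetEntityURLs`). `childrenOf` is hypothetical, and it compares the raw `project` query parameter, so it does not treat a missing parameter as the default project.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// childrenOf keeps the snapshot or backup URLs whose parent is parentURL. It
// compares the path two segments up (dropping ".../backups/{name}" or
// ".../snapshots/{name}") and the project query parameter. Unparseable
// candidates are skipped.
func childrenOf(parentURL string, candidates []string) ([]string, error) {
	parent, err := url.Parse(parentURL)
	if err != nil {
		return nil, err
	}

	var out []string
	for _, c := range candidates {
		u, err := url.Parse(c)
		if err != nil {
			continue
		}

		collection := path.Base(path.Dir(u.Path))
		if collection != "snapshots" && collection != "backups" {
			continue
		}

		sameParent := path.Dir(path.Dir(u.Path)) == parent.Path
		sameProject := u.Query().Get("project") == parent.Query().Get("project")
		if sameParent && sameProject {
			out = append(out, c)
		}
	}

	return out, nil
}

func main() {
	all := []string{
		"/1.0/storage-pools/default/volumes/custom/vol1/backups/backup1?project=project1",
		"/1.0/storage-pools/default/volumes/custom/vol2/backups/backup1?project=project1",
	}

	kids, err := childrenOf("/1.0/storage-pools/default/volumes/custom/vol1?project=project1", all)
	if err != nil {
		panic(err)
	}

	fmt.Println(kids)
}
```

This is a linear scan over every child in the project, which is the trade-off the commit message flags when it says further optimisation may be needed.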
We can now use the `can_view`, `can_edit`, and `can_delete` entitlements
with instance backups and snapshots. We should do this so that our checks
more accurately reflect the authorization model.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
The access handler was performing some logic to determine
the location of the storage volume for use in the access check.
This was based on whether the storage pool is remote, and if not,
the cluster member where the volume is located.

This commit removes that logic and adds a "location" field to
`storageVolumeDetails` so that it can be used in the handlers.
The logic for determining the location is modified to suit the call
site. It is only set when the pool is not remote.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
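A minimal Go sketch of setting the location only for non-remote pools. Apart from the `storageVolumeDetails` name and the `location` field, the struct fields and the `newDetails` constructor are hypothetical.

```go
package main

import "fmt"

// storageVolumeDetails is a reduced, hypothetical version of the struct named
// in the commit; only the location field is taken from the description.
type storageVolumeDetails struct {
	pool     string
	volType  string
	volName  string
	location string // cluster member holding the volume; empty for remote pools
}

// newDetails sets location only when the pool is not remote, presumably
// because a volume on a remote pool is not tied to a single cluster member.
func newDetails(pool, volType, volName string, poolIsRemote bool, member string) storageVolumeDetails {
	d := storageVolumeDetails{pool: pool, volType: volType, volName: volName}
	if !poolIsRemote {
		d.location = member
	}

	return d
}

func main() {
	local := newDetails("default", "custom", "vol1", false, "member1")
	remote := newDetails("ceph", "custom", "vol1", true, "member1")
	fmt.Printf("%q %q\n", local.location, remote.location)
}
```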
The storage volume snapshot and backup access handlers need almost
identical logic to the storage volume access handler, including
getting the storage pool, determining whether the storage volume is located
on another cluster member, and so forth.

This commit parameterises the function so that it can be used by the
snapshot and backup entity types as well, creating and checking against
the correct URL when called.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
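A minimal Go sketch of a parameterised URL builder that lets one access handler serve volumes, snapshots, and backups. `volumeEntityURL` and its signature are hypothetical, and it builds URLs with `net/url` directly rather than with any LXD helper.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// volumeEntityURL builds the URL for a storage volume or one of its snapshots
// or backups, so a single access handler can check against the right entity.
func volumeEntityURL(entityType, pool, volType, volName, childName, project string) (string, error) {
	p := path.Join("/1.0/storage-pools", pool, "volumes", volType, volName)

	switch entityType {
	case "storage_volume":
		// The volume URL itself; childName is ignored.
	case "storage_volume_snapshot", "storage_volume_backup":
		if childName == "" {
			return "", fmt.Errorf("%s requires a name", entityType)
		}

		collection := "snapshots"
		if entityType == "storage_volume_backup" {
			collection = "backups"
		}

		p = path.Join(p, collection, childName)
	default:
		return "", fmt.Errorf("unsupported entity type %q", entityType)
	}

	u := url.URL{Path: p}
	if project != "" {
		u.RawQuery = url.Values{"project": {project}}.Encode()
	}

	return u.String(), nil
}

func main() {
	u, err := volumeEntityURL("storage_volume_backup", "default", "custom", "vol1", "backup1", "project1")
	if err != nil {
		panic(err)
	}

	fmt.Println(u)
}
```

Keeping the URL construction in one place means the handler's pool lookup and cluster-member logic runs once, whichever entity type is being checked.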
Signed-off-by: Mark Laing <mark.laing@canonical.com>
We can now check `can_view`, `can_edit`, and `can_delete` against
the backup/snapshot itself. We should do so to more accurately reflect
the authorization model.

Signed-off-by: Mark Laing <mark.laing@canonical.com>
Signed-off-by: Mark Laing <mark.laing@canonical.com>
Development

Successfully merging this pull request may close these issues.

Snapshots missing in used_by for custom volumes and storage pools on latest/edge LXD build
2 participants