Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add object tagging - Moodle 310 #619

Open
wants to merge 17 commits into
base: MOODLE_310_STABLE
Choose a base branch
from

Conversation

matthewhilton
Copy link
Contributor

@matthewhilton matthewhilton commented Jul 22, 2024

Summary
Allows objectfs to send tags for an object to external stores. Initially only S3 is supported. location and environment tags are added.

For technical reasons, mimetype is also stored but this is stored in metadata instead of the tags (and thus is only for new objects, migration of metadata is not feasible)

The use case for this is to be able to setup conditional lifecycle rules to send orphaned files to deep archive and eventually delete them.

See #11 (comment) for the rationale behind using S3 tags instead of other methods such as file prefixes or separate buckets.

How it works
See TAGGING.md

Cherry picks

Related issues

@matthewhilton
Copy link
Contributor Author

matthewhilton commented Jul 22, 2024

I've been pondering the idea of removing the intermediary table from the equation here.

Originally the idea is we cache the tag values in an intermediary table, so the values can be easily queried on the moodle side for reporting. This allows you to have 'expensive' tags i.e. those that are computationally expensive to calculate, but also be able to query them very easily using the DB for reporting.

However if we want to only have 'cheap' tags, we might be able to get away with just aggregating their values in the report data table (i.e every time report is run, it recalculates the values and sums the counts together), and then instead of querying db for the cached tags, it would simply re-calculate it the moment the object is about to get updated.

The downside of this is the report generating gets a tad bit more complex and heavy because we need to constantly re-calculate the tag values (which can come from various random places in the db) as opposed to a single tag cache intermediary table.

Another downside is we wouldn't then be able to track how many have actually been tagged in the external store, whereas with an intermediary table we at least know if its in there, its value is 99% of the time the same in the external store.

I'm leaning towards the simpler solution which is just to have them in the intermediary table.

@matthewhilton matthewhilton force-pushed the wip-add-object-tagging branch 4 times, most recently from bcd9fa2 to 9327509 Compare July 29, 2024 05:38
@matthewhilton matthewhilton changed the base branch from MOODLE_310_STABLE to fix-codestandards July 29, 2024 05:39
@matthewhilton matthewhilton force-pushed the wip-add-object-tagging branch 3 times, most recently from f5928ea to 9c7e9bd Compare July 29, 2024 22:06
@matthewhilton
Copy link
Contributor Author

matthewhilton commented Jul 29, 2024

This is mostly ready to go, will give it a test for these scenarios:

Manual testing

  • New site install - enable tagging and confirm tags are uploaded when objects are replicated
  • Upgraded site with existing object - enable tagging and confirm tags are uploaded via migration and also when new objects are uploaded
  • Site without overwrite - confirm cannot overwrite objects and tags remain intact
  • Site with overwrite - confirm can overwrite and tags are overwritten and updated
  • Tagging enabled without permissions in S3 - Confirm exceptions are thrown correctly, and check API class alerts issue
  • Object report - generates object report with correct tag value counts.

@matthewhilton matthewhilton force-pushed the wip-add-object-tagging branch 3 times, most recently from b009b8c to b2d42f1 Compare July 31, 2024 21:57
Base automatically changed from fix-codestandards to MOODLE_310_STABLE August 1, 2024 02:19
@matthewhilton matthewhilton force-pushed the wip-add-object-tagging branch 4 times, most recently from 36e1894 to ee3f161 Compare August 2, 2024 01:21
@matthewhilton matthewhilton marked this pull request as ready for review August 2, 2024 01:27
@matthewhilton matthewhilton changed the title Add object tagging Add object tagging - Moodle 310 Aug 4, 2024
Copy link

@bwalkerl bwalkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @matthewhilton this is looking great from the code perspective. I've pointed out a few things I've noticed, but these are mostly just some extra considerations and discussion points. I've focused more on reviewing the code and haven't tested this end to end myself - so I'm partly relying on your test results for that.

My only real concern here is the perform at scale, mostly when back filling with the adhoc task, but this is an optional feature so I think it should be fine to roll out and test as needed.

classes/local/store/object_file_system.php Outdated Show resolved Hide resolved
classes/local/store/object_file_system.php Show resolved Hide resolved
classes/task/update_object_tags.php Outdated Show resolved Hide resolved
classes/local/store/s3/file_system.php Outdated Show resolved Hide resolved
classes/local/store/s3/client.php Outdated Show resolved Hide resolved
classes/local/tag/tag_manager.php Outdated Show resolved Hide resolved
db/install.xml Outdated Show resolved Hide resolved
classes/local/report/objectfs_report.php Show resolved Hide resolved
TAGGING.md Show resolved Hide resolved
classes/task/update_object_tags.php Outdated Show resolved Hide resolved
@matthewhilton
Copy link
Contributor Author

I've not addressed all the comments yet, but just an fyi pushed an update to add a bit more visibility to the migration progress, now there is a nice table in the admin settings (for a lack of a better place to put it), let me know if you see any tweaks that should be made here, or if there are better styling options available (I couldn't find any):

image

Additionally pushed a CI fix, since moodle plugin ci looks to have updated on my local and was throwing errors where it wasn't before. Note the Github CI is failing due to an upstream nvm issue which i've been tracking but is not fixed yet (so it will stay red for now)

Copy link

@bwalkerl bwalkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @matthewhilton - feels like you went above and beyond with the changes so far. I've left a couple of new comments on these.

settings.php Outdated Show resolved Hide resolved
classes/task/update_object_tags.php Outdated Show resolved Hide resolved
classes/local/store/object_file_system.php Outdated Show resolved Hide resolved
classes/task/update_object_tags.php Show resolved Hide resolved
@matthewhilton matthewhilton force-pushed the wip-add-object-tagging branch 2 times, most recently from d50151c to d0f301b Compare August 20, 2024 04:56
@matthewhilton
Copy link
Contributor Author

Just marking as draft temporarily, will be adding some features we discussed internally and then will re-open it.

@matthewhilton matthewhilton marked this pull request as draft August 23, 2024 02:12
@matthewhilton
Copy link
Contributor Author

matthewhilton commented Aug 23, 2024

Had a bit of time to work on object orphaning support.

The big issue i've found so far is that when an object is orphaned, we lose its mimetype. This is because when it goes to update the object tags after it is orphaned, the mimetype data in the mdl_files table no longer exists.

There are couple of ways to go about this:

  1. Not care about mimetype anymore. Do we really need it ?
  2. Push mimetype as metadata, instead of tags. This can only be done on upload (i.e. only new objects) but it is immutable so regardless of what we do with tags, the mimetype still remains. Theres no extra costs for metadata
  3. Instead of the current 'replace all' tags on update, we implement some complex kind of reconciliation mechanism and only update the tags necessary. This probably has loads of edge cases.

IMO I think option 2 is the most balanced approach

@matthewhilton matthewhilton force-pushed the wip-add-object-tagging branch 2 times, most recently from f770842 to 6ade842 Compare September 2, 2024 23:11
@matthewhilton
Copy link
Contributor Author

matthewhilton commented Sep 2, 2024

I've finished the support for tagging orphaned objects.

We now have a simple location tag which has values orphaned or active.

I split this up from the env tag since the env tag is now also used by objectfs to determine if it should overwrite tags (mainly used for lower envs that we don't want messing with prod shared objects)

I also had to move mimetype from the tags to the metadata for reasons mentioned above.

settings.php Outdated Show resolved Hide resolved
@matthewhilton matthewhilton mentioned this pull request Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants