This repository has been archived by the owner on Oct 22, 2023. It is now read-only.

Training on large datasets >100k #121

Open
dustyny opened this issue Mar 13, 2023 · 0 comments

dustyny commented Mar 13, 2023

Sorry to submit this as an issue; there wasn’t an option to post a question.

I have a project where I would like to generate scientific data visualizations. I have 120,000 images with highly accurate text descriptions ranging from 35 to 75 tokens in length. The images are greyscale (but saved as RGB files), 512 pixels wide and 256 pixels high, and there is a strict structure to how the pixels relate to the analysis data.
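
The dataset properties described above (fixed 512×256 size, greyscale content stored as RGB, captions of 35 to 75 tokens) could be sanity-checked before training. The following is only a minimal sketch under stated assumptions: the function names are hypothetical, pixels are assumed to be already loaded as `(r, g, b)` tuples by whatever image library is in use, and token counts are assumed to come from the tokenizer of the model being trained. None of this is part of StableTuner.

```python
from typing import Iterable, Tuple

# Assumption: (width, height) exactly as stated in the post.
EXPECTED_SIZE = (512, 256)

Pixel = Tuple[int, int, int]


def is_greyscale_rgb(pixels: Iterable[Pixel]) -> bool:
    """Return True if every RGB pixel has identical R, G and B values."""
    return all(r == g == b for r, g, b in pixels)


def has_expected_size(size: Tuple[int, int]) -> bool:
    """Return True if (width, height) matches the expected 512x256 layout."""
    return tuple(size) == EXPECTED_SIZE


def caption_length_ok(token_count: int, lo: int = 35, hi: int = 75) -> bool:
    """Return True if a caption's token count falls in the stated range."""
    return lo <= token_count <= hi
```

Running checks like these over all files once, before any training, would flag images that break the strict pixel structure or captions outside the expected length.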

I’m running a 24-core Intel i9 and an Nvidia 4090 with 24 GB of VRAM.

For the final result, when I generate an image I don’t want it to have any aspects other than the scientific data, so training that causes the model to forget other classes is actually preferred for this project. Given the number of images, I assume training is going to take a long time, so no concern there (though faster is better).

What’s most important is that these images are as accurate as possible. I’ll be generating synthetic data with them, so each image has to be as high quality as possible.

Is this something StableTuner can help with? If so, any guidance on what I should be considering? Happy to do a video on the process and post it to YouTube once I get it worked out.
