Is there such a thing as too much training data? #9377

Hi @DarkSoliditi, there's no fixed rule for the number of samples you should train on. However, you still need to keep the following in mind:

  • The size of your training set affects your training time.
  • Be careful with data imbalance across your labels: you don't want your model to skew toward one particular address (see the sketch after this list for a quick way to check).
  • The quality of your labeled samples. Are they clean and informative enough?
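
For the second point, a quick way to eyeball label balance is to count how often each label appears before you start training. A minimal sketch, assuming your data is a list of `(text, label)` pairs (the variable names and example labels here are illustrative, not from this thread):

```python
from collections import Counter

# Hypothetical labeled dataset: a list of (text, label) pairs.
train_data = [
    ("123 Main St, Springfield", "STREET"),
    ("PO Box 42, Springfield", "PO_BOX"),
    ("123 Main St Apt 4, Springfield", "STREET"),
    # ... the rest of your ~1000 examples
]

# Count how many examples each label has and print its share of the data.
label_counts = Counter(label for _, label in train_data)
total = sum(label_counts.values())

for label, count in label_counts.most_common():
    print(f"{label:>10}: {count:4d} ({count / total:.1%})")
```

If one label dominates (say, 90% of the examples), it's worth considering oversampling the minority labels, collecting more of them, or using class weights before simply adding more data.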

My suggestion is to start with your 1000 examples, check your data distribution (whether they lean toward a particular label or not), apply the usual cross-validation techniques, and then decide whether you still need more samples.
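
As an illustration of the cross-validation step, here's a minimal sketch using scikit-learn (an assumption on my part, the thread doesn't prescribe a library) with a TF-IDF + logistic regression baseline standing in for your actual model. It reuses the hypothetical `train_data` from the sketch above and assumes you have enough examples per label for 5 folds:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Split the (text, label) pairs into parallel lists.
texts = [text for text, _ in train_data]
labels = [label for _, label in train_data]

# A simple baseline pipeline; swap in your real model as needed.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation (stratified by default for classifiers),
# so each fold keeps roughly the same label distribution.
scores = cross_val_score(model, texts, labels, cv=5, scoring="f1_macro")
print(f"macro F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```

A large spread between fold scores can itself be a sign that some labels have too few (or too noisy) examples, which answers the "do I need more samples" question more directly than the raw count does.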

Answer selected by DarkSoliditi
Labels: training (Training and updating models)