I do have a simple question:
If we train an LLM on 10e100000 repetitions of the sentence "there is a cat on the sofa", i.e. explicitly sentence = (x1, x2, ..., x7) = ("there", "is", ..., "sofa"), feeding the first six tokens and letting it predict the 7th:
In the end, you have a pretrained model, and all it does is text completion.
If you input "there is a cat on the", it will say "sofa".
But if you just say "there", it will output "is", because that is the only thing it knows. Furthermore, if you input "is a cat", this poses the problem y = argmax_y P(y | "is", "a", "cat"), and the answer is modeled to be y = "on", because the model was implicitly trained on this sub-sequence too: while processing the corpus, training updates all the rows of W_Q, W_K, and W_V, not only the ones for the last position.
Is my thought accurate or not?
Thanks in advance!
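Here is a minimal sketch of the scenario you describe, with one big assumption: instead of an actual transformer, a plain conditional-frequency table stands in for the trained model (the names `next_token` and `complete` are made up for this illustration). Since repeating the sentence does not change the relative frequencies, counting a single copy is equivalent to your 10e100000 copies:

```python
from collections import Counter, defaultdict

# Toy stand-in for the pretraining objective described above: instead of a
# transformer, we "train" a conditional next-token table by counting.
sentence = "there is a cat on the sofa".split()  # x1 ... x7

# Collect counts for P(next | context) over every sub-context, mimicking how
# teacher forcing derives a prediction target at *every* position.
next_token = defaultdict(Counter)
for i in range(1, len(sentence)):
    for start in range(i):  # every sub-context ending at position i - 1
        context = tuple(sentence[start:i])
        next_token[context][sentence[i]] += 1

def complete(prompt: str) -> str:
    """Greedy next-token prediction for a prompt seen during 'training'."""
    context = tuple(prompt.split())
    if context not in next_token:
        return "<unknown context>"  # the model knows nothing else
    return next_token[context].most_common(1)[0][0]

print(complete("there is a cat on the"))  # -> sofa
print(complete("there"))                  # -> is
print(complete("is a cat"))               # -> on
```

In a real transformer the same effect comes from teacher forcing: the cross-entropy loss is computed at every position of the sequence at once, which is exactly why the gradient updates all rows of W_Q, W_K, and W_V, not only those involved in predicting the final token.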
Replies: 1 comment
-
This is actually a really nice description of the next-word prediction task in pretraining! In practice, that's why it's so important to have a large and diverse dataset. But yes, your understanding is spot on there.
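To illustrate the point about dataset diversity, here is a hypothetical extension of the sketch above: adding just one more sentence means the same context can map to several next tokens, so the model has to learn a distribution rather than a deterministic lookup:

```python
from collections import Counter, defaultdict

# Same toy counting "model" as above, but with a slightly more diverse
# corpus (the second sentence is invented for this illustration).
corpus = [
    "there is a cat on the sofa".split(),
    "there is a dog on the mat".split(),
]

next_token = defaultdict(Counter)
for sent in corpus:
    for i in range(1, len(sent)):
        for start in range(i):
            next_token[tuple(sent[start:i])][sent[i]] += 1

# The shared prefix no longer has a single deterministic continuation:
print(next_token[("there", "is", "a")])  # Counter({'cat': 1, 'dog': 1})
```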