I do have a simple question:
If we train an LLM on 10e100000 repetitions of the sentence "there is a cat on the sofa", i.e. explicitly sentence = (x1, x2, ..., x7) = ("there", "is", ..., "sofa"), feeding the first six tokens and letting it predict the 7th:
In the end, you have a pretrained model, and all it does is text completion.
If you input "there is a cat on the", it will say "sofa".
But if you just say "there", it will output "is", because that is the only thing it knows. Furthermore, if you input "is a cat", this poses the problem y = argmax_y P(y | "is", "a", "cat"), and the answer is modeled to be y = "on", because the model was implicitly trained on this sub-sequence too: while processing the corpus, training updates all the rows of W_Q, W_K, and W_V, not only the ones for the last position.
Is my thought accurate or not?
Thanks in advance!
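Here is a minimal sketch of the scenario you describe, with one big assumption: instead of an actual transformer, a plain conditional-frequency table stands in for the trained model (the names `next_token` and `complete` are made up for this illustration). Since repeating the sentence does not change the relative frequencies, counting a single copy is equivalent to your 10e100000 copies:

```python
from collections import Counter, defaultdict

# Toy stand-in for the pretraining objective described above: instead of a
# transformer, we "train" a conditional next-token table by counting.
sentence = "there is a cat on the sofa".split()  # x1 ... x7

# Collect counts for P(next | context) over every sub-context, mimicking how
# teacher forcing derives a prediction target at *every* position.
next_token = defaultdict(Counter)
for i in range(1, len(sentence)):
    for start in range(i):  # every sub-context ending at position i - 1
        context = tuple(sentence[start:i])
        next_token[context][sentence[i]] += 1

def complete(prompt: str) -> str:
    """Greedy next-token prediction for a prompt seen during 'training'."""
    context = tuple(prompt.split())
    if context not in next_token:
        return "<unknown context>"  # the model knows nothing else
    return next_token[context].most_common(1)[0][0]

print(complete("there is a cat on the"))  # -> sofa
print(complete("there"))                  # -> is
print(complete("is a cat"))               # -> on
```

In a real transformer the same effect comes from teacher forcing: the cross-entropy loss is computed at every position of the sequence at once, which is exactly why the gradient updates all rows of W_Q, W_K, and W_V, not only those involved in predicting the final token.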
Replies: 1 comment
-
This is actually a really nice description of the next-word prediction task in pretraining! In practice, that's why it's so important to have a large and diverse dataset. But yes, your understanding is spot on there.
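To illustrate the point about dataset diversity, here is a hypothetical extension of the sketch above: adding just one more sentence means the same context can map to several next tokens, so the model has to learn a distribution rather than a deterministic lookup:

```python
from collections import Counter, defaultdict

# Same toy counting "model" as above, but with a slightly more diverse
# corpus (the second sentence is invented for this illustration).
corpus = [
    "there is a cat on the sofa".split(),
    "there is a dog on the mat".split(),
]

next_token = defaultdict(Counter)
for sent in corpus:
    for i in range(1, len(sent)):
        for start in range(i):
            next_token[tuple(sent[start:i])][sent[i]] += 1

# The shared prefix no longer has a single deterministic continuation:
print(next_token[("there", "is", "a")])  # Counter({'cat': 1, 'dog': 1})
```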