On the WMT17 zh-en dataset, the result is not as good as on WMT14 en-de #108

QiyaoHuang · 2021-10-13T13:40:41Z

When I train on the wmt14 en-de dataset, I get a BLEU score of 24.5, which is close to the paper's score,
but when I train the model the same way on WMT17 zh-en, the BLEU score is only 7.0.

The WMT17 zh-en dataset I used:
http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz",
["training/news-commentary-v12.zh-en.en",
"training/news-commentary-v12.zh-en.zh"]]]
Why is the score so much lower, and what can I do about it?

aseaday · 2021-11-12T08:48:08Z

How do you tokenize the Chinese corpus?

QiyaoHuang · 2021-11-17T04:42:04Z

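"How do you tokenize the Chinese corpus?"
I use the tokenization method provided in this project's template example, the same as what I did for wmt14 en-de.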
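
To illustrate why the tokenization question matters, here is a minimal sketch. It assumes the en-de pipeline relies on whitespace-delimited words, which this thread does not confirm. The helper names `whitespace_tokenize` and `char_segment_cjk` are hypothetical and are not part of this project. Chinese text has no spaces between words, so a whitespace-based step may treat a whole sentence as a single token. Character-level splitting is only a crude fallback used here for illustration; a dedicated word segmenter or a subword model trained on the Chinese side is the more usual choice.

```python
import re
from typing import List

# Matches a single CJK unified ideograph (basic block only; an illustrative assumption).
_CJK = re.compile(r"([\u4e00-\u9fff])")


def whitespace_tokenize(text: str) -> List[str]:
    """Split on whitespace, which works for space-delimited languages."""
    return text.split()


def char_segment_cjk(text: str) -> List[str]:
    """Hypothetical fallback: isolate every CJK character as its own token."""
    # Surround each ideograph with spaces, then split on whitespace.
    return _CJK.sub(r" \1 ", text).split()


en = "I like machine translation"
zh = "我喜欢机器翻译"

print(whitespace_tokenize(en))  # four English word tokens
print(whitespace_tokenize(zh))  # the whole sentence stays one token
print(char_segment_cjk(zh))     # one token per Chinese character
```

If the Chinese side of the prepared data shows whole sentences as single tokens, the vocabulary would be dominated by rare, sentence-length entries, which could plausibly contribute to a low BLEU score.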
