Translation chat
wmt20_en_ja_chat¶
This dataset is created as a test set for the WMT20 shared task on news translation. This is English to Japanese translation. This is a evaluation setup for chat LLMs.
References:
- Data Source
- 2020 Fifth Conference on Machine Translation (WMT20)
local dataset = { class_path: 'SacreBleuChatDataset', init_args: { name: 'wmt20', langpair: 'en-ja' }, }; { class_path: 'ChatResponse', init_args: { eval_dataset: dataset, few_shot_generator: { class_path: 'RandomFewShotGenerator', init_args: { // Use the eval dataset for few-shot data, // but `RandomFewShotGenerator` will avoid using the same few-shot isntances as the input. dataset: dataset, num_shots: 4, }, }, metrics: [ { class_path: 'BLEU', init_args: { tokenize_option: 'ja-mecab' } }, ], gen_kwargs: { max_new_tokens: 128, stop_sequences: ['`'] }, batch_size: 4, }, }
wmt20_ja_en_chat¶
This dataset is created as a test set for the WMT20 shared task on news translation. This is Japanese to English translation. This is a evaluation setup for chat LLMs.
References:
- Data Source
- 2020 Fifth Conference on Machine Translation (WMT20)
local dataset = { class_path: 'SacreBleuChatDataset', init_args: { name: 'wmt20', langpair: 'ja-en' }, }; { class_path: 'ChatResponse', init_args: { eval_dataset: dataset, few_shot_generator: { class_path: 'RandomFewShotGenerator', init_args: { // Use the eval dataset for few-shot data, // but `RandomFewShotGenerator` will avoid using the same few-shot isntances as the input. dataset: dataset, num_shots: 4, }, }, metrics: [ { class_path: 'BLEU', init_args: { tokenize_option: 'intl' } }, ], gen_kwargs: { max_new_tokens: 128, stop_sequences: ['`'] }, batch_size: 4, }, }