Ja chat
aio_chat
AI王 (AI King) is a Japanese quiz dataset developed for research and competition purposes. This is an evaluation setup for chat LLMs.
References:
- Hugging Face Dataset
- AI王 〜クイズAI日本一決定戦〜
- JAQKET: クイズを題材にした日本語 QA データセットの構築
local dataset_base_args = {
  class_path: 'HFChatDataset',
  init_args: {
    path: 'llm-book/aio',
    input_template: '{{ question }}',
    reference_list_template: '{{ answers }}',
    dataset_kwargs: { trust_remote_code: true },
  },
};

{
  class_path: 'ChatResponse',
  init_args: {
    eval_dataset: dataset_base_args { init_args+: { split: 'validation' } },
    few_shot_generator: {
      class_path: 'RandomFewShotGenerator',
      init_args: {
        dataset: dataset_base_args { init_args+: { split: 'train' } },
        num_shots: 4,
      },
    },
    metrics: [
      {
        class_path: 'CharF1',
        init_args: {
          lm_output_processor: { class_path: 'AIONormalizer' },
          reference_processor: { class_path: 'AIONormalizer' },
        },
      },
      {
        class_path: 'ExactMatch',
        init_args: {
          lm_output_processor: { class_path: 'AIONormalizer' },
          reference_processor: { class_path: 'AIONormalizer' },
        },
      },
    ],
    gen_kwargs: { max_new_tokens: 32 },
    batch_size: 4,
  },
}
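The `CharF1` metric scores the model output against the reference at the character level, after both sides are run through `AIONormalizer`. As a rough illustration only (not the library's implementation, which may differ in details such as handling multiple references), character-level F1 can be computed from the multiset overlap of characters between the two normalized strings:

```python
from collections import Counter

def char_f1(prediction: str, reference: str) -> float:
    """Character-level F1: harmonic mean of character precision and recall.

    Illustrative sketch only; assumes single-reference, bag-of-characters overlap.
    """
    pred_chars = Counter(prediction)
    ref_chars = Counter(reference)
    overlap = sum((pred_chars & ref_chars).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / len(prediction)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)

# e.g. char_f1("徳川家康", "徳川家康公") ≈ 0.89, char_f1("家康", "徳川家康") ≈ 0.67
```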
elyza_tasks_100
A dataset developed by ELYZA Inc. for evaluating instruction-tuned models.
References:
- Hugging Face Dataset
- Official blog
{
  class_path: 'ChatResponse',
  init_args: {
    eval_dataset: {
      class_path: 'HFChatDataset',
      init_args: {
        path: 'elyza/ELYZA-tasks-100',
        split: 'test',
        input_template: '{{ input }}',
        reference_template: '{{ output }}',
        extra_info_templates: { eval_aspect: '{{ eval_aspect }}' },
      },
    },
    metrics: [
      { class_path: 'OutputLengthStats' },
    ],
    gen_kwargs: { max_new_tokens: 1024 },
    batch_size: 4,
  },
}
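The `input_template`, `reference_template`, and `extra_info_templates` fields use Jinja2-style `{{ ... }}` placeholders that are filled in from each dataset row. A minimal sketch of that rendering step, assuming plain Jinja2 semantics and an illustrative (not actual) row; the dataset class itself may additionally wrap the rendered text into chat messages:

```python
from jinja2 import Template

# Illustrative row in the dataset's column layout (input / output / eval_aspect).
row = {
    "input": "仕事の熱意を取り戻すためのアイデアを教えてください。",
    "output": "1. 自分の仕事の意義を再確認する ...",
    "eval_aspect": "具体的なアイデアが挙げられているか",
}

input_text = Template("{{ input }}").render(**row)    # becomes the user prompt
reference = Template("{{ output }}").render(**row)    # reference answer
extra_info = {"eval_aspect": Template("{{ eval_aspect }}").render(**row)}
```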
mgsm_ja_chat
Multilingual Grade School Math Benchmark (MGSM) is a benchmark of grade-school math problems. This setup evaluates chat LLMs on the Japanese subset of the benchmark.
References:
- Hugging Face Dataset
- Language Models are Multilingual Chain-of-Thought Reasoners
local dataset_base_args = {
  class_path: 'HFChatDataset',
  init_args: {
    path: 'juletxara/mgsm',
    subset: 'ja',
    reference_template: '{{ answer }}',
  },
};

{
  class_path: 'ChatResponse',
  init_args: {
    eval_dataset: dataset_base_args { init_args+: { split: 'test', input_template: '問題: {{ question }}' } },
    few_shot_generator: {
      class_path: 'RandomFewShotGenerator',
      init_args: {
        dataset: dataset_base_args { init_args+: { split: 'train', input_template: '{{ question }}' } },
        num_shots: 4,
      },
    },
    metrics: [
      {
        class_path: 'ExactMatch',
        init_args: {
          lm_output_processor: { class_path: 'RegexExtractor', init_args: { pattern: '-?[0-9.,]+' } },
        },
      },
    ],
    gen_kwargs: { max_new_tokens: 256 },
  },
}
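Because chat models usually answer with chain-of-thought text rather than a bare number, the `RegexExtractor` processor pulls a numeric span out of the generated text with the pattern `-?[0-9.,]+` before `ExactMatch` compares it to the reference. A rough sketch of that pipeline; which occurrence the real extractor keeps and how it strips trailing punctuation are assumptions here:

```python
import re

NUMBER_PATTERN = re.compile(r"-?[0-9.,]+")

def extract_answer(lm_output: str) -> str:
    """Take the last numeric span in the output as the final answer (assumption)."""
    matches = NUMBER_PATTERN.findall(lm_output)
    return matches[-1].rstrip(".,") if matches else ""

def exact_match(lm_output: str, reference: str) -> bool:
    return extract_answer(lm_output) == reference.strip()

print(exact_match("それぞれ足すと 4 + 8 = 12 個になります。答えは 12 です。", "12"))  # True
```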
mt-ja
Multi-Turn Benchmark for large language models in Japanese.
References:
- Data Source
{
  class_path: 'ChatResponse',
  init_args: {
    eval_dataset: {
      class_path: 'ChatbotBench',
      init_args: {
        path_or_name: 'mt-ja',
        ref_path_or_name: 'mt-ja-ref-gpt4',
      },
    },
    metrics: [
      { class_path: 'OutputLengthStats' },
    ],
    gen_kwargs: { max_new_tokens: 1024 },
    batch_size: 4,
  },
}
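This benchmark has no automatic correctness metric in the config; the only built-in metric is `OutputLengthStats`, which summarizes how long the generated responses are, and quality judging is typically done in a separate step against the reference outputs named by `ref_path_or_name`. A minimal sketch of such a length-statistics metric, assuming character-level lengths and illustrative key names:

```python
from statistics import mean

def output_length_stats(lm_outputs: list[str]) -> dict[str, float]:
    """Summary statistics over response lengths (character counts assumed here)."""
    lengths = [len(output) for output in lm_outputs]
    return {
        "avg_output_length": mean(lengths),
        "max_output_length": max(lengths),
        "min_output_length": min(lengths),
    }

print(output_length_stats(["短い応答です。", "こちらはもう少し長い、複数文からなる応答の例です。"]))
```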
rakuda-v2-ja
The Rakuda benchmark consists of a set of 40 questions in Japanese about Japanese-specific topics, designed to evaluate the capabilities of AI assistants in Japanese.
References:
- Original Repository
- Hugging Face Dataset
{
  class_path: 'ChatResponse',
  init_args: {
    eval_dataset: {
      class_path: 'ChatbotBench',
      init_args: {
        path_or_name: 'rakuda-v2-ja',
      },
    },
    metrics: [
      { class_path: 'OutputLengthStats' },
    ],
    gen_kwargs: { max_new_tokens: 1024 },
    batch_size: 4,
  },
}
vicuna-ja
Vicuna Benchmark for large language models in Japanese.
References:
- Data Source
{
  class_path: 'ChatResponse',
  init_args: {
    eval_dataset: {
      class_path: 'ChatbotBench',
      init_args: {
        path_or_name: 'vicuna-ja',
        ref_path_or_name: 'vicuna-ja-ref-gpt4',
      },
    },
    metrics: [
      { class_path: 'OutputLengthStats' },
    ],
    gen_kwargs: { max_new_tokens: 1024 },
    batch_size: 4,
  },
}