Ja chat

aio_chat

AI王 (AI King) is a Japanese quiz dataset developed for research and competition purposes. This is an evaluation setup for chat LLMs.

References:

  • Hugging Face Dataset
  • AI王 〜クイズAI日本一決定戦〜 (AI King: Japan's Best Quiz AI Competition)
  • JAQKET: クイズを題材にした日本語 QA データセットの構築 (JAQKET: Building a Japanese QA Dataset from Quiz Questions)
    local dataset_base_args = {
      class_path: 'HFChatDataset',
      init_args: {
        path: 'llm-book/aio',
        input_template: '{{ question }}',
        reference_list_template: '{{ answers }}',
        dataset_kwargs: { trust_remote_code: true },
      },
    };
    
    {
      class_path: 'ChatResponse',
      init_args: {
        eval_dataset: dataset_base_args { init_args+: { split: 'validation' } },
        few_shot_generator: {
          class_path: 'RandomFewShotGenerator',
          init_args: {
            dataset: dataset_base_args { init_args+: { split: 'train' } },
            num_shots: 4,
          },
        },
        metrics: [
          {
            class_path: 'CharF1',
            init_args: {
              lm_output_processor: { class_path: 'AIONormalizer' },
              reference_processor: { class_path: 'AIONormalizer' },
            },
          },
          {
            class_path: 'ExactMatch',
            init_args: {
              lm_output_processor: { class_path: 'AIONormalizer' },
              reference_processor: { class_path: 'AIONormalizer' },
            },
          },
        ],
        gen_kwargs: { max_new_tokens: 32 },
        batch_size: 4,
      },
    }
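The CharF1 metric above scores answers at the character level, which tolerates minor surface variation in Japanese answers where ExactMatch would score zero. A minimal sketch of character-level F1, assuming a bag-of-characters overlap (the library's normalization via AIONormalizer and its exact scoring details may differ):

```python
from collections import Counter

def char_f1(prediction: str, reference: str) -> float:
    """Character-level F1: harmonic mean of precision and recall
    over the multiset of characters shared by the two strings."""
    overlap = sum((Counter(prediction) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(prediction)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For example, `char_f1("徳川家康公", "徳川家康")` still earns partial credit (8/9) for the extra honorific character, while an exact match would fail.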
    

elyza_tasks_100

A dataset developed by ELYZA Inc. for evaluating instruction-tuned models.

References:

  • Hugging Face Dataset
  • Official blog (公式ブログ)
    {
      class_path: 'ChatResponse',
      init_args: {
        eval_dataset: {
          class_path: 'HFChatDataset',
          init_args: {
            path: 'elyza/ELYZA-tasks-100',
            split: 'test',
            input_template: '{{ input }}',
            reference_template: '{{ output }}',
            extra_info_templates: { eval_aspect: '{{ eval_aspect }}' },
          },
        },
        metrics: [
          { class_path: 'OutputLengthStats' },
        ],
        gen_kwargs: { max_new_tokens: 1024 },
        batch_size: 4,
      },
    }
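Because ELYZA-tasks-100 responses are open-ended and typically graded by a human or LLM judge against each sample's eval_aspect, the only automatic metric here is OutputLengthStats. A sketch of what such a metric aggregates, assuming character counts (the library may report different fields or count tokens instead):

```python
def output_length_stats(outputs: list[str]) -> dict[str, float]:
    """Aggregate simple length statistics over generated outputs.
    Hypothetical re-implementation for illustration only."""
    lengths = [len(o) for o in outputs]
    return {
        "avg_output_length": sum(lengths) / len(lengths),
        "max_output_length": float(max(lengths)),
        "min_output_length": float(min(lengths)),
    }
```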
    

mgsm_ja_chat

Multilingual Grade School Math Benchmark (MGSM) is a benchmark of grade-school math problems. This is an evaluation setup for chat LLMs on the Japanese subset of the benchmark.

References:

  • Hugging Face Dataset
  • Language Models are Multilingual Chain-of-Thought Reasoners
    local dataset_base_args = {
      class_path: 'HFChatDataset',
      init_args: {
        path: 'juletxara/mgsm',
        subset: 'ja',
        reference_template: '{{ answer }}',
      },
    };
    
    {
      class_path: 'ChatResponse',
      init_args: {
        eval_dataset: dataset_base_args { init_args+: { split: 'test', input_template: '問題: {{ question }}' } },
        few_shot_generator: {
          class_path: 'RandomFewShotGenerator',
          init_args: {
            dataset: dataset_base_args { init_args+: { split: 'train', input_template: '{{ question }}' } },
            num_shots: 4,
          },
        },
        metrics: [
          { class_path: 'ExactMatch', init_args: { lm_output_processor: { class_path: 'RegexExtractor', init_args: { pattern: '-?[0-9.,]+' } } } },
        ],
        gen_kwargs: { max_new_tokens: 256 },
      },
    }
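Here the RegexExtractor pulls the numeric answer out of the model's chain-of-thought text before ExactMatch compares it to the reference. A sketch of that extraction step with Python's re module, assuming the extractor keeps the last match in the output (the library's tie-breaking rule may differ):

```python
import re

# Mirrors the config: an optional minus sign, then digits, dots, commas.
PATTERN = re.compile(r"-?[0-9.,]+")

def extract_answer(lm_output: str) -> str:
    """Return the last numeric span in the output, or '' if none."""
    matches = PATTERN.findall(lm_output)
    return matches[-1] if matches else ""
```

Taking the last match is what makes chain-of-thought work with ExactMatch: intermediate numbers in the reasoning are ignored, and only the final stated answer is compared.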
    

mt-ja

Multi-Turn Benchmark for large language models in Japanese.

References:

  • Data Source
    {
      class_path: 'ChatResponse',
      init_args: {
        eval_dataset: {
          class_path: 'ChatbotBench',
          init_args: {
            path_or_name: 'mt-ja',
            ref_path_or_name: 'mt-ja-ref-gpt4',
          },
        },
        metrics: [
          { class_path: 'OutputLengthStats' },
        ],
        gen_kwargs: { max_new_tokens: 1024 },
        batch_size: 4,
      },
    }
    

rakuda-v2-ja

The Rakuda benchmark consists of 40 Japanese questions about Japan-specific topics, designed to evaluate the capabilities of AI assistants in Japanese.

References:

  • Original Repository
  • Hugging Face Dataset
    {
      class_path: 'ChatResponse',
      init_args: {
        eval_dataset: {
          class_path: 'ChatbotBench',
          init_args: {
            path_or_name: 'rakuda-v2-ja',
          },
        },
        metrics: [
          { class_path: 'OutputLengthStats' },
        ],
        gen_kwargs: { max_new_tokens: 1024 },
        batch_size: 4,
      },
    }
    

vicuna-ja

Vicuna Benchmark for large language models in Japanese.

References:

  • Data Source
    {
      class_path: 'ChatResponse',
      init_args: {
        eval_dataset: {
          class_path: 'ChatbotBench',
          init_args: {
            path_or_name: 'vicuna-ja',
            ref_path_or_name: 'vicuna-ja-ref-gpt4',
          },
        },
        metrics: [
          { class_path: 'OutputLengthStats' },
        ],
        gen_kwargs: { max_new_tokens: 1024 },
        batch_size: 4,
      },
    }