Ja multiple choice

jcommonsenseqa_mc

JCommonsenseQA is the Japanese version of CommonsenseQA, a multiple-choice question answering dataset that requires commonsense reasoning. The dataset was built by crowdsourcing, using seeds extracted from the ConceptNet knowledge base. This setup treats the task as multiple choice: the model's answer is selected based on the log-probabilities it assigns to the choices.
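
The log-probability selection described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `pick_choice`, `toy_log_prob`, and the fixed scores in `TOY_LOG_PROBS` are hypothetical names and values invented for the example, and it assumes the choice with the highest log-probability is taken as the answer. A real run would obtain the scores from a language model.

```python
from typing import Callable, Sequence


def pick_choice(
    prompt: str,
    choices: Sequence[str],
    log_prob_fn: Callable[[str, str], float],
) -> int:
    """Return the index of the choice with the highest log-probability."""
    # Score each choice as a continuation of the same prompt.
    scores = [log_prob_fn(prompt, choice) for choice in choices]
    return max(range(len(choices)), key=lambda i: scores[i])


# Hypothetical stand-in for a language model: fixed, made-up log-probabilities.
TOY_LOG_PROBS = {"駅": -1.2, "海": -4.7, "空": -3.9}


def toy_log_prob(prompt: str, continuation: str) -> float:
    # A real scorer would condition on the prompt; this toy one ignores it.
    return TOY_LOG_PROBS[continuation]


best = pick_choice("問題:電車に乗る場所は?\n回答:「", ["海", "駅", "空"], toy_log_prob)
print(best)  # index 1, i.e. "駅"
```

The predicted index is then compared with the gold answer index to decide whether the example counts as correct.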

References:

  • Hugging Face Dataset
  • Original Repository
  • JGLUE: Japanese General Language Understanding Evaluation
  • JGLUE: 日本語言語理解ベンチマーク (JGLUE: Japanese Language Understanding Benchmark)
    local dataset_base_args = {
      path: 'llm-book/JGLUE',
      subset: 'JCommonsenseQA',
      choices_templates: ['{{ choice0 }}', '{{ choice1 }}', '{{ choice2 }}', '{{ choice3 }}', '{{ choice4 }}'],
      answer_index_template: '{{ label }}',
    };
    
    {
      class_path: 'MultipleChoice',
      init_args: {
        eval_dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'validation' },
        },
        few_shot_generator: {
          class_path: 'RandomFewShotGenerator',
          init_args: {
            dataset: {
              class_path: 'HFMultipleChoiceDataset',
              init_args: dataset_base_args { split: 'train' },
            },
            num_shots: 0,
          },
        },
        prompt_template: |||
          {% for item in few_shot_data %}
          問題:{{ item.question }}
          回答:「{{ item.choices[item.answer_index] }}」
          {% endfor %}
          問題:{{question}}
        ||| + '回答:「',
      },
    }
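
To make the prompt layout in the config above easier to picture, here is a rough plain-Python approximation of what `prompt_template` produces. It is a sketch under stated assumptions: `build_prompt` is a hypothetical helper, not part of the library, and the exact blank lines and whitespace in the real output depend on how the template engine handles block tags, which this sketch does not reproduce.

```python
from typing import Any, Mapping, Sequence


def build_prompt(question: str, few_shot_data: Sequence[Mapping[str, Any]]) -> str:
    """Approximate the prompt: few-shot pairs, then the open answer bracket."""
    lines = []
    for item in few_shot_data:
        # Mirrors item.choices[item.answer_index] in the template.
        answer = item["choices"][item["answer_index"]]
        lines.append(f"問題:{item['question']}")
        lines.append(f"回答:「{answer}」")
    lines.append(f"問題:{question}")
    # The prompt ends with an unclosed bracket; each choice is then
    # scored as the text that continues it.
    return "\n".join(lines) + "\n回答:「"


# With num_shots set to 0, as in the config, there are no few-shot items.
zero_shot = build_prompt("電車に乗る場所は?", [])
print(zero_shot)
```

Because `choices_templates` renders only the bare choice text, the closing `」` is not part of the scored continuation in this setup.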
    

xwinograd_ja

XWinograd is a multilingual collection of Winograd Schemas in six languages, intended for evaluating cross-lingual commonsense reasoning capabilities. This is the Japanese subset of the dataset.

References: