Ja multiple choice

jcommonsenseqa_mc

JCommonsenseQA is the Japanese version of CommonsenseQA, a multiple-choice question answering dataset that requires commonsense reasoning. It was built by crowdsourcing, using seeds extracted from the knowledge base ConceptNet. This preset uses the multiple-choice setup, in which the model's answer is selected by comparing the log-probabilities it assigns to the choices.
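To illustrate selection by log-probability, here is a minimal sketch that assumes the simplest rule: the predicted answer is the choice with the highest log-probability. The function name `pick_choice` and the numbers are hypothetical, and whether or how scores are normalized by length is not covered here.

```python
def pick_choice(choice_logprobs):
    """Return the index of the choice with the highest log-probability."""
    return max(range(len(choice_logprobs)), key=lambda i: choice_logprobs[i])


# Hypothetical log-probabilities for the five choices of one question.
logprobs = [-4.2, -1.3, -6.0, -3.8, -5.1]
predicted = pick_choice(logprobs)

# Hypothetical gold answer index, as produced by answer_index_template.
label = 1
is_correct = predicted == label
```

Accuracy over the evaluation split would then be the fraction of questions for which `is_correct` is true.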

References:

Hugging Face Dataset
Original Repository
JGLUE: Japanese General Language Understanding Evaluation

JGLUE: 日本語言語理解ベンチマーク

local dataset_base_args = {
  path: 'llm-book/JGLUE',
  subset: 'JCommonsenseQA',
  choices_templates: ['{{ choice0 }}', '{{ choice1 }}', '{{ choice2 }}', '{{ choice3 }}', '{{ choice4 }}'],
  answer_index_template: '{{ label }}',
};

{
  class_path: 'MultipleChoice',
  init_args: {
    eval_dataset: {
      class_path: 'HFMultipleChoiceDataset',
      init_args: dataset_base_args { split: 'validation' },
    },
    few_shot_generator: {
      class_path: 'RandomFewShotGenerator',
      init_args: {
        dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'train' },
        },
        num_shots: 0,
      },
    },
    prompt_template: |||
      {% for item in few_shot_data %}
      問題：{{ item.question }}
      回答：「{{ item.choices[item.answer_index] }}」
      {% endfor %}
      問題：{{question}}
    ||| + '回答：「',
  },
}
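The prompt template above can be approximated in plain Python to show what the model sees before each choice is scored. This is only a sketch: `render_prompt` is a hypothetical helper, the sample question is made up, and the exact whitespace of the real prompt depends on how the Jinja template is rendered.

```python
def render_prompt(question, few_shot_data=()):
    """Approximate the prompt: few-shot pairs, then the open question."""
    lines = []
    for item in few_shot_data:
        lines.append("問題：" + item["question"])
        # Each few-shot answer is the text of the correct choice, in 「」.
        lines.append("回答：「" + item["choices"][item["answer_index"]] + "」")
    lines.append("問題：" + question)
    # The prompt ends with an open bracket; each choice is scored as its continuation.
    return "\n".join(lines) + "\n" + "回答：「"


# With num_shots: 0 the few-shot loop is empty (hypothetical question).
zero_shot = render_prompt("空を飛ぶ乗り物は？")

# A hypothetical single few-shot item, for comparison.
few_shot = render_prompt(
    "空を飛ぶ乗り物は？",
    [{"question": "海にいる動物は？", "choices": ["犬", "魚"], "answer_index": 1}],
)
```

Ending the prompt at `回答：「` means the model only has to continue with the text of a choice, which is what the log-probability comparison measures.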

xwinograd_ja

XWinograd is a multilingual collection of Winograd Schemas in six languages that can be used to evaluate cross-lingual commonsense reasoning capabilities. This preset uses the Japanese subset of the dataset.

References:

Hugging Face Dataset

It’s All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning

{
  class_path: 'MultipleChoice',
  init_args: {
    eval_dataset: {
      class_path: 'HFMultipleChoiceDataset',
      init_args: {
        path: 'Muennighoff/xwinograd',
        subset: 'jp',
        split: 'test',
        choices_templates: [
          '{{ option1 }}{{ sentence.split("_")[1] }}',
          '{{ option2 }}{{ sentence.split("_")[1] }}',
        ],
        answer_index_template: '{{ answer | int - 1 }}',
        input_templates: { context: '{{ sentence.split("_")[0] }}' },
      },
    },
    prompt_template: '{{ context }}',
  },
}
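The templates in this config split each sentence at the `_` placeholder: the part before it becomes the context, and each option followed by the part after it becomes a choice. The sketch below mirrors that logic in plain Python. `build_example` is a hypothetical helper, and the sample record is made up (and in English for readability), not taken from the dataset.

```python
def build_example(sentence, option1, option2, answer):
    """Mirror the templates: split at "_", then attach each option to the suffix."""
    parts = sentence.split("_")
    prefix, suffix = parts[0], parts[1]
    return {
        "context": prefix,
        "choices": [option1 + suffix, option2 + suffix],
        # The answer field is 1-based; answer_index_template converts it to 0-based.
        "answer_index": int(answer) - 1,
    }


# Hypothetical record with the same fields the templates reference.
example = build_example(
    sentence="The trophy did not fit in the suitcase because _ was too big.",
    option1="the trophy",
    option2="the suitcase",
    answer="1",
)
```

Because both choices share the same suffix, the comparison between them comes down to which option the model finds more plausible after the shared context.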