
multiple choice

arc_challenge

The ARC dataset contains 7,787 genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, and this is the Challenge Set.

References:

  • Hugging Face Dataset
  • Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

    local dataset_base_args = {
      path: 'allenai/ai2_arc',
      subset: 'ARC-Challenge',
      choices_templates: [
        '{% if choices.text | length > 0 %}{{ choices.text[0] }}{% endif %}',
        '{% if choices.text | length > 1 %}{{ choices.text[1] }}{% endif %}',
        '{% if choices.text | length > 2 %}{{ choices.text[2] }}{% endif %}',
        '{% if choices.text | length > 3 %}{{ choices.text[3] }}{% endif %}',
        '{% if choices.text | length > 4 %}{{ choices.text[4] }}{% endif %}',
      ],
      // answerKey is one of A, B, C, D, E, 1, 2, 3, 4
      answer_index_template: '{% if answerKey == "A" %}0{% elif answerKey == "B" %}1{% elif answerKey == "C" %}2{% elif answerKey == "D" %}3{% elif answerKey == "E" %}4{% else %}{{ answerKey | int - 1 }}{% endif %}',
      whitespace_before_choices: true,
    };
    
    {
      class_path: 'MultipleChoice',
      init_args: {
        eval_dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'test' },
        },
        few_shot_generator: {
          class_path: 'RandomFewShotGenerator',
          init_args: {
            dataset: {
              class_path: 'HFMultipleChoiceDataset',
              init_args: dataset_base_args { split: 'train' },
            },
            num_shots: 4,
          },
        },
        prompt_template: |||
          {% for item in few_shot_data %}
          Question: {{ item.question }}
          Answer:{{ item.choices[item.answer_index] }}
          {% endfor %}
          Question: {{ question }}
        ||| + 'Answer:',
      },
    }
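
As a sanity check on the answer_index_template above, here is a minimal Python sketch (an illustration, not part of the config) that renders the same Jinja2 expression with the jinja2 package and prints the index each possible answerKey maps to:

    from jinja2 import Template

    # Same expression as answer_index_template above: letter keys map to
    # indices 0-4, numeric keys fall through to the int filter.
    ANSWER_INDEX_TEMPLATE = (
        '{% if answerKey == "A" %}0{% elif answerKey == "B" %}1'
        '{% elif answerKey == "C" %}2{% elif answerKey == "D" %}3'
        '{% elif answerKey == "E" %}4{% else %}{{ answerKey | int - 1 }}{% endif %}'
    )

    template = Template(ANSWER_INDEX_TEMPLATE)
    for key in ['A', 'B', 'C', 'D', 'E', '1', '2', '3', '4']:
        print(key, '->', int(template.render(answerKey=key)))
    # A -> 0, B -> 1, C -> 2, D -> 3, E -> 4, '1' -> 0, ..., '4' -> 3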
    

arc_easy

The ARC dataset contains 7,787 genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, and this is the Easy Set.

References:

  • Hugging Face Dataset
  • Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

    local dataset_base_args = {
      path: 'allenai/ai2_arc',
      subset: 'ARC-Easy',
      choices_templates: [
        '{% if choices.text | length > 0 %}{{ choices.text[0] }}{% endif %}',
        '{% if choices.text | length > 1 %}{{ choices.text[1] }}{% endif %}',
        '{% if choices.text | length > 2 %}{{ choices.text[2] }}{% endif %}',
        '{% if choices.text | length > 3 %}{{ choices.text[3] }}{% endif %}',
        '{% if choices.text | length > 4 %}{{ choices.text[4] }}{% endif %}',
      ],
      // answerKey is one of A, B, C, D, E, 1, 2, 3, 4
      answer_index_template: '{% if answerKey == "A" %}0{% elif answerKey == "B" %}1{% elif answerKey == "C" %}2{% elif answerKey == "D" %}3{% elif answerKey == "E" %}4{% else %}{{ answerKey | int - 1 }}{% endif %}',
      whitespace_before_choices: true,
    };
    
    {
      class_path: 'MultipleChoice',
      init_args: {
        eval_dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'test' },
        },
        few_shot_generator: {
          class_path: 'RandomFewShotGenerator',
          init_args: {
            dataset: {
              class_path: 'HFMultipleChoiceDataset',
              init_args: dataset_base_args { split: 'train' },
            },
            num_shots: 4,
          },
        },
        prompt_template: |||
          {% for item in few_shot_data %}
          Question: {{ item.question }}
          Answer:{{ item.choices[item.answer_index] }}
          {% endfor %}
          Question: {{ question }}
        ||| + 'Answer:',
      },
    }
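
The length guards in choices_templates are there because ARC questions have between three and five options (most have four); a guarded template renders to an empty string when the corresponding choice is absent. A small jinja2 sketch with invented toy data:

    from jinja2 import Template

    # Fifth-choice template from choices_templates above.
    tmpl = Template('{% if choices.text | length > 4 %}{{ choices.text[4] }}{% endif %}')

    four_options = {'text': ['flask', 'funnel', 'beaker', 'pipette']}
    five_options = {'text': ['flask', 'funnel', 'beaker', 'pipette', 'burette']}
    print(repr(tmpl.render(choices=four_options)))  # ''  (no fifth choice)
    print(repr(tmpl.render(choices=five_options)))  # 'burette'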
    

commonsense_qa_mc

CommonsenseQA is a multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answers. This is a multiple-choice setup in which the model's prediction is the choice with the highest log-probability.

References:

  • Hugging Face Dataset
  • CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge

    local dataset_base_args = {
      path: 'tau/commonsense_qa',
      choices_templates: ['{{ choices.text[0] }}', '{{ choices.text[1] }}', '{{ choices.text[2] }}', '{{ choices.text[3] }}', '{{ choices.text[4] }}'],
      answer_index_template: '{% if answerKey == "A" %}0{% elif answerKey == "B" %}1{% elif answerKey == "C" %}2{% elif answerKey == "D" %}3{% elif answerKey == "E" %}4{% endif %}',
      whitespace_before_choices: true,
    };
    
    {
      class_path: 'MultipleChoice',
      init_args: {
        eval_dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'validation' },
        },
        few_shot_generator: {
          class_path: 'RandomFewShotGenerator',
          init_args: {
            dataset: {
              class_path: 'HFMultipleChoiceDataset',
              init_args: dataset_base_args { split: 'train' },
            },
            num_shots: 4,
          },
        },
        prompt_template: |||
          {% for item in few_shot_data %}
          Question: {{ item.question }}
          Answer:{{ item.choices[item.answer_index] }}
          {% endfor %}
          Question: {{ question }}
        ||| + 'Answer:',
      },
    }
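
As a conceptual sketch of the log-probability scoring the description refers to (not the evaluator's actual implementation), the following Python snippet scores each choice as the sum of token log-probabilities the model assigns to it after the prompt, using Hugging Face transformers and an arbitrary small causal LM:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = 'gpt2'  # placeholder model, chosen only for illustration
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    def choice_logprob(prompt: str, choice: str) -> float:
        """Sum of log-probabilities of the choice tokens, conditioned on the prompt."""
        prompt_len = tokenizer(prompt, return_tensors='pt').input_ids.shape[1]
        full_ids = tokenizer(prompt + choice, return_tensors='pt').input_ids
        with torch.no_grad():
            log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
        # Logits at position i predict the token at position i + 1,
        # so the choice tokens are scored by the preceding positions.
        return sum(
            log_probs[0, pos - 1, full_ids[0, pos]].item()
            for pos in range(prompt_len, full_ids.shape[1])
        )

    prompt = 'Question: Where do people keep spare coins?\nAnswer:'
    choices = [' a wallet', ' a cloud', ' a verb']  # toy choices, not from the dataset
    scores = [choice_logprob(prompt, c) for c in choices]
    print(choices[scores.index(max(scores))])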
    

hellaswag

HellaSwag is a dataset for physically situated commonsense reasoning. It is constructed through adversarial filtering to make it challenging for models.

References:

  • Hugging Face Dataset
  • HellaSwag: Can a Machine Really Finish Your Sentence?

    local dataset_base_args = {
      path: 'Rowan/hellaswag',
      choices_templates: ['{{ endings[0] }}', '{{ endings[1] }}', '{{ endings[2] }}', '{{ endings[3] }}'],
      answer_index_template: '{{ label }}',
      whitespace_before_choices: true,
    };
    
    {
      class_path: 'MultipleChoice',
      init_args: {
        eval_dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'validation' },
        },
        few_shot_generator: {
          class_path: 'RandomFewShotGenerator',
          init_args: {
            dataset: {
              class_path: 'HFMultipleChoiceDataset',
              init_args: dataset_base_args { split: 'train' },
            },
            num_shots: 4,
          },
        },
        prompt_template: |||
          {% for item in few_shot_data %}
          {{ item.ctx }}{{ item.choices[item.answer_index] }}
          {% endfor %}
        ||| + '{{ ctx }}',
      },
    }
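
Unlike the question-answering tasks above, the HellaSwag prompt is a plain continuation: each few-shot example is the context ctx followed directly by its gold ending, and the evaluated item contributes only its context. A jinja2 rendering sketch with a made-up item (field names follow the config; whitespace handling is approximate):

    from jinja2 import Template

    # Continuation-style prompt mirroring the prompt_template above.
    PROMPT = (
        '{% for item in few_shot_data %}'
        '{{ item.ctx }}{{ item.choices[item.answer_index] }}\n'
        '{% endfor %}'
        '{{ ctx }}'
    )

    few_shot_data = [{
        'ctx': 'She cracks two eggs into the bowl and',
        'choices': [' whisks them together.', ' plants them in the garden.'],
        'answer_index': 0,
    }]
    print(Template(PROMPT).render(few_shot_data=few_shot_data,
                                  ctx='He laces up his running shoes and'))
    # She cracks two eggs into the bowl and whisks them together.
    # He laces up his running shoes and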
    

openbookqa

OpenBookQA contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension.

References:

  • Hugging Face Dataset
  • Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering

    local dataset_base_args = {
      path: 'allenai/openbookqa',
      subset: 'main',
      choices_templates: ['{{ choices.text[0] }}', '{{ choices.text[1] }}', '{{ choices.text[2] }}', '{{ choices.text[3] }}'],
      answer_index_template: '{% if answerKey == "A" %}0{% elif answerKey == "B" %}1{% elif answerKey == "C" %}2{% elif answerKey == "D" %}3{% endif %}',
      whitespace_before_choices: true,
    };
    
    {
      class_path: 'MultipleChoice',
      init_args: {
        eval_dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'test' },
        },
        few_shot_generator: {
          class_path: 'RandomFewShotGenerator',
          init_args: {
            dataset: {
              class_path: 'HFMultipleChoiceDataset',
              init_args: dataset_base_args { split: 'train' },
            },
            num_shots: 4,
          },
        },
        prompt_template: |||
          {% for item in few_shot_data %}
          Question: {{ item.question_stem }}
          Answer:{{ item.choices[item.answer_index] }}
          {% endfor %}
          Question: {{ question_stem }}
        ||| + 'Answer:',
      },
    }
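
One detail that is easy to miss: with whitespace_before_choices set, every choice is prefixed with a single space, so 'Answer:' plus the choice renders with a normal space after the colon. A toy jinja2 illustration (the question_stem field matches the schema used above; the example item is invented):

    from jinja2 import Template

    tmpl = Template('Question: {{ item.question_stem }}\n'
                    'Answer:{{ item.choices[item.answer_index] }}')

    # Note the leading space on each choice, added by whitespace_before_choices.
    item = {
        'question_stem': 'Which object conducts electricity?',
        'choices': [' a copper wire', ' a rubber band', ' a wooden spoon', ' a glass rod'],
        'answer_index': 0,
    }
    print(tmpl.render(item=item))
    # Question: Which object conducts electricity?
    # Answer: a copper wire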
    

piqa

PIQA is a benchmark dataset for the task of physical commonsense reasoning.

References:

  • Hugging Face Dataset
  • PIQA: Reasoning about Physical Commonsense in Natural Language

    local dataset_base_args = {
      path: 'ybisk/piqa',
      choices_templates: ['{{ sol1 }}', '{{ sol2 }}'],
      answer_index_template: '{{ label }}',
      whitespace_before_choices: true,
      dataset_kwargs: { trust_remote_code: true },
    };
    
    {
      class_path: 'MultipleChoice',
      init_args: {
        eval_dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'validation' },
        },
        few_shot_generator: {
          class_path: 'RandomFewShotGenerator',
          init_args: {
            dataset: {
              class_path: 'HFMultipleChoiceDataset',
              init_args: dataset_base_args { split: 'train' },
            },
            num_shots: 4,
          },
        },
        prompt_template: |||
          {% for item in few_shot_data %}
          {{ item.goal }}{{ item.choices[item.answer_index] }}
          {% endfor %}
        ||| + '{{ goal }}',
      },
    }
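
The dataset_kwargs entry is, as the name suggests, forwarded to the Hugging Face datasets loader: ybisk/piqa is distributed with a loading script, so load_dataset requires trust_remote_code. A minimal loading sketch (the fields goal, sol1, sol2, and label are the ones the templates above read):

    from datasets import load_dataset

    # trust_remote_code mirrors dataset_kwargs above; it is required
    # for datasets distributed with a loading script.
    ds = load_dataset('ybisk/piqa', split='validation', trust_remote_code=True)

    example = ds[0]
    print(example['goal'])   # the physical goal used as the prompt
    print(example['sol1'])   # candidate solution 1
    print(example['sol2'])   # candidate solution 2
    print(example['label'])  # 0 or 1: index of the correct solution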
    

xwinograd_en

XWinograd is a multilingual collection of Winograd Schemas in six languages that can be used for evaluation of cross-lingual commonsense reasoning capabilities. This is the English subset of the dataset.

References: