
multiple choice

arc_challenge

The ARC dataset contains 7,787 genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, and this is the Challenge Set.

References:

  • Hugging Face Dataset
  • Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

    local dataset_base_args = {
      path: 'allenai/ai2_arc',
      subset: 'ARC-Challenge',
      choices_templates: [
        '{% if choices.text | length > 0 %}{{ choices.text[0] }}{% endif %}',
        '{% if choices.text | length > 1 %}{{ choices.text[1] }}{% endif %}',
        '{% if choices.text | length > 2 %}{{ choices.text[2] }}{% endif %}',
        '{% if choices.text | length > 3 %}{{ choices.text[3] }}{% endif %}',
        '{% if choices.text | length > 4 %}{{ choices.text[4] }}{% endif %}',
      ],
      // answerKey is one of A, B, C, D, E, 1, 2, 3, 4
      answer_index_template: '{% if answerKey == "A" %}0{% elif answerKey == "B" %}1{% elif answerKey == "C" %}2{% elif answerKey == "D" %}3{% elif answerKey == "E" %}4{% else %}{{ answerKey | int - 1 }}{% endif %}',
      whitespace_before_choices: true,
    };
    
    {
      class_path: 'MultipleChoice',
      init_args: {
        eval_dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'test' },
        },
        few_shot_generator: {
          class_path: 'RandomFewShotGenerator',
          init_args: {
            dataset: {
              class_path: 'HFMultipleChoiceDataset',
              init_args: dataset_base_args { split: 'train' },
            },
            num_shots: 4,
          },
        },
        prompt_template: |||
          {% for item in few_shot_data %}
          Question: {{ item.question }}
          Answer:{{ item.choices[item.answer_index] }}
          {% endfor %}
          Question: {{ question }}
        ||| + 'Answer:',
      },
    }
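
As a sanity check on the answer_index_template above, here is a minimal Python sketch (an illustration, not part of the config) that renders the same Jinja2 expression with the jinja2 package and prints the index each possible answerKey maps to:

    from jinja2 import Template

    # Same expression as answer_index_template above: letter keys map to
    # indices 0-4, numeric keys fall through to the int filter.
    ANSWER_INDEX_TEMPLATE = (
        '{% if answerKey == "A" %}0{% elif answerKey == "B" %}1'
        '{% elif answerKey == "C" %}2{% elif answerKey == "D" %}3'
        '{% elif answerKey == "E" %}4{% else %}{{ answerKey | int - 1 }}{% endif %}'
    )

    template = Template(ANSWER_INDEX_TEMPLATE)
    for key in ['A', 'B', 'C', 'D', 'E', '1', '2', '3', '4']:
        print(key, '->', int(template.render(answerKey=key)))
    # A -> 0, B -> 1, C -> 2, D -> 3, E -> 4, '1' -> 0, ..., '4' -> 3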
    

arc_easy

The ARC dataset contains 7,787 genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, and this is the Easy Set.

References:

  • Hugging Face Dataset
  • Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

    local dataset_base_args = {
      path: 'allenai/ai2_arc',
      subset: 'ARC-Easy',
      choices_templates: [
        '{% if choices.text | length > 0 %}{{ choices.text[0] }}{% endif %}',
        '{% if choices.text | length > 1 %}{{ choices.text[1] }}{% endif %}',
        '{% if choices.text | length > 2 %}{{ choices.text[2] }}{% endif %}',
        '{% if choices.text | length > 3 %}{{ choices.text[3] }}{% endif %}',
        '{% if choices.text | length > 4 %}{{ choices.text[4] }}{% endif %}',
      ],
      // answerKey is one of A, B, C, D, E, 1, 2, 3, 4
      answer_index_template: '{% if answerKey == "A" %}0{% elif answerKey == "B" %}1{% elif answerKey == "C" %}2{% elif answerKey == "D" %}3{% elif answerKey == "E" %}4{% else %}{{ answerKey | int - 1 }}{% endif %}',
      whitespace_before_choices: true,
    };
    
    {
      class_path: 'MultipleChoice',
      init_args: {
        eval_dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'test' },
        },
        few_shot_generator: {
          class_path: 'RandomFewShotGenerator',
          init_args: {
            dataset: {
              class_path: 'HFMultipleChoiceDataset',
              init_args: dataset_base_args { split: 'train' },
            },
            num_shots: 4,
          },
        },
        prompt_template: |||
          {% for item in few_shot_data %}
          Question: {{ item.question }}
          Answer:{{ item.choices[item.answer_index] }}
          {% endfor %}
          Question: {{ question }}
        ||| + 'Answer:',
      },
    }
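
The length guards in choices_templates are there because ARC questions have between three and five options (most have four); a guarded template renders to an empty string when the corresponding choice is absent. A small jinja2 sketch with invented toy data:

    from jinja2 import Template

    # Fifth-choice template from choices_templates above.
    tmpl = Template('{% if choices.text | length > 4 %}{{ choices.text[4] }}{% endif %}')

    four_options = {'text': ['flask', 'funnel', 'beaker', 'pipette']}
    five_options = {'text': ['flask', 'funnel', 'beaker', 'pipette', 'burette']}
    print(repr(tmpl.render(choices=four_options)))  # ''  (no fifth choice)
    print(repr(tmpl.render(choices=five_options)))  # 'burette'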
    

commonsense_qa_mc

CommonsenseQA is a multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answers. This is a multiple-choice setup in which the model's prediction is the choice with the highest log-probability.

References:

  • Hugging Face Dataset
  • CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge

    local dataset_base_args = {
      path: 'tau/commonsense_qa',
      choices_templates: ['{{ choices.text[0] }}', '{{ choices.text[1] }}', '{{ choices.text[2] }}', '{{ choices.text[3] }}', '{{ choices.text[4] }}'],
      answer_index_template: '{% if answerKey == "A" %}0{% elif answerKey == "B" %}1{% elif answerKey == "C" %}2{% elif answerKey == "D" %}3{% elif answerKey == "E" %}4{% endif %}',
      whitespace_before_choices: true,
    };
    
    {
      class_path: 'MultipleChoice',
      init_args: {
        eval_dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'validation' },
        },
        few_shot_generator: {
          class_path: 'RandomFewShotGenerator',
          init_args: {
            dataset: {
              class_path: 'HFMultipleChoiceDataset',
              init_args: dataset_base_args { split: 'train' },
            },
            num_shots: 4,
          },
        },
        prompt_template: |||
          {% for item in few_shot_data %}
          Question: {{ item.question }}
          Answer:{{ item.choices[item.answer_index] }}
          {% endfor %}
          Question: {{ question }}
        ||| + 'Answer:',
      },
    }
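
As a conceptual sketch of the log-probability scoring the description refers to (not the evaluator's actual implementation), the following Python snippet scores each choice as the sum of token log-probabilities the model assigns to it after the prompt, using Hugging Face transformers and an arbitrary small causal LM:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = 'gpt2'  # placeholder model, chosen only for illustration
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    def choice_logprob(prompt: str, choice: str) -> float:
        """Sum of log-probabilities of the choice tokens, conditioned on the prompt."""
        prompt_len = tokenizer(prompt, return_tensors='pt').input_ids.shape[1]
        full_ids = tokenizer(prompt + choice, return_tensors='pt').input_ids
        with torch.no_grad():
            log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
        # Logits at position i predict the token at position i + 1,
        # so the choice tokens are scored by the preceding positions.
        return sum(
            log_probs[0, pos - 1, full_ids[0, pos]].item()
            for pos in range(prompt_len, full_ids.shape[1])
        )

    prompt = 'Question: Where do people keep spare coins?\nAnswer:'
    choices = [' a wallet', ' a cloud', ' a verb']  # toy choices, not from the dataset
    scores = [choice_logprob(prompt, c) for c in choices]
    print(choices[scores.index(max(scores))])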
    

hellaswag

HellaSwag is a dataset for physically situated commonsense reasoning. It is constructed through adversarial filtering to make it challenging for models.

References:

  • Hugging Face Dataset
  • HellaSwag: Can a Machine Really Finish Your Sentence?

    local dataset_base_args = {
      path: 'Rowan/hellaswag',
      choices_templates: ['{{ endings[0] }}', '{{ endings[1] }}', '{{ endings[2] }}', '{{ endings[3] }}'],
      answer_index_template: '{{ label }}',
      whitespace_before_choices: true,
    };
    
    {
      class_path: 'MultipleChoice',
      init_args: {
        eval_dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'validation' },
        },
        few_shot_generator: {
          class_path: 'RandomFewShotGenerator',
          init_args: {
            dataset: {
              class_path: 'HFMultipleChoiceDataset',
              init_args: dataset_base_args { split: 'train' },
            },
            num_shots: 4,
          },
        },
        prompt_template: |||
          {% for item in few_shot_data %}
          {{ item.ctx }}{{ item.choices[item.answer_index] }}
          {% endfor %}
        ||| + '{{ ctx }}',
      },
    }
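
Unlike the question-answering tasks above, the HellaSwag prompt is a plain continuation: each few-shot example is the context ctx followed directly by its gold ending, and the evaluated item contributes only its context. A jinja2 rendering sketch with a made-up item (field names follow the config; whitespace handling is approximate):

    from jinja2 import Template

    # Continuation-style prompt mirroring the prompt_template above.
    PROMPT = (
        '{% for item in few_shot_data %}'
        '{{ item.ctx }}{{ item.choices[item.answer_index] }}\n'
        '{% endfor %}'
        '{{ ctx }}'
    )

    few_shot_data = [{
        'ctx': 'She cracks two eggs into the bowl and',
        'choices': [' whisks them together.', ' plants them in the garden.'],
        'answer_index': 0,
    }]
    print(Template(PROMPT).render(few_shot_data=few_shot_data,
                                  ctx='He laces up his running shoes and'))
    # She cracks two eggs into the bowl and whisks them together.
    # He laces up his running shoes and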
    

openbookqa

OpenBookQA contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension.

References:

  • Hugging Face Dataset
  • Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering

    local dataset_base_args = {
      path: 'allenai/openbookqa',
      subset: 'main',
      choices_templates: ['{{ choices.text[0] }}', '{{ choices.text[1] }}', '{{ choices.text[2] }}', '{{ choices.text[3] }}'],
      answer_index_template: '{% if answerKey == "A" %}0{% elif answerKey == "B" %}1{% elif answerKey == "C" %}2{% elif answerKey == "D" %}3{% endif %}',
      whitespace_before_choices: true,
    };
    
    {
      class_path: 'MultipleChoice',
      init_args: {
        eval_dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'test' },
        },
        few_shot_generator: {
          class_path: 'RandomFewShotGenerator',
          init_args: {
            dataset: {
              class_path: 'HFMultipleChoiceDataset',
              init_args: dataset_base_args { split: 'train' },
            },
            num_shots: 4,
          },
        },
        prompt_template: |||
          {% for item in few_shot_data %}
          Question: {{ item.question_stem }}
          Answer:{{ item.choices[item.answer_index] }}
          {% endfor %}
          Question: {{ question_stem }}
        ||| + 'Answer:',
      },
    }
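
One detail that is easy to miss: with whitespace_before_choices set, every choice is prefixed with a single space, so 'Answer:' plus the choice renders with a normal space after the colon. A toy jinja2 illustration (the question_stem field matches the schema used above; the example item is invented):

    from jinja2 import Template

    tmpl = Template('Question: {{ item.question_stem }}\n'
                    'Answer:{{ item.choices[item.answer_index] }}')

    # Note the leading space on each choice, added by whitespace_before_choices.
    item = {
        'question_stem': 'Which object conducts electricity?',
        'choices': [' a copper wire', ' a rubber band', ' a wooden spoon', ' a glass rod'],
        'answer_index': 0,
    }
    print(tmpl.render(item=item))
    # Question: Which object conducts electricity?
    # Answer: a copper wire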
    

piqa

PIQA is a benchmark dataset for the task of physical commonsense reasoning.

References:

  • Hugging Face Dataset
  • PIQA: Reasoning about Physical Commonsense in Natural Language

    local dataset_base_args = {
      path: 'ybisk/piqa',
      choices_templates: ['{{ sol1 }}', '{{ sol2 }}'],
      answer_index_template: '{{ label }}',
      whitespace_before_choices: true,
      dataset_kwargs: { trust_remote_code: true },
    };
    
    {
      class_path: 'MultipleChoice',
      init_args: {
        eval_dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'validation' },
        },
        few_shot_generator: {
          class_path: 'RandomFewShotGenerator',
          init_args: {
            dataset: {
              class_path: 'HFMultipleChoiceDataset',
              init_args: dataset_base_args { split: 'train' },
            },
            num_shots: 4,
          },
        },
        prompt_template: |||
          {% for item in few_shot_data %}
          {{ item.goal }}{{ item.choices[item.answer_index] }}
          {% endfor %}
        ||| + '{{ goal }}',
      },
    }
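
The dataset_kwargs entry is, as the name suggests, forwarded to the Hugging Face datasets loader: ybisk/piqa is distributed with a loading script, so load_dataset requires trust_remote_code. A minimal loading sketch (the fields goal, sol1, sol2, and label are the ones the templates above read):

    from datasets import load_dataset

    # trust_remote_code mirrors dataset_kwargs above; it is required
    # for datasets distributed with a loading script.
    ds = load_dataset('ybisk/piqa', split='validation', trust_remote_code=True)

    example = ds[0]
    print(example['goal'])   # the physical goal used as the prompt
    print(example['sol1'])   # candidate solution 1
    print(example['sol2'])   # candidate solution 2
    print(example['label'])  # 0 or 1: index of the correct solution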
    

xwinograd_en

XWinograd is a multilingual collection of Winograd Schemas in six languages that can be used for evaluation of cross-lingual commonsense reasoning capabilities. This is the English subset of the dataset.

References: