# Multiple choice
## arc_challenge
The ARC dataset contains 7,787 genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, and this is the Challenge Set.
References:
- Hugging Face Dataset
- Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
```jsonnet
local dataset_base_args = {
  path: 'allenai/ai2_arc',
  subset: 'ARC-Challenge',
  choices_templates: [
    '{% if choices.text | length > 0 %}{{ choices.text[0] }}{% endif %}',
    '{% if choices.text | length > 1 %}{{ choices.text[1] }}{% endif %}',
    '{% if choices.text | length > 2 %}{{ choices.text[2] }}{% endif %}',
    '{% if choices.text | length > 3 %}{{ choices.text[3] }}{% endif %}',
    '{% if choices.text | length > 4 %}{{ choices.text[4] }}{% endif %}',
  ],
  // answerKey is one of A, B, C, D, E, 1, 2, 3, 4
  answer_index_template: '{% if answerKey == "A" %}0{% elif answerKey == "B" %}1{% elif answerKey == "C" %}2{% elif answerKey == "D" %}3{% elif answerKey == "E" %}4{% else %}{{ answerKey | int - 1 }}{% endif %}',
  whitespace_before_choices: true,
};

{
  class_path: 'MultipleChoice',
  init_args: {
    eval_dataset: {
      class_path: 'HFMultipleChoiceDataset',
      init_args: dataset_base_args { split: 'test' },
    },
    few_shot_generator: {
      class_path: 'RandomFewShotGenerator',
      init_args: {
        dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'train' },
        },
        num_shots: 4,
      },
    },
    prompt_template: |||
      {% for item in few_shot_data %}
      Question: {{ item.question }}
      Answer:{{ item.choices[item.answer_index] }}

      {% endfor %}
      Question: {{ question }}
    ||| + 'Answer:',
  },
}
```
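Because ARC mixes letter keys (A–E) with numeric keys (1–4), the `answer_index_template` above has to handle both. A minimal, standalone Jinja2 sketch to sanity-check the mapping (assuming only the `jinja2` package; this is not part of the preset):

```python
# Standalone sanity check that the answer_index_template maps every
# documented answerKey to the right choice index.
from jinja2 import Template

ANSWER_INDEX_TEMPLATE = (
    '{% if answerKey == "A" %}0{% elif answerKey == "B" %}1'
    '{% elif answerKey == "C" %}2{% elif answerKey == "D" %}3'
    '{% elif answerKey == "E" %}4{% else %}{{ answerKey | int - 1 }}{% endif %}'
)

template = Template(ANSWER_INDEX_TEMPLATE)
for key, expected in [
    ("A", 0), ("B", 1), ("C", 2), ("D", 3), ("E", 4),
    ("1", 0), ("2", 1), ("3", 2), ("4", 3),
]:
    assert int(template.render(answerKey=key)) == expected
print("all answerKey mappings check out")
```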
## arc_easy
The ARC dataset contains 7,787 genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, and this is the Easy Set.
References:
- Hugging Face Dataset
- Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
```jsonnet
local dataset_base_args = {
  path: 'allenai/ai2_arc',
  subset: 'ARC-Easy',
  choices_templates: [
    '{% if choices.text | length > 0 %}{{ choices.text[0] }}{% endif %}',
    '{% if choices.text | length > 1 %}{{ choices.text[1] }}{% endif %}',
    '{% if choices.text | length > 2 %}{{ choices.text[2] }}{% endif %}',
    '{% if choices.text | length > 3 %}{{ choices.text[3] }}{% endif %}',
    '{% if choices.text | length > 4 %}{{ choices.text[4] }}{% endif %}',
  ],
  // answerKey is one of A, B, C, D, E, 1, 2, 3, 4
  answer_index_template: '{% if answerKey == "A" %}0{% elif answerKey == "B" %}1{% elif answerKey == "C" %}2{% elif answerKey == "D" %}3{% elif answerKey == "E" %}4{% else %}{{ answerKey | int - 1 }}{% endif %}',
  whitespace_before_choices: true,
};

{
  class_path: 'MultipleChoice',
  init_args: {
    eval_dataset: {
      class_path: 'HFMultipleChoiceDataset',
      init_args: dataset_base_args { split: 'test' },
    },
    few_shot_generator: {
      class_path: 'RandomFewShotGenerator',
      init_args: {
        dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'train' },
        },
        num_shots: 4,
      },
    },
    prompt_template: |||
      {% for item in few_shot_data %}
      Question: {{ item.question }}
      Answer:{{ item.choices[item.answer_index] }}

      {% endfor %}
      Question: {{ question }}
    ||| + 'Answer:',
  },
}
```
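The length guards in `choices_templates` are not decorative: some ARC items have three or five answer options rather than four, so templates for missing indices must render to empty strings. A small illustration with a made-up item, again using only `jinja2`:

```python
# Why the length guards exist: out-of-range templates render to '' instead
# of raising an index error on items with fewer than five options.
from jinja2 import Template

templates = [
    f'{{% if choices.text | length > {n} %}}{{{{ choices.text[{n}] }}}}{{% endif %}}'
    for n in range(5)
]

choices = {"text": ["red", "green", "blue"]}  # a made-up three-option item
for n, tpl in enumerate(templates):
    print(n, repr(Template(tpl).render(choices=choices)))
# indices 0-2 print the option text; indices 3 and 4 print ''
```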
## commonsense_qa_mc
CommonsenseQA is a multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answers. In this setup the model's answer is the choice whose text receives the highest log-probability as a continuation of the prompt (a standalone sketch of this scoring follows the config below).
References:
- Hugging Face Dataset
- CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
```jsonnet
local dataset_base_args = {
  path: 'tau/commonsense_qa',
  choices_templates: [
    '{{ choices.text[0] }}',
    '{{ choices.text[1] }}',
    '{{ choices.text[2] }}',
    '{{ choices.text[3] }}',
    '{{ choices.text[4] }}',
  ],
  answer_index_template: '{% if answerKey == "A" %}0{% elif answerKey == "B" %}1{% elif answerKey == "C" %}2{% elif answerKey == "D" %}3{% elif answerKey == "E" %}4{% endif %}',
  whitespace_before_choices: true,
};

{
  class_path: 'MultipleChoice',
  init_args: {
    eval_dataset: {
      class_path: 'HFMultipleChoiceDataset',
      init_args: dataset_base_args { split: 'validation' },
    },
    few_shot_generator: {
      class_path: 'RandomFewShotGenerator',
      init_args: {
        dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'train' },
        },
        num_shots: 4,
      },
    },
    prompt_template: |||
      {% for item in few_shot_data %}
      Question: {{ item.question }}
      Answer:{{ item.choices[item.answer_index] }}

      {% endfor %}
      Question: {{ question }}
    ||| + 'Answer:',
  },
}
```
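Under the hood, a `MultipleChoice` setup scores each candidate answer by the likelihood the model assigns to it as a continuation of the prompt. Below is a rough, self-contained sketch of that idea; the model name (`gpt2`), the made-up question, and the plain sum-of-logprobs scoring without length normalization are all illustrative assumptions, not the framework's exact internals:

```python
# Rough sketch of log-probability multiple-choice scoring.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def choice_logprob(prompt: str, choice: str) -> float:
    """Sum of log-probabilities the model assigns to `choice` after `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # log_probs[pos] is the distribution over the token at position pos + 1
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    return sum(
        log_probs[pos, full_ids[0, pos + 1]].item()
        for pos in range(prompt_len - 1, full_ids.shape[1] - 1)
    )

prompt = "Question: Where would you expect to find a jellyfish?\nAnswer:"
# choices start with a space so the prompt/choice token boundary stays clean
choices = [" the ocean", " a desert", " a library", " a volcano", " an attic"]
scores = [choice_logprob(prompt, c) for c in choices]
print(choices[scores.index(max(scores))])
```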
## hellaswag
HellaSwag is a dataset for physically situated commonsense reasoning. It was constructed with adversarial filtering to make it challenging for models to perform well.
References:
- Hugging Face Dataset
- HellaSwag: Can a Machine Really Finish Your Sentence?
```jsonnet
local dataset_base_args = {
  path: 'Rowan/hellaswag',
  choices_templates: ['{{ endings[0] }}', '{{ endings[1] }}', '{{ endings[2] }}', '{{ endings[3] }}'],
  answer_index_template: '{{ label }}',
  whitespace_before_choices: true,
};

{
  class_path: 'MultipleChoice',
  init_args: {
    eval_dataset: {
      class_path: 'HFMultipleChoiceDataset',
      init_args: dataset_base_args { split: 'validation' },
    },
    few_shot_generator: {
      class_path: 'RandomFewShotGenerator',
      init_args: {
        dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'train' },
        },
        num_shots: 4,
      },
    },
    prompt_template: |||
      {% for item in few_shot_data %}
      {{ item.ctx }}{{ item.choices[item.answer_index] }}

      {% endfor %}
    ||| + '{{ ctx }}',
  },
}
```
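Unlike the question-answering presets above, this prompt has no `Question:`/`Answer:` scaffolding: the prompt is just `ctx`, and each of the four `endings` is scored as a raw continuation. Note also that `label` in the raw dataset is a string index, which `'{{ label }}'` passes through unchanged. A small illustration with a made-up item:

```python
# Made-up HellaSwag-style item: `label` is a string index into `endings`,
# and the gold completion is simply ctx + ending.
from jinja2 import Template

item = {
    "ctx": "A man kneels on a frozen lake.",
    "endings": [
        " He drills a hole in the ice and starts fishing.",
        " He plants a palm tree in the ice.",
        " He sunbathes next to his towel.",
        " He mows the frozen grass.",
    ],
    "label": "0",
}
answer_index = int(Template("{{ label }}").render(label=item["label"]))
print(item["ctx"] + item["endings"][answer_index])
```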
## openbookqa
OpenBookQA contains questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension.
References:
- Hugging Face Dataset
- Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
```jsonnet
local dataset_base_args = {
  path: 'allenai/openbookqa',
  subset: 'main',
  choices_templates: ['{{ choices.text[0] }}', '{{ choices.text[1] }}', '{{ choices.text[2] }}', '{{ choices.text[3] }}'],
  answer_index_template: '{% if answerKey == "A" %}0{% elif answerKey == "B" %}1{% elif answerKey == "C" %}2{% elif answerKey == "D" %}3{% endif %}',
  whitespace_before_choices: true,
};

{
  class_path: 'MultipleChoice',
  init_args: {
    eval_dataset: {
      class_path: 'HFMultipleChoiceDataset',
      init_args: dataset_base_args { split: 'test' },
    },
    few_shot_generator: {
      class_path: 'RandomFewShotGenerator',
      init_args: {
        dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'train' },
        },
        num_shots: 4,
      },
    },
    prompt_template: |||
      {% for item in few_shot_data %}
      Question: {{ item.question_stem }}
      Answer:{{ item.choices[item.answer_index] }}

      {% endfor %}
      Question: {{ question_stem }}
    ||| + 'Answer:',
  },
}
```
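One detail worth noticing: `'Answer:'` is concatenated outside the `|||` block so the rendered prompt ends immediately after the colon, with no trailing newline; the choice text is then scored right after it (presumably with `whitespace_before_choices: true` supplying the separating space). A sketch of the rendering with invented examples; the exact whitespace inside the real preset's `|||` block may differ slightly:

```python
# How the few-shot prompt assembles, approximated with a flat template string.
from jinja2 import Template

PROMPT_TEMPLATE = (
    "{% for item in few_shot_data %}"
    "Question: {{ item.question_stem }}\n"
    "Answer:{{ item.choices[item.answer_index] }}\n\n"
    "{% endfor %}"
    "Question: {{ question_stem }}\n"
    "Answer:"
)

few_shot_data = [
    {"question_stem": "Which material conducts electricity?",
     "choices": [" wood", " copper", " glass", " rubber"],
     "answer_index": 1},
]
prompt = Template(PROMPT_TEMPLATE).render(
    few_shot_data=few_shot_data,
    question_stem="What do plants need to make their own food?",
)
print(prompt)  # ends with "Answer:" and no trailing newline
```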
## piqa
PIQA introduces the task of physical commonsense reasoning along with a corresponding benchmark dataset.
References:
- Hugging Face Dataset
- PIQA: Reasoning about Physical Commonsense in Natural Language
```jsonnet
local dataset_base_args = {
  path: 'ybisk/piqa',
  choices_templates: ['{{ sol1 }}', '{{ sol2 }}'],
  answer_index_template: '{{ label }}',
  whitespace_before_choices: true,
  dataset_kwargs: { trust_remote_code: true },
};

{
  class_path: 'MultipleChoice',
  init_args: {
    eval_dataset: {
      class_path: 'HFMultipleChoiceDataset',
      init_args: dataset_base_args { split: 'validation' },
    },
    few_shot_generator: {
      class_path: 'RandomFewShotGenerator',
      init_args: {
        dataset: {
          class_path: 'HFMultipleChoiceDataset',
          init_args: dataset_base_args { split: 'train' },
        },
        num_shots: 4,
      },
    },
    prompt_template: |||
      {% for item in few_shot_data %}
      {{ item.goal }}{{ item.choices[item.answer_index] }}

      {% endfor %}
    ||| + '{{ goal }}',
  },
}
```
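The one non-obvious knob here is `dataset_kwargs: { trust_remote_code: true }`: `ybisk/piqa` is distributed with a dataset loading script, which recent versions of the `datasets` library only execute after an explicit opt-in. A direct equivalent of what the `eval_dataset` config loads (it downloads the data on first run):

```python
# trust_remote_code is required because ybisk/piqa ships a loading script.
from datasets import load_dataset

piqa = load_dataset("ybisk/piqa", split="validation", trust_remote_code=True)
ex = piqa[0]
# `label` is an integer index selecting between the two solutions
print(ex["goal"], "->", [ex["sol1"], ex["sol2"]][ex["label"]])
```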
## xwinograd_en
XWinograd is a multilingual collection of Winograd Schemas in six languages for evaluating cross-lingual commonsense reasoning capabilities. This is the English subset of the dataset.
References:
- Hugging Face Dataset
- It’s All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning
```jsonnet
{
  class_path: 'MultipleChoice',
  init_args: {
    eval_dataset: {
      class_path: 'HFMultipleChoiceDataset',
      init_args: {
        path: 'Muennighoff/xwinograd',
        subset: 'en',
        split: 'test',
        choices_templates: [
          '{{ option1 }}{{ sentence.split("_")[1] }}',
          '{{ option2 }}{{ sentence.split("_")[1] }}',
        ],
        answer_index_template: '{{ answer | int - 1 }}',
        input_templates: { context: '{{ sentence.split("_")[0] }}' },
      },
    },
    prompt_template: '{{ context }}',
  },
}
```
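The templates reassemble each candidate from the raw fields: `sentence` contains a literal `_` placeholder, `option1`/`option2` are the two fillers, and `answer` is 1-based. The same decomposition in plain Python, on a made-up schema instance:

```python
# Text before "_" becomes the context; each option plus the text after "_"
# becomes a candidate continuation; `answer` is converted from 1-based.
sentence = "The trophy doesn't fit into the suitcase because _ is too large."
option1, option2, answer = "the trophy", "the suitcase", "1"

context = sentence.split("_")[0]        # input_templates.context
choices = [
    option1 + sentence.split("_")[1],   # choices_templates[0]
    option2 + sentence.split("_")[1],   # choices_templates[1]
]
answer_index = int(answer) - 1          # answer_index_template

print(repr(context))          # context keeps the trailing space before the blank
print(choices[answer_index])  # -> the trophy is too large.
```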