# English Generation Tasks
## babi
A synthetic question-answering dataset with reasoning questions.
References:
- [Hugging Face Dataset](https://huggingface.co/datasets/Muennighoff/babi)
- [Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks](https://arxiv.org/abs/1502.05698)
```jsonnet
local dataset_base_args = {
  class_path: 'HFGenerationDataset',
  init_args: {
    path: 'Muennighoff/babi',
    reference_template: '{{ answer }}',
  },
};

{
  class_path: 'Generation',
  init_args: {
    eval_dataset: dataset_base_args { init_args+: { split: 'validation' } },
    few_shot_generator: {
      class_path: 'RandomFewShotGenerator',
      init_args: {
        dataset: dataset_base_args { init_args+: { split: 'train' } },
        num_shots: 3,
      },
    },
    prompt_template: |||
      {% for item in few_shot_data %}
      Passage: {{ item.passage | trim }}
      Question: {{ item.question }}
      Answer: "{{ item.references[0] }}"
      {% endfor %}
      Passage: {{ passage | trim }}
      Question: {{ question }}
    ||| + 'Answer: "',
    metrics: [
      { class_path: 'CharF1' },
      { class_path: 'ExactMatch' },
    ],
    gen_kwargs: { max_new_tokens: 32, stop_sequences: ['"'] },
  },
}
```
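Every preset in this section follows the same Jsonnet pattern: a `local dataset_base_args` object holds the shared dataset definition, and the `{ init_args+: { split: ... } }` suffix merges a split into it rather than replacing the whole `init_args` object. A minimal sketch of that merge, runnable with the `jsonnet` CLI:

```jsonnet
// Minimal sketch of the override pattern used throughout these presets:
// applying `{ init_args+: { ... } }` to an object merges the new fields
// into `init_args` instead of replacing it wholesale.
local dataset_base_args = {
  class_path: 'HFGenerationDataset',
  init_args: { path: 'Muennighoff/babi', reference_template: '{{ answer }}' },
};

// Evaluates to the base object with `split` added alongside the
// existing `path` and `reference_template` fields.
dataset_base_args { init_args+: { split: 'validation' } }
```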
## commonsense_qa
CommonsenseQA is a multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answers. This is a setup for generating answers based on the choices provided.
References:
- [Hugging Face Dataset](https://huggingface.co/datasets/tau/commonsense_qa)
- [CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge](https://arxiv.org/abs/1811.00937)
```jsonnet
local dataset_base_args = {
  class_path: 'HFGenerationDataset',
  init_args: {
    path: 'tau/commonsense_qa',
    reference_template: '{% set answer_index = choices.label.index(answerKey) %}{{ choices.text[answer_index] }}',
  },
};

{
  class_path: 'Generation',
  init_args: {
    eval_dataset: dataset_base_args { init_args+: { split: 'validation' } },
    few_shot_generator: {
      class_path: 'RandomFewShotGenerator',
      init_args: {
        dataset: dataset_base_args { init_args+: { split: 'train' } },
        num_shots: 2,
      },
    },
    prompt_template: |||
      Choose the correct answer from the choices.
      {% for item in few_shot_data %}
      Choices:
      0. "{{ item.choices.text[0] }}"
      1. "{{ item.choices.text[1] }}"
      2. "{{ item.choices.text[2] }}"
      3. "{{ item.choices.text[3] }}"
      4. "{{ item.choices.text[4] }}"
      Question: {{ item.question }}
      Answer: "{{ item.references[0] }}"
      {% endfor %}
      Choices:
      0. "{{ choices.text[0] }}"
      1. "{{ choices.text[1] }}"
      2. "{{ choices.text[2] }}"
      3. "{{ choices.text[3] }}"
      4. "{{ choices.text[4] }}"
      Question: {{ question }}
    ||| + 'Answer: "',
    metrics: [
      { class_path: 'ExactMatch' },
    ],
    gen_kwargs: { max_new_tokens: 40, stop_sequences: ['"'] },
  },
}
```
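The `reference_template` here is the least obvious in this section: the dataset stores the gold answer as a letter key, so the Jinja2 template looks the key up in `choices.label` and emits the matching `choices.text` entry. A Jsonnet re-statement of that lookup on a hypothetical item (the field names mirror the `tau/commonsense_qa` schema; the choice texts are made up for illustration):

```jsonnet
// Hypothetical item in the tau/commonsense_qa schema, used only to
// illustrate what the Jinja2 reference_template above computes.
local item = {
  answerKey: 'C',
  choices: {
    label: ['A', 'B', 'C', 'D', 'E'],
    text: ['bank', 'library', 'department store', 'mall', 'new york'],
  },
};

// std.find returns the indexes at which the value occurs; the first one
// plays the role of `choices.label.index(answerKey)` in the template.
local answer_index = std.find(item.answerKey, item.choices.label)[0];

item.choices.text[answer_index]  // evaluates to "department store"
```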
## gsm8k
GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.
References:
- [Hugging Face Dataset](https://huggingface.co/datasets/gsm8k)
- [Training Verifiers to Solve Math Word Problems](https://arxiv.org/abs/2110.14168)
```jsonnet
local dataset_base_args = {
  class_path: 'HFGenerationDataset',
  init_args: {
    path: 'gsm8k',
    subset: 'main',
    reference_template: '{{ answer | regex_replace("<<.*?>>", "") }}',
  },
};

{
  class_path: 'Generation',
  init_args: {
    eval_dataset: dataset_base_args { init_args+: { split: 'test' } },
    few_shot_generator: {
      class_path: 'RandomFewShotGenerator',
      init_args: {
        dataset: dataset_base_args { init_args+: { split: 'train' } },
        num_shots: 4,
      },
    },
    prompt_template: |||
      {% for item in few_shot_data %}
      Q: {{ item.question }}
      A: {{ item.references[0] }}
      {% endfor %}
      Q: {{ question }}
    ||| + 'A:',
    metrics: [
      {
        class_path: 'ExactMatch',
        init_args: {
          lm_output_processor: { class_path: 'RegexExtractor', init_args: { pattern: '-?[0-9.,]+' } },
          reference_processor: { class_path: 'RegexExtractor', init_args: { pattern: '-?[0-9.,]+' } },
        },
      },
    ],
    gen_kwargs: { max_new_tokens: 256, stop_sequences: ['Q:'] },
  },
}
```
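Note how every preset appends the answer cue with `+` after the `|||` block rather than writing it inside: Jsonnet text blocks always end with a newline, so the concatenation is what keeps `A:` (and the model's completion after it) on the final prompt line. The `stop_sequences: ['Q:']` setting then truncates generation as soon as the model starts a new question. A minimal sketch of the string behavior:

```jsonnet
// Jsonnet ||| text blocks always terminate with a trailing newline, so
// concatenating the answer cue keeps it on the prompt's last line.
local prompt = |||
  Q: What is 2 + 3?
||| + 'A:';

prompt  // evaluates to "Q: What is 2 + 3?\nA:"
```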
## squad_v1
Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage.
References:
- [Hugging Face Dataset](https://huggingface.co/datasets/rajpurkar/squad)
- [SQuAD: 100,000+ Questions for Machine Comprehension of Text](https://arxiv.org/abs/1606.05250)
```jsonnet
local dataset_base_args = {
  class_path: 'HFGenerationDataset',
  init_args: {
    path: 'rajpurkar/squad',
    reference_list_template: '{{ answers.text }}',
  },
};

{
  class_path: 'Generation',
  init_args: {
    eval_dataset: dataset_base_args { init_args+: { split: 'validation' } },
    few_shot_generator: {
      class_path: 'RandomFewShotGenerator',
      init_args: {
        dataset: dataset_base_args { init_args+: { split: 'train' } },
        num_shots: 2,
      },
    },
    prompt_template: |||
      {% for item in few_shot_data %}
      Context: {{ item.context | trim }}
      Question: {{ item.question }}
      Answer: "{{ item.references[0] }}"
      {% endfor %}
      Context: {{ context | trim }}
      Question: {{ question }}
    ||| + 'Answer: "',
    metrics: [
      { class_path: 'CharF1' },
      { class_path: 'ExactMatch' },
    ],
    gen_kwargs: { max_new_tokens: 32, stop_sequences: ['"'] },
  },
}
```
## trivia_qa
TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high-quality distant supervision for answering the questions.
References:
- [Hugging Face Dataset](https://huggingface.co/datasets/trivia_qa)
- [TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension](https://arxiv.org/abs/1705.03551)
```jsonnet
local dataset_base_args = {
  class_path: 'HFGenerationDataset',
  init_args: {
    path: 'trivia_qa',
    subset: 'rc.nocontext',
    reference_list_template: '{{ answer.aliases }}',
  },
};

{
  class_path: 'Generation',
  init_args: {
    eval_dataset: dataset_base_args { init_args+: { split: 'validation' } },
    few_shot_generator: {
      class_path: 'RandomFewShotGenerator',
      init_args: {
        dataset: dataset_base_args { init_args+: { split: 'train' } },
        num_shots: 0,
      },
    },
    prompt_template: |||
      {% for item in few_shot_data %}
      Question: {{ item.question }}
      Answer: "{{ item.references[0] }}"
      {% endfor %}
      Question: {{ question }}
    ||| + 'Answer: "',
    metrics: [
      { class_path: 'CharF1' },
      { class_path: 'ExactMatch' },
    ],
    gen_kwargs: { max_new_tokens: 32, stop_sequences: ['"'] },
  },
}
```
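With `num_shots: 0` the few-shot loop in the template renders nothing, so this preset is effectively zero-shot despite declaring a generator. If the config above were saved as, say, `trivia_qa.jsonnet` (a hypothetical filename), Jsonnet's merge syntax would turn it into a 4-shot setup without copying it:

```jsonnet
// Hypothetical override: import the preset above (filename assumed) and
// merge a new shot count into the nested few-shot generator arguments.
(import 'trivia_qa.jsonnet') {
  init_args+: {
    few_shot_generator+: { init_args+: { num_shots: 4 } },
  },
}
```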
## twitter_sentiment
TSATC: Twitter Sentiment Analysis Training Corpus. This dataset is a preprocessed version of the original corpus. See the Hugging Face dataset page for more information.
References:
- [Hugging Face Dataset](https://huggingface.co/datasets/carblacac/twitter-sentiment-analysis)
- Twitter Sentiment Analysis Training Corpus (Dataset)
```jsonnet
local dataset_base_args = {
  class_path: 'HFGenerationDataset',
  init_args: {
    path: 'carblacac/twitter-sentiment-analysis',
    reference_template: "{{ ['Positive', 'Negative'][feeling] }}",
  },
};

{
  class_path: 'Generation',
  init_args: {
    eval_dataset: dataset_base_args { init_args+: { split: 'test' } },
    few_shot_generator: {
      class_path: 'BalancedFewShotGenerator',
      init_args: {
        dataset: dataset_base_args { init_args+: { split: 'train' } },
        num_shots: 4,
      },
    },
    prompt_template: |||
      Classify the sentiment of the following tweet.
      {% for item in few_shot_data %}
      Tweet: {{ item.text }}
      Sentiment: `{{ item.references[0] }}`
      {% endfor %}
      Tweet: {{ text }}
    ||| + 'Sentiment: `',
    metrics: [
      { class_path: 'ExactMatch' },
    ],
    gen_kwargs: { max_new_tokens: 8, stop_sequences: ['`'] },
  },
}
```
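Two details worth noting here: the prompt opens the answer with a backtick so that `stop_sequences: ['`']` cleanly truncates generation at the closing delimiter (the same open-delimiter/stop-sequence pairing the QA presets use with double quotes), and the `reference_template` indexes a fixed label list with the integer `feeling` field. The same lookup written in plain Jsonnet, for illustration only:

```jsonnet
// As written, the reference_template maps feeling 0 -> 'Positive' and
// feeling 1 -> 'Negative' by plain list indexing.
local labels = ['Positive', 'Negative'];

{ feeling_0: labels[0], feeling_1: labels[1] }
```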