Code chat

mbpp_chat

Mostly Basic Python Problems (MBPP) is a dataset of crowd-sourced programming problems. This is an evaluation setup for chat LLMs.

References:

  • Hugging Face Dataset
  • Program Synthesis with Large Language Models
    local dataset_base_args = {
      class_path: 'HFChatDataset',
      init_args: {
        path: 'mbpp',
        subset: 'sanitized',
        input_template: std.stripChars(|||
          Generate a Python function that satisfies the following question and test cases.
          ## Question
          {{ prompt }}
          ## Test cases
          ```python
          {{ test_list | join('\n') }}
          ```
        |||, '\n'),
      },
    };
    
    {
      class_path: 'ChatResponse',
      init_args: {
        eval_dataset: dataset_base_args { init_args+: { split: 'test', reference_list_template: '{{ test_list | join("\n") }}' } },
        few_shot_generator: {
          class_path: 'RandomFewShotGenerator',
          init_args: {
            dataset: dataset_base_args { init_args+: { split: 'prompt', reference_template: '```python\n{{ code }}\n```' } },
            num_shots: 3,
          },
        },
        metrics: [
          { class_path: 'CodeEval', init_args: { lm_output_processor: { class_path: 'RegexExtractor', init_args: { pattern: '```python\n(.*?)\n```' } } } },
        ],
        gen_kwargs: { max_new_tokens: 512 },
        batch_size: 4,
      },
    }
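
The `pattern` passed to `RegexExtractor` above pulls the generated function out of the chat response before the test cases are run. The following is a minimal sketch of that step using only Python's `re` module, not the library's actual implementation. It assumes the pattern is compiled so that `.` also matches newlines (`re.DOTALL`) and that the first match is used; neither detail is stated in the config. `extract_code`, the sample response, and the sample test cases are hypothetical names and data made up for illustration.

```python
import re

# Build the fence marker programmatically so this example can sit inside a fenced block.
FENCE = "`" * 3

# Same pattern as in the config: capture the text between the opening
# "python" fence and the closing fence.
# Assumption: DOTALL is needed so that "." spans multi-line code.
PATTERN = re.compile(FENCE + r"python\n(.*?)\n" + FENCE, re.DOTALL)


def extract_code(response: str) -> str:
    """Return the first fenced Python block in `response`, or "" if none is found."""
    match = PATTERN.search(response)
    return match.group(1) if match else ""


# Hypothetical chat response that wraps the answer in a fenced block,
# as the few-shot reference_template encourages.
response = (
    "Here is a solution.\n"
    + FENCE + "python\n"
    + "def add(a, b):\n"
    + "    return a + b\n"
    + FENCE + "\n"
    + "It adds the two arguments."
)

code = extract_code(response)

# Hypothetical test cases, joined with newlines like reference_list_template does.
test_list = ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"]

# Run the extracted code followed by the test cases; an AssertionError means failure.
namespace = {}
exec(code + "\n" + "\n".join(test_list), namespace)
```

A response without a fenced block yields an empty string in this sketch, which would then fail the test cases. Because the group is non-greedy, only the first complete block is captured when a response contains several.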