mt-en

MT-Bench (Multi-Turn Benchmark) for large language models; this preset uses the English question set with GPT-4 reference answers.

References:

  • Data Source
  • Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

    {
      class_path: 'ChatResponse',
      init_args: {
        eval_dataset: {
          class_path: 'ChatbotBench',
          init_args: {
            path_or_name: 'mt-en',
            ref_path_or_name: 'mt-en-ref-gpt4',
          },
        },
        metrics: [
          { class_path: 'OutputLengthStats' },
        ],
        gen_kwargs: { max_new_tokens: 1024 },
        batch_size: 4,
      },
    }
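
Because the preset is plain Jsonnet, individual fields can be overridden without copying the whole block. The sketch below reuses the mt-en setup with a smaller generation budget; the import path 'mt-en.jsonnet' is an assumption and should be adjusted to wherever the preset file actually lives.

    // A minimal sketch, assuming the preset is available as a local Jsonnet file.
    // The path 'mt-en.jsonnet' is hypothetical; point it at the real preset location.
    local mt_en = import 'mt-en.jsonnet';

    mt_en {
      init_args+: {
        // Deep-merge: keep eval_dataset and metrics, change only generation settings.
        gen_kwargs+: { max_new_tokens: 512 },
        batch_size: 2,
      },
    }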
    

vicuna-en

Vicuna Benchmark for large language models; this preset uses the English question set with GPT-4 reference answers.

References:

  • Data Source

    {
      class_path: 'ChatResponse',
      init_args: {
        eval_dataset: {
          class_path: 'ChatbotBench',
          init_args: {
            path_or_name: 'vicuna-en',
            ref_path_or_name: 'vicuna-en-ref-gpt4',
          },
        },
        metrics: [
          { class_path: 'OutputLengthStats' },
        ],
        gen_kwargs: { max_new_tokens: 1024 },
        batch_size: 4,
      },
    }
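
The mt-en and vicuna-en presets differ only in the dataset and reference names, so a shared Jsonnet function can generate both. The helper name 'chatbotSetup' below is hypothetical, introduced here for illustration rather than defined by the library.

    // A minimal sketch; 'chatbotSetup' is a hypothetical helper, not a library symbol.
    local chatbotSetup(name, ref) = {
      class_path: 'ChatResponse',
      init_args: {
        eval_dataset: {
          class_path: 'ChatbotBench',
          init_args: { path_or_name: name, ref_path_or_name: ref },
        },
        metrics: [
          { class_path: 'OutputLengthStats' },
        ],
        gen_kwargs: { max_new_tokens: 1024 },
        batch_size: 4,
      },
    };

    {
      'mt-en': chatbotSetup('mt-en', 'mt-en-ref-gpt4'),
      'vicuna-en': chatbotSetup('vicuna-en', 'vicuna-en-ref-gpt4'),
    }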