Design Principle¶

flexeval is designed according to the following principles:

Flexibility: flexeval should be flexible in terms of the evaluation setup and the language model to be evaluated.
Modularity: The core components of flexeval should be easily extensible and replaceable.
Clarity: The results of evaluation should be clear and easy to understand its configuration.
Reproducibility: flexeval should be reproducible, with the ability to save and load configurations and results.

To achieve flexibility and modularity, the core logic is implemented with abstract interfaces, and the concrete implementations are provided when running each CLI command.

Thanks to jsonargparse, we can transparently specify the configuration of every component either via CLI arguments or jsonnet config files. Thus, when you want to use your own module, all you have to do is implement a concrete class inheriting the right interface and specify it in the configuration, without modifying the existing code.

To achieve clarity and reproducibility, flexeval saves the configuration and the evaluation results in a directory specified by --save_dir. The resulting config.json file contains everything needed to replicate the evaluation, configuration of all modules, the version of flexeval and the installed packages.

It is often a case that a small preprocessing in the data affects the evaluation results significantly. We would like to the config file tells us what preprocessing is done without we need to dig into the code. Thus we recommend loading datasets using a generic class such as HFGenerationDataset or JsonlGenerationDataset and specifying a preprocessing using their parameters or Jinja2 templates in the configuration file.