How to

Configure Few-Shot Examples
Evaluate with LLM Judges
Implement Your Own Module
Evaluate Multimodal Benchmarks