PairwiseJudge
PairwiseJudge ¶
Judge which model is better given two items.
The output is a tuple of the winner and the rationale.
Source code in flexeval/core/pairwise_comparison/judge/base.py
22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
|
batch_judge
abstractmethod
¶
batch_judge(
batch_model_items: list[
tuple[dict[str, Any], dict[str, Any]]
],
) -> list[tuple[Winner, str]]
Judge which model is better given a batch of item pairs.
Parameters:
-
batch_model_items
(list[tuple[dict[str, Any], dict[str, Any]]]
) –A list of tuples, each containing two model items.
Source code in flexeval/core/pairwise_comparison/judge/base.py
28 29 30 31 32 33 34 35 36 37 38 |
|
Winner ¶
Enum class to indicate the winner of a pairwise comparison.
Source code in flexeval/core/pairwise_comparison/judge/base.py
8 9 10 11 12 13 14 15 16 17 18 19 |
|
__str__ ¶
__str__() -> str
Source code in flexeval/core/pairwise_comparison/judge/base.py
17 18 19 |
|
ChatLLMPairwiseJudge ¶
Pairwise judge using a chat language model to compare two model outputs.
Parameters:
-
language_model
(LanguageModel
) –The language model to use for pairwise comparison.
-
prompt_template
(PromptTemplate
) –The prompt template to embed the model outputs to be compared.
-
system_message
(str | PromptTemplate | None
, default:None
) –The system message to prepend to the chat messages.
Source code in flexeval/core/pairwise_comparison/judge/llm_judge.py
14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 |
|
__init__ ¶
__init__(
language_model: LanguageModel,
prompt_template: PromptTemplate,
system_message: str | PromptTemplate | None = None,
) -> None
Source code in flexeval/core/pairwise_comparison/judge/llm_judge.py
24 25 26 27 28 29 30 31 32 |
|
batch_judge ¶
batch_judge(
batch_model_items: list[
tuple[dict[str, Any], dict[str, Any]]
],
) -> list[tuple[Winner, str]]
Source code in flexeval/core/pairwise_comparison/judge/llm_judge.py
62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 |
|