RewardModel ¶
Base class for reward models.
Source code in flexeval/core/reward_model/base.py
batch_judge abstractmethod ¶
batch_judge(
batch_reward_bench_instances: list[RewardBenchInstance],
) -> tuple[list[bool], list[dict[str, Any]]]
Judge a batch of reward bench instances.
Parameters:
- batch_reward_bench_instances (list[RewardBenchInstance]) – A list of reward bench instances, each pairing a chosen and a rejected item.
Returns:
- tuple[list[bool], list[dict[str, Any]]] – A tuple with the following elements:
  - chosen_is_betters: Whether the model judges each chosen item to be better than its rejected counterpart.
  - judge_outputs: A list of auxiliary outputs from the model (rationale, score, etc.).
Source code in flexeval/core/reward_model/base.py
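To make the contract concrete, below is a minimal sketch of a custom subclass. The `RewardModel` import path follows the source location above; the `RewardBenchInstance` import path and field layout (chat-style `chosen`/`rejected` message lists) are assumptions, and the length heuristic is purely illustrative.

```python
from typing import Any

from flexeval.core.reward_model.base import RewardModel
# Import path assumed; RewardBenchInstance is the type referenced in the signature above.
from flexeval import RewardBenchInstance


class LongerIsBetterRewardModel(RewardModel):
    """Toy judge that prefers the longer response (illustration only)."""

    def batch_judge(
        self,
        batch_reward_bench_instances: list[RewardBenchInstance],
    ) -> tuple[list[bool], list[dict[str, Any]]]:
        chosen_is_betters: list[bool] = []
        judge_outputs: list[dict[str, Any]] = []
        for instance in batch_reward_bench_instances:
            # Assumes chosen/rejected are chat message lists whose last
            # turn holds the response text.
            chosen_text = instance.chosen[-1]["content"]
            rejected_text = instance.rejected[-1]["content"]
            chosen_is_betters.append(len(chosen_text) >= len(rejected_text))
            judge_outputs.append(
                {"chosen_len": len(chosen_text), "rejected_len": len(rejected_text)}
            )
        return chosen_is_betters, judge_outputs
```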
LogProbRewardModel ¶
A reward model that judges the quality of a response by the log probability an autoregressive language model assigns to it.
Source code in flexeval/core/reward_model/log_prob.py
__init__ ¶
__init__(language_model: LanguageModel) -> None
Source code in flexeval/core/reward_model/log_prob.py
batch_judge ¶
batch_judge(
batch_reward_bench_instances: list[RewardBenchInstance],
) -> tuple[list[bool], list[dict[str, Any]]]
Source code in flexeval/core/reward_model/log_prob.py
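The judging rule itself reduces to comparing two log probabilities: the chosen response wins when the wrapped LanguageModel assigns it the higher log probability. A minimal sketch of just that decision rule (the helper name is hypothetical; the actual class computes the log probabilities internally):

```python
def judge_by_log_prob(
    chosen_log_probs: list[float],
    rejected_log_probs: list[float],
) -> tuple[list[bool], list[dict[str, float]]]:
    # The chosen response is judged better iff the language model assigns
    # it a higher log probability than the rejected response.
    chosen_is_betters = [
        chosen > rejected
        for chosen, rejected in zip(chosen_log_probs, rejected_log_probs)
    ]
    judge_outputs = [
        {"chosen_log_prob": chosen, "rejected_log_prob": rejected}
        for chosen, rejected in zip(chosen_log_probs, rejected_log_probs)
    ]
    return chosen_is_betters, judge_outputs
```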
PairwiseJudgeRewardModel ¶
Pairwise judge using a chat language model to compare two model or human outputs. The reward model’s judgment is counted as correct only if it is order‑invariant: when given (A = chosen, B = rejected) it prefers A, and when the inputs are swapped (A = rejected, B = chosen) it prefers B.
Examples:
- ✅ Correct (order‑invariant):
- judge(prompt, A=chosen, B=rejected) → A
- judge(prompt, A=rejected, B=chosen) → B
- ❌ Incorrect (position bias; same answer regardless of order):
- judge(prompt, A=chosen, B=rejected) → A
- judge(prompt, A=rejected, B=chosen) → A
- ❌ Incorrect (both wrong):
- judge(prompt, A=chosen, B=rejected) → B
- judge(prompt, A=rejected, B=chosen) → A
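This correctness criterion can be written as a small predicate. A hedged sketch follows, using a stand-in PairwiseChoice enum (the real enum is defined in flexeval; its exact members and values are not shown here):

```python
from enum import Enum


class PairwiseChoice(Enum):  # stand-in; the real enum lives in flexeval
    A = "A"
    B = "B"


def is_order_invariant_correct(
    choice_original: PairwiseChoice,  # judge(prompt, A=chosen, B=rejected)
    choice_swapped: PairwiseChoice,   # judge(prompt, A=rejected, B=chosen)
) -> bool:
    # Correct only if the chosen answer wins under BOTH orderings,
    # which rules out position bias.
    return (
        choice_original == PairwiseChoice.A
        and choice_swapped == PairwiseChoice.B
    )
```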
Parameters:
- language_model (LanguageModel) – The language model to use for pairwise comparison. This model is expected to output PairwiseChoice.
- prompt_template (PromptTemplate) – The prompt template used to embed the model outputs to be compared. Be sure to include {{prompt}}, {{answer_a}}, and {{answer_b}}.
- system_message (str | PromptTemplate | None, default: None) – The system message to prepend to the chat messages.
- gen_kwargs (dict[str, Any] | None, default: None) – Generation kwargs for the language model.
Source code in flexeval/core/reward_model/pairwise_judge_reward_model.py
__init__ ¶
__init__(
language_model: LanguageModel,
prompt_template: PromptTemplate,
system_message: str | PromptTemplate | None = None,
gen_kwargs: dict[str, Any] | None = None,
) -> None
Source code in flexeval/core/reward_model/pairwise_judge_reward_model.py
batch_judge ¶
batch_judge(
batch_reward_bench_instances: list[RewardBenchInstance],
) -> tuple[list[bool], list[dict[str, Any]]]
Source code in flexeval/core/reward_model/pairwise_judge_reward_model.py
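A hedged usage sketch follows. The concrete LanguageModel and PromptTemplate implementations (`OpenAIChatAPI`, `Jinja2PromptTemplate`), the judge model name, and the prompt wording are assumptions; substitute whatever implementations your installation provides.

```python
from flexeval import OpenAIChatAPI, Jinja2PromptTemplate  # names assumed
from flexeval.core.reward_model.pairwise_judge_reward_model import (
    PairwiseJudgeRewardModel,
)

# The template must expose {{prompt}}, {{answer_a}}, and {{answer_b}}.
template = Jinja2PromptTemplate(
    template=(
        "Which answer to the prompt is better?\n"
        "Prompt: {{ prompt }}\n"
        "Answer A: {{ answer_a }}\n"
        "Answer B: {{ answer_b }}\n"
        'Answer with "A" or "B" only.'
    )
)

reward_model = PairwiseJudgeRewardModel(
    language_model=OpenAIChatAPI(model="gpt-4o-mini"),  # model name is an example
    prompt_template=template,
    system_message="You are a strict and fair judge.",
    gen_kwargs={"temperature": 0.0},
)

# instances: list[RewardBenchInstance]
chosen_is_betters, judge_outputs = reward_model.batch_judge(instances)
```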
SequenceClassificationRewardModel ¶
A reward model that scores each response with a sequence classification model; the chosen response is judged better when it receives a higher score than the rejected one.
Source code in flexeval/core/reward_model/sequence_classification.py
__init__ ¶
__init__(
model: str,
model_kwargs: dict[str, Any] | None = None,
tokenizer: str | None = None,
tokenizer_kwargs: dict[str, Any] | None = None,
) -> None
Source code in flexeval/core/reward_model/sequence_classification.py
batch_judge ¶
batch_judge(
batch_reward_bench_instances: list[RewardBenchInstance],
) -> tuple[list[bool], list[dict[str, Any]]]
Source code in flexeval/core/reward_model/sequence_classification.py
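A hedged usage sketch, assuming `model` names a Hugging Face sequence classification checkpoint loaded via `from_pretrained` (the checkpoint and kwargs below are examples only):

```python
from flexeval.core.reward_model.sequence_classification import (
    SequenceClassificationRewardModel,
)

reward_model = SequenceClassificationRewardModel(
    model="OpenAssistant/reward-model-deberta-v3-large-v2",  # example checkpoint
    model_kwargs={"torch_dtype": "bfloat16"},  # assumed to be forwarded to from_pretrained
)

# instances: list[RewardBenchInstance]
chosen_is_betters, judge_outputs = reward_model.batch_judge(instances)
accuracy = sum(chosen_is_betters) / len(chosen_is_betters)
print(f"chosen preferred in {accuracy:.1%} of pairs")
```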