ChatDataset
ChatDataset ¶
A dataset holding ChatInstance.
Source code in flexeval/core/chat_dataset/base.py
__len__
abstractmethod
¶
__len__() -> int
Returns the number of chat instances in the dataset.
Source code in flexeval/core/chat_dataset/base.py
__getitem__
abstractmethod
¶
__getitem__(i: int) -> ChatInstance
Returns the i-th chat instance.
Source code in flexeval/core/chat_dataset/base.py
__repr__ ¶
__repr__() -> str
Source code in flexeval/core/chat_dataset/base.py
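A minimal sketch of how a concrete dataset could implement the two abstract methods. To keep the snippet runnable on its own, `ChatInstance` and `ChatDataset` are redefined here as simplified stand-ins for the documented classes; in real code they would be imported from flexeval. `InMemoryChatDataset` is a hypothetical name used only for illustration.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


# Simplified stand-ins mirroring the documented interface (assumption: real
# code would import ChatInstance and ChatDataset from flexeval instead).
@dataclass
class ChatInstance:
    messages: list[dict[str, Any]]
    references: list[str] = field(default_factory=list)


class ChatDataset(ABC):
    @abstractmethod
    def __len__(self) -> int:
        """Returns the number of chat instances in the dataset."""

    @abstractmethod
    def __getitem__(self, i: int) -> ChatInstance:
        """Returns the i-th chat instance."""


class InMemoryChatDataset(ChatDataset):
    """Hypothetical dataset that wraps a list of user prompts."""

    def __init__(self, prompts: list[str]) -> None:
        self._prompts = prompts

    def __len__(self) -> int:
        return len(self._prompts)

    def __getitem__(self, i: int) -> ChatInstance:
        # Each prompt becomes a single-turn chat with one user message.
        return ChatInstance(messages=[{"role": "user", "content": self._prompts[i]}])


dataset = InMemoryChatDataset(["Hello!", "What is 2 + 2?"])
first = dataset[0]
```

Because both methods are abstract, a subclass that omits either one cannot be instantiated.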
ChatInstance
dataclass
¶
A dataclass representing a single chat that will be fed to a chat language model.
Source code in flexeval/core/chat_dataset/base.py
messages
instance-attribute
¶
messages: list[dict[str, Any]]
A list of messages in the chat. The format of messages typically follows OpenAI's Chat Completions API.
[
{
"role": "assistant",
"content": "Hello! How can I help you today?"
},
{
"role": "user",
"content": "I'd like to book a flight to Paris."
}
]
Tool-calling messages must follow the same format as the OpenAI Chat Completions API: https://platform.openai.com/docs/guides/function-calling?api-mode=chat#defining-functions
{
"role": "assistant",
"content": "content", # `None` is also allowed if `tool_calls` exists.
"tool_calls": [
{
"id": "dummy1",
"function": {
"name": "search_web",
"arguments": '{"query": "flexeval developer"}' # Note that this is a JSON string, not a dictionary.
}
}
]
}
The results from tools should be represented as messages with the role "tool":
```
{
    "role": "tool",
    "tool_call_id": "dummy1", # Optional; models on OpenAI APIs require this field.
    "name": "search_web", # Optional; some Hugging Face models require this field.
    "content": '[{"title": "sbintuitions/flexeval: Flexible evaluation tool...", "description": "..."}]',
}
```
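As a small sketch of the message shapes described above, the snippet below builds an assistant tool-call message and the matching tool-result message with the standard `json` module. The tool output is placeholder data invented for the example; only the field names come from the text above.

```python
import json

# Assistant turn that requests a tool call. "arguments" is produced with
# json.dumps because the format expects a JSON string, not a dictionary.
assistant_message = {
    "role": "assistant",
    "content": None,  # allowed because "tool_calls" is present
    "tool_calls": [
        {
            "id": "dummy1",
            "function": {
                "name": "search_web",
                "arguments": json.dumps({"query": "flexeval developer"}),
            },
        }
    ],
}

call = assistant_message["tool_calls"][0]

# Placeholder tool output, used only for illustration.
tool_output = [{"title": "example title", "description": "example description"}]

# Tool result message: it points back at the call via "tool_call_id".
tool_message = {
    "role": "tool",
    "tool_call_id": call["id"],
    "name": call["function"]["name"],
    "content": json.dumps(tool_output),
}

# Consumers decode the arguments string to get the dictionary back.
parsed_arguments = json.loads(call["function"]["arguments"])
```

Serializing with `json.dumps` avoids the quoting mistakes that are easy to make when the JSON string is written by hand.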
tools
class-attribute
instance-attribute
¶
tools: list[dict[str, Any]] | None = None
A list of definitions of tools in the chat. The format of tools typically follows OpenAI's Chat Completions API. Currently, only function calling (tools with type="function") is supported.
references
class-attribute
instance-attribute
¶
references: list[str] = field(default_factory=list)
A list of reference responses to the user's last message. The model's response will be evaluated against these references.
extra_info
class-attribute
instance-attribute
¶
extra_info: dict[str, Any] = field(default_factory=dict)
Extra information that can be passed to a Metric.
inputs
property
¶
inputs: list[dict[str, str]]
Alias for messages. This is used in FewShotGenerator so that it can access the inputs with the same attribute name as GenerationInstance and MultipleChoiceInstance.
__init__ ¶
__init__(
messages: list[dict[str, Any]],
tools: list[dict[str, Any]] | None = None,
references: list[str] = list(),
extra_info: dict[str, Any] = dict(),
) -> None
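The following sketch mirrors the documented fields in a stand-in dataclass to show two behaviours: `default_factory` gives every instance its own `references` list and `extra_info` dict, and `inputs` is a read-only alias for `messages`. `ChatInstanceSketch` is a hypothetical name; it is not the flexeval class and omits `__post_init__`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ChatInstanceSketch:
    """Stand-in mirroring the documented fields; not the flexeval class itself."""

    messages: list[dict[str, Any]]
    tools: list[dict[str, Any]] | None = None
    references: list[str] = field(default_factory=list)
    extra_info: dict[str, Any] = field(default_factory=dict)

    @property
    def inputs(self) -> list[dict[str, Any]]:
        # Alias so callers can read `inputs` regardless of the instance type.
        return self.messages


a = ChatInstanceSketch(messages=[{"role": "user", "content": "Hi"}])
b = ChatInstanceSketch(messages=[{"role": "user", "content": "Bye"}])

# Mutating one instance's references does not leak into the other, because
# default_factory creates a fresh list per instance.
a.references.append("Hello!")
```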
__post_init__ ¶
__post_init__() -> None
Source code in flexeval/core/chat_dataset/base.py
HFChatDataset ¶
Load ChatInstances from a Hugging Face dataset.
Parameters:
- path (str) – The path to the Hugging Face dataset.
- split (str) – The split of the dataset.
- input_template (str) – A Jinja2 template for the user input.
- subset (str | None, default: None) – The subset of the dataset.
- dataset_kwargs (dict[str, Any] | None, default: None) – The keyword arguments to pass to the Hugging Face dataset.
Source code in flexeval/core/chat_dataset/template_based.py
__init__ ¶
__init__(
path: str,
split: str,
input_template: str,
subset: str | None = None,
dataset_kwargs: dict[str, Any] | None = None,
reference_template: str | None = None,
reference_list_template: str | None = None,
extra_info_templates: dict[str, str] | None = None,
system_message_template: str | None = None,
tools: list[dict[str, Any]] | None = None,
data_range: tuple[int, int] | None = None,
keep_conditions: dict[str, str] | None = None,
remove_conditions: dict[str, str] | None = None,
) -> None
Source code in flexeval/core/chat_dataset/template_based.py
JsonlChatDataset ¶
Load ChatInstances from a JSONL file.
Parameters:
- path (str) – The path to the JSONL file.
Source code in flexeval/core/chat_dataset/template_based.py
__init__ ¶
__init__(
path: str,
input_template: str,
reference_template: str | None = None,
reference_list_template: str | None = None,
extra_info_templates: dict[str, str] | None = None,
system_message_template: str | None = None,
tools: list[dict[str, Any]] | None = None,
data_range: tuple[int, int] | None = None,
keep_conditions: dict[str, str] | None = None,
remove_conditions: dict[str, str] | None = None,
) -> None
Source code in flexeval/core/chat_dataset/template_based.py
TemplateChatDataset ¶
This class only supports single-turn chat.
Parameters:
- items (list[dict[str, Any]]) – A list of items in a dict format. The "tools" key for each item can contain the list of function definitions. They should be in JSON Schema format as in the OpenAI Chat Completion API. https://platform.openai.com/docs/guides/function-calling?api-mode=chat#defining-functions
- input_template (str) – A Jinja2 template for the user input.
- reference_template (str | None, default: None) – Specify the Jinja2 template to render the reference string if the dataset has a single reference.
- reference_list_template (str | None, default: None) – Specify the Jinja2 template to render a list of reference strings if the dataset has multiple references.
- extra_info_templates (dict[str, str] | None, default: None) – A dictionary of Jinja2 templates for extra information.
- system_message_template (str | None, default: None) – A Jinja2 template for the system message.
- tools (list[dict[str, Any]] | None, default: None) – Default tools to use for all chat instances. Individual items can override this by including their own "tools" key. Typically in JSON Schema format as in the OpenAI Chat Completion API for function calling.
- data_range (tuple[int, int] | None, default: None) – The range of data to use.
- keep_conditions (dict[str, str] | None, default: None) – A dictionary to indicate the condition to filter certain items. The key is a Jinja2 template string to embed the item into a string, and the value is the value to keep.
- remove_conditions (dict[str, str] | None, default: None) – A dictionary to indicate the condition to remove certain items. The key is a Jinja2 template string to embed the item into a string, and the value is the value to remove.
Source code in flexeval/core/chat_dataset/template_based.py
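One way to read the keep_conditions and remove_conditions parameters is sketched below: each key is rendered against an item, and the rendered string is compared with the value. This is an illustration under stated assumptions, not the library's implementation. `render` is a deliberately tiny stand-in for Jinja2 that only handles `{{ name }}` placeholders, and `apply_conditions` is a hypothetical helper.

```python
from __future__ import annotations

import re
from typing import Any


def render(template: str, item: dict[str, Any]) -> str:
    """Tiny stand-in for Jinja2: only substitutes `{{ name }}` placeholders."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(item[m.group(1)]), template)


def apply_conditions(
    items: list[dict[str, Any]],
    keep_conditions: dict[str, str] | None = None,
    remove_conditions: dict[str, str] | None = None,
) -> list[dict[str, Any]]:
    # Keep only the items whose rendered key equals the value to keep.
    for template, value in (keep_conditions or {}).items():
        items = [item for item in items if render(template, item) == value]
    # Drop the items whose rendered key equals the value to remove.
    for template, value in (remove_conditions or {}).items():
        items = [item for item in items if render(template, item) != value]
    return items


items = [
    {"question": "1 + 1 = ?", "category": "math", "lang": "en"},
    {"question": "Write a haiku.", "category": "writing", "lang": "en"},
    {"question": "2 * 3 = ?", "category": "math", "lang": "ja"},
]

kept = apply_conditions(
    items,
    keep_conditions={"{{ category }}": "math"},
    remove_conditions={"{{ lang }}": "ja"},
)
```

Here only the first item survives: the second fails the keep condition and the third matches the remove condition.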
reference_template
instance-attribute
¶
reference_template = (
from_string(reference_template)
if reference_template
else None
)
reference_list_template
instance-attribute
¶
reference_list_template = (
from_string(reference_list_template)
if reference_list_template
else None
)
__init__ ¶
__init__(
items: list[dict[str, Any]],
input_template: str,
reference_template: str | None = None,
reference_list_template: str | None = None,
extra_info_templates: dict[str, str] | None = None,
system_message_template: str | None = None,
tools: list[dict[str, Any]] | None = None,
data_range: tuple[int, int] | None = None,
keep_conditions: dict[str, str] | None = None,
remove_conditions: dict[str, str] | None = None,
) -> None
Source code in flexeval/core/chat_dataset/template_based.py
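To make the role of the templates more concrete, here is a rough sketch of how a single item might be turned into a single-turn chat. It is an assumption-laden illustration rather than the actual `__init__`/`__getitem__` logic: `render` is a minimal stand-in for Jinja2 that only handles `{{ name }}` placeholders, `build_instance` is a hypothetical helper, and placing the system message before the user message is assumed.

```python
from __future__ import annotations

import re
from typing import Any


def render(template: str, item: dict[str, Any]) -> str:
    """Minimal stand-in for Jinja2: only substitutes `{{ name }}` placeholders."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(item[m.group(1)]), template)


def build_instance(
    item: dict[str, Any],
    input_template: str,
    reference_template: str | None = None,
    system_message_template: str | None = None,
) -> dict[str, Any]:
    messages = []
    # Assumption: the optional system message comes first.
    if system_message_template:
        messages.append({"role": "system", "content": render(system_message_template, item)})
    # The user input is rendered from the item with input_template.
    messages.append({"role": "user", "content": render(input_template, item)})
    # A single reference string is rendered only when a template is given.
    references = [render(reference_template, item)] if reference_template else []
    return {"messages": messages, "references": references}


item = {"question": "What is the capital of France?", "answer": "Paris"}
instance = build_instance(
    item,
    input_template="Question: {{ question }}",
    reference_template="{{ answer }}",
    system_message_template="Answer briefly.",
)
```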
__len__ ¶
__len__() -> int
Source code in flexeval/core/chat_dataset/template_based.py
__getitem__ ¶
__getitem__(i: int) -> ChatInstance
Source code in flexeval/core/chat_dataset/template_based.py
ChatbotBench ¶
This class loads data with the jsonl format used in chat evaluation benchmarks such as MT-Bench (Multi-turn Benchmark) or Vicuna QA Benchmark.
Example of a line from a jsonl file (expanded over several lines for readability; the # comments are annotations, not part of the data):
{
    "question_id": 00,
    "category": "writing",
    "turns": [
        "Compose an engaging travel blog post about a recent trip to Hawaii.",
        "Rewrite your previous response. Start every sentence with the letter A."
    ],
    # 'tools' key is optional.
    # It should be in the same format as FunctionCalling in the OpenAI ChatCompletion API.
    # https://platform.openai.com/docs/guides/function-calling?api-mode=chat#defining-functions
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current temperature for a given location.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City and country e.g. Bogotá, Colombia"},
                    },
                    "required": ["location"],
                    "additionalProperties": False
                },
                "strict": True
            },
        },
    ],
    # 'system_message' key is optional.
    # If set, it will be inserted in the first turn as a system prompt.
    "system_message": "You are a helpful assistant."
}
Source code in flexeval/core/chat_dataset/chatbot_bench.py
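As a rough illustration of the record format described above, the snippet below parses one such line with the standard `json` module and assembles the messages for the first turn. `first_turn_messages` is a hypothetical helper, and the way the system message is prepended is an assumption based on the comment in the example, not the class's actual code.

```python
from __future__ import annotations

import json
from typing import Any

# One JSONL line in the benchmark format (valid JSON, so no comments).
line = json.dumps(
    {
        "question_id": 0,
        "category": "writing",
        "turns": [
            "Compose an engaging travel blog post about a recent trip to Hawaii.",
            "Rewrite your previous response. Start every sentence with the letter A.",
        ],
        "system_message": "You are a helpful assistant.",
    }
)


def first_turn_messages(record: dict[str, Any]) -> list[dict[str, str]]:
    messages = []
    # 'system_message' is optional; when present it opens the conversation.
    if record.get("system_message"):
        messages.append({"role": "system", "content": record["system_message"]})
    # The first element of 'turns' is the first user utterance.
    messages.append({"role": "user", "content": record["turns"][0]})
    return messages


record = json.loads(line)
messages = first_turn_messages(record)
```

Later turns would be appended only after the model has answered the earlier ones, which is why the record stores user turns alone.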
need_ref_categories
instance-attribute
¶
need_ref_categories = need_ref_categories or [
"math",
"coding",
"reasoning",
]
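A short sketch of what the `or` fallback shown above implies in plain Python: both `None` and an empty list select the default categories, since an empty list is falsy. `resolve_need_ref_categories` is a hypothetical helper that simply wraps the same expression.

```python
from __future__ import annotations


def resolve_need_ref_categories(need_ref_categories: list[str] | None = None) -> list[str]:
    # Same expression as the attribute assignment shown above.
    return need_ref_categories or ["math", "coding", "reasoning"]


from_none = resolve_need_ref_categories(None)
from_empty = resolve_need_ref_categories([])  # an empty list is falsy
from_custom = resolve_need_ref_categories(["math"])
```

In other words, passing an empty list does not disable reference answers under this expression; it falls back to the defaults.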
__init__ ¶
__init__(
path_or_name: str,
ref_path_or_name: str | None = None,
need_ref_categories: list[str] | None = None,
load_only_first_n: int | None = None,
) -> None
Source code in flexeval/core/chat_dataset/chatbot_bench.py
__len__ ¶
__len__() -> int
Source code in flexeval/core/chat_dataset/chatbot_bench.py
__getitem__ ¶
__getitem__(i: int) -> ChatInstance
Source code in flexeval/core/chat_dataset/chatbot_bench.py
OpenAIMessagesDataset ¶
This class loads data in an OpenAI-like format from jsonl files. The difference from the OpenAI format is that each record can have a tool-definitions field (see tool_definitions_key), in which the available tools are listed.
Tool calling (function calling) is supported in this class. Tool calls must follow the same format as the OpenAI ChatCompletion API. https://platform.openai.com/docs/guides/function-calling?api-mode=chat#defining-functions
Parameters:
- file_path (str | list[str] | None, default: None) – Path or list of paths to .jsonl file(s).
- message_key (str, default: 'messages') – Key used to extract the list of messages from each JSON object.
- tool_definitions_key (str | None, default: None) – Key used to extract the list of tool definitions from each JSON object. Set to None (default) for data without tool_calls.
- drop_if_last_from_assistant (bool, default: False) – If true, drop the last utterance when it is given by the assistant.
Each line of the jsonl file must have the following structure:
{
'<message_key>': [
{
'role': 'user',
'content': 'Hello. Could you tell me something encouraging?'
},
{
'role': 'assistant',
'content': 'How about this: keep going and give it your all!'
},
],
}
Example with tool-calling:
{
'<message_key>': [
{
'role': 'user',
'content': 'Hello. Could you tell me an encouraging quote from a great historical figure?'
},
{
'role': 'assistant',
'content': 'Let me look that up.',
'tool_calls': [
{
'id': 'dummy1',
'function': {
'name': 'web_search',
'arguments': '{"query": "encouraging quotes great historical figures"}',
}
}
]
}
],
'<tool_definitions_key>': [
{
"type": "function",
"function": {
"name": "web_search",
...
}
}
]
}
Source code in flexeval/core/chat_dataset/openai_messages.py
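The record layout and the drop_if_last_from_assistant option described above can be sketched with the standard library alone. `load_messages` is a hypothetical stand-in written for this example, not the class's implementation, and the key name "tool_definitions" is chosen here only for illustration.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any


def load_messages(
    file_path: str | Path,
    message_key: str = "messages",
    tool_definitions_key: str | None = None,
    drop_if_last_from_assistant: bool = False,
) -> list[dict[str, Any]]:
    instances = []
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            messages = record[message_key]
            # Optionally drop a trailing assistant utterance.
            if drop_if_last_from_assistant and messages and messages[-1]["role"] == "assistant":
                messages = messages[:-1]
            # Tool definitions are read only when a key is configured.
            tools = record.get(tool_definitions_key) if tool_definitions_key else None
            instances.append({"messages": messages, "tools": tools})
    return instances


record = {
    "messages": [
        {"role": "user", "content": "Hello. Could you tell me something encouraging?"},
        {"role": "assistant", "content": "How about this: keep going!"},
    ],
    "tool_definitions": [{"type": "function", "function": {"name": "web_search"}}],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "chat.jsonl"
    path.write_text(json.dumps(record) + "\n", encoding="utf-8")
    loaded = load_messages(
        path,
        tool_definitions_key="tool_definitions",
        drop_if_last_from_assistant=True,
    )
    loaded_as_is = load_messages(path)
```

With the option enabled, the conversation ends on the user turn, so the model under evaluation produces the assistant reply itself.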
__init__ ¶
__init__(
file_path: str | None = None,
message_key: str = "messages",
tool_definitions_key: str | None = None,
drop_if_last_from_assistant: bool = False,
) -> None
Source code in flexeval/core/chat_dataset/openai_messages.py
__len__ ¶
__len__() -> int
Source code in flexeval/core/chat_dataset/openai_messages.py
__getitem__ ¶
__getitem__(idx: int) -> ChatInstance
Source code in flexeval/core/chat_dataset/openai_messages.py
SacreBleuChatDataset ¶
Load datasets from the sacrebleu library. The available datasets are defined in sacrebleu.DATASETS.
Source code in flexeval/core/chat_dataset/sacrebleu_dataset.py
__init__ ¶
__init__(name: str, langpair: str) -> None
Source code in flexeval/core/chat_dataset/sacrebleu_dataset.py
__len__ ¶
__len__() -> int
Source code in flexeval/core/chat_dataset/sacrebleu_dataset.py
__getitem__ ¶
__getitem__(i: int) -> ChatInstance
Source code in flexeval/core/chat_dataset/sacrebleu_dataset.py