TextDataset
TextDataset ¶
This class represents a dataset of text examples.
Source code in flexeval/core/text_dataset/base.py
__len__ abstractmethod ¶
__len__() -> int
Source code in flexeval/core/text_dataset/base.py
__getitem__ abstractmethod ¶
__getitem__(item: int) -> TextInstance
Source code in flexeval/core/text_dataset/base.py
__repr__ ¶
__repr__() -> str
Source code in flexeval/core/text_dataset/base.py
TextInstance dataclass ¶
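The abstract interface above can be illustrated with a small stand-alone sketch. It does not import flexeval: `TextInstance` and `TextDataset` here are simplified stand-ins, the single `text` field and the `__repr__` format are assumptions, and `ListTextDataset` is a hypothetical subclass invented for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class TextInstance:
    # Assumption: a single `text` field; the real dataclass may define more.
    text: str


class TextDataset(ABC):
    """Stand-in for the abstract interface documented above."""

    @abstractmethod
    def __len__(self) -> int: ...

    @abstractmethod
    def __getitem__(self, item: int) -> TextInstance: ...

    def __repr__(self) -> str:
        # Assumption: the real __repr__ output may look different.
        return f"{self.__class__.__name__}(num_instances={len(self)})"


class ListTextDataset(TextDataset):
    """Hypothetical subclass backed by an in-memory list of strings."""

    def __init__(self, texts: list[str]) -> None:
        self._texts = list(texts)

    def __len__(self) -> int:
        return len(self._texts)

    def __getitem__(self, item: int) -> TextInstance:
        return TextInstance(text=self._texts[item])


dataset = ListTextDataset(["first example", "second example"])
print(len(dataset), dataset[0].text)
```

Because `__len__` and `__getitem__` are abstract, the base class cannot be instantiated directly; a subclass only needs to supply those two methods.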
HFTextDataset ¶
This class represents a dataset of text examples loaded from Hugging Face datasets.
Parameters:
- path (str) – The name of the dataset to load.
- split (str) – The split of the dataset to load.
- text_template (str) – A Jinja2 template for the text.
- subset (str | None, default: None) – The subset of the dataset to load.
- keep_conditions (dict[str, str] | None, default: None) – A dictionary of conditions for keeping items. Each key is a Jinja2 template string that is rendered with the item, and each value is the rendered string an item must produce to be kept.
- remove_conditions (dict[str, str] | None, default: None) – A dictionary of conditions for removing items. Each key is a Jinja2 template string that is rendered with the item, and each value is the rendered string that causes an item to be removed.
- dataset_kwargs (dict[str, Any] | None, default: None) – Additional keyword arguments for datasets.load_dataset.
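The `keep_conditions` and `remove_conditions` semantics described above can be sketched without any dependencies. This is an illustration, not the library's code: `render` is a regex stand-in for Jinja2 that only handles plain `{{ name }}` placeholders, `apply_conditions` and the sample items are hypothetical, and combining several conditions with AND is an assumption.

```python
import re


def render(template: str, item: dict) -> str:
    """Stand-in for Jinja2 rendering: only handles `{{ name }}` placeholders."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(item[m.group(1)]), template)


def apply_conditions(items, keep_conditions=None, remove_conditions=None):
    """Hypothetical helper mirroring the documented filter semantics."""
    keep_conditions = keep_conditions or {}
    remove_conditions = remove_conditions or {}
    kept = []
    for item in items:
        # Drop the item if any keep template renders to something else.
        if any(render(t, item) != v for t, v in keep_conditions.items()):
            continue
        # Drop the item if any remove template renders to the given value.
        if any(render(t, item) == v for t, v in remove_conditions.items()):
            continue
        kept.append(item)
    return kept


items = [
    {"text": "Hello", "lang": "en", "source": "web"},
    {"text": "Bonjour", "lang": "fr", "source": "web"},
    {"text": "Hi", "lang": "en", "source": "spam"},
]
filtered = apply_conditions(
    items,
    keep_conditions={"{{ lang }}": "en"},
    remove_conditions={"{{ source }}": "spam"},
)
print([item["text"] for item in filtered])
```

Since both the template output and the condition value are strings, non-string fields would need to be compared against their string form (for example `"1"` rather than `1`).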
Source code in flexeval/core/text_dataset/hf.py
dataset instance-attribute ¶
dataset = filter(lambda x, t=filter_template, v=value_to_remove: render(**x) != v)
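The `t=filter_template, v=value_to_remove` default arguments in the lambda above look like the usual Python idiom for binding loop variables at definition time. That reading is an assumption about the source; the sketch below only demonstrates the idiom itself, with hypothetical condition data.

```python
conditions = {"lang": "fr", "source": "spam"}

# Late binding: each lambda looks up `key` and `value` when it is called,
# so every lambda sees the values from the final loop iteration.
late = []
for key, value in conditions.items():
    late.append(lambda x: x[key] != value)

# Default arguments are evaluated when the lambda is defined,
# so each lambda keeps its own key/value pair.
bound = []
for key, value in conditions.items():
    bound.append(lambda x, k=key, v=value: x[k] != v)

item = {"lang": "fr", "source": "web"}
late_results = [f(item) for f in late]
bound_results = [f(item) for f in bound]
print(late_results, bound_results)
```

Only the `bound` version checks the `lang` condition for the first lambda; the `late` version silently checks `source` twice.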
__init__ ¶
__init__(
path: str,
split: str,
text_template: str,
prefix_template: str | None = None,
subset: str | None = None,
keep_conditions: dict[str, str] | None = None,
remove_conditions: dict[str, str] | None = None,
dataset_kwargs: dict[str, Any] | None = None,
) -> None
Source code in flexeval/core/text_dataset/hf.py
__len__ ¶
__len__() -> int
Source code in flexeval/core/text_dataset/hf.py
__getitem__ ¶
__getitem__(i: int) -> TextInstance
Source code in flexeval/core/text_dataset/hf.py
JsonlTextDataset ¶
This class represents a dataset of text examples loaded from a JSONL file.
Parameters:
- path (str | PathLike[str]) – The path to the JSONL file.
- field (str) – The field to extract from the JSONL file.
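A minimal sketch of what extracting one field from a JSONL file involves, assuming one JSON object per line. `read_jsonl_field` and the sample file are hypothetical and written for illustration; this is not the library's implementation.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl_field(path, field: str) -> list[str]:
    """Hypothetical helper: collect `field` from every non-empty line."""
    texts = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                texts.append(json.loads(line)[field])
    return texts


with tempfile.TemporaryDirectory() as tmp:
    # Write a two-line sample file, then read the "text" field back.
    path = Path(tmp) / "examples.jsonl"
    path.write_text(
        '{"id": 1, "text": "first example"}\n'
        '{"id": 2, "text": "second example"}\n',
        encoding="utf-8",
    )
    texts = read_jsonl_field(path, field="text")

print(texts)
```

In this sketch a line that lacks the requested field raises `KeyError`; how the real class handles that case is not stated in this reference.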
Source code in flexeval/core/text_dataset/jsonl.py
__init__ ¶
__init__(path: str | PathLike[str], field: str) -> None
Source code in flexeval/core/text_dataset/jsonl.py
__len__ ¶
__len__() -> int
Source code in flexeval/core/text_dataset/jsonl.py
__getitem__ ¶
__getitem__(item: int) -> TextInstance
Source code in flexeval/core/text_dataset/jsonl.py