Tokenizer
Tokenizer interface.
Tokenizers split text into tokens. They are typically used in a Metric that requires word-level statistics.
Source code in flexeval/core/tokenizer/base.py
tokenize (abstractmethod)
tokenize(text: str) -> list[str]
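A minimal sketch of a custom implementation, assuming Tokenizer is importable from flexeval.core.tokenizer.base as the source location above suggests; the CommaTokenizer here is hypothetical and for illustration only:

from flexeval.core.tokenizer.base import Tokenizer

class CommaTokenizer(Tokenizer):
    """Hypothetical tokenizer that splits text on commas."""

    def tokenize(self, text: str) -> list[str]:
        # Strip surrounding whitespace and drop empty fragments.
        return [t.strip() for t in text.split(",") if t.strip()]

tokenizer = CommaTokenizer()
print(tokenizer.tokenize("foo, bar, baz"))  # ['foo', 'bar', 'baz']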
MecabTokenizer
MeCab tokenizer for Japanese text.
Source code in flexeval/core/tokenizer/mecab.py
__init__
__init__() -> None
tokenize
tokenize(text: str) -> list[str]
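A usage sketch, assuming the class is importable from the module shown above and that the MeCab bindings it relies on are installed; the exact segmentation depends on the installed MeCab dictionary:

from flexeval.core.tokenizer.mecab import MecabTokenizer

tokenizer = MecabTokenizer()
# Word-level segmentation of Japanese text ("I am a cat").
tokens = tokenizer.tokenize("吾輩は猫である")
print(tokens)  # e.g. ['吾輩', 'は', '猫', 'で', 'ある'], depending on the dictionary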
SacreBleuTokenizer
A tokenizer that uses the sacrebleu library.
Parameters:
- name (str): The name of the tokenizer.
Source code in flexeval/core/tokenizer/sacrebleu_tokenizer.py
__init__
__init__(name: str) -> None
tokenize
tokenize(text: str) -> list[str]
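A usage sketch, assuming name accepts the tokenizer identifiers that sacrebleu itself recognizes (e.g. "13a", "intl", "zh", "char"):

from flexeval.core.tokenizer.sacrebleu_tokenizer import SacreBleuTokenizer

# "13a" is sacrebleu's default BLEU tokenizer; it splits off punctuation.
tokenizer = SacreBleuTokenizer(name="13a")
print(tokenizer.tokenize("Hello, world!"))  # e.g. ['Hello', ',', 'world', '!']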
TiktokenTokenizer
A tokenizer based on the tiktoken library.
Source code in flexeval/core/tokenizer/tiktoken_tokenizer.py
__init__
__init__(
    tokenizer_name: str | None = None,
    model_name: str | None = None,
) -> None
tokenize
tokenize(text: str) -> list[str]
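A usage sketch, assuming tokenizer_name names a tiktoken encoding and model_name names a model whose encoding tiktoken can resolve (mirroring tiktoken's get_encoding/encoding_for_model split, which the two parameters suggest), with each token id presumably decoded back to its string form since tokenize returns list[str]:

from flexeval.core.tokenizer.tiktoken_tokenizer import TiktokenTokenizer

# Either name the encoding directly...
tokenizer = TiktokenTokenizer(tokenizer_name="cl100k_base")
# ...or let it be resolved from a model name:
# tokenizer = TiktokenTokenizer(model_name="gpt-4")
print(tokenizer.tokenize("Hello world"))  # subword tokens as strings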
TransformersTokenizer
A tokenizer based on a Hugging Face transformers tokenizer.
Source code in flexeval/core/tokenizer/transformers_tokenizer.py
__init__
__init__(
    path: str,
    init_kwargs: dict[str, Any] | None = None,
    tokenize_kwargs: dict[str, Any] | None = None,
) -> None
tokenize
tokenize(text: str) -> list[str]
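A usage sketch; path is presumably a Hugging Face model id or a local directory, with init_kwargs forwarded to the tokenizer's constructor and tokenize_kwargs to the tokenize call (both inferred from the parameter names, not confirmed by the source shown here):

from flexeval.core.tokenizer.transformers_tokenizer import TransformersTokenizer

tokenizer = TransformersTokenizer(
    path="bert-base-uncased",        # any Hugging Face model id or local path
    init_kwargs={"use_fast": True},  # assumed to be passed through to the tokenizer's constructor
)
print(tokenizer.tokenize("Hello world"))  # subword tokens, e.g. ['hello', 'world']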
WhitespaceTokenizer
A simple whitespace tokenizer.
Source code in flexeval/core/tokenizer/whitespace.py
tokenize
tokenize(text: str) -> list[str]
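A usage sketch; presumably this splits on runs of whitespace in the manner of str.split():

from flexeval.core.tokenizer.whitespace import WhitespaceTokenizer

tokenizer = WhitespaceTokenizer()
print(tokenizer.tokenize("Hello  world\tagain"))  # ['Hello', 'world', 'again']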