Pull requests: huggingface/tokenizers

Batch encode: simple lock-free scheduler
#2044 opened Apr 28, 2026 by sebpop Contributor
feat(NFC): skip Unicode pass for all-ASCII inputs
#2037 opened Apr 26, 2026 by KimBioInfoStudio
2 of 3 tasks
feat: SIMD ASCII fast path for Lowercase normalizer (~30-49x)
#2036 opened Apr 26, 2026 by KimBioInfoStudio
6 of 7 tasks
V0.23 release
#2032 opened Apr 24, 2026 by ArthurZucker Collaborator
Batch encode: lock-free work queue with dynamic window sizing
#2029 opened Apr 23, 2026 by sebpop Contributor
perf: skip alignment tracking in encode_fast normalization
#2022 opened Apr 10, 2026 by ArthurZucker Collaborator
node: bump version to 0.22.2 for release
#2009 opened Apr 4, 2026 by MayCXC Contributor
feat(pattern): parallel regex find_matches for large inputs
#2003 opened Mar 31, 2026 by McPatate Member
fix: skip serializing ByteLevel fields at their default value
#2001 opened Mar 30, 2026 by ArthurZucker Collaborator
Regex split parity
#1991 opened Mar 27, 2026 by ArthurZucker Collaborator
feat: add new faster whitespace split pretok
#1985 opened Mar 26, 2026 by McPatate Member
Implementing Parity-aware BPE
#1974 opened Mar 21, 2026 by cimeister
feat: add pcre2 as optional feature
#1959 opened Mar 2, 2026 by wheynelau Contributor
Add get_special_tokens and is_special_token methods
#1945 opened Feb 5, 2026 by ArthurZucker Collaborator
2 tasks done