Byte-Level Tokenizers Unavoidably Enable LLMs to Generate Ill-Formed UTF-8 (arxiv.org) by PaulHoule | Dec 2, 2025 | 2 points | 1 comment on HN