#nlp
4 pages tagged with "nlp"
programming
- building a 150K-word english dictionary with llms: opengloss - synthetic encyclopedic dictionary and semantic knowledge graph: 537K sense definitions, 9.14M edges, and a live api, generated in under a week for under $1,000
- fast zero-dependency sentence splitting in python with nupunkt - pure python and rust sentence boundary detection: 91% precision, 10M+ chars/sec, zero dependencies, trained on legal text but used across domains
- tokenizer backend changes - unified tokenizer backend system eliminating the fast/slow distinction in transformers v5.0
- transformers v5.0 - hugging face transformers v5.0 release overview, features, and migration guide