Chared — Character encoding detection

Chared is a tool for detecting the character encoding of a text in a known language. The language of the text must be specified as an input parameter so that the corresponding language model can be used. The package contains models for a wide range of languages. Because it can rely on knowledge of the language, it should generally be more accurate than character encoding detection algorithms that have no language constraints.
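To illustrate why knowing the language helps, here is a minimal sketch of language-constrained detection. It is not Chared's actual algorithm or API: the function names (build_model, score, detect_encoding), the bigram-set model and the tiny Czech sample are all assumptions made for this example. The idea is simply to decode the bytes with each candidate encoding and keep the one whose result looks most like the expected language.

```python
def build_model(sample_text):
    """Toy language model (hypothetical): the set of character bigrams
    seen in a sample of text written in the target language."""
    return {sample_text[i:i + 2] for i in range(len(sample_text) - 1)}


def score(text, model):
    """Fraction of the text's character bigrams that occur in the model."""
    bigrams = [text[i:i + 2] for i in range(len(text) - 1)]
    if not bigrams:
        return 0.0
    return sum(1 for b in bigrams if b in model) / len(bigrams)


def detect_encoding(data, model, candidates):
    """Decode `data` with every candidate encoding and return the one
    whose decoded text best matches the language model (None if no
    candidate can decode the bytes)."""
    best, best_score = None, -1.0
    for encoding in candidates:
        try:
            text = data.decode(encoding)
        except UnicodeDecodeError:
            continue  # the bytes are not valid in this encoding
        s = score(text, model)
        if s > best_score:
            best, best_score = encoding, s
    return best


# Assumed training sample: a short Czech pangram. A real model would be
# built from a large corpus of the language.
czech_model = build_model("příliš žluťoučký kůň úpěl ďábelské ódy")

# Bytes in an encoding that the detector is not told about.
data = "žluťoučký kůň".encode("cp1250")

print(detect_encoding(data, czech_model, ["utf-8", "iso-8859-2", "cp1250"]))
# prints: cp1250
```

In this example UTF-8 is rejected because the bytes are not valid UTF-8, while ISO-8859-2 decodes them without error but turns "ž" and "ť" into control characters that never occur in Czech text. Only a language model can tell the last two candidates apart, which is the advantage a language constraint provides.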

The project at Corpus tools (including the source)

Online demo: encoded documents comparison

Examples

Custom URL

Custom document upload

Acknowledgements

This software is developed at the Natural Language Processing Centre of Masaryk University in Brno with financial support from PRESEMT and Lexical Computing Ltd, a corpus tool producer.

(c) 2011 Vit Suchomel and Jan Pomikalek