New algorithms introduced from 2009 to 2016.
Algorithms existing since before 2008
- Binarization using a fast labeling algorithm.
To binarize a grayscale image, 255 binary images are prepared by thresholding the luminance value at every threshold from 1 to 255. A fast labeling process is applied to each of them to obtain a histogram of the number of labeled regions per threshold. A threshold that produces many regions is judged as "blur", and one that produces few regions is judged as "smudge"; the optimal binarization threshold is determined from these judgments. This method is slow but high quality, so it is suitable when obtaining a good binary (two-level) image matters more than speed.
- Nested paragraph extraction.
When extracting paragraphs or lines with different layouts, paragraphs nested inside another paragraph can now be extracted as well.
- New connected string processing.
Connected strings of three or more characters can now be recognized, as shown above.
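
The threshold search described under "Binarization using a fast labeling algorithm" can be sketched roughly as follows. This is only an illustration under several assumptions that are not taken from the product: dark pixels are treated as foreground, a plain breadth-first flood fill stands in for the fast labeling algorithm, and the final choice (the middle of the longest run of thresholds with a constant, non-zero region count) is a hypothetical rule, since the text does not say how the threshold is derived from the histogram. The function names `count_regions`, `region_histogram` and `pick_threshold` are made up for this sketch.

```python
from collections import deque


def count_regions(image, threshold):
    """Count 4-connected regions of dark pixels (value < threshold).

    A simple flood fill is used here as a stand-in for a fast labeling pass.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    regions = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] >= threshold or seen[y][x]:
                continue
            regions += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and image[ny][nx] < threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


def region_histogram(image):
    """Number of labeled regions for every threshold from 1 to 255."""
    return {t: count_regions(image, t) for t in range(1, 256)}


def pick_threshold(histogram):
    """Assumed rule: middle of the longest run of thresholds whose
    region count is constant and non-zero."""
    best_start, best_len = 1, 0
    start = 1
    for t in range(1, 257):
        if t == 256 or histogram[t] != histogram[start]:
            run_len = t - start  # the run covers thresholds start .. t-1
            if histogram[start] > 0 and run_len > best_len:
                best_start, best_len = start, run_len
            start = t
    return best_start + best_len // 2


if __name__ == "__main__":
    # Two dark strokes (20) on a light background (200).
    sample = [
        [200, 200, 200, 200, 200],
        [200, 20, 200, 20, 200],
        [200, 20, 200, 20, 200],
        [200, 200, 200, 200, 200],
    ]
    print(pick_threshold(region_histogram(sample)))
```

Labeling all 255 images is what makes the approach slow, which matches the trade-off noted above; a real implementation would presumably rely on a much faster labeling pass than this flood fill.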