ALOCR reference manual

ALOCR Ver 1.0 reference manual

Constants and Structures
1. Character Control
2. Recognition Result
Details
1. Types of Pattern Dictionaries
2. Construction of Pattern Dictionary Classes
3. User Dictionary Management
4. Dictionary Class: CJocrDict Reference
5. Construction of Pattern Classes
6. Feature Calculation of Patterns
7. Pattern Class: CJocrPattern Reference
8. Construction of Single Character Recognition Classes
  Updated on November 6, 2000: In addition to the license code, a license code file can now be specified.
9. Single Character Recognition
10. Single Character Recognition Class: CJocrRecognize Reference
11. Construction of Line Recognition Classes
12. Line Recognition
13. Line Recognition Class: CJocrLine Reference
14. Language Processing with Line Recognition Class: CJocrLang Reference
15. Paragraph Recognition Class: CJocrBlock Reference
  Updated on July 31, 2002: Skew can now be specified for paragraphs. Italic text can be recognized only in single-line text.
  Updated on August 31, 2005: New dictionaries added.
  Updated on August 31, 2005: New recognition method, with a significantly higher recognition rate than the conventional method.
  Updated on November 30, 2022: Deep learning support. Color and grayscale images can now be recognized. Significantly higher recognition rate than the conventional method.
Sample Programs
1. VC++ Sample Project

1. Constants and Structures

1.1 Character Type Control

The character types to recognize are specified as a logical OR of character type flags, which are defined in the header file ocrdef.h. The flags listed below are available to library users. Character types are passed to the mrecognize member function of the character recognition class and the line recognition class as two arguments: the first selects the character types, and the second applies a semantic constraint such as ASCII symbols or file name symbols.

Character type flags that can be specified as the first argument of mrecognize

Description	Flag	Characters
Normal symbols	SYMBOL_ETC	！：”’‘？＾＿￣＆＃＠´ ｀‘“
Number symbols	SYMBOL_NUMBER	＝＋＊＜＞／￥＄％±×÷≠≦≧
Middle dot	SYMBOL_NAKAGURO	・
Minus sign	SYMBOL_MINUS	－
Comma	SYMBOL_COMMA	，
Period	SYMBOL_PERIOD	．
Round brackets	SYMBOL_PAREN	（）
Other brackets	SYMBOL_BRACE	｛｝［］--〔〕「」『』【】
Japanese punctuation marks	SYMBOL_KUTOTEN	、。
Circle	SYMBOL_MARU	○
Long vowel mark	SYMBOL_NOBASBO	ー
Arabic numerals	CHAR_NUMBER	０～９
Uppercase letters	CHAR_ALPHABET_CAPITAL	Ａ－Ｚ
Lowercase letters	CHAR_ALPHABET_SMALL	ａ－ｚ
Full-size katakana	CHAR_KATAKANA_CAPITAL	ア－ン
Small katakana	CHAR_KATAKANA_SMALL	ァ－ヶ
Full-size hiragana	CHAR_HIRAGANA_CAPITAL	あ－ん
Small hiragana	CHAR_HIRAGANA_SMALL	ぁ－ょ
Kanji numerals	CHAR_KANJI_NUMBER	一二三四五六七八九十百千万億兆
Kanji characters	CHAR_KANJI

Possible flags that can be specified as the second argument of the mrecognize method:

Ascii symbols	ASCII_CHARSET
File name	FILE_CHARSET

The second argument further narrows the character types specified in the first argument. For example, to use only the ASCII symbols (=+*<>/$%±×÷≠≦≧) among the number symbols, specify:

mrecognize(SYMBOL_NUMBER, ASCII_CHARSET);

Specifying letters and Arabic numerals also adds two-character alphanumeric strings such as "WA" or "12" to the recognition targets. These two-character patterns improve the recognition rate for text that mixes kanji, kana, and alphanumeric characters, but they are counterproductive when the target is known to be alphanumeric only. In that case, specify ASCII_CHARSET as the second argument:

mrecognize(CHAR_NUMBER | CHAR_ALPHABET_CAPITAL | CHAR_ALPHABET_SMALL, ASCII_CHARSET);

Also, the following macros are defined for character sets:

Macro Constant	Description	Definition
CHAR_SET_SYMBOL	Symbols	(SYMBOL_ETC | SYMBOL_NUMBER | SYMBOL_NAKAGURO | SYMBOL_MINUS | SYMBOL_MARU | SYMBOL_NOBASBO)
CHAR_SET_KAKKO	Brackets	(SYMBOL_PAREN | SYMBOL_BRACE)
CHAR_SET_TERMINAL	Punctuation marks	(SYMBOL_PERIOD | SYMBOL_COMMA | SYMBOL_KUTOTEN)
CHAR_SET_ALPHABET	All letters	(CHAR_ALPHABET_SMALL | CHAR_ALPHABET_CAPITAL)
CHAR_SET_HIRAGANA	Hiragana	(CHAR_HIRAGANA_CAPITAL | CHAR_HIRAGANA_SMALL)
CHAR_SET_KATAKANA	Katakana	(CHAR_KATAKANA_CAPITAL | CHAR_KATAKANA_SMALL)
CHAR_SET_KANJI	All kanji characters	(CHAR_KANJI | CHAR_KANJI_NUMBER)
CHAR_SET_ALL	All character types	(CHAR_SET_SYMBOL | CHAR_SET_KAKKO | CHAR_SET_TERMINAL | CHAR_NUMBER | CHAR_SET_ALPHABET | CHAR_SET_HIRAGANA | CHAR_SET_KATAKANA | CHAR_SET_KANJI)

In general, the character types to be recognized are specified as a logical OR of these macros.
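As a rough illustration of how the flags combine, the sketch below builds a character type mask with bitwise OR and tests it. The numeric bit values are placeholders invented for this example; the real values are defined in ocrdef.h and may differ. The namespace demo and the helper hasType are also hypothetical and not part of the library.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder bit assignments, for illustration only.
// The real values come from ocrdef.h and may differ.
namespace demo {
constexpr std::uint32_t CHAR_NUMBER           = 1u << 0;
constexpr std::uint32_t CHAR_ALPHABET_CAPITAL = 1u << 1;
constexpr std::uint32_t CHAR_ALPHABET_SMALL   = 1u << 2;
constexpr std::uint32_t CHAR_KANJI            = 1u << 3;
constexpr std::uint32_t CHAR_KANJI_NUMBER     = 1u << 4;

// Same composition as in the macro table.
constexpr std::uint32_t CHAR_SET_ALPHABET = CHAR_ALPHABET_SMALL | CHAR_ALPHABET_CAPITAL;
constexpr std::uint32_t CHAR_SET_KANJI    = CHAR_KANJI | CHAR_KANJI_NUMBER;

// Hypothetical helper: true if every bit of `type` is set in `mask`.
constexpr bool hasType(std::uint32_t mask, std::uint32_t type) {
    return (mask & type) == type;
}

// Alphanumeric-only mask: digits plus both letter cases.
constexpr std::uint32_t kAlnum = CHAR_NUMBER | CHAR_SET_ALPHABET;
}  // namespace demo
```

With these placeholder values, demo::hasType(demo::kAlnum, demo::CHAR_NUMBER) is true and demo::hasType(demo::kAlnum, demo::CHAR_KANJI) is false.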

1.2 Recognition Result

The recognition result is managed by the following structure.
Members without comments are undisclosed internal information.

// Pattern bounding rectangle
typedef struct {
    short       x1;         // Top left x coordinate of the pattern
    short       y1;         // Top left y coordinate of the pattern
    short       x2;         // Bottom right x coordinate of the pattern
    short       y2;         // Bottom right y coordinate of the pattern
} OCRRect;
// Recognition candidate
typedef struct {
    char*           code;       // Pointer to the pattern string
    unsigned char   score;      // Confidence level
    unsigned char   filler[3];  // padding
} Candidate;
//////////////////
// OCR result structure
// 128 bytes fixed
typedef struct {
    Candidate       cand[MAX_CAND]; // Candidate data MAX_CAND == 10 80 bytes
    OCRRect         area;           // Recognized area 8 bytes
    long            fieldtype;
    unsigned long   chartype;       // Character type of the recognition result
    unsigned long   maskchartype1;
    unsigned long   maskchartype2;
    unsigned long   space;          // Number of spaces following the characters in a line recognition
    unsigned char   cgravx;
    unsigned char   cgravy;
    unsigned char   morph;
    unsigned char   size;
    char            newcand[KEYSIZE_MAX];
} OCRResult;

If the recognition result is declared as

OCRResult ocrresult;

then ocrresult.cand[0] through ocrresult.cand[9] are the candidates, and ocrresult.cand[0].code is the string of the first (best) candidate for the pattern.

Among the members of OCRResult, the most important is the Candidate array, which holds the strings of the recognition result. The ALOCR library stores at most 10 candidates in an OCRResult.

The code member of the Candidate structure points to the string of the recognition result. In most cases this is a string consisting of a single character, such as one kanji. The string itself belongs to the dictionary: it was loaded from a dictionary or registered in a user dictionary. The pointer is therefore not guaranteed to remain valid once the dictionary class instance is deleted, so copy any strings you need beforehand.

Candidates are ranked per pattern: the first candidate, the second candidate, and so on. Note that concatenating the second candidates of all patterns does not yield the second-best recognition result for the whole string.
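The following sketch shows one way to assemble the text of a line from the first candidate of each pattern. It copies the characters into a std::string, so the result no longer depends on pointers into the dictionary. The structures are re-declared from the listing in this section so that the example is self-contained; the value of KEYSIZE_MAX, the helper firstCandidateText, and the idea that a line is available as a plain array of OCRResult are assumptions made for this example, not documented API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const int MAX_CAND = 10;     // stated in the structure comment
const int KEYSIZE_MAX = 16;  // assumed value; the real one is in the library headers

typedef struct { short x1; short y1; short x2; short y2; } OCRRect;
typedef struct { char* code; unsigned char score; unsigned char filler[3]; } Candidate;
typedef struct {
    Candidate     cand[MAX_CAND];
    OCRRect       area;
    long          fieldtype;
    unsigned long chartype;
    unsigned long maskchartype1;
    unsigned long maskchartype2;
    unsigned long space;
    unsigned char cgravx;
    unsigned char cgravy;
    unsigned char morph;
    unsigned char size;
    char          newcand[KEYSIZE_MAX];
} OCRResult;

// Hypothetical helper: join the first candidate of each pattern into one
// owned string. Patterns without a candidate (score 0) are skipped.
std::string firstCandidateText(const OCRResult* results, std::size_t count) {
    std::string text;
    for (std::size_t i = 0; i < count; ++i) {
        const Candidate& best = results[i].cand[0];
        if (best.score == 0 || best.code == nullptr) continue;
        text += best.code;  // copies the characters out of the dictionary
    }
    return text;
}
```

Only first candidates are joined here. Producing a second-best reading of the whole line would need a different approach, since per-pattern second candidates do not combine into one.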

If fewer than 10 candidates exist, code in the unused entries points to an empty string "" rather than NULL, and score is 0.

The score member of the Candidate structure is the confidence level of the candidate. A score of 0 means that there is no candidate. Actual candidates score from 1 to 100, and the closer the score is to 100, the more reliable the candidate.

Normally, the confidence level of the first candidate is 60 or higher. A lower value strongly suggests that recognition did not go well.
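The score rules can be checked with a few lines of code. The sketch below counts the candidates that are actually present and applies the threshold of 60 to the first candidate. It assumes that valid candidates are packed at the front of the array, which the text suggests but does not state outright. The helper names candidateCount and looksReliable are hypothetical, and Candidate is re-declared so that the example compiles on its own.

```cpp
#include <cassert>

const int MAX_CAND = 10;  // stated in the structure comment

typedef struct { char* code; unsigned char score; unsigned char filler[3]; } Candidate;

// Hypothetical helper: number of leading candidates with a nonzero score.
// Assumes valid candidates come first and unused slots have score 0.
int candidateCount(const Candidate* cand) {
    int n = 0;
    while (n < MAX_CAND && cand[n].score != 0) ++n;
    return n;
}

// Hypothetical helper: apply the rule of thumb that a first candidate
// scoring below 60 is suspect. The threshold can be overridden.
bool looksReliable(const Candidate* cand, unsigned char threshold = 60) {
    return cand[0].score >= threshold;
}
```

A caller might route results for which looksReliable returns false to manual review instead of accepting them.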

Among the members of OCRResult, the area member indicates the enclosing rectangle of the recognized pattern (character). The top left of the image is the origin, with the right direction being the positive direction of the x-coordinate and the downward direction being the positive direction of the y-coordinate.

typedef struct {
	short 		x1;			// Top left x-coordinate of the pattern
	short 		y1;			// Top left y-coordinate of the pattern
	short 		x2;			// Bottom right x-coordinate of the pattern
	short 		y2;			// Bottom right y-coordinate of the pattern
} OCRRect;
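As a small illustration of working with these coordinates, the sketch below merges the rectangles of several characters into one enclosing rectangle, for example to obtain the extent of a word. The helper enclosingRect is hypothetical and not part of the library. It relies only on the coordinate orientation described here, not on whether x2 and y2 are inclusive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

typedef struct { short x1; short y1; short x2; short y2; } OCRRect;

// Hypothetical helper: smallest rectangle enclosing all `count` rectangles.
// Requires count >= 1.
OCRRect enclosingRect(const OCRRect* rects, std::size_t count) {
    assert(count > 0);
    OCRRect out = rects[0];
    for (std::size_t i = 1; i < count; ++i) {
        out.x1 = std::min(out.x1, rects[i].x1);  // leftmost edge
        out.y1 = std::min(out.y1, rects[i].y1);  // topmost edge
        out.x2 = std::max(out.x2, rects[i].x2);  // rightmost edge
        out.y2 = std::max(out.y2, rects[i].y2);  // bottommost edge
    }
    return out;
}
```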

Of the remaining members of OCRResult, chartype is the character type of the string pointed to by cand[0].code, and space is the spacing that follows the character in line recognition, expressed as a number of spaces where the width of one kanji character counts as two spaces. All other members are undisclosed internal information.

2. Each Section

2.1 Types of Pattern Dictionary

There are three pattern dictionary formats:

Basic Dictionary Format (exactly 2 dictionaries)
Optional Dictionary Format (any number of dictionaries)
User Dictionary Format (0 or 1 dictionary)

Recognition always requires the two basic dictionaries; the other dictionaries are not required. Optional dictionaries are also called differential dictionaries because they hold only the differences from the basic dictionaries. A differential dictionary is meaningless unless it is combined with the basic dictionaries.

Dictionary Name	Record File Name	Key File Name	Content
Basic Dictionaries
system	system.dbs	system.key	Required
systemfat	systemfat.dbs	systemfat.key	Required
Optional Dictionaries (Differential Dictionaries)
diff0	diff0.dbf	diff0.kef	Kaisho Font
diff1	diff1.dbf	diff1.kef	Blurry Characters
diff2	diff2.dbf	diff2.kef	Squished Characters
diff3	diff3.dbf	diff3.kef	Numbers
diff4	diff4.dbf	diff4.kef	Alphabets
diff5	diff5.dbf	diff5.kef	Hiragana
User Pattern Dictionary
Any name

Dictionaries Added on August 31, 2005
The basic dictionary itself has gained entries for 54 additional JIS level-2 kanji. In addition, information on radicals, phonetic components, and phonetic-semantic compounds has been added.
Adding kana and ninja001 improves the recognition rate for alphanumeric and kana characters.
If the text is so blurry that "日" looks like "ＩＩ", adding optblur and blur can improve recognition.
If the text is so squished that "書" looks like "■" with whiskers, adding optblot and blot can improve recognition.
The general document recognition API planned for release within 2005 will include an assessment of how blurry and how squished a document is.
Newly added pattern dictionaries

Dictionary Name	Record File Name	Key File Name	Content
kana	kana.dbf	kana.kef	Additional Hiragana and Katakana
optblur	optblur.dbf	optblur.kef	Severely Blurry Characters 1
blur	blur.dbf	blur.kef	Severely Blurry Characters 2
optblot	optblot.dbf	optblot.kef	Severely Squished Characters 1
blot	blot.dbf	blot.kef	Severely Squished Characters 2
ninja001	ninja001.dbf	ninja001.kef	Additional Alphanumeric Characters

To achieve a reasonably practical recognition rate, we recommend using the two basic dictionaries plus diff3, diff4, and diff5. The diff0 dictionary (Kaisho font) is needed only for special documents such as business cards, so it is not usually included. Adding diff1 and diff2 can have a significant effect when the image quality of the text to be recognized is poor.
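The file naming in the dictionary tables is regular enough to generate. The sketch below builds the record and key file names for the recommended dictionary set. The types DictKind and DictFiles and both functions are hypothetical helpers written for this example; how the names are actually passed to the dictionary class is not shown here, and user dictionaries are left out because their file names are not specified in this section.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class DictKind { Basic, Optional };  // hypothetical, for this sketch only

struct DictFiles {
    std::string record;  // record file name
    std::string key;     // key file name
};

// Extensions follow the dictionary tables: basic dictionaries use
// .dbs/.key, optional (differential) dictionaries use .dbf/.kef.
DictFiles dictFiles(const std::string& name, DictKind kind) {
    if (kind == DictKind::Basic) return {name + ".dbs", name + ".key"};
    return {name + ".dbf", name + ".kef"};
}

// Recommended set: both basic dictionaries plus diff3, diff4, and diff5.
std::vector<DictFiles> recommendedSet() {
    std::vector<DictFiles> files;
    for (const char* n : {"system", "systemfat"})
        files.push_back(dictFiles(n, DictKind::Basic));
    for (const char* n : {"diff3", "diff4", "diff5"})
        files.push_back(dictFiles(n, DictKind::Optional));
    return files;
}
```

For poor image quality, diff1 and diff2 could be appended to the second list in the same way.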

About User Dictionary

The format of the user dictionary is the same as that of an optional dictionary, except that the image of each pattern is also stored for reference. Only one user dictionary can be used per dictionary class instance.

Within the lifetime of one dictionary class instance, you can normally register at most 1024 patterns. To register more than 1024 patterns, delete the instance, then create and initialize it again.
In addition, because a user dictionary is held exclusively for as long as an instance using it exists, a given user dictionary can be used by only one dictionary class instance on a machine at a time. To share a user dictionary among multiple recognition class instances, have them share a single dictionary class instance.

The maximum number of records in one dictionary is 8192. A typical dictionary contains records for several hundred to several thousand characters.
The larger the dictionary, the longer recognition takes.
As of June 1999, the basic and optional dictionaries together contain about 10,000 records.

Error Code	Description
FILE_OPEN_ERROR	File not found; the file is locked (the dictionary is currently in use); an instance using the same user dictionary already exists; or another instance is loading the same system dictionary.
FILE_READ_ERROR	Read error (the dictionary may be corrupted).
FILE_SEEK_ERROR	Seek error (the dictionary may be corrupted).
MEMORY_SHORTAGE	Memory shortage.
FATAL_ERROR	Fatal error (no dictionary records could be loaded).
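One way to act on these error codes is sketched below. The enum DictError is a stand-in invented for this example: the real codes are defined in errcode.h, and neither their numeric values nor the function that returns them is assumed here. The helpers describe and mayBeCorrupted are likewise hypothetical.

```cpp
#include <cassert>
#include <string>

// Stand-in for the codes in errcode.h; values are not the library's.
enum class DictError { FileOpen, FileRead, FileSeek, MemoryShortage, Fatal };

// Hypothetical helper: short message for logging.
const char* describe(DictError e) {
    switch (e) {
        case DictError::FileOpen:       return "File not found, locked, or already in use";
        case DictError::FileRead:       return "Read error";
        case DictError::FileSeek:       return "Seek error";
        case DictError::MemoryShortage: return "Memory shortage";
        case DictError::Fatal:          return "No dictionary records could be loaded";
    }
    return "Unknown error";
}

// Hypothetical helper: read and seek errors hint at a corrupted dictionary.
bool mayBeCorrupted(DictError e) {
    return e == DictError::FileRead || e == DictError::FileSeek;
}
```

An application could, for instance, suggest reinstalling the dictionary files when mayBeCorrupted returns true, and check for other running instances on an open error.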

Class Name	CJocrDict
Header File	ocrdef.h ocrco.h cjocrstock.h cjocrdict98.h errcode.h

Class Name	CJocrPattern
Header File	ocrdef.h ocrco.h cjocrpat98.h errcode.h

Class Name	CJocrRecognize
Header File	ocrdef.h ocrco.h cjocrrec98.h errcode.h

Class Name	CJocrLine
Header File	ocrdef.h ocrco.h cjocrline98.h errcode.h

Class Name	CJocrLang
Header File	ocrdef.h ocrco.h cjocrline98.h cjocrlang.h errcode.h

Class Name	CJocrBlock
Header File	ocrdef.h ocrco.h cjocrblock.h errcode.h
