Normal symbol | SYMBOL_ETC | !:”’‘?^_ ̄&#@´ `‘“ |
Number symbol | SYMBOL_NUMBER | =+*<>/¥$%±×÷≠≦≧ |
Middle dot | SYMBOL_NAKAGURO | ・ |
Minus sign | SYMBOL_MINUS | - |
Comma | SYMBOL_COMMA | , |
Period | SYMBOL_PERIOD | . |
Round brackets | SYMBOL_PAREN | () |
Brackets | SYMBOL_BRACE | {}[]--〔〕「」『』【】 |
Punctuation marks | SYMBOL_KUTOTEN | 、。 |
Circle | SYMBOL_MARU | ○ |
Prolonged sound mark | SYMBOL_NOBASBO | ー |
Arabic numerals | CHAR_NUMBER | 0-9 |
Uppercase letters | CHAR_ALPHABET_CAPITAL | A-Z |
Lowercase letters | CHAR_ALPHABET_SMALL | a-z |
Full-size katakana | CHAR_KATAKANA_CAPITAL | ア-ン |
Small katakana | CHAR_KATAKANA_SMALL | ァ-ヶ |
Full-size hiragana | CHAR_HIRAGANA_CAPITAL | あ-ん |
Small hiragana | CHAR_HIRAGANA_SMALL | ぁ-ょ |
Kanji numerals | CHAR_KANJI_NUMBER | 一二三四五六七八九十百千万億兆 |
Kanji characters | CHAR_KANJI |
ASCII symbols | ASCII_CHARSET |
File name | FILE_CHARSET |
Macro Constants | Description | Definition |
---|---|---|
CHAR_SET_SYMBOL | Symbols | (SYMBOL_ETC | SYMBOL_NUMBER | SYMBOL_NAKAGURO | SYMBOL_MINUS | SYMBOL_MARU | SYMBOL_NOBASBO) |
CHAR_SET_KAKKO | Brackets | (SYMBOL_PAREN | SYMBOL_BRACE) |
CHAR_SET_TERMINAL | Punctuation marks | (SYMBOL_PERIOD | SYMBOL_COMMA | SYMBOL_KUTOTEN) |
CHAR_SET_ALPHABET | All letters | (CHAR_ALPHABET_SMALL | CHAR_ALPHABET_CAPITAL) |
CHAR_SET_HIRAGANA | Hiragana | (CHAR_HIRAGANA_CAPITAL | CHAR_HIRAGANA_SMALL) |
CHAR_SET_KATAKANA | Katakana | (CHAR_KATAKANA_CAPITAL | CHAR_KATAKANA_SMALL) |
CHAR_SET_KANJI | All kanji characters | (CHAR_KANJI | CHAR_KANJI_NUMBER) |
CHAR_SET_ALL | All character types | (CHAR_SET_SYMBOL | CHAR_SET_KAKKO | CHAR_SET_TERMINAL | CHAR_NUMBER | CHAR_SET_ALPHABET | CHAR_SET_HIRAGANA | CHAR_SET_KATAKANA | CHAR_SET_KANJI) |
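
The composite constants above are built by OR-ing the per-class flags together. The following is a minimal sketch of that pattern, assuming each base constant is a distinct single bit in an `unsigned long`. The numeric values and the `matches` helper are hypothetical and exist only for illustration; the real values come from the library headers and are not given here.

```cpp
namespace sketch {

// Hypothetical bit assignments (illustration only, not the library's values).
constexpr unsigned long CHAR_NUMBER           = 1UL << 0;
constexpr unsigned long CHAR_ALPHABET_CAPITAL = 1UL << 1;
constexpr unsigned long CHAR_ALPHABET_SMALL   = 1UL << 2;
constexpr unsigned long CHAR_HIRAGANA_CAPITAL = 1UL << 3;
constexpr unsigned long CHAR_HIRAGANA_SMALL   = 1UL << 4;

// Composite sets, combined the same way as in the table above.
constexpr unsigned long CHAR_SET_ALPHABET =
    CHAR_ALPHABET_SMALL | CHAR_ALPHABET_CAPITAL;
constexpr unsigned long CHAR_SET_HIRAGANA =
    CHAR_HIRAGANA_CAPITAL | CHAR_HIRAGANA_SMALL;

// Hypothetical helper: true when `chartype` shares at least one bit with `mask`.
constexpr bool matches(unsigned long chartype, unsigned long mask) {
    return (chartype & mask) != 0;
}

}  // namespace sketch
```

Under these assumptions, a mask limited to digits and letters would be written as `CHAR_NUMBER | CHAR_SET_ALPHABET`, in the same way that `CHAR_SET_ALL` is passed to `mrecognize` in the samples.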
    // Pattern bounding rectangle
    typedef struct {
        short x1;   // Top-left x coordinate of the pattern
        short y1;   // Top-left y coordinate of the pattern
        short x2;   // Bottom-right x coordinate of the pattern
        short y2;   // Bottom-right y coordinate of the pattern
    } OCRRect;

    // Recognition candidate
    typedef struct {
        char* code;                // Pointer to the pattern string
        unsigned char score;       // Confidence level
        unsigned char filler[3];   // Padding
    } Candidate;

    // OCR result structure (fixed size: 128 bytes)
    typedef struct {
        Candidate cand[MAX_CAND];      // Candidate data (MAX_CAND == 10), 80 bytes
        OCRRect area;                  // Recognized area, 8 bytes
        long fieldtype;
        unsigned long chartype;        // Character type of the recognition result
        unsigned long maskchartype1;
        unsigned long maskchartype2;
        unsigned long space;           // Number of spaces following the character in line recognition
        unsigned char cgravx;
        unsigned char cgravy;
        unsigned char morph;
        unsigned char size;
        char newcand[KEYSIZE_MAX];
    } OCRResult;
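
The "128 bytes fixed" figure can be checked by adding up the members. The sketch below does this with fixed-width stand-in types, assuming a 32-bit build (4-byte pointers and 4-byte `long`). The names `Candidate32`, `Rect32`, and `Result32` are hypothetical, and the value 16 for `KEYSIZE_MAX` is not stated in the manual: it is an assumption, chosen as the only value that makes the total come to 128 under that layout.

```cpp
#include <cstdint>

constexpr int MAX_CAND = 10;              // stated in the structure comment
constexpr int KEYSIZE_MAX_ASSUMED = 16;   // assumption: 128 - 80 - 8 - 4 - 16 - 4

struct Candidate32 {
    std::uint32_t code;        // stands in for a 4-byte char*
    std::uint8_t  score;
    std::uint8_t  filler[3];
};

struct Rect32 {
    std::int16_t x1, y1, x2, y2;
};

struct Result32 {
    Candidate32   cand[MAX_CAND];                                // 80 bytes
    Rect32        area;                                          //  8 bytes
    std::int32_t  fieldtype;                                     //  4 bytes
    std::uint32_t chartype, maskchartype1, maskchartype2, space; // 16 bytes
    std::uint8_t  cgravx, cgravy, morph, size;                   //  4 bytes
    char          newcand[KEYSIZE_MAX_ASSUMED];                  // 16 bytes (assumed)
};

static_assert(sizeof(Candidate32) * MAX_CAND == 80, "candidate array is 80 bytes");
static_assert(sizeof(Rect32) == 8, "rectangle is 8 bytes");
static_assert(sizeof(Result32) == 128, "whole result is 128 bytes");
```

With 8-byte pointers the same declaration would be larger than 128 bytes, so the fixed-size comment should be read as applying to a 32-bit layout.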
    typedef struct {
        short x1;   // Top-left x-coordinate of the pattern
        short y1;   // Top-left y-coordinate of the pattern
        short x2;   // Bottom-right x-coordinate of the pattern
        short y2;   // Bottom-right y-coordinate of the pattern
    } OCRRect;
Basic Dictionaries | |||
Dictionary Name | Record File Name | Key File Name | Content |
---|---|---|---|
system | system.dbs | system.key | Required |
systemfat | systemfat.dbs | systemfat.key | Required |
Optional Dictionaries (Differential Dictionaries) | |||
diff0 | diff0.dbf | diff0.kef | Kaisho Font |
diff1 | diff1.dbf | diff1.kef | Blurry Characters |
diff2 | diff2.dbf | diff2.kef | Squished Characters |
diff3 | diff3.dbf | diff3.kef | Numbers |
diff4 | diff4.dbf | diff4.kef | Alphabetic Characters |
diff5 | diff5.dbf | diff5.kef | Hiragana |
User Pattern Dictionary | |||
Any name |
kana | kana.dbf | kana.kef | Addition of Hiragana, Katakana |
optblur | optblur.dbf | optblur.kef | Severely Blurred Characters 1 |
blur | blur.dbf | blur.kef | Severely Blurred Characters 2 |
optblot | optblot.dbf | optblot.kef | Severely Squished Characters 1 |
blot | blot.dbf | blot.kef | Severely Squished Characters 2 |
ninja001 | ninja001.dbf | ninja001.kef | Addition of Alphanumeric Characters |
About User Dictionary |
The user dictionary has the same format as an optional dictionary, except that the image of each pattern is also stored for reference. Only one user dictionary can be used per dictionary class instance. Within the lifetime of one dictionary class instance, at most 1024 patterns can normally be registered; to register more, delete the instance, then create and initialize it again. Because the user dictionary is held exclusively for as long as an instance exists, only one user dictionary can be used on one machine at a time. To share a user dictionary among multiple recognition class instances, share a single dictionary class instance among them. One dictionary can hold at most 8192 records. A typical dictionary contains records for several hundred to several thousand characters; the larger the dictionary, the longer recognition takes. As of June 1999, the basic and optional dictionaries together contain about 10,000 records. |
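
The 1024-pattern limit per instance lifetime means that application code has to know when to recreate the dictionary instance. A minimal sketch of such bookkeeping follows. The `RegistrationBudget` class is hypothetical and is not part of the library; it only counts registrations on the caller's side and assumes the caller invokes `reset()` after deleting and re-initializing the dictionary instance.

```cpp
#include <cstddef>

// Limit stated in the text: at most 1024 registrations per instance lifetime.
constexpr std::size_t kMaxPatternsPerInstance = 1024;

// Hypothetical caller-side counter (not a library class).
class RegistrationBudget {
public:
    // Returns true and counts one registration if the limit is not yet reached.
    bool tryConsume() {
        if (used_ >= kMaxPatternsPerInstance) {
            return false;  // caller should recreate the dictionary instance
        }
        ++used_;
        return true;
    }

    // Call after the dictionary instance has been deleted and created again.
    void reset() { used_ = 0; }

    std::size_t remaining() const { return kMaxPatternsPerInstance - used_; }

private:
    std::size_t used_ = 0;
};
```

A caller would check `tryConsume()` before each registration and, when it returns false, recreate the instance as described above before calling `reset()`.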
FILE_OPEN_ERROR | One of the following: the file was not found; the file is locked (the dictionary is currently in use); an instance using the same user dictionary already exists; or another instance is loading the same system dictionary. |
FILE_READ_ERROR | Read error (the dictionary may be corrupted). |
FILE_SEEK_ERROR | Seek error (the dictionary may be corrupted). |
MEMORY_SHORTAGE | Memory shortage. |
FATAL_ERROR | Fatal error (no dictionary records could be loaded). |
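
Since the library calls report these conditions as negative return values, a small lookup can turn a return code into a message. The sketch below assumes hypothetical numeric values, because the real ones are defined in errcode.h and are not listed here. The `describe` and `canContinue` helpers are also hypothetical; `canContinue` reflects the sample code's note that no further processing is possible after MEMORY_SHORTAGE or FATAL_ERROR.

```cpp
#include <string>

namespace sketch {

// Hypothetical values for illustration; the real ones come from errcode.h.
enum ErrorCode : int {
    FILE_OPEN_ERROR = -1,
    FILE_READ_ERROR = -2,
    FILE_SEEK_ERROR = -3,
    MEMORY_SHORTAGE = -4,
    FATAL_ERROR     = -5
};

// Maps a return code to the description given in the table above.
inline std::string describe(int code) {
    switch (code) {
    case FILE_OPEN_ERROR: return "File not found or dictionary in use";
    case FILE_READ_ERROR: return "Read error (the dictionary may be corrupted)";
    case FILE_SEEK_ERROR: return "Seek error (the dictionary may be corrupted)";
    case MEMORY_SHORTAGE: return "Memory shortage";
    case FATAL_ERROR:     return "Fatal error (no dictionary records could be loaded)";
    default:              return code < 0 ? "Unknown error" : "No error";
    }
}

// False for the two conditions after which processing cannot continue.
inline bool canContinue(int code) {
    return code != MEMORY_SHORTAGE && code != FATAL_ERROR;
}

}  // namespace sketch
```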
    #include "ocrdef.h"
    #include "ocrco.h"
    #include "cjocrstock.h"
    #include "cjocrdict98.h"
    #include "errcode.h"
    .....
    // 1...Create an instance of the dictionary class
    CJocrDict* pjocrdict = new CJocrDict;
    // 2...Set up the basic dictionaries
    pjocrdict->msetsystemdict("\\dic\\feature\\system");
    pjocrdict->msetsystemdict("\\dic\\feature\\systemfat");
    // 3...Set up the optional dictionaries
    pjocrdict->msetdiffdict("\\dic\\feature\\diff1");
    pjocrdict->msetdiffdict("\\dic\\feature\\diff2");
    pjocrdict->msetdiffdict("\\dic\\feature\\diff3");
    pjocrdict->msetdiffdict("\\dic\\feature\\diff4");
    pjocrdict->msetdiffdict("\\dic\\feature\\diff5");
    // 5...Set up the user dictionary (optional)
    pjocrdict->msetuserdict("\\dic\\feature\\userpat");
    // 6...Load the dictionaries
    int i1 = pjocrdict->mloaddict();
    if(i1 < 0) {
        // Error (defined in errcode.h)
        //   FILE_OPEN_ERROR   The dictionary could not be found
        //   FILE_READ_ERROR   The dictionary could not be read
        //   FILE_SEEK_ERROR   Unable to seek to the dictionary record
        //   MEMORY_SHORTAGE   Out of memory (no further processing)
        //   FATAL_ERROR       Unrecoverable error (no further processing)
    }
    // ......Recognition, registration, deletion, reference
    // Delete the instance
    delete pjocrdict;
    int i1 = pjocrdict->mput("漢", pattern);
    if(i1 < 0) {
        // Error:
        //   FILE_SEEK_ERROR
        //   FILE_WRITE_ERROR
    }
    int i1 = pjocrdict->mseek("漢");
    if(i1 < 0) {
        // Error
    }
    if(i1 == 1) {
        // Delete
        i1 = pjocrdict->mdel();
        if(i1 < 0) {
            // Error
        }
    }
    int i1 = pjocrdict->mseeknext();
    // keysize is used for both input and output:
    // on input it is the maximum size of the read buffer,
    // on output it is the number of bytes actually read
    unsigned long keysize = KEYSIZE_MAX;
    char keybuffer[KEYSIZE_MAX];
    if(i1 == 1) {
        i1 = pjocrdict->mgetkey(keysize, keybuffer);
        if(i1 < 0) {
            // Error
        } else {
            // REG_FONT_SIZE is the normalized font size in bytes.
            // Patterns are resized to this size when registered in the user dictionary.
            // In ALOCR Ver.1.0, the normalized font is 48x48 pixels, which is 288 bytes
            //   REG_FONT_WIDTH....48
            //   REG_FONT_HEIGHT....48
            //   REG_FONT_SIZE....48*48/8 (8 pixels = 1 byte)
            // recordsize is used for both input and output:
            // on input it is the maximum size of the read buffer,
            // on output it is the number of bytes actually read
            unsigned long recordsize = REG_FONT_SIZE;
            char record[REG_FONT_SIZE];
            i1 = pjocrdict->mgetpattern(recordsize, record);
            if(i1 < 0) {
                // Error
            }
            // The obtained pattern is a bitmap of REG_FONT_WIDTH × REG_FONT_HEIGHT pixels.
        }
    } else if(i1 == 0) {
        // No more records
    } else {
        // Error
    }
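
The pattern returned by `mgetpattern` is a 48 × 48 bitmap packed at 8 pixels per byte, so each row occupies 6 bytes. The sketch below shows one way to address a single pixel in such a buffer. The bit order (most significant bit is the leftmost pixel) and the row order (top to bottom) are assumptions that the manual does not state, and the constant and function names are hypothetical.

```cpp
// Sizes taken from the comments in the sample above.
constexpr int kRegFontWidth  = 48;
constexpr int kRegFontHeight = 48;
constexpr int kRegFontSize   = kRegFontWidth * kRegFontHeight / 8;  // 288 bytes
constexpr int kRowBytes      = kRegFontWidth / 8;                   // 6 bytes per row

static_assert(kRegFontSize == 288, "48 x 48 pixels at 1 bit per pixel");

// Assumption: rows are stored top to bottom and the most significant bit
// of each byte is the leftmost pixel. Neither is stated in the manual.
inline bool pixelAt(const unsigned char* record, int x, int y) {
    const unsigned char byte = record[y * kRowBytes + x / 8];
    return ((byte >> (7 - x % 8)) & 1) != 0;
}

// Sets one pixel under the same assumptions; useful for building test patterns.
inline void setPixel(unsigned char* record, int x, int y) {
    record[y * kRowBytes + x / 8] |=
        static_cast<unsigned char>(1u << (7 - x % 8));
}
```

If the library turns out to use the opposite bit order, only the shift expression `7 - x % 8` needs to change to `x % 8`.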
Class Name | CJocrDict |
Header File | ocrdef.h ocrco.h cjocrstock.h cjocrdict98.h errcode.h |
Class Name | CJocrPattern |
Header File | ocrdef.h ocrco.h cjocrpat98.h errcode.h |
    // 1...Create an instance of the pattern class
    CJocrPattern* pattern = new CJocrPattern;
    // 2...Allocate memory
    int i1 = pattern->mallocmemory();
    if(i1 < 0) {
        // Display error message
        delete pattern;
    }
    .....
    delete pattern;
    typedef struct {
        unsigned char* top;   // Address of the start of the image data
        short width;          // Width of the image data (in pixels, aligned to a byte boundary)
        short height;         // Height of the image data (in pixels)
    } OCRBuffer;

    typedef struct {
        short x1;   // Top-left x-coordinate of the pattern
        short y1;   // Top-left y-coordinate of the pattern
        short x2;   // Bottom-right x-coordinate of the pattern
        short y2;   // Bottom-right y-coordinate of the pattern
    } OCRRect;
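
Because the `width` field of OCRBuffer must fall on a byte boundary, an image whose width is not a multiple of 8 has to be padded before it is handed to the library. The sketch below shows the arithmetic, assuming a 1-bit-per-pixel image (8 pixels per byte, as with the registered font patterns). The helper names are hypothetical.

```cpp
// Rounds a pixel width up to the next multiple of 8.
constexpr int alignedWidth(int widthInPixels) {
    return (widthInPixels + 7) & ~7;
}

// Bytes per row for a byte-aligned width, assuming 1 bit per pixel.
constexpr int bytesPerRow(int widthInPixels) {
    return alignedWidth(widthInPixels) / 8;
}

// Total buffer size in bytes for an image of the given pixel dimensions.
constexpr long bufferBytes(int widthInPixels, int heightInPixels) {
    return static_cast<long>(bytesPerRow(widthInPixels)) * heightInPixels;
}

static_assert(alignedWidth(100) == 104, "100 pixels pad to 104");
static_assert(bytesPerRow(100) == 13, "104 pixels occupy 13 bytes");
```

For a 100 × 10 image, `width` would be set to 104 and the buffer would hold 130 bytes under these assumptions.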
Class Name | CJocrRecognize |
Header File | ocrdef.h ocrco.h cjocrrec98.h errcode.h |
    // 1...Create an instance of the recognition class
    // Construct an instance by specifying the 20-character code supplied by the library license distributor.
    // (Updated on November 6, 2000) Alternatively, specify the path to a license code file supplied by the library license distributor.
    CJocrRecognize* precognize = new CJocrRecognize("ABCDEFGHJKLMNPQ23456");
    // For license code files:
    // CJocrRecognize* precognize = new CJocrRecognize("C:\\Program Files\\Foo\\jocr.kcd");
    // 2...Pattern setting
    // Let pattern be an instance of the CJocrPattern class
    // .....Generate and initialize the pattern
    precognize->msetpattern(pattern);
    // 3...Dictionary setting
    // Let pjocrdict be an instance of the CJocrDict class
    // .....Generate and initialize pjocrdict
    precognize->msetdict(pjocrdict);
    // 4...Memory allocation
    int i1 = precognize->mallocmemory();
    if(i1 < 0) {
        // MEMORY_SHORTAGE....Insufficient memory (defined in errcode.h)
        // Display error message
        delete precognize;
    }
    // .... Repeat single character recognition process ....
    delete precognize;
    precognize->mrecognize(CHAR_SET_ALL);
    // Get the recognition result
    // For OCRResult, refer to section 2-2 of the reference manual.
    OCRResult aresult;
    precognize->mgetresult(&aresult);
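
Once an OCRResult has been retrieved, its `cand` array holds up to MAX_CAND candidates, each with a confidence score. The sketch below picks the most confident candidate above a threshold. It assumes that a higher `score` means higher confidence and that unused slots have a null `code` pointer; neither is stated in the manual. The `bestCandidate` helper is hypothetical, and `Candidate` is redeclared here only to keep the example self-contained.

```cpp
// Same layout as the Candidate structure in the manual.
struct Candidate {
    char* code;               // Pointer to the pattern string
    unsigned char score;      // Confidence level
    unsigned char filler[3];  // Padding
};

// Returns the index of the highest-scoring usable candidate, or -1 if none
// reaches minScore. Assumes higher score = more confident (not confirmed).
inline int bestCandidate(const Candidate* cand, int count, unsigned char minScore) {
    int best = -1;
    for (int i = 0; i < count; ++i) {
        if (cand[i].code == nullptr || cand[i].score < minScore) {
            continue;  // skip empty slots and low-confidence candidates
        }
        if (best < 0 || cand[i].score > cand[best].score) {
            best = i;
        }
    }
    return best;
}
```

With a real result this would be called as `bestCandidate(aresult.cand, MAX_CAND, threshold)`, where the threshold is chosen by the application.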
Class Name | CJocrLine |
Header File | ocrdef.h ocrco.h cjocrline98.h errcode.h |
    // 1...Create an instance of the line class
    CJocrLine* pjocrline = new CJocrLine;
    // 2...Set an instance of the pattern class to the line class (pattern already constructed elsewhere)
    // Execution of mallocmemory is required
    pjocrline->msetpattern(pattern);
    // 3...Set an instance of the recognition class to the line class (precognize already constructed elsewhere)
    // Execution of msetpattern, msetdict, and mallocmemory is required
    pjocrline->msetrecognize(precognize);
    // 4...Initialize the document
    // Call this whenever the image buffer for recognition changes
    typedef struct {
        unsigned char* top;   // Starting address of the image data
        short width;          // Width of the image data (byte alignment, pixel units)
        short height;         // Height of the image data (pixel units)
    } OCRBuffer;
    OCRBuffer aocrbuffer;
    aocrbuffer.top = ...;      // Buffer address
    aocrbuffer.width = ...;    // Buffer width (pixel units, multiple of 8)
    aocrbuffer.height = ...;   // Buffer height (pixel units)
    int i1 = pjocrline->msetdocument(&aocrbuffer);
    if(i1 < 0) {
        // MEMORY_SHORTAGE....Insufficient memory
        // Display error message
        delete pjocrline;
    }
    pjocrline->msetdpi(400);   // Resolution set to 400dpi
    // Repeat line recognition.
    // Call msetdocument when the image buffer for recognition changes.
    delete pjocrline;
    // Line settings
    OCRRect aocrrect;
    aocrrect.x1 = ...;   // Top-left X coordinate of the bounding rectangle of the line
    aocrrect.y1 = ...;   // Top-left Y coordinate of the bounding rectangle of the line
    aocrrect.x2 = ...;   // Bottom-right X coordinate of the bounding rectangle of the line
    aocrrect.y2 = ...;   // Bottom-right Y coordinate of the bounding rectangle of the line
    // For horizontal writing:
    pjocrline->msetlineuser(&aocrrect, HORIZONTAL_LINE);
    // For vertical writing:
    // pjocrline->msetlineuser(&aocrrect, VERTICAL_LINE);
    // Line recognition
    int i1 = pjocrline->mrecognize(CHAR_SET_ALL);
    if(i1 < 0) {
        // Display error message
    } else {
        int resultnum = ...;   // Size of the result array
        OCRResult pocrresult[resultnum];
        // Get results
        i1 = pjocrline->mgetresult(resultnum, pocrresult);
        if(i1 < 0) {
            // Display error message
        }
    }
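
Line recognition yields one OCRResult per character, and the `space` member gives the number of spaces that follow it. The sketch below assembles a text line from such results. `ResultView` is a hypothetical, reduced stand-in that carries only the two fields used here, and it assumes that `cand[0].code` is the top candidate for each character, which the manual does not state explicitly.

```cpp
#include <string>

// Hypothetical reduced view of one OCRResult (illustration only).
struct ResultView {
    const char* topCode;   // would be cand[0].code in a real OCRResult
    unsigned long space;   // number of spaces following the character
};

// Concatenates the top candidate of each result and inserts trailing spaces.
inline std::string lineText(const ResultView* results, int count) {
    std::string text;
    for (int i = 0; i < count; ++i) {
        if (results[i].topCode != nullptr) {
            text += results[i].topCode;
        }
        text.append(results[i].space, ' ');
    }
    return text;
}
```

With the real structures, each `ResultView` would be filled from `pocrresult[i].cand[0].code` and `pocrresult[i].space` after `mgetresult` succeeds.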
Class Name | CJocrLang |
Header Files | ocrdef.h ocrco.h cjocrline98.h cjocrlang.h errcode.h |
Class Name | CJocrBlock |
Header File | ocrdef.h ocrco.h cjocrblock.h errcode.h |