Leptonica
1.82.0
Image processing and image analysis suite
Data Structures | |
struct | L_Recog |
struct | L_Rch |
struct | L_Rcha |
struct | L_Rdid |
Macros | |
#define | RECOG_VERSION_NUMBER 2 |
Typedefs | |
typedef struct L_Recog | L_RECOG |
typedef struct L_Rch | L_RCH |
typedef struct L_Rcha | L_RCHA |
typedef struct L_Rdid | L_RDID |
Enumerations | |
enum | { L_UNKNOWN = 0 , L_ARABIC_NUMERALS = 1 , L_LC_ROMAN_NUMERALS = 2 , L_UC_ROMAN_NUMERALS = 3 , L_LC_ALPHA = 4 , L_UC_ALPHA = 5 } |
enum | { L_USE_ALL_TEMPLATES = 0 , L_USE_AVERAGE_TEMPLATES = 1 } |
This is a simple utility for training and recognizing individual machine-printed text characters. It is designed to be adapted to a particular set of character images; e.g., from a book.

There are two methods of training the recognizer. In the simplest, a set of bitmaps has been labeled by some means, such as a generic OCR program. These are input, either one template at a time or as a pixa of templates, to a function that creates a recog. If in a pixa, the text string label must be embedded in the text field of each pix.

If labeled data is not available, we start with a bootstrap recognizer (BSR) that has labeled data from a variety of sources. These images are scaled, typically to a fixed height. The BSR is then fed similarly scaled unlabeled images from the source (e.g., a book) and attempts to identify them. All images that have a high enough correlation score with one of the templates in the BSR are emitted in a pixa, which now holds unscaled and labeled templates from the source. This is the generator for a book adapted recognizer (BAR).

The pixa should always be thought of as the primary structure. It is the generator for the recog, because a recog is built from a pixa of unscaled images.

New image templates can be added to a recog as long as it is in training mode. Once training is finished, to add templates it is necessary to extract the generating pixa, add templates to that pixa, and make a new recog. Similarly, we do not join two recogs; instead, we simply join their generating pixa, and make a recog from that.

To remove outliers from a pixa of labeled pix, make a recog, determine the outliers, and generate a new pixa with the outliers removed. The outliers are determined by building special templates for each character class that are scaled averages of the individual templates. Then a correlation score is found between each template and the averaged templates.
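The correlation between a template and its class average, mentioned above, can be sketched in a few lines of C. This is a minimal illustration and not Leptonica's implementation: the names `count_on`, `correlation_score` and `average_template`, the one-byte-per-pixel bitmap layout, and the majority-vote averaging rule are all assumptions made for the example. It also assumes every bitmap has already been scaled to the same size. The score is the common normalized form for binary images, (|A AND B|)^2 / (|A| * |B|), which is 1.0 for identical bitmaps and 0.0 for disjoint ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of ON pixels in a bitmap of n pixels,
 * stored one byte per pixel (nonzero = ON). */
size_t count_on(const unsigned char *a, size_t n)
{
    size_t i, count = 0;
    for (i = 0; i < n; i++)
        count += (a[i] != 0);
    return count;
}

/* Assumed score: (|a AND b|)^2 / (|a| * |b|), which lies in [0, 1].
 * Returns 0.0 if either bitmap has no ON pixels. */
double correlation_score(const unsigned char *a,
                         const unsigned char *b, size_t n)
{
    size_t i, n_and = 0;
    size_t na = count_on(a, n);
    size_t nb = count_on(b, n);

    if (na == 0 || nb == 0)
        return 0.0;
    for (i = 0; i < n; i++)
        n_and += (a[i] && b[i]);
    return ((double)n_and * (double)n_and) / ((double)na * (double)nb);
}

/* Assumed averaging rule: a pixel is ON in the class average if it is ON
 * in at least half of the templates.  templ holds ntempl bitmaps of
 * n pixels each, stored back to back; avg receives n pixels. */
void average_template(const unsigned char *templ, size_t ntempl,
                      size_t n, unsigned char *avg)
{
    size_t i, t;

    assert(ntempl > 0);
    for (i = 0; i < n; i++) {
        size_t votes = 0;
        for (t = 0; t < ntempl; t++)
            votes += (templ[t * n + i] != 0);
        avg[i] = (2 * votes >= ntempl) ? 1 : 0;
    }
}
```

A caller would compute `average_template` once per character class and then call `correlation_score` between each template and each class average; templates that score poorly against their own class average are the outlier candidates.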
Outlier removal has two implementations; an outlier is determined as either:
(1) a template having a correlation score with its class average that is below a threshold, or
(2) a template having a correlation score with its class average that is smaller than the correlation score with the average of another class.
Outliers are removed from the generating pixa. Scaled averaging is only performed for determining outliers and for splitting characters; it is never used in a trained recognizer for identifying unlabeled samples.

Two methods using averaged templates are provided for splitting touching characters:
(1) greedy matching
(2) document image decoding (DID)
The DID method is the default. It is about 5x faster and possibly more accurate.

Once a BAR has been made, unlabeled sample images are identified by finding the individual template in the BAR with the highest correlation. The input images and images in the BAR can be represented in two ways:
(1) as scanned, binarized to 1 bpp
(2) as a width-normalized outline formed by thinning to a skeleton and then dilating by a fixed amount.

The recog can be serialized to file and read back. The serialized version holds the templates used for correlation (which may have been modified by scaling and turning into lines from the unscaled templates), plus, for arbitrary character sets, the UTF8 representation and the lookup table mapping from the character representation to index.

Why do we not use averaged templates for recognition? Letterforms for the same character can take on significantly different shapes (e.g., the letters 'a' and 'g'), and it makes no sense to average these. The previous version of this utility allowed multiple recognizers to exist, but this is an unnecessary complication if recognition is done on all samples instead of on averages.
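The two outlier criteria described above can be expressed as simple tests on a table of precomputed correlation scores. The following is a hedged sketch rather than the library's code: `outlier_by_threshold`, `outlier_by_rank` and `filter_outliers` are hypothetical names, and the row-major score table (one row per template, one column per class average) is an assumed layout chosen for the example.

```c
#include <stddef.h>

/* Criterion (1): the score against the template's own class average
 * is below a minimum.  scores has one entry per class average. */
int outlier_by_threshold(const double *scores, size_t own_class,
                         double min_score)
{
    return scores[own_class] < min_score;
}

/* Criterion (2): the average of some other class matches the template
 * better than the average of its own class. */
int outlier_by_rank(const double *scores, size_t nclasses, size_t own_class)
{
    size_t c;
    for (c = 0; c < nclasses; c++) {
        if (c != own_class && scores[c] > scores[own_class])
            return 1;
    }
    return 0;
}

/* Writes the indices of the templates that are NOT outliers into keep[]
 * and returns how many were kept.  scores is an ntempl x nclasses
 * row-major table; labels[t] is the class index of template t.
 * use_rank selects criterion (2); otherwise criterion (1) is applied. */
size_t filter_outliers(const double *scores, const size_t *labels,
                       size_t ntempl, size_t nclasses,
                       double min_score, int use_rank, size_t *keep)
{
    size_t t, nkept = 0;
    for (t = 0; t < ntempl; t++) {
        const double *row = scores + t * nclasses;
        int bad = use_rank
            ? outlier_by_rank(row, nclasses, labels[t])
            : outlier_by_threshold(row, labels[t], min_score);
        if (!bad)
            keep[nkept++] = t;
    }
    return nkept;
}
```

The kept indices would then select which pix to copy from the original labeled pixa into the new generating pixa, in line with the rule that a recog is rebuilt from its pixa rather than edited in place.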
Definition in file recog.h.
anonymous enum |