Leptonica  1.82.0
Image processing and image analysis suite
flipdetect.c File Reference
#include <math.h>
#include "allheaders.h"

Functions

static void pixDebugFlipDetect (const char *filename, PIX *pixs, PIX *pixhm, l_int32 enable)
 
PIX * pixOrientCorrect (PIX *pixs, l_float32 minupconf, l_float32 minratio, l_float32 *pupconf, l_float32 *pleftconf, l_int32 *protation, l_int32 debug)
 
l_ok pixOrientDetect (PIX *pixs, l_float32 *pupconf, l_float32 *pleftconf, l_int32 mincount, l_int32 debug)
 
l_ok makeOrientDecision (l_float32 upconf, l_float32 leftconf, l_float32 minupconf, l_float32 minratio, l_int32 *porient, l_int32 debug)
 
l_ok pixUpDownDetect (PIX *pixs, l_float32 *pconf, l_int32 mincount, l_int32 npixels, l_int32 debug)
 
l_ok pixMirrorDetect (PIX *pixs, l_float32 *pconf, l_int32 mincount, l_int32 debug)
 

Variables

static const char * textsel1
 
static const char * textsel2
 
static const char * textsel3
 
static const char * textsel4
 
static const l_int32 DefaultMinUpDownCount = 70
 
static const l_float32 DefaultMinUpDownConf = 8.0
 
static const l_float32 DefaultMinUpDownRatio = 2.5
 
static const l_int32 DefaultMinMirrorFlipCount = 100
 
static const l_float32 DefaultMinMirrorFlipConf = 5.0
 

Detailed Description


     High-level interface for detection and correction
         PIX         *pixOrientCorrect()

     Page orientation detection (pure rotation by 90 degree increments):
         l_ok         pixOrientDetect()
         l_ok         makeOrientDecision()
         l_ok         pixUpDownDetect()

     Page mirror detection (flip 180 degrees about line in plane of image):
         l_ok         pixMirrorDetect()

     Static debug helper
         static void  pixDebugFlipDetect()

 ===================================================================

 Page transformation detection:

 Once a page is deskewed, there are 8 possible states that it
 can be in, shown symbolically below.  Suppose state 0 is correct.

     0: correct     1          2          3
     +------+   +------+   +------+   +------+
     | **** |   | *    |   | **** |   |    * |
     | *    |   | *    |   |    * |   |    * |
     | *    |   | **** |   |    * |   | **** |
     +------+   +------+   +------+   +------+

        4          5          6          7
     +-----+    +-----+    +-----+    +-----+
     | *** |    |   * |    | *** |    | *   |
     |   * |    |   * |    | *   |    | *   |
     |   * |    |   * |    | *   |    | *   |
     |   * |    | *** |    | *   |    | *** |
     +-----+    +-----+    +-----+    +-----+

 Each of the other seven can be derived from state 0 by applying some
 combination of a 90 degree clockwise rotation, a flip about
 a horizontal line, and a flip about a vertical line,
 all abbreviated as:
     R = Rotation (about a line perpendicular to the image)
     H = Horizontal flip (about a vertical line in the plane of the image)
     V = Vertical flip (about a horizontal line in the plane of the image)

 We get these transformations:
     RHV
     000  -> 0
     001  -> 1
     010  -> 2
     011  -> 3
     100  -> 4
     101  -> 5
     110  -> 6
     111  -> 7

 Note that in four of these, the sum of H and V is 1 (odd).
 For these four, we have a change in parity (handedness) of
 the image, and the transformation cannot be performed by
 rotation about an axis perpendicular to the page.  Under
 rotation R, the set of 8 transformations decomposes into
 two disjoint sets, {0, 3, 4, 7} and {1, 2, 5, 6}; rotation
 only moves a state to another state within the same set.
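
 The parity bookkeeping above can be sketched in C.  This is a minimal
 illustration, assuming the state index is read as the 3-bit number RHV
 from the table (R is the high bit).  The function names are hypothetical
 and are not part of Leptonica.

```c
#include <stdio.h>

/* Hypothetical helpers: decode a state index 0..7 into its R, H and V
 * bits, following the RHV table (R is the most significant bit). */
int state_r(int state) { return (state >> 2) & 1; }
int state_h(int state) { return (state >> 1) & 1; }
int state_v(int state) { return state & 1; }

/* Parity (handedness) changes when exactly one of H and V is applied,
 * i.e. when H + V is odd. */
int parity_changed(int state)
{
    return state_h(state) ^ state_v(state);
}

/* Print each state with its bits and whether it is a mirror image. */
void print_states(void)
{
    int s;
    for (s = 0; s < 8; s++)
        printf("state %d: R=%d H=%d V=%d  %s\n", s, state_r(s),
               state_h(s), state_v(s),
               parity_changed(s) ? "parity changed" : "pure rotation");
}
```

 Calling parity_changed() on states 0 through 7 marks 1, 2, 5 and 6 as
 parity-changing, which matches the second set listed above.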

 pixOrientDetect() tests for a pure rotation (0, 90, 180, 270 degrees).
 It doesn't change parity.

 pixMirrorDetect() tests for a horizontal flip about the vertical axis.
 It changes parity.

 The landscape/portrait rotation can be detected in two ways:

   (1) Compute the deskew confidence for an image segment,
       both as is and rotated 90 degrees  (see skew.c).

   (2) Compute the ascender/descender signal for the image,
       both as is and rotated 90 degrees  (implemented here).

 The ascender/descender signal is useful for determining text
 orientation in Roman alphabets because letters with straight-line
 ascenders (b, d, h, k, l, 't') outnumber those with descenders
 ('g', p, q).  The letters 't' and 'g' will respond variably to
 the filter, depending on the typeface.

 What about the mirror image situations?  These aren't common
 unless you're dealing with film, for example.
 But you can reliably test if the image has undergone a
 parity-changing flip once about some axis in the plane
 of the image, using pixMirrorDetect().  This ostensibly works by
 counting the characters with ascenders whose bodies extend to
 the left or to the right of the ascender.  Characters
 that are not mirror flipped are more likely to extend to the
 right (b, h, k) than to the left (d).  Of course, that is for
 text that is rightside-up.  So before you apply the mirror
 test, it is necessary to ensure that the text has the ascenders
 going up, and not down or to the left or right.  But here's
 what *really* happens.  It turns out that the pre-filtering before
 the hit-miss transform (HMT) is crucial, and surprisingly, when
 the pre-filtering is chosen to generate a large signal, the majority
 of the signal comes from open regions of common lower-case
 letters such as 'e', 'c' and 'f'.

 The set of operations you actually use depends on your prior knowledge:

 (1) If the page is known to be either rightside-up or upside-down, use
     either pixOrientDetect() with pleftconf = NULL, or
     pixUpDownDetect().

 (2) If any of the four orientations is possible, use pixOrientDetect().

 (3) If the text is horizontal and rightside-up, the only remaining
     degree of freedom is a left-right mirror flip: use pixMirrorDetect().

 (4) If you have a relatively large number of digits on the page,
     use the slower pixUpDownDetect() with npixels > 0.

 We summarize the full orientation and mirror flip detection process:

 (1) First determine which of the four 90 degree rotations
     causes the text to be rightside-up.  This can be done
     with either skew confidence or the pixOrientDetect()
     signals.  For the latter, see the table for pixOrientDetect().

 (2) Then, with ascenders pointing up, apply pixMirrorDetect().
     In the normal situation the confidence will be
     large and positive.  However, if mirror flipped, the
     confidence will be large and negative.

 A high-level interface, pixOrientCorrect(), combines the detection
 of the orientation with the rotation decision and the rotation itself.

 For pedagogical reasons, we have included a dwa implementation of
 this functionality, in flipdetectdwa.c.notused.  It shows by example
 how to make a dwa implementation of an application that uses binary
 morphological operations.  It is faster than the rasterop implementation,
 but not by a large amount.

 Finally, use can be made of programs such as exiftool and convert to
 read exif camera orientation data in jpeg files and conditionally rotate.
 Here is an example shell script, made by Dan9er:
 ==================================================================
 #!/bin/bash
 #   orientByExif.sh
 #   Dependencies: exiftool (exiflib) and convert (ImageMagick)
 #   Note: if there is no exif orientation data in the jpeg file,
 #         this simply copies the input file.
 #
 if [[ -z $(command -v exiftool) || -z $(command -v convert) ]]; then
     echo "You need to install dependencies; e.g.:"
     echo "   sudo apt install libimage-exiftool-perl"
     echo "   sudo apt install imagemagick"
     exit 1
 fi
 if [[ $# != 2 ]]; then
     echo "Syntax: orientByExif infile outfile"
     exit 2
 fi
 if [[ ${1: -4} != ".jpg" ]]; then
     echo "File is not a jpeg"
     exit 3
 fi
 if [[ $(exiftool -s3 -n -Orientation "$1") = 1 ]]; then
     echo "Image is already upright"
     exit 0
 fi
 convert "$1" -auto-orient "$2"
 echo "Done"
 exit 0
 ==================================================================

Definition in file flipdetect.c.

Function Documentation

◆ makeOrientDecision()

l_ok makeOrientDecision ( l_float32  upconf,
l_float32  leftconf,
l_float32  minupconf,
l_float32  minratio,
l_int32 *  porient,
l_int32  debug 
)

makeOrientDecision()

Parameters
[in]   upconf     nonzero
[in]   leftconf   nonzero
[in]   minupconf  minimum value for which a decision can be made
[in]   minratio   minimum conf ratio required for a decision
[out]  porient    text orientation enum {0,1,2,3,4}
[in]   debug      1 for debug output; 0 otherwise
Returns
0 if OK, 1 on error
Notes:
     (1) This can be run after pixOrientDetect()
     (2) Both upconf and leftconf must be nonzero; otherwise the
         orientation cannot be determined.
     (3) The abs values of the input confidences are compared to
         minupconf.
     (4) The abs value of the larger of (upconf/leftconf) and
         (leftconf/upconf) is compared with minratio.
     (5) Input 0.0 for the default values for minupconf and minratio.
     (6) The return value of orient is interpreted thus:
           L_TEXT_ORIENT_UNKNOWN:  not enough evidence to determine
           L_TEXT_ORIENT_UP:       text rightside-up
           L_TEXT_ORIENT_LEFT:     landscape, text up facing left
           L_TEXT_ORIENT_DOWN:     text upside-down
           L_TEXT_ORIENT_RIGHT:    landscape, text up facing right
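
The notes above can be sketched as a small decision routine. This is an illustration written from the notes alone, not the library's code: the enum and function names are hypothetical, the defaults 8.0 and 2.5 are taken from DefaultMinUpDownConf and DefaultMinUpDownRatio listed on this page, and the use of strict inequalities is an assumption.

```c
/* Hypothetical stand-ins for the orientation enum {0,1,2,3,4}. */
enum sketch_orient {
    SKETCH_ORIENT_UNKNOWN = 0,  /* not enough evidence to determine */
    SKETCH_ORIENT_UP = 1,       /* text rightside-up */
    SKETCH_ORIENT_LEFT = 2,     /* landscape, text up facing left */
    SKETCH_ORIENT_DOWN = 3,     /* text upside-down */
    SKETCH_ORIENT_RIGHT = 4     /* landscape, text up facing right */
};

/* Returns 0 if OK, 1 on error (null output pointer). */
int sketch_orient_decision(float upconf, float leftconf,
                           float minupconf, float minratio, int *porient)
{
    float absup, absleft;

    if (!porient) return 1;
    *porient = SKETCH_ORIENT_UNKNOWN;

    /* Note (2): with a zero confidence, no decision can be made. */
    if (upconf == 0.0f || leftconf == 0.0f) return 0;

    /* Note (5): 0.0 selects the default thresholds. */
    if (minupconf == 0.0f) minupconf = 8.0f;
    if (minratio == 0.0f) minratio = 2.5f;

    absup = upconf < 0.0f ? -upconf : upconf;
    absleft = leftconf < 0.0f ? -leftconf : leftconf;

    /* Notes (3) and (4): the dominant confidence must be large enough
     * in absolute value and dominate the other by the ratio. */
    if (absup > minupconf && absup > minratio * absleft)
        *porient = upconf > 0.0f ? SKETCH_ORIENT_UP : SKETCH_ORIENT_DOWN;
    else if (absleft > minupconf && absleft > minratio * absup)
        *porient = leftconf > 0.0f ? SKETCH_ORIENT_LEFT
                                   : SKETCH_ORIENT_RIGHT;
    return 0;
}
```

For instance, with the default thresholds, upconf = 12.0 and leftconf = 1.5 give the rightside-up value, because 12.0 exceeds 8.0 and also exceeds 2.5 * 1.5. With upconf = 9.0 and leftconf = 8.5 neither confidence dominates, so the result stays unknown.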

Definition at line 464 of file flipdetect.c.

References L_TEXT_ORIENT_UNKNOWN.

◆ pixMirrorDetect()

l_ok pixMirrorDetect ( PIX *  pixs,
l_float32 *  pconf,
l_int32  mincount,
l_int32  debug 
)

pixMirrorDetect()

Parameters
[in]   pixs      1 bpp, deskewed, English text
[out]  pconf     confidence that text is not LR mirror reversed
[in]   mincount  min number of left + right; use 0 for default
[in]   debug     1 for debug output; 0 otherwise
Returns
0 if OK, 1 on error
Notes:
     (1) For this test, it is necessary that the text is horizontally
         oriented, with ascenders going up.
     (2) conf is the normalized difference between the number of
         right and left facing characters with ascenders.
         Left-facing are {d}; right-facing are {b, h, k}.
         At least that was the expectation.  In practice, we can
         really just say that it is the normalized difference in
         hits using two specific hit-miss filters, textsel1 and textsel2,
         after the image has been suitably pre-filtered so that
         these filters are effective.  See (4) for what's really happening.
     (3) A large positive conf value indicates normal text, whereas
         a large negative conf value means the page is mirror reversed.
     (4) The implementation is a bit tricky.  The general idea is
         to fill the x-height part of characters, but not the space
         between them, before doing the HMT.  This is done by
         finding pixels added using two different operations -- a
         horizontal close and a vertical dilation -- and adding
         the intersection of these sets to the original.  It turns
         out that the original intuition about the signal was largely
         in error: much of the signal for right-facing characters
         comes from the lower part of common x-height characters, like
         the e and c, that remain open after these operations.
         So it's important that the operations to close the x-height
         parts of the characters are purposely weakened sufficiently
         to allow these characters to remain open.  The wonders
         of morphology!

Definition at line 718 of file flipdetect.c.

◆ pixOrientCorrect()

PIX* pixOrientCorrect ( PIX *  pixs,
l_float32  minupconf,
l_float32  minratio,
l_float32 *  pupconf,
l_float32 *  pleftconf,
l_int32 *  protation,
l_int32  debug 
)

pixOrientCorrect()

Parameters
[in]   pixs       1 bpp, deskewed, English text, 150 - 300 ppi
[in]   minupconf  minimum value for which a decision can be made
[in]   minratio   minimum conf ratio required for a decision
[out]  pupconf    [optional]; use NULL to skip
[out]  pleftconf  [optional]; use NULL to skip
[out]  protation  [optional]; use NULL to skip
[in]   debug      1 for debug output; 0 otherwise
Returns
pixd may be rotated by 90, 180 or 270; null on error
Notes:
     (1) Simple top-level function to detect if Roman text is in
         reading orientation, and to rotate the image accordingly if not.
     (2) Returns a copy if no rotation is needed.
     (3) See notes for pixOrientDetect() and makeOrientDecision().
         Use 0.0 for default values for minupconf and minratio.
     (4) Optional output of intermediate confidence results and
         the rotation performed on pixs.

Definition at line 274 of file flipdetect.c.

◆ pixOrientDetect()

l_ok pixOrientDetect ( PIX *  pixs,
l_float32 *  pupconf,
l_float32 *  pleftconf,
l_int32  mincount,
l_int32  debug 
)

pixOrientDetect()

Parameters
[in]   pixs       1 bpp, deskewed, English text, 150 - 300 ppi
[out]  pupconf    [optional]; may be NULL
[out]  pleftconf  [optional]; may be NULL
[in]   mincount   min number of up + down; use 0 for default
[in]   debug      1 for debug output; 0 otherwise
Returns
0 if OK, 1 on error
Notes:
     (1) See "Measuring document image skew and orientation"
         Dan S. Bloomberg, Gary E. Kopec and Lakshmi Dasari
         IS&T/SPIE EI'95, Conference 2422: Document Recognition II
         pp 302-316, Feb 6-7, 1995, San Jose, CA
     (2) upconf is the normalized difference between up ascenders
         and down ascenders.  The image is analyzed without rotation
         for being rightside-up or upside-down.  Set pupconf to NULL
         to skip this operation.
     (3) leftconf is the normalized difference between up ascenders
         and down ascenders in the image after it has been
         rotated 90 degrees clockwise.  With that rotation, ascenders
         projecting to the left in the source image will project up
         in the rotated image.  Set pleftconf to NULL to skip this
         operation.
     (4) Note that upconf and leftconf are not linear measures of
         confidence, e.g., in a range between 0 and 100.  They
         measure how far you are out on the tail of a (presumably)
         normal distribution.  For example, a confidence of 10 means
         that it is nearly certain that the difference did not
         happen at random.  However, these values must be interpreted
         cautiously, taking into consideration the estimated prior
         for a particular orientation or mirror flip.   The up-down
         signal is very strong if applied to text with ascenders
         up and down, and relatively weak for text at 90 degrees,
         but even at 90 degrees, the difference can look significant.
         For example, suppose the ascenders are oriented horizontally,
         but the test is done vertically.  Then upconf can
         be < -DefaultMinUpDownConf, suggesting the text may be
         upside-down.  However, if instead the test were done
         horizontally, leftconf would be very much larger
         (in absolute value), giving the correct orientation.
     (5) If you compute both upconf and leftconf, and there is
         sufficient signal, the following table determines the
         cw angle necessary to rotate pixs so that the text is
         rightside-up:
            0 deg :           upconf >> 1,    abs(upconf) >> abs(leftconf)
            90 deg :          leftconf >> 1,  abs(leftconf) >> abs(upconf)
            180 deg :         upconf << -1,   abs(upconf) >> abs(leftconf)
            270 deg :         leftconf << -1, abs(leftconf) >> abs(upconf)
     (6) One should probably not interpret the direction unless
         there are a sufficient number of counts for both orientations,
         in which case neither upconf nor leftconf will be 0.0.
     (7) This algorithm will fail on some images, such as tables,
         where most of the characters are numbers and appear as
         uppercase, but there are some repeated words that give a
         biased signal.  It may be advisable to run a table detector
         first (e.g., pixDecideIfTable()), and not run the orientation
         detector if it is a table.
     (8) Uses rasterop implementation of HMT.
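
Note (4) describes the confidences as distances out on the tail of a roughly normal distribution. One normalization with that property, assumed here for illustration and not stated on this page, divides the difference in hit counts by the square root of their sum. If up and down hits were equally likely, the difference would have a standard deviation of sqrt(nup + ndown), so the quotient behaves like a z-score. The function names are hypothetical, the library's scale factor may differ, and the default of 70 is DefaultMinUpDownCount from the variable list.

```c
/* Newton's method square root for x >= 0.  It is used only to keep this
 * sketch free of libm; sqrt() from <math.h> is the usual choice. */
static double sketch_sqrt(double x)
{
    double r = x > 1.0 ? x : 1.0;
    int i;

    if (x <= 0.0) return 0.0;
    for (i = 0; i < 60; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Hypothetical helper: z-score-like confidence from hit counts.
 * Returns 0.0 when there are too few hits to say anything. */
double sketch_updown_conf(int nup, int ndown, int mincount)
{
    int total = nup + ndown;

    if (mincount <= 0) mincount = 70;   /* DefaultMinUpDownCount */
    if (total < mincount) return 0.0;
    return (double)(nup - ndown) / sketch_sqrt((double)total);
}
```

With 75 up hits and 25 down hits this gives 50 / sqrt(100) = 5.0, and swapping the counts gives -5.0. With only 50 hits in total and the default mincount, the result is 0.0, which is the "insufficient counts" case that note (6) warns about.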

Definition at line 405 of file flipdetect.c.

◆ pixUpDownDetect()

l_ok pixUpDownDetect ( PIX *  pixs,
l_float32 *  pconf,
l_int32  mincount,
l_int32  npixels,
l_int32  debug 
)

pixUpDownDetect()

Parameters
[in]   pixs      1 bpp, deskewed, English text, 150 - 300 ppi
[out]  pconf     confidence that text is rightside-up
[in]   mincount  min number of up + down; use 0 for default
[in]   npixels   number of pixels removed from each side of word box
[in]   debug     1 for debug output; 0 otherwise
Returns
0 if OK, 1 on error
Notes:
     (1) See pixOrientDetect() for other details.
     (2) The detected confidence conf is the normalized difference
         between the number of detected up and down ascenders,
         assuming that the text is either rightside-up or upside-down
         and not rotated at a 90 degree angle.
     (3) The typical mode of operation is npixels == 0.
         If npixels > 0, this removes HMT matches at the
         beginning and ending of "words."  This is useful for
         pages that may have mostly digits, because if npixels == 0,
         leading "1" and "3" digits can register as having
         ascenders or descenders, and "7" digits can match descenders.
         Consequently, a page image of only digits may register
         as being upside-down.
     (4) We want to count the number of instances found using the HMT.
         An expensive way to do this would be to count the
         number of connected components.  A cheap way is to do a rank
         reduction cascade that reduces each component to a single
         pixel, and results (after two or three 2x reductions)
         in one pixel for each of the original components.
         After the reduction, you have a much smaller pix over
         which to count pixels.  We do only 2 reductions, because
         this function is designed to work for input pix between
         150 and 300 ppi, and an 8x reduction on a 150 ppi image
         is going too far -- components will get merged.
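
The counting trick in note (4) can be illustrated on a toy image. The sketch below is a stand-alone approximation, not the library's implementation: it stores one pixel per byte in a fixed 16 x 16 array, and all names are hypothetical. A rank-1 2x reduction turns an output pixel ON if any pixel in the matching 2x2 input block is ON, so two such reductions collapse each small component that lies inside one 4x4 cell into a single pixel. A component that straddles a cell boundary yields more than one pixel, so the count is approximate in general.

```c
#define SK_DIM 16   /* toy image is SK_DIM x SK_DIM; a multiple of 4 */

/* Rank-1 2x reduction: dst pixel is ON if any pixel in the matching
 * 2x2 block of src is ON.  w and h must be even. */
void sketch_reduce_rank1_2x(const unsigned char *src, int w, int h,
                            unsigned char *dst)
{
    int x, y;
    for (y = 0; y < h / 2; y++) {
        for (x = 0; x < w / 2; x++) {
            const unsigned char *top = src + (2 * y) * w + 2 * x;
            const unsigned char *bot = top + w;
            dst[y * (w / 2) + x] =
                (unsigned char)((top[0] | top[1] | bot[0] | bot[1]) != 0);
        }
    }
}

/* Count ON pixels in a w x h image. */
int sketch_count_pixels(const unsigned char *pix, int w, int h)
{
    int i, n = 0;
    for (i = 0; i < w * h; i++)
        n += pix[i] != 0;
    return n;
}

/* Two rank-1 reductions (4x overall), then count the surviving pixels. */
int sketch_count_after_cascade(const unsigned char *src)
{
    unsigned char half[(SK_DIM / 2) * (SK_DIM / 2)];
    unsigned char quarter[(SK_DIM / 4) * (SK_DIM / 4)];

    sketch_reduce_rank1_2x(src, SK_DIM, SK_DIM, half);
    sketch_reduce_rank1_2x(half, SK_DIM / 2, SK_DIM / 2, quarter);
    return sketch_count_pixels(quarter, SK_DIM / 4, SK_DIM / 4);
}
```

For an image with three small, well-separated blobs totalling six ON pixels, sketch_count_pixels() on the input reports 6 while sketch_count_after_cascade() reports 3, one per blob, at the cost of scanning an image with 16 times fewer pixels.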

Definition at line 558 of file flipdetect.c.

Variable Documentation

◆ textsel1

const char* textsel1
static
Initial value:
= "x  oo "
  "x oOo "
  "x  o  "
  "x     "
  "xxxxxx"

Definition at line 209 of file flipdetect.c.

◆ textsel2

const char* textsel2
static
Initial value:
= " oo  x"
  " oOo x"
  "  o  x"
  "     x"
  "xxxxxx"

Definition at line 215 of file flipdetect.c.

◆ textsel3

const char* textsel3
static
Initial value:
= "xxxxxx"
  "x     "
  "x  o  "
  "x oOo "
  "x  oo "

Definition at line 221 of file flipdetect.c.

◆ textsel4

const char* textsel4
static
Initial value:
= "xxxxxx"
  "     x"
  "  o  x"
  " oOo x"
  " oo  x"

Definition at line 227 of file flipdetect.c.