static void pixDebugFlipDetect(const char *filename, PIX *pixs, PIX *pixhm, l_int32 enable)
PIX *pixOrientCorrect(PIX *pixs, l_float32 minupconf, l_float32 minratio, l_float32 *pupconf, l_float32 *pleftconf, l_int32 *protation, l_int32 debug)
l_ok pixOrientDetect(PIX *pixs, l_float32 *pupconf, l_float32 *pleftconf, l_int32 mincount, l_int32 debug)
l_ok makeOrientDecision(l_float32 upconf, l_float32 leftconf, l_float32 minupconf, l_float32 minratio, l_int32 *porient, l_int32 debug)
l_ok pixUpDownDetect(PIX *pixs, l_float32 *pconf, l_int32 mincount, l_int32 npixels, l_int32 debug)
l_ok pixMirrorDetect(PIX *pixs, l_float32 *pconf, l_int32 mincount, l_int32 debug)

High-level interface for detection and correction
    PIX *pixOrientCorrect()
Page orientation detection (pure rotation by 90 degree increments):
    l_ok pixOrientDetect()
    l_ok makeOrientDecision()
    l_ok pixUpDownDetect()
Page mirror detection (flip 180 degrees about line in plane of image):
    l_ok pixMirrorDetect()
Static debug helper
    static void pixDebugFlipDetect()
===================================================================
Page transformation detection:
Once a page is deskewed, there are 8 possible states that it
can be in, shown symbolically below. Suppose state 0 is correct.
0: correct     1          2          3
+------+   +------+   +------+   +------+
| **** |   | *    |   | **** |   |    * |
| *    |   | *    |   |    * |   |    * |
| *    |   | **** |   |    * |   | **** |
+------+   +------+   +------+   +------+

   4          5          6          7
+-----+    +-----+    +-----+    +-----+
| *** |    |   * |    | *** |    | *   |
|   * |    |   * |    | *   |    | *   |
|   * |    |   * |    | *   |    | *   |
|   * |    | *** |    | *   |    | *** |
+-----+    +-----+    +-----+    +-----+
Each of the other seven can be derived from state 0 by applying some
combination of a 90 degree clockwise rotation, a flip about
a horizontal line, and a flip about a vertical line,
all abbreviated as:
R = Rotation (about a line perpendicular to the image)
H = Horizontal flip (about a vertical line in the plane of the image)
V = Vertical flip (about a horizontal line in the plane of the image)
We get these transformations:
RHV
000 -> 0
001 -> 1
010 -> 2
011 -> 3
100 -> 4
101 -> 5
110 -> 6
111 -> 7
Note that in four of these, the sum of H and V is 1 (odd).
For these four, we have a change in parity (handedness) of
the image, and the transformation cannot be performed by
rotation about a line perpendicular to the page. Under
rotation R, the set of 8 transformations decomposes into
two sets, {0, 3, 4, 7} and {1, 2, 5, 6}, each of which is mapped onto itself.
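The parity rule above can be checked mechanically. The following is a minimal sketch, assuming a state index is read as the three bits R, H, V from the table. The function names are hypothetical and are not part of Leptonica.

```c
#include <stdio.h>

/* Hypothetical helpers for illustration; they are not part of Leptonica.
 * A state index 0..7 is read as three bits: R = 4, H = 2, V = 1. */
int stateHasRotation(int state) { return (state >> 2) & 1; }
int stateHasHFlip(int state)    { return (state >> 1) & 1; }
int stateHasVFlip(int state)    { return state & 1; }

/* Returns 1 if H + V is odd (mirror image of state 0), else 0. */
int stateChangesParity(int state)
{
    return stateHasHFlip(state) ^ stateHasVFlip(state);
}

/* Prints the RHV bits and the parity class of each of the 8 states. */
void printStateTable(void)
{
    int s;
    for (s = 0; s < 8; s++) {
        printf("%d: R=%d H=%d V=%d -> %s\n", s,
               stateHasRotation(s), stateHasHFlip(s), stateHasVFlip(s),
               stateChangesParity(s) ? "parity changed" : "pure rotation");
    }
}
```

Calling printStateTable() lists states 0, 3, 4 and 7 as pure rotations and states 1, 2, 5 and 6 as parity-changing, which matches the two sets given above.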
pixOrientDetect() tests for a pure rotation (0, 90, 180, 270 degrees).
It doesn't change parity.
pixMirrorDetect() tests for a horizontal flip about the vertical axis.
It changes parity.
The landscape/portrait rotation can be detected in two ways:
(1) Compute the deskew confidence for an image segment,
both as is and rotated 90 degrees (see skew.c).
(2) Compute the ascender/descender signal for the image,
both as is and rotated 90 degrees (implemented here).
The ascender/descender signal is useful for determining text
orientation in Roman alphabets because letters with
straight-line ascenders (b, d, h, k, l, 't') outnumber
those with descenders ('g', p, q). The letters 't' and 'g'
will respond variably to the filter, depending on the type face.
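The letter-frequency argument can be illustrated on plain text. This is only a toy sketch: it counts characters in a string, whereas the detector works on image pixels with a hit-miss transform. The function names are hypothetical, and 't' and 'g' are left out of both sets because their response is said to vary with the type face.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the characters of text that occur in set. */
size_t countLettersInSet(const char *text, const char *set)
{
    size_t n = 0;
    for (; *text != '\0'; text++) {
        if (strchr(set, *text) != NULL)
            n++;
    }
    return n;
}

/* Ascender count minus descender count, using the unambiguous letters
 * listed above; a positive value means ascenders dominate. */
long ascenderMinusDescender(const char *text)
{
    return (long)countLettersInSet(text, "bdhkl")
         - (long)countLettersInSet(text, "pq");
}
```

For ordinary lower-case English text the result is expected to be positive, which is the asymmetry the up/down signal relies on.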
What about the mirror image situations? These aren't common
unless you're dealing with film, for example.
But you can reliably test if the image has undergone a
parity-changing flip once about some axis in the plane
of the image, using pixMirrorDetect(). This works ostensibly by
counting the characters whose bodies stick out to the left and
to the right of the ascender. Characters that are not mirror
flipped are more likely to extend to the right (b, h, k) than
to the left (d). Of course, that is for text that is
rightside-up. So before you apply the mirror test, it is
necessary to ensure that the text has the ascenders going up,
and not down or to the left or right. But here's
what *really* happens. It turns out that the pre-filtering before
the hit-miss transform (HMT) is crucial, and surprisingly, when
the pre-filtering is chosen to generate a large signal, the majority
of the signal comes from open regions of common lower-case
letters such as 'e', 'c' and 'f'.
The set of operations you actually use depends on your prior knowledge:
(1) If the page is known to be either rightside-up or upside-down, use
either pixOrientDetect() with pleftconf = NULL, or
pixUpDownDetect().
(2) If any of the four orientations are possible, use pixOrientDetect().
(3) If the text is horizontal and rightside-up, the only remaining
degree of freedom is a left-right mirror flip: use pixMirrorDetect().
(4) If the page has relatively many numbers,
use the slower pixUpDownDetect().
We summarize the full orientation and mirror flip detection process:
(1) First determine which of the four 90 degree rotations
causes the text to be rightside-up. This can be done
with either skew confidence or the pixOrientDetect()
signals. For the latter, see the table for pixOrientDetect().
(2) Then, with ascenders pointing up, apply pixMirrorDetect().
In the normal situation the confidence will be
large and positive. However, if mirror flipped, the
confidence will be large and negative.
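The sign convention in step (2) can be sketched as a small classifier. The enum, the function name and the minconf threshold are all hypothetical. The source only says "large and positive" or "large and negative", so the caller has to choose what counts as large.

```c
/* Hypothetical result type for illustration; not part of Leptonica. */
typedef enum {
    MIRROR_UNKNOWN = 0,   /* confidence too small to decide */
    MIRROR_NORMAL  = 1,   /* large positive confidence */
    MIRROR_FLIPPED = 2    /* large negative confidence */
} MirrorState;

/* Classifies a mirror confidence for text that is already rightside-up.
 * minconf is an assumed, caller-chosen threshold (must be >= 0). */
MirrorState classifyMirrorConf(float conf, float minconf)
{
    if (conf > minconf)
        return MIRROR_NORMAL;
    if (conf < -minconf)
        return MIRROR_FLIPPED;
    return MIRROR_UNKNOWN;
}
```

The result is only meaningful after step (1), because the mirror signal assumes the ascenders point up.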
A high-level interface, pixOrientCorrect(), combines the detection
of the orientation with the rotation decision and the rotation itself.
For pedagogical reasons, we have included a dwa implementation of
this functionality, in flipdetectdwa.c.notused. It shows by example
how to make a dwa implementation of an application that uses binary
morphological operations. It is faster than the rasterop implementation,
but not by a large amount.
Finally, use can be made of programs such as exiftool and convert to
read exif camera orientation data in jpeg files and conditionally rotate.
Here is an example shell script, made by Dan9er:
==================================================================
#!/bin/bash
# orientByExif.sh
# Dependencies: exiftool (exiflib) and convert (ImageMagick)
# Note: if there is no exif orientation data in the jpeg file,
# this simply copies the input file.
# Uses bash features ([[ ]] and substring expansion), so it must be
# run with bash rather than a plain POSIX sh.
#
if [[ -z $(command -v exiftool) || -z $(command -v convert) ]]; then
    echo "You need to install dependencies; e.g.:"
    echo " sudo apt install libimage-exiftool-perl"
    echo " sudo apt install imagemagick"
    exit 1
fi
if [[ $# != 2 ]]; then
    echo "Syntax: orientByExif infile outfile"
    exit 2
fi
if [[ ${1: -4} != ".jpg" ]]; then
    echo "File is not a jpeg"
    exit 3
fi
if [[ $(exiftool -s3 -n -Orientation "$1") = 1 ]]; then
    echo "Image is already upright"
    exit 0
fi
convert "$1" -auto-orient "$2"
echo "Done"
exit 0
==================================================================
Definition in file flipdetect.c.
l_ok pixOrientDetect(PIX *pixs, l_float32 *pupconf, l_float32 *pleftconf, l_int32 mincount, l_int32 debug)
pixOrientDetect()
Parameters
[in]  pixs       1 bpp, deskewed, English text, 150 - 300 ppi
[out] pupconf    [optional] may be NULL
[out] pleftconf  [optional] may be NULL
[in]  mincount   min number of up + down; use 0 for default
[in]  debug      1 for debug output; 0 otherwise
Returns
0 if OK, 1 on error
Notes:
(1) See "Measuring document image skew and orientation"
Dan S. Bloomberg, Gary E. Kopec and Lakshmi Dasari
IS&T/SPIE EI'95, Conference 2422: Document Recognition II
pp 302-316, Feb 6-7, 1995, San Jose, CA
(2) upconf is the normalized difference between up ascenders
and down ascenders. The image is analyzed without rotation
for being rightside-up or upside-down. Pass NULL for pupconf
to skip this operation.
(3) leftconf is the normalized difference between up ascenders
and down ascenders in the image after it has been
rotated 90 degrees clockwise. With that rotation, ascenders
projecting to the left in the source image project up
in the rotated image. Pass NULL for pleftconf to skip
this operation.
(4) Note that upconf and leftconf are not linear measures of
confidence, e.g., in a range between 0 and 100. They
measure how far you are out on the tail of a (presumably)
normal distribution. For example, a confidence of 10 means
that it is nearly certain that the difference did not
happen at random. However, these values must be interpreted
cautiously, taking into consideration the estimated prior
for a particular orientation or mirror flip. The up-down
signal is very strong if applied to text with ascenders
up and down, and relatively weak for text at 90 degrees,
but even at 90 degrees, the difference can look significant.
For example, suppose the ascenders are oriented horizontally,
but the test is done vertically. Then upconf can
be < -MIN_CONF_FOR_UP_DOWN, suggesting the text may be
upside-down. However, if instead the test were done
horizontally, leftconf will be very much larger
(in absolute value), giving the correct orientation.
(5) If you compute both upconf and leftconf, and there is
sufficient signal, the following table determines the
cw angle necessary to rotate pixs so that the text is
rightside-up:
0 deg : upconf >> 1, abs(upconf) >> abs(leftconf)
90 deg : leftconf >> 1, abs(leftconf) >> abs(upconf)
180 deg : upconf << -1, abs(upconf) >> abs(leftconf)
270 deg : leftconf << -1, abs(leftconf) >> abs(upconf)
(6) One should probably not interpret the direction unless
there are a sufficient number of counts for both orientations,
in which case neither upconf nor leftconf will be 0.0.
(7) This algorithm will fail on some images, such as tables,
where most of the characters are numbers and appear as
uppercase, but there are some repeated words that give a
biased signal. It may be advisable to run a table detector
first (e.g., pixDecideIfTable()), and not run the orientation
detector if it is a table.
(8) Uses rasterop implementation of HMT.
Definition at line 405 of file flipdetect.c.
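The table in note (5) and the caution in note (6) can be sketched as a decision function. This is an illustration under stated assumptions, not the library's makeOrientDecision(): the function name is hypothetical, and the two thresholds are assumed stand-ins for ">> 1" and for "abs(a) >> abs(b)".

```c
/* Hypothetical sketch; the name and thresholds are illustrative and are
 * not Leptonica's. Returns the clockwise rotation in degrees (0, 90, 180
 * or 270) that makes the text rightside-up, or -1 if undecided.
 *   minconf:  assumed minimum absolute confidence of the winning signal
 *   minratio: assumed minimum ratio of winning to losing signal */
int rotationFromConfidences(float upconf, float leftconf,
                            float minconf, float minratio)
{
    float absup = (upconf < 0.0f) ? -upconf : upconf;
    float absleft = (leftconf < 0.0f) ? -leftconf : leftconf;

    /* Note (6): a confidence of 0.0 means too few counts were found
     * for that orientation, so do not interpret the direction. */
    if (upconf == 0.0f || leftconf == 0.0f)
        return -1;

    /* Rows 1 and 3 of the table: the up/down signal dominates. */
    if (absup >= minconf && absup > minratio * absleft)
        return (upconf > 0.0f) ? 0 : 180;

    /* Rows 2 and 4 of the table: the left/right signal dominates. */
    if (absleft >= minconf && absleft > minratio * absup)
        return (leftconf > 0.0f) ? 90 : 270;

    return -1;  /* neither signal is clearly dominant */
}
```

For example, with minconf = 8.0 and minratio = 2.5 (both assumed values), upconf = -12.0 and leftconf = 2.0 give 180, while upconf = 9.0 and leftconf = 8.0 give -1 because neither signal dominates.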