Leptonica  1.82.0
Image processing and image analysis suite
partition.c File Reference
#include "allheaders.h"

Go to the source code of this file.

Data Structures

struct  PartitionElement
 

Macros

#define OUTPUT_HEAP_STATS   0
 

Typedefs

typedef struct PartitionElement PARTEL
 

Functions

static PARTELpartelCreate (BOX *box)
 
static void partelDestroy (PARTEL **ppartel)
 
static l_int32 partelSetSize (PARTEL *partel, l_int32 sortflag)
 
static BOXAboxaGenerateSubboxes (BOX *box, BOXA *boxa, l_int32 maxperim, l_float32 fract)
 
static BOXboxaSelectPivotBox (BOX *box, BOXA *boxa, l_int32 maxperim, l_float32 fract)
 
static l_int32 boxCheckIfOverlapIsBig (BOX *box, BOXA *boxa, l_float32 maxoverlap)
 
BOXAboxaGetWhiteblocks (BOXA *boxas, BOX *box, l_int32 sortflag, l_int32 maxboxes, l_float32 maxoverlap, l_int32 maxperim, l_float32 fract, l_int32 maxpops)
 
BOXAboxaPruneSortedOnOverlap (BOXA *boxas, l_float32 maxoverlap)
 

Variables

static const l_int32 DefaultMaxPops = 20000
 

Detailed Description


     Whitespace block extraction
         BOXA            *boxaGetWhiteblocks()

     Helpers
         static PARTEL   *partelCreate()
         static void      partelDestroy()
         static l_int32   partelSetSize()
         static BOXA     *boxaGenerateSubboxes()
         static BOX      *boxaSelectPivotBox()
         static l_int32   boxaCheckIfOverlapIsSmall()
         BOXA            *boxaPruneSortedOnOverlap()

Definition in file partition.c.

Function Documentation

◆ boxaGenerateSubboxes()

static BOXA * boxaGenerateSubboxes ( BOX box,
BOXA boxa,
l_int32  maxperim,
l_float32  fract 
)
static

boxaGenerateSubboxes()

Parameters
[in]boxregion to be split into up to four overlapping subregions
[in]boxaboxes of rectangles intersecting the box
[in]maxperimmaximum half-perimeter for which pivot is selected by proximity to box centroid
[in]fractfraction of box diagonal that is an acceptable distance from the box centroid to select the pivot
Returns
boxa of four or less overlapping subrectangles of the box, or NULL on error

Definition at line 408 of file partition.c.

References boxaAddBox(), boxaCreate(), boxaSelectPivotBox(), boxCreate(), boxDestroy(), boxGetGeometry(), and L_INSERT.

◆ boxaGetWhiteblocks()

BOXA* boxaGetWhiteblocks ( BOXA boxas,
BOX box,
l_int32  sortflag,
l_int32  maxboxes,
l_float32  maxoverlap,
l_int32  maxperim,
l_float32  fract,
l_int32  maxpops 
)

boxaGetWhiteblocks()

Parameters
[in]boxastyp. a set of bounding boxes of fg components
[in]boxinitial region; typically including all boxes in boxas; if null, it computes the region to include all boxes in boxas
[in]sortflagL_SORT_BY_WIDTH, L_SORT_BY_HEIGHT, L_SORT_BY_MIN_DIMENSION, L_SORT_BY_MAX_DIMENSION, L_SORT_BY_PERIMETER, L_SORT_BY_AREA
[in]maxboxesmax number of output whitespace boxes; e.g., 100
[in]maxoverlapmaximum fractional overlap of a box by any of the larger boxes; e.g., 0.2
[in]maxperimmaximum half-perimeter, in pixels, for which pivot is selected by proximity to box centroid; e.g., 200
[in]fractfraction of box diagonal that is an acceptable distance from the box centroid to select the pivot; e.g., 0.2
[in]maxpopsmax number of pops from the heap; use 0 as default
Returns
boxa of sorted whitespace boxes, or NULL on error
Notes:
     (1) This uses the elegant Breuel algorithm, found in "Two
         Geometric Algorithms for Layout Analysis", 2002,
         url: "citeseer.ist.psu.edu/breuel02two.html".
         It starts with the bounding boxes (b.b.) of the connected
         components (c.c.) in a region, along with the rectangle
         representing that region.  It repeatedly divides the
         rectangle into four maximal rectangles that exclude a
         pivot rectangle, sorting them in a priority queue
         according to one of the six sort flags.  It returns a boxa
         of the "largest" set that have no intersection with boxes
         from the input boxas.
     (2) If box == NULL, the initial region is the minimal region
         that includes the origin and every box in boxas.
     (3) maxboxes is the maximum number of whitespace boxes that will
         be returned.  The actual number will depend on the image
         and the values chosen for maxoverlap and maxpops.  In many
         cases, the actual number will be 'maxboxes'.
     (4) maxoverlap allows pruning of whitespace boxes depending on
         the overlap.  To avoid all pruning, use maxoverlap = 1.0.
         To select only boxes that have no overlap with each other
         (maximal pruning), choose maxoverlap = 0.0.
         Otherwise, no box can have more than the 'maxoverlap' fraction
         of its area overlapped by any larger (in the sense of the
         sortflag) box.
     (5) Choose maxperim (actually, maximum half-perimeter) to
         represent a c.c. that is small enough so that you don't care
         about the white space that could be inside of it.  For all such
         c.c., the pivot for 'quadfurcation' of a rectangle is selected
         as having a reasonable proximity to the rectangle centroid.
     (6) Use fract in the range [0.0 ... 1.0].  Set fract = 0.0
         to choose the small box nearest the centroid as the pivot.
         If you choose fract > 0.0, it is suggested that you call
         boxaPermuteRandom() first, to permute the boxes (see usage below).
         This should reduce the search time for each of the pivot boxes.
     (7) Choose maxpops to be the maximum number of rectangles that
         are popped from the heap.  This is an indirect way to limit the
         execution time.  Use 0 for default (a fairly large number).
         At any time, you can expect the heap to contain about
         2.5 times as many boxes as have been popped off.
     (8) The output result is a sorted set of overlapping
         boxes, constrained by 'maxboxes', 'maxoverlap' and 'maxpops'.
     (9) The main defect of the method is that it abstracts out the
         actual components, retaining only the b.b. for analysis.
         Consider a component with a large b.b.  If this is chosen
         as a pivot, all white space inside is immediately taken
         out of consideration.  Furthermore, even if it is never chosen
         as a pivot, as the partitioning continues, at no time will
         any of the whitespace inside this component be part of a
         rectangle with zero overlapping boxes.  Thus, the interiors
         of all boxes are necessarily excluded from the union of
         the returned whitespace boxes.
    (10) It should be noted that the algorithm puts a large number
         of partels on the queue.  Setting a limit of X partels to
         remove from the queue, one typically finds that there will be
         several times that number (say, 2X - 3X) left on the queue.
         For an efficient algorithm to find the largest white or
         or black rectangles, without permitting them to overlap,
         see pixFindLargeRectangles().
    (11) USAGE: One way to accommodate to this weakness is to remove such
         large b.b. before starting the computation.  For example,
         if 'box' is an input image region containing 'boxa' b.b. of c.c.:

                  // Faster pivot choosing
              boxaPermuteRandom(boxa, boxa);

                  // Remove anything either large width or height
              boxat = boxaSelectBySize(boxa, maxwidth, maxheight,
                                       L_SELECT_IF_BOTH, L_SELECT_IF_LT,
                                       NULL);

              boxad = boxaGetWhiteblocks(boxat, box, type, maxboxes,
                                         maxoverlap, maxperim, fract,
                                         maxpops);

         The result will be rectangular regions of "white space" that
         extend into (and often through) the excluded components.
    (11) As a simple example, suppose you wish to find the columns on a page.
         First exclude large c.c. that may block the columns, and then call:

              boxad = boxaGetWhiteblocks(boxa, box, L_SORT_BY_HEIGHT,
                                         20, 0.15, 200, 0.2, 2000);

         to get the 20 tallest boxes with no more than 0.15 overlap
         between a box and any of the taller ones, and avoiding the
         use of any c.c. with a b.b. half perimeter greater than 200
         as a pivot.

Definition at line 192 of file partition.c.

References L_SORT_BY_AREA, L_SORT_BY_HEIGHT, L_SORT_BY_MAX_DIMENSION, L_SORT_BY_MIN_DIMENSION, L_SORT_BY_PERIMETER, and L_SORT_BY_WIDTH.

◆ boxaPruneSortedOnOverlap()

BOXA* boxaPruneSortedOnOverlap ( BOXA boxas,
l_float32  maxoverlap 
)

boxaPruneSortedOnOverlap()

Parameters
[in]boxassorted by size in decreasing order
[in]maxoverlapmaximum fractional overlap of a box by any of the larger boxes
Returns
boxad pruned, or NULL on error
Notes:
     (1) This selectively removes smaller boxes when they are overlapped
         by any larger box by more than the input 'maxoverlap' fraction.
     (2) To avoid all pruning, use maxoverlap = 1.0.  To select only
         boxes that have no overlap with each other (maximal pruning),
         set maxoverlap = 0.0.
     (3) If there are no boxes in boxas, returns an empty boxa.

Definition at line 621 of file partition.c.

References boxaAddBox(), boxaCopy(), boxaCreate(), boxaGetBox(), boxaGetCount(), boxDestroy(), boxOverlapFraction(), L_CLONE, L_COPY, and L_INSERT.

◆ boxaSelectPivotBox()

static BOX * boxaSelectPivotBox ( BOX box,
BOXA boxa,
l_int32  maxperim,
l_float32  fract 
)
static

boxaSelectPivotBox()

Parameters
[in]boxcontaining box; to be split by the pivot box
[in]boxaboxes of rectangles, from which 1 is to be chosen
[in]maxperimmaximum half-perimeter for which pivot is selected by proximity to box centroid
[in]fractfraction of box diagonal that is an acceptable distance from the box centroid to select the pivot
Returns
box pivot box for subdivision into 4 rectangles, or NULL on error
Notes:
     (1) This is a tricky piece that wasn't discussed in the
         Breuel's 2002 paper.
     (2) Selects a box from boxa whose centroid is reasonably close to
         the centroid of the containing box (xc, yc) and whose
         half-perimeter does not exceed the maxperim value.
     (3) If there are no boxes in the boxa that are small enough,
         then it selects the smallest of the larger boxes,
         without reference to its location in the containing box.
     (4) If a small box has a centroid at a distance from the
         centroid of the containing box that is not more than
         the fraction 'fract' of the diagonal of the containing
         box, that box is chosen as the pivot, terminating the
         search for the nearest small box.
     (5) Use fract in the range [0.0 ... 1.0].  Set fract = 0.0
         to choose the small box nearest the centroid.
     (6) Choose maxperim to represent a connected component that is
         small enough so that you don't care about the white space
         that could be inside of it.

Definition at line 486 of file partition.c.

References boxaGetBox(), boxaGetBoxGeometry(), boxaGetCount(), boxDestroy(), boxGetCenter(), boxGetGeometry(), L_CLONE, and L_COPY.

Referenced by boxaGenerateSubboxes().

◆ boxCheckIfOverlapIsBig()

static l_int32 boxCheckIfOverlapIsBig ( BOX box,
BOXA boxa,
l_float32  maxoverlap 
)
static

boxCheckIfOverlapIsBig()

Parameters
[in]boxto be tested
[in]boxaof boxes already stored
[in]maxoverlapmaximum fractional overlap of the input box by any of the boxes in boxa
Returns
0 if box has small overlap with every box in boxa; 1 otherwise or on error

Definition at line 566 of file partition.c.

References boxaGetBox(), boxaGetCount(), boxDestroy(), boxOverlapFraction(), and L_CLONE.

◆ partelCreate()

static PARTEL * partelCreate ( BOX box)
static

partelCreate()

Parameters
[in]boxregion; inserts a copy
Returns
partel, or NULL on error

Definition at line 316 of file partition.c.

References boxCopy().

◆ partelDestroy()

static void partelDestroy ( PARTEL **  ppartel)
static

partelDestroy()

Parameters
[in,out]ppartelcontents will be set to null before returning
Returns
void

Definition at line 333 of file partition.c.

References boxaDestroy(), and boxDestroy().

◆ partelSetSize()

static l_int32 partelSetSize ( PARTEL partel,
l_int32  sortflag 
)
static

partelSetSize()

Parameters
[in]partel
[in]sortflagL_SORT_BY_WIDTH, L_SORT_BY_HEIGHT, L_SORT_BY_MIN_DIMENSION, L_SORT_BY_MAX_DIMENSION, L_SORT_BY_PERIMETER, L_SORT_BY_AREA
Returns
0 if OK, 1 on error

Definition at line 365 of file partition.c.

References boxGetGeometry(), L_SORT_BY_AREA, L_SORT_BY_HEIGHT, L_SORT_BY_MAX_DIMENSION, L_SORT_BY_MIN_DIMENSION, L_SORT_BY_PERIMETER, and L_SORT_BY_WIDTH.