Class MultipassTermFilteredPresearcher


  • public class MultipassTermFilteredPresearcher
    extends TermFilteredPresearcher
    A TermFilteredPresearcher that indexes each query multiple times, with terms collected from different routes through its query tree. Each route produces a set of terms that is *sufficient* to select the query, and each set is indexed into a separate, suffixed field.

    Incoming documents are then converted to a set of disjunction queries, one over each suffixed field, and these queries are combined into a conjunction query, so that the document's terms must match at least one term from every route.

    This makes it possible to filter out documents that contain only one half of a two-term phrase query, for example. The query "hello world" will be indexed twice, once under 'hello' and once under 'world'. A document containing the terms "hello there" would match the first field, but not the second, and so would not be selected for matching.

    The number of passes the presearcher makes is configurable. More passes will improve the selected/matched ratio, but will take longer to index and will use more RAM.

    A minimum weight can be set for terms to be chosen for the second and subsequent passes. This allows users to avoid indexing stopwords, for example.
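
    The selection logic described above can be illustrated with a minimal, self-contained sketch. It does not use the Lucene API: the class MultipassSketch, its methods, and the "_" + pass field suffix are hypothetical names chosen for illustration, and picking the i-th phrase term for pass i is a simplification of the weight-based term selection the real presearcher performs on a query tree.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of multipass term filtering; not the Lucene implementation.
class MultipassSketch {

    // Index a phrase query once per pass. Pass i stores the i-th phrase term
    // under the suffixed field "<field>_<i>" (assumed suffix format).
    static Map<String, Set<String>> indexPhrase(String field, List<String> phraseTerms, int passes) {
        Map<String, Set<String>> indexed = new HashMap<>();
        for (int pass = 0; pass < passes; pass++) {
            // If there are more passes than terms, reuse the last term.
            String term = phraseTerms.get(Math.min(pass, phraseTerms.size() - 1));
            indexed.put(field + "_" + pass, Set.of(term));
        }
        return indexed;
    }

    // A document selects the query only if, for every suffixed field
    // (conjunction), at least one document term is present (disjunction).
    static boolean selects(Map<String, Set<String>> indexedQuery, Set<String> docTerms) {
        for (Set<String> fieldTerms : indexedQuery.values()) {
            if (Collections.disjoint(fieldTerms, docTerms)) {
                return false;
            }
        }
        return true;
    }
}
```

    With two passes, a document containing only "hello" and "there" fails the second field and is not selected for the phrase "hello world". With a single pass the same document is still selected, which is the kind of false positive that additional passes are meant to reduce.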

    • Field Detail

      • passes

        private final int passes
      • minWeight

        private final float minWeight
    • Constructor Detail

      • MultipassTermFilteredPresearcher

        public MultipassTermFilteredPresearcher(int passes,
                                                float minWeight,
                                                TermWeightor weightor,
                                                java.util.List<CustomQueryHandler> queryHandlers,
                                                java.util.Set<java.lang.String> filterFields)
        Construct a new MultipassTermFilteredPresearcher
        Parameters:
        passes - the number of times a query should be indexed
        minWeight - the minimum weight a term must have to be chosen for the second and subsequent passes
        weightor - the TermWeightor to use
        queryHandlers - a list of custom query handlers
        filterFields - a set of fields to use as filters
      • MultipassTermFilteredPresearcher

        public MultipassTermFilteredPresearcher(int passes)
        Construct a new MultipassTermFilteredPresearcher using TermFilteredPresearcher.DEFAULT_WEIGHTOR

        Note that this will be constructed with a minimum advance weight of zero.

        Parameters:
        passes - the number of times a query should be indexed