Class ShingleAnalyzerWrapper

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public final class ShingleAnalyzerWrapper
    extends AnalyzerWrapper
    A ShingleAnalyzerWrapper wraps a ShingleFilter around another Analyzer.

    A shingle is another name for a token-based n-gram.

    Since:
    3.1
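To make the shingle definition above concrete, here is a minimal plain-Java sketch of token n-grams. It does not use Lucene and is not how ShingleFilter is implemented. The class name ShingleSketch and the method shingles are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
final class ShingleSketch {

    // Returns every run of adjacent tokens whose length lies in
    // [minSize, maxSize], joined with the given separator.
    // Results are ordered by start position, then by size.
    static List<String> shingles(List<String> tokens, int minSize, int maxSize, String separator) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(separator, tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [please divide, divide this, this sentence]
        System.out.println(shingles(List.of("please", "divide", "this", "sentence"), 2, 2, " "));
    }
}
```

With minSize 2 and maxSize 3, the same input would also yield the three-token shingles "please divide this" and "divide this sentence".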
    • Field Detail

      • delegate

        private final Analyzer delegate
      • maxShingleSize

        private final int maxShingleSize
      • minShingleSize

        private final int minShingleSize
      • tokenSeparator

        private final java.lang.String tokenSeparator
      • outputUnigrams

        private final boolean outputUnigrams
      • outputUnigramsIfNoShingles

        private final boolean outputUnigramsIfNoShingles
      • fillerToken

        private final java.lang.String fillerToken
    • Constructor Detail

      • ShingleAnalyzerWrapper

        public ShingleAnalyzerWrapper​(Analyzer defaultAnalyzer)
      • ShingleAnalyzerWrapper

        public ShingleAnalyzerWrapper​(Analyzer defaultAnalyzer,
                                      int maxShingleSize)
      • ShingleAnalyzerWrapper

        public ShingleAnalyzerWrapper​(Analyzer defaultAnalyzer,
                                      int minShingleSize,
                                      int maxShingleSize)
      • ShingleAnalyzerWrapper

        public ShingleAnalyzerWrapper​(Analyzer delegate,
                                      int minShingleSize,
                                      int maxShingleSize,
                                      java.lang.String tokenSeparator,
                                      boolean outputUnigrams,
                                      boolean outputUnigramsIfNoShingles,
                                      java.lang.String fillerToken)
        Creates a new ShingleAnalyzerWrapper.
        Parameters:
        delegate - Analyzer whose TokenStream is to be filtered
        minShingleSize - Minimum shingle (token n-gram) size
        maxShingleSize - Maximum shingle (token n-gram) size
        tokenSeparator - String used to separate input stream tokens in output shingles
        outputUnigrams - Whether the filter passes the original tokens (unigrams) to the output stream
        outputUnigramsIfNoShingles - Overrides the behavior of outputUnigrams==false when no shingles are available (because there are fewer than minShingleSize tokens in the input stream). Note that if outputUnigrams==true, unigrams are always output, regardless of whether any shingles are available.
        fillerToken - Filler token to use when positionIncrement is more than 1
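The interaction between outputUnigrams, outputUnigramsIfNoShingles, and fillerToken can be hard to picture from the parameter descriptions alone. The following plain-Java sketch models it under stated assumptions: a null entry in the token list stands for a position gap (positionIncrement greater than 1, for example a removed stop word), and the class ShingleOptionsSketch and its analyze method are hypothetical. It is a simplified model, not Lucene's implementation, and the real filter may differ in token ordering and in edge cases such as trailing gaps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
final class ShingleOptionsSketch {

    // tokens: input terms; a null entry models a position gap (assumption).
    static List<String> analyze(List<String> tokens, int minSize, int maxSize, String separator,
                                boolean outputUnigrams, boolean outputUnigramsIfNoShingles,
                                String fillerToken) {
        List<String> out = new ArrayList<>();
        boolean anyShingle = false;
        for (int start = 0; start < tokens.size(); start++) {
            // Original tokens are passed through only when outputUnigrams is set.
            if (outputUnigrams && tokens.get(start) != null) {
                out.add(tokens.get(start));
            }
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                List<String> parts = new ArrayList<>();
                for (String t : tokens.subList(start, start + size)) {
                    // A gap is rendered with the filler token inside a shingle.
                    parts.add(t == null ? fillerToken : t);
                }
                out.add(String.join(separator, parts));
                anyShingle = true;
            }
        }
        // Fallback: too few tokens to build any shingle.
        if (!anyShingle && !outputUnigrams && outputUnigramsIfNoShingles) {
            for (String t : tokens) {
                if (t != null) {
                    out.add(t);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> withGap = Arrays.asList("please", "divide", null, "sentence");
        // prints [please divide, divide _, _ sentence]
        System.out.println(analyze(withGap, 2, 2, " ", false, false, "_"));
        // prints [hello]
        System.out.println(analyze(Arrays.asList("hello"), 2, 2, " ", false, true, "_"));
    }
}
```

In this model, a one-token input with outputUnigrams==false produces nothing unless outputUnigramsIfNoShingles is true, which matches the override described for that parameter.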
      • ShingleAnalyzerWrapper

        public ShingleAnalyzerWrapper()
      • ShingleAnalyzerWrapper

        public ShingleAnalyzerWrapper​(int minShingleSize,
                                      int maxShingleSize)
    • Method Detail

      • getMaxShingleSize

        public int getMaxShingleSize()
        Returns the maximum shingle (token n-gram) size.
        Returns:
        The maximum shingle (token n-gram) size
      • getMinShingleSize

        public int getMinShingleSize()
        Returns the minimum shingle (token n-gram) size.
        Returns:
        The minimum shingle (token n-gram) size
      • getTokenSeparator

        public java.lang.String getTokenSeparator()
      • isOutputUnigrams

        public boolean isOutputUnigrams()
      • isOutputUnigramsIfNoShingles

        public boolean isOutputUnigramsIfNoShingles()
      • getFillerToken

        public java.lang.String getFillerToken()
      • getWrappedAnalyzer

        public final Analyzer getWrappedAnalyzer​(java.lang.String fieldName)
        Description copied from class: AnalyzerWrapper
        Retrieves the wrapped Analyzer appropriate for analyzing the field with the given name.
        Specified by:
        getWrappedAnalyzer in class AnalyzerWrapper
        Parameters:
        fieldName - Name of the field which is to be analyzed
        Returns:
        Analyzer for the field with the given name. Assumed to be non-null.
      • wrapComponents

        protected Analyzer.TokenStreamComponents wrapComponents​(java.lang.String fieldName,
                                                                Analyzer.TokenStreamComponents components)
        Description copied from class: AnalyzerWrapper
        Wraps / alters the given TokenStreamComponents, taken from the wrapped Analyzer, to form new components. It is through this method that new TokenFilters can be added by AnalyzerWrappers. By default, the given components are returned.
        Overrides:
        wrapComponents in class AnalyzerWrapper
        Parameters:
        fieldName - Name of the field which is to be analyzed
        components - TokenStreamComponents taken from the wrapped Analyzer
        Returns:
        Wrapped / altered TokenStreamComponents.