Java Code Examples for org.apache.lucene.analysis.shingle.ShingleFilter#setTokenSeparator()

The following examples show how to use org.apache.lucene.analysis.shingle.ShingleFilter#setTokenSeparator().
Example 1
Source File: ShingleTokenFilterFactory.java    From crate with Apache License 2.0
@Override
public TokenStream create(TokenStream tokenStream) {
    ShingleFilter filter = new ShingleFilter(tokenStream, minShingleSize, maxShingleSize);
    filter.setOutputUnigrams(outputUnigrams);
    filter.setOutputUnigramsIfNoShingles(outputUnigramsIfNoShingles);
    filter.setTokenSeparator(tokenSeparator);
    filter.setFillerToken(fillerToken);
    if (outputUnigrams || (minShingleSize != maxShingleSize)) {
        /*
         * Disable graph analysis on this token stream because it produces
         * shingles of different sizes. Such shingles are not aligned in terms
         * of positions, so graph analysis on this stream is useless and
         * dangerous: it may create too many paths.
         */
        filter.addAttribute(DisableGraphAttribute.class);
    }
    return filter;
}
 
Example 2
Source File: ShingleTokenFilterFactory.java    From Elasticsearch with Apache License 2.0
@Override
public TokenStream create(TokenStream tokenStream) {
    ShingleFilter filter = new ShingleFilter(tokenStream, minShingleSize, maxShingleSize);
    filter.setOutputUnigrams(outputUnigrams);
    filter.setOutputUnigramsIfNoShingles(outputUnigramsIfNoShingles);
    filter.setTokenSeparator(tokenSeparator);
    filter.setFillerToken(fillerToken);
    return filter;
}
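
Both factories pass a configured separator to setTokenSeparator(), but neither shows what the separator does to the emitted shingles. The sketch below is a plain-Java illustration of the idea and does not use Lucene: the class ShingleSeparatorSketch and its shingles() helper are hypothetical names introduced here, and the code assumes already-tokenized input, no filler tokens, and no unigram output. It only imitates how adjacent tokens are joined with the separator string.

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleSeparatorSketch {

    // Hypothetical helper, not part of Lucene: for each start position, joins
    // every run of minSize..maxSize adjacent tokens with the given separator.
    public static List<String> shingles(List<String> tokens, int minSize, int maxSize, String separator) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(separator, tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("please", "divide", "this");
        // A single space as separator.
        System.out.println(shingles(tokens, 2, 2, " "));  // [please divide, divide this]
        // An underscore as separator, the kind of value a factory might pass on.
        System.out.println(shingles(tokens, 2, 2, "_"));  // [please_divide, divide_this]
    }
}
```

With the real ShingleFilter the separator is set once on the filter, as in the examples above, and the joined strings come out as token terms rather than as a list.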