Java Code Examples for org.apache.lucene.analysis.WordlistLoader#getWordSet()

The following examples show how to use org.apache.lucene.analysis.WordlistLoader#getWordSet(). Each example is preceded by its source file, project, and license.

Example 1

Source File: TokenSearch.java From datawave with Apache License 2.0

/**
 * Load stopwords from the specified file located in the classpath.
 * <p>
 * If a directory name is specified, e.g. <code>tmp/stopwords.txt</code>, that path will be used when searching for the resource. Otherwise, the package
 * containing the DefaultTokenSearch class may be used.
 * <p>
 * The current thread's context classloader will be used to load the specified filename as a resource.
 * 
 * @param filename
 *            the filename containing the stoplist to load, located using the rules described above.
 * @return a Lucene {@code CharArraySet} containing the stopwords. This is configured to be case-insensitive.
 * @throws IOException
 *             if there is a problem finding or loading the specified stop word file.
 */
public static CharArraySet loadStopWords(String filename) throws IOException {
    Closer closer = Closer.create();
    try {
        CharArraySet stopSet = new CharArraySet(16, true /* ignore case */);
        String pkg = Factory.class.getPackage().getName().replace('.', '/');
        String resource = filename.indexOf("/") > -1 ? filename : (pkg + "/" + filename);
        InputStream resourceStream = Thread.currentThread().getContextClassLoader().getResourceAsStream(resource);
        logger.info("Loading stopwords file " + filename + " from resource " + resource);
        if (resourceStream == null) {
            throw new FileNotFoundException("Unable to load stopword file as resource " + filename);
        }
        Reader reader = IOUtils.getDecodingReader(resourceStream, StandardCharsets.UTF_8);
        closer.register(reader);
        CharArraySet set = WordlistLoader.getWordSet(reader, "#", stopSet);
        logger.info("Loaded " + set.size() + " stopwords from " + filename + " (" + resource + ")");
        return set;
    } finally {
        closer.close();
    }
}
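
The two rules that Example 1 relies on, resolving the resource name and skipping comment lines, can be sketched in plain Java without Lucene on the classpath. This is an illustrative approximation, not Lucene's implementation: the class `StopwordSketch` and the helpers `resolveResource` and `readWordSet` are hypothetical names, and a `TreeSet` with a case-insensitive comparator stands in for `CharArraySet` with `ignoreCase` set to true. It assumes one word per line, with any line that starts with the comment marker skipped and the remaining lines trimmed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;

final class StopwordSketch {

    // Hypothetical helper mirroring the rule in loadStopWords: a filename that
    // contains '/' is used as-is; otherwise the package path is prepended.
    static String resolveResource(String filename, String packageName) {
        String pkg = packageName.replace('.', '/');
        return filename.contains("/") ? filename : pkg + "/" + filename;
    }

    // Hypothetical helper: reads one word per line, skips lines that start with
    // the comment marker, and trims the rest. The case-insensitive TreeSet is an
    // assumption standing in for a CharArraySet created with ignoreCase = true.
    static Set<String> readWordSet(Reader reader, String comment) throws IOException {
        Set<String> words = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.startsWith(comment)) {
                    words.add(line.trim());
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        String list = "# English stopwords\nthe\n  and  \nThe\n";
        Set<String> words = readWordSet(new StringReader(list), "#");
        System.out.println(words.size());          // prints 2 ("the" and "The" collapse)
        System.out.println(words.contains("AND")); // prints true
        System.out.println(resolveResource("stopwords.txt", "com.example.search"));
        // prints com/example/search/stopwords.txt
    }
}
```

Passing the reader through try-with-resources plays the role that the `Closer` plays in Example 1: the stream is closed whether or not reading succeeds.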

Example 2

Source File: ClassicAnalyzer.java From projectforge-webapp with GNU General Public License v3.0

/** Builds an analyzer with the stop words from the given file.
 * @see WordlistLoader#getWordSet(File)
 * @param matchVersion Lucene version to match. See <a href="#version">above</a>.
 * @param stopwords File to read stop words from */
public ClassicAnalyzer(final Version matchVersion, final File stopwords) throws IOException {
  this(matchVersion, WordlistLoader.getWordSet(stopwords));
}

Example 3

Source File: ClassicAnalyzer.java From projectforge-webapp with GNU General Public License v3.0

/** Builds an analyzer with the stop words from the given reader.
 * @see WordlistLoader#getWordSet(Reader)
 * @param matchVersion Lucene version to match. See <a href="#version">above</a>.
 * @param stopwords Reader to read stop words from */
public ClassicAnalyzer(final Version matchVersion, final Reader stopwords) throws IOException {
  this(matchVersion, WordlistLoader.getWordSet(stopwords));
}

Example 4

Source File: StandardAnalyzer.java From projectforge-webapp with GNU General Public License v3.0

/** Builds an analyzer with the stop words from the given file.
 * @see WordlistLoader#getWordSet(File)
 * @param matchVersion Lucene version to match. See <a href="#version">above</a>.
 * @param stopwords File to read stop words from */
public StandardAnalyzer(final Version matchVersion, final File stopwords) throws IOException {
  this(matchVersion, WordlistLoader.getWordSet(stopwords));
}

Example 5

Source File: StandardAnalyzer.java From projectforge-webapp with GNU General Public License v3.0

/** Builds an analyzer with the stop words from the given reader.
 * @see WordlistLoader#getWordSet(Reader)
 * @param matchVersion Lucene version to match. See <a href="#version">above</a>.
 * @param stopwords Reader to read stop words from */
public StandardAnalyzer(final Version matchVersion, final Reader stopwords) throws IOException {
  this(matchVersion, WordlistLoader.getWordSet(stopwords));
}