Java Code Examples for com.aliasi.tokenizer.TokenizerFactory#tokenizer()

The following examples show how to use com.aliasi.tokenizer.TokenizerFactory#tokenizer().
Example 1
Source File: Chapter2.java    From Natural-Language-Processing-with-Java-Second-Edition with MIT License
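// Requires LingPipe on the classpath.
import com.aliasi.tokenizer.IndoEuropeanTokenizerFactory;
import com.aliasi.tokenizer.TokenizerFactory;

private static void usingLingPipeTokenizers() {
    // Sample input; replace with the text to tokenize.
    String paragraph = "sample text string";
    char[] text = paragraph.toCharArray();
    TokenizerFactory tokenizerFactory = IndoEuropeanTokenizerFactory.INSTANCE;
    // tokenizer(char[], start, length) tokenizes the given slice of the array.
    com.aliasi.tokenizer.Tokenizer tokenizer = tokenizerFactory.tokenizer(
            text, 0, text.length);
    // Tokenizer is Iterable<String>, so tokens can be read with a for-each loop.
    for (String token : tokenizer) {
        System.out.println(token);
    }
}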
 
Example 2
Source File: TweetHandler.java    From Java-for-Data-Science with MIT License
public TweetHandler removeStopWords() {
    TokenizerFactory tokenizerFactory
            = IndoEuropeanTokenizerFactory.INSTANCE;
    // Wrap the base factory so English stop words are filtered out of the token stream.
    tokenizerFactory = new EnglishStopTokenizerFactory(tokenizerFactory);
    Tokenizer tokens = tokenizerFactory.tokenizer(
            this.text.toCharArray(), 0, this.text.length());
    // Rejoin the surviving tokens with single spaces (a trailing space remains).
    StringBuilder buffer = new StringBuilder();
    for (String word : tokens) {
        buffer.append(word).append(' ');
    }
    this.text = buffer.toString();
    return this;
}
 
Example 3
Source File: SimpleStringCleaning.java    From Java-for-Data-Science with MIT License
public static void removeStopWithLing(String text){
	// Example with LingPipe.
	System.out.println(text);
	// Lowercase first: the stop list is lowercase, so a capitalized stop word such as "The" would not be removed.
	text = text.toLowerCase().trim();
	TokenizerFactory fact = IndoEuropeanTokenizerFactory.INSTANCE;
	// Wrap the base factory so English stop words are dropped from the token stream.
	fact = new EnglishStopTokenizerFactory(fact);
	Tokenizer tok = fact.tokenizer(text.toCharArray(), 0, text.length());
	for(String word : tok){
		System.out.print(word + " ");
	}
}
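
For readers without LingPipe on the classpath, the lowercase-then-filter pattern used in Examples 2 and 3 can be approximated in plain Java. This is only a rough sketch and not LingPipe's implementation: the class name StopWordSketch, the regex-based splitting, and the tiny stop list are all assumptions made for illustration. IndoEuropeanTokenizerFactory and EnglishStopTokenizerFactory apply their own tokenization rules and stop list, so results will differ on real text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

public class StopWordSketch {
    // Hypothetical, deliberately tiny stop list; this is NOT LingPipe's list.
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    // Lowercases the text, splits on runs of characters that are neither
    // letters nor digits, and keeps only tokens not in STOP_WORDS.
    public static String removeStopWords(String text) {
        StringJoiner kept = new StringJoiner(" ");
        String normalized = text.toLowerCase(Locale.ROOT).trim();
        for (String token : normalized.split("[^\\p{L}\\p{N}]+")) {
            // split() can yield an empty first element when the text starts with punctuation.
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        // Prints: quick brown fox friend dog
        System.out.println(removeStopWords("The quick brown fox is a friend of the dog."));
    }
}
```

Unlike the loops in Examples 2 and 3, the StringJoiner here leaves no trailing space after the last token.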