com.hankcs.hanlp.tokenizer.NLPTokenizer Java Examples

The following examples show how to use com.hankcs.hanlp.tokenizer.NLPTokenizer, with links to the original project or source file above each example.
Example #1
Source File: Segment.java    From AHANLP with Apache License 2.0
/**
   * Segment text into sentences, then segment each sentence into terms
   * @param segType tokenizer type ("Standard" or "NLP")
   * @param shortest whether to split into the finest-grained clauses (treating commas and semicolons as separators too)
   * @param content the text to segment
   * @param filterStopWord whether to filter out stop words
   * @return list of sentences, each sentence given as a list of terms
   */
  public static List<List<Term>> seg2sentence(String segType, boolean shortest, String content, boolean filterStopWord) {
      List<List<Term>> results;
      if ("Standard".equals(segType) || "标准分词".equals(segType)) {
          results = StandardTokenizer.seg2sentence(content, shortest);
      } else if ("NLP".equals(segType) || "NLP分词".equals(segType)) {
          results = NLPTokenizer.seg2sentence(content, shortest);
      } else {
          throw new IllegalArgumentException(String.format("Invalid argument segType == %s", segType));
      }
      if (filterStopWord) {
          for (List<Term> res : results) {
              CoreStopWordDictionary.apply(res);
          }
      }
      return results;
  }
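The snippet above only compiles against the HanLP/AHANLP jars, so here is a self-contained sketch of the same dispatch-and-validate pattern: pick a tokenizer by `segType`, throw `IllegalArgumentException` on anything else. `SegDispatchSketch` and `stubSeg2sentence` are hypothetical stand-ins for illustration, not part of HanLP, and the stub splits on punctuation/whitespace rather than doing real Chinese segmentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SegDispatchSketch {
    // Hypothetical stand-in for a tokenizer's seg2sentence: splits the text on
    // sentence-ending punctuation, then whitespace-splits each sentence into "terms".
    static List<List<String>> stubSeg2sentence(String content) {
        List<List<String>> sentences = new ArrayList<>();
        for (String s : content.split("[。！？.!?]")) {
            if (!s.trim().isEmpty()) {
                sentences.add(Arrays.asList(s.trim().split("\\s+")));
            }
        }
        return sentences;
    }

    // Mirrors the dispatch-and-validate pattern of seg2sentence above:
    // known segType values select a tokenizer, anything else is rejected.
    public static List<List<String>> seg2sentence(String segType, String content) {
        if ("Standard".equals(segType) || "NLP".equals(segType)) {
            return stubSeg2sentence(content);
        }
        throw new IllegalArgumentException(String.format("Invalid argument segType == %s", segType));
    }

    public static void main(String[] args) {
        System.out.println(seg2sentence("Standard", "Hello world. Good morning."));
        // prints [[Hello, world], [Good, morning]]
    }
}
```

Failing fast on an unknown `segType` (rather than silently returning `null`) is the reason the real method throws instead of leaving `results` unset.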
 
Example #2
Source File: Segment.java    From AHANLP with Apache License 2.0
/**
 * NLP segmentation<br>
 * Perceptron-based segmentation<br>
 * Performs part-of-speech tagging and named entity recognition; favors accuracy
 * @param content the text to segment
 * @param filterStopWord whether to filter out stop words
 * @return segmentation result
 */
public static List<Term> NLPSegment(String content, boolean filterStopWord) {
    List<Term> result = NLPTokenizer.segment(content);
    if (filterStopWord) {
        CoreStopWordDictionary.apply(result);
    }
    return result;
}
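The stop-word filtering step (`CoreStopWordDictionary.apply(result)`) mutates the term list in place. Since that dictionary ships with HanLP, here is a self-contained sketch of the same in-place filtering pattern using plain strings; `StopWordFilterSketch` and its tiny `STOP_WORDS` set are hypothetical stand-ins, not HanLP's actual stop-word list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopWordFilterSketch {
    // Tiny stand-in for the stop-word list behind CoreStopWordDictionary.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("的", "了", "the", "a"));

    // Mirrors CoreStopWordDictionary.apply(result): removes stop words
    // from the list in place, returning nothing.
    public static void apply(List<String> terms) {
        terms.removeIf(STOP_WORDS::contains);
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>(Arrays.asList("the", "quick", "fox"));
        apply(terms);
        System.out.println(terms); // prints [quick, fox]
    }
}
```

Because the filter mutates its argument, the caller must pass a mutable list (e.g. an `ArrayList`, not one from `Arrays.asList` directly), which is why the real method filters the freshly built segmentation result before returning it.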