opennlp.tools.langdetect.LanguageDetectorME Java Examples

The following examples show how to use opennlp.tools.langdetect.LanguageDetectorME.
Example #1
Source File: LanguageDetectorAndTrainingDataUnitTest.java    From tutorials with MIT License
@Test
public void givenLanguageDictionary_whenLanguageDetect_thenLanguageIsDetected() throws IOException {
    // Read the training data line by line; each line is parsed into a language sample.
    InputStreamFactory dataIn = new MarkableFileInputStreamFactory(new File("src/main/resources/models/DoccatSample.txt"));
    ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
    LanguageDetectorSampleStream sampleStream = new LanguageDetectorSampleStream(lineStream);
    TrainingParameters params = new TrainingParameters();
    params.put(TrainingParameters.ITERATIONS_PARAM, 100);
    params.put(TrainingParameters.CUTOFF_PARAM, 5);
    params.put("DataIndexer", "TwoPass");
    params.put(TrainingParameters.ALGORITHM_PARAM, "NAIVEBAYES");

    LanguageDetectorModel model = LanguageDetectorME.train(sampleStream, params, new LanguageDetectorFactory());

    LanguageDetector ld = new LanguageDetectorME(model);
    Language[] languages = ld.predictLanguages("estava em uma marcenaria na Rua Bruno");
    
    // The expected confidences are specific to the model trained above.
    assertThat(Arrays.asList(languages)).extracting("lang", "confidence").contains(
            tuple("pob", 0.9999999950605625),
            tuple("ita", 4.939427661577956E-9),
            tuple("spa", 9.665954064665144E-15),
            tuple("fra", 8.250349924885834E-25));
}
 
Example #2
Source File: OpenNLPLangDetectUpdateProcessor.java    From lucene-solr with Apache License 2.0
@Override
protected List<DetectedLanguage> detectLanguage(Reader solrDocReader) {
  List<DetectedLanguage> languages = new ArrayList<>();
  String content = SolrInputDocumentReader.asString(solrDocReader);
  if (content.length() != 0) {
    LanguageDetectorME ldme = new LanguageDetectorME(model);
    Language[] langs = ldme.predictLanguages(content);
    for (Language language : langs) {
      languages.add(new DetectedLanguage(ISO639_MAP.get(language.getLang()), language.getConfidence()));
    }
  } else {
    log.debug("No input text to detect language from, returning empty list");
  }
  return languages;
}
 
Example #3
Source File: LanguageDetector.java    From newsleak with GNU Affero General Public License v3.0
@Override
public void initialize(UimaContext context) throws ResourceInitializationException {
	super.initialize(context);
	supportedLanguages = getSupportedLanguages();
	languageDetector = new LanguageDetectorME(languageDetectorResource.getModel());
	logger = context.getLogger();
}