org.deeplearning4j.text.tokenization.tokenizer.TokenPreProcess Java Examples

The following examples show how to use org.deeplearning4j.text.tokenization.tokenizer.TokenPreProcess.
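Before the collected snippets, a minimal sketch of what implementing the interface can look like. To stay self-contained it declares a local stand-in for TokenPreProcess with the single preProcess(String) method that the examples below call; LowerCasePreProcessor and the class name TokenPreProcessSketch are hypothetical and not part of deeplearning4j.

```java
import java.util.Locale;

final class TokenPreProcessSketch {

    // Local stand-in for DL4J's TokenPreProcess (assumption: reduced to the one
    // method the examples on this page call).
    interface TokenPreProcess {
        String preProcess(String token);
    }

    // Hypothetical preprocessor: trims surrounding whitespace and lower-cases the token.
    static final class LowerCasePreProcessor implements TokenPreProcess {
        @Override
        public String preProcess(String token) {
            return token == null ? null : token.trim().toLowerCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        TokenPreProcess pre = new LowerCasePreProcessor();
        System.out.println(pre.preProcess("  Hello "));
    }
}
```

In real code the class would implement org.deeplearning4j.text.tokenization.tokenizer.TokenPreProcess directly and be passed to a tokenizer or factory through setTokenPreProcessor, as the setter examples below do.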
Example #1
Source File: WordVectorSerializer.java    From deeplearning4j with Apache License 2.0
protected static TokenizerFactory getTokenizerFactory(VectorsConfiguration configuration) {
    if (configuration == null)
        return null;

    // The configuration stores the factory and the preprocessor as class names.
    if (configuration.getTokenizerFactory() != null && !configuration.getTokenizerFactory().isEmpty()) {
        try {
            TokenizerFactory factory =
                            (TokenizerFactory) Class.forName(configuration.getTokenizerFactory()).newInstance();

            // The preprocessor is optional; attach it only when a class name was saved.
            if (configuration.getTokenPreProcessor() != null && !configuration.getTokenPreProcessor().isEmpty()) {
                TokenPreProcess preProcessor =
                                (TokenPreProcess) Class.forName(configuration.getTokenPreProcessor()).newInstance();
                factory.setTokenPreProcessor(preProcessor);
            }

            return factory;

        } catch (Exception e) {
            // Pass the exception as the last argument so its cause is logged too.
            log.error("Can't instantiate saved TokenizerFactory: {}", configuration.getTokenizerFactory(), e);
        }
    }
    return null;
}
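The reflective pattern in Example #1 (load a class by name, instantiate it, cast it to the expected interface) can be sketched without deeplearning4j as follows. The interface is a local stand-in, and load, UpperCasePreProcessor, and ReflectiveLoadSketch are hypothetical names. The sketch uses getDeclaredConstructor().newInstance() because Class.newInstance() has been deprecated since Java 9; that substitution is a choice made here, not something the original source does.

```java
import java.util.Locale;

final class ReflectiveLoadSketch {

    // Local stand-in for DL4J's TokenPreProcess (assumption: single-method interface).
    interface TokenPreProcess {
        String preProcess(String token);
    }

    // Hypothetical preprocessor with a no-argument constructor, so it can be created reflectively.
    static class UpperCasePreProcessor implements TokenPreProcess {
        public UpperCasePreProcessor() {}

        @Override
        public String preProcess(String token) {
            return token.toUpperCase(Locale.ROOT);
        }
    }

    // Returns an instance of the named class, or null if the name is empty, the class
    // cannot be loaded or instantiated, or it does not implement the interface.
    static TokenPreProcess load(String className) {
        if (className == null || className.isEmpty()) {
            return null;
        }
        try {
            return (TokenPreProcess) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        TokenPreProcess pre = load(UpperCasePreProcessor.class.getName());
        System.out.println(pre.preProcess("token"));
    }
}
```

Returning null on failure mirrors the behaviour of the method above; callers that need to distinguish "nothing configured" from "could not instantiate" would likely prefer an exception or an Optional.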
 
Example #2
Source File: EndingPreProcessorTest.java    From deeplearning4j with Apache License 2.0
@Test
public void testPreProcessor() {
    TokenPreProcess preProcess = new EndingPreProcessor();
    String endingTest = "ending";
    assertEquals("end", preProcess.preProcess(endingTest));

}
 
Example #3
Source File: CompositePreProcessor.java    From deeplearning4j with Apache License 2.0
@Override
public String preProcess(String token) {
    String s = token;
    for(TokenPreProcess tpp : preProcessors){
        s = tpp.preProcess(s);
    }
    return s;
}
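The loop above applies each preprocessor to the output of the previous one, so order matters. A self-contained sketch of that chaining, assuming a local stand-in interface and two hypothetical lambda preprocessors (Chain, demo, and CompositeChainSketch are illustrative names, not the library's classes):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class CompositeChainSketch {

    // Local stand-in for DL4J's TokenPreProcess (assumption: single-method interface).
    interface TokenPreProcess {
        String preProcess(String token);
    }

    // Applies the given preprocessors in order, feeding each result into the next.
    static final class Chain implements TokenPreProcess {
        private final List<TokenPreProcess> preProcessors;

        Chain(TokenPreProcess... preProcessors) {
            if (preProcessors.length == 0) {
                throw new IllegalStateException("No preprocessors were specified (empty input)");
            }
            this.preProcessors = Arrays.asList(preProcessors);
        }

        @Override
        public String preProcess(String token) {
            String s = token;
            for (TokenPreProcess tpp : preProcessors) {
                s = tpp.preProcess(s);
            }
            return s;
        }
    }

    // Hypothetical two-step chain: lower-case the token, then drop digits.
    static String demo(String token) {
        TokenPreProcess lowerCase = t -> t.toLowerCase(Locale.ROOT);
        TokenPreProcess stripDigits = t -> t.replaceAll("\\d", "");
        return new Chain(lowerCase, stripDigits).preProcess(token);
    }

    public static void main(String[] args) {
        System.out.println(demo("Word2Vec")); // prints "wordvec"
    }
}
```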
 
Example #4
Source File: KoreanTokenizerFactory.java    From deeplearning4j with Apache License 2.0
@Override
public TokenPreProcess getTokenPreProcessor() {
    return this.preProcess;
}
 
Example #5
Source File: PosUimaTokenizerFactory.java    From deeplearning4j with Apache License 2.0
@Override
public void setTokenPreProcessor(TokenPreProcess preProcessor) {
    this.tokenPreProcess = preProcessor;
}
 
Example #6
Source File: KoreanTokenizer.java    From deeplearning4j with Apache License 2.0
@Override
public void setTokenPreProcessor(TokenPreProcess tokenPreProcess) {
    this.preProcess = tokenPreProcess;
}
 
Example #7
Source File: WekaTokenizer.java    From wekaDeeplearning4j with GNU General Public License v3.0
@Override
public void setTokenPreProcessor(TokenPreProcess tokenPreProcessor) {
  this.tokenPreProcess = tokenPreProcessor;
}
 
Example #8
Source File: UimaTokenizerFactory.java    From deeplearning4j with Apache License 2.0
@Override
public void setTokenPreProcessor(TokenPreProcess preProcessor) {
    this.preProcess = preProcessor;
}
 
Example #9
Source File: UimaTokenizer.java    From deeplearning4j with Apache License 2.0
@Override
public void setTokenPreProcessor(TokenPreProcess tokenPreProcessor) {
    this.preProcess = tokenPreProcessor;
}
 
Example #10
Source File: PosUimaTokenizer.java    From deeplearning4j with Apache License 2.0
@Override
public void setTokenPreProcessor(@NonNull TokenPreProcess tokenPreProcessor) {
    this.preProcessor = tokenPreProcessor;
}
 
Example #11
Source File: EmbeddedStemmingPreprocessor.java    From deeplearning4j with Apache License 2.0
public EmbeddedStemmingPreprocessor(@NonNull TokenPreProcess preProcess) {
    this.preProcessor = preProcess;
}
 
Example #12
Source File: ChineseTokenizerFactory.java    From deeplearning4j with Apache License 2.0
@Override
public void setTokenPreProcessor(TokenPreProcess tokenPreProcess) {
    this.tokenPreProcess = tokenPreProcess;
}
 
Example #13
Source File: ChineseTokenizerFactory.java    From deeplearning4j with Apache License 2.0
@Override
public TokenPreProcess getTokenPreProcessor() {
    return tokenPreProcess;
}
 
Example #14
Source File: ChineseTokenizer.java    From deeplearning4j with Apache License 2.0
@Override
public void setTokenPreProcessor(TokenPreProcess tokenPreProcessor) {
    this.tokenPreProcess = tokenPreProcessor;
}
 
Example #15
Source File: DefaultTokenizerFactory.java    From deeplearning4j with Apache License 2.0
@Override
public void setTokenPreProcessor(TokenPreProcess preProcessor) {
    this.tokenPreProcess = preProcessor;
}
 
Example #16
Source File: NGramTokenizerFactory.java    From deeplearning4j with Apache License 2.0
@Override
public void setTokenPreProcessor(TokenPreProcess preProcessor) {
    this.preProcess = preProcessor;
}
 
Example #17
Source File: BertWordPieceTokenizerFactory.java    From deeplearning4j with Apache License 2.0
/**
 * @param vocab                   Vocabulary, as a navigable map
 * @param preTokenizePreProcessor The preprocessor that should be used on the raw strings, before splitting
 */
public BertWordPieceTokenizerFactory(NavigableMap<String, Integer> vocab, TokenPreProcess preTokenizePreProcessor) {
    this.vocab = vocab;
    this.preTokenizePreProcessor = preTokenizePreProcessor;
}
 
Example #18
Source File: CompositePreProcessor.java    From deeplearning4j with Apache License 2.0
public CompositePreProcessor(@NonNull TokenPreProcess... preProcessors){
    Preconditions.checkState(preProcessors.length > 0, "No preprocessors were specified (empty input)");
    this.preProcessors = Arrays.asList(preProcessors);
}
 
Example #19
Source File: CompositePreProcessor.java    From deeplearning4j with Apache License 2.0
public CompositePreProcessor(@NonNull Collection<? extends TokenPreProcess> preProcessors){
    Preconditions.checkState(!preProcessors.isEmpty(), "No preprocessors were specified (empty input)");
    this.preProcessors = new ArrayList<>(preProcessors);
}
 
Example #20
Source File: JapaneseTokenizerFactory.java    From deeplearning4j with Apache License 2.0
@Override
public void setTokenPreProcessor(TokenPreProcess preProcessor) {
    this.preProcessor = preProcessor;
}
 
Example #21
Source File: JapaneseTokenizerFactory.java    From deeplearning4j with Apache License 2.0
@Override
public TokenPreProcess getTokenPreProcessor() {
    return this.preProcessor;
}
 
Example #22
Source File: JapaneseTokenizer.java    From deeplearning4j with Apache License 2.0
@Override
public void setTokenPreProcessor(TokenPreProcess tokenPreProcessor) {
    this.preProcessor = tokenPreProcessor;
}
 
Example #23
Source File: TweetNLPTokenizer.java    From wekaDeeplearning4j with GNU General Public License v3.0
@Override
public void setTokenPreProcessor(TokenPreProcess tokenPreProcessor) {
  this.tokenPreProcess = tokenPreProcessor;
}
 
Example #24
Source File: NGramTokenizerFactoryImpl.java    From wekaDeeplearning4j with GNU General Public License v3.0
@Override
public TokenPreProcess getTokenPreProcessor() {
  return tokenPreProcess;
}
 
Example #25
Source File: NGramTokenizerFactoryImpl.java    From wekaDeeplearning4j with GNU General Public License v3.0
@Override
public void setTokenPreProcessor(TokenPreProcess preProcessor) {
  this.tokenPreProcess = preProcessor;
}
 
Example #26
Source File: CharacterNGramTokenizerFactoryImpl.java    From wekaDeeplearning4j with GNU General Public License v3.0
@Override
public TokenPreProcess getTokenPreProcessor() {
  return tokenPreProcess;
}
 
Example #27
Source File: CharacterNGramTokenizerFactoryImpl.java    From wekaDeeplearning4j with GNU General Public License v3.0
@Override
public void setTokenPreProcessor(TokenPreProcess preProcessor) {
  this.tokenPreProcess = preProcessor;
}
 
Example #28
Source File: TweetNLPTokenizerFactoryImpl.java    From wekaDeeplearning4j with GNU General Public License v3.0
@Override
public TokenPreProcess getTokenPreProcessor() {
  return tokenPreProcess;
}
 
Example #29
Source File: TweetNLPTokenizerFactoryImpl.java    From wekaDeeplearning4j with GNU General Public License v3.0
@Override
public void setTokenPreProcessor(TokenPreProcess preProcessor) {
  this.tokenPreProcess = preProcessor;
}
 