org.apache.spark.ml.feature.NGram Java Examples

The following examples show how to use org.apache.spark.ml.feature.NGram.
Example #1
Source File: NGramBuilder.java    From vn.vitk with GNU General Public License v3.0
/**
 * Creates an n-gram data frame from text lines.
 * @param lines the input text lines, tokenized on whitespace
 * @return an n-gram data frame.
 */
DataFrame createNGramDataFrame(JavaRDD<String> lines) {
	JavaRDD<Row> rows = lines.map(new Function<String, Row>(){
		private static final long serialVersionUID = -4332903997027358601L;
		
		@Override
		public Row call(String line) throws Exception {
			// trim first so that leading whitespace does not produce an empty token
			return RowFactory.create(Arrays.asList(line.trim().split("\\s+")));
		}
	});
	StructType schema = new StructType(new StructField[] {
			new StructField("words",
					DataTypes.createArrayType(DataTypes.StringType), false,
					Metadata.empty()) });
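	// jsc is a field of the enclosing class (not shown), presumably a JavaSparkContext
	DataFrame wordDF = new SQLContext(jsc).createDataFrame(rows, schema);
	// extract bigrams (n = 2) from each word sequence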
	NGram transformer = new NGram().setInputCol("words")
			.setOutputCol("ngrams").setN(2);
	DataFrame ngramDF = transformer.transform(wordDF);
	ngramDF.show(10, false);
	return ngramDF;
}
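For readers without a Spark cluster at hand, the following plain-Java sketch approximates what `createNGramDataFrame` computes for each line: split the line on runs of whitespace, then join each pair of adjacent tokens with a single space, which is how Spark's `NGram` formats its output when n = 2. It is an illustration only, not Spark's implementation; the class and method names (`BigramSketch`, `bigrams`) are hypothetical, and trimming each line before splitting is an assumption made so that leading whitespace does not yield an empty token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BigramSketch {

    // Hypothetical helper: tokenizes one line on runs of whitespace and
    // returns its bigrams as space-separated strings (like NGram with n = 2).
    static List<String> bigrams(String line) {
        String trimmed = line.trim();
        // A blank line has no tokens and therefore no bigrams.
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        List<String> words = Arrays.asList(trimmed.split("\\s+"));
        List<String> result = new ArrayList<>();
        // Slide a window of two tokens across the line.
        for (int i = 0; i + 1 < words.size(); i++) {
            result.add(words.get(i) + " " + words.get(i + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the cat sat", "  hello   world ", "single");
        for (String line : lines) {
            // prints [the cat, cat sat], then [hello world], then []
            System.out.println(bigrams(line));
        }
    }
}
```

A single-token line produces an empty list, which matches the documented `NGram` behaviour of returning no n-grams when the input has fewer than n words.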
 
Example #2
Source File: JavaNGramExample.java    From SparkDemo with MIT License
public static void main(String[] args) {
  SparkSession spark = SparkSession
    .builder()
    .appName("JavaNGramExample")
    .getOrCreate();

  // $example on$
  List<Row> data = Arrays.asList(
    RowFactory.create(0, Arrays.asList("Hi", "I", "heard", "about", "Spark")),
    RowFactory.create(1, Arrays.asList("I", "wish", "Java", "could", "use", "case", "classes")),
    RowFactory.create(2, Arrays.asList("Logistic", "regression", "models", "are", "neat"))
  );

  StructType schema = new StructType(new StructField[]{
    new StructField("id", DataTypes.IntegerType, false, Metadata.empty()),
    new StructField(
      "words", DataTypes.createArrayType(DataTypes.StringType), false, Metadata.empty())
  });

  Dataset<Row> wordDataFrame = spark.createDataFrame(data, schema);

  NGram ngramTransformer = new NGram().setN(2).setInputCol("words").setOutputCol("ngrams");

  Dataset<Row> ngramDataFrame = ngramTransformer.transform(wordDataFrame);
  ngramDataFrame.select("ngrams").show(false);
  // $example off$

  spark.stop();
}
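The per-row work done by `ngramTransformer.transform` can be approximated without Spark. The sketch is a plain-Java approximation, not Spark's implementation: for every window of `n` consecutive words it joins them with a single space, and when the input has fewer than `n` words it returns an empty list, which is how the Spark documentation describes `NGram`. The class name `NGramSketch` and method `ngrams` are hypothetical; like `NGram`, the sketch rejects `n < 1`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NGramSketch {

    // Hypothetical helper approximating org.apache.spark.ml.feature.NGram:
    // returns every run of n consecutive words, joined by a single space.
    static List<String> ngrams(List<String> words, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<String> result = new ArrayList<>();
        // If words.size() < n the loop body never runs, so the result is empty.
        for (int i = 0; i + n <= words.size(); i++) {
            result.add(String.join(" ", words.subList(i, i + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sentence = Arrays.asList("Hi", "I", "heard", "about", "Spark");
        // Same setting as JavaNGramExample (setN(2)):
        // prints [Hi I, I heard, heard about, about Spark]
        System.out.println(ngrams(sentence, 2));
        // Trigrams: prints [Hi I heard, I heard about, heard about Spark]
        System.out.println(ngrams(sentence, 3));
        // Fewer words than n: prints []
        System.out.println(ngrams(Arrays.asList("Spark"), 2));
    }
}
```

Spark applies this logic to each row inside a distributed `transform`; the sketch only illustrates the shape of the `ngrams` column.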
 
Example #3
Source File: NGramConverter.java    From jpmml-sparkml with GNU Affero General Public License v3.0
@Override
public List<Feature> encodeFeatures(SparkMLEncoder encoder){
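	NGram transformer = getTransformer();

	// The transformer's input column is expected to carry exactly one DocumentFeature
	DocumentFeature documentFeature = (DocumentFeature)encoder.getOnlyFeature(transformer.getInputCol());

	// Pass that document feature through unchanged as this converter's only output
	return Collections.singletonList(documentFeature);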
}
 
Example #4
Source File: NGramConverter.java    From jpmml-sparkml with GNU Affero General Public License v3.0
public NGramConverter(NGram transformer){
	super(transformer);
}