Python pyspark.ml.classification.RandomForestClassifier() Examples

The following code examples show how to use pyspark.ml.classification.RandomForestClassifier(). They are taken from open source Python projects.

Example 1
Project: spark-mlpipeline-for-ctr   Author: chenxinye   File: spark_mlpipeline.py    MIT License
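# Module-level imports assumed by the aliases `cl` and `tune` used below
# (the excerpt does not show them).
import pyspark.ml.classification as cl
import pyspark.ml.tuning as tune

def rf_cv(self):
        """Train a random forest classifier, tuning numTrees and maxDepth
        with a train/validation split, then score it on the test data.
        """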
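        # Candidate values scale with the number of feature columns.
        # Spark requires numTrees >= 1 and maxDepth in [0, 30], so these
        # may need clamping for very small or very large feature sets.
        if self.mode == 'fast':
            _numTrees = [round(i*len(self.fe_col)) for i in [2]]
            _maxDepth = [round(i*len(self.fe_col)) for i in [0.07]]
        elif self.mode == 'full':
            _numTrees = [round(i*len(self.fe_col)) for i in [2,3]]
            _maxDepth = [round(i*len(self.fe_col)) for i in [0.05,0.07]]
        else:
            # Without this branch an unknown mode fails later with an UnboundLocalError.
            raise ValueError("mode must be 'fast' or 'full', got %r" % (self.mode,))
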
        RFclassifier = cl.RandomForestClassifier(
            labelCol='label',
            featuresCol = 'features'
        )
        
        # Cartesian product of the numTrees and maxDepth candidates.
        grid = (
            tune.ParamGridBuilder()
            .addGrid(RFclassifier.numTrees, _numTrees)
            .addGrid(RFclassifier.maxDepth, _maxDepth)
            .build()
        )

        tvs = tune.TrainValidationSplit(
            estimator = RFclassifier,
            estimatorParamMaps = grid, 
            evaluator = self.evaluator
        )
        
        self.rfModel = tvs.fit(self.train_data)
        self.rf_cv_results = self.rfModel.transform(self.test_data)
        self.rfscore = self.evaluator.evaluate(self.rf_cv_results, {self.evaluator.metricName: 'areaUnderROC'})
        
        if self.verbose:
            print("AUC score is:", self.rfscore)
            #print("Area Under PR is:",self.evaluator.evaluate(self.rf_cv_results, {self.evaluator.metricName: 'areaUnderPR'}))
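
The example above derives its numTrees and maxDepth candidates by multiplying the feature count by a factor and rounding. With few features the depth rounds to 0, and with many it can exceed 30, the upper bound Spark documents for maxDepth. The following is a minimal pure-Python sketch of that computation with clamping added. The helper name `scaled_grid` and the clamping and de-duplication steps are assumptions for illustration, not part of the original project.

```python
def scaled_grid(n_features, factors, lower=1, upper=None):
    """Scale each factor by the feature count, round, and clamp to [lower, upper].

    Hypothetical helper: the original code rounds without clamping.
    """
    values = []
    for factor in factors:
        value = max(lower, round(factor * n_features))
        if upper is not None:
            value = min(upper, value)
        values.append(value)
    # Drop duplicates while preserving order so the grid has no repeated candidates.
    return list(dict.fromkeys(values))


# 'full' mode factors from the example above, for a 40-column feature set.
num_trees = scaled_grid(40, [2, 3], lower=1)
max_depth = scaled_grid(40, [0.05, 0.07], lower=1, upper=30)
print(num_trees, max_depth)
```

The resulting lists could be passed to ParamGridBuilder.addGrid in place of _numTrees and _maxDepth. Whether a minimum depth of 1 is the right floor depends on the data, so treat the bounds as a starting point.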