burlap.behavior.valuefunction.ValueFunction Java Examples

The following examples show how to use burlap.behavior.valuefunction.ValueFunction. Each example notes the source file, the project it was taken from, and that project's license.
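
At its core, burlap.behavior.valuefunction.ValueFunction is a one-method interface: double value(State s) returns the estimated value of a state. The minimal sketch below is not taken from any of the examples on this page; it simply implements that contract with a constant value, which is the role the vInit/vinit parameters play in the planners and learners shown below.

import burlap.behavior.valuefunction.ValueFunction;
import burlap.mdp.core.state.State;

public class ZeroValueFunction implements ValueFunction {

	@Override
	public double value(State s) {
		//treat every state as having value 0; planners and learners refine this initial estimate
		return 0.;
	}
}
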
Example #1
Source File: MADynamicProgramming.java    From burlap with Apache License 2.0
/**
 * Initializes all the main data structures of the value function.
 * @param domain the domain in which to perform planning
 * @param agentDefinitions the definitions of the agents involved in the planning problem.
 * @param jointRewardFunction the joint reward function
 * @param terminalFunction the terminal state function
 * @param discount the discount factor
 * @param hashingFactory the state hashing factory to use to look up Q-values for individual states
 * @param vInit the value function initialization function to use
 * @param backupOperator the solution concept backup operator to use.
 */
public void initMAVF(SGDomain domain, List<SGAgentType> agentDefinitions, JointRewardFunction jointRewardFunction, TerminalFunction terminalFunction,
					 double discount, HashableStateFactory hashingFactory, ValueFunction vInit, SGBackupOperator backupOperator){

	this.domain = domain;
	this.jointModel = domain.getJointActionModel();
	this.jointRewardFunction = jointRewardFunction;
	this.terminalFunction = terminalFunction;
	this.discount = discount;
	this.hashingFactory = hashingFactory;
	this.vInit = vInit;
	this.backupOperator = backupOperator;
	
	
	this.setAgentDefinitions(agentDefinitions);
	
}
 
Example #2
Source File: Main.java    From cs7641-assignment4 with MIT License
/**
 * This method takes care of visualizing the grid, rewards, and specific policy on a nice
 * BURLAP-predefined GUI. I found this very useful to understand how the algorithm was working.
 */
private static void visualize(Problem map, ValueFunction valueFunction, Policy policy, State initialState, SADomain domain, HashableStateFactory hashingFactory, String title) {
	List<State> states = StateReachability.getReachableStates(initialState, domain, hashingFactory);
	ValueFunctionVisualizerGUI gui = GridWorldDomain.getGridWorldValueFunctionVisualization(states, map.getWidth(), map.getWidth(), valueFunction, policy);
	gui.setTitle(title);
	gui.setDefaultCloseOperation(javax.swing.WindowConstants.EXIT_ON_CLOSE);
	gui.initGUI();
}
 
Example #3
Source File: ValueFunctionVisualizerGUI.java    From burlap with Apache License 2.0
/**
 * Initializes the visualizer GUI.
 * @param states the states whose value should be rendered.
 * @param svp the value function state visualizer to use.
 * @param valueFunction the valueFunction that can return the state values.
 */
public ValueFunctionVisualizerGUI(List <State> states, StateValuePainter svp, ValueFunction valueFunction){
	this.statesToVisualize = states;
	this.svp = svp;
	this.visualizer = new MultiLayerRenderer();
	this.vfLayer = new ValueFunctionRenderLayer(statesToVisualize, svp, valueFunction);
	this.pLayer = new PolicyRenderLayer(states, null, null);
	
	this.visualizer.addRenderLayer(vfLayer);
	this.visualizer.addRenderLayer(pLayer);
	
}
 
Example #4
Source File: ValueFunctionVisualizerGUI.java    From burlap with Apache License 2.0
/**
 * A method for creating a common 2D arrow-glyph value function and policy visualization. The value of states
 * will be represented by colored cells from red (lowest value) to blue (highest value). North-south-east-west
 * actions will be rendered with arrows using {@link burlap.behavior.singleagent.auxiliary.valuefunctionvis.common.ArrowActionGlyph}
 * objects. The GUI will not be launched by default; call the {@link #initGUI()} on the returned object to start it.
 * @param states the states whose value should be rendered.
 * @param valueFunction the valueFunction that can return the state values.
 * @param p the policy to render
 * @param xVar the variable key for the x variable
 * @param yVar the variable key for the y variable
 * @param xRange the range of the x variable
 * @param yRange the range of the y variable
 * @param xWidth the width of each rendered state within the x domain
 * @param yWidth the width of each rendered state within the y domain
 * @param northActionName the name of the north action
 * @param southActionName the name of the south action
 * @param eastActionName the name of the east action
 * @param westActionName the name of the west action
 * @return a {@link burlap.behavior.singleagent.auxiliary.valuefunctionvis.ValueFunctionVisualizerGUI}
 */
public static ValueFunctionVisualizerGUI createGridWorldBasedValueFunctionVisualizerGUI(List <State> states, ValueFunction valueFunction, Policy p,
															 Object xVar, Object yVar,
															 VariableDomain xRange,
															 VariableDomain yRange,
															 double xWidth,
															 double yWidth,
															 String northActionName,
															 String southActionName,
															 String eastActionName,
															 String westActionName){


	StateValuePainter2D svp = new StateValuePainter2D();
	svp.setXYKeys(xVar, yVar, xRange, yRange, xWidth, yWidth);


	PolicyGlyphPainter2D spp = ArrowActionGlyph.getNSEWPolicyGlyphPainter(xVar, yVar, xRange, yRange, xWidth, yWidth,
			northActionName, southActionName, eastActionName, westActionName);

	ValueFunctionVisualizerGUI gui = new ValueFunctionVisualizerGUI(states, svp, valueFunction);
	gui.setSpp(spp);
	gui.setPolicy(p);
	gui.setBgColor(Color.GRAY);


	return gui;

}
 
Example #5
Source File: TDLambda.java    From burlap with Apache License 2.0
/**
 * Initializes the algorithm.
 * @param gamma the discount factor
 * @param hashingFactory the state hashing factory to use for hashing states and performing equality checks. 
 * @param learningRate the learning rate that affects how quickly the estimated value function is adjusted.
 * @param vinit a method of initializing the value function for previously unvisited states.
 * @param lambda indicates the strength of eligibility traces. Use 1 for Monte Carlo-like traces and 0 for single-step backups
 */
public TDLambda(double gamma, HashableStateFactory hashingFactory, double learningRate, ValueFunction vinit, double lambda) {
	this.gamma = gamma;
	this.hashingFactory = hashingFactory;
	
	this.learningRate = new ConstantLR(learningRate);
	vInitFunction = vinit;
	this.lambda = lambda;
	
	
	vIndex = new HashMap<HashableState, VValue>();
}
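
A hedged construction sketch (not part of the source above): the vinit argument is typically satisfied with a constant value function. ConstantValueFunction and SimpleHashableStateFactory are assumed here to be available from burlap.behavior.valuefunction and burlap.statehashing.simple respectively.

//gamma = 0.99, learning rate = 0.1, all unvisited states initialized to 0, lambda = 0.8
ValueFunction vinit = new ConstantValueFunction(0.);
TDLambda td = new TDLambda(0.99, new SimpleHashableStateFactory(), 0.1, vinit, 0.8);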
 
Example #6
Source File: AnalysisRunner.java    From omscs-cs7641-machine-learning-assignment-4 with GNU Lesser General Public License v3.0
public void simpleValueFunctionVis(ValueFunction valueFunction, Policy p, 
		State initialState, Domain domain, HashableStateFactory hashingFactory, String title){

	List<State> allStates = StateReachability.getReachableStates(initialState,
			(SADomain)domain, hashingFactory);
	ValueFunctionVisualizerGUI gui = GridWorldDomain.getGridWorldValueFunctionVisualization(
			allStates, valueFunction, p);
	gui.setTitle(title);
	gui.initGUI();

}
 
Example #7
Source File: Main.java    From cs7641-assignment4 with MIT License
/**
 * Here is where the magic happens. This method is where I loop through the specified number
 * of episodes (iterations) and run the specified algorithm. To keep things nice and clean, I use
 * this method to run all three algorithms. The specific details are specified through the
 * PlannerFactory interface.
 * 
 * This method collects all the information from the algorithm and packs it in an Analysis
 * instance that later gets dumped on the console.
 */
private static void runAlgorithm(Analysis analysis, Problem problem, SADomain domain, HashableStateFactory hashingFactory, State initialState, PlannerFactory plannerFactory, Algorithm algorithm) {
	ConstantStateGenerator constantStateGenerator = new ConstantStateGenerator(initialState);
	SimulatedEnvironment simulatedEnvironment = new SimulatedEnvironment(domain, constantStateGenerator);
	Planner planner = null;
	Policy policy = null;
	for (int episodeIndex = 1; episodeIndex <= problem.getNumberOfIterations(algorithm); episodeIndex++) {
		long startTime = System.nanoTime();
		planner = plannerFactory.createPlanner(episodeIndex, domain, hashingFactory, simulatedEnvironment);
		policy = planner.planFromState(initialState);

		/*
		 * If we haven't converged, following the policy will leave the agent wandering around
		 * and it might never reach the goal. To avoid this, we need to set the maximum number
		 * of steps to take before terminating the policy rollout. I decided to set this maximum
		 * at the number of grid locations in our map (width * width). This should give the
		 * agent plenty of room to wander around.
		 * 
		 * The smaller this number is, the faster the algorithm will run.
		 */
		int maxNumberOfSteps = problem.getWidth() * problem.getWidth();

		Episode episode = PolicyUtils.rollout(policy, initialState, domain.getModel(), maxNumberOfSteps);
		analysis.add(episodeIndex, episode.rewardSequence, episode.numTimeSteps(), (long) (System.nanoTime() - startTime) / 1000000);
	}

	if (algorithm == Algorithm.QLearning && USE_LEARNING_EXPERIMENTER) {
		learningExperimenter(problem, (LearningAgent) planner, simulatedEnvironment);
	}

	if (SHOW_VISUALIZATION && planner != null && policy != null) {
		visualize(problem, (ValueFunction) planner, policy, initialState, domain, hashingFactory, algorithm.getTitle());
	}
}
 
Example #8
Source File: VITutorial.java    From burlap_examples with MIT License
public VITutorial(SADomain domain, double gamma,
				  HashableStateFactory hashingFactory, ValueFunction vinit, int numIterations){
	this.solverInit(domain, gamma, hashingFactory);
	this.vinit = vinit;
	this.numIterations = numIterations;
	this.valueFunction = new HashMap<HashableState, Double>();
}
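
For context, the surrounding burlap_examples tutorial plans with this class from a start state using a constant value function initialization. The sketch below is an assumption-laden illustration (domain, initialState, ConstantValueFunction, and SimpleHashableStateFactory are assumed to be set up or imported elsewhere), not code from the constructor above.

VITutorial vi = new VITutorial(domain, 0.99, new SimpleHashableStateFactory(),
		new ConstantValueFunction(0.0), 30);
Policy p = vi.planFromState(initialState); //plans from the start state and returns a policy over the learned values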
 
Example #9
Source File: BasicBehavior.java    From burlap_examples with MIT License
public void manualValueFunctionVis(ValueFunction valueFunction, Policy p){

		List<State> allStates = StateReachability.getReachableStates(initialState, domain, hashingFactory);

		//define color function
		LandmarkColorBlendInterpolation rb = new LandmarkColorBlendInterpolation();
		rb.addNextLandMark(0., Color.RED);
		rb.addNextLandMark(1., Color.BLUE);

		//define a 2D painter of state values, specifying which attributes correspond to the x and y coordinates of the canvas
		StateValuePainter2D svp = new StateValuePainter2D(rb);
		svp.setXYKeys("agent:x", "agent:y", new VariableDomain(0, 11), new VariableDomain(0, 11), 1, 1);

		//create our ValueFunctionVisualizer that paints for all states
		//using the ValueFunction source and the state value painter we defined
		ValueFunctionVisualizerGUI gui = new ValueFunctionVisualizerGUI(allStates, svp, valueFunction);

		//define a policy painter that uses arrow glyphs for each of the grid world actions
		PolicyGlyphPainter2D spp = new PolicyGlyphPainter2D();
		spp.setXYKeys("agent:x", "agent:y", new VariableDomain(0, 11), new VariableDomain(0, 11), 1, 1);

		spp.setActionNameGlyphPainter(GridWorldDomain.ACTION_NORTH, new ArrowActionGlyph(0));
		spp.setActionNameGlyphPainter(GridWorldDomain.ACTION_SOUTH, new ArrowActionGlyph(1));
		spp.setActionNameGlyphPainter(GridWorldDomain.ACTION_EAST, new ArrowActionGlyph(2));
		spp.setActionNameGlyphPainter(GridWorldDomain.ACTION_WEST, new ArrowActionGlyph(3));
		spp.setRenderStyle(PolicyGlyphPainter2D.PolicyGlyphRenderStyle.DISTSCALED);


		//add our policy renderer to it
		gui.setSpp(spp);
		gui.setPolicy(p);

		//set the background color for places where states are not rendered to grey
		gui.setBgColor(Color.GRAY);

		//start it
		gui.initGUI();



	}
 
Example #10
Source File: BasicBehavior.java    From burlap_examples with MIT License
public void qLearningExample(String outputPath){

		LearningAgent agent = new QLearning(domain, 0.99, hashingFactory, 0., 1.);

		//run learning for 50 episodes
		for(int i = 0; i < 50; i++){
			Episode e = agent.runLearningEpisode(env);

			e.write(outputPath + "ql_" + i);
			System.out.println(i + ": " + e.maxTimeStep());

			//reset environment for next learning episode
			env.resetEnvironment();
		}

		simpleValueFunctionVis((ValueFunction)agent, new GreedyQPolicy((QProvider) agent));

	}
 
Example #11
Source File: BasicBehavior.java    From burlap_examples with MIT License
public void simpleValueFunctionVis(ValueFunction valueFunction, Policy p){

		List<State> allStates = StateReachability.getReachableStates(initialState, domain, hashingFactory);
		ValueFunctionVisualizerGUI gui = GridWorldDomain.getGridWorldValueFunctionVisualization(allStates, 11, 11, valueFunction, p);
		gui.initGUI();

	}
 
Example #12
Source File: BoundedRTDP.java    From burlap with Apache License 2.0
/**
 * Initializes.
 * @param domain the domain in which to plan
 * @param gamma the discount factor
 * @param hashingFactory the state hashing factory to use
 * @param lowerVInit the value function lower bound initialization
 * @param upperVInit the value function upper bound initialization
 * @param maxDiff the maximum difference permitted between the value function bounds (the margin) before planning may terminate. This value is also used to prematurely stop a rollout if the next state's margin is under this value.
 * @param maxRollouts the maximum number of rollouts permitted before planning is forced to terminate. If set to -1 then there is no limit.
 */
public BoundedRTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory,
				   ValueFunction lowerVInit, ValueFunction upperVInit, double maxDiff, int maxRollouts){
	this.DPPInit(domain, gamma, hashingFactory);
	this.lowerVInit = lowerVInit;
	this.upperVInit = upperVInit;
	this.maxDiff = maxDiff;
	this.maxRollouts = maxRollouts;

}
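
A hedged usage sketch: BoundedRTDP needs a pessimistic lower bound and an optimistic upper bound on state values. Assuming rewards lie in [0, rmax] and that ConstantValueFunction, domain, and hashingFactory are available in scope (assumptions, not part of the source above), a V-max style upper bound follows from the discount factor.

double gamma = 0.99;
double rmax = 1.; //assumed maximum one-step reward
ValueFunction lower = new ConstantValueFunction(0.);                  //pessimistic: no reward anywhere
ValueFunction upper = new ConstantValueFunction(rmax / (1. - gamma)); //optimistic V-max bound
BoundedRTDP brtdp = new BoundedRTDP(domain, gamma, hashingFactory, lower, upper, 0.01, 500);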
 
Example #13
Source File: ARTDP.java    From burlap with Apache License 2.0
/**
 * Initializes using the provided model algorithm and a Boltzmann policy with a fixed temperature of 0.1. 
 * @param domain the domain
 * @param gamma the discount factor
 * @param hashingFactory the state hashing factory to use for the tabular model and the planning
 * @param model the model algorithm to use
 * @param vInit the constant value function initialization to use; should be optimistic.
 */
public ARTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, LearnedModel model, ValueFunction vInit){
	
	this.solverInit(domain, gamma, hashingFactory);
	
	this.model = model;
	
	//initializing the value function planning mechanisms to use our model and not the real world
	this.modelPlanner = new DynamicProgramming();
	this.modelPlanner.DPPInit(domain, gamma, hashingFactory);
	this.policy = new BoltzmannQPolicy(this, 0.1);
	
	
}
 
Example #14
Source File: ARTDP.java    From burlap with Apache License 2.0
/**
 * Initializes using a tabular model of the world and a Boltzmann policy with a fixed temperature of 0.1. 
 * @param domain the domain
 * @param gamma the discount factor
 * @param hashingFactory the state hashing factory to use for the tabular model and the planning
 * @param vInit the value function initialization to use; should be optimistic.
 */
public ARTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, ValueFunction vInit){
	
	this.solverInit(domain, gamma, hashingFactory);
	
	this.model = new TabularModel(domain, hashingFactory, 1);
	
	//initializing the value function planning mechanisms to use our model and not the real world
	this.modelPlanner = new DynamicProgramming();
	this.modelPlanner.DPPInit(domain, gamma, hashingFactory);
	this.modelPlanner.setModel(this.model);
	this.policy = new BoltzmannQPolicy(this, 0.1);
	
	
}
 
Example #15
Source File: TimeIndexedTDLambda.java    From burlap with Apache License 2.0
/**
 * Initializes the algorithm.
 * @param gamma the discount factor
 * @param hashingFactory the state hashing factory to use for hashing states and performing equality checks. 
 * @param learningRate the learning rate that affects how quickly the estimated value function is adjusted.
 * @param vinit a method of initializing the value function for previously unvisited states.
 * @param lambda indicates the strength of eligibility traces. Use 1 for Monte Carlo-like traces and 0 for single-step backups
 * @param maxEpisodeSize the maximum number of steps possible in an episode
 */
public TimeIndexedTDLambda(double gamma, HashableStateFactory hashingFactory, double learningRate, ValueFunction vinit, double lambda, int maxEpisodeSize) {
	super(gamma, hashingFactory, learningRate, vinit, lambda);
	
	this.maxEpisodeSize = maxEpisodeSize;
	this.vTIndex = new ArrayList<Map<HashableState,VValue>>();
	
}
 
Example #16
Source File: BasicBehavior.java    From burlap_examples with MIT License
public void valueIterationExample(String outputPath){

		Planner planner = new ValueIteration(domain, 0.99, hashingFactory, 0.001, 100);
		Policy p = planner.planFromState(initialState);

		PolicyUtils.rollout(p, initialState, domain.getModel()).write(outputPath + "vi");

		simpleValueFunctionVis((ValueFunction)planner, p);
		//manualValueFunctionVis((ValueFunction)planner, p);

	}
 
Example #17
Source File: MAValueIteration.java    From burlap with Apache License 2.0
/**
 * Initializes.
 * @param domain the domain in which to perform planning
 * @param agentDefinitions the agents involved in the planning problem
 * @param jointRewardFunction the joint reward function
 * @param terminalFunction the terminal state function
 * @param discount the discount
 * @param hashingFactory the hashing factory to use for storing states
 * @param vInit the state value initialization function to use.
 * @param backupOperator the backup operator that defines the solution concept being solved
 * @param maxDelta the threshold that causes VI to terminate when the max Q-value change is less than it
 * @param maxIterations the maximum number of iterations allowed
 */
public MAValueIteration(SGDomain domain, List<SGAgentType> agentDefinitions, JointRewardFunction jointRewardFunction, TerminalFunction terminalFunction,
						double discount, HashableStateFactory hashingFactory, ValueFunction vInit, SGBackupOperator backupOperator, double maxDelta, int maxIterations){
	
	this.initMAVF(domain, agentDefinitions, jointRewardFunction, terminalFunction, discount, hashingFactory, vInit, backupOperator);
	this.maxDelta = maxDelta;
	this.maxIterations = maxIterations;
	
}
 
Example #18
Source File: MAValueIteration.java    From burlap with Apache License 2.0
/**
 * Initializes.
 * @param domain the domain in which to perform planning
 * @param jointRewardFunction the joint reward function
 * @param terminalFunction the terminal state function
 * @param discount the discount
 * @param hashingFactory the hashing factory to use for storing states
 * @param qInit the q-value initialization function to use.
 * @param backupOperator the backup operator that defines the solution concept being solved
 * @param maxDelta the threshold that causes VI to terminate when the max Q-value change is less than it
 * @param maxIterations the maximum number of iterations allowed
 */
public MAValueIteration(SGDomain domain, JointRewardFunction jointRewardFunction, TerminalFunction terminalFunction,
						double discount, HashableStateFactory hashingFactory, ValueFunction qInit, SGBackupOperator backupOperator, double maxDelta, int maxIterations){
	
	this.initMAVF(domain, null, jointRewardFunction, terminalFunction, discount, hashingFactory, qInit, backupOperator);
	this.maxDelta = maxDelta;
	this.maxIterations = maxIterations;
	
}
 
Example #19
Source File: VanillaDiffVinit.java    From burlap with Apache License 2.0
/**
 * Initializes.
 * @param vinit The vanilla unparameterized value function initialization
 * @param rf the differentiable reward function that defines the total parameter space
 */
public VanillaDiffVinit(ValueFunction vinit, DifferentiableRF rf) {
	this.vinit = vinit;
	this.rf = rf;
}
 
Example #20
Source File: SparseSampling.java    From burlap with Apache License 2.0
/**
 * Sets the {@link ValueFunction} object to use for setting the value of leaf nodes.
 * @param vinit the {@link ValueFunction} object to use for setting the value of leaf nodes.
 */
public void setValueForLeafNodes(ValueFunction vinit){
	this.vinit = vinit;
}
 
Example #21
Source File: SupervisedVFA.java    From burlap with Apache License 2.0
/**
 * Uses supervised learning (regression) to learn a value function approximation of the input training data.
 * @param trainingData the training data to fit.
 * @return a {@link burlap.behavior.valuefunction.ValueFunction} that fits the training data.
 */
ValueFunction train(List<SupervisedVFAInstance> trainingData);
 
Example #22
Source File: DynamicProgramming.java    From burlap with Apache License 2.0
/**
 * Sets the value function initialization to use.
 * @param vfInit the object that defines how to initializes the value function.
 */
public void setValueFunctionInitialization(ValueFunction vfInit){
	this.valueInitializer = vfInit;
}
 
Example #23
Source File: DynamicProgramming.java    From burlap with Apache License 2.0
/**
 * Returns the value initialization function used.
 * @return the value initialization function used.
 */
public ValueFunction getValueFunctionInitialization(){
	return this.valueInitializer;
}
 
Example #24
Source File: ValueFunctionRenderLayer.java    From burlap with Apache License 2.0
/**
 * Initializes the visualizer.
 * @param states the states whose value should be rendered.
 * @param svp the value function state visualizer to use.
 * @param valueFunction the valueFunction that can return the state values.
 */
public ValueFunctionRenderLayer(Collection <State> states, StateValuePainter svp, ValueFunction valueFunction){
	this.statesToVisualize = states;
	this.svp = svp;
	this.valueFunction = valueFunction;
}
 
Example #25
Source File: GridWorldDomain.java    From burlap with Apache License 2.0
/**
 * Creates and returns a {@link burlap.behavior.singleagent.auxiliary.valuefunctionvis.ValueFunctionVisualizerGUI}
 * object for a grid world. The value of states
 * will be represented by colored cells from red (lowest value) to blue (highest value). North-south-east-west
 * actions will be rendered with arrows using {@link burlap.behavior.singleagent.auxiliary.valuefunctionvis.common.ArrowActionGlyph}
 * objects. The GUI will not be launched by default; call the
 * {@link burlap.behavior.singleagent.auxiliary.valuefunctionvis.ValueFunctionVisualizerGUI#initGUI()}
 * on the returned object to start it.
 * @param states the states whose value should be rendered.
 * @param maxX the maximum value in the x dimension
 * @param maxY the maximum value in the y dimension
 * @param valueFunction the value function that can return the state values.
 * @param p the policy to render
 * @return a gridworld-based {@link burlap.behavior.singleagent.auxiliary.valuefunctionvis.ValueFunctionVisualizerGUI} object.
 */
public static ValueFunctionVisualizerGUI getGridWorldValueFunctionVisualization(List <State> states, int maxX, int maxY, ValueFunction valueFunction, Policy p){
	return ValueFunctionVisualizerGUI.createGridWorldBasedValueFunctionVisualizerGUI(states, valueFunction, p,
			new OOVariableKey(CLASS_AGENT, VAR_X), new OOVariableKey(CLASS_AGENT, VAR_Y), new VariableDomain(0, maxX), new VariableDomain(0, maxY), 1, 1,
			ACTION_NORTH, ACTION_SOUTH, ACTION_EAST, ACTION_WEST);
}
 
Example #26
Source File: RTDP.java    From burlap with Apache License 2.0
/**
 * Initializes. The value function will be initialized to vInit by default everywhere and will use a greedy policy with random tie breaks
 * for performing rollouts. Use the {@link #setValueFunctionInitialization(burlap.behavior.valuefunction.ValueFunction)} method
 * to change the value function initialization and the {@link #setRollOutPolicy(Policy)} method to change the rollout policy to something else. vInit
 * should be set to something optimistic like VMax to ensure convergence.
 * @param domain the domain in which to plan
 * @param gamma the discount factor
 * @param hashingFactory the state hashing factory to use
 * @param vInit the object which defines how the value function will be initialized for each individual state.
 * @param numRollouts the number of rollouts to perform when planning is started.
 * @param maxDelta when the maximum change in the value function from a rollout is smaller than this value, planning will terminate.
 * @param maxDepth the maximum depth/length of a rollout before it is terminated and Bellman updates are performed.
 */
public RTDP(SADomain domain, double gamma, HashableStateFactory hashingFactory, ValueFunction vInit, int numRollouts, double maxDelta, int maxDepth){
	
	this.DPPInit(domain, gamma, hashingFactory);
	
	this.numRollouts = numRollouts;
	this.maxDelta = maxDelta;
	this.maxDepth = maxDepth;
	this.rollOutPolicy = new GreedyQPolicy(this);
	
	this.valueInitializer = vInit;
	
}
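
The Javadoc above points to two setters for changing the defaults after construction. A hedged sketch (ConstantValueFunction, EpsilonGreedy, and the domain/hashingFactory variables are assumptions about the surrounding code, not part of this example):

RTDP rtdp = new RTDP(domain, 0.99, hashingFactory, new ConstantValueFunction(0.), 1000, 0.001, 100);
//the Javadoc recommends an optimistic initialization such as VMax; swap it in via the setter
rtdp.setValueFunctionInitialization(new ConstantValueFunction(100.));
//replace the default greedy rollout policy with an epsilon-greedy one
rtdp.setRollOutPolicy(new EpsilonGreedy(rtdp, 0.1));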