Java Code Examples for org.apache.hadoop.mapreduce.TaskInputOutputContext#getOutputCommitter()

The following examples show how to use org.apache.hadoop.mapreduce.TaskInputOutputContext#getOutputCommitter(). You can go to the original project or source file by following the attribution above each example.
Example 1
Source File: FileOutputFormat.java    From hadoop and big-c with Apache License 2.0
/**
 *  Get the {@link Path} to the task's temporary output directory
 *  for the map-reduce job.
 *  
 * <b id="SideEffectFiles">Tasks' Side-Effect Files</b>
 * 
 * <p>Some applications need to create/write-to side-files, which differ from
 * the actual job-outputs.
 * 
 * <p>In such cases there could be issues with two instances of the same TIP
 * (task-in-progress) running simultaneously (e.g. speculative tasks) and
 * trying to open and write to the same file (path) on HDFS. Hence the
 * application-writer will have to pick unique names per task-attempt (e.g.
 * using the attempt id, say <tt>attempt_200709221812_0001_m_000000_0</tt>),
 * not just per TIP.</p>
 * 
 * <p>To get around this the Map-Reduce framework helps the application-writer 
 * out by maintaining a special 
 * <tt>${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid}</tt> 
 * sub-directory for each task-attempt on HDFS where the output of the 
 * task-attempt goes. On successful completion of the task-attempt the files 
 * in the <tt>${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid}</tt> (only) 
 * are <i>promoted</i> to <tt>${mapreduce.output.fileoutputformat.outputdir}</tt>. Of course, the 
 * framework discards the sub-directory of unsuccessful task-attempts. This 
 * is completely transparent to the application.</p>
 * 
 * <p>The application-writer can take advantage of this by creating any
 * side-files required in a work directory during execution of the task,
 * i.e. via {@link #getWorkOutputPath(TaskInputOutputContext)}, and the
 * framework will move them out similarly, so the application does not have
 * to pick unique paths per task-attempt.</p>
 * 
 * <p>The entire discussion holds true for the maps of jobs with
 * reducer=NONE (i.e. zero reduces), since in that case the output of the
 * map goes directly to HDFS.</p>
 * 
 * @return the {@link Path} to the task's temporary output directory 
 * for the map-reduce job.
 */
public static Path getWorkOutputPath(TaskInputOutputContext<?, ?, ?, ?> context)
    throws IOException, InterruptedException {
  FileOutputCommitter committer = (FileOutputCommitter) context.getOutputCommitter();
  return committer.getWorkPath();
}
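
As a usage illustration, here is a minimal mapper sketch (not taken from any of the projects above) that writes a side-effect file into the work directory returned by getWorkOutputPath(). The class name SideFileMapper, the file name side-data.txt, and the output key are illustrative assumptions.

import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical mapper that writes a side-effect file alongside its regular output.
public class SideFileMapper extends Mapper<LongWritable, Text, Text, Text> {
  private FSDataOutputStream sideFile;

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // The work path is the task-attempt's private _temporary directory, so
    // concurrent (e.g. speculative) attempts cannot clobber each other's files.
    Path workDir = FileOutputFormat.getWorkOutputPath(context);
    FileSystem fs = workDir.getFileSystem(context.getConfiguration());
    sideFile = fs.create(new Path(workDir, "side-data.txt")); // name is illustrative
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    sideFile.writeBytes(value + "\n");      // side-effect output
    context.write(new Text("line"), value); // regular job output
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    // On successful commit the framework promotes side-data.txt to the job
    // output directory along with the regular part files.
    sideFile.close();
  }
}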
 
Example 2
Source File: FileOutputFormat.java    From RDFS and hadoop-gpu with Apache License 2.0
/**
 *  Get the {@link Path} to the task's temporary output directory
 *  for the map-reduce job.
 *  
 * <h4 id="SideEffectFiles">Tasks' Side-Effect Files</h4>
 * 
 * <p>Some applications need to create/write-to side-files, which differ from
 * the actual job-outputs.
 * 
 * <p>In such cases there could be issues with two instances of the same TIP
 * (task-in-progress) running simultaneously (e.g. speculative tasks) and
 * trying to open and write to the same file (path) on HDFS. Hence the
 * application-writer will have to pick unique names per task-attempt (e.g.
 * using the attempt id, say <tt>attempt_200709221812_0001_m_000000_0</tt>),
 * not just per TIP.</p>
 * 
 * <p>To get around this the Map-Reduce framework helps the application-writer 
 * out by maintaining a special 
 * <tt>${mapred.output.dir}/_temporary/_${taskid}</tt> 
 * sub-directory for each task-attempt on HDFS where the output of the 
 * task-attempt goes. On successful completion of the task-attempt the files 
 * in the <tt>${mapred.output.dir}/_temporary/_${taskid}</tt> (only) 
 * are <i>promoted</i> to <tt>${mapred.output.dir}</tt>. Of course, the 
 * framework discards the sub-directory of unsuccessful task-attempts. This 
 * is completely transparent to the application.</p>
 * 
 * <p>The application-writer can take advantage of this by creating any
 * side-files required in a work directory during execution of the task,
 * i.e. via {@link #getWorkOutputPath(TaskInputOutputContext)}, and the
 * framework will move them out similarly, so the application does not have
 * to pick unique paths per task-attempt.</p>
 * 
 * <p>The entire discussion holds true for the maps of jobs with
 * reducer=NONE (i.e. zero reduces), since in that case the output of the
 * map goes directly to HDFS.</p>
 * 
 * @return the {@link Path} to the task's temporary output directory 
 * for the map-reduce job.
 */
public static Path getWorkOutputPath(TaskInputOutputContext<?, ?, ?, ?> context)
    throws IOException, InterruptedException {
  FileOutputCommitter committer = (FileOutputCommitter) context.getOutputCommitter();
  return committer.getWorkPath();
}
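
For completeness, here is a minimal, hypothetical driver sketch showing where ${mapred.output.dir} comes from: it is the path passed to FileOutputFormat.setOutputPath() (the key is named mapreduce.output.fileoutputformat.outputdir in current Hadoop). It also sets zero reduces, matching the map-only case discussed above; the class name SideFileJob and the argument positions are illustrative assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver for the SideFileMapper sketch above.
public class SideFileJob {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "side-file-demo");
    job.setJarByClass(SideFileJob.class);
    job.setMapperClass(SideFileMapper.class);
    job.setNumReduceTasks(0);          // reducer=NONE: map output goes straight to HDFS
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    // This path becomes the job's output directory; task attempts write under
    // <outputDir>/_temporary/... and the FileOutputCommitter promotes their
    // files into <outputDir> on successful commit.
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}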
 