org.apache.hadoop.hdfs.util.Canceler Java Examples

The following examples show how to use org.apache.hadoop.hdfs.util.Canceler.
Example #1
Source File: TransferFsImage.java    From big-c with Apache License 2.0
/**
 * Requests that the NameNode download an image from this node.  Allows for
 * optional external cancellation.
 *
 * @param fsName the http address for the remote NN
 * @param conf Configuration
 * @param storage the storage directory to transfer the image from
 * @param nnf the NameNodeFile type of the image
 * @param txid the transaction ID of the image to be uploaded
 * @param canceler optional canceler to check for abort of upload
 * @throws IOException if there is an I/O error or cancellation
 */
public static void uploadImageFromStorage(URL fsName, Configuration conf,
    NNStorage storage, NameNodeFile nnf, long txid, Canceler canceler)
    throws IOException {
  URL url = new URL(fsName, ImageServlet.PATH_SPEC);
  long startTime = Time.monotonicNow();
  try {
    uploadImage(url, conf, storage, nnf, txid, canceler);
  } catch (HttpPutFailedException e) {
    if (e.getResponseCode() == HttpServletResponse.SC_CONFLICT) {
      // this is OK - this means that a previous attempt to upload
      // this checkpoint succeeded even though we thought it failed.
      LOG.info("Image upload with txid " + txid + 
          " conflicted with a previous image upload to the " +
          "same NameNode. Continuing...", e);
      return;
    } else {
      throw e;
    }
  }
  double xferSec = Math.max(
      ((float) (Time.monotonicNow() - startTime)) / 1000.0, 0.001);
  LOG.info("Uploaded image with txid " + txid + " to namenode at " + fsName
      + " in " + xferSec + " seconds");
}
 
Example #2
Source File: KeyValueContainerCheck.java    From hadoop-ozone with Apache License 2.0
/**
 * Full checks comprise scanning all metadata inside the container,
 * including the KV database. These checks are intrusive, consume more
 * resources than fast checks, and should only be done on Closed
 * or Quasi-closed containers. Concurrency is limited to delete
 * workflows.
 * <p>
 * fullCheck is a superset of fastCheck
 *
 * @return true : integrity checks pass, false : otherwise.
 */
public boolean fullCheck(DataTransferThrottler throttler, Canceler canceler) {
  boolean valid;

  try {
    valid = fastCheck();
    if (valid) {
      scanData(throttler, canceler);
    }
  } catch (IOException e) {
    handleCorruption(e);
    valid = false;
  }

  return valid;
}
 
Example #3
Source File: TestStandbyCheckpoints.java    From hadoop with Apache License 2.0
/**
 * Test for the case when the SBN is configured to checkpoint based
 * on a time period, but no transactions are happening on the
 * active. Thus, it would want to save a second checkpoint at the
 * same txid, which is a no-op. This test makes sure this doesn't
 * cause any problem.
 */
@Test(timeout = 300000)
public void testCheckpointWhenNoNewTransactionsHappened()
    throws Exception {
  // Checkpoint as fast as we can, in a tight loop.
  cluster.getConfiguration(1).setInt(
      DFSConfigKeys.DFS_NAMENODE_CHECKPOINT_PERIOD_KEY, 0);
  cluster.restartNameNode(1);
  nn1 = cluster.getNameNode(1);
 
  FSImage spyImage1 = NameNodeAdapter.spyOnFsImage(nn1);
  
  // We shouldn't save any checkpoints at txid=0
  Thread.sleep(1000);
  Mockito.verify(spyImage1, Mockito.never())
    .saveNamespace((FSNamesystem) Mockito.anyObject());
 
  // Roll the primary and wait for the standby to catch up
  HATestUtil.waitForStandbyToCatchUp(nn0, nn1);
  Thread.sleep(2000);
  
  // We should make exactly one checkpoint at this new txid. 
  Mockito.verify(spyImage1, Mockito.times(1)).saveNamespace(
      (FSNamesystem) Mockito.anyObject(), Mockito.eq(NameNodeFile.IMAGE),
      (Canceler) Mockito.anyObject());
}
 
Example #4
Source File: TestStandbyCheckpoints.java    From big-c with Apache License 2.0
/**
 * Test for the case when the SBN is configured to checkpoint based
 * on a time period, but no transactions are happening on the
 * active. Thus, it would want to save a second checkpoint at the
 * same txid, which is a no-op. This test makes sure this doesn't
 * cause any problem.
 */
@Test(timeout = 300000)
public void testCheckpointWhenNoNewTransactionsHappened()
    throws Exception {
  // Checkpoint as fast as we can, in a tight loop.
  cluster.getConfiguration(1).setInt(
      DFSConfigKeys.DFS_NAMENODE_CHECKPOINT_PERIOD_KEY, 0);
  cluster.restartNameNode(1);
  nn1 = cluster.getNameNode(1);
 
  FSImage spyImage1 = NameNodeAdapter.spyOnFsImage(nn1);
  
  // We shouldn't save any checkpoints at txid=0
  Thread.sleep(1000);
  Mockito.verify(spyImage1, Mockito.never())
    .saveNamespace((FSNamesystem) Mockito.anyObject());
 
  // Roll the primary and wait for the standby to catch up
  HATestUtil.waitForStandbyToCatchUp(nn0, nn1);
  Thread.sleep(2000);
  
  // We should make exactly one checkpoint at this new txid. 
  Mockito.verify(spyImage1, Mockito.times(1)).saveNamespace(
      (FSNamesystem) Mockito.anyObject(), Mockito.eq(NameNodeFile.IMAGE),
      (Canceler) Mockito.anyObject());
}
 
Example #5
Source File: TransferFsImage.java    From hadoop with Apache License 2.0
/**
 * Requests that the NameNode download an image from this node.  Allows for
 * optional external cancellation.
 *
 * @param fsName the http address for the remote NN
 * @param conf Configuration
 * @param storage the storage directory to transfer the image from
 * @param nnf the NameNodeFile type of the image
 * @param txid the transaction ID of the image to be uploaded
 * @param canceler optional canceler to check for abort of upload
 * @throws IOException if there is an I/O error or cancellation
 */
public static void uploadImageFromStorage(URL fsName, Configuration conf,
    NNStorage storage, NameNodeFile nnf, long txid, Canceler canceler)
    throws IOException {
  URL url = new URL(fsName, ImageServlet.PATH_SPEC);
  long startTime = Time.monotonicNow();
  try {
    uploadImage(url, conf, storage, nnf, txid, canceler);
  } catch (HttpPutFailedException e) {
    if (e.getResponseCode() == HttpServletResponse.SC_CONFLICT) {
      // this is OK - this means that a previous attempt to upload
      // this checkpoint succeeded even though we thought it failed.
      LOG.info("Image upload with txid " + txid + 
          " conflicted with a previous image upload to the " +
          "same NameNode. Continuing...", e);
      return;
    } else {
      throw e;
    }
  }
  double xferSec = Math.max(
      ((float) (Time.monotonicNow() - startTime)) / 1000.0, 0.001);
  LOG.info("Uploaded image with txid " + txid + " to namenode at " + fsName
      + " in " + xferSec + " seconds");
}
 
Example #6
Source File: FSImage.java    From big-c with Apache License 2.0
/**
 * Save FSimage in the legacy format. This is not for NN consumption,
 * but for tools like OIV.
 */
public void saveLegacyOIVImage(FSNamesystem source, String targetDir,
    Canceler canceler) throws IOException {
  FSImageCompression compression =
      FSImageCompression.createCompression(conf);
  long txid = getLastAppliedOrWrittenTxId();
  SaveNamespaceContext ctx = new SaveNamespaceContext(source, txid,
      canceler);
  FSImageFormat.Saver saver = new FSImageFormat.Saver(ctx);
  String imageFileName = NNStorage.getLegacyOIVImageFileName(txid);
  File imageFile = new File(targetDir, imageFileName);
  saver.save(imageFile, compression);
  archivalManager.purgeOldLegacyOIVImages(targetDir, txid);
}
 
Example #7
Source File: SaveNamespaceContext.java    From big-c with Apache License 2.0
SaveNamespaceContext(
    FSNamesystem sourceNamesystem,
    long txid,
    Canceler canceller) {
  this.sourceNamesystem = sourceNamesystem;
  this.txid = txid;
  this.canceller = canceller;
}
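
The constructor above only stores its arguments. As a rough illustration of how such a context might hand the canceller to the saving code, here is a minimal sketch. It is inferred from the ctx.checkCancelled() and ctx.markComplete() calls that appear in other examples on this page, not taken from the Hadoop source. SaveContextSketch, SketchCanceler, checkOrMessage, and isComplete are hypothetical names, and a plain IOException stands in for whatever exception type the real class raises.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for a save context; not the Hadoop class. */
final class SaveContextSketch {

  /** Stand-in that mirrors only the Canceler calls seen in these examples. */
  static final class SketchCanceler {
    private volatile String reason; // null means "not cancelled"

    void cancel(String why) { reason = why; }
    boolean isCancelled() { return reason != null; }
    String getCancellationReason() { return reason; }
  }

  private final long txid;
  private final SketchCanceler canceller;
  // Counted down once when the save finishes, successfully or not.
  private final CountDownLatch completionLatch = new CountDownLatch(1);

  SaveContextSketch(long txid, SketchCanceler canceller) {
    this.txid = txid;
    this.canceller = canceller;
  }

  long getTxId() { return txid; }

  /** Throws if cancellation was requested; otherwise returns normally. */
  void checkCancelled() throws IOException {
    if (canceller.isCancelled()) {
      throw new IOException(
          "Save cancelled: " + canceller.getCancellationReason());
    }
  }

  /** Marks the save as finished so that waiters can be released. */
  void markComplete() { completionLatch.countDown(); }

  boolean isComplete() { return completionLatch.getCount() == 0; }

  /** Demo helper: returns null if not cancelled, else the error message. */
  static String checkOrMessage(SaveContextSketch ctx) {
    try {
      ctx.checkCancelled();
      return null;
    } catch (IOException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    SketchCanceler canceller = new SketchCanceler();
    SaveContextSketch ctx = new SaveContextSketch(7L, canceller);
    System.out.println(checkOrMessage(ctx)); // not cancelled yet
    canceller.cancel("operator request");
    System.out.println(checkOrMessage(ctx)); // now reports the reason
    ctx.markComplete();
  }
}
```

Keeping the cancellation flag in a separate object lets the thread that requests the abort and the threads doing the save share state without sharing the context's other fields.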
 
Example #8
Source File: TestFSImageWithSnapshot.java    From hadoop with Apache License 2.0
/** Save the fsimage to a temp file */
private File saveFSImageToTempFile() throws IOException {
  SaveNamespaceContext context = new SaveNamespaceContext(fsn, txid,
      new Canceler());
  FSImageFormatProtobuf.Saver saver = new FSImageFormatProtobuf.Saver(context);
  FSImageCompression compression = FSImageCompression.createCompression(conf);
  File imageFile = getImageFile(testDir, txid);
  fsn.readLock();
  try {
    saver.save(imageFile, compression);
  } finally {
    fsn.readUnlock();
  }
  return imageFile;
}
 
Example #9
Source File: TransferFsImage.java    From big-c with Apache License 2.0
private static void writeFileToPutRequest(Configuration conf,
    HttpURLConnection connection, File imageFile, Canceler canceler)
    throws FileNotFoundException, IOException {
  connection.setRequestProperty(CONTENT_TYPE, "application/octet-stream");
  connection.setRequestProperty(CONTENT_TRANSFER_ENCODING, "binary");
  OutputStream output = connection.getOutputStream();
  FileInputStream input = new FileInputStream(imageFile);
  try {
    copyFileToStream(output, imageFile, input,
        ImageServlet.getThrottler(conf), canceler);
  } finally {
    IOUtils.closeStream(input);
    IOUtils.closeStream(output);
  }
}
 
Example #10
Source File: ContainerDataScanner.java    From hadoop-ozone with Apache License 2.0
public ContainerDataScanner(ContainerScrubberConfiguration conf,
                            ContainerController controller,
                            HddsVolume volume) {
  this.controller = controller;
  this.volume = volume;
  dataScanInterval = conf.getDataScanInterval();
  throttler = new HddsDataTransferThrottler(conf.getBandwidthPerVolume());
  canceler = new Canceler();
  metrics = ContainerDataScrubberMetrics.create(volume.toString());
  setName(String.format(NAME_FORMAT, volume));
  setDaemon(true);
}
 
Example #11
Source File: FSImage.java    From hadoop with Apache License 2.0
/**
 * Save the contents of the FS image to a new image file in each of the
 * current storage directories.
 */
public synchronized void saveNamespace(FSNamesystem source, NameNodeFile nnf,
    Canceler canceler) throws IOException {
  assert editLog != null : "editLog must be initialized";
  LOG.info("Save namespace ...");
  storage.attemptRestoreRemovedStorage();

  boolean editLogWasOpen = editLog.isSegmentOpen();
  
  if (editLogWasOpen) {
    editLog.endCurrentLogSegment(true);
  }
  long imageTxId = getLastAppliedOrWrittenTxId();
  if (!addToCheckpointing(imageTxId)) {
    throw new IOException(
        "FS image is being downloaded from another NN at txid " + imageTxId);
  }
  try {
    try {
      saveFSImageInAllDirs(source, nnf, imageTxId, canceler);
      storage.writeAll();
    } finally {
      if (editLogWasOpen) {
        editLog.startLogSegment(imageTxId + 1, true);
        // Take this opportunity to note the current transaction.
        // Even if the namespace save was cancelled, this marker
        // is only used to determine what transaction ID is required
        // for startup. So, it doesn't hurt to update it unnecessarily.
        storage.writeTransactionIdFileToStorage(imageTxId + 1);
      }
    }
  } finally {
    removeFromCheckpointing(imageTxId);
  }
}
 
Example #12
Source File: FSImage.java    From hadoop with Apache License 2.0
/**
 * Save FSimage in the legacy format. This is not for NN consumption,
 * but for tools like OIV.
 */
public void saveLegacyOIVImage(FSNamesystem source, String targetDir,
    Canceler canceler) throws IOException {
  FSImageCompression compression =
      FSImageCompression.createCompression(conf);
  long txid = getLastAppliedOrWrittenTxId();
  SaveNamespaceContext ctx = new SaveNamespaceContext(source, txid,
      canceler);
  FSImageFormat.Saver saver = new FSImageFormat.Saver(ctx);
  String imageFileName = NNStorage.getLegacyOIVImageFileName(txid);
  File imageFile = new File(targetDir, imageFileName);
  saver.save(imageFile, compression);
  archivalManager.purgeOldLegacyOIVImages(targetDir, txid);
}
 
Example #13
Source File: TransferFsImage.java    From hadoop with Apache License 2.0
private static void writeFileToPutRequest(Configuration conf,
    HttpURLConnection connection, File imageFile, Canceler canceler)
    throws FileNotFoundException, IOException {
  connection.setRequestProperty(CONTENT_TYPE, "application/octet-stream");
  connection.setRequestProperty(CONTENT_TRANSFER_ENCODING, "binary");
  OutputStream output = connection.getOutputStream();
  FileInputStream input = new FileInputStream(imageFile);
  try {
    copyFileToStream(output, imageFile, input,
        ImageServlet.getThrottler(conf), canceler);
  } finally {
    IOUtils.closeStream(input);
    IOUtils.closeStream(output);
  }
}
 
Example #14
Source File: SaveNamespaceContext.java    From hadoop with Apache License 2.0
SaveNamespaceContext(
    FSNamesystem sourceNamesystem,
    long txid,
    Canceler canceller) {
  this.sourceNamesystem = sourceNamesystem;
  this.txid = txid;
  this.canceller = canceller;
}
 
Example #15
Source File: FSImage.java    From big-c with Apache License 2.0
/**
 * Save the contents of the FS image to a new image file in each of the
 * current storage directories.
 */
public synchronized void saveNamespace(FSNamesystem source, NameNodeFile nnf,
    Canceler canceler) throws IOException {
  assert editLog != null : "editLog must be initialized";
  LOG.info("Save namespace ...");
  storage.attemptRestoreRemovedStorage();

  boolean editLogWasOpen = editLog.isSegmentOpen();
  
  if (editLogWasOpen) {
    editLog.endCurrentLogSegment(true);
  }
  long imageTxId = getLastAppliedOrWrittenTxId();
  if (!addToCheckpointing(imageTxId)) {
    throw new IOException(
        "FS image is being downloaded from another NN at txid " + imageTxId);
  }
  try {
    try {
      saveFSImageInAllDirs(source, nnf, imageTxId, canceler);
      storage.writeAll();
    } finally {
      if (editLogWasOpen) {
        editLog.startLogSegment(imageTxId + 1, true);
        // Take this opportunity to note the current transaction.
        // Even if the namespace save was cancelled, this marker
        // is only used to determine what transaction ID is required
        // for startup. So, it doesn't hurt to update it unnecessarily.
        storage.writeTransactionIdFileToStorage(imageTxId + 1);
      }
    }
  } finally {
    removeFromCheckpointing(imageTxId);
  }
}
 
Example #16
Source File: TestFSImageWithSnapshot.java    From big-c with Apache License 2.0
/** Save the fsimage to a temp file */
private File saveFSImageToTempFile() throws IOException {
  SaveNamespaceContext context = new SaveNamespaceContext(fsn, txid,
      new Canceler());
  FSImageFormatProtobuf.Saver saver = new FSImageFormatProtobuf.Saver(context);
  FSImageCompression compression = FSImageCompression.createCompression(conf);
  File imageFile = getImageFile(testDir, txid);
  fsn.readLock();
  try {
    saver.save(imageFile, compression);
  } finally {
    fsn.readUnlock();
  }
  return imageFile;
}
 
Example #17
Source File: TestContainerScrubberMetrics.java    From hadoop-ozone with Apache License 2.0
private void setupMockContainer(
    Container<ContainerData> c, boolean shouldScanData,
    boolean scanMetaDataSuccess, boolean scanDataSuccess) {
  ContainerData data = mock(ContainerData.class);
  when(data.getContainerID()).thenReturn(containerIdSeq.getAndIncrement());
  when(c.getContainerData()).thenReturn(data);
  when(c.shouldScanData()).thenReturn(shouldScanData);
  when(c.scanMetaData()).thenReturn(scanMetaDataSuccess);
  when(c.scanData(any(DataTransferThrottler.class), any(Canceler.class)))
      .thenReturn(scanDataSuccess);
}
 
Example #18
Source File: KeyValueContainer.java    From hadoop-ozone with Apache License 2.0
public boolean scanData(DataTransferThrottler throttler, Canceler canceler) {
  if (!shouldScanData()) {
    throw new IllegalStateException("The checksum verification can not be" +
        " done for container in state "
        + containerData.getState());
  }

  long containerId = containerData.getContainerID();
  KeyValueContainerCheck checker =
      new KeyValueContainerCheck(containerData.getMetadataPath(), config,
          containerId);

  return checker.fullCheck(throttler, canceler);
}
 
Example #19
Source File: FSImage.java    From hadoop with Apache License 2.0
private synchronized void saveFSImageInAllDirs(FSNamesystem source,
    NameNodeFile nnf, long txid, Canceler canceler) throws IOException {
  StartupProgress prog = NameNode.getStartupProgress();
  prog.beginPhase(Phase.SAVING_CHECKPOINT);
  if (storage.getNumStorageDirs(NameNodeDirType.IMAGE) == 0) {
    throw new IOException("No image directories available!");
  }
  if (canceler == null) {
    canceler = new Canceler();
  }
  SaveNamespaceContext ctx = new SaveNamespaceContext(
      source, txid, canceler);
  
  try {
    List<Thread> saveThreads = new ArrayList<Thread>();
    // save images into current
    for (Iterator<StorageDirectory> it
           = storage.dirIterator(NameNodeDirType.IMAGE); it.hasNext();) {
      StorageDirectory sd = it.next();
      FSImageSaver saver = new FSImageSaver(ctx, sd, nnf);
      Thread saveThread = new Thread(saver, saver.toString());
      saveThreads.add(saveThread);
      saveThread.start();
    }
    waitForThreads(saveThreads);
    saveThreads.clear();
    storage.reportErrorsOnDirectories(ctx.getErrorSDs());

    if (storage.getNumStorageDirs(NameNodeDirType.IMAGE) == 0) {
      throw new IOException(
        "Failed to save in any storage directories while saving namespace.");
    }
    if (canceler.isCancelled()) {
      deleteCancelledCheckpoint(txid);
      ctx.checkCancelled(); // throws
      assert false : "should have thrown above!";
    }

    renameCheckpoint(txid, NameNodeFile.IMAGE_NEW, nnf, false);

    // Since we now have a new checkpoint, we can clean up some
    // old edit logs and checkpoints.
    purgeOldStorage(nnf);
  } finally {
    // Notify any threads waiting on the checkpoint to be canceled
    // that it is complete.
    ctx.markComplete();
    ctx = null;
  }
  prog.endPhase(Phase.SAVING_CHECKPOINT);
}
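
The method above shares one canceler across several saver threads, waits for all of them, and only then decides whether the checkpoint counts as cancelled. The following is a minimal, self-contained sketch of that shape, written under several assumptions: ParallelSaveSketch, SketchCanceler, SaveCancelledException, and saveInAllDirs are hypothetical names, the "save" is simulated by a short loop, and clearing a list stands in for deleting a partially written checkpoint.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of one canceler shared by several saver threads. */
final class ParallelSaveSketch {

  /** Stand-in that mirrors only the Canceler calls seen in these examples. */
  static final class SketchCanceler {
    private volatile String reason; // null means "not cancelled"

    void cancel(String why) { reason = why; }
    boolean isCancelled() { return reason != null; }
    String getCancellationReason() { return reason; }
  }

  /** Unchecked stand-in for the exception raised on cancellation. */
  static final class SaveCancelledException extends RuntimeException {
    SaveCancelledException(String reason) { super(reason); }
  }

  /** Saves to every directory in parallel; throws if the save was cancelled. */
  static List<String> saveInAllDirs(List<String> dirs, SketchCanceler canceler) {
    // As in the example above, a null canceler is replaced by a fresh one.
    final SketchCanceler shared =
        (canceler == null) ? new SketchCanceler() : canceler;
    final List<String> saved =
        Collections.synchronizedList(new ArrayList<String>());

    List<Thread> saveThreads = new ArrayList<>();
    for (String dir : dirs) {
      Thread t = new Thread(() -> {
        for (int chunk = 0; chunk < 3; chunk++) {
          if (shared.isCancelled()) {
            return; // abandon this directory without recording it
          }
          // A real saver would write one chunk of the image here.
        }
        saved.add(dir);
      }, "saver-" + dir);
      saveThreads.add(t);
      t.start();
    }

    // Wait for every saver before looking at the outcome.
    for (Thread t : saveThreads) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting", e);
      }
    }

    if (shared.isCancelled()) {
      saved.clear(); // stands in for deleting the cancelled checkpoint
      throw new SaveCancelledException(shared.getCancellationReason());
    }
    return new ArrayList<>(saved);
  }

  public static void main(String[] args) {
    List<String> dirs = List.of("dir1", "dir2");
    System.out.println(saveInAllDirs(dirs, null).size() + " directories saved");
  }
}
```

Checking the flag once more after all threads have joined matters: a saver may finish just before the cancel request arrives, and the final check ensures the caller still sees the cancellation.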
 
Example #20
Source File: TransferFsImage.java    From big-c with Apache License 2.0
private static void copyFileToStream(OutputStream out, File localfile,
    FileInputStream infile, DataTransferThrottler throttler,
    Canceler canceler) throws IOException {
  byte[] buf = new byte[HdfsConstants.IO_FILE_BUFFER_SIZE];
  try {
    CheckpointFaultInjector.getInstance()
        .aboutToSendFile(localfile);

    if (CheckpointFaultInjector.getInstance().
          shouldSendShortFile(localfile)) {
        // Test sending image shorter than localfile
        long len = localfile.length();
        buf = new byte[(int)Math.min(len/2, HdfsConstants.IO_FILE_BUFFER_SIZE)];
        // This will read at most half of the image
        // and the rest of the image will be sent over the wire
        infile.read(buf);
    }
    int num = 1;
    while (num > 0) {
      if (canceler != null && canceler.isCancelled()) {
        throw new SaveNamespaceCancelledException(
          canceler.getCancellationReason());
      }
      num = infile.read(buf);
      if (num <= 0) {
        break;
      }
      if (CheckpointFaultInjector.getInstance()
            .shouldCorruptAByte(localfile)) {
        // Simulate a corrupted byte on the wire
        LOG.warn("SIMULATING A CORRUPT BYTE IN IMAGE TRANSFER!");
        buf[0]++;
      }
      
      out.write(buf, 0, num);
      if (throttler != null) {
        throttler.throttle(num, canceler);
      }
    }
  } catch (EofException e) {
    LOG.info("Connection closed by client");
    out = null; // so we don't close in the finally
  } finally {
    if (out != null) {
      out.close();
    }
  }
}
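
Stripped of fault injection and throttling, the loop above is a cooperative-cancellation copy: the flag is polled before every read, and a null canceler means the copy cannot be cancelled. Here is a minimal sketch of just that part, assuming in-memory streams for the demo. CancellableCopySketch, SketchCanceler, copy, and copyOrMessage are hypothetical names, and a plain IOException stands in for SaveNamespaceCancelledException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical sketch of a stream copy that polls a canceler. */
final class CancellableCopySketch {

  /** Stand-in that mirrors only the Canceler calls seen in these examples. */
  static final class SketchCanceler {
    private volatile String reason; // null means "not cancelled"

    void cancel(String why) { reason = why; }
    boolean isCancelled() { return reason != null; }
    String getCancellationReason() { return reason; }
  }

  /** Copies in to out, checking for cancellation before each read. */
  static long copy(InputStream in, OutputStream out, int bufSize,
      SketchCanceler canceler) throws IOException {
    byte[] buf = new byte[bufSize];
    long total = 0;
    while (true) {
      // A null canceler means the caller did not ask for cancellation support.
      if (canceler != null && canceler.isCancelled()) {
        throw new IOException(
            "Copy cancelled: " + canceler.getCancellationReason());
      }
      int num = in.read(buf);
      if (num <= 0) {
        break; // end of stream
      }
      out.write(buf, 0, num);
      total += num;
    }
    return total;
  }

  /** Demo helper: copies a byte array and reports the outcome as text. */
  static String copyOrMessage(byte[] data, SketchCanceler canceler) {
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      long n = copy(new ByteArrayInputStream(data), out, 4, canceler);
      return "copied " + n;
    } catch (IOException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(copyOrMessage(new byte[10], null));
    SketchCanceler canceler = new SketchCanceler();
    canceler.cancel("shutting down");
    System.out.println(copyOrMessage(new byte[10], canceler));
  }
}
```

Because the flag is only polled between reads, cancellation takes effect at buffer granularity; a smaller buffer reacts sooner at the cost of more read calls.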
 
Example #21
Source File: FSImage.java    From big-c with Apache License 2.0
private synchronized void saveFSImageInAllDirs(FSNamesystem source,
    NameNodeFile nnf, long txid, Canceler canceler) throws IOException {
  StartupProgress prog = NameNode.getStartupProgress();
  prog.beginPhase(Phase.SAVING_CHECKPOINT);
  if (storage.getNumStorageDirs(NameNodeDirType.IMAGE) == 0) {
    throw new IOException("No image directories available!");
  }
  if (canceler == null) {
    canceler = new Canceler();
  }
  SaveNamespaceContext ctx = new SaveNamespaceContext(
      source, txid, canceler);
  
  try {
    List<Thread> saveThreads = new ArrayList<Thread>();
    // save images into current
    for (Iterator<StorageDirectory> it
           = storage.dirIterator(NameNodeDirType.IMAGE); it.hasNext();) {
      StorageDirectory sd = it.next();
      FSImageSaver saver = new FSImageSaver(ctx, sd, nnf);
      Thread saveThread = new Thread(saver, saver.toString());
      saveThreads.add(saveThread);
      saveThread.start();
    }
    waitForThreads(saveThreads);
    saveThreads.clear();
    storage.reportErrorsOnDirectories(ctx.getErrorSDs());

    if (storage.getNumStorageDirs(NameNodeDirType.IMAGE) == 0) {
      throw new IOException(
        "Failed to save in any storage directories while saving namespace.");
    }
    if (canceler.isCancelled()) {
      deleteCancelledCheckpoint(txid);
      ctx.checkCancelled(); // throws
      assert false : "should have thrown above!";
    }

    renameCheckpoint(txid, NameNodeFile.IMAGE_NEW, nnf, false);

    // Since we now have a new checkpoint, we can clean up some
    // old edit logs and checkpoints.
    purgeOldStorage(nnf);
  } finally {
    // Notify any threads waiting on the checkpoint to be canceled
    // that it is complete.
    ctx.markComplete();
    ctx = null;
  }
  prog.endPhase(Phase.SAVING_CHECKPOINT);
}
 
Example #22
Source File: SecondaryNameNode.java    From big-c with Apache License 2.0
/**
 * Create a new checkpoint
 * @return if the image is fetched from primary or not
 */
@VisibleForTesting
@SuppressWarnings("deprecation")
public boolean doCheckpoint() throws IOException {
  checkpointImage.ensureCurrentDirExists();
  NNStorage dstStorage = checkpointImage.getStorage();
  
  // Tell the namenode to start logging transactions in a new edit file
  // Returns a token that would be used to upload the merged image.
  CheckpointSignature sig = namenode.rollEditLog();
  
  boolean loadImage = false;
  boolean isFreshCheckpointer = (checkpointImage.getNamespaceID() == 0);
  boolean isSameCluster =
      (dstStorage.versionSupportsFederation(NameNodeLayoutVersion.FEATURES)
          && sig.isSameCluster(checkpointImage)) ||
      (!dstStorage.versionSupportsFederation(NameNodeLayoutVersion.FEATURES)
          && sig.namespaceIdMatches(checkpointImage));
  if (isFreshCheckpointer ||
      (isSameCluster &&
       !sig.storageVersionMatches(checkpointImage.getStorage()))) {
    // if we're a fresh 2NN, or if we're on the same cluster and our storage
    // needs an upgrade, just take the storage info from the server.
    dstStorage.setStorageInfo(sig);
    dstStorage.setClusterID(sig.getClusterID());
    dstStorage.setBlockPoolID(sig.getBlockpoolID());
    loadImage = true;
  }
  sig.validateStorageInfo(checkpointImage);

  // error simulation code for junit test
  CheckpointFaultInjector.getInstance().afterSecondaryCallsRollEditLog();

  RemoteEditLogManifest manifest =
    namenode.getEditLogManifest(sig.mostRecentCheckpointTxId + 1);

  // Fetch fsimage and edits. Reload the image if previous merge failed.
  loadImage |= downloadCheckpointFiles(
      fsName, checkpointImage, sig, manifest) |
      checkpointImage.hasMergeError();
  try {
    doMerge(sig, manifest, loadImage, checkpointImage, namesystem);
  } catch (IOException ioe) {
    // A merge error occurred. The in-memory file system state may be
    // inconsistent, so the image and edits need to be reloaded.
    checkpointImage.setMergeError();
    throw ioe;
  }
  // Clear any error since merge was successful.
  checkpointImage.clearMergeError();

  
  //
  // Upload the new image into the NameNode. Then tell the Namenode
  // to make this new uploaded image as the most current image.
  //
  long txid = checkpointImage.getLastAppliedTxId();
  TransferFsImage.uploadImageFromStorage(fsName, conf, dstStorage,
      NameNodeFile.IMAGE, txid);

  // error simulation code for junit test
  CheckpointFaultInjector.getInstance().afterSecondaryUploadsNewImage();

  LOG.warn("Checkpoint done. New Image Size: " 
           + dstStorage.getFsImageName(txid).length());

  if (legacyOivImageDir != null && !legacyOivImageDir.isEmpty()) {
    try {
      checkpointImage.saveLegacyOIVImage(namesystem, legacyOivImageDir,
          new Canceler());
    } catch (IOException e) {
      LOG.warn("Failed to write legacy OIV image: ", e);
    }
  }
  return loadImage;
}
 
Example #23
Source File: TestStandbyCheckpoints.java    From big-c with Apache License 2.0
/**
 * Make sure that clients will receive StandbyExceptions even when a
 * checkpoint is in progress on the SBN, and therefore the StandbyCheckpointer
 * thread will have FSNS lock. Regression test for HDFS-4591.
 */
@Test(timeout=300000)
public void testStandbyExceptionThrownDuringCheckpoint() throws Exception {
  
  // Set it up so that we know when the SBN checkpoint starts and ends.
  FSImage spyImage1 = NameNodeAdapter.spyOnFsImage(nn1);
  DelayAnswer answerer = new DelayAnswer(LOG);
  Mockito.doAnswer(answerer).when(spyImage1)
      .saveNamespace(Mockito.any(FSNamesystem.class),
          Mockito.eq(NameNodeFile.IMAGE), Mockito.any(Canceler.class));

  // Perform some edits and wait for a checkpoint to start on the SBN.
  doEdits(0, 1000);
  nn0.getRpcServer().rollEditLog();
  answerer.waitForCall();
  assertTrue("SBN is not performing checkpoint but it should be.",
      answerer.getFireCount() == 1 && answerer.getResultCount() == 0);
  
  // Make sure that the lock has actually been taken by the checkpointing
  // thread.
  ThreadUtil.sleepAtLeastIgnoreInterrupts(1000);
  try {
    // Perform an RPC to the SBN and make sure it throws a StandbyException.
    nn1.getRpcServer().getFileInfo("/");
    fail("Should have thrown StandbyException, but instead succeeded.");
  } catch (StandbyException se) {
    GenericTestUtils.assertExceptionContains("is not supported", se);
  }

  // Make sure new incremental block reports are processed during
  // checkpointing on the SBN.
  assertEquals(0, cluster.getNamesystem(1).getPendingDataNodeMessageCount());
  doCreate();
  Thread.sleep(1000);
  assertTrue(cluster.getNamesystem(1).getPendingDataNodeMessageCount() > 0);
  
  // Make sure that the checkpoint is still going on, implying that the client
  // RPC to the SBN happened during the checkpoint.
  assertTrue("SBN should have still been checkpointing.",
      answerer.getFireCount() == 1 && answerer.getResultCount() == 0);
  answerer.proceed();
  answerer.waitForResult();
  assertTrue("SBN should have finished checkpointing.",
      answerer.getFireCount() == 1 && answerer.getResultCount() == 1);
}
 
Example #24
Source File: TestStandbyCheckpoints.java    From big-c with Apache License 2.0
@Test(timeout=300000)
public void testReadsAllowedDuringCheckpoint() throws Exception {
  
  // Set it up so that we know when the SBN checkpoint starts and ends.
  FSImage spyImage1 = NameNodeAdapter.spyOnFsImage(nn1);
  DelayAnswer answerer = new DelayAnswer(LOG);
  Mockito.doAnswer(answerer).when(spyImage1)
      .saveNamespace(Mockito.any(FSNamesystem.class),
          Mockito.any(NameNodeFile.class),
          Mockito.any(Canceler.class));
  
  // Perform some edits and wait for a checkpoint to start on the SBN.
  doEdits(0, 1000);
  nn0.getRpcServer().rollEditLog();
  answerer.waitForCall();
  assertTrue("SBN is not performing checkpoint but it should be.",
      answerer.getFireCount() == 1 && answerer.getResultCount() == 0);
  
  // Make sure that the lock has actually been taken by the checkpointing
  // thread.
  ThreadUtil.sleepAtLeastIgnoreInterrupts(1000);
  
  // Perform an RPC that needs to take the write lock.
  Thread t = new Thread() {
    @Override
    public void run() {
      try {
        nn1.getRpcServer().restoreFailedStorage("false");
      } catch (IOException e) {
        e.printStackTrace();
      }
    }
  };
  t.start();
  
  // Make sure that our thread is waiting for the lock.
  ThreadUtil.sleepAtLeastIgnoreInterrupts(1000);
  
  assertFalse(nn1.getNamesystem().getFsLockForTests().hasQueuedThreads());
  assertFalse(nn1.getNamesystem().getFsLockForTests().isWriteLocked());
  assertTrue(nn1.getNamesystem().getCpLockForTests().hasQueuedThreads());
  
  // Get /jmx of the standby NN web UI, which will cause the FSNS read lock to
  // be taken.
  String pageContents = DFSTestUtil.urlGet(new URL("http://" +
      nn1.getHttpAddress().getHostName() + ":" +
      nn1.getHttpAddress().getPort() + "/jmx"));
  assertTrue(pageContents.contains("NumLiveDataNodes"));
  
  // Make sure that the checkpoint is still going on, implying that the client
  // RPC to the SBN happened during the checkpoint.
  assertTrue("SBN should have still been checkpointing.",
      answerer.getFireCount() == 1 && answerer.getResultCount() == 0);
  answerer.proceed();
  answerer.waitForResult();
  assertTrue("SBN should have finished checkpointing.",
      answerer.getFireCount() == 1 && answerer.getResultCount() == 1);
  
  t.join();
}
 
Example #25
Source File: TestSaveNamespace.java    From big-c with Apache License 2.0
@Test(timeout=20000)
public void testCancelSaveNamespace() throws Exception {
  Configuration conf = getConf();
  NameNode.initMetrics(conf, NamenodeRole.NAMENODE);
  DFSTestUtil.formatNameNode(conf);
  FSNamesystem fsn = FSNamesystem.loadFromDisk(conf);

  // Replace the FSImage with a spy
  final FSImage image = fsn.getFSImage();
  NNStorage storage = image.getStorage();
  storage.close(); // unlock any directories that FSNamesystem's initialization may have locked
  storage.setStorageDirectories(
      FSNamesystem.getNamespaceDirs(conf), 
      FSNamesystem.getNamespaceEditsDirs(conf));

  FSNamesystem spyFsn = spy(fsn);
  final FSNamesystem finalFsn = spyFsn;
  DelayAnswer delayer = new GenericTestUtils.DelayAnswer(LOG);
  BlockIdManager bid = spy(spyFsn.getBlockIdManager());
  Whitebox.setInternalState(finalFsn, "blockIdManager", bid);
  doAnswer(delayer).when(bid).getGenerationStampV2();

  ExecutorService pool = Executors.newFixedThreadPool(2);
  
  try {
    doAnEdit(fsn, 1);
    final Canceler canceler = new Canceler();
    
    // Save namespace
    fsn.setSafeMode(SafeModeAction.SAFEMODE_ENTER);
    try {
      Future<Void> saverFuture = pool.submit(new Callable<Void>() {
        @Override
        public Void call() throws Exception {
          image.saveNamespace(finalFsn, NameNodeFile.IMAGE, canceler);
          return null;
        }
      });

      // Wait until saveNamespace calls getGenerationStampV2
      delayer.waitForCall();
      // then cancel the saveNamespace
      Future<Void> cancelFuture = pool.submit(new Callable<Void>() {
        @Override
        public Void call() throws Exception {
          canceler.cancel("cancelled");
          return null;
        }
      });
      // give the cancel call time to run
      Thread.sleep(500);
      
      // allow saveNamespace to proceed - it should check the cancel flag after
      // this point and throw an exception
      delayer.proceed();

      cancelFuture.get();
      saverFuture.get();
      fail("saveNamespace did not fail even though cancelled!");
    } catch (Throwable t) {
      GenericTestUtils.assertExceptionContains(
          "SaveNamespaceCancelledException", t);
    }
    LOG.info("Successfully cancelled a saveNamespace");


    // Check that we have only the original image and not any
    // cruft left over from half-finished images
    FSImageTestUtil.logStorageContents(LOG, storage);
    for (StorageDirectory sd : storage.dirIterable(null)) {
      File curDir = sd.getCurrentDir();
      GenericTestUtils.assertGlobEquals(curDir, "fsimage_.*",
          NNStorage.getImageFileName(0),
          NNStorage.getImageFileName(0) + MD5FileUtils.MD5_SUFFIX);
    }      
  } finally {
    fsn.close();
  }
}
 
Example #26
Source File: TestSaveNamespace.java    From hadoop with Apache License 2.0
@Test(timeout=20000)
public void testCancelSaveNamespace() throws Exception {
  Configuration conf = getConf();
  NameNode.initMetrics(conf, NamenodeRole.NAMENODE);
  DFSTestUtil.formatNameNode(conf);
  FSNamesystem fsn = FSNamesystem.loadFromDisk(conf);

  // Replace the FSImage with a spy
  final FSImage image = fsn.getFSImage();
  NNStorage storage = image.getStorage();
  storage.close(); // unlock any directories that FSNamesystem's initialization may have locked
  storage.setStorageDirectories(
      FSNamesystem.getNamespaceDirs(conf), 
      FSNamesystem.getNamespaceEditsDirs(conf));

  FSNamesystem spyFsn = spy(fsn);
  final FSNamesystem finalFsn = spyFsn;
  DelayAnswer delayer = new GenericTestUtils.DelayAnswer(LOG);
  BlockIdManager bid = spy(spyFsn.getBlockIdManager());
  Whitebox.setInternalState(finalFsn, "blockIdManager", bid);
  doAnswer(delayer).when(bid).getGenerationStampV2();

  ExecutorService pool = Executors.newFixedThreadPool(2);
  
  try {
    doAnEdit(fsn, 1);
    final Canceler canceler = new Canceler();
    
    // Save namespace
    fsn.setSafeMode(SafeModeAction.SAFEMODE_ENTER);
    try {
      Future<Void> saverFuture = pool.submit(new Callable<Void>() {
        @Override
        public Void call() throws Exception {
          image.saveNamespace(finalFsn, NameNodeFile.IMAGE, canceler);
          return null;
        }
      });

      // Wait until saveNamespace calls the stubbed getGenerationStampV2
      delayer.waitForCall();
      // then cancel the saveNamespace
      Future<Void> cancelFuture = pool.submit(new Callable<Void>() {
        @Override
        public Void call() throws Exception {
          canceler.cancel("cancelled");
          return null;
        }
      });
      // give the cancel call time to run
      Thread.sleep(500);
      
      // allow saveNamespace to proceed - it should check the cancel flag after
      // this point and throw an exception
      delayer.proceed();

      cancelFuture.get();
      saverFuture.get();
      fail("saveNamespace did not fail even though cancelled!");
    } catch (Throwable t) {
      GenericTestUtils.assertExceptionContains(
          "SaveNamespaceCancelledException", t);
    }
    LOG.info("Successfully cancelled a saveNamespace");


    // Check that we have only the original image and not any
    // cruft left over from half-finished images
    FSImageTestUtil.logStorageContents(LOG, storage);
    for (StorageDirectory sd : storage.dirIterable(null)) {
      File curDir = sd.getCurrentDir();
      GenericTestUtils.assertGlobEquals(curDir, "fsimage_.*",
          NNStorage.getImageFileName(0),
          NNStorage.getImageFileName(0) + MD5FileUtils.MD5_SUFFIX);
    }      
  } finally {
    fsn.close();
  }
}
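
The test above pauses the saver at a known point, cancels from a second thread, and only then lets the saver continue and observe the flag. A minimal sketch of that handshake, with no Hadoop or Mockito dependency, might look like the following. CancelHandshakeSketch, the two latches, and the AtomicReference are hypothetical stand-ins (for DelayAnswer and Canceler respectively), and an IllegalStateException stands in for SaveNamespaceCancelledException. Unlike the test, the sketch waits on the cancel task's Future instead of sleeping; that is a choice made here, not what the original does.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of the pause / cancel / proceed handshake.
class CancelHandshakeSketch {

  /** Returns the failure message seen by the worker, or null if it finished. */
  static String runOnce() {
    // Stand-in for Canceler: a non-null value means "cancelled".
    final AtomicReference<String> cancelReason = new AtomicReference<>();
    // Stand-ins for DelayAnswer.waitForCall() and DelayAnswer.proceed().
    final CountDownLatch workerPaused = new CountDownLatch(1);
    final CountDownLatch mayProceed = new CountDownLatch(1);
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      Callable<Void> work = () -> {
        workerPaused.countDown();   // signal that the pause point was reached
        mayProceed.await();         // block until the test releases us
        String reason = cancelReason.get();
        if (reason != null) {
          throw new IllegalStateException("cancelled: " + reason);
        }
        return null;
      };
      Callable<Void> cancel = () -> {
        cancelReason.set("cancelled");
        return null;
      };

      Future<Void> workerFuture = pool.submit(work);
      workerPaused.await();         // worker is now parked at the pause point
      pool.submit(cancel).get();    // cancel has definitely run before we go on
      mayProceed.countDown();       // let the worker observe the flag
      workerFuture.get();
      return null;
    } catch (ExecutionException e) {
      return e.getCause().getMessage();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(runOnce()); // prints "cancelled: cancelled"
  }
}
```

Because the worker cannot pass the second latch until the cancel task has completed, the outcome does not depend on thread scheduling.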
 
Example #27
Source File: TestStandbyCheckpoints.java    From hadoop with Apache License 2.0
@Test(timeout=300000)
public void testReadsAllowedDuringCheckpoint() throws Exception {
  
  // Set it up so that we know when the SBN checkpoint starts and ends.
  FSImage spyImage1 = NameNodeAdapter.spyOnFsImage(nn1);
  DelayAnswer answerer = new DelayAnswer(LOG);
  Mockito.doAnswer(answerer).when(spyImage1)
      .saveNamespace(Mockito.any(FSNamesystem.class),
          Mockito.any(NameNodeFile.class),
          Mockito.any(Canceler.class));
  
  // Perform some edits and wait for a checkpoint to start on the SBN.
  doEdits(0, 1000);
  nn0.getRpcServer().rollEditLog();
  answerer.waitForCall();
  assertTrue("SBN is not performing checkpoint but it should be.",
      answerer.getFireCount() == 1 && answerer.getResultCount() == 0);
  
  // Make sure that the lock has actually been taken by the checkpointing
  // thread.
  ThreadUtil.sleepAtLeastIgnoreInterrupts(1000);
  
  // Perform an RPC that needs to take the write lock.
  Thread t = new Thread() {
    @Override
    public void run() {
      try {
        nn1.getRpcServer().restoreFailedStorage("false");
      } catch (IOException e) {
        e.printStackTrace();
      }
    }
  };
  t.start();
  
  // Make sure that our thread is waiting for the lock.
  ThreadUtil.sleepAtLeastIgnoreInterrupts(1000);
  
  assertFalse(nn1.getNamesystem().getFsLockForTests().hasQueuedThreads());
  assertFalse(nn1.getNamesystem().getFsLockForTests().isWriteLocked());
  assertTrue(nn1.getNamesystem().getCpLockForTests().hasQueuedThreads());
  
  // Get /jmx of the standby NN web UI, which will cause the FSNS read lock to
  // be taken.
  String pageContents = DFSTestUtil.urlGet(new URL("http://" +
      nn1.getHttpAddress().getHostName() + ":" +
      nn1.getHttpAddress().getPort() + "/jmx"));
  assertTrue(pageContents.contains("NumLiveDataNodes"));
  
  // Make sure that the checkpoint is still going on, implying that the client
  // RPC to the SBN happened during the checkpoint.
  assertTrue("SBN should have still been checkpointing.",
      answerer.getFireCount() == 1 && answerer.getResultCount() == 0);
  answerer.proceed();
  answerer.waitForResult();
  assertTrue("SBN should have finished checkpointing.",
      answerer.getFireCount() == 1 && answerer.getResultCount() == 1);
  
  t.join();
}
 
Example #28
Source File: TestStandbyCheckpoints.java    From hadoop with Apache License 2.0
/**
 * Make sure that clients will receive StandbyExceptions even when a
 * checkpoint is in progress on the SBN, and therefore the StandbyCheckpointer
 * thread holds the FSNS lock. Regression test for HDFS-4591.
 */
@Test(timeout=300000)
public void testStandbyExceptionThrownDuringCheckpoint() throws Exception {
  
  // Set it up so that we know when the SBN checkpoint starts and ends.
  FSImage spyImage1 = NameNodeAdapter.spyOnFsImage(nn1);
  DelayAnswer answerer = new DelayAnswer(LOG);
  Mockito.doAnswer(answerer).when(spyImage1)
      .saveNamespace(Mockito.any(FSNamesystem.class),
          Mockito.eq(NameNodeFile.IMAGE), Mockito.any(Canceler.class));

  // Perform some edits and wait for a checkpoint to start on the SBN.
  doEdits(0, 1000);
  nn0.getRpcServer().rollEditLog();
  answerer.waitForCall();
  assertTrue("SBN is not performing checkpoint but it should be.",
      answerer.getFireCount() == 1 && answerer.getResultCount() == 0);
  
  // Make sure that the lock has actually been taken by the checkpointing
  // thread.
  ThreadUtil.sleepAtLeastIgnoreInterrupts(1000);
  try {
    // Perform an RPC to the SBN and make sure it throws a StandbyException.
    nn1.getRpcServer().getFileInfo("/");
    fail("Should have thrown StandbyException, but instead succeeded.");
  } catch (StandbyException se) {
    GenericTestUtils.assertExceptionContains("is not supported", se);
  }

  // Make sure new incremental block reports are processed during
  // checkpointing on the SBN.
  assertEquals(0, cluster.getNamesystem(1).getPendingDataNodeMessageCount());
  doCreate();
  Thread.sleep(1000);
  assertTrue(cluster.getNamesystem(1).getPendingDataNodeMessageCount() > 0);
  
  // Make sure that the checkpoint is still going on, implying that the client
  // RPC to the SBN happened during the checkpoint.
  assertTrue("SBN should have still been checkpointing.",
      answerer.getFireCount() == 1 && answerer.getResultCount() == 0);
  answerer.proceed();
  answerer.waitForResult();
  assertTrue("SBN should have finished checkpointing.",
      answerer.getFireCount() == 1 && answerer.getResultCount() == 1);
}
 
Example #29
Source File: SecondaryNameNode.java    From hadoop with Apache License 2.0
/**
 * Create a new checkpoint.
 * @return true if the image was fetched from the primary NameNode,
 *         false otherwise
 */
@VisibleForTesting
@SuppressWarnings("deprecation")
public boolean doCheckpoint() throws IOException {
  checkpointImage.ensureCurrentDirExists();
  NNStorage dstStorage = checkpointImage.getStorage();
  
  // Tell the namenode to start logging transactions in a new edit file
  // Returns a token that would be used to upload the merged image.
  CheckpointSignature sig = namenode.rollEditLog();
  
  boolean loadImage = false;
  boolean isFreshCheckpointer = (checkpointImage.getNamespaceID() == 0);
  boolean isSameCluster =
      (dstStorage.versionSupportsFederation(NameNodeLayoutVersion.FEATURES)
          && sig.isSameCluster(checkpointImage)) ||
      (!dstStorage.versionSupportsFederation(NameNodeLayoutVersion.FEATURES)
          && sig.namespaceIdMatches(checkpointImage));
  if (isFreshCheckpointer ||
      (isSameCluster &&
       !sig.storageVersionMatches(checkpointImage.getStorage()))) {
    // if we're a fresh 2NN, or if we're on the same cluster and our storage
    // needs an upgrade, just take the storage info from the server.
    dstStorage.setStorageInfo(sig);
    dstStorage.setClusterID(sig.getClusterID());
    dstStorage.setBlockPoolID(sig.getBlockpoolID());
    loadImage = true;
  }
  sig.validateStorageInfo(checkpointImage);

  // error simulation code for junit test
  CheckpointFaultInjector.getInstance().afterSecondaryCallsRollEditLog();

  RemoteEditLogManifest manifest =
    namenode.getEditLogManifest(sig.mostRecentCheckpointTxId + 1);

  // Fetch fsimage and edits. Reload the image if previous merge failed.
  loadImage |= downloadCheckpointFiles(
      fsName, checkpointImage, sig, manifest) |
      checkpointImage.hasMergeError();
  try {
    doMerge(sig, manifest, loadImage, checkpointImage, namesystem);
  } catch (IOException ioe) {
    // A merge error occurred. The in-memory file system state may be
    // inconsistent, so the image and edits need to be reloaded.
    checkpointImage.setMergeError();
    throw ioe;
  }
  // Clear any error since merge was successful.
  checkpointImage.clearMergeError();

  
  //
  // Upload the new image to the NameNode. Then tell the NameNode
  // to make the newly uploaded image the most current image.
  //
  long txid = checkpointImage.getLastAppliedTxId();
  TransferFsImage.uploadImageFromStorage(fsName, conf, dstStorage,
      NameNodeFile.IMAGE, txid);

  // error simulation code for junit test
  CheckpointFaultInjector.getInstance().afterSecondaryUploadsNewImage();

  LOG.warn("Checkpoint done. New Image Size: " 
           + dstStorage.getFsImageName(txid).length());

  if (legacyOivImageDir != null && !legacyOivImageDir.isEmpty()) {
    try {
      checkpointImage.saveLegacyOIVImage(namesystem, legacyOivImageDir,
          new Canceler());
    } catch (IOException e) {
      LOG.warn("Failed to write legacy OIV image: ", e);
    }
  }
  return loadImage;
}
 
Example #30
Source File: TransferFsImage.java    From hadoop with Apache License 2.0
private static void copyFileToStream(OutputStream out, File localfile,
    FileInputStream infile, DataTransferThrottler throttler,
    Canceler canceler) throws IOException {
  byte[] buf = new byte[HdfsConstants.IO_FILE_BUFFER_SIZE];
  try {
    CheckpointFaultInjector.getInstance()
        .aboutToSendFile(localfile);

    if (CheckpointFaultInjector.getInstance().
          shouldSendShortFile(localfile)) {
        // Test sending image shorter than localfile
        long len = localfile.length();
        buf = new byte[(int)Math.min(len/2, HdfsConstants.IO_FILE_BUFFER_SIZE)];
        // This will read at most half of the image
        // and the rest of the image will be sent over the wire
        infile.read(buf);
    }
    int num = 1;
    while (num > 0) {
      if (canceler != null && canceler.isCancelled()) {
        throw new SaveNamespaceCancelledException(
          canceler.getCancellationReason());
      }
      num = infile.read(buf);
      if (num <= 0) {
        break;
      }
      if (CheckpointFaultInjector.getInstance()
            .shouldCorruptAByte(localfile)) {
        // Simulate a corrupted byte on the wire
        LOG.warn("SIMULATING A CORRUPT BYTE IN IMAGE TRANSFER!");
        buf[0]++;
      }
      
      out.write(buf, 0, num);
      if (throttler != null) {
        throttler.throttle(num, canceler);
      }
    }
  } catch (EofException e) {
    LOG.info("Connection closed by client");
    out = null; // so we don't close in the finally
  } finally {
    if (out != null) {
      out.close();
    }
  }
}
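
The copy loop above polls the canceler before each read, so a cancel request takes effect at buffer granularity rather than immediately. The sketch below isolates that polling pattern so it can be run without Hadoop on the classpath. SketchCanceler and CancellableCopy are hypothetical names introduced for this illustration; SketchCanceler only mimics the three Canceler methods used in the examples (cancel, isCancelled, getCancellationReason), and a plain IOException stands in for SaveNamespaceCancelledException. It is not Hadoop's implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical stand-in for org.apache.hadoop.hdfs.util.Canceler.
class SketchCanceler {
  // volatile so a cancel() from another thread is visible to the copier
  private volatile String reason = null;

  void cancel(String reason) { this.reason = reason; }
  boolean isCancelled() { return reason != null; }
  String getCancellationReason() { return reason; }
}

class CancellableCopy {
  /**
   * Copies in to out in chunks of bufSize bytes, checking the canceler
   * before every read. Returns the number of bytes copied.
   */
  static long copy(InputStream in, OutputStream out,
      SketchCanceler canceler, int bufSize) throws IOException {
    byte[] buf = new byte[bufSize];
    long total = 0;
    while (true) {
      // A null canceler means the copy cannot be cancelled.
      if (canceler != null && canceler.isCancelled()) {
        throw new IOException(
            "Cancelled: " + canceler.getCancellationReason());
      }
      int num = in.read(buf);
      if (num <= 0) {
        break; // end of stream
      }
      out.write(buf, 0, num);
      total += num;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    SketchCanceler canceler = new SketchCanceler();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    long copied = copy(new ByteArrayInputStream(new byte[10]), out, canceler, 4);
    System.out.println("copied " + copied + " bytes");

    canceler.cancel("operator request");
    try {
      copy(new ByteArrayInputStream(new byte[10]), out, canceler, 4);
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Checking the flag once per buffer keeps the overhead of cancellation support to a single volatile read per chunk, at the cost of finishing at most one more read and write after cancel() is called.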