Java Code Examples for org.jsoup.select.NodeTraversor

The following examples show how to use org.jsoup.select.NodeTraversor. These examples are extracted from open source projects. Two call styles appear below: the instance form, new NodeTraversor(visitor).traverse(root), and the static form, NodeTraversor.traverse(visitor, root). Which one compiles depends on the jsoup version the project uses.
Example 1
Source Project: act   Source File: PatentDocument.java   License: GNU General Public License v3.0
private static List<String> extractTextFromHTML(DocumentBuilder docBuilder, NodeList textNodes)
    throws ParserConfigurationException, TransformerConfigurationException,
    TransformerException, XPathExpressionException {
  List<String> allTextList = new ArrayList<>(0);
  if (textNodes != null) {
    for (int i = 0; i < textNodes.getLength(); i++) {
      Node n = textNodes.item(i);
      /* This extremely around-the-horn approach to handling text content is due to the mix of HTML and
       * XML in the patent body.  We use Jsoup to parse the HTML entities we find in the body, and use
       * its extremely convenient NodeVisitor API to recursively traverse the document and extract the
       * text content in reasonable chunks.
       */
      Document contentsDoc = Util.nodeToDocument(docBuilder, "body", n);
      String docText = Util.documentToString(contentsDoc);
      // With help from http://stackoverflow.com/questions/832620/stripping-html-tags-in-java
      org.jsoup.nodes.Document htmlDoc = Jsoup.parse(docText);
      HtmlVisitor visitor = new HtmlVisitor();
      NodeTraversor traversor = new NodeTraversor(visitor);
      traversor.traverse(htmlDoc);
      List<String> textSegments = visitor.getTextContent();
      allTextList.addAll(textSegments);
    }
  }
  return allTextList;
}
 
Example 2
Source Project: intellij-quarkus   Source File: HtmlToPlainText.java   License: Eclipse Public License 2.0
/**
 * Format an Element to plain-text
 * @param element the root element to format
 * @return formatted text
 */
public String getPlainText(Element element) {
    FormattingVisitor formatter = new FormattingVisitor();
    NodeTraversor traversor = new NodeTraversor(formatter);
    traversor.traverse(element); // walk the DOM, and call .head() and .tail() for each node

    return formatter.toString();
}
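
The comment in the example above notes that the traversal calls .head() and .tail() for each node. As a rough sketch of how a formatter can use those two callbacks, the following uses hypothetical stand-in types (SketchNode, SketchVisitor, HeadTailDemo) rather than jsoup's own classes, so it runs without the library. The FormattingVisitor used by the projects is not shown in the source and may well behave differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT jsoup classes.
class SketchNode {
    final String name;            // tag name, or "#text" for a text node
    final String text;            // text content (empty for elements)
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name, String text) {
        this.name = name;
        this.text = text;
    }

    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }
}

interface SketchVisitor {
    void head(SketchNode node, int depth); // called when a node is first reached
    void tail(SketchNode node, int depth); // called after all of its children
}

class HeadTailDemo {
    // Recursive depth-first walk: head, then children, then tail.
    static void traverse(SketchVisitor visitor, SketchNode node, int depth) {
        visitor.head(node, depth);
        for (SketchNode child : node.children) {
            traverse(visitor, child, depth + 1);
        }
        visitor.tail(node, depth);
    }

    // Appends text in head() and a line break in tail() of each "p" node.
    static String toPlainText(SketchNode root) {
        StringBuilder out = new StringBuilder();
        traverse(new SketchVisitor() {
            @Override
            public void head(SketchNode node, int depth) {
                if (node.name.equals("#text")) {
                    out.append(node.text);
                }
            }

            @Override
            public void tail(SketchNode node, int depth) {
                if (node.name.equals("p")) {
                    out.append('\n');
                }
            }
        }, root, 0);
        return out.toString();
    }

    public static void main(String[] args) {
        SketchNode body = new SketchNode("body", "")
            .add(new SketchNode("p", "").add(new SketchNode("#text", "Hello")))
            .add(new SketchNode("p", "").add(new SketchNode("#text", "World")));
        System.out.print(toPlainText(body));
    }
}
```

Doing the line break in tail() rather than head() is what lets the paragraph's text land before the break: tail() only fires once every child of the node has been visited.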
 
Example 3
Source Project: firebase-android-sdk   Source File: HtmlToPlainText.java   License: Apache License 2.0
/**
 * Format an Element to plain-text
 * @param element the root element to format
 * @return formatted text
 */
public String getPlainText(Element element) {
    FormattingVisitor formatter = new FormattingVisitor();
    NodeTraversor.traverse(formatter, element); // walk the DOM, and call .head() and .tail() for each node

    return formatter.toString();
}
 
Example 4
Source Project: lemminx   Source File: HtmlToPlainText.java   License: Eclipse Public License 2.0
/**
 * Format an Element to plain-text
 * @param element the root element to format
 * @return formatted text
 */
public String getPlainText(Element element) {
	FormattingVisitor formatter = new FormattingVisitor();
	NodeTraversor traversor = new NodeTraversor(formatter);
	traversor.traverse(element); // walk the DOM, and call .head() and .tail() for each node

	return formatter.toString();
}
 
Example 5
Source Project: eclipse.jdt.ls   Source File: HtmlToPlainText.java   License: Eclipse Public License 2.0
/**
 * Format an Element to plain-text
 * @param element the root element to format
 * @return formatted text
 */
public String getPlainText(Element element) {
	FormattingVisitor formatter = new FormattingVisitor();
	NodeTraversor traversor = new NodeTraversor(formatter);
	traversor.traverse(element); // walk the DOM, and call .head() and .tail() for each node

	return formatter.toString();
}
 
Example 6
Source Project: astor   Source File: W3CDom.java   License: GNU General Public License v2.0
/**
 * Converts a jsoup document into the provided W3C Document. If required, you can set options on the output document
 * before converting.
 * @param in jsoup doc
 * @param out w3c doc
 * @see org.jsoup.helper.W3CDom#fromJsoup(org.jsoup.nodes.Document)
 */
public void convert(org.jsoup.nodes.Document in, Document out) {
    if (!StringUtil.isBlank(in.location()))
        out.setDocumentURI(in.location());

    org.jsoup.nodes.Element rootEl = in.child(0); // skip the #root node
    NodeTraversor traversor = new NodeTraversor(new W3CBuilder(out));
    traversor.traverse(rootEl);
}
 
Example 7
Source Project: astor   Source File: HtmlToPlainText.java   License: GNU General Public License v2.0
/**
 * Format an Element to plain-text
 * @param element the root element to format
 * @return formatted text
 */
public String getPlainText(Element element) {
    FormattingVisitor formatter = new FormattingVisitor();
    NodeTraversor traversor = new NodeTraversor(formatter);
    traversor.traverse(element); // walk the DOM, and call .head() and .tail() for each node

    return formatter.toString();
}
 
Example 8
Source Project: astor   Source File: Node.java   License: GNU General Public License v2.0
/**
 * Perform a depth-first traversal through this node and its descendants.
 * @param nodeVisitor the visitor callbacks to perform on each node
 * @return this node, for chaining
 */
public Node traverse(NodeVisitor nodeVisitor) {
    Validate.notNull(nodeVisitor);
    NodeTraversor traversor = new NodeTraversor(nodeVisitor);
    traversor.traverse(this);
    return this;
}
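
The Javadoc above describes a depth-first traversal through a node and its descendants. One way to do such a walk without recursion, so that very deep trees do not exhaust the call stack, is to follow first-child, next-sibling, and parent links. The sketch below assumes a hypothetical TreeNode type with those links (it is not jsoup's Node) and records the order of head and tail events; it is an illustration of the technique, not a copy of the library's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in node type for illustration; NOT jsoup's Node.
class TreeNode {
    final String name;
    TreeNode parent;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) {
        this.name = name;
    }

    TreeNode add(TreeNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    // Next sibling under the same parent, or null if this is the last child.
    TreeNode nextSibling() {
        if (parent == null) {
            return null;
        }
        int i = parent.children.indexOf(this);
        return i + 1 < parent.children.size() ? parent.children.get(i + 1) : null;
    }
}

class IterativeWalkDemo {
    // Depth-first walk without recursion; records "head:x" and "tail:x" events.
    static List<String> walk(TreeNode root) {
        List<String> events = new ArrayList<>();
        TreeNode node = root;
        int depth = 0;
        while (node != null) {
            events.add("head:" + node.name);
            if (!node.children.isEmpty()) {
                node = node.children.get(0); // descend to the first child
                depth++;
                continue;
            }
            // Leaf reached: close nodes while climbing until a sibling exists.
            while (node.nextSibling() == null && depth > 0) {
                events.add("tail:" + node.name);
                node = node.parent;
                depth--;
            }
            events.add("tail:" + node.name);
            if (node == root) {
                break; // back at the starting node: traversal is complete
            }
            node = node.nextSibling();
        }
        return events;
    }

    public static void main(String[] args) {
        TreeNode html = new TreeNode("html")
            .add(new TreeNode("head").add(new TreeNode("title")))
            .add(new TreeNode("body"));
        System.out.println(walk(html));
    }
}
```

The depth counter keeps the climb from going above the starting node, so the walk also works when it starts at a node that itself has a parent and siblings.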
 
Example 9
Source Project: astor   Source File: W3CDom.java   License: GNU General Public License v2.0
/**
 * Converts a jsoup document into the provided W3C Document. If required, you can set options on the output document
 * before converting.
 * @param in jsoup doc
 * @param out w3c doc
 * @see org.jsoup.helper.W3CDom#fromJsoup(org.jsoup.nodes.Document)
 */
public void convert(org.jsoup.nodes.Document in, Document out) {
    if (!StringUtil.isBlank(in.location()))
        out.setDocumentURI(in.location());

    org.jsoup.nodes.Element rootEl = in.child(0); // skip the #root node
    NodeTraversor.traverse(new W3CBuilder(out), rootEl);
}
 
Example 10
Source Project: astor   Source File: HtmlToPlainText.java   License: GNU General Public License v2.0
/**
 * Format an Element to plain-text
 * @param element the root element to format
 * @return formatted text
 */
public String getPlainText(Element element) {
    FormattingVisitor formatter = new FormattingVisitor();
    NodeTraversor.traverse(formatter, element); // walk the DOM, and call .head() and .tail() for each node

    return formatter.toString();
}
 
Example 11
Source Project: astor   Source File: W3CDom.java   License: GNU General Public License v2.0
/**
 * Converts a jsoup document into the provided W3C Document. If required, you can set options on the output document
 * before converting.
 * @param in jsoup doc
 * @param out w3c doc
 * @see org.jsoup.helper.W3CDom#fromJsoup(org.jsoup.nodes.Document)
 */
public void convert(org.jsoup.nodes.Document in, Document out) {
    if (!StringUtil.isBlank(in.location()))
        out.setDocumentURI(in.location());

    org.jsoup.nodes.Element rootEl = in.child(0); // skip the #root node
    NodeTraversor.traverse(new W3CBuilder(out), rootEl);
}
 
Example 12
Source Project: astor   Source File: HtmlToPlainText.java   License: GNU General Public License v2.0
/**
 * Format an Element to plain-text
 * @param element the root element to format
 * @return formatted text
 */
public String getPlainText(Element element) {
    FormattingVisitor formatter = new FormattingVisitor();
    NodeTraversor.traverse(formatter, element); // walk the DOM, and call .head() and .tail() for each node

    return formatter.toString();
}
 
Example 13
Source Project: jsoup-learning   Source File: HtmlToPlainText.java   License: MIT License
/**
 * Format an Element to plain-text
 * @param element the root element to format
 * @return formatted text
 */
public String getPlainText(Element element) {
    FormattingVisitor formatter = new FormattingVisitor();
    NodeTraversor traversor = new NodeTraversor(formatter);
    traversor.traverse(element); // walk the DOM, and call .head() and .tail() for each node

    return formatter.toString();
}
 
Example 14
Source Project: jsoup-learning   Source File: Node.java   License: MIT License
/**
 * Perform a depth-first traversal through this node and its descendants.
 * @param nodeVisitor the visitor callbacks to perform on each node
 * @return this node, for chaining
 */
public Node traverse(NodeVisitor nodeVisitor) {
    Validate.notNull(nodeVisitor);
    NodeTraversor traversor = new NodeTraversor(nodeVisitor);
    traversor.traverse(this);
    return this;
}
 
Example 15
Source Project: storm-crawler   Source File: DocumentFragmentBuilder.java   License: Apache License 2.0
public static DocumentFragment fromJsoup(
        org.jsoup.nodes.Document jsoupDocument) {
    HTMLDocumentImpl htmlDoc = new HTMLDocumentImpl();
    htmlDoc.setErrorChecking(false);
    DocumentFragment fragment = htmlDoc.createDocumentFragment();
    // skip the #root node
    org.jsoup.nodes.Element rootEl = jsoupDocument.child(0);
    NodeTraversor.traverse(new W3CBuilder(htmlDoc, fragment), rootEl);
    return fragment;
}
 
Example 16
Source Project: Recaf   Source File: Javadocs.java   License: MIT License
private static String text(Element element) {
	FormattingVisitor formatter = new FormattingVisitor();
	NodeTraversor.traverse(formatter, element);
	return formatter.toString();
}
 
Example 17
Source Project: echo   Source File: HtmlToPlainText.java   License: Apache License 2.0
public String getPlainText(Element element) {
  FormattingVisitor formatter = new FormattingVisitor();
  NodeTraversor traversor = new NodeTraversor(formatter);
  traversor.traverse(element);
  return formatter.toString();
}
 
Example 18
Source Project: astor   Source File: Cleaner.java   License: GNU General Public License v2.0
private int copySafeNodes(Element source, Element dest) {
    CleaningVisitor cleaningVisitor = new CleaningVisitor(source, dest);
    NodeTraversor traversor = new NodeTraversor(cleaningVisitor);
    traversor.traverse(source);
    return cleaningVisitor.numDiscarded;
}
 
Example 19
Source Project: astor   Source File: Node.java   License: GNU General Public License v2.0
protected void outerHtml(Appendable accum) {
    new NodeTraversor(new OuterHtmlVisitor(accum, getOutputSettings())).traverse(this);
}
 
Example 20
Source Project: astor   Source File: Cleaner.java   License: GNU General Public License v2.0
private int copySafeNodes(Element source, Element dest) {
    CleaningVisitor cleaningVisitor = new CleaningVisitor(source, dest);
    NodeTraversor.traverse(cleaningVisitor, source);
    return cleaningVisitor.numDiscarded;
}
 
Example 21
Source Project: astor   Source File: Node.java   License: GNU General Public License v2.0
/**
 * Perform a depth-first traversal through this node and its descendants.
 * @param nodeVisitor the visitor callbacks to perform on each node
 * @return this node, for chaining
 */
public Node traverse(NodeVisitor nodeVisitor) {
    Validate.notNull(nodeVisitor);
    NodeTraversor.traverse(nodeVisitor, this);
    return this;
}
 
Example 22
Source Project: astor   Source File: Node.java   License: GNU General Public License v2.0
/**
 * Perform a depth-first filtering through this node and its descendants.
 * @param nodeFilter the filter callbacks to perform on each node
 * @return this node, for chaining
 */
public Node filter(NodeFilter nodeFilter) {
    Validate.notNull(nodeFilter);
    NodeTraversor.filter(nodeFilter, this);
    return this;
}
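
The filter(...) method above differs from traverse(...) in that each callback returns a decision that steers the walk. jsoup expresses this with NodeFilter and its FilterResult values; the sketch below does not use those classes. It assumes a simplified, hypothetical Decision enum with only three outcomes and a head-only SketchFilter callback, purely to illustrate how a returned value can prune a subtree or stop the walk early.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; NOT jsoup's NodeFilter or FilterResult.
enum Decision { CONTINUE, SKIP_CHILDREN, STOP }

class FilterNode {
    final String name;
    final List<FilterNode> children = new ArrayList<>();

    FilterNode(String name) {
        this.name = name;
    }

    FilterNode add(FilterNode child) {
        children.add(child);
        return this;
    }
}

interface SketchFilter {
    // Called when a node is first reached; the result steers the walk.
    Decision head(FilterNode node, int depth);
}

class FilterDemo {
    // Returns STOP if the walk was aborted, otherwise CONTINUE.
    static Decision filter(SketchFilter f, FilterNode node, int depth) {
        Decision d = f.head(node, depth);
        if (d == Decision.STOP) {
            return Decision.STOP;
        }
        if (d == Decision.CONTINUE) {
            for (FilterNode child : node.children) {
                if (filter(f, child, depth + 1) == Decision.STOP) {
                    return Decision.STOP;
                }
            }
        }
        // SKIP_CHILDREN falls through here without visiting descendants.
        return Decision.CONTINUE;
    }

    // Collects visited names, pruning below "script" and stopping at "footer".
    static List<String> visited(FilterNode root) {
        List<String> seen = new ArrayList<>();
        filter((node, depth) -> {
            if (node.name.equals("footer")) {
                return Decision.STOP;
            }
            seen.add(node.name);
            return node.name.equals("script") ? Decision.SKIP_CHILDREN : Decision.CONTINUE;
        }, root, 0);
        return seen;
    }

    public static void main(String[] args) {
        FilterNode body = new FilterNode("body")
            .add(new FilterNode("script").add(new FilterNode("#text")))
            .add(new FilterNode("p"))
            .add(new FilterNode("footer"))
            .add(new FilterNode("div"));
        System.out.println(visited(body));
    }
}
```

In this sketch the "#text" child of "script" is never visited because of SKIP_CHILDREN, and "div" is never visited because the walk stops at "footer".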
 
Example 23
Source Project: astor   Source File: Node.java   License: GNU General Public License v2.0
protected void outerHtml(Appendable accum) {
    NodeTraversor.traverse(new OuterHtmlVisitor(accum, getOutputSettings()), this);
}
 
Example 24
Source Project: astor   Source File: Cleaner.java   License: GNU General Public License v2.0
private int copySafeNodes(Element source, Element dest) {
    CleaningVisitor cleaningVisitor = new CleaningVisitor(source, dest);
    NodeTraversor.traverse(cleaningVisitor, source);
    return cleaningVisitor.numDiscarded;
}
 
Example 25
Source Project: astor   Source File: Node.java   License: GNU General Public License v2.0
/**
 * Perform a depth-first traversal through this node and its descendants.
 * @param nodeVisitor the visitor callbacks to perform on each node
 * @return this node, for chaining
 */
public Node traverse(NodeVisitor nodeVisitor) {
    Validate.notNull(nodeVisitor);
    NodeTraversor.traverse(nodeVisitor, this);
    return this;
}
 
Example 26
Source Project: astor   Source File: Node.java   License: GNU General Public License v2.0
/**
 * Perform a depth-first filtering through this node and its descendants.
 * @param nodeFilter the filter callbacks to perform on each node
 * @return this node, for chaining
 */
public Node filter(NodeFilter nodeFilter) {
    Validate.notNull(nodeFilter);
    NodeTraversor.filter(nodeFilter, this);
    return this;
}
 
Example 27
Source Project: astor   Source File: Node.java   License: GNU General Public License v2.0
protected void outerHtml(Appendable accum) {
    NodeTraversor.traverse(new OuterHtmlVisitor(accum, getOutputSettings()), this);
}
 
Example 28
Source Project: jsoup-learning   Source File: Cleaner.java   License: MIT License
private int copySafeNodes(Element source, Element dest) {
    CleaningVisitor cleaningVisitor = new CleaningVisitor(source, dest);
    NodeTraversor traversor = new NodeTraversor(cleaningVisitor);
    traversor.traverse(source);
    return cleaningVisitor.numDiscarded;
}
 
Example 29
Source Project: jsoup-learning   Source File: Node.java   License: MIT License
protected void outerHtml(StringBuilder accum) {
    new NodeTraversor(new OuterHtmlVisitor(accum, getOutputSettings())).traverse(this);
}