org.jsoup.helper.DataUtil Java Examples

The following examples show how to use org.jsoup.helper.DataUtil.
Example #1
Source File: Html.java    From flow with Apache License 2.0
/**
 * Creates an instance based on the HTML fragment read from the stream. The
 * fragment must have exactly one root element.
 * <p>
 * A best effort is done to parse broken HTML but no guarantees are given
 * for how invalid HTML is handled.
 * <p>
 * Any leading or trailing whitespace is removed while parsing but any
 * whitespace inside the root tag is preserved.
 *
 * @param stream
 *            the input stream which provides the HTML in UTF-8
 * @throws UncheckedIOException
 *             if reading the stream fails
 */
public Html(InputStream stream) {
    super(null);
    if (stream == null) {
        throw new IllegalArgumentException("HTML stream cannot be null");
    }
    try {
        /*
         * Cannot use any of the methods that accept a stream since they all
         * parse as a document rather than as a body fragment. The logic for
         * reading a stream into a String is the same that is used
         * internally by JSoup if you strip away all the logic to guess an
         * encoding in case one isn't defined.
         */
        setOuterHtml(UTF_8.decode(DataUtil.readToByteBuffer(stream, 0))
                .toString());
    } catch (IOException e) {
        throw new UncheckedIOException("Unable to read HTML from stream",
                e);
    }
}
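
The comment in Example #1 describes reading a stream into a String without any encoding detection. As a rough illustration of that idea using only the JDK, the sketch below drains a stream into a ByteBuffer and decodes it as UTF-8. The class StreamToStringSketch and its methods readFully and readUtf8 are hypothetical names introduced here; this is not jsoup's implementation of DataUtil.readToByteBuffer, only an approximation of the behaviour the comment describes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of jsoup.
class StreamToStringSketch {

    // Drains the whole stream into memory and wraps the bytes in a ByteBuffer.
    static ByteBuffer readFully(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                out.write(chunk, 0, read);
            }
            return ByteBuffer.wrap(out.toByteArray());
        } catch (IOException e) {
            // Same wrapping strategy as Example #1: surface I/O failures unchecked.
            throw new UncheckedIOException("Unable to read stream", e);
        }
    }

    // Decodes the bytes as UTF-8, with no attempt to guess another encoding.
    static String readUtf8(InputStream in) {
        return StandardCharsets.UTF_8.decode(readFully(in)).toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "<div>h\u00e9llo</div>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readUtf8(new ByteArrayInputStream(bytes)));
    }
}
```

The caller still owns the stream in this sketch; it is read to the end but not closed.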
 
Example #2
Source File: DomMapper.java    From mica with GNU Lesser General Public License v3.0
/**
 * Reads the stream into a jsoup Document
 *
 * @param inputStream InputStream
 * @return Document
 */
public static Document readDocument(InputStream inputStream) {
	try {
		return DataUtil.load(inputStream, StandardCharsets.UTF_8.name(), "");
	} catch (IOException e) {
		throw Exceptions.unchecked(e);
	}
}
 
Example #3
Source File: Jsoup.java    From jsoup-learning with MIT License
/**
 Read an input stream, and parse it to a Document. You can provide an alternate parser, such as a simple XML
 (non-HTML) parser.

 @param in          input stream to read. Make sure to close it after parsing.
 @param charsetName (optional) character set of file contents. Set to {@code null} to determine from {@code http-equiv} meta tag, if
 present, or fall back to {@code UTF-8} (which is often safe to do).
 @param baseUri     The URL where the HTML was retrieved from, to resolve relative links against.
 @param parser alternate {@link Parser#xmlParser() parser} to use.
 @return sane HTML

 @throws IOException if the file could not be found, or read, or if the charsetName is invalid.
 */
public static Document parse(InputStream in, String charsetName, String baseUri, Parser parser) throws IOException {
    return DataUtil.load(in, charsetName, baseUri, parser);
}
 
Example #4
Source File: Jsoup.java    From jsoup-learning with MIT License
/**
 Read an input stream, and parse it to a Document.

 @param in          input stream to read. Make sure to close it after parsing.
 @param charsetName (optional) character set of file contents. Set to {@code null} to determine from {@code http-equiv} meta tag, if
 present, or fall back to {@code UTF-8} (which is often safe to do).
 @param baseUri     The URL where the HTML was retrieved from, to resolve relative links against.
 @return sane HTML

 @throws IOException if the file could not be found, or read, or if the charsetName is invalid.
 */
public static Document parse(InputStream in, String charsetName, String baseUri) throws IOException {
    return DataUtil.load(in, charsetName, baseUri);
}
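
The charsetName Javadoc above describes a fallback order: an explicit charset name, then a charset declared in a meta tag, then UTF-8. The sketch below is one simplified way to express that order with the JDK alone. CharsetFallbackSketch, resolve and the regular expression are assumptions made for illustration; jsoup's actual detection logic is more involved and is not reproduced here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of jsoup.
class CharsetFallbackSketch {

    // Simplified pattern: finds charset=NAME inside a <meta ...> tag.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9][\\w-]*)",
            Pattern.CASE_INSENSITIVE);

    static Charset resolve(String charsetName, byte[] rawHtml) {
        if (charsetName != null) {
            // An explicit name wins; Charset.forName throws if it is invalid.
            return Charset.forName(charsetName);
        }
        // ISO-8859-1 maps every byte to one char, so ASCII markup can be scanned safely.
        String probe = new String(rawHtml, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(probe);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        // Nothing declared, or the declared charset is unsupported: fall back to UTF-8.
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        byte[] html = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=ISO-8859-1\">"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(resolve(null, html));
    }
}
```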
 
Example #5
Source File: Jsoup.java    From jsoup-learning with MIT License
/**
 Parse the contents of a file as HTML. The location of the file is used as the base URI to qualify relative URLs.

 @param in          file to load HTML from
 @param charsetName (optional) character set of file contents. Set to {@code null} to determine from {@code http-equiv} meta tag, if
 present, or fall back to {@code UTF-8} (which is often safe to do).
 @return sane HTML

 @throws IOException if the file could not be found, or read, or if the charsetName is invalid.
 @see #parse(File, String, String)
 */
public static Document parse(File in, String charsetName) throws IOException {
    return DataUtil.load(in, charsetName, in.getAbsolutePath());
}
 
Example #6
Source File: Jsoup.java    From jsoup-learning with MIT License
/**
 Parse the contents of a file as HTML.

 @param in          file to load HTML from
 @param charsetName (optional) character set of file contents. Set to {@code null} to determine from {@code http-equiv} meta tag, if
 present, or fall back to {@code UTF-8} (which is often safe to do).
 @param baseUri     The URL where the HTML was retrieved from, to resolve relative links against.
 @return sane HTML

 @throws IOException if the file could not be found, or read, or if the charsetName is invalid.
 */
public static Document parse(File in, String charsetName, String baseUri) throws IOException {
    return DataUtil.load(in, charsetName, baseUri);
}
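
The baseUri parameter is described above as the URL against which relative links are resolved. To show what that resolution means in practice, the sketch below uses java.net.URI from the JDK. BaseUriSketch and absolute are hypothetical names, the example.com URLs are placeholders, and jsoup's own link resolution may differ in edge cases.

```java
import java.net.URI;

// Hypothetical helper class, not part of jsoup.
class BaseUriSketch {

    // Resolves a possibly relative href against the base URI of the document.
    static String absolute(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "https://example.com/docs/index.html";
        System.out.println(absolute(base, "images/logo.png"));
        System.out.println(absolute(base, "/about"));
    }
}
```

A relative href is resolved against the directory of the base document, while an href that starts with a slash is resolved against the host root.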
 
Example #7
Source File: Jsoup.java    From astor with GNU General Public License v2.0
/**
 Read an input stream, and parse it to a Document. You can provide an alternate parser, such as a simple XML
 (non-HTML) parser.

 @param in          input stream to read. Make sure to close it after parsing.
 @param charsetName (optional) character set of file contents. Set to {@code null} to determine from {@code http-equiv} meta tag, if
 present, or fall back to {@code UTF-8} (which is often safe to do).
 @param baseUri     The URL where the HTML was retrieved from, to resolve relative links against.
 @param parser alternate {@link Parser#xmlParser() parser} to use.
 @return sane HTML

 @throws IOException if the file could not be found, or read, or if the charsetName is invalid.
 */
public static Document parse(InputStream in, String charsetName, String baseUri, Parser parser) throws IOException {
    return DataUtil.load(in, charsetName, baseUri, parser);
}
 
Example #8
Source File: Jsoup.java    From astor with GNU General Public License v2.0
/**
 Read an input stream, and parse it to a Document.

 @param in          input stream to read. Make sure to close it after parsing.
 @param charsetName (optional) character set of file contents. Set to {@code null} to determine from {@code http-equiv} meta tag, if
 present, or fall back to {@code UTF-8} (which is often safe to do).
 @param baseUri     The URL where the HTML was retrieved from, to resolve relative links against.
 @return sane HTML

 @throws IOException if the file could not be found, or read, or if the charsetName is invalid.
 */
public static Document parse(InputStream in, String charsetName, String baseUri) throws IOException {
    return DataUtil.load(in, charsetName, baseUri);
}
 
Example #9
Source File: Jsoup.java    From astor with GNU General Public License v2.0
/**
 Parse the contents of a file as HTML. The location of the file is used as the base URI to qualify relative URLs.

 @param in          file to load HTML from
 @param charsetName (optional) character set of file contents. Set to {@code null} to determine from {@code http-equiv} meta tag, if
 present, or fall back to {@code UTF-8} (which is often safe to do).
 @return sane HTML

 @throws IOException if the file could not be found, or read, or if the charsetName is invalid.
 @see #parse(File, String, String)
 */
public static Document parse(File in, String charsetName) throws IOException {
    return DataUtil.load(in, charsetName, in.getAbsolutePath());
}
 
Example #10
Source File: Jsoup.java    From astor with GNU General Public License v2.0
/**
 Parse the contents of a file as HTML.

 @param in          file to load HTML from
 @param charsetName (optional) character set of file contents. Set to {@code null} to determine from {@code http-equiv} meta tag, if
 present, or fall back to {@code UTF-8} (which is often safe to do).
 @param baseUri     The URL where the HTML was retrieved from, to resolve relative links against.
 @return sane HTML

 @throws IOException if the file could not be found, or read, or if the charsetName is invalid.
 */
public static Document parse(File in, String charsetName, String baseUri) throws IOException {
    return DataUtil.load(in, charsetName, baseUri);
}
 