Parse HTML in Java

This code example shows how to parse HTML in Java using jsoup. Just as there are many libraries for other purposes, there are many HTML parsers for Java, and developers often wonder which one is best before settling on one. jsoup is a very good place to start.

The following Java code fetches a page from a URL, finds elements by class name, and prints all the links available on the page.

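import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
	public static void main(String[] args) throws IOException {
		// fetch and parse the page (the URL is left empty here; supply the page to parse)
		Document doc = Jsoup.connect("").get();

		// find elements by class name using a CSS selector
		Elements titles = doc.select(".entrytitle");
		// print all titles in main page
		for (Element e : titles) {
			System.out.println("text: " + e.text());
			System.out.println("html: " + e.html());
		}

		// print all available links on page
		Elements links = doc.select("a[href]");
		for (Element l : links) {
			// the "abs:" prefix asks jsoup for the absolute form of the link
			System.out.println("link: " + l.attr("abs:href"));
		}
	}
}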
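
The "abs:" prefix in l.attr("abs:href") asks jsoup to return the link as an absolute URL, resolved against the address the page was loaded from. As a rough illustration of what that resolution means, here is a minimal sketch that uses only java.net.URI from the standard library. It is not jsoup's own implementation, and the class name AbsoluteLinkSketch, the helper toAbsolute and the example.com addresses are made up for this example.

```java
import java.net.URI;

class AbsoluteLinkSketch {
    // Hypothetical helper: resolve an href against the page's base address,
    // which is roughly what the "abs:href" attribute key gives you in jsoup.
    static String toAbsolute(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Placeholder base address, assumed for illustration only
        String base = "https://example.com/blog/index.html";

        // relative to the current directory
        System.out.println(toAbsolute(base, "post/1.html"));
        // prints https://example.com/blog/post/1.html

        // relative to the site root
        System.out.println(toAbsolute(base, "/about"));
        // prints https://example.com/about

        // already absolute, so it comes back unchanged
        System.out.println(toAbsolute(base, "https://other.example.org/x"));
        // prints https://other.example.org/x
    }
}
```

Note that URI.create throws IllegalArgumentException on malformed input, so this sketch is stricter than a parser that has to cope with real-world pages.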

You can download the jsoup HTML parser by searching Google for "jsoup". Once you have the jar, add it to your project as a library so that it is on the classpath.
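
If the jsoup jar is not on the classpath when the program runs, it fails with java.lang.NoClassDefFoundError: org/jsoup/Jsoup. A minimal sketch of a check you could run first to see whether the library is visible, using only the standard library; the class name ClasspathCheck and the helper isOnClasspath are hypothetical names chosen for this example.

```java
class ClasspathCheck {
    // Hypothetical helper: returns true if the named class can be located
    // by this class's class loader, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Show the classpath the JVM was actually started with
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        // org.jsoup.Jsoup is the class used by the example above
        System.out.println("jsoup found: " + isOnClasspath("org.jsoup.Jsoup"));
    }
}
```

If this reports false, the jar still needs to be added, either as a library in the IDE or through the -cp option when starting java.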

  • Aadil Musavir

    Richard Dickinson: this is because your classpath is not correct. I followed the same steps and got this error. I was running the project with the command java -cp target/htmlLParser-1.0-SNAPSHOT.jar and got the error because the classpath given to -cp was not set up correctly. Then I ran the class from the main .java file by right-clicking on it, and it worked. Hope this helps.

  • Clara

    ClassNotFoundException: org.jsoup.Jsoup … Easy solution: download jsoup (search Google) and add it as a library in your project.

  • Richard Dickinson

    I’ve probably made an error compiling, but when I try this I get these errors:

    java Main
    Exception in thread "main" java.lang.NoClassDefFoundError: org/jsoup/Jsoup
    at Main.main(
    Caused by: java.lang.ClassNotFoundException: org.jsoup.Jsoup
    at Method)
    at java.lang.ClassLoader.loadClass(
    at sun.misc.Launcher$AppClassLoader.loadClass(
    at java.lang.ClassLoader.loadClass(
    … 1 more

    Any ideas?