Parse HTML in Java

This example shows how to parse HTML in Java using jsoup. Java has many HTML parsers, and developers often ask which one is best before picking one. jsoup is a very good place to start.

The following Java code fetches a URL, finds elements by class name, and lists all the links available on the page.

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
	public static void main(String[] args) throws IOException {

		// Fetch the page and parse it into a document
		Document doc = Jsoup.connect("http://www.programcreek.com").get();

		// Select every element with the CSS class "entrytitle"
		Elements titles = doc.select(".entrytitle");

		// Print all titles on the main page
		for (Element e : titles) {
			System.out.println("text: " + e.text());
			System.out.println("html: " + e.html());
		}

		// Print all available links on the page, resolved to absolute URLs
		Elements links = doc.select("a[href]");
		for (Element link : links) {
			System.out.println("link: " + link.attr("abs:href"));
		}
	}
}
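
The "abs:" prefix in the attribute key "abs:href" asks jsoup to return each link as an absolute URL, resolved against the address of the page it came from. The idea can be illustrated with the standard library alone. The following is only a sketch of that kind of resolution, not jsoup's own implementation; the class name LinkResolution, the method toAbsolute, and the sample paths are made up for illustration.

```java
import java.net.URI;

// Hypothetical demo class; the name LinkResolution is not part of jsoup.
final class LinkResolution {

	// Resolve an href value against the URI of the page it was found on.
	static String toAbsolute(String baseUri, String href) {
		return URI.create(baseUri).resolve(href).toString();
	}

	public static void main(String[] args) {
		// Assumed base address, chosen only for this illustration
		String base = "http://www.programcreek.com/java/";

		// A relative path is appended to the base directory
		System.out.println(toAbsolute(base, "page.html"));

		// A path starting with "/" is resolved against the site root
		System.out.println(toAbsolute(base, "/about"));

		// An already absolute link is returned unchanged
		System.out.println(toAbsolute(base, "http://example.com/x"));
	}
}
```

Reading the attribute as plain "href" instead would give the value exactly as it is written in the page, which may be a relative path.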

You can download the jsoup Java HTML parser by searching for "jsoup" on Google.
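
Downloading the jar is not enough on its own: it also has to be on the classpath when the program is compiled and when it runs, otherwise the JVM cannot find org.jsoup.Jsoup. A quick way to check is to try loading the class by name. This is a minimal sketch using only the standard library; the class name ClasspathCheck and the method isOnClasspath are hypothetical helpers, not part of jsoup.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class ClasspathCheck {

	// Returns true if the named class can be found on the current classpath.
	static boolean isOnClasspath(String className) {
		try {
			// initialize=false: only locate the class, do not run its static initializers
			Class.forName(className, false, ClasspathCheck.class.getClassLoader());
			return true;
		} catch (ClassNotFoundException e) {
			return false;
		}
	}

	public static void main(String[] args) {
		if (isOnClasspath("org.jsoup.Jsoup")) {
			System.out.println("jsoup found on the classpath");
		} else {
			System.out.println("jsoup is missing: add the jsoup jar with -cp or as a project library");
		}
	}
}
```

If the check reports that jsoup is missing, pass the downloaded jar to both javac and java with the -cp option, or add it as a library in your IDE project.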

  • Aadil Musavir

    Richard Dickinson, this is because your classpath is not correct. I followed the same steps and got this error. I was running the project with this command: java -cp target/htmlLParser-1.0-SNAPSHOT.jar com.fatBas.com.Main. I was getting the error because the classpath (-cp) was not defined correctly. Then I ran the class directly by right-clicking on main.java, and it worked. Hope this helps.

  • Clara

    ClassNotFoundException: org.jsoup.Jsoup … Easy solution: download jsoup (search Google) and add it as a library in your project.

  • Richard Dickinson

    I’ve probably made an error compiling, but when I try this I get these errors:

    java Main
    Exception in thread "main" java.lang.NoClassDefFoundError: org/jsoup/Jsoup
    at Main.main(Main.java:34)
    Caused by: java.lang.ClassNotFoundException: org.jsoup.Jsoup
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more

    Any ideas?