Category: HTML
Ruby Html Parser
October 23rd, 2010
Java Html Parser
October 23rd, 2010
In the Java world, HTML parsers turn the dirty, untidy HTML pages found in the real world into well-formatted HTML files.
Pure-Java Solution
Here is a summary page: Open Source HTML Parsers in Java
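To give a rough feel for what a lenient parser does with untidy markup, here is a minimal sketch. It is not one of the libraries from the summary page: it assumes only the tolerant HTML parser that ships with the JDK (javax.swing.text.html.parser), and the class name LenientParseSketch and the method extractText are made up for this example.

```java
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named only for this illustration.
public class LenientParseSketch {

    /**
     * Feeds possibly malformed HTML to the JDK's built-in parser and
     * returns the text chunks it reports, in document order.
     */
    public static List<String> extractText(String html) throws IOException {
        final List<String> chunks = new ArrayList<>();

        // The parser reports what it finds through callbacks; only text is collected here.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                chunks.add(new String(data));
            }
        };

        // The third argument (ignoreCharSet = true) tells the parser not to stop
        // on a charset declaration, since we already pass decoded characters.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return chunks;
    }
}
```

Calling LenientParseSketch.extractText("<p>Hello <b>world</p>") should still return the text even though the b element is never closed. The dedicated libraries go further and build a cleaned-up document tree, which this sketch does not attempt.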
Using Firefox-DLL
Besides these, there is another approach: using the Firefox DLL to parse HTML. This method gives the best results, but it is very hard to deploy and breaks easily.
Benchmark
Here is a benchmark result, which shows that CyberNeko is the winner.
JavaScript and CSS2 support
However, if you need JavaScript and CSS2 support, Cobra could be a good answer. Please see the following blog post: Using the Lobo Cobra Toolkit to Retrieve the HTML of Rendered Pages