Web Dehydrator: Any web-page to JSON!

Web Dehydrator is a tool that transforms web-pages into JSON. It uses Zend Framework 2, Symfony DomCrawler and PhantomJS.

Posted on July 07, 2013

WARNING: Please note that this article was published a long time ago. The information contained might be outdated.

My new project is online. It's called Web Dehydrator, and it can be described as a tool that transforms any web-page into JSON. Web Dehydrator is built with a mix of Zend Framework 2, Symfony DomCrawler and PhantomJS. This is what each component does:

  • PhantomJS retrieves the content of a web-page.
  • A plugin manager runs a set of plugins, each built to extract data from that content via Symfony DomCrawler.
  • The extracted data is used to build the JSON result.
  • Zend Framework 2 ties everything together through its service manager, event manager, caching and MVC layer.
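
The steps above can be sketched in a few lines. This is only an illustration of the fetch, extract, merge flow and not the actual Web Dehydrator code, which is PHP and unpublished. It assumes Python's standard library in place of PhantomJS and DomCrawler, and the names `title_plugin`, `images_plugin`, `dehydrate` and `SAMPLE_HTML` are all hypothetical:

```python
import json
from html.parser import HTMLParser


class _Collector(HTMLParser):
    """Collects the <title> text and every <img src> from an HTML string."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def _parse(html):
    collector = _Collector()
    collector.feed(html)
    collector.close()
    return collector


# Hypothetical plugins: each takes the page HTML and returns a dict fragment.
def title_plugin(html):
    return {"title": _parse(html).title.strip()}


def images_plugin(html):
    return {"images": _parse(html).images}


def dehydrate(html, plugins):
    """Run every plugin over the page content and merge the results into JSON."""
    result = {}
    for plugin in plugins:
        result.update(plugin(html))
    return json.dumps(result)


# In the real service PhantomJS would supply this string; here it is hard-coded.
SAMPLE_HTML = "<html><head><title>Demo</title></head><body><img src='a.png'></body></html>"
```

Calling `dehydrate(SAMPLE_HTML, [title_plugin, images_plugin])` returns a JSON string with one key per plugin. Keeping each extractor as a separate function mirrors the plugin-manager idea: adding a new kind of data means adding one more plugin to the list.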

I haven't published the code behind the Web Dehydrator service, but I could share it if someone is interested in helping.

The following is a sample of the JSON output produced from the data extracted from the http://www.dilbert.com/ website: