A scalable web crawler framework for Java.
Open Source Web Crawler for Java
WebCollector is an open source web crawler framework based on Java. It provides simple interfaces for crawling the Web, so you can set up a multi-threaded web crawler in less than 5 minutes.
Open-source Enterprise-Grade Search Engine Software
A scalable, mature and versatile web crawler based on Apache Storm
SitemapGen4j is a library to generate XML sitemaps in Java.
The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)