A scalable web crawler framework for Java.
WebCollector is an open-source web crawler framework based on Java. It provides simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.
Open Source Web Crawler for Java
Open-source Enterprise-Grade Search Engine Software
A scalable, mature and versatile web crawler based on Apache Storm
SitemapGen4j is a library to generate XML sitemaps in Java.
The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)