A scalable web crawler framework for Java.
WebCollector is an open-source web crawler framework for Java. It provides simple interfaces for crawling the Web, so you can set up a multi-threaded web crawler in less than 5 minutes.
Open Source Web Crawler for Java
Open-source enterprise-grade search engine software
SitemapGen4j is a library to generate XML sitemaps in Java.
We introduce TACIT: An Open-Source Text Analysis, Crawling and Interpretation Tool. TACIT's plugin architecture has three main components: (1) crawling plugins, (2) corpus management, and (3) analysis plugins. TACIT's open-source plugin platform allows the architecture to adapt easily to rapid developments in text analysis.
REST and Streaming API crawlers for Twitter (Java)