Browse free open source Java Web Scrapers and projects below. Use the toggles on the left to filter open source Java Web Scrapers by OS, license, language, programming language, and project status.

  • Run applications fast and securely in a fully managed environment Icon
    Run applications fast and securely in a fully managed environment

    Cloud Run is a fully-managed compute platform that lets you run your code in a container directly on top of scalable infrastructure.

    Run frontend and backend services, batch jobs, deploy websites and applications, and queue processing workloads without the need to manage infrastructure.
    Try for free
  • The Complete PDF SDK Icon
    The Complete PDF SDK

    Bring PDF viewing, annotating, editing, creation, and generation to any web, mobile, desktop or server framework or application.

    Bring accurate PDF viewing, annotating, editing, creation, and generation to any web, mobile, desktop or server framework or application. Own the full document and data lifecycle by deploying on your own infrastructure without worrying about third-party server dependencies.
    Learn More
  • 1
    A Java implementation of a flexible and extensible web spider engine. Optional modules allow functionality to be added (searching dead links, testing the performance and scalability of a site, creating a sitemap, etc ..
    Downloads: 5 This Week
    Last Update:
    See Project
  • 2
    Robust featureful multi-threaded CLI web spider using apache commons httpclient v3.0 written in java. ASpider downloads any files matching your given mime-types from a website. Tries to reg.exp. match emails by default, logging all results using log4j.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    GitGet

    GitGet

    Ever wanted to download only a part of a Git repository.

    Ever wanted to download only a part of a Git repository. Just paste the URL of the repo you want to download and sit back and enjoy. This simple java application makes use of Web Scraping and downloads only those files you need, thus helping you save your precious bandwidth and space.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    HtmlClient provides an SGML/HTML/XHTML parser and connection client making web-spidering as easy for developers as actually surfing the web with a premade browser. Based on Apache's HttpClient.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Call center software with almost endless possibilities Icon
    Call center software with almost endless possibilities

    Squaretalk is an integration-friendly cloud communications platform for sales and support teams.

    Powerful and versatile communications platform with automations and integrations made simple. So simple, you’ll have time for the other parts of your business.
    Learn More
  • 5
    Other spiders has a limited link depth, follows links not randomized or are combined with heavy indexing machines. This spider will has not link depth limits, randomize next url, that will be checked for new urls.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6

    WebCollector

    WebCollector is an open source web crawler framework based on Java.

    WebCollector is an open source web crawler framework based on Java.It provides some simple interfaces for crawling the Web,you can setup a multi-threaded web crawler in less than 5 minutes. Github: https://github.com/CrawlScript/WebCollector Demo: https://github.com/CrawlScript/WebCollector/blob/master/YahooCrawler.java
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    WebNews Crawler is a specific web crawler (spider, fetcher) designed to acquire and clean news articles from RSS and HTML pages. It can do a site specific extraction to extract the actual news content only, filtering out the advertising and other cruft.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next