smhussain5/BB100-Scraper

Billboard Hot 100 Web Scraper

Web scraper that collects data from Billboard's Hot 100 Chart.

Problem 🤔

Access to current data from other websites allows clients to extract valuable insights for academic research, business trends, content aggregation, etc. Ideally, this is accomplished via an API, because APIs are reliable and structured; however, not every website offers an API for clients to consume.

Solution 💡

Web scraping is the process of extracting data from a page by reading the text content of elements matched by specific CSS selectors. Although web scraping is a viable alternative to an API, the scraped text must be formatted and structured manually, and scraping can violate a website's terms of service (TOS).
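To illustrate the manual formatting step mentioned above, here is a minimal sketch of how raw scraped strings might be turned into structured records. The field names (`rank`, `title`, `artist`) and the `normalizeEntry` helper are assumptions made for this example; the project's actual output shape may differ.

```javascript
// Hypothetical helper: turns one row of raw scraped text into a clean record.
// The field names (rank, title, artist) are assumptions for illustration.
function normalizeEntry(raw) {
  // Collapse runs of whitespace (newlines, tabs) and trim both ends.
  const clean = (text) => String(text ?? '').replace(/\s+/g, ' ').trim();

  // Scraped values are always strings, so the rank is parsed explicitly.
  const rank = Number.parseInt(clean(raw.rank), 10);

  return {
    rank: Number.isNaN(rank) ? null : rank, // null when no number is present
    title: clean(raw.title),
    artist: clean(raw.artist),
  };
}

// Made-up sample input that mimics untidy text nodes from a page.
const sample = [
  { rank: ' 1 ', title: '\n  Example Song \t', artist: ' Example   Artist ' },
  { rank: '-', title: 'Another Song', artist: 'Another Artist' },
];

const entries = sample.map(normalizeEntry);
console.log(JSON.stringify(entries, null, 2));
```

Keeping this step in a pure function, separate from the browser automation, makes it possible to check the formatting logic without loading the live page.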

This project extracts data from the Billboard Hot 100 chart using Node.js and Puppeteer. Specifically, it launches a web browser, captures a full-length screenshot, and returns the extracted data as a JSON object.
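The three steps above (launch a browser, take a full-length screenshot, return the data) could be sketched roughly as follows. This is not the project's actual code: the chart URL, the CSS selectors, the output fields, and the `scrapeChart` function name are all assumptions for illustration, and Billboard's real markup will need its own selectors.

```javascript
// All constants below are illustrative assumptions, not values from the project.
const CHART_URL = 'https://www.billboard.com/charts/hot-100/'; // assumed URL
const ROW_SELECTOR = '.chart-row';            // hypothetical selector
const TITLE_SELECTOR = '.chart-row__title';   // hypothetical selector
const ARTIST_SELECTOR = '.chart-row__artist'; // hypothetical selector

async function scrapeChart({ screenshotPath = 'hot-100.png' } = {}) {
  // Required lazily so this file can be loaded without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();

  try {
    const page = await browser.newPage();
    await page.goto(CHART_URL, { waitUntil: 'domcontentloaded' });

    // fullPage: true captures the whole scrollable page, not just the viewport.
    await page.screenshot({ path: screenshotPath, fullPage: true });

    // $$eval runs the callback inside the page with every matching element.
    const entries = await page.$$eval(
      ROW_SELECTOR,
      (rows, titleSel, artistSel) =>
        rows.map((row, index) => ({
          rank: index + 1, // assumes rows appear in chart order
          title: row.querySelector(titleSel)?.textContent.trim() ?? '',
          artist: row.querySelector(artistSel)?.textContent.trim() ?? '',
        })),
      TITLE_SELECTOR,
      ARTIST_SELECTOR
    );

    return entries;
  } finally {
    // Close the browser even if navigation or extraction throws.
    await browser.close();
  }
}
```

With Puppeteer installed, calling `scrapeChart().then((rows) => console.log(JSON.stringify(rows, null, 2)))` would print the extracted rows as JSON. If the selectors match nothing, the result is an empty array, which is a useful signal while searching for the right selectors.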

Technologies Used ⚙

  • JavaScript
  • Node.js
  • Puppeteer
  • Visual Studio Code

Insights 💭

Through this project, I was able to appreciate the power of Puppeteer and its functionality (automated browser launching, crawling, etc.). Finding the specific CSS selectors that extract the correct data was challenging, and was mostly achieved through trial and error. Future improvements may include extracting data from Billboard's other charts, displaying the data in a UI, and/or automating the process to run at a specified interval.
