Web scraper that collects data from Billboard's Hot 100 Chart.
Access to current data from other websites allows clients to extract valuable insights for academic research, business trend analysis, content aggregation, and more. Ideally, this is accomplished via an API, because APIs are reliable and return structured data; however, not every website offers an API for clients to consume.
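Web scraping is the process of extracting data from a web page's HTML, typically by "scraping" the text content of elements matched by specific CSS selectors. Although web scraping is a viable alternative to an API, the extracted text must be formatted and structured manually, selectors can break whenever the page markup changes, and scraping can violate a website's terms of service (TOS).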
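As a rough illustration of the manual formatting/structuring step, the sketch below cleans up raw scraped strings and pairs them into ranked entries. The helper name `structureEntries` and the sample strings are hypothetical and are not taken from this project's code.

```javascript
// Hypothetical helper: turns raw scraped strings into structured chart entries.
// Assumes titles and artists were scraped in the same order as the chart.
function structureEntries(rawTitles, rawArtists) {
  // Collapse runs of whitespace/newlines left over from the page markup.
  const clean = (text) => text.replace(/\s+/g, " ").trim();

  return rawTitles.map((title, index) => ({
    rank: index + 1,
    title: clean(title),
    artist: clean(rawArtists[index] ?? ""),
  }));
}

// Made-up sample input that mimics untrimmed text content.
const entries = structureEntries(
  ["\n  Song   One \n", "Song Two"],
  ["Artist A", "\tArtist B "]
);

console.log(JSON.stringify(entries, null, 2));
```

An API would normally return data in this shape already; with scraping, this cleanup has to be written and maintained by hand.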
This project extracts data from Billboard's Hot 100 chart using Node.js and Puppeteer. Specifically, it launches a web browser, captures a full-length screenshot of the page, and returns the extracted data as a JSON object.
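The flow described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the chart URL, the CSS selectors, the output shape, and the names `scrapeChart`, `launcher`, and `screenshotPath` are all assumptions made for illustration, and the selectors in particular would need to be replaced with ones that match the live page.

```javascript
// Sketch only: the URL and selectors are assumptions, not taken from the project.
const CHART_URL = "https://www.billboard.com/charts/hot-100/";
const SELECTORS = {
  title: ".chart-row .title", // hypothetical selector
  artist: ".chart-row .artist", // hypothetical selector
};

async function scrapeChart({ launcher, screenshotPath = "hot-100.png" } = {}) {
  // Load Puppeteer lazily so a stand-in object can be injected for testing.
  const pptr = launcher ?? require("puppeteer");
  const browser = await pptr.launch();

  try {
    const page = await browser.newPage();
    await page.goto(CHART_URL, { waitUntil: "networkidle2" });

    // fullPage: true captures the whole scrollable page, not just the viewport.
    await page.screenshot({ path: screenshotPath, fullPage: true });

    // This callback runs inside the browser page, so it must be self-contained.
    const textOf = (elements) => elements.map((el) => el.textContent.trim());
    const titles = await page.$$eval(SELECTORS.title, textOf);
    const artists = await page.$$eval(SELECTORS.artist, textOf);

    // Pair titles with artists by position and add a 1-based rank.
    return titles.map((title, index) => ({
      rank: index + 1,
      title,
      artist: artists[index] ?? null,
    }));
  } finally {
    // Always close the browser, even if navigation or extraction fails.
    await browser.close();
  }
}
```

Calling `scrapeChart().then((data) => console.log(JSON.stringify(data, null, 2)))` would print the result as JSON, provided Puppeteer is installed and the selectors match the page. Accepting the launcher as a parameter is a design choice made here so the function can be exercised without opening a real browser.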
- JavaScript
- Node.js
- Puppeteer
- Visual Studio Code
Through this project, I came to appreciate the power of Puppeteer and its features (automated browser launching, crawling, etc.). Finding the specific CSS selectors that would extract the correct data was challenging, and I mostly got there through trial and error. Future improvements may include extracting data from Billboard's other charts, displaying the data in a UI, and/or automating the scraper to run at a specified interval.