Open source tool for open source researchers: How to use TG Collector to scrape Telegram channels?
Although journalists and researchers who work with open sources are mostly tech-savvy, not everyone is familiar with command-line interface (CLI) tools or has the time to learn them. For a researcher who works on hundreds of different cases and monitors the online space to document wrongdoing, war crimes, hate speech or disinformation, a graphical user interface (GUI) tool is preferable, first and foremost because it saves time.
There are already a lot of awesome open-source tools and scripts for researchers that focus on Telegram, and I want to add a new one to the list. So let me introduce a new open-source tool, TG Collector (TGC), and show you how to use it.
TGC is a browser-based application for scraping (collecting) messages from Telegram channels. Its purpose is to lighten the workload of researchers who work with Telegram channels. Because it is a tool, not a service, your personal data is not collected (except for anonymous usage statistics, described on the website). All the data you collect is stored on your computer, specifically in your browser. Let’s go through the process step by step.
First step — get your API keys
After opening the tool, you will see the collection section, where you can list your channels and start collecting messages from them. You can create and name a collection folder, but to start collecting you first need your Telegram API keys.
A login popup will appear and direct you to the MyTelegram page, where you can get your API keys.
Second step — add your channels
After logging in, you can start adding the channels you want to collect messages from. First create a collection, then insert the channel handles (not the channel names). Collections help you organize channels into folders, so you can keep them separated by topic or interest.
After inserting the channels, you will see general information about each one, such as its creation date, subscriber count, description, name and handle.
If you have dozens or hundreds of channels, you can insert them all at once, separated by commas.
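If your channel list lives in a spreadsheet or text file as a mix of links and @handles, a few lines of Python can turn it into a comma-separated string for pasting. This is a minimal sketch and not part of TGC: the function name `to_handle_list` is hypothetical, and it assumes the tool accepts bare handles without the `@` or the `t.me/` prefix, which you should check against the tool itself.

```python
def to_handle_list(raw_entries):
    """Normalize links and @handles into one comma-separated string."""
    handles = []
    for entry in raw_entries:
        handle = entry.strip()
        # Drop common link prefixes such as https://t.me/ or t.me/
        for prefix in ("https://t.me/", "http://t.me/", "t.me/"):
            if handle.lower().startswith(prefix):
                handle = handle[len(prefix):]
                break
        handle = handle.lstrip("@").rstrip("/")
        # Skip blanks and case-insensitive duplicates, keep first-seen order
        if handle and handle.lower() not in (h.lower() for h in handles):
            handles.append(handle)
    return ",".join(handles)


print(to_handle_list([
    "https://t.me/example_channel",
    "@another_channel",
    "example_channel",
]))
```

The channel names here are placeholders. The duplicate check keeps the pasted list short when the same channel appears under several spellings.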
Third step — collect
Once you have one or more collections, you can start collecting messages. Select the channels you want to get data from and give your project a name. Then choose which data fields you want. For example, if you only need forwards, you can select only “fwdrom”, which gives you information such as the URL of the post, where it was forwarded from and to, and when.
You can also select all data fields for a comprehensive overview.
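Conceptually, choosing fields is a projection: each collected message keeps only the keys you selected. The sketch below illustrates the idea on a made-up record. Apart from “fwdrom”, which is the name used above, the field names (`url`, `date`, `text`) and the helper `select_fields` are hypothetical and not taken from TGC.

```python
def select_fields(message, fields):
    """Return a copy of `message` containing only the requested fields."""
    return {key: message[key] for key in fields if key in message}


# Made-up record for illustration; real TGC records may look different.
message = {
    "url": "https://t.me/example_channel/1",
    "date": "2023-01-01T00:00:00",
    "text": "hello",
    "fwdrom": None,
}

print(select_fields(message, ["url", "fwdrom"]))
```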
Fourth step — download data
In the respective collection, a second subsection shows “collected messages”. There you will find the scraping date, the status, and the number of channels and messages collected. You can download the data in two formats, JSON or CSV, depending on your needs.
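Once you have the JSON download, a short script is often enough for a first look at the data. The sketch below counts messages per channel. It assumes the export is a JSON array of message objects and that each object carries a `channel` key; both are assumptions about the file layout, not documented TGC behavior, and `messages_per_channel` is a hypothetical helper.

```python
import json
from collections import Counter


def messages_per_channel(messages, channel_key="channel"):
    """Count how many messages each channel contributed."""
    return Counter(m.get(channel_key, "unknown") for m in messages)


# Stand-in for the contents of a downloaded JSON file (assumed shape).
sample_export = """
[
  {"channel": "example_channel", "text": "first"},
  {"channel": "example_channel", "text": "second"},
  {"channel": "another_channel", "text": "third"}
]
"""

messages = json.loads(sample_export)
for channel, count in messages_per_channel(messages).most_common():
    print(channel, count)
```

With a real download you would read the file with `json.load` instead of parsing the sample string, and set `channel_key` to whatever the export actually calls that field.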
Feedback and contribution!
As the tool is open-source, you’re free to contribute or take it from here to improve!