From the course: Machine Learning for Red Team Hackers by Infosec

Preprocessing the dataset

(upbeat music) - [Instructor] We've now gathered all of our CAPTCHA images, and we're ready to build a classifier on them. I'm going to start by listing them. Here's the name of the folder, captcha_images, and we just enumerate its contents. (keys thumping) You can see we have a bunch of CAPTCHAs, great. Next, we want to extract each one's label, our target for prediction, so we iterate through every CAPTCHA and get its label. We need to define the function that gets the label. (keys thumping) What it does is take the base name, which is the last part of the path, forgetting the initial directories, then split it on the dot to keep the part before it, dropping only the png part, and there you go. Quick test, (keys thumping) fix that real quick, okay, (keys thumping) and this is the last one, so that's good. So now we have labels for everything, and you might think, "Great, now we just feed it into our neural network and we are good to go. It has the labels…
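The listing and label-extraction steps described so far might look roughly like the sketch below. It is not the instructor's actual code: the helper names `get_label` and `list_captchas` and the sample file name `2A5Z.png` are hypothetical, and it assumes every file is named after its CAPTCHA text followed by `.png`, as the transcript suggests.

```python
import os

CAPTCHA_DIR = "captcha_images"  # folder name mentioned in the transcript


def get_label(path):
    """Return the CAPTCHA text encoded in a file name (hypothetical helper)."""
    filename = os.path.basename(path)  # keep only the last path component
    return filename.split(".")[0]      # keep the part before the first dot


def list_captchas(folder=CAPTCHA_DIR):
    """Return sorted paths of the files in `folder`, or [] if it is missing."""
    if not os.path.isdir(folder):
        return []
    return sorted(os.path.join(folder, name) for name in os.listdir(folder))


captcha_paths = list_captchas()
labels = [get_label(path) for path in captcha_paths]

# Quick check on a made-up file name (assumed naming scheme: <label>.png)
print(get_label(os.path.join(CAPTCHA_DIR, "2A5Z.png")))
```

Splitting on the first dot is enough when labels contain no dots; `os.path.splitext` would be the safer choice if they could.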
