Recognizing Alphanumeric CAPTCHAs with Python

Originally published 2024-10-21 12:26:51


License: CC 4.0 BY-SA

Tags: #python #programming-language

1. Environment Setup
Before starting, make sure the following libraries are installed:

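```bash
pip install pytesseract pillow requests
```

Also make sure the Tesseract OCR engine itself is installed and that its directory is on the system PATH: pytesseract is only a wrapper that calls the `tesseract` executable and cannot recognize anything without it.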
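A quick way to confirm the engine is reachable before running the rest of the script is to look it up on PATH and ask it for its version. The sketch below uses only the standard library; `find_executable` and `tesseract_version` are hypothetical helper names, not part of pytesseract.

```python
import shutil
import subprocess


def find_executable(name="tesseract"):
    """Return the absolute path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


def tesseract_version(cmd):
    """Run `<cmd> --version` and return the first line of its output."""
    result = subprocess.run(
        [cmd, "--version"], capture_output=True, text=True, check=True
    )
    # Depending on the build, the version banner may go to stdout or stderr
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("tesseract was not found on PATH; install it or fix PATH first")
    else:
        print(f"Found {path}: {tesseract_version(path)}")
```

If the binary is installed outside PATH, pytesseract can also be pointed at it directly by assigning its full path to `pytesseract.pytesseract.tesseract_cmd`.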

2. Downloading the CAPTCHA Image
First, we download the CAPTCHA image and save it locally. The following function does this with the requests library:

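```python
import requests

def download_captcha(url, save_path):
    """Download the CAPTCHA image at url and save it to save_path."""
    # A timeout keeps the script from hanging if the server never responds
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        # Write the raw bytes in binary mode; an image is not text
        with open(save_path, 'wb') as f:
            f.write(response.content)
        print(f'CAPTCHA image saved to {save_path}')
    else:
        print(f'Download failed: HTTP {response.status_code}')
```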
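A server that is erroring or rate-limiting may still answer with HTTP 200 but send an HTML or JSON body instead of an image; saving that as `captcha.png` only fails later, inside OCR. A minimal guard, assuming the CAPTCHA is served as PNG, JPEG, or GIF, is to check the file's leading "magic" bytes before writing it. `sniff_image_type` and `save_if_image` are hypothetical helpers, not part of requests.

```python
# Leading byte signatures of common image formats
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(data):
    """Return 'png', 'jpeg', or 'gif' based on the leading bytes, else None."""
    for signature, kind in _SIGNATURES.items():
        if data.startswith(signature):
            return kind
    return None


def save_if_image(data, save_path):
    """Write data to save_path only if it looks like an image; return its type."""
    kind = sniff_image_type(data)
    if kind is None:
        # Probably an error page or JSON rather than the CAPTCHA image
        print(f"Response is not a recognized image: {data[:40]!r}")
        return None
    with open(save_path, "wb") as f:
        f.write(data)
    return kind
```

Inside `download_captcha`, calling `save_if_image(response.content, save_path)` would take the place of the unconditional write.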
3. Image Processing and OCR
Next, we open the CAPTCHA image with Pillow and recognize its text with pytesseract:

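```python
from PIL import Image
import pytesseract

def recognize_captcha(image_path):
    """Run Tesseract OCR on the CAPTCHA image and return the recognized text."""
    # Open the image file with Pillow
    image = Image.open(image_path)
    # --psm 8 tells Tesseract to treat the image as a single word,
    # which suits short CAPTCHA strings
    captcha_text = pytesseract.image_to_string(image, config='--psm 8').strip()
    print(f'Recognition result: {captcha_text}')
    return captcha_text
```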
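The function above hands the raw image straight to Tesseract, and noisy CAPTCHAs usually recognize better after some cleanup. A common approach is to convert to grayscale, remove speckle noise with a median filter, and binarize with a fixed threshold. The sketch below does this with Pillow; the threshold of 140, the 3x3 filter, and the helper names are assumptions to tune per site. Restricting output with Tesseract's `tessedit_char_whitelist` variable is optional, and whether it is honored depends on the Tesseract version and engine mode.

```python
import string

from PIL import Image, ImageFilter

# Characters expected in an alphanumeric CAPTCHA (an assumption; adjust per site)
WHITELIST = string.ascii_letters + string.digits


def preprocess(image, threshold=140, denoise=True):
    """Grayscale, optionally median-filter, then binarize a CAPTCHA image."""
    gray = image.convert("L")
    if denoise:
        # A 3x3 median filter removes isolated noise pixels
        gray = gray.filter(ImageFilter.MedianFilter(3))
    # Pixels brighter than the threshold become white, the rest black
    return gray.point(lambda p: 255 if p > threshold else 0)


def tesseract_config(psm=8, whitelist=WHITELIST):
    """Build a pytesseract config string: page segmentation mode plus whitelist."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"
```

In `recognize_captcha`, the OCR call would then become `pytesseract.image_to_string(preprocess(image), config=tesseract_config())`.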
4. Automated Login
Finally, the recognized CAPTCHA can be used to simulate a login. The code below submits the login form as a POST request with requests:

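```python
import requests

def login(username, password, captcha):
    url = 'https://captcha7.scrape.center/login'
    data = {
        'username': username,
        'password': password,
        'captcha': captcha
    }

    # Note: if the site binds the CAPTCHA to a session cookie, this request
    # must reuse the requests.Session that downloaded the image
    response = requests.post(url, data=data, timeout=10)
    # HTTP 200 only means the server answered; inspect the response body
    # to confirm whether the login itself succeeded
    if response.status_code == 200:
        print('Login request accepted (HTTP 200)')
    else:
        print(f'Login failed: HTTP {response.status_code}')
```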
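Raw OCR output often carries stray spaces, punctuation, or misread symbols, and submitting an obviously malformed answer wastes a login attempt. A small standard-library sketch that keeps only ASCII letters and digits and checks the length before calling `login`; the default length of 4 and the helper names are assumptions, so use whatever the target site actually issues.

```python
import re

# Anything that is not an ASCII letter or digit
_NON_ALNUM = re.compile(r"[^A-Za-z0-9]")


def clean_captcha(raw):
    """Strip everything except ASCII letters and digits from OCR output."""
    return _NON_ALNUM.sub("", raw)


def is_plausible(text, length=4):
    """Return True if text has the expected length and only ASCII letters/digits."""
    return len(text) == length and text.isascii() and text.isalnum()
```

If `is_plausible(clean_captcha(captcha_text))` is false, fetching a fresh CAPTCHA is usually cheaper than sending a login request that is almost certain to fail.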
5. Main Program
Combining the functions above gives the main program:

```python
def main():
    captcha_url = 'https://captcha7.scrape.center/captcha.png'
    captcha_path = 'captcha.png'

    # Download the CAPTCHA image
    download_captcha(captcha_url, captcha_path)

    # Recognize the CAPTCHA text
    captcha_text = recognize_captcha(captcha_path)

    # Simulate the login
    login('admin', 'admin', captcha_text)


if __name__ == '__main__':
    main()
```
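OCR on CAPTCHAs fails fairly often, so a practical main program usually fetches a fresh CAPTCHA and tries again instead of stopping after one attempt. The sketch below shows such a retry loop with the three steps passed in as functions, which keeps it testable without network access; `solve_with_retries`, `fetch`, `recognize`, and `submit` are hypothetical names. With real HTTP calls, `fetch` and `submit` should share one `requests.Session`, since a CAPTCHA is commonly tied to the session cookie issued along with it.

```python
def solve_with_retries(fetch, recognize, submit, max_attempts=3):
    """Fetch, recognize, and submit CAPTCHAs until one is accepted.

    fetch() -> bytes         downloads a fresh CAPTCHA image
    recognize(bytes) -> str  returns the OCR text, or "" if unusable
    submit(str) -> bool      posts the login form and reports success
    Returns the 1-based attempt number that succeeded, or None.
    """
    for attempt in range(1, max_attempts + 1):
        image_bytes = fetch()
        text = recognize(image_bytes)
        if not text:
            # Unreadable CAPTCHA: skip the login request and fetch a new one
            print(f"Attempt {attempt}: OCR produced nothing usable")
            continue
        if submit(text):
            print(f"Attempt {attempt}: login succeeded with {text!r}")
            return attempt
        print(f"Attempt {attempt}: {text!r} was rejected")
    return None
```

Note that `submit` has to return a real success signal; as written, `login` only prints a message, and how to detect a successful login depends on what the target site returns.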