Peripheral Data to the Ascend 310 Inference Card, Part 3: pgprot_dmacoherent

Original article. First published 2025-07-19 15:56:07, last modified 2025-07-19 16:15:46.

License: CC 4.0 BY-SA

Tags: #linux #MMAP


Initializing the attrs attributes

Kernel source files and what they do

File	Function / description
kernel\drivers\media\platform\sunxi\sun4i-csi\sun4i_dma.c	sun4i_csi_dma_register: setup code of the specific device driver (Allwinner sun4i CSI)
kernel\drivers\media\platform\rockchip\isp1\capture.c	rkisp_init_vb2_queue: setup code of the specific device driver (Rockchip ISP)
kernel\drivers\media\common\videobuf2\videobuf2-v4l2.c	Generic ioctl entry: vb2_queue_init
kernel\drivers\media\common\videobuf2\videobuf2-core.c	Core implementation behind the ioctls: vb2_core_queue_init
kernel\drivers\media\common\videobuf2\videobuf2-dma-contig.c

vb2_queue

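vb2_queue manages the buffers used for capture and output. This covers the allocation interface, the queues of allocated buffers, and so on. Its position within V4L2 as a whole is shown in the figure above.

One point is worth stressing: this structure is initialized by the specific device driver during device initialization. Only the device knows which I/O modes it supports (MMAP, DMABUF, ...), which memory allocator it uses, and which DMA attributes it needs.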

vb2_queue example: sun4i

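int sun4i_csi_dma_register(struct sun4i_csi *csi, int irq)
{
	struct vb2_queue *q = &csi->queue;
	int ret;

	q->min_buffers_needed = 3;
	q->type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
	q->io_modes = VB2_MMAP | VB2_DMABUF;	/* no VB2_USERPTR */
	q->lock = &csi->lock;
	q->drv_priv = csi;
	q->buf_struct_size = sizeof(struct sun4i_csi_buffer);
	q->ops = &sun4i_csi_qops;
	q->mem_ops = &vb2_dma_contig_memops;	/* dma-contig allocator */
	q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
	q->dev = csi->dev;
	/* q->dma_attrs is never assigned, so it keeps its zero-initialized value */

	ret = vb2_queue_init(q);
	if (ret < 0)
		return ret;	/* error handling simplified */

	/* ... rest of the function elided ... */
	return 0;
}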

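From the code above we can see:

1) This controller does not support the USERPTR interface, so user-space programs cannot use that interface.

2) This controller uses vb2_dma_contig_memops for memory management.

3) This controller does not set dma_attrs. The structure allocated at device initialization is zero-initialized, so dma_attrs is 0.

4) Finally, it calls the common interface vb2_queue_init().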
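Point 1) can be illustrated with a small user-space model. This is a simplified sketch, not the kernel code. It copies the enum vb2_io_modes values from include/media/videobuf2-core.h and the enum v4l2_memory values from include/uapi/linux/videodev2.h. The function vb2_memory_allowed() is a hypothetical stand-in for the check vb2 performs on VIDIOC_REQBUFS / VIDIOC_CREATE_BUFS. The real check also inspects mem_ops and the file-I/O state.

```c
#include <stdbool.h>

/* Bit values as in enum vb2_io_modes (include/media/videobuf2-core.h). */
enum vb2_io_modes {
	VB2_MMAP    = 1u << 0,
	VB2_USERPTR = 1u << 1,
	VB2_READ    = 1u << 2,
	VB2_WRITE   = 1u << 3,
	VB2_DMABUF  = 1u << 4,
};

/* Values as in enum v4l2_memory (include/uapi/linux/videodev2.h). */
enum v4l2_memory {
	V4L2_MEMORY_MMAP    = 1,
	V4L2_MEMORY_USERPTR = 2,
	V4L2_MEMORY_OVERLAY = 3,
	V4L2_MEMORY_DMABUF  = 4,
};

/*
 * Hypothetical, simplified model: a requested memory type is accepted
 * only if the driver enabled the matching bit in q->io_modes.
 */
bool vb2_memory_allowed(unsigned int io_modes, enum v4l2_memory memory)
{
	switch (memory) {
	case V4L2_MEMORY_MMAP:
		return io_modes & VB2_MMAP;
	case V4L2_MEMORY_USERPTR:
		return io_modes & VB2_USERPTR;
	case V4L2_MEMORY_DMABUF:
		return io_modes & VB2_DMABUF;
	default:
		return false; /* e.g. OVERLAY is never handled by vb2 */
	}
}
```

Consider sun4i's io_modes (VB2_MMAP | VB2_DMABUF). A USERPTR request is rejected. In a real program this shows up as VIDIOC_REQBUFS failing with EINVAL.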

vb2_queue example: Rockchip capture

static int rkisp_init_vb2_queue(struct vb2_queue *q,
				struct rkisp_stream *stream,
				enum v4l2_buf_type buf_type)
{
	q->type = buf_type;
	q->io_modes = VB2_MMAP | VB2_USERPTR | VB2_DMABUF;
	q->drv_priv = stream;
	q->ops = &rkisp_vb2_ops;
	q->mem_ops = stream->ispdev->hw_dev->mem_ops;
	q->buf_struct_size = sizeof(struct rkisp_buffer);
	q->min_buffers_needed = CIF_ISP_REQ_BUFS_MIN;
	q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
	q->lock = &stream->apilock;
	q->dev = stream->ispdev->hw_dev->dev;
	q->allow_cache_hints = 1;
	q->bidirectional = 1;
	if (stream->ispdev->hw_dev->is_dma_contig)
		q->dma_attrs = DMA_ATTR_FORCE_CONTIGUOUS;
	q->gfp_flags = GFP_DMA32;
	return vb2_queue_init(q);
}

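The main differences here:

1) allow_cache_hints is set. With it set, user space can pass cache-management hints to skip the cache flush/invalidation in ->prepare() and/or ->finish().

2) dma_attrs is set, but only to request physically contiguous memory (DMA_ATTR_FORCE_CONTIGUOUS, and only when is_dma_contig is true). No cache-related attribute is set.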
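The effect of allow_cache_hints from point 1) can be modelled roughly as follows. This is a simplified sketch under stated assumptions:

- The two flag values are those of V4L2_BUF_FLAG_NO_CACHE_INVALIDATE and V4L2_BUF_FLAG_NO_CACHE_CLEAN in include/uapi/linux/videodev2.h.
- struct cache_sync_plan and plan_cache_sync() are hypothetical.
- The real kernel logic (set_buffer_cache_hints() in videobuf2-v4l2.c in mainline) also depends on the memory type, the queue direction and, in newer kernels, on whether the buffers were allocated as non-coherent.

Treat the sketch only as an illustration of which sync each hint skips.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as in include/uapi/linux/videodev2.h. */
#define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE 0x00000800u
#define V4L2_BUF_FLAG_NO_CACHE_CLEAN      0x00001000u

/* Hypothetical summary of which cache maintenance vb2 would perform. */
struct cache_sync_plan {
	bool sync_on_prepare; /* clean (write back) CPU caches before DMA */
	bool sync_on_finish;  /* invalidate CPU caches after DMA */
};

/*
 * Simplified model: hints are honoured only when the driver set
 * q->allow_cache_hints; otherwise both syncs always happen.
 */
struct cache_sync_plan plan_cache_sync(bool allow_cache_hints, uint32_t buf_flags)
{
	struct cache_sync_plan p = { .sync_on_prepare = true, .sync_on_finish = true };

	if (!allow_cache_hints)
		return p;
	if (buf_flags & V4L2_BUF_FLAG_NO_CACHE_CLEAN)
		p.sync_on_prepare = false;
	if (buf_flags & V4L2_BUF_FLAG_NO_CACHE_INVALIDATE)
		p.sync_on_finish = false;
	return p;
}
```

Without allow_cache_hints (as in sun4i), user-space flags change nothing in this model.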

Searching the kernel source, cache-related attributes appear only in the following place. Nothing in the video capture code configures them.

./drivers/rknpu/rknpu_gem.c:160:#ifdef DMA_ATTR_SYS_CACHE_ONLY

./drivers/rknpu/rknpu_gem.c:161: rknpu_obj->dma_attrs |= DMA_ATTR_SYS_CACHE_ONLY;

mmap

Kernel source files and what they do

File	Function / description
kernel\drivers\media\platform\sunxi\sun4i-csi\sun4i_v4l2.c	struct v4l2_file_operations sun4i_csi_fops
kernel\drivers\media\common\videobuf2\videobuf2-v4l2.c	Generic mmap file-operation entry: vb2_fop_mmap
kernel\drivers\media\common\videobuf2\videobuf2-core.c	Core implementation: vb2_mmap
kernel\drivers\media\common\videobuf2\videobuf2-dma-contig.c	vb2_dc_mmap: the actual mmap implementation for dma-contig buffers

Page attribute configuration for mmap

#ifdef CONFIG_MMU
/*
 * Return the page attributes used for mapping dma_alloc_* memory, either in
 * kernel space if remapping is needed, or to userspace through dma_mmap_*.
 */
pgprot_t dma_pgprot(struct device *dev, pgprot_t prot, unsigned long attrs)
{
	if (force_dma_unencrypted(dev))
		prot = pgprot_decrypted(prot);
	if (dev_is_dma_coherent(dev))
		return prot;
#ifdef CONFIG_ARCH_HAS_DMA_WRITE_COMBINE
	if (attrs & DMA_ATTR_WRITE_COMBINE)
		return pgprot_writecombine(prot);
#endif
	if (attrs & DMA_ATTR_SYS_CACHE_ONLY ||
	    attrs & DMA_ATTR_SYS_CACHE_ONLY_NWA)
		return pgprot_syscached(prot);
	return pgprot_dmacoherent(prot);
}
#endif /* CONFIG_MMU */

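The earlier analysis of how devices are distinguished applies here. The capture device is not DMA-coherent, and neither driver sets DMA_ATTR_WRITE_COMBINE or a SYS_CACHE_ONLY attribute. So the final mapping sets its page attributes through pgprot_dmacoherent.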
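The decision sequence in dma_pgprot() can be restated as a small pure function. That makes it easy to check which branch each capture driver lands in. This is a model, not kernel code:

- enum dma_prot_kind, classify_dma_prot() and the ATTR_* bits are hypothetical and do not use the real DMA_ATTR_* values.
- The two vendor SYS_CACHE_ONLY attributes are folded into one bit.
- The force_dma_unencrypted() step is left out because it does not affect cacheability.

```c
#include <stdbool.h>

/* Hypothetical names for the possible outcomes of dma_pgprot(). */
enum dma_prot_kind {
	PROT_UNCHANGED,    /* coherent device: prot returned as-is (cacheable) */
	PROT_WRITECOMBINE, /* pgprot_writecombine() */
	PROT_SYSCACHED,    /* pgprot_syscached(), vendor attribute */
	PROT_DMACOHERENT,  /* pgprot_dmacoherent() */
};

/* Hypothetical attribute bits for this model only. */
#define ATTR_WRITE_COMBINE  (1u << 0)
#define ATTR_SYS_CACHE_ONLY (1u << 1) /* stands for both SYS_CACHE_ONLY variants */
#define ATTR_FORCE_CONTIG   (1u << 2)

/* Same order of tests as dma_pgprot(); arch_has_wc models CONFIG_ARCH_HAS_DMA_WRITE_COMBINE. */
enum dma_prot_kind classify_dma_prot(bool dev_coherent, bool arch_has_wc, unsigned int attrs)
{
	if (dev_coherent)
		return PROT_UNCHANGED;
	if (arch_has_wc && (attrs & ATTR_WRITE_COMBINE))
		return PROT_WRITECOMBINE;
	if (attrs & ATTR_SYS_CACHE_ONLY)
		return PROT_SYSCACHED;
	return PROT_DMACOHERENT;
}
```

On a non-coherent device, both sun4i (attrs 0) and the Rockchip ISP (only FORCE_CONTIGUOUS) fall through to PROT_DMACOHERENT in this model.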

pgprot_dmacoherent

Searching the source as above shows that arm64 defines this macro itself:

/*
 * DMA allocations for non-coherent devices use what the Arm architecture calls
 * "Normal non-cacheable" memory, which permits speculation, unaligned accesses
 * and merging of writes.  This is different from "Device-nGnR[nE]" memory which
 * is intended for MMIO and thus forbids speculation, preserves access size,
 * requires strict alignment and can also force write responses to come from the
 * endpoint.
 */
#define pgprot_dmacoherent(prot) \
	__pgprot_modify(prot, PTE_ATTRINDX_MASK, \
			PTE_ATTRINDX(MT_NORMAL_NC) | PTE_PXN | PTE_UXN)

Most other architectures use the generic fallback below, which is plain non-cached:

#ifndef pgprot_dmacoherent
#define pgprot_dmacoherent(prot)	pgprot_noncached(prot)
#endif

The arm64 definition of non-cached (pgprot_noncached):

#define pgprot_noncached(prot) \
	__pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_DEVICE_nGnRnE) | PTE_PXN | PTE_UXN)
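Both macros are built on __pgprot_modify(), which clears the bits in the mask and ORs in the new ones. The user-space sketch below reproduces that arithmetic. It shows that pgprot_dmacoherent() and pgprot_noncached() differ only in the AttrIndx field (descriptor bits [4:2]); both set PXN and UXN. Assumptions:

- pteval_t is a plain uint64_t.
- The MT_* index values are those used by arm64 kernels around v5.10; later kernels renumbered them.
- pgprot_modify(), model_pgprot_dmacoherent(), model_pgprot_noncached() and attr_index() are hypothetical helpers.

```c
#include <stdint.h>

typedef uint64_t pteval_t;

/* arm64 page-descriptor fields, as in arch/arm64/include/asm/pgtable-hwdef.h. */
#define PTE_ATTRINDX(t)   ((pteval_t)(t) << 2)
#define PTE_ATTRINDX_MASK ((pteval_t)7 << 2)
#define PTE_PXN           ((pteval_t)1 << 53)
#define PTE_UXN           ((pteval_t)1 << 54)

/* MAIR slot indices as in arm64 around v5.10 (version-dependent assumption). */
#define MT_DEVICE_nGnRnE 0
#define MT_NORMAL_NC     3
#define MT_NORMAL        4

/* Same semantics as the kernel's __pgprot_modify(): clear mask, OR in bits. */
pteval_t pgprot_modify(pteval_t prot, pteval_t mask, pteval_t bits)
{
	return (prot & ~mask) | bits;
}

pteval_t model_pgprot_dmacoherent(pteval_t prot)
{
	return pgprot_modify(prot, PTE_ATTRINDX_MASK,
			     PTE_ATTRINDX(MT_NORMAL_NC) | PTE_PXN | PTE_UXN);
}

pteval_t model_pgprot_noncached(pteval_t prot)
{
	return pgprot_modify(prot, PTE_ATTRINDX_MASK,
			     PTE_ATTRINDX(MT_DEVICE_nGnRnE) | PTE_PXN | PTE_UXN);
}

/* Extract the AttrIndx field (which MAIR_EL1 slot this mapping uses). */
unsigned int attr_index(pteval_t prot)
{
	return (unsigned int)((prot & PTE_ATTRINDX_MASK) >> 2);
}
```

Starting from a cacheable MT_NORMAL descriptor, the two macros produce descriptors that are identical except for the MAIR slot they point to.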

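From this analysis we learn that on aarch64, memory that dma_alloc_attrs() allocates and maps for a non-coherent device is not fully non-cached. It is mapped as Normal Non-cacheable (MT_NORMAL_NC), not as Device-nGnRnE (MT_DEVICE_nGnRnE).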

Memory types on aarch64

The following is based mainly on Arm's Armv8-A memory model guide (Armv8-A memory model guide.pdf) and the Arm Architecture Reference Manual for Armv8-A (DDI0487G_b_armv8_arm.pdf).

Armv6 and Armv7 include a third memory type: Strongly Ordered. In Armv8, this maps to Device_nGnRnE.
The Armv8-A architecture employs a weakly ordered model of memory. This means that the order of memory accesses is not necessarily required to be the same as the program order for load and store operations.

Device

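There are two memory types in Armv8-A: Normal memory and Device memory.
There are four different types of device memory, defining the rules which memory accesses must obey.

As the memory type weakens those rules are relaxed:

• Device-nGnRnE is the most restrictive. This is the type pgprot_noncached uses.
• Device-nGnRE
• Device-nGRE
• Device-GRE is the least restrictive.

I won't go through the full semantics here, since there is plenty of material online. In brief, G, R and E stand for Gathering (merging several accesses into one), Reordering, and Early write acknowledgement. An "n" prefix means that optimization is forbidden. nGnRnE is therefore the slowest type of memory for the CPU to access, because none of these optimizations can be applied.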

Normal

dmacoherent uses Normal memory with the Non-cacheable attribute. For a more detailed explanation, see around page 168 of DDI0487G_b_armv8_arm.pdf.

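Macro	Memory type	Main properties
`pgprot_dmacoherent`	`MT_NORMAL_NC`	Normal Non-Cacheable. Allows speculative access, unaligned accesses and write merging. Intended for non-coherent DMA devices.
`pgprot_noncached`	`MT_DEVICE_nGnRnE`	Device-nGnRnE (strict device memory). Forbids speculation, requires strict alignment, preserves access size. Intended for MMIO or other strict device access.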
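The memory type itself is not stored in the page table. AttrIndx selects one 8-bit field of the MAIR_EL1 register, and that byte encodes the type. The sketch below decodes such a byte following the Armv8-A encoding:

- 0b0000dd00 is Device memory, where dd = 00/01/10/11 means nGnRnE/nGnRE/nGRE/GRE.
- Any other value is Normal memory, with outer attributes in bits [7:4], inner attributes in bits [3:0], and 0b0100 meaning Non-cacheable.

Linux programs MT_NORMAL_NC as 0x44 and MT_DEVICE_nGnRnE as 0x00. struct mair_attr and decode_mair_attr() are hypothetical, and encodings the kernel does not use (such as the FEAT_XS variants) are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of one MAIR_EL1 attribute byte. */
struct mair_attr {
	bool is_device;
	/* Device memory only: which relaxations are permitted. */
	bool gathering, reordering, early_ack;
	/* Normal memory only: inner and outer both Non-cacheable (0b0100). */
	bool normal_noncacheable;
};

struct mair_attr decode_mair_attr(uint8_t attr)
{
	struct mair_attr a = {0};
	uint8_t outer = attr >> 4, inner = attr & 0xf;

	if (outer == 0) {
		/* Device memory: dd = 00 nGnRnE, 01 nGnRE, 10 nGRE, 11 GRE */
		unsigned int dd = (attr >> 2) & 3;

		a.is_device = true;
		a.early_ack = dd >= 1;
		a.reordering = dd >= 2;
		a.gathering = dd == 3;
		return a;
	}
	a.normal_noncacheable = (outer == 0x4 && inner == 0x4);
	return a;
}
```

Decoding 0x00 gives Device memory with no G, R or E. Decoding 0x44 gives Normal memory that is Non-cacheable at both levels, which is the pair in the table above.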

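Measurements for each mapping type

I mapped a region of memory with each of three page attributes: dmacoherent (labeled DMA below), noncached (No cache) and cached (CACHE). I then ran read and write tests on each. The timings are listed below.

Using the method from the earlier article 《mmap映射物理内存》 ("Mapping physical memory with mmap"), we confirmed that with the dmacoherent mapping the memory can be read and written correctly without any software cache maintenance.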
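The article does not show its test program. A minimal timing harness in the same spirit could look like the following; it is an assumption, not the author's code. time_copy_ms() (a hypothetical helper) times one memcpy() with clock_gettime(CLOCK_MONOTONIC).

- Read test: src would be the region returned by mmap() on the driver's device node, and dst a malloc() buffer.
- Write test: the two are swapped.

How the driver selects cached, noncached or dmacoherent page attributes is outside this sketch.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Elapsed wall-clock time of one memcpy(dst, src, len), in milliseconds. */
double time_copy_ms(void *dst, const void *src, size_t len)
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	memcpy(dst, src, len);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

For stable numbers, repeat each copy several times and report the minimum or median. The note on the 1 MB CACHE write row suggests that back-to-back runs influence each other.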

Read: copy from the mapped memory into a buffer obtained with malloc.

Size (MB)	Time	Mapping
512	1660 ms	DMA
512	160 ms	CACHE
512	2365 ms	No cache
16	52 ms	DMA
16	5 ms	CACHE
16	74 ms	No cache
1	3351 µs	DMA
1	282 µs	CACHE
1	(not recorded)	No cache
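Converting the 512 MB read timings to throughput (size divided by time, using the table's own numbers) makes the comparison easier:

```latex
\begin{aligned}
\text{throughput} &= \frac{\text{size}}{\text{time}} \\
\text{DMA:} &\quad \frac{512\ \text{MB}}{1.660\ \text{s}} \approx 308\ \text{MB/s} \\
\text{CACHE:} &\quad \frac{512\ \text{MB}}{0.160\ \text{s}} = 3200\ \text{MB/s} \\
\text{No cache:} &\quad \frac{512\ \text{MB}}{2.365\ \text{s}} \approx 216\ \text{MB/s}
\end{aligned}
```

The 16 MB rows give essentially the same rates: 16/0.052 ≈ 308, 16/0.005 = 3200 and 16/0.074 ≈ 216 MB/s. So for reads, the dmacoherent mapping is about 10× slower than the cached one (1660/160 ≈ 10.4). It is about 1.4× faster than the Device-nGnRnE mapping (2365/1660 ≈ 1.42).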

Write: copy from a malloc'd buffer into the mapped memory.

Size (MB)	Time	Mapping
512	255 ms	DMA
512	257 ms	CACHE
512	(not recorded)	No cache
16	8.3 ms	DMA
16	8.1 ms	CACHE
16	60 ms	No cache
1	457 µs	DMA
1	435 µs (affected by repeated consecutive runs)	CACHE
1	3349 µs	No cache
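The same conversion for the write timings gives a different picture. The 512 MB no-cache write was not recorded, so the 16 MB row is used for it:

```latex
\begin{aligned}
\text{DMA:} &\quad \frac{512\ \text{MB}}{0.255\ \text{s}} \approx 2008\ \text{MB/s} \\
\text{CACHE:} &\quad \frac{512\ \text{MB}}{0.257\ \text{s}} \approx 1992\ \text{MB/s} \\
\text{No cache (16 MB):} &\quad \frac{16\ \text{MB}}{0.060\ \text{s}} \approx 267\ \text{MB/s}
\end{aligned}
```

Writes through the dmacoherent mapping are as fast as through the cached one. A plausible reason is that Normal Non-cacheable memory allows writes to be merged, while Device-nGnRnE does not. The no-cache write is roughly 7× slower (60 ms vs 8.3 ms at 16 MB).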