@antiagainst Thanks a lot for the reply. My context is as follows. After the comprehensive-bufferize pass, the IR is:
// -----// IR Dump After IREEComprehensiveBufferize (iree-codegen-iree-comprehensive-bufferize) //----- //
module {
  func.func @forward_dispatch_0_matmul_transpose_b_16x16x16_f16() {
    %c0 = arith.constant 0 : index
    %cst = arith.constant 0.000000e+00 : f16
    %cst_0 = arith.constant dense<[8.227530e-02, 1.696780e-01, -8.789060e-03, -2.351070e-01, -1.804200e-01, -1.469730e-01, -3.710940e-02, 8.691400e-02, -1.838380e-01, 8.837890e-02, -2.265630e-01, -2.456050e-01, -2.158200e-01, 1.945800e-01, 3.955080e-02, 1.032710e-01]> : tensor<16xf16>
    %0 = bufferization.to_memref %cst_0 : memref<16xf16>
    %1 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c0) flags(ReadOnly) : memref<16x16xf16, #hal.descriptor_type<storage_buffer>>
    memref.assume_alignment %1, 64 : memref<16x16xf16, #hal.descriptor_type<storage_buffer>>
    %2 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) alignment(64) offset(%c0) flags(ReadOnly) : memref<16x16xf16, #hal.descriptor_type<storage_buffer>>
    memref.assume_alignment %2, 64 : memref<16x16xf16, #hal.descriptor_type<storage_buffer>>
    %3 = hal.interface.binding.subspan set(0) binding(2) type(storage_buffer) alignment(64) offset(%c0) : memref<16x16xf16, #hal.descriptor_type<storage_buffer>>
    memref.assume_alignment %3, 64 : memref<16x16xf16, #hal.descriptor_type<storage_buffer>>
    %workgroup_id_x = hal.interface.workgroup.id[0] : index
    %4 = affine.apply affine_map<()[s0] -> (s0 * 16)>()[%workgroup_id_x]
    linalg.fill ins(%cst : f16) outs(%3 : memref<16x16xf16, #hal.descriptor_type<storage_buffer>>)
    linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1)>], iterator_types = ["parallel", "parallel", "reduction"]} ins(%1, %2 : memref<16x16xf16, #hal.descriptor_type<storage_buffer>>, memref<16x16xf16, #hal.descriptor_type<storage_buffer>>) outs(%3 : memref<16x16xf16, #hal.descriptor_type<storage_buffer>>) attrs = {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[16, 16], [16, 16], [0, 0, 16], [16, 16, 16]]>} {
    ^bb0(%in: f16, %in_1: f16, %out: f16):
      %5 = arith.mulf %in, %in_1 : f16
      %6 = arith.addf %out, %5 : f16
      linalg.yield %6 : f16
    }
    %subview = memref.subview %0[%4] [16] [1] : memref<16xf16> to memref<16xf16, strided<[1], offset: ?>>
    linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%subview : memref<16xf16, strided<[1], offset: ?>>) outs(%3 : memref<16x16xf16, #hal.descriptor_type<storage_buffer>>) {
    ^bb0(%in: f16, %out: f16):
      %5 = arith.addf %out, %in : f16
      linalg.yield %5 : f16
    }
    memref.copy %3, %3 : memref<16x16xf16, #hal.descriptor_type<storage_buffer>> to memref<16x16xf16, #hal.descriptor_type<storage_buffer>>
    return
  }
}
Then, after many passes, by the time we reach convert-to-spirv, %cst_0 has been converted to a spirv.Constant whose type is !spirv.array, and the bufferization.to_memref op has been turned into an unrealized_conversion_cast:
%5 = "spirv.Constant"() <{value = dense<[8.227530e-02, 1.696780e-01, -8.789060e-03, -2.351070e-01, -1.804200e-01, -1.469730e-01, -3.710940e-02, 8.691400e-02, -1.838380e-01, 8.837890e-02, -2.265630e-01, -2.456050e-01, -2.158200e-01, 1.945800e-01, 3.955080e-02, 1.032710e-01]> : tensor<16xf16>}> : () -> !spirv.array<16 x f16>
%6 = "builtin.unrealized_conversion_cast"(%5) : (!spirv.array<16 x f16>) -> !spirv.ptr<!spirv.struct<(!spirv.array<16 x f16, stride=2> [0])>, StorageBuffer>
and this array is then accessed by index:
%51 = "spirv.AccessChain"(%6, %46, %50) : (!spirv.ptr<!spirv.struct<(!spirv.array<16 x f16, stride=2> [0])>, StorageBuffer>, i32, i32) -> !spirv.ptr<f16, StorageBuffer>
Should I convert the bufferization.to_memref op into memref.store or vector.transfer_write ops in some pass? Currently this op stays unchanged all the way until the convert-to-spirv pass.
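Alternatively, I could imagine materializing the constant as a module-level global at the memref level (I believe this is roughly what upstream bufferization does for constant tensors; the symbol name @__constant_16xf16 below is made up):

// At module scope: a hypothetical global holding the constant data.
memref.global "private" constant @__constant_16xf16 : memref<16xf16> = dense<[8.227530e-02, 1.696780e-01, -8.789060e-03, -2.351070e-01, -1.804200e-01, -1.469730e-01, -3.710940e-02, 8.691400e-02, -1.838380e-01, 8.837890e-02, -2.265630e-01, -2.456050e-01, -2.158200e-01, 1.945800e-01, 3.955080e-02, 1.032710e-01]>

// Inside the dispatch function: replace
//   %0 = bufferization.to_memref %cst_0 : memref<16xf16>
// with a load of the global; the later memref.subview of %0 stays unchanged.
%0 = memref.get_global @__constant_16xf16 : memref<16xf16>

Would either of these be the expected fix, and if so, in which pass should it happen?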