Handling strided memref in Affine Super Vectorize

Hello, I’m trying to vectorize the following IR using:
-affine-simplify-structures -affine-super-vectorize="virtual-vector-size=8 test-fastest-varying=0"

module {
  func.func @func_failing() -> i32 {
    %c0 = arith.constant 0 : index
    
    %alloc = memref.alloc() : memref<16x16xi32>
    %subview = memref.subview %alloc[0, 0] [8, 16] [1, 1] : memref<16x16xi32> to memref<8x16xi32, strided<[16, 1]>>
    
    affine.for %i = 0 to 8 {
      affine.for %j = 0 to 8 {
        %val = affine.load %subview[%i, %j] : memref<8x16xi32, strided<[16, 1]>>
        affine.store %val, %subview[%i, %j] : memref<8x16xi32, strided<[16, 1]>>
      }
    }
    
    %result = affine.load %subview[%c0, %c0] : memref<8x16xi32, strided<[16, 1]>>
    memref.dealloc %alloc : memref<16x16xi32>
    return %result : i32
  }
}

However, the vectorization is failing with the following error:

error: NYI: non-trivial layout map
        %val = affine.load %subview[%i, %j] : memref<8x16xi32, strided<[16, 1]>>
               ^

I tried modifying this check with the following code, and it resolves the issue for me:

  if (auto layout = dyn_cast<AffineMapAttr>(memRefType.getLayout()))
    if (!layout.getAffineMap().isIdentity())
      return memoryOp.emitError("NYI: non-trivial layout map"), false;

Question:
Should I be adding more checks for strided layouts to ensure that I’m not violating any vectorization constraints?
If so, how should I handle dynamic strides from a strided layout, and dynamic sizes?

Thanks!

Does it actually resolve it or does it just produce some IR that may not be doing what you expect, e.g., reading consecutive elements regardless of strides?
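To make that concern concrete, here is a minimal standalone sketch (plain C++, no MLIR dependency) comparing the element offsets the scalar loop touches with the offsets a contiguous vector read would touch. The helper names `linearOffset`, `stridedOffsets`, and `contiguousOffsets` are made up for illustration and do not come from the vectorizer. The sketch only models the address arithmetic of a 2-D strided buffer.

```cpp
#include <cstdint>
#include <vector>

// Linear element offset of index (i, j) in a 2-D strided buffer.
// All names here are hypothetical and only model the address math.
inline int64_t linearOffset(int64_t base, int64_t strideI, int64_t strideJ,
                            int64_t i, int64_t j) {
  return base + i * strideI + j * strideJ;
}

// Offsets the scalar loop touches for j = j0 .. j0 + width - 1 in row i.
inline std::vector<int64_t> stridedOffsets(int64_t base, int64_t strideI,
                                           int64_t strideJ, int64_t i,
                                           int64_t j0, int64_t width) {
  std::vector<int64_t> offsets;
  for (int64_t k = 0; k < width; ++k)
    offsets.push_back(linearOffset(base, strideI, strideJ, i, j0 + k));
  return offsets;
}

// Offsets touched by a read of `width` consecutive elements starting at
// (i, j0), i.e. a read that ignores the stride of the innermost dimension.
inline std::vector<int64_t> contiguousOffsets(int64_t base, int64_t strideI,
                                              int64_t strideJ, int64_t i,
                                              int64_t j0, int64_t width) {
  std::vector<int64_t> offsets;
  const int64_t start = linearOffset(base, strideI, strideJ, i, j0);
  for (int64_t k = 0; k < width; ++k)
    offsets.push_back(start + k);
  return offsets;
}
```

With strides [16, 1] as in the subview above, both helpers return the same offsets for any row, so a consecutive read matches the scalar loop. With an innermost stride of 2 they diverge, which is the case where the output would look plausible but read the wrong elements.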

The affine vectorizer predates strided memrefs, so it likely does not account for anything but dense row-major storage.

In my case, the data is stored in dense row-major format, at least for the dimensions I’m attempting to vectorize.

Thanks for the insight.

Would it be safe to assume that if I’m vectorizing along a dimension, and that dimension is contiguous, then the vectorization will generate correct code?

For example, if my memref is of type memref<axbxcxf32, strided<[z,c,1]>> and I’m using the vectorization arguments -affine-super-vectorize="virtual-vector-size=8,8 test-fastest-varying=1,0", can I expect correct vectorized code generation?

I don’t remember offhand. This is rather old and rarely used code. There is a long description starting at llvm-project/mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp (llvm/llvm-project on GitHub, at commit 36cbd43ae8d5a5274ae3193b6383fff2ba9671f4) that you could refer to; otherwise, look at the code and inspect the generated IR. The function you pointed to has a “TODO: check access stride” above it, which suggests that implementing a check that the stride along the given dimension is 1 may make it all work together.
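As a rough illustration of what such a stride check could look like, here is a standalone sketch in plain C++. It is not the vectorizer's code: `kDynamicStride` and `hasStaticUnitStride` are hypothetical names, and the strides are assumed to have already been extracted from the memref's layout (that step is not shown). It takes the conservative route of rejecting any stride that is not statically known, which is one possible answer to the dynamic-stride question rather than the only one.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical sentinel for a stride that is not known at compile time.
constexpr int64_t kDynamicStride = std::numeric_limits<int64_t>::min();

// Returns true only when the stride along `dim` is statically known to be 1.
// Dynamic strides and out-of-range dimensions are rejected, so the caller
// falls back to not vectorizing rather than risking wrong code.
inline bool hasStaticUnitStride(const std::vector<int64_t> &strides,
                                std::size_t dim) {
  if (dim >= strides.size())
    return false; // No such dimension in this layout.
  const int64_t stride = strides[dim];
  if (stride == kDynamicStride)
    return false; // Unknown at compile time: be conservative.
  return stride == 1;
}
```

For the strided<[16, 1]> subview in the original post, this accepts the innermost dimension (index 1) and rejects the outer one (index 0). Whether a check of this shape is sufficient for the multi-dimensional case would still need to be confirmed by inspecting the generated IR.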