Macos Intel crashes on calling linalg.mul #3762

Closed
powerc9000 opened this issue Jun 15, 2024 · 4 comments

@powerc9000
Contributor

powerc9000 commented Jun 15, 2024

Context

Odin: dev-2024-06:02f11dfde
OS: macOS Sonoma 14.4.1 (build: 23F79, kernel: 23.4.0)
CPU: Intel(R) Core(TM) i5-8500B CPU @ 3.00GHz
RAM: 16384 MiB
Backend: LLVM 18.1.6

Calling linalg.mul with a Matrix2 crashes with an EXC_I386_GPFLT (general protection fault).
This happens only with -o:none or -debug; -o:speed does not share the issue.

On the Discord, it was believed that this is caused by bad code generation resulting in a bad stack pointer.

Expected Behavior

Don't crash.

Current Behavior

Crash

Failure Information (for bugs)

Disassembly

main`linalg.matrix_mul_vector-8549:
    0x100007260 <+0>:   movaps %xmm1, -0x58(%rsp)
    0x100007265 <+5>:   movaps %xmm0, -0x48(%rsp)
    0x10000726a <+10>:  movaps %xmm2, -0x38(%rsp)
    0x10000726f <+15>:  movaps -0x38(%rsp), %xmm0
    0x100007274 <+20>:  movaps -0x58(%rsp), %xmm1
    0x100007279 <+25>:  movaps -0x48(%rsp), %xmm2
    0x10000727e <+30>:  movlpd %xmm2, -0x10(%rsp)
    0x100007284 <+36>:  movlpd %xmm1, -0x8(%rsp)
    0x10000728a <+42>:  movlpd %xmm0, -0x18(%rsp)
    0x100007290 <+48>:  movq   $0x0, -0x20(%rsp)
->  0x100007299 <+57>:  movaps -0x10(%rsp), %xmm1
    0x10000729e <+62>:  movsd  -0x8(%rsp), %xmm0
    0x1000072a4 <+68>:  movss  -0x18(%rsp), %xmm3
    0x1000072aa <+74>:  movss  -0x14(%rsp), %xmm2
    0x1000072b0 <+80>:  movsldup %xmm3, %xmm3 ; xmm3 = xmm3[0,0,2,2] 
    0x1000072b4 <+84>:  movsldup %xmm2, %xmm2 ; xmm2 = xmm2[0,0,2,2] 
    0x1000072b8 <+88>:  mulps  %xmm3, %xmm1
    0x1000072bb <+91>:  mulps  %xmm2, %xmm0
    0x1000072be <+94>:  addps  %xmm1, %xmm0
    0x1000072c1 <+97>:  movq   $0x0, -0x28(%rsp)
    0x1000072ca <+106>: movlpd %xmm0, -0x28(%rsp)
    0x1000072d0 <+112>: movss  -0x28(%rsp), %xmm0
    0x1000072d6 <+118>: movss  -0x24(%rsp), %xmm1
    0x1000072dc <+124>: movss  %xmm1, -0x1c(%rsp)
    0x1000072e2 <+130>: movss  %xmm0, -0x20(%rsp)
    0x1000072e8 <+136>: movsd  -0x28(%rsp), %xmm0
    0x1000072ee <+142>: retq   

Registers

General Purpose Registers:
       rax = 0x41a0000041a00000
       rbx = 0x0000000100601b90
       rcx = 0x00000001000060a0  main`runtime.default_logger_proc at core.odin:653
       rdx = 0x00007ff80db1aaf0  libsystem_m.dylib`_FE_DFL_DISABLE_SSE_DENORMS_ENV + 7552
       rdi = 0x00007ff7bfefed70
       rsi = 0x00007ff7bfefecc8
       rbp = 0x00007ff7bfefece0
       rsp = 0x00007ff7bfefec78
        r8 = 0x0000000100007bdb  "/odin/base/runtime/entry_unix.odin"
        r9 = 0x0000000000000001
       r10 = 0x0000000000000000
       r11 = 0x0000000000000088
       r12 = 0x00007ff7bfefee20
       r13 = 0x0000000000000000
       r14 = 0x0000000100007080  main`main at entry_unix.odin:50
       r15 = 0x00007ff7bfefefa0
       rip = 0x0000000100007299  main`linalg.matrix_mul_vector-8549 + 57 at general.odin:217:2
    rflags = 0x0000000000010246
        cs = 0x000000000000002b
        fs = 0x0000000000000000
        gs = 0x0000000000000000
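
One way to read the dump above, sketched in C: movaps requires its memory operand to be 16-byte aligned and raises a general protection fault otherwise. Taking rsp from the register dump and the displacements from the disassembly, the faulting load at <+57> reads from rsp - 0x10, which ends in 0x68 and is therefore only 8-byte aligned. The earlier movaps stores at -0x58, -0x48 and -0x38 land on 16-byte boundaries, which is consistent with them not faulting. The helper names below are illustrative and not part of Odin or LLDB.

```c
#include <stdbool.h>
#include <stdint.h>

/* movaps faults unless its memory operand is 16-byte aligned. */
#define MOVAPS_ALIGNMENT 16u

/* Effective address of a `displacement(%rsp)` operand.
   The signed displacement wraps correctly under unsigned arithmetic. */
static uint64_t effective_address(uint64_t rsp, int64_t displacement) {
    return rsp + (uint64_t)displacement;
}

/* True if an address satisfies the movaps alignment requirement. */
static bool is_movaps_aligned(uint64_t address) {
    return (address % MOVAPS_ALIGNMENT) == 0;
}
```

Calling is_movaps_aligned(effective_address(0x00007ff7bfefec78, -0x10)) with the rsp value from the dump returns false, while the same check for -0x58, -0x48 and -0x38 returns true.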

Steps to Reproduce

Sample program

package main

import "core:math/linalg"

main :: proc() {
    v1 : linalg.Vector2f32 = {1, 2}
    rot := linalg.matrix2_rotate(f32(20))

    res := linalg.mul(rot, v1)
}
@laytan
Collaborator

laytan commented Jun 27, 2024

Looks like an alignment issue with the amd64 SysV ABI. You can see in the following snippet that it allocates the parameter with align 4 and then loads it as if it were align 16:

define internal void @main.foos(<{ <2 x float>, <2 x float> }> %0, ptr noalias nocapture nonnull %__.context_ptr) {
decls:
  %1 = alloca [4 x float], align 4
  %2 = alloca [4 x float], align 32
  %b = alloca [4 x float], align 32
  br label %entry

entry:                                            ; preds = %decls
  store <{ <2 x float>, <2 x float> }> %0, ptr %1, align 1
  %3 = load <4 x float>, ptr %1, align 16
  %4 = load <4 x float>, ptr %1, align 16
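
The mismatch in the IR above can be illustrated in portable C. This is a minimal sketch, not the compiler's actual fix: a pointer that is only guaranteed 4-byte alignment (like the align 4 alloca) may or may not sit on a 16-byte boundary, so a load that assumes align 16 is only safe by luck. Copying with memcpy makes no alignment assumption. The names vec4, is_aligned and load_vec4_unaligned are hypothetical.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Four floats whose storage is forced onto a 16-byte boundary. */
typedef struct {
    alignas(16) float lanes[4];
} vec4;

/* True if p is a multiple of `alignment` bytes. */
static bool is_aligned(const void *p, size_t alignment) {
    return ((uintptr_t)p % alignment) == 0;
}

/* Loads four floats from a source that is only guaranteed float (4-byte)
   alignment. memcpy does not assume any stricter alignment of src. */
static vec4 load_vec4_unaligned(const float *src) {
    vec4 out;
    memcpy(out.lanes, src, sizeof out.lanes);
    return out;
}
```

For example, given a 16-byte aligned float array, the address of its second element is 4-byte aligned but not 16-byte aligned; load_vec4_unaligned still reads it correctly, whereas an aligned vector load from that address would not be valid.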

@Feoramund
Contributor

Is this still happening? I was able to replicate the original issue for the commit as stated in the report, but when I try the sample program with lldb attached on the latest commit, I no longer get a signal, and I'm able to print the result of the rotation with no issue.

Odin:    dev-2025-05:dd31075c3
OS:      macOS Big Sur 11.7.10 (build 20G1427, kernel 20.6.0)
CPU:     Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
RAM:     16384 MiB
Backend: LLVM 19.1.7

@laytan
Collaborator

laytan commented May 18, 2025

May have been fixed with f9b9e9e or some earlier ABI fix; we've done quite a few.

@Kelimion
Member

Let's close it as fixed, then. Can always be reopened.

Kelimion removed the stale label May 18, 2025