Shared memory for multiple entry points with XNNPACK delegate #11738

@Olgacmt

Description

🚀 The feature, motivation and pitch

We have a set of production models written in PyTorch and exported for mobile with ExecuTorch using the XNNPACK delegate. These models:

  • Have multiple entry points (e.g., forward, decode, infer)
  • Share internal state (e.g., buffers like self.hidden)
  • Use custom operators
  • Are expected to share memory between entry points at runtime

With ExecuTorch 0.6.0, it is currently not possible to export these models cleanly with support for shared state and multiple entry points when using the XNNPACK delegate.

✅ What currently works (badly):

We can roughly simulate shared memory by:

  • Lifting state tensors out of the module using memory planning,
  • Giving them fixed buffer IDs,
  • And manually ensuring the runtime binds those buffers to the same address.

This approach is error-prone, brittle, and not user-friendly. It works, but it is neither scalable nor robust, especially for models with delegate compatibility requirements or custom ops.
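To make the workaround concrete, here is a minimal, hypothetical sketch of what must currently be guaranteed by hand. None of these classes are ExecuTorch APIs; they only model the pattern of fixing buffer IDs and binding them to one backing storage at runtime.

```python
# Illustrative only: SharedBufferPlan and FakeRuntime are invented names,
# not ExecuTorch APIs.

class SharedBufferPlan:
    """Assigns fixed, stable IDs to state tensors lifted out of the module."""

    def __init__(self):
        self._ids: dict[str, int] = {}

    def assign(self, name: str) -> int:
        # The same name must always map to the same buffer ID.
        if name not in self._ids:
            self._ids[name] = len(self._ids)
        return self._ids[name]

class FakeRuntime:
    """Binds every entry point's view of a buffer ID to the same storage."""

    def __init__(self):
        self._storage: dict[int, bytearray] = {}

    def bind(self, buffer_id: int, size: int) -> bytearray:
        # All entry points asking for the same ID get the same memory.
        return self._storage.setdefault(buffer_id, bytearray(size))

plan = SharedBufferPlan()
hid = plan.assign("hidden_state")

rt = FakeRuntime()
fwd_view = rt.bind(hid, 16)   # what "forward" sees
dec_view = rt.bind(hid, 16)   # what "decode" sees
assert fwd_view is dec_view   # same backing memory, i.e. shared state
```

Every invariant here (stable IDs, identical bindings across entry points) is enforced only by convention today, which is exactly why the approach is brittle.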

💣 What’s broken:

  • No first-class support for multiple entry points during export.
  • No clean UX for declaring shared memory/state across entry points.
  • Delegates like XNNPACK or CoreML can’t currently share tensor state across entry points without hacks.
  • Overriding forward() as a workaround for multi-entry export is extremely fragile: if the aliased function calls forward internally, it breaks in confusing ways.

🧠 Why this matters:

This breaks real deployment scenarios. Without proper shared state handling:

  • Delegates reallocate or reset state per entry point
  • You can’t maintain persistent buffers across calls (e.g., RNN hidden states)
  • Exported models can't reuse memory, killing performance on mobile
  • You can’t export realistic multi-stage or streaming inference pipelines

💡 What we propose:

At minimum:

  • Provide a default memory-planner utility that lets you declare shared buffers by name (e.g. "hidden_state", "cache") and automatically lifts them and aligns them to shared IDs
  • Longer term: Add first-class multi-entry point support in export, and expose shared buffer semantics to delegates in a clean, non-hacky way
  • Explore delegate-level shared context support so operators that span entry points (e.g., XNNPACK-conv using the same weights) don’t get re-initialized unnecessarily
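A sketch of what the proposed planner utility could look like from the user's side. This API does not exist in ExecuTorch; the class and method names are hypothetical and only illustrate the "declare shared buffers by name, get aligned IDs across entry points" semantics proposed above.

```python
# Hypothetical API sketch, not an ExecuTorch implementation.

class SharedStatePlanner:
    def __init__(self, shared: list[str]):
        # Buffer names like "hidden_state" or "cache", declared once up front.
        self.shared = set(shared)
        self._shared_ids: dict[str, int] = {}

    def plan(self, entry_point: str, buffers: list[str]) -> dict[str, int]:
        """Return a name -> buffer-ID map for one entry point.

        Shared names receive the same ID in every entry point; all other
        buffers get IDs from a private range per entry point.
        """
        out: dict[str, int] = {}
        next_private = 1000  # keep private IDs out of the shared range
        for name in buffers:
            if name in self.shared:
                out[name] = self._shared_ids.setdefault(
                    name, len(self._shared_ids))
            else:
                out[name] = next_private
                next_private += 1
        return out

planner = SharedStatePlanner(shared=["hidden_state", "cache"])
fwd = planner.plan("forward", ["hidden_state", "cache", "scratch"])
dec = planner.plan("decode", ["hidden_state", "tmp"])
# "hidden_state" gets the same ID in both entry points, so the runtime
# can bind it to one allocation with no manual bookkeeping.
```

The point of the sketch is the contract, not the implementation: the user declares sharing by name once, and the planner guarantees ID alignment across entry points so that delegates and the runtime can rely on it.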

This is currently being actively investigated on the delegate side, but the lack of upstream support in export/memory planning makes it hard to build consistent tooling around this.

Alternatives

No response

Additional context

Existing work and conversations regarding these issues:
#9012
#8030
#10144
#8870
#7458

RFC (Optional)

No response

cc @digantdesai @mcr229 @cbilgin

Labels

    module: xnnpack (Issues related to xnnpack delegation and the code under backends/xnnpack)
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
