Add new export LLM config #11028
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11028
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 4372d61 with merge base c2aa614.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@jackzhxng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Differential Revision: [D75263991](https://our.internmc.facebook.com/intern/diff/D75263991)
This pull request was exported from Phabricator. Differential Revision: D75263991
Pull Request resolved: #11028 @imported-using-ghimport Differential Revision: [D75263991](https://our.internmc.facebook.com/intern/diff/D75263991/) ghstack-source-id: 288636930
Merged e46a59a into gh/jackzhxng/10/base.
This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #11028 by @jackzhxng
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/jackzhxng/10/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/jackzhxng/10/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/jackzhxng/10/orig
@diff-train-skip-merge
Co-authored-by: Jack Zhang <[email protected]>
SMOLLM2 = "smollm2" | ||
|
||
|
||
class PreqMode(str, Enum): |
worth tagging as deprecated
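One way the deprecation could be surfaced (a sketch, not the PR's actual change; the member names and values below are assumptions for illustration): keep the enum for backward compatibility, note the deprecation in its docstring, and warn when the option is actually set.

```python
import warnings
from enum import Enum


class PreqMode(str, Enum):
    """Deprecated: legacy prequantization modes, kept only for backward
    compatibility. Prefer loading TorchAo-prequantized checkpoints instead."""

    PREQ_8DA4W = "8da4w"  # illustrative members; actual values may differ
    PREQ_8DA4W_OUT_8DA8W = "8da4w_output_8da8w"


def warn_if_preq_set(preq_mode) -> None:
    """Hypothetical hook a config's __post_init__ could call."""
    if preq_mode is not None:
        warnings.warn(
            "preq_mode is deprecated; load TorchAo-prequantized weights instead.",
            DeprecationWarning,
            stacklevel=2,
        )
```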
    model_class: Which model to export.
    params: Model parameters, such as n_layers, hidden_size, etc.
        If left empty will use defaults specified in model_args.py.
    checkpoint: Path to the checkpoint file.
Can this be an HF path as well?
No, but at the moment if you specify a non-llama model_class and don't specify a checkpoint, it will download from HF. Worth adding this as a comment.
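To make that behavior concrete, here is a minimal sketch of the described fallback, assuming a hypothetical resolve_checkpoint helper (the real export flow may structure this differently):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseConfig:
    model_class: str = "llama3"
    checkpoint: Optional[str] = None  # local path; may be left unset


def _download_from_hf(model_class: str) -> str:
    # Placeholder: in the real flow this would fetch weights from the
    # Hugging Face Hub (e.g. via huggingface_hub.snapshot_download).
    return f"/tmp/hf_downloads/{model_class}"


def resolve_checkpoint(config: BaseConfig) -> Optional[str]:
    """Return a local checkpoint path, falling back to an HF download for
    non-llama model classes when no checkpoint is given."""
    if config.checkpoint:
        return config.checkpoint
    if not config.model_class.startswith("llama"):
        return _download_from_hf(config.model_class)
    return None  # llama models keep their defaults or require an explicit path
```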
    tokenizer_path: Path to the tokenizer file.
    metadata: Json string containing metadata information.
        e.g. '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}'
    use_lora: Rank of the LoRA, if set to 0 then this means no LoRA. For use with QAT.
LoRA in our case is really tied to the QAT model that we released, right? It is not independently applicable to any model? If so, I think we want to tie this to QAT checkpoints specifically for llama3_2.
Is that the case? If so I'll add something to the post_init to verify and update the comment
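A sketch of what such a __post_init__ check might look like; field names like use_qat and the exact model_class value are assumptions rather than the PR's final code:

```python
from dataclasses import dataclass


@dataclass
class BaseConfig:
    model_class: str = "llama3_2"
    use_lora: int = 0      # LoRA rank; 0 means no LoRA
    use_qat: bool = False

    def __post_init__(self):
        # Assumption: LoRA is only meaningful with the released QAT
        # checkpoints for llama3_2, so reject other combinations early.
        if self.use_lora > 0:
            if not self.use_qat:
                raise ValueError("use_lora requires a QAT checkpoint (use_qat=True).")
            if self.model_class != "llama3_2":
                raise ValueError("use_lora is currently only supported for llama3_2.")
```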
    metadata: Json string containing metadata information.
        e.g. '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}'
    use_lora: Rank of the LoRA, if set to 0 then this means no LoRA. For use with QAT.
    fairseq2: For legacy internal use cases, this is safe to ignore.
prefix with _ to differentiate between this and other supported params
    preq_mode: Legacy option to specify how prequantized weights are loaded.
        Going forward, ExecuTorch supports loading weights prequantized through
        TorchAo as-is, without any special handling.
    preq_group_size: Legacy option to specify the group size of prequantized weights.
I think we probably want to couple these together. If you are loading a pre-quantized checkpoint, the group size cannot be set independently, right? So maybe having a separate dataclass that captures all the params and maps it by name is better. Although I presume you have to keep this for BC?
Yeah, just keeping it for BC. I think ideally we are moving away from preq to TorchAo, right?
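If the coupling were done, a grouped dataclass might look roughly like this (a sketch only; the existing flat fields would presumably stay around for BC as discussed, and the enum members and companion option are assumptions):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PreqMode(str, Enum):
    PREQ_8DA4W = "8da4w"                      # illustrative member values
    PREQ_8DA4W_OUT_8DA8W = "8da4w_output_8da8w"


@dataclass
class PreqConfig:
    """Groups the legacy prequantization options so they can only be set
    together, rather than as independent top-level fields."""

    mode: PreqMode
    group_size: int = 32
    embedding_quantize: Optional[str] = None  # assumed companion option

    def __post_init__(self):
        if self.group_size <= 0:
            raise ValueError("group_size must be positive for prequantized weights.")
```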
@dataclass
class ModelConfig:
    """
    Configurations not necessarily specific to the model, but are needed to
Then why call it ModelConfig?
Maybe call it LoweringConfig? Or ExportConfig, although not all options are export related.
This is more like: what other modifications do I want to make to the model in eager that aren't specific to the model itself (e.g. NOT checkpoint, model architecture, tokenizer) and can be shared across different models.
        doesn't actually have anything to do with the kv_cache at the moment.
    expand_rope_table: Temporary workaround to expand sin/cos table in head
        dim to take vectorized path in optimized kernels.
    use_attention_sink: Whether to use attention sink to support multi-round
I would also consider pruning options that we don't intend to support. Attention sink at the moment does not have a performant implementation and I would rather hide it somewhere than expose it. Reduces maintenance burden.
    max_context_length: Maximum of context for the model to remember.
    output_dir: Output dir to save the exported .pte file to.
    output_name: File name to override the exported .pte file.
    so_library: Shared library to specify custom quantized operators.
I don't think we need so_library anymore. Remove it in a follow-up.
    output_dir: Output dir to save the exported .pte file to.
    output_name: File name to override the exported .pte file.
    so_library: Shared library to specify custom quantized operators.
    export_only: Whether to stop right after torch.export() and
was this for debug?
    XNNPACK_DYNAMIC = "xnnpack_dynamic"
    XNNPACK_DYNAMIC_QC4 = "xnnpack_dynamic_qc4"
we probably can remove these two
    pt2e_quantize: Quantization mode using pt2e, which is an alternative
        to TorchAo that uses backend-aware graph mode quantization rather
        than source transformation quantization.
I would want to hide these details from users
class CoreMLQuantize(str, Enum):
    B4W = "b4w"
    C4W = "c4w"
maybe favor this pattern over the ones in pt2e_quantizer
class TestValidConstruction(unittest.TestCase):

    def test_valid_llm_config(self):
        LlmConfig(
Are these configs constructible from JSON? I think that would be the best.
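For illustration, JSON construction could be wired up with a small from_json classmethod; the dataclasses below are simplified stand-ins, not the PR's actual schema (libraries like dacite or omegaconf would also work):

```python
import json
from dataclasses import dataclass, field


# Simplified stand-ins for the nested configs; the real LlmConfig has more
# sections and fields.
@dataclass
class BaseConfig:
    model_class: str = "llama3"
    checkpoint: str = ""


@dataclass
class ExportConfig:
    max_seq_length: int = 128
    output_name: str = "model.pte"


@dataclass
class LlmConfig:
    base: BaseConfig = field(default_factory=BaseConfig)
    export: ExportConfig = field(default_factory=ExportConfig)

    @classmethod
    def from_json(cls, raw: str) -> "LlmConfig":
        d = json.loads(raw)
        return cls(
            base=BaseConfig(**d.get("base", {})),
            export=ExportConfig(**d.get("export", {})),
        )


cfg = LlmConfig.from_json(
    '{"base": {"model_class": "smollm2"}, "export": {"max_seq_length": 256}}'
)
assert cfg.base.model_class == "smollm2" and cfg.export.max_seq_length == 256
```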
Stack from ghstack (oldest at bottom):
Differential Revision: D75263991