
Add new export LLM config #11028


Merged
20 commits merged into gh/jackzhxng/10/base from gh/jackzhxng/10/head Jun 10, 2025

Conversation

@jackzhxng (Contributor) commented May 21, 2025

[ghstack-poisoned]
@jackzhxng requested a review from lucylq as a code owner May 21, 2025 00:43
pytorch-bot commented May 21, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11028

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4372d61 with merge base c2aa614:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label May 21, 2025
@jackzhxng added the release notes: api and release notes: llm labels May 23, 2025
@jackzhxng (Contributor, Author)
@jackzhxng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

5 similar comments
@facebook-github-bot (Contributor)
This pull request was exported from Phabricator. Differential Revision: D75263991

@jackzhxng requested a review from larryliu0820 June 6, 2025 07:49
jackzhxng added a commit that referenced this pull request Jun 6, 2025
Pull Request resolved: #11028

@imported-using-ghimport

Differential Revision: [D75263991](https://our.internmc.facebook.com/intern/diff/D75263991/)
ghstack-source-id: 288636930
@facebook-github-bot (Contributor)
This pull request was exported from Phabricator. Differential Revision: D75263991

3 similar comments

@facebook-github-bot merged commit e46a59a into gh/jackzhxng/10/base Jun 10, 2025
103 of 104 checks passed
@facebook-github-bot deleted the gh/jackzhxng/10/head branch June 10, 2025 03:44
JacobSzwejbka pushed a commit that referenced this pull request Jun 10, 2025
This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #11028 by @jackzhxng
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/jackzhxng/10/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/jackzhxng/10/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/jackzhxng/10/orig
@diff-train-skip-merge

Co-authored-by: Jack Zhang <[email protected]>
SMOLLM2 = "smollm2"


class PreqMode(str, Enum):

Contributor:
worth tagging as deprecated
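
One minimal way to act on this, sketched purely for illustration: emit a DeprecationWarning when the legacy field is actually set. The `__post_init__` hook and field shape below are assumptions, not the PR's code:

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseConfig:
    # Legacy prequantization mode; None means the option is unused.
    preq_mode: Optional[str] = None

    def __post_init__(self):
        if self.preq_mode is not None:
            warnings.warn(
                "preq_mode is deprecated; TorchAo-prequantized checkpoints "
                "are loaded as-is and need no special handling.",
                DeprecationWarning,
                stacklevel=2,
            )
```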

model_class: Which model to export.
params: Model parameters, such as n_layers, hidden_size, etc.
If left empty, will use defaults specified in model_args.py.
checkpoint: Path to the checkpoint file.

Contributor:
can this be hf path as well?

Contributor (Author):
No, but at the moment if you specify a non-llama model_class and don't specify a checkpoint, it will download from HF. Worth adding this to the comment.
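
A hedged sketch of that fallback behavior, for illustration only; the helper name, the llama check, and the repo-id mapping are hypothetical, not the PR's implementation:

```python
from typing import Optional

from huggingface_hub import snapshot_download  # assumes huggingface_hub is installed


def resolve_checkpoint(model_class: str, checkpoint: Optional[str]) -> str:
    """Hypothetical helper: resolve a local checkpoint path for model_class."""
    if checkpoint is not None:
        return checkpoint
    if model_class.startswith("llama"):
        # Per the author's note, only non-llama models fall back to a download.
        raise ValueError("llama models require an explicit checkpoint path")
    # Illustrative mapping from model_class to a Hugging Face repo id.
    hf_repos = {"smollm2": "HuggingFaceTB/SmolLM2-135M"}
    return snapshot_download(repo_id=hf_repos[model_class])
```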

tokenizer_path: Path to the tokenizer file.
metadata: Json string containing metadata information.
e.g. '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}'
use_lora: Rank of the LoRA, if set to 0 then this means no LoRA. For use with QAT.

Contributor:
LoRA in our case is really tied to the QAT model that we released, right? It is not independently applicable to any model? If so, I think we want to tie this to the QAT checkpoints specifically for llama3_2.

Contributor (Author):
Is that the case? If so I'll add something to the post_init to verify and update the comment
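
A rough sketch of the proposed post_init check, under the assumption that LoRA applies only to the released QAT llama3_2 checkpoints (field names and the exact condition are illustrative):

```python
from dataclasses import dataclass


@dataclass
class BaseConfig:
    model_class: str = "llama3"
    use_lora: int = 0  # LoRA rank; 0 disables LoRA

    def __post_init__(self):
        # Illustrative guard: reject LoRA ranks for models other than the
        # QAT llama3_2 checkpoints the reviewer mentions.
        if self.use_lora != 0 and self.model_class != "llama3_2":
            raise ValueError(
                "use_lora is only supported with the QAT llama3_2 checkpoints"
            )
```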

metadata: Json string containing metadata information.
e.g. '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}'
use_lora: Rank of the LoRA, if set to 0 then this means no LoRA. For use with QAT.
fairseq2: For legacy internal use cases, this is safe to ignore.

Contributor:
prefix with _ to differentiate between this and other supported params

Comment on lines +74 to +77
preq_mode: Legacy option to specify how prequantized weights are loaded.
Going forward, ExecuTorch supports loading weights prequantized through
TorchAo as-is, without any special handling.
preq_group_size: Legacy option to specify the group size of prequantized weights.

Contributor:
I think we probably want to couple these together. If you are loading a pre-quantized checkpoint, the group size cannot be set independently, right? So maybe having a separate dataclass that captures all the params and maps it by name is better. Although I presume you have to keep this for BC?

Contributor (Author):
Yeah, just keeping it for BC. I think ideally we are moving away from preq though, right?
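
For illustration, the reviewer's coupling suggestion could look like a small dataclass along these lines (names are hypothetical; the merged PR keeps the flat fields for BC):

```python
from dataclasses import dataclass


@dataclass
class PreqConfig:
    """Illustrative grouping: a group size only makes sense alongside a mode."""

    preq_mode: str  # required, so group size cannot be set on its own
    preq_group_size: int = 32

    def __post_init__(self):
        if self.preq_group_size <= 0:
            raise ValueError("preq_group_size must be positive")


# The parent config would then hold a single optional PreqConfig field,
# making the two legacy options settable only together.
```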

@dataclass
class ModelConfig:
"""
Configurations not necessarily specific to the model, but are needed to

Contributor:
Then why call it ModelConfig?

Contributor:
Maybe call it LoweringConfig? Or ExportConfig, although not all options are export-related.

Contributor (Author):
This is more like: what other modifications do I want to make to the model in eager mode that aren't specific to the model itself (e.g. NOT checkpoint, model architecture, tokenizer) and can be shared across different models.

doesn't actually have anything to do with the kv_cache at the moment.
expand_rope_table: Temporary workaround to expand sin/cos table in head
dim to take vectorized path in optimized kernels.
use_attention_sink: Whether to use attention sink to support multi-round

Contributor:
I would also consider pruning options that we don't intend to support. Attention sink at the moment does not have a performant implementation, and I would rather hide it somewhere than expose it. Reduces maintenance burden.

max_context_length: Maximum of context for the model to remember.
output_dir: Output dir to save the exported .pte file to.
output_name: File name to override the exported .pte file.
so_library: Shared library to specify custom quantized operators.

Contributor:
I don't think we need so_library anymore. Remove it in a follow-up.

output_dir: Output dir to save the exported .pte file to.
output_name: File name to override the exported .pte file.
so_library: Shared library to specify custom quantized operators.
export_only: Whether to stop right after torch.export() and

Contributor:
was this for debug?

Comment on lines +270 to +271
XNNPACK_DYNAMIC = "xnnpack_dynamic"
XNNPACK_DYNAMIC_QC4 = "xnnpack_dynamic_qc4"

Contributor:
we probably can remove these two

Comment on lines +298 to +300
pt2e_quantize: Quantization mode using pt2e, which is an alternative
to TorchAo that uses backend-aware graph mode quantization rather
than source transformation quantization.

Contributor:
I would want to hide these details from users

Comment on lines +377 to +379
class CoreMLQuantize(str, Enum):
B4W = "b4w"
C4W = "c4w"

Contributor:
maybe favor this pattern over the ones in pt2e_quantizer
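
For reference, what makes the (str, Enum) pattern shown above attractive for config values: members construct directly from user-supplied strings and compare equal to their raw values, so they round-trip through JSON cleanly. A minimal illustration:

```python
from enum import Enum


class CoreMLQuantize(str, Enum):
    B4W = "b4w"
    C4W = "c4w"


assert CoreMLQuantize("b4w") is CoreMLQuantize.B4W  # parse from a config string
assert CoreMLQuantize.C4W == "c4w"  # compares equal to its raw string value
```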

class TestValidConstruction(unittest.TestCase):

def test_valid_llm_config(self):
LlmConfig(

Contributor:
Are these configs constructible from JSON? I think that would be the best.
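
A hedged sketch of what JSON-driven construction could look like for nested dataclass configs. This is not part of the PR; the fields and helper below are illustrative, and a real implementation would recurse over all sub-configs (or use a library such as dacite or cattrs):

```python
import json
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class BaseConfig:
    model_class: str = "llama3"
    checkpoint: Optional[str] = None


@dataclass
class LlmConfig:
    base: BaseConfig = field(default_factory=BaseConfig)


def llm_config_from_json(path: str) -> LlmConfig:
    """Hydrate a config from a JSON file, ignoring unknown keys."""
    with open(path) as f:
        raw = json.load(f)
    base_field_names = {f.name for f in fields(BaseConfig)}
    base_kwargs = {k: v for k, v in raw.get("base", {}).items() if k in base_field_names}
    return LlmConfig(base=BaseConfig(**base_kwargs))
```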

Labels
- CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
- fb-exported
- release notes: api: Changes to public facing APIs (any interfaces, pybinded runtime methods, etc.)
- release notes: llm: To capture LLM-specific changes in release notes

Successfully merging this pull request may close these issues.

[etLLM] New config system to export_llama
5 participants