
Debug discrepancy of avg_inference_latency reported from optimum-et models #11650

Open
@guangy10

Description

🐛 Describe the bug

[Screenshot: benchmark dashboard comparing avg_inference_latency between etLLM and optimum-et models]

As highlighted in the screenshot, the avg_inference_latency values reported for the etLLM and optimum-et generated models are inconsistent with each other.

Upon checking the raw results from the CI, I can see that the other latency-related metrics, such as generate_time and tokens_per_sec (TPS), are close between etLLM and optimum-et.

avg_inference_latency is a separate metric, measured in a separate test from the one that reports generate_time and tokens_per_sec (TPS), and it should simply report the latency of the forward() call regardless of whether the model is an LLM or not. Since the reported generate_time and TPS values are very close, I suspect there is a bug in how avg_inference_latency is measured, or in how it is wired to the dashboard.
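For context, here is a minimal sketch (not the actual benchmark harness) of how the two measurements would typically differ, using a generic PyTorch module as a stand-in for the exported model; all names here are illustrative assumptions:

```python
import time
import torch
import torch.nn as nn

# Hypothetical stand-in model; the real issue concerns etLLM / optimum-et exported models.
model = nn.Linear(128, 128).eval()
example_input = torch.randn(1, 128)

# avg_inference_latency: time a single forward() call, averaged over N runs.
N = 100
with torch.no_grad():
    start = time.perf_counter()
    for _ in range(N):
        model(example_input)
    avg_inference_latency_ms = (time.perf_counter() - start) / N * 1000

# generate_time / TPS: time an end-to-end autoregressive loop (here a dummy loop
# of sequential forward() calls standing in for token generation).
num_tokens = 64
with torch.no_grad():
    start = time.perf_counter()
    x = example_input
    for _ in range(num_tokens):
        x = model(x)
    generate_time_s = time.perf_counter() - start
tokens_per_sec = num_tokens / generate_time_s

print(f"avg_inference_latency: {avg_inference_latency_ms:.3f} ms")
print(f"generate_time: {generate_time_s:.3f} s, TPS: {tokens_per_sec:.1f}")
```

If the two exporters produce similar generate_time and TPS but wildly different avg_inference_latency, the forward()-only measurement path (or how its result is uploaded to the dashboard) is the natural place to look for the bug.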

Versions

trunk
