🐛 Describe the bug
As highlighted in the screenshot, the avg_inference_latency values don't make sense when compared between the etLLM and optimum-et generated models. Checking the raw results from the CI, the other latency-related metrics, such as generate_time and tokens_per_sec (TPS), are close between etLLM and optimum-et. avg_inference_latency is a separate metric, measured in a separate test from the one that reports generate_time and tokens_per_sec (TPS); it should simply report the latency of the forward() call, regardless of whether the model is an LLM or not. Since the reported generate_time and TPS are very close, I suspect there is a bug in how avg_inference_latency is measured, or in how it is wired to the dashboard.
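For context, this is a minimal sketch of how a forward()-only latency metric is typically computed, independent of the generation loop; it is not the actual benchmark harness, and the `run_forward` callable and parameter names here are hypothetical stand-ins:

```python
import time
import statistics

def measure_avg_inference_latency(run_forward, warmup=3, iters=20):
    """Time only the forward() call, separate from tokenization/generation.

    `run_forward` is a hypothetical zero-arg callable that executes one
    forward() pass on a fixed input; it stands in for whatever the real
    harness invokes.
    """
    for _ in range(warmup):
        run_forward()  # warm-up runs, not counted

    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_forward()  # a single forward() call
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    # avg_inference_latency in the sense used in this issue:
    # the mean wall-clock time of a forward() call
    return statistics.mean(latencies_ms)
```

If avg_inference_latency is computed roughly like this, it should track closely with the per-token timings implied by generate_time and TPS for both export paths; a large divergence in avg_inference_latency alone points at the measurement itself or the dashboard wiring rather than the models.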
Versions
trunk