Hi, when I use the following command to evaluate Llama-2 7B on wikitext2:
lm_eval --model hf --model_args pretrained=meta-llama/Llama-2-7b-hf --tasks wikitext --device cuda:0 --batch_size 1
The result I get is:
However, the fp16 result I have seen in many papers is 5.47. Another confusing point: for the other tasks such as piqa, winogrande, arc-e, arc-c, ... I get exactly the same results as the papers report. Thanks!
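For reference, here is how I understand most papers compute that 5.47 number (a minimal sketch based on my assumption: token-level perplexity over fixed 2048-token chunks of the concatenated wikitext-2 test set, rather than the word_perplexity metric that lm-eval reports). Please correct me if this is not the setup being compared against:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda:0")
model.eval()

# Concatenate the raw wikitext-2 test split and tokenize it as one long stream.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

seq_len = 2048  # assumed context length; papers may use 4096 for Llama-2
n_chunks = enc.input_ids.size(1) // seq_len

nlls = []
with torch.no_grad():
    for i in range(n_chunks):
        ids = enc.input_ids[:, i * seq_len : (i + 1) * seq_len].to(model.device)
        # Passing labels=input_ids makes HF compute the shifted cross-entropy loss.
        loss = model(ids, labels=ids).loss
        nlls.append(loss.float() * seq_len)

# Perplexity = exp(total NLL / total tokens) over all non-overlapping chunks.
ppl = torch.exp(torch.stack(nlls).sum() / (n_chunks * seq_len))
print(f"token-level perplexity: {ppl.item():.2f}")
```

If the harness instead normalizes by word count (word_perplexity), the number will naturally be much higher than 5.47 even though the model is the same, which may explain the gap I am seeing.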