Hi, when I use the following command to evaluate Llama-2 7B on wikitext2:
lm_eval --model hf --model_args pretrained=meta-llama/Llama-2-7b-hf --tasks wikitext --device cuda:0 --batch_size 1
The result I get is:
However, the fp16 result I have seen in many papers is 5.47. Another confusing point: for the other tasks such as piqa, winogrande, arc-e, arc-c, ... I get exactly the same results as the papers report. Thanks!
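For reference, here is how I understand most papers compute that 5.47 number (a minimal sketch based on my assumption: token-level perplexity over fixed 2048-token chunks of the concatenated wikitext-2 test set, rather than the word_perplexity metric that lm-eval reports). Please correct me if this is not the setup being compared against:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda:0")
model.eval()

# Concatenate the raw wikitext-2 test split and tokenize it as one long stream.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

seq_len = 2048  # assumed context length; papers may use 4096 for Llama-2
n_chunks = enc.input_ids.size(1) // seq_len

nlls = []
with torch.no_grad():
    for i in range(n_chunks):
        ids = enc.input_ids[:, i * seq_len : (i + 1) * seq_len].to(model.device)
        # Passing labels=input_ids makes HF compute the shifted cross-entropy loss.
        loss = model(ids, labels=ids).loss
        nlls.append(loss.float() * seq_len)

# Perplexity = exp(total NLL / total tokens) over all non-overlapping chunks.
ppl = torch.exp(torch.stack(nlls).sum() / (n_chunks * seq_len))
print(f"token-level perplexity: {ppl.item():.2f}")
```

If the harness instead normalizes by word count (word_perplexity), the number will naturally be much higher than 5.47 even though the model is the same, which may explain the gap I am seeing.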