Pull requests: ggml-org/llama.cpp
Draft: #1776 making bos and eos available for user input
#1986 opened Jun 24, 2023 by HashemAlsaket (Draft)
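The PR above concerns letting users supply BOS/EOS tokens directly in their input. As a rough, hypothetical illustration of the general idea (not this PR's implementation), the sketch below replaces literal "<BOS>"/"<EOS>" markers in user text with special-token ids before passing the remaining text to a tokenizer; the marker strings, token ids, and tokenize_words helper are all invented for the example.

```cpp
// Hypothetical sketch: splice user-typed "<BOS>"/"<EOS>" markers into a token
// stream as special-token ids. The token ids and the word-level "tokenizer"
// are placeholders, not llama.cpp's real API.
#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

static const int BOS_ID = 1;  // placeholder id for the beginning-of-sequence token
static const int EOS_ID = 2;  // placeholder id for the end-of-sequence token

// Stand-in for a real tokenizer: map each whitespace-separated word to a fake id.
static std::vector<int> tokenize_words(const std::string &text) {
    std::vector<int> ids;
    std::istringstream iss(text);
    std::string word;
    while (iss >> word) {
        ids.push_back(100 + (int) word.size());  // fake id derived from word length
    }
    return ids;
}

// Replace literal markers with special-token ids, tokenizing the text in between.
static std::vector<int> encode_with_markers(const std::string &input) {
    std::vector<int> out;
    size_t pos = 0;
    while (pos < input.size()) {
        size_t bos  = input.find("<BOS>", pos);
        size_t eos  = input.find("<EOS>", pos);
        size_t next = std::min(bos, eos);
        if (next == std::string::npos) {
            for (int id : tokenize_words(input.substr(pos))) out.push_back(id);
            break;
        }
        for (int id : tokenize_words(input.substr(pos, next - pos))) out.push_back(id);
        out.push_back(next == bos ? BOS_ID : EOS_ID);
        pos = next + 5;  // both markers are 5 characters long
    }
    return out;
}

int main() {
    for (int id : encode_with_markers("<BOS> hello world <EOS>")) {
        std::cout << id << ' ';
    }
    std::cout << '\n';  // prints: 1 105 105 2
    return 0;
}
```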
Added Arbitrary mixed quantization
Labels: Less than 4 bits (Efforts related to viable quantized models using <4 bits), research 🔬
#1834 opened Jun 13, 2023 by Milkdrop
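The "Less than 4 bits" label marks work toward usable sub-4-bit quantization. As a minimal sketch of the underlying idea only (not this PR's mixed-quantization scheme), the code below quantizes a small block of floats to signed 3-bit integers sharing one scale, then dequantizes them; the block contents, range, and rounding are chosen purely for illustration.

```cpp
// Minimal sketch of symmetric 3-bit block quantization: a block of floats is
// stored as signed 3-bit integers (-4..3) plus one shared float scale.
// The layout and rounding are illustrative, not llama.cpp's actual formats.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

struct Block3Bit {
    float scale;              // shared scale for the whole block
    std::vector<int8_t> q;    // quantized values in [-4, 3]
};

static Block3Bit quantize_block(const std::vector<float> &x) {
    float amax = 0.0f;
    for (float v : x) amax = std::max(amax, std::fabs(v));
    Block3Bit b;
    b.scale = amax / 4.0f;    // map the largest magnitude onto the 3-bit range
    for (float v : x) {
        int q = b.scale > 0.0f ? (int) std::lround(v / b.scale) : 0;
        q = std::max(-4, std::min(3, q));   // clamp to the signed 3-bit range
        b.q.push_back((int8_t) q);
    }
    return b;
}

static std::vector<float> dequantize_block(const Block3Bit &b) {
    std::vector<float> out;
    for (int8_t q : b.q) out.push_back(b.scale * q);
    return out;
}

int main() {
    std::vector<float> x = {0.10f, -0.42f, 0.31f, 0.05f, -0.17f, 0.26f, -0.33f, 0.40f};
    Block3Bit b = quantize_block(x);
    std::vector<float> y = dequantize_block(b);
    for (size_t i = 0; i < x.size(); ++i) {
        std::printf("%+.2f -> %+.2f\n", x[i], y[i]);
    }
    return 0;
}
```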
Create run.py
Labels: enhancement (New feature or request), obsolete? (Marker for potentially obsolete PR), python (python script changes), Review Complexity: Low (Trivial changes to code that most beginner devs, or those who want a break, can tackle, e.g. a UI fix), script (Script related)
#1204 opened Apr 27, 2023 by jdpsl
Add an option to force the end-of-text token to appear even in interactive mode, and also show loading percentage
#1058 opened Apr 19, 2023 by jeffersoncgo
Add command mode to interactive mode.
Labels: enhancement (New feature or request), Review Complexity: Medium (Generally requires more time to grok, but manageable at beginner to medium expertise level)
#1022 opened Apr 17, 2023 by wbpxre150
Run several single-threaded operators in parallel
Labels: threading (Parallel processing and thread management)
#850 opened Apr 8, 2023 by howard0su
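This entry is about scheduling independent single-threaded operators concurrently. Purely as a generic illustration of that idea (not ggml's actual graph scheduler), the sketch below launches two independent operators with std::async and joins them before a dependent step.

```cpp
// Generic sketch: run independent single-threaded operators concurrently and
// join before a dependent operator runs. This illustrates the idea only; it
// is not ggml's scheduler.
#include <functional>
#include <future>
#include <iostream>
#include <numeric>
#include <vector>

// A toy "operator": sums a vector on one thread.
static double op_sum(const std::vector<double> &v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}

int main() {
    std::vector<double> a(1'000'000, 1.0);
    std::vector<double> b(1'000'000, 2.0);

    // The two sums have no data dependency, so they can run in parallel.
    auto fa = std::async(std::launch::async, op_sum, std::cref(a));
    auto fb = std::async(std::launch::async, op_sum, std::cref(b));

    // The dependent operator waits for both results.
    double total = fa.get() + fb.get();
    std::cout << "total = " << total << '\n';  // 3,000,000
    return 0;
}
```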
Q4_0 scale selection using RMSE
Labels: enhancement (New feature or request), Less than 4 bits (Efforts related to viable quantized models using <4 bits), research 🔬, Review Complexity: High (Generally requires in-depth knowledge of LLMs or GPUs)
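RMSE-based scale selection means choosing a block's quantization scale by minimizing the root-mean-square error between the original and reconstructed values, rather than simply dividing the largest magnitude by the top of the quantized range. The sketch below illustrates that with a small grid search over candidate scales for a Q4_0-like signed 4-bit range; the candidate grid, guard values, and sample data are assumptions for illustration, not the PR's algorithm.

```cpp
// Sketch: pick a quantization scale by RMSE grid search instead of the naive
// amax/7 choice. The signed 4-bit range [-8, 7] mirrors a Q4_0-like format;
// the candidate grid is arbitrary and only for illustration.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// RMSE of quantizing x with a given scale into the signed 4-bit range [-8, 7].
static double rmse_for_scale(const std::vector<float> &x, float scale) {
    double err2 = 0.0;
    for (float v : x) {
        int q = (int) std::lround(v / scale);
        q = std::max(-8, std::min(7, q));
        double d = v - scale * q;
        err2 += d * d;
    }
    return std::sqrt(err2 / x.size());
}

// Try scales around the naive amax-based choice and keep the one with lowest RMSE.
static float select_scale_rmse(const std::vector<float> &x) {
    float amax = 0.0f;
    for (float v : x) amax = std::max(amax, std::fabs(v));
    float naive = amax / 7.0f;
    if (naive <= 0.0f) return 1.0f;               // degenerate all-zero block
    float best = naive;
    double best_err = rmse_for_scale(x, naive);
    for (float f = 0.80f; f <= 1.20f; f += 0.01f) {   // +/-20% grid around naive
        float s = naive * f;
        double e = rmse_for_scale(x, s);
        if (e < best_err) { best_err = e; best = s; }
    }
    return best;
}

int main() {
    std::vector<float> x = {0.9f, -0.1f, 0.3f, -0.7f, 0.2f, 0.5f, -0.4f, 0.05f};
    float naive = 0.9f / 7.0f;                    // amax of x is 0.9
    float best  = select_scale_rmse(x);
    std::printf("naive scale %.4f -> rmse %.4f\n", naive, rmse_for_scale(x, naive));
    std::printf("best  scale %.4f -> rmse %.4f\n", best,  rmse_for_scale(x, best));
    return 0;
}
```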
Optimize locking behavior
Labels: threading (Parallel processing and thread management)
#813 opened Apr 6, 2023 by janekb04
Add "-e"/"--eval-threads" to distinguish thread counts for single-token eval and prompt eval
Labels: threading (Parallel processing and thread management)
#744 opened Apr 3, 2023 by MagisterLuddite (Draft)
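The proposed flag separates the thread count used for batched prompt evaluation from the one used for single-token generation, since the two phases parallelize very differently. As a generic sketch of that split (the struct, function names, and defaults are invented, and this is not the PR's code), the code below picks a thread count per step from the batch size.

```cpp
// Generic sketch: keep separate thread counts for prompt processing (many
// tokens per step) and single-token generation (one token per step), and pick
// one per evaluation call based on the batch size. Names are invented.
#include <cstdio>

struct eval_params {
    int n_threads_prompt = 8;  // threads for batched prompt evaluation
    int n_threads_gen    = 4;  // threads for single-token evaluation
};

// Choose the thread count for this evaluation step.
static int threads_for_batch(const eval_params &p, int n_tokens) {
    return n_tokens > 1 ? p.n_threads_prompt : p.n_threads_gen;
}

int main() {
    eval_params p;
    std::printf("prompt step (512 tokens): %d threads\n", threads_for_batch(p, 512));
    std::printf("generation step (1 token): %d threads\n", threads_for_batch(p, 1));
    return 0;
}
```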