# llama.cpp

[License: MIT](https://opensource.org/licenses/MIT)
[Roadmap](https://github.com/users/ggerganov/projects/7) / [Project status](https://github.com/ggerganov/llama.cpp/discussions/3471) / [Manifesto](https://github.com/ggerganov/llama.cpp/discussions/205) / [ggml](https://github.com/ggerganov/ggml)
Inference of the [LLaMA](https://arxiv.org/abs/2302.13971) model in pure C/C++
### Hot topics
- *No hot topics at the moment. Open to suggestions about what is hot today.*
----
<details>
<summary>Table of Contents</summary>
<ol>
<li>
<a href="#description">Description</a>
</li>
<li>
<a href="#usage">Usage</a>
<ul>
<li><a href="#get-the-code">Get the Code</a></li>
<li><a href="#build">Build</a></li>
<li><a href="#blas-build">BLAS Build</a></li>
<li><a href="#prepare-data--run">Prepare Data & Run</a></li>
<li><a href="#memorydisk-requirements">Memory/Disk Requirements</a></li>
<li><a href="#quantization">Quantization</a></li>
<li><a href="#interactive-mode">Interactive mode</a></li>
<li><a href="#constrained-output-with-grammars">Constrained output with grammars</a></li>
<li><a href="#instruction-mode-with-alpaca">Instruction mode with Alpaca</a></li>
<li><a href="#using-openllama">Using OpenLLaMA</a></li>
<li><a href="#using-gpt4all">Using GPT4All</a></li>
<li><a href="#using-pygmalion-7b--metharme-7b">Using Pygmalion 7B & Metharme 7B</a></li>
<li><a href="#obtaining-the-facebook-llama-original-model-and-stanford-alpaca-model-data">Obtaining the Facebook LLaMA original model and Stanford Alpaca model data</a></li>
<li><a href="#verifying-the-model-files">Verifying the model files</a></li>
<li><a href="#seminal-papers-and-background-on-the-models">Seminal papers and background on the models</a></li>
<li><a href="#perplexity-measuring-model-quality">Perplexity (measuring model quality)</a></li>
<li><a href="#android">Android</a></li>
<li><a href="#docker">Docker</a></li>
</ul>
</li>
<li><a href="#contributing">Contributing</a></li>
<li><a href="#coding-guidelines">Coding guidelines</a></li>
<li><a href="#docs">Docs</a></li>
</ol>
</details>
## Description
The main goal of `llama.cpp` is to run the LLaMA model using 4-bit integer quantization on a MacBook.
- Plain C/C++ implementation without dependencies
- Apple silicon first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks
- AVX, AVX2 and AVX512 support for x86 architectures
- Mixed F16 / F32 precision
- 2-bit, 3-bit, 4-bit, 5-bit, 6-bit and 8-bit integer quantization support (a build-and-quantize example is sketched below)
- CUDA, Metal and OpenCL GPU backend support
The original implementation of `llama.cpp` was [hacked in an evening](https://github.com/ggerganov/llama.cpp/issues/33#issuecomment-1465108022).
Since then, the project has improved significantly thanks to many contributions. This project is mainly for educational purposes and serves
as the main playground for developing new features for the [ggml](https://github.com/ggerganov/ggml) library.
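The typical workflow is to build the project, quantize a full-precision GGUF model down to one of the integer formats listed above, and then run inference on the quantized file. Here is a minimal sketch of that workflow; the model paths and the `7B` directory are placeholders, and the CUDA line assumes the CUDA toolkit is installed:

```bash
# build for CPU; on Apple silicon, Accelerate is used and Metal is enabled by default in recent versions
make

# alternatively, build with the cuBLAS backend for NVIDIA GPUs
make LLAMA_CUBLAS=1

# quantize an F16 GGUF model to 4-bit (Q4_0)
./quantize ./models/7B/ggml-model-f16.gguf ./models/7B/ggml-model-q4_0.gguf q4_0

# run inference on the quantized model
./main -m ./models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 128
```

See the Usage sections below for the full set of build options, model preparation steps and runtime flags.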
**Supported platforms:**
- [X] macOS
- [X] Linux
- [X] Windows (via CMake)
- [X] Docker
**Supported models:**
- [X] LLaMA 🦙
- [X] LLaMA 2 🦙🦙
- [X] Falcon
- [X] [Alpaca](https://github.com/ggerganov/llama.cpp#instruction-mode-with-alpaca)
- [X] [GPT4All](https://github.com/ggerganov/llama.cpp#using-gpt4all)
- [X] [Chinese LLaMA / Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) and [Chinese LLaMA-2 / Alpaca-2](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2)
- [X] [Vigogne (French)](https://github.com/bofenghuang/vigogne)
- [X] [Vicuna](https://github.com/ggerganov/llama.cpp/discussions/643#discussioncomment-5533894)
- [X] [Koala](https://bair.berkeley.edu/blog/2023/04/03/koala/)
- [X] [OpenBuddy 🐶 (Multilingual)](https://github.com/OpenBuddy/OpenBuddy)
- [X] [Pygmalion/Metharme](#using-pygmalion-7b--metharme-7b)
- [X] [WizardLM](https://github.com/nlpxucan/WizardLM)
- [X] [Baichuan 1 & 2](https://huggingface.co/models?search=baichuan-inc/Baichuan) + [derivations](https://huggingface.co/hiyouga/baichuan-7b-sft)
- [X] [Aquila 1 & 2](https://huggingface.co/models?search=BAAI/Aquila)
- [X] [Starcoder models](https://github.com/ggerganov/llama.cpp/pull/3187)
- [X] [Mistral AI v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- [X] [Refact](https://huggingface.co/smallcloudai/Refact-1_6B-fim)
- [X] [Persimmon 8B](https://github.com/ggerganov/llama.cpp/pull/3410)
- [X] [MPT](https://github.com/ggerganov/llama.cpp/pull/3417)
- [X] [Bloom](https://github.com/ggerganov/llama.cpp/pull/3553)
- [X] [StableLM-3b-4e1t](https://github.com/ggerganov/llama.cpp/pull/3586)
**Bindings:**
- Python: [abetlen/llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
- Go: [go-skynet/go-llama.cpp](https://github.com/go-skynet/go-llama.cpp)
- Node.js: [withcatai/node-llama-cpp](https://github.com/withcatai/node-llama-cpp)
- Ruby: [yoshoku/llama_cpp.rb](https://github.com/yoshoku/llama_cpp.rb)
- Rust: [mdrokz/rust-llama.cpp](https://github.com/mdrokz/rust-llama.cpp)
- C#/.NET: [SciSharp/LLamaSharp](https://github.com/SciSharp/LLamaSharp)
- Scala 3: [donderom/llm4s](https://github.com/donderom/llm4s)
- Clojure: [phronmophobic/llama.clj](https://github.com/phronmophobic/llama.clj)
- React Native: [mybigday/llama.rn](https://github.com/mybigday/llama.rn)
- Java: [kherud/java-llama.cpp](https://github.com/kherud/java-llama.cpp)
**UI:**
- [nat/openplayground](https://github.com/nat/openplayground)
- [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui)
- [withcatai/catai](https://github.com/withcatai/catai)
---
Here is a typical run using LLaMA v2 13B on an M2 Ultra:
```console
$ make -j && ./main -m models/llama-13b-v2/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e
I llama.cpp build info:
I UNAME_S: Darwin
I UNAME_P: arm
I UNAME_M: arm64
I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -pthread -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./common -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -DGGML_USE_K_QUANTS
I LDFLAGS: -framework Accelerate
I CC: Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX: Apple clang version 14.0.3 (clang-1403.0.22.14.1)
make: Nothing to be done for `default'.
main: build = 1041 (cf658ad)
main: seed = 1692823051
llama_model_loader: loaded meta data with 16 key-value pairs and 363 tensors from models/llama-13b-v2/ggml-model-q4_0.gguf (version GGUF V1 (latest))
llama_model_loader: - type f32: 81 tensors
llama_model_loader: - type q4_0: 281 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_print_meta: format = GGUF V1 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_ctx = 512
llm_load_print_meta: n_embd = 5120
llm_load_print_meta: n_head = 40
llm_load_print_meta: n_head_kv = 40
llm_load_print_meta: n_layer = 40
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff = 13824
llm_load_print_meta: freq_base = 10000.0
llm_load_print_meta: freq_scale = 1
llm_load_print_meta: model type = 13B
llm_load_print_meta: model ftype = mostly Q4_0
llm_load_print_meta: m