author flu0r1ne <flu0r1ne@flu0r1ne.net> 2023-11-01 20:46:01 -0500
committer flu0r1ne <flu0r1ne@flu0r1ne.net> 2023-11-01 20:46:01 -0500
commit af5a2996234768921b81d96ffaae00cb88229862
tree 5b2a688582652fc8080616ccc0de162198aa8ee0 /README.md
Initial commit (HEAD, main)
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 31
1 files changed, 31 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..bc573b4
--- /dev/null
+++ b/README.md
@@ -0,0 +1,31 @@
+MyLlama2
+========
+
+This is a ground-up reimplementation of the Llama 2 family of large language models.
+It follows the same architecture: a decoder-only transformer with Grouped-Query
+Attention (GQA), key-value caching, SwiGLU feedforward layers, and a SentencePiece
+tokenizer. It is compatible with the original Llama 2 weights.
+Unlike Meta's model, this implementation does not use model-parallel layers or any
+distributed processing APIs. Consequently, it runs on a single GPU or on a CPU,
+without the need for special launchers (e.g., `torchrun`, `mpi`, etc.).
+
+This model was created for demonstration purposes, with the intent of sharing it with the
+community. During its development, I identified a few minor issues in FAIR's
+implementation, which I plan to contribute back through pull requests. I believe this
+implementation is more accessible for those new to AI, and I've included references to the papers
+where these concepts were first introduced. However, this code has not been extensively reviewed.
+For production projects, I recommend starting with [Meta's implementation](https://github.com/facebookresearch/llama).
+For high-performance CPU-only inference, consider compiling
+[llama.cpp](https://github.com/ggerganov/llama.cpp) while targeting the native architecture.
+
+Example usage:
+--------------
+
+```bash
+python inference_example.py \
+    llama/llama-2-7b \
+    ./tokenizer.model \
+    --top_p 0.8 \
+    --max_generation_length 100 \
+    --context "Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal."
+```
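
Note on `--top_p`: the README does not describe the sampling procedure, so the following is only a minimal sketch that assumes the flag denotes conventional nucleus (top-p) sampling. The helper names `top_p_filter` and `sample_top_p` are hypothetical and are not taken from this repository. The sketch uses plain Python lists in place of tensors.

```python
import random


def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches top_p, and renormalize over that set.

    Hypothetical helper: assumes `probs` is a list of probabilities summing to ~1.
    Returns a dict mapping token index -> renormalized probability.
    """
    # Rank token indices from most to least probable.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)

    kept = []
    cumulative = 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus now covers at least top_p of the mass

    total = sum(p for _, p in kept)
    return {index: p / total for index, p in kept}


def sample_top_p(probs, top_p, rng=random):
    """Draw one token index from the renormalized nucleus (hypothetical helper)."""
    nucleus = top_p_filter(probs, top_p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_probs = [0.5, 0.3, 0.15, 0.05]  # toy next-token distribution
    print(top_p_filter(toy_probs, 0.7))
    print(sample_top_p(toy_probs, 0.7))
```

Under this assumption, a lower `top_p` restricts generation to the most likely tokens, while `top_p` of 1.0 leaves the distribution untouched.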