This article was first published under a community exclusivity agreement. Reprinting is prohibited for 30 days, and after 30 days without authorization; violations will be pursued!
With the explosive rise of ChatGPT, large models based on the Transformer architecture have moved from behind the scenes to center stage. But ChatGPT's success did not happen overnight: it evolved from the early GPT-1 to GPT-2, then to GPT-3 and InstructGPT, on to GPT-3.5 and ChatGPT, and finally to today's multimodal large model GPT-4.
However, for the series of works after GPT-3, OpenAI did not open-source its models, so we have no way to study the underlying principles ourselves. GPT-2, as one of the ancestors of the GPT series, is open source. This article therefore uses Megatron-LM to pretrain a GPT-2 model. To keep the article readable, the full scripts and code are hosted on GitHub: llm-action.
Setting up the runtime environment
The base environment is configured as follows:
- OS: Ubuntu 18.04
- CPUs: a single node with 384 GB of RAM and 2 physical Intel CPUs, 20 cores each
- GPUs: 4x A800 80GB
- Python: 3.8.10
- NVIDIA driver: 525.105.17 (choose the driver matching your GPU model on NVIDIA's download page)
- CUDA Toolkit: 12.1
To reproduce the entire GPT-2 pretraining process quickly, this article builds the runtime environment from NVIDIA's official Docker image.
First, pull the corresponding PyTorch image from NVIDIA:
docker pull nvcr.io/nvidia/pytorch:23.04-py3
Once the image has been pulled, create a container for the training environment.
docker run -dt --name nvidia_pytorch_env --restart=always --gpus all \
--network=host \
--shm-size 4G \
-v /home/gdong/workspace:/workspace \
-w /workspace \
nvcr.io/nvidia/pytorch:23.04-py3 \
/bin/bash
Then enter the container to prepare the code, model, and data.
docker exec -it nvidia_pytorch_env bash
Preparing the code
Download the Megatron-LM source code, then switch to the corresponding commit id:
git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
git checkout 992da75
Preparing the model weights and vocabulary
Download the GPT-2 weights:
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_lm_345m/versions/v0.0/zip -O megatron_lm_345m_v0.0.zip
After extraction, the file layout looks like this:
> tree -h megatron
megatron
├── [   8]  latest_checkpointed_iteration.txt
└── [4.0K]  release
    └── [4.0K]  mp_rank_00
        └── [677M]  model_optim_rng.pt

2 directories, 2 files
> cat megatron/latest_checkpointed_iteration.txt
release
Download the GPT-2 vocabulary and merges files:
https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-vocab.json
https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-merges.txt
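The two files above can be fetched with wget; the target directory below is an assumption chosen to match the VOCAB_FILE/MERGE_FILE paths used later, so adjust it as needed.

```shell
# Download the GPT-2 vocabulary and BPE merges files
# (the /workspace/model/gpt2-vocab directory is assumed; adjust to taste)
mkdir -p /workspace/model/gpt2-vocab
cd /workspace/model/gpt2-vocab
wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-vocab.json
wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-merges.txt
```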
Preparing the dataset
To train GPT-2, Megatron-LM uses the publicly available OpenWebText libraries from jcpeterson and eukaryote31 to download URLs, then filters, cleans, and deduplicates the downloaded content following the process described in the openwebtext directory. From the content behind Reddit URLs up to October 2018, this yields roughly 37 GB of text.
Below we prepare the training data following the openwebtext documentation in Megatron-LM.
First, install the dependency libraries.
pip install ftfy langdetect numpy torch pandas nltk sentencepiece boto3 tqdm regex bs4 newspaper3k htmlmin tldextract -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
Then, install LSH.
git clone https://github.com/mattilyra/LSH
cd LSH
git checkout a57069b
python setup.py install
Since the Python version used here is 3.8.10, there is an incompatibility: the install fails with errors, and you simply edit the code as the error messages suggest.
In lsh/cMinhash.cpp:
- change exc_type to curexc_type
- change exc_value to curexc_value
- change exc_traceback to curexc_traceback
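The three renames can also be applied with a single sed one-liner, equivalent to the manual edits above (run from the LSH checkout):

```shell
# Rename the PyThreadState fields that newer CPython renamed;
# \b word boundaries leave any already-correct curexc_* names untouched.
sed -i -E 's/\b(exc_type|exc_value|exc_traceback)\b/cur\1/g' lsh/cMinhash.cpp
```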
Once installed, download the deduplicated URLs from jcpeterson and place them in the urls directory. Since there are many files, only one URL file is downloaded here for demonstration.
> mkdir urls
> tree -h urls/
urls/
└── [5.3M]  RS_2011-01.bz2.deduped.txt

0 directories, 1 file
Then, remove the blacklisted URLs.
# python blacklist_urls.py <path to the downloaded deduplicated URLs> <filename for clean urls, e.g. clean_urls.txt>
python3 blacklist_urls.py ./urls clean_urls.txt
# keep only the first 100 cleaned URLs
head -n100 clean_urls.txt >> clean_urls_100.txt
Next, use the openwebtext utilities to download content from the cleaned URLs. You need to change the defaults of the --sqlite_meta and --save_uncompressed options in download.py to False and True respectively. Then, after running python3 openwebtext/download.py clean_urls.txt, a scraped folder is produced, and the text downloaded from each URL is saved in its data subfolder.
# ef42b51
git clone https://github.com/yet-another-account/openwebtext.git
# vim openwebtext/download.py
python3 openwebtext/download.py ./Megatron-LM/tools/openwebtext/clean_urls.txt --output_dir /workspace/code/scraped
When the download finishes, the layout looks like this:
> tree -h /workspace/code/scraped
/workspace/code/scraped
├── [304K]  data
│   ├── [ 176]  0000300-ab9ff12f7658b8764a413bf58d58bc48b866b0c163ce5c0442296dce46ff0ff8.txt
│   ├── ...
│   └── [ 634]  0009896-6e15400f49434b3dbf9421a8f342f80f26c1e901f78f6350d4b738f58c456bdd.txt
└── [296K]  meta
    ├── [ 154]  0001000-ab50f2cd5366369108d58d6e4eb77e8c4babf56e634a33dcd880597684109fc4.json
    ├── ...
    └── [ 224]  0009896-6e15400f49434b3dbf9421a8f342f80f26c1e901f78f6350d4b738f58c456bdd.json

2 directories, 4860 files
The file contents look like this:
# the meta subfolder stores metadata
> cat /workspace/code/scraped/meta/0009896-6e15400f49434b3dbf9421a8f342f80f26c1e901f78f6350d4b738f58c456bdd.json
{"url": "http://minnesotaindependent.com/74302/bachmann-says-transportation-projects-shouldnt-count-as-earmarks", "word_count": 73, "elapsed": 3.2160894870758057, "scraper": "newspaper", "domain": "minnesotaindependent.com"}
# the data subfolder stores the text data
> cat /workspace/code/scraped/data/0009896-6e15400f49434b3dbf9421a8f342f80f26c1e901f78f6350d4b738f58c456bdd.txt
Der eigene Bodenwischer ist der wichtigste Begleiter im täglichen Haushalt. Ob für Parkett, Fliesen oder Laminat: Qualität, Ausstattung und Preis spielen bei der Kaufentscheidung eine große Rolle.
...
Bodenwischer für …
Merge the text files in the data subfolder into a single JSON file.
python3 Megatron-LM/tools/openwebtext/merge_data.py --data_path /workspace/code/scraped/data --output_file /workspace/data/merged_output.json
The merged file format looks like this:
> head -n6 /workspace/data/merged_output.json
{"text": "With every new year, it's murder for Neal Smither and his crew.\n"}
{"text": "\n"}
{"text": "Suicide, too.\n"}
{"text": "\n"}
{"text": "As owner of Crime Scene Cleaners, Smither's job is to clean up the bloody messes left behind when people kill each other or themselves - and those first few weeks after Jan. 1 are his busiest time of year.\n"}
{"text": "\n"}
Data cleaning
Run ftfy, perform English-language detection, and remove documents with fewer than 128 tokens.
python3 cleanup_dataset.py /workspace/data/merged_output.json /workspace/data/merged_cleand.json
Comparing the data before and after cleaning:
> wc -l merged_output.json
78802 merged_output.json
> wc -l merged_cleand.json
2456 merged_cleand.json
Then, shuffle the cleaned dataset.
shuf /workspace/data/merged_cleand.json -o /workspace/data/train_data.json
Data preprocessing
Next, preprocess the training dataset.
python tools/preprocess_data.py \
--input /workspace/data/train_data.json \
--output-prefix /workspace/data/my-gpt2 \
--vocab-file /workspace/model/gpt2-vocab/gpt2-vocab.json \
--dataset-impl mmap \
--tokenizer-type GPT2BPETokenizer \
--merge-file /workspace/model/gpt2-vocab/gpt2-merges.txt \
--append-eod \
--workers 20 \
--chunk-size 25
The output files are named my-gpt2_text_document.bin and my-gpt2_text_document.idx. When training GPT-2, pass the name without the extension as --data-path.
Now all the preparation work is done; next we start training the model.
Model training
Single-GPU training
Below, edit the examples/pretrain_gpt.sh script, configuring the checkpoint path (CHECKPOINT_PATH), vocabulary file path (VOCAB_FILE), merges file path (MERGE_FILE), dataset path (DATA_PATH), and so on:
#!/bin/bash
# Runs the "345M" parameter model
export CUDA_DEVICE_MAX_CONNECTIONS=1
CHECKPOINT_PATH=/workspace/model/megatron-models/345m
VOCAB_FILE=/workspace/model/gpt2-vocab/gpt2-vocab.json
MERGE_FILE=/workspace/model/gpt2-vocab/gpt2-merges.txt
DATA_PATH=/workspace/data/my-gpt2_text_document
MODEL_PATH=/workspace/model/megatron-models/output
# model hyperparameters
GPT_ARGS="
--num-layers 24 \
--hidden-size 1024 \
--num-attention-heads 16 \
--seq-length 1024 \
--max-position-embeddings 1024 \
--micro-batch-size 1 \
--global-batch-size 2 \
--lr 0.00015 \
--train-iters 5000 \
--lr-decay-iters 320000 \
--lr-decay-style cosine \
--min-lr 1.0e-5 \
--weight-decay 1e-2 \
--lr-warmup-fraction .01 \
--clip-grad 1.0 \
--fp16
"
# dataset and vocabulary path arguments
DATA_ARGS="
--data-path $DATA_PATH \
--vocab-file $VOCAB_FILE \
--merge-file $MERGE_FILE \
--data-impl mmap \
--split 700,200,100
"
# checkpoint output, evaluation, and logging arguments
OUTPUT_ARGS="
--log-interval 100 \
--save-interval 10000 \
--eval-interval 1000 \
--eval-iters 10
"
# launch the training job
torchrun pretrain_gpt.py \
$GPT_ARGS \
$DATA_ARGS \
$OUTPUT_ARGS \
--save $MODEL_PATH \
--load $CHECKPOINT_PATH
Then run the following to start training:
CUDA_VISIBLE_DEVICES=3 sh examples/pretrain_gpt.sh
After training completes, the checkpoint output looks like this:
> tree -h 345m
345m
├── [4.0K]  iter_0005000
│   └── [4.0K]  mp_rank_00
│       └── [4.6G]  model_optim_rng.pt
└── [   4]  latest_checkpointed_iteration.txt
> cat 345m/latest_checkpointed_iteration.txt
5000
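As a sanity check, the GPT_ARGS above (24 layers, hidden size 1024, 1024 positions) together with the padded vocabulary of 50,304 that Megatron logs imply the "345M" parameter count. A sketch of the arithmetic, assuming GPT-2's learned position embeddings, bias terms, and a tied output embedding:

```python
def gpt2_param_count(layers: int, hidden: int, vocab: int, positions: int) -> int:
    """Back-of-the-envelope GPT-2 parameter count."""
    # token + position embeddings (the output projection is tied to the token embedding)
    embeddings = vocab * hidden + positions * hidden
    per_block = (
        hidden * 3 * hidden + 3 * hidden    # fused QKV projection + bias
        + hidden * hidden + hidden          # attention output projection + bias
        + hidden * 4 * hidden + 4 * hidden  # MLP up-projection + bias
        + 4 * hidden * hidden + hidden      # MLP down-projection + bias
        + 2 * 2 * hidden                    # two layernorms (scale + shift)
    )
    final_layernorm = 2 * hidden
    return embeddings + layers * per_block + final_layernorm

print(gpt2_param_count(layers=24, hidden=1024, vocab=50304, positions=1024))
# -> 354871296, the same count Megatron reports when loading the model
```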
Besides single-GPU training, we can also train on multiple GPUs. Below we demonstrate training with 4-GPU data parallelism, 4-GPU tensor parallelism, 4-GPU pipeline parallelism, and multi-dimensional hybrid parallelism (2-GPU tensor parallelism + 2-GPU pipeline parallelism).
Data-parallel training (4DP)
Below we train with 4-way data parallelism (4DP) by running the pretrain_gpt_distributed.sh script.
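The distributed scripts are variations on the single-GPU launch: they add a torchrun process-group spec and, for the TP/PP variants, Megatron's model-parallel size flags. A sketch under assumed values, not the verbatim script contents:

```shell
GPUS_PER_NODE=4
DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE --nnodes 1 \
                  --node_rank 0 --master_addr localhost --master_port 6000"

# 4DP: four ranks and no model-parallel flags (data parallelism is the default).
# 4TP would add      --tensor-model-parallel-size 4
# 4PP would add      --pipeline-model-parallel-size 4
# 2TP+2PP would add  both flags with value 2
torchrun $DISTRIBUTED_ARGS pretrain_gpt.py \
    $GPT_ARGS $DATA_ARGS $OUTPUT_ARGS \
    --distributed-backend nccl \
    --save $MODEL_PATH \
    --load $CHECKPOINT_PATH
```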
After training completes, the checkpoint output:
tree -h /workspace/model/megatron-models/345m-init-4tp
/workspace/model/megatron-models/345m-init-4tp
├── [4.0K]  iter_0002000
│   ├── [4.0K]  mp_rank_00
│   │   └── [1.2G]  model_optim_rng.pt
...
│   └── [4.0K]  mp_rank_03
│       └── [1.2G]  model_optim_rng.pt
└── [   4]  latest_checkpointed_iteration.txt

10 directories, 9 files
> cat /workspace/model/megatron-models/345m-init-4tp/latest_checkpointed_iteration.txt
2000
GPU memory usage during training:
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3227288 C /usr/bin/python 9652MiB |
| 1 N/A N/A 3227289 C /usr/bin/python 9652MiB |
| 2 N/A N/A 3227290 C /usr/bin/python 9652MiB |
| 3 N/A N/A 3227291 C /usr/bin/python 9652MiB |
+-----------------------------------------------------------------------------+
Model-parallel training (4PP)
Below we train with 4-way pipeline parallelism (4PP) using the pretrain_gpt_distributed_with_4pp.sh script.
After training completes, the checkpoint output:
> tree -h /workspace/model/megatron-models/345m-init-4pp
/workspace/model/megatron-models/345m-init-4pp
├── [4.0K]  iter_0002000
│   ├── [4.0K]  mp_rank_00_000
│   │   └── [1.7G]  model_optim_rng.pt
│   ├── [4.0K]  mp_rank_00_001
│   │   └── [1009M]  model_optim_rng.pt
│   ├── [4.0K]  mp_rank_00_002
│   │   └── [1009M]  model_optim_rng.pt
│   └── [4.0K]  mp_rank_00_003
│       └── [1.7G]  model_optim_rng.pt
└── [   4]  latest_checkpointed_iteration.txt
> cat /workspace/model/megatron-models/345m-init-4pp/latest_checkpointed_iteration.txt
2000
GPU memory usage during training:
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2630871 C /usr/bin/python 8680MiB |
| 1 N/A N/A 2630872 C /usr/bin/python 6408MiB |
| 2 N/A N/A 2630873 C /usr/bin/python 5080MiB |
| 3 N/A N/A 2630874 C /usr/bin/python 5436MiB |
+-----------------------------------------------------------------------------+
Model-parallel training (4TP)
Below we train with 4-way tensor parallelism (4TP) using the pretrain_gpt_distributed_with_4tp.sh script.
After training completes, the checkpoint output:
tree -h /workspace/model/megatron-models/345m-init-4tp
/workspace/model/megatron-models/345m-init-4tp
├── [4.0K]  iter_0002000
│   ├── [4.0K]  mp_rank_00
│   │   └── [1.2G]  model_optim_rng.pt
...
│   └── [4.0K]  mp_rank_03
│       └── [1.2G]  model_optim_rng.pt
└── [   4]  latest_checkpointed_iteration.txt
> cat /workspace/model/megatron-models/345m-init-4tp/latest_checkpointed_iteration.txt
2000
GPU memory usage during training:
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3895346 C /usr/bin/python 4236MiB |
| 1 N/A N/A 3895347 C /usr/bin/python 4176MiB |
| 2 N/A N/A 3895348 C /usr/bin/python 4168MiB |
| 3 N/A N/A 3895349 C /usr/bin/python 4176MiB |
+-----------------------------------------------------------------------------+
Model-parallel training (2TP+2PP)
Below we train with 2-way tensor parallelism plus 2-way pipeline parallelism by running the pretrain_gpt_distributed_with_mp.sh script.
After training completes, the checkpoint output:
> tree -h 345m-init-mp
345m-init-mp
├── [4.0K]  iter_0005000
│   ├── [4.0K]  mp_rank_00_000
│   │   └── [1.3G]  model_optim_rng.pt
│   ├── [4.0K]  mp_rank_00_001
│   │   └── [1.3G]  model_optim_rng.pt
│   ├── [4.0K]  mp_rank_01_000
│   │   └── [1.3G]  model_optim_rng.pt
│   └── [4.0K]  mp_rank_01_001
│       └── [1.3G]  model_optim_rng.pt
└── [   4]  latest_checkpointed_iteration.txt
GPU memory usage during training:
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3448098 C /usr/bin/python 8732MiB |
| 1 N/A N/A 3448099 C /usr/bin/python 8732MiB |
| 2 N/A N/A 3448100 C /usr/bin/python 6828MiB |
| 3 N/A N/A 3448101 C /usr/bin/python 7078MiB |
+-----------------------------------------------------------------------------+
Merging model weights
Merging a model trained with distributed parallelism can make it easier to use on fewer GPUs.
The merge is done with the following script. This example reads a GPT model trained with 2TP and 2PP model parallelism and writes out a model with 1TP and 1PP.
python tools/checkpoint_util.py \
--model-type GPT \
--load-dir /workspace/model/megatron-models/345m-init-mp \
--save-dir /workspace/model/megatron-models/345m-init-mp-out \
--target-tensor-parallel-size 1 \
--target-pipeline-parallel-size 1
After merging the model weights, we use the merged checkpoint below for model evaluation and inference.
Model evaluation
Below we evaluate cloze accuracy on the LAMBADA dataset: given the preceding tokens, how accurately the model predicts the final token.
Run the following command to evaluate the model; before executing the script, configure the model checkpoint, evaluation dataset, and vocabulary paths.
sh eval_gpt2_lambada.sh
Note: use --strict-lambada to require whole-word matching.
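The metric itself is simple. A sketch of strict scoring (illustrative only, not Megatron's implementation): a prediction counts only if the generated final word matches the reference word exactly.

```python
def strict_lambada_accuracy(predicted_words, target_words):
    """Fraction of examples whose predicted final word equals the target exactly."""
    assert len(predicted_words) == len(target_words)
    correct = sum(p == t for p, t in zip(predicted_words, target_words))
    return correct / len(target_words)

print(strict_lambada_accuracy(["dog", "ran", "home"], ["dog", "walked", "home"]))
# -> 0.6666666666666666
```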
Part of the run log:
using world size: 1, data-parallel-size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 1
setting global batch size to 8
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
...
world_size ...................................... 1
-------------------- end of arguments ---------------------
setting number of micro-batches to constant 1
> building GPT2BPETokenizer tokenizer ...
> padded vocab (size: 50257) with 47 dummy tokens (new size: 50304)
> initializing torch distributed ...
> initialized tensor model parallel with size 1
> initialized pipeline model parallel with size 1
> setting random seeds to 1234 ...
> compiling dataset index builder ...
...
make: Leaving directory '/workspace/code/bak/Megatron-LM/megatron/data'
>>> done with dataset index builder. Compilation time: 13.399 seconds
> compiling and loading fused kernels ...
>>> done with compiling and loading fused kernels. Compilation time: 1.411 seconds
building GPT model ...
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 354871296
loading checkpoint from /workspace/model/megatron-models/345m-init-mp-out at iteration 5000
checkpoint version 3.0
successfully loaded checkpoint from /workspace/model/megatron-models/345m-init-mp-out at iteration 5000
> building lambada dataset from /workspace/data/lambada_test.jsonl ...
> found 5153 samples.
> working on iteration: 0
...
> working on iteration: 640
--------------------------------------------------------------------------------------------------------------------
validation results on LAMBADA | number correct: 0.0000E+00 | total examples: 5.1530E+03 | avg accuracy: 0.0000E+00
--------------------------------------------------------------------------------------------------------------------
done :-)
Model inference service
tools/run_text_generation_server.py contains a simple REST service for generating text. To run it, you need to specify an appropriate pretrained checkpoint. There are also optional parameters such as temperature, top-k, and top-p that can be configured; see --help or the source file for details.
Before starting the inference service, install the dependencies first:
pip install flask flask-restful -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
Once installed, use the examples/run_text_generation_server_345M.sh script to start an inference service backed by the GPT-2 model.
sh examples/run_text_generation_server_345M.sh
Once the inference service is running, you can use tools/text_generation_cli.py to query it; it takes a single argument, the host the service is running on.
> python tools/text_generation_cli.py localhost:5000
Enter prompt: hello
Enter number of tokens to generate: 5
Megatron Response:
hello! Until that protagonist receive
Enter prompt: world
Enter number of tokens to generate: 2
Megatron Response:
worldboarding-
Enter prompt:
Besides that, you can also query the endpoint directly with curl or any other API testing tool:
> curl 'http://localhost:5000/api' -X 'PUT' -H 'Content-Type: application/json; charset=UTF-8' -d '{"prompts":["Hello world"], "tokens_to_generate":1}'
{"logprobs":null,"segments":[["Hello"," world",","]],"text":["Hello world,"]}
The above uses a single GPU for inference; we can also run multi-GPU model-parallel inference.
Inference with 4TP:
sh examples/run_text_generation_server_345M_4_tensor_parallel.sh
GPU memory usage:
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1844443 C /usr/bin/python 788MiB |
| 1 N/A N/A 1844444 C /usr/bin/python 788MiB |
| 2 N/A N/A 1844445 C /usr/bin/python 788MiB |
| 3 N/A N/A 1844446 C /usr/bin/python 788MiB |
+-----------------------------------------------------------------------------+
Inference with 2TP+2PP:
sh examples/run_text_generation_server_345M_2tp_2dp.sh
GPU memory usage:
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1869409 C /usr/bin/python 1222MiB |
| 1 N/A N/A 1869410 C /usr/bin/python 1222MiB |
| 2 N/A N/A 1869411 C /usr/bin/python 1222MiB |
| 3 N/A N/A 1869412 C /usr/bin/python 1222MiB |
+-----------------------------------------------------------------------------+
Conclusion
Based on NVIDIA's open-source Megatron-LM framework, this article completed the entire process of pretraining, evaluating, and serving a GPT-2 model, and also walked through the full preprocessing pipeline for the GPT-2 training dataset.
If you feel this article has helped you, I look forward to your likes, bookmarks, and follows~~