Local Storage

This page shows an example of using the local storage of a compute node for a model loaded from Hugging Face. The Python code below downloads the model and stores it in the temporary folder on the compute node instead of the default location ~/.cache/huggingface. This vastly reduces the loading time, as the model's roughly 14 GB of weights are not saved and loaded via the network file system.
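The key mechanism is the cache_dir argument of from_pretrained: pointing it at the node-local scratch directory keeps the weights off the network file system. A minimal sketch of just this step (assuming, as in the full example below, that the cluster exposes the scratch path in $SLURM_JOB_TMP):

import os
from transformers import AutoModelForCausalLM

# Download/load into the node-local scratch directory instead of the
# default ~/.cache/huggingface on the network file system.
cache = os.environ["SLURM_JOB_TMP"]
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b", cache_dir=cache)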

  1. Set up a miniconda environment as described in First Steps.
  2. Activate the conda environment and make sure that all required Python packages are installed (a quick sanity check is sketched after the commands):
> conda activate
> pip install torch==2.0.1 transformers
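A quick sanity check of the environment (a sketch; run it on a GPU node, since torch.cuda.is_available() will usually report False on the login node):

# Verify that torch and transformers are importable and a GPU is visible.
import torch
import transformers

print("torch", torch.__version__, "| transformers", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())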
  3. Save the following sbatch script as falcon.sbatch:
#!/bin/bash
#SBATCH --partition=study
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --mem=30G
#SBATCH --tmp=20G

echo "Process $SLURM_PROCID of Job $SLURM_JOBID with the local id $SLURM_LOCALID using gpu id $CUDA_DEVICE (we may use gpu: $CUDA_VISIBLE_DEVICES on $(hostname))"
echo "computing on $(nvidia-smi --query-gpu=gpu_name --format=csv -i $CUDA_DEVICE | tail -n 1)"

srun python falcon.py
echo "done"
  4. Save the following Python script as falcon.py; it is adapted from https://huggingface.co/tiiuae/falcon-7b.

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch
import os

model = "tiiuae/falcon-7b"
cache = os.environ['SLURM_JOB_TMP']
AutoModelForCausalLM.from_pretrained(model,cache_dir=cache)

tokenizer = AutoTokenizer.from_pretrained(model,cache_dir=cache)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

sequences = pipeline(
   "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
   max_length=200,
   do_sample=True,
   top_k=10,
   num_return_sequences=1,
   eos_token_id=tokenizer.eos_token_id,
)

for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Note that the temporary directory is used twice: first in the AutoModelForCausalLM.from_pretrained call, which downloads and stores the model (this call is not part of the original Hugging Face example), and a second time when the tokenizer and the pipeline (via model_kwargs) load from the same cache.
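In the sample output below, the checkpoint shards are loaded twice: once by the explicit from_pretrained call and once inside the pipeline. A possible variant (a sketch, not the code that produced the output below) is to keep the loaded model object and hand it to the pipeline directly, so the shards are only read once:

# Sketch: load the model object once from the node-local cache and pass it
# to the pipeline, instead of letting the pipeline reload it by name.
model_obj = AutoModelForCausalLM.from_pretrained(
    model,
    cache_dir=cache,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
pipeline = transformers.pipeline(
    "text-generation",
    model=model_obj,
    tokenizer=tokenizer,
)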

  5. Start the job with sbatch falcon.sbatch and read the generated text in the output file:
> cat slurm-*.out 
Process 0 of Job 24211 with the local id 0 (we may use gpu: 0 on servant-3.GPU.CIT-EC.NET)
computing on 
config.json: 100%|██████████| 1.05k/1.05k [00:00<00:00, 4.92MB/s]
pytorch_model.bin.index.json: 100%|██████████| 16.9k/16.9k [00:00<00:00, 45.3MB/s]
pytorch_model-00001-of-00002.bin: 100%|██████████| 9.95G/9.95G [01:16<00:00, 130MB/s]
pytorch_model-00002-of-00002.bin: 100%|██████████| 4.48G/4.48G [00:34<00:00, 128MB/s] 
Downloading shards: 100%|██████████| 2/2 [01:51<00:00, 55.94s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:33<00:00, 16.98s/it]
generation_config.json: 100%|██████████| 117/117 [00:00<00:00, 740kB/s]
tokenizer_config.json: 100%|██████████| 287/287 [00:00<00:00, 1.58MB/s]
tokenizer.json: 100%|██████████| 2.73M/2.73M [00:00<00:00, 31.7MB/s]
special_tokens_map.json: 100%|██████████| 281/281 [00:00<00:00, 1.94MB/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:16<00:00,  8.34s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
Result: Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compare to the glorious majesty of the giraffe.
Daniel: Hello, Girafatron!
Girafatron: HELLO, DANIEL!
Daniel: You are a giraffe, aren't you, Girafatron?
Girafatron: (loudly) NO! I AM A GROTON, DANIEL!
Daniel: (sighs) Well, it's a bit of a disappointment, Girafatron, but I am still glad you are here. It's been a rough week.
Girafatron: Oh? What kind of rough week? (looks concerned)
Daniel: Well, I have been having some issues at home with my parents, and they've grounded me from using my
done
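Keep in mind that the node-local temporary directory is typically purged when the job ends, so any output written into $SLURM_JOB_TMP must be copied to permanent storage before the script finishes. A hypothetical sketch, assuming the job wrote a file results.txt (not part of the example above) into the scratch directory:

import os
import shutil

# results.txt is a hypothetical output file written into the node-local
# scratch directory; copy it home before the directory is purged.
src = os.path.join(os.environ["SLURM_JOB_TMP"], "results.txt")
shutil.copy(src, os.path.expanduser("~/results.txt"))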