Large Language Models (LLM)¶
Description¶
Large language models (LLMs) are largely built on a class of deep learning architectures called transformer networks, which learn context and meaning by tracking relationships in sequential data, such as the words in a sentence. By training models with hundreds of billions of parameters on internet-scale datasets, LLMs have unlocked the ability of AI to generate human-like content. These models can read, write, code, draw, and create in a credible fashion, augmenting human creativity and improving productivity across industries to help solve the world's toughest problems.
The applications of LLMs span a plethora of use cases, such as learning the language of protein sequences to propose viable compounds that help scientists make groundbreaking discoveries. Likewise, using LLMs to generate code from natural-language descriptions can increase software developers' productivity.
HPC-FineTuning-Examples¶
Users can access the publicly available HPC Fine-Tuning Examples GitHub repository, dedicated to providing examples and tutorials for fine-tuning language models on the Devana HPC system. Our goal is to help researchers and practitioners efficiently leverage HPC infrastructure to enhance their NLP workflows.
Contents¶
- BGE Embedding and Retrieval: Scripts for building an information retrieval tool using BGE embeddings (the BGE-M3 model) to retrieve similar documents from the 20 Newsgroups dataset using an Annoy index.
- LLM Training: Scripts for fine-tuning a pre-trained language model (LLM) using the Hugging Face Transformers library. The scripts demonstrate how to load a pre-trained model, tokenize input text, and train the model on a custom dataset.
- LLM Inference: Scripts for running inference on a pre-trained language model (LLM) using the Hugging Face Transformers library. The scripts demonstrate how to load a pre-trained model, tokenize input text, and generate output text.
- Named Entity Recognition with Transformers: Scripts and utilities for training a DistilBERT model for Named Entity Recognition (NER). The model is fine-tuned on the CoNLL-2003 dataset, which includes named entities such as persons (PER), organizations (ORG), locations (LOC), and miscellaneous entities (MISC).
- BERT Sentiment Analysis Training: Scripts to train a BERT model for sentiment analysis using the Twitter dataset. It includes data preprocessing, model training, evaluation, and inference.
Requirements¶
To use the examples provided in this repository, you have two options:
[1] Use an Existing Singularity Image: Utilize the existing Singularity image named pt-2.3_llm.sif, located at /storage-data/singularity_containers/pt-2.3_llm.sif, which contains all the necessary libraries.
[2] Create a New Singularity Image: Follow the instructions below to create a new Singularity image.
Instructions for Creating a New Singularity Image¶
- Singularity: Ensure you have Singularity-CE version 3.10 or higher installed.
- Recipe File: Use the Singularity recipe file named pt_devel_llm.recipe.
- Build Command: To build a Singularity image from the recipe file, use the following command:
sudo singularity build test.sif pt_devel_llm.recipe
Sudo access
Note that the creation of a Singularity container requires superuser privileges and as such cannot be done on the Devana cluster.
Hugging Face Token Setup¶
Some of the scripts require that you have a Hugging Face account and an access token generated on the Hugging Face website. The token is used to access certain models. The token should be added to the code as a string, as shown in the example below.
Example: Loading a Model with an Access Token¶
To load a model using an access token, you can use the following code snippet:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "your-model-id"
access_token = "your-access-token"
model = AutoModelForCausalLM.from_pretrained(
model_id,
token=access_token, # Add the access token here
device_map='cuda'
)
tokenizer = AutoTokenizer.from_pretrained(
model_id,
token=access_token # Add the access token here
)
Before accessing certain models, you may need to agree to the conditions on the model card on the Hugging Face website, including sharing your contact information (email and username) with the repository authors.
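As an alternative to hard-coding the token in every script, you can authenticate once per session with the huggingface_hub library. The following is a minimal sketch; the repository scripts themselves pass the token directly, as shown above:
from huggingface_hub import login

# Authenticate this session with your Hugging Face access token;
# replace the placeholder (or read the token from an environment variable)
# instead of committing it to your scripts.
login(token="your-access-token")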
BGE Embeddings and Retrieval¶
The scripts in the publicly available GitHub repository create a vector database using the open-source BGE-M3 embedding model. The model generates vector representations of texts that allow efficient text searches. These embeddings form a vector database that facilitates quick search and analysis of documents based on their content similarity.
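The core of the embedding step can be sketched as follows. This is a minimal outline assuming the FlagEmbedding and annoy packages available in the container; the index file name newsgroups.ann is illustrative and the actual bge_embeddings.py may differ in details:
from sklearn.datasets import fetch_20newsgroups
from FlagEmbedding import BGEM3FlagModel
from annoy import AnnoyIndex

# Load a slice of the 20 Newsgroups corpus
docs = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes")).data[:1000]

# Generate dense BGE-M3 embeddings for every document
model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
embeddings = model.encode(docs)["dense_vecs"]

# Build an Annoy index over the embeddings for fast approximate nearest-neighbour search
dim = embeddings.shape[1]
index = AnnoyIndex(dim, "angular")
for i, vec in enumerate(embeddings):
    index.add_item(i, vec)
index.build(10)                      # number of trees; more trees give better recall but a larger index
index.save("newsgroups.ann")         # illustrative file name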
Required files
- bge_embeddings.py: Script to preprocess the 20 Newsgroups dataset, generate BGE embeddings using the BGE-M3 model, and build an Annoy index for efficient retrieval.
You can download the bge_embeddings.py file as:
wget https://github.com/NSCC-Slovakia/NCC_HPC-FineTuning-Examples/raw/main/Embedding%20indices/bge_embeddings.py
To run bge_embeddings.py within the SLURM scheduler environment, copy and modify the following script to run_embeddings.sh and submit the job to a compute node with the command sbatch run_embeddings.sh.
#!/bin/bash
#SBATCH --account= # Project number
#SBATCH --job-name= # Name of the job in SLURM
#SBATCH --partition=gpu # Selected partition
#SBATCH --gres=gpu:1 # Number of GPUs
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node= # Number of MPI ranks
#SBATCH --cpus-per-task= # Number of OMP threads per MPI rank
#SBATCH --output job%J.out # Standard output
#SBATCH --error job%J.err # Standard error
# Load modules
module purge
module load singularity
# Disable users python packages
export PYTHONNOUSERSITE=1
# Execute bge_embeddings
singularity exec --nv /storage-data/singularity_containers/pt-2.3_llm.sif python3 bge_embeddings.py
Required files
- bge_retrieval.py: Script to retrieve similar documents from the 20 Newsgroups dataset based on a query using the pre-built Annoy index and BGE-M3 embeddings.
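The retrieval step can be sketched as follows, reusing the index built in the embedding example above. This is a minimal outline; the actual bge_retrieval.py parses --query and --n_neighbors from the command line and may differ in details:
from FlagEmbedding import BGEM3FlagModel
from annoy import AnnoyIndex

# Embed the query with the same BGE-M3 model that was used to build the index
model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
query_vec = model.encode(["latest advancements in cancer treatment"])["dense_vecs"][0]

# Load the pre-built Annoy index; the dimension must match the stored embeddings
index = AnnoyIndex(1024, "angular")
index.load("newsgroups.ann")         # illustrative file name from the sketch above

# Retrieve the ids of the most similar documents
neighbor_ids = index.get_nns_by_vector(query_vec, 3)
print(neighbor_ids)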
You can download the bge_retrieval.py file as:
wget https://github.com/NSCC-Slovakia/NCC_HPC-FineTuning-Examples/raw/main/Embedding%20indices/bge_retrieval.py
To run bge_retrieval.py within the SLURM scheduler environment, copy and modify the following script to run_retrieval.sh and submit the job to a compute node with the command sbatch run_retrieval.sh. You are required to specify your query and the number of files to retrieve.
#!/bin/bash
#SBATCH --account= # Project number
#SBATCH --job-name= # Name of the job in SLURM
#SBATCH --partition=gpu # Selected partition
#SBATCH --gres=gpu:1 # Number of GPUs
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node= # Number of MPI ranks
#SBATCH --cpus-per-task= # Number of OMP threads per MPI rank
#SBATCH --output job%J.out # Standard output
#SBATCH --error job%J.err # Standard error
# Load modules
module purge
module load singularity
# Disable users python packages
export PYTHONNOUSERSITE=1
# User input variables
query="latest advancements in cancer treatment"
files=3
# Execute bge_retrieval
singularity exec --nv /storage-data/singularity_containers/pt-2.3_llm.sif python3 bge_retrieval.py --query "$query" --n_neighbors "$files"
Note
Customize paths and file names as necessary based on your project structure and requirements.
LLM Training¶
The publicly available GitHub repository provides scripts for fine-tuning an LLM using the Hugging Face Transformers library, trained and tested on question-answering tasks with the MedMCQA dataset. The scripts demonstrate how to load a pre-trained model, tokenize input text, and train the model on a custom dataset. The performance can be compared with the base Mistral-7B-Instruct-v0.2 model, which was not fine-tuned, using the cookbooks in the LLM Inference folder.
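The overall approach can be sketched as follows. This is a minimal outline of parameter-efficient fine-tuning with the peft library; the prompt format, dataset handling and hyperparameters in train_mistral_peft.py may differ, and the MedMCQA field names below are assumptions based on the Hugging Face copy of the dataset:
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id, token="your-access-token")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    model_id, token="your-access-token", torch_dtype=torch.bfloat16, device_map="cuda")

# Attach LoRA adapters; only these small adapter matrices are trained
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"))

# Turn MedMCQA items into instruction-style training texts
dataset = load_dataset("openlifescienceai/medmcqa", split="train[:1000]")

def tokenize(example):
    # Field names (question, opa..opd, cop) assumed from the Hugging Face dataset card
    options = [example["opa"], example["opb"], example["opc"], example["opd"]]
    text = (f"Question: {example['question']}\n"
            f"Options: {'; '.join(options)}\n"
            f"Answer: {options[example['cop']]}")
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="data/mistral_medmcqa", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1,
                           bf16=True, logging_steps=10),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("data/mistral_medmcqa/adapters")   # saves only the LoRA adapters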
Required files
- train_mistral_peft.py: Script to fine-tune the Mistral-7B-Instruct-v0.2 model on the MedMCQA dataset.
You can download the train_mistral_peft.py file as:
wget https://github.com/NSCC-Slovakia/NCC_HPC-FineTuning-Examples/raw/main/LLM%20training/train_mistral_peft.py
To run train_mistral_peft.py within the SLURM scheduler environment, copy and modify the following script to run_training.sh and submit the job to a compute node with the command sbatch run_training.sh.
#!/bin/bash
#SBATCH --account= # Project number
#SBATCH --job-name= # Name of the job in SLURM
#SBATCH --partition=gpu # Selected partition
#SBATCH --gres=gpu:1 # Number of GPUs
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node= # Number of MPI ranks
#SBATCH --cpus-per-task= # Number of OMP threads per MPI rank
#SBATCH --output job%J.out # Standard output
#SBATCH --error job%J.err # Standard error
# Load modules
module purge
module load singularity
# Disable users python packages
export PYTHONNOUSERSITE=1
# Execute train_mistral_peft.py
singularity exec --nv /storage-data/singularity_containers/pt-2.3_llm.sif python3 train_mistral_peft.py
Required files
- mistral_trained_pred.py: Script to run inference on the fine-tuned Mistral-7B-Instruct-v0.2 model on the MedMCQA dataset.
You can download the mistral_trained_pred.py file as:
wget https://github.com/NSCC-Slovakia/NCC_HPC-FineTuning-Examples/raw/main/LLM%20training/mistral_trained_pred.py
To run mistral_trained_pred.py within the SLURM scheduler environment, copy and modify the following script to run_trained_inference.sh and submit the job to a compute node with the command sbatch run_trained_inference.sh.
#!/bin/bash
#SBATCH --account= # Project number
#SBATCH --job-name= # Name of the job in SLURM
#SBATCH --partition=gpu # Selected partition
#SBATCH --gres=gpu:1 # Number of GPUs
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node= # Number of MPI ranks
#SBATCH --cpus-per-task= # Number of OMP threads per MPI rank
#SBATCH --output job%J.out # Standard output
#SBATCH --error job%J.err # Standard error
# Load modules
module purge
module load singularity
# Disable users python packages
export PYTHONNOUSERSITE=1
# Execute mistral_trained_pred.py
singularity exec --nv /storage-data/singularity_containers/pt-2.3_llm.sif python3 mistral_trained_pred.py
Note
The training does not fine-tune the whole model, but only tunes the adapters using PEFT. This is more efficient than fine-tuning the whole model, and the saved adapters are much smaller than the model itself. The training script saves the model adapters in the data folder. The mistral_trained_pred.py script loads the adapters, and you can specify the checkpoint to load them from. Customize paths and file names as necessary based on your project structure and requirements.
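Loading the saved adapters for inference can be sketched as follows (a minimal example; the adapter path is illustrative and depends on where your training run stored its outputs):
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_path = "data/mistral_medmcqa/adapters"   # illustrative checkpoint path

# Load the frozen base model, then apply the trained LoRA adapters on top of it
tokenizer = AutoTokenizer.from_pretrained(base_id, token="your-access-token")
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, token="your-access-token", torch_dtype=torch.bfloat16, device_map="cuda")
model = PeftModel.from_pretrained(base_model, adapter_path)

inputs = tokenizer("Question: Which vitamin deficiency causes scurvy?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))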
LLM Inference¶
The publicly available GitHub repository contains scripts for running inference on pre-trained language models (LLMs) using the Hugging Face Transformers library. The scripts demonstrate how to load a pre-trained model, tokenize input text, and generate output text. The repository includes scripts for running inference on the AYA-101 and Mistral-7B-Instruct-v0.2 models. The models are tested on the question-answering task using the MedMCQA dataset.
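For the Mistral instruct model, the basic inference steps can be sketched as follows (a minimal outline, not the repository's mistral_pred.py; the example question is illustrative):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id, token="your-access-token")
model = AutoModelForCausalLM.from_pretrained(
    model_id, token="your-access-token", torch_dtype=torch.bfloat16, device_map="cuda")

# Format a single multiple-choice question with the model's chat template
messages = [{"role": "user",
             "content": "Question: Which vitamin deficiency causes scurvy? "
                        "Options: A) Vitamin A B) Vitamin B12 C) Vitamin C D) Vitamin D. "
                        "Answer with the correct option."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                          return_tensors="pt").to("cuda")

outputs = model.generate(input_ids, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))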
Required files
- aya_pred.py: Script to run inference on the AYA-101 model using the MedMCQA dataset.
You can download the aya_pred.py file as:
wget https://github.com/NSCC-Slovakia/NCC_HPC-FineTuning-Examples/raw/main/LLM%20inference/aya_pred.py
To run aya_pred.py within the SLURM scheduler environment, copy and modify the following script to run_inference_aya.sh and submit the job to a compute node with the command sbatch run_inference_aya.sh.
#!/bin/bash
#SBATCH --account= # Project number
#SBATCH --job-name= # Name of the job in SLURM
#SBATCH --partition=gpu # Selected partition
#SBATCH --gres=gpu:1 # Number of GPUs
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node= # Number of MPI ranks
#SBATCH --cpus-per-task= # Number of OMP threads per MPI rank
#SBATCH --output job%J.out # Standard output
#SBATCH --error job%J.err # Standard error
# Load modules
module purge
module load singularity
# Disable users python packages
export PYTHONNOUSERSITE=1
# Execute aya_pred.py
singularity exec --nv /storage-data/singularity_containers/pt-2.3_llm.sif python3 aya_pred.py
Required files
- mistral_pred.py: Script to run inference on the Mistral-7B-Instruct-v0.2 model using the MedMCQA dataset.
You can download the mistral_pred.py file as:
wget https://github.com/NSCC-Slovakia/NCC_HPC-FineTuning-Examples/raw/main/LLM%20inference/mistral_pred.py
To run mistral_pred.py within the SLURM scheduler environment, copy and modify the following script to run_inference_mistral.sh and submit the job to a compute node with the command sbatch run_inference_mistral.sh.
#!/bin/bash
#SBATCH --account= # Project number
#SBATCH --job-name= # Name of the job in SLURM
#SBATCH --partition=gpu # Selected partition
#SBATCH --gres=gpu:1 # Number of GPUs
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node= # Number of MPI ranks
#SBATCH --cpus-per-task= # Number of OMP threads per MPI rank
#SBATCH --output job%J.out # Standard output
#SBATCH --error job%J.err # Standard error
# Load modules
module purge
module load singularity
# Disable users python packages
export PYTHONNOUSERSITE=1
# Execute mistral_pred.py
singularity exec --nv /storage-data/singularity_containers/pt-2.3_llm.sif python3 mistral_pred.py
Named Entity Recognition (NER) with Transformers¶
The publicly available GitHub repository contains scripts and utilities for training a DistilBERT model for the task of identifying named entities in text. The model is fine-tuned on the CoNLL-2003 dataset, which includes named entities such as persons (PER), organizations (ORG), locations (LOC), and miscellaneous entities (MISC). Additionally, training hyperparameter values can be optimized using the Optuna library, which allows a systematic search for the best values of learning rate, batch size, number of epochs, etc.
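The fine-tuning workflow follows the standard Hugging Face token-classification recipe, which can be sketched as follows (a minimal outline; train.py may structure this differently):
from datasets import load_dataset
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification, Trainer, TrainingArguments)

# Load CoNLL-2003 and a DistilBERT backbone for token classification
dataset = load_dataset("conll2003")
labels = dataset["train"].features["ner_tags"].feature.names
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(labels),
    id2label=dict(enumerate(labels)), label2id={l: i for i, l in enumerate(labels)})

def tokenize_and_align(example):
    # Tokenize pre-split words and align NER tags with the sub-word pieces;
    # sub-word continuations and special tokens get the ignore index -100
    tokenized = tokenizer(example["tokens"], is_split_into_words=True, truncation=True)
    aligned, previous = [], None
    for word_id in tokenized.word_ids():
        if word_id is None or word_id == previous:
            aligned.append(-100)
        else:
            aligned.append(example["ner_tags"][word_id])
        previous = word_id
    tokenized["labels"] = aligned
    return tokenized

tokenized_dataset = dataset.map(tokenize_and_align)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="models", learning_rate=2e-5,
                           per_device_train_batch_size=16, num_train_epochs=5),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
trainer.save_model("models/final_model")   # path expected by the inference example below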
Required files
- train.py: Script for fine-tuning DistilBERT for NER; includes evaluation on the test set and saves the training and validation history.
- train_with_optuna.py: Variant of the train.py script that includes a search for the optimal set of hyperparameters using the Optuna library.
You can download the train.py and train_with_optuna.py files as:
wget https://github.com/NSCC-Slovakia/NCC_HPC-FineTuning-Examples/raw/main/NER/train.py
wget https://github.com/NSCC-Slovakia/NCC_HPC-FineTuning-Examples/raw/main/NER/train_with_optuna.py
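Conceptually, the Optuna variant replaces the fixed hyperparameters with a search. Continuing from the objects defined in the sketch above (tokenizer, labels, tokenized_dataset), the idea can be outlined as follows; train_with_optuna.py may define the search space differently:
from transformers import (AutoModelForTokenClassification, DataCollatorForTokenClassification,
                          Trainer, TrainingArguments)

def model_init():
    # A fresh model is created for every Optuna trial
    return AutoModelForTokenClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=len(labels))

def search_space(trial):
    # Illustrative ranges for the tuned hyperparameters
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [8, 16, 32]),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 5),
    }

trainer = Trainer(
    model_init=model_init,
    args=TrainingArguments(output_dir="models"),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
best_run = trainer.hyperparameter_search(
    direction="minimize", backend="optuna", hp_space=search_space, n_trials=10)
print(best_run.hyperparameters)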
To run train.py within the SLURM scheduler environment, copy and modify the following script to run_train.sh and submit the job to a compute node with the command sbatch run_train.sh.
#!/bin/bash
#SBATCH --account= # Project number
#SBATCH --job-name= # Name of the job in SLURM
#SBATCH --partition=gpu # Selected partition
#SBATCH --gres=gpu:1 # Number of GPUs
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node= # Number of MPI ranks
#SBATCH --cpus-per-task= # Number of OMP threads per MPI rank
#SBATCH --output job%J.out # Standard output
#SBATCH --error job%J.err # Standard error
# Load modules
module purge
module load singularity
# Disable users python packages
export PYTHONNOUSERSITE=1
# Execute train.py
singularity exec --nv /storage-data/singularity_containers/pt-2.3_llm.sif python3 train.py
# Alternatively, search for the optimal set of hyperparameters using the Optuna library
# singularity exec --nv /storage-data/singularity_containers/pt-2.3_llm.sif python3 train_with_optuna.py
Required files
- inference.py: Script for making predictions using the trained model.
You can download the inference.py file as:
wget https://github.com/NSCC-Slovakia/NCC_HPC-FineTuning-Examples/raw/main/NER/inference.py
Model Fine-tuning Details
- Model: DistilBERT (Base, Uncased)
- Training Parameters:
- Learning rate: 2e-5
- Batch size: 16
- Number of epochs: 5
- Evaluation Metrics: Loss
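Predictions with the fine-tuned model can be sketched as follows (a minimal example using a token-classification pipeline; the actual inference.py takes the model path via the --model_path argument, and the example sentence is illustrative):
from transformers import pipeline

# Load the fine-tuned checkpoint into a token-classification pipeline;
# the aggregation strategy merges sub-word pieces into whole entities
ner = pipeline("token-classification",
               model="models/final_model",
               aggregation_strategy="simple")

print(ner("Angela Merkel visited the Volkswagen plant in Wolfsburg."))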
To run inference.py within the SLURM scheduler environment, copy and modify the following script to run_inference.sh and submit the job to a compute node with the command sbatch run_inference.sh. You are required to specify the path to the trained model from the previous example; note the models/final_model part.
#!/bin/bash
#SBATCH --account= # Project number
#SBATCH --job-name= # Name of the job in SLURM
#SBATCH --partition=gpu # Selected partition
#SBATCH --gres=gpu:1 # Number of GPUs
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node= # Number of MPI ranks
#SBATCH --cpus-per-task= # Number of OMP threads per MPI rank
#SBATCH --output job%J.out # Standard output
#SBATCH --error job%J.err # Standard error
# Load modules
module purge
module load singularity
# Disable users python packages
export PYTHONNOUSERSITE=1
# User input variables
trained_models_path=/home/user/NER/models/final_model
# Execute inference.py
singularity exec --nv /storage-data/singularity_containers/pt-2.3_llm.sif python3 inference.py --model_path ${trained_models_path}
Note
Customize paths and file names as necessary based on your project structure and requirements. The downloaded files will be saved in cache for subsequent runs.
BERT Sentiment Analysis Training¶
Bidirectional Encoder Representations from Transformers (BERT), developed by Google, is a state-of-the-art AI encoder model. It is a public (open-source) model with a relatively simple architecture that is used in a variety of tasks, mainly thanks to its flexibility. The publicly available GitHub repository provides scripts to train a BERT model for sentiment analysis using the Twitter dataset, including data preprocessing, model training, evaluation, and inference.
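Such a workflow can be sketched as follows. This is a minimal outline; the CSV column names (clean_text, category) and the hyperparameters are assumptions about this particular Kaggle dataset, and sentiment_analysis_bert_train.py may differ:
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

# Read the Kaggle CSV and map the -1/0/1 sentiment categories to labels 0/1/2
df = pd.read_csv("Twitter_Data.csv").dropna()
df["label"] = df["category"].astype(int) + 1
dataset = Dataset.from_pandas(df[["clean_text", "label"]], preserve_index=False)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)   # negative / neutral / positive

def tokenize(batch):
    return tokenizer(batch["clean_text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True)
splits = tokenized.train_test_split(test_size=0.1)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sentiment_model",
                           per_device_train_batch_size=64, num_train_epochs=2),
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
print(trainer.evaluate())   # reports the evaluation loss on the held-out split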
Required files
- sentiment_analysis_bert_train.py: Script to preprocess the Twitter dataset, train a BERT model for sentiment analysis, evaluate the model, and save the trained model.
- Twitter_Data.csv: CSV file containing the dataset used for sentiment analysis.
You can download the sentiment_analysis_bert_train.py and Twitter_Data.csv files as:
wget https://github.com/NSCC-Slovakia/NCC_HPC-FineTuning-Examples/raw/main/Sentiment%20analysis/sentiment_analysis_bert_train.py
wget https://github.com/NSCC-Slovakia/NCC_HPC-FineTuning-Examples/raw/main/Sentiment%20analysis/Twitter_Data.csv
To run sentiment_analysis_bert_train.py within the SLURM scheduler environment, copy and modify the following script to run_sentiment_analysis.sh and submit the job to a compute node with the command sbatch run_sentiment_analysis.sh.
#!/bin/bash
#SBATCH --account= # Project number
#SBATCH --job-name= # Name of the job in SLURM
#SBATCH --partition=gpu # Selected partition
#SBATCH --gres=gpu:1 # Number of GPUs
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node= # Number of MPI ranks
#SBATCH --cpus-per-task= # Number of OMP threads per MPI rank
#SBATCH --output job%J.out # Standard output
#SBATCH --error job%J.err # Standard error
# Load modules
module purge
module load singularity
# Disable users python packages
export PYTHONNOUSERSITE=1
# Execute sentiment_analysis_bert_train.py
singularity exec --nv /storage-data/singularity_containers/pt-2.3_llm.sif python3 sentiment_analysis_bert_train.py --batch_size 64
Note
The dataset used in this project is obtained from Kaggle. You can access the dataset at Twitter Sentiment Dataset. Customize paths and file names as necessary based on your project structure and requirements.