Fine-tuning GPT-3 using the OpenAI API
Overview
Teaching: 15 min
Exercises: 0 min
Questions
How to train GPT-3 using the OpenAI API platform?
Objectives
Introduction
- GPT-3 is widely used, and OpenAI lets users fine-tune it on OpenAI's cloud platform. This tutorial walks through fine-tuning GPT-3 with the OpenAI API, step by step.
- You can download Q&A data in CSV or TSV format and later convert it to JSONL format.
- This is a PhD project in which we fine-tune GPT-3 through the OpenAI API from a Linux terminal, using a given API key.
Request OpenAI API Key
- First go to https://openai.com/api/
- Sign up for an account
- A newly created account comes with $18 of credit to use within 3 months.
- The pricing for fine-tuning and using GPT-3 can be found here
- At the top right, click on your user account, request an API key, and save it somewhere safe. Note: you can request as many keys as you want, but use only the latest one.
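The later steps store the key in the OPENAI_API_KEY environment variable, so one way to keep the key out of scripts is to read it from the environment. The following is a minimal Python sketch under that assumption; the helper get_api_key and its environ parameter are hypothetical names used for illustration and are not part of the openai package.

```python
import os


def get_api_key(environ=None):
    """Return the API key stored in OPENAI_API_KEY, or raise if it is missing."""
    # 'environ' is a hypothetical parameter; it defaults to the real environment
    # and lets the helper be tested with a plain dictionary.
    environ = os.environ if environ is None else environ
    key = environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; set it before running.")
    return key


if __name__ == "__main__":
    try:
        get_api_key()
        print("API key found.")
    except RuntimeError as err:
        print(err)
```

Failing early with a clear message is usually easier to debug than an authentication error from a later command.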
Download the data in CSV format
- Prepare the data (for example, from here) as a CSV file with two columns: one for the question and one for the answer.
- For Q&A data, the CSV uses a prompt column for the question and a completion column for the answer.
- Sometimes the prompt can be the heading of a paragraph, and the completion can be the paragraph content that supports it.
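As an illustration of the two-column layout described above, this short Python sketch builds CSV text with a prompt,completion header row. The rows are placeholder examples, not taken from the actual dataset, and rows_to_csv is a hypothetical helper name.

```python
import csv
import io

# Hypothetical placeholder rows; replace them with your own questions and answers.
rows = [
    {"prompt": "Example question 1?", "completion": "Example answer 1."},
    {"prompt": "Example question 2?", "completion": "Example answer 2."},
]


def rows_to_csv(rows):
    """Return CSV text with a 'prompt,completion' header followed by one line per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["prompt", "completion"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)
```

Using the csv module rather than joining strings by hand means that answers containing commas or quotation marks are quoted correctly.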
Using the OpenAI API to train a GPT-3 model
- Here we use a Linux terminal to work with the OpenAI API.
- You can use a Linux terminal on your M2 account via ssh or the Open OnDemand platform, or work locally via Anaconda Navigator.
- I will show both ways:
Using Anaconda Navigator:
- Once you have downloaded and installed Anaconda Navigator on your Windows, macOS, or Linux machine, open it and select CMD.exe Prompt:
- The command line interface (CLI) appears:
- Now install openai with the command:
pip install openai
- Next, go to the folder where you saved the CSV data prepared in the previous step. For example, if you saved the file “Westminster_Catechism.csv” to “c:\SMU\PROJECTS\DrewDickens”, type in the command prompt:
cd c:/SMU/PROJECTS/DrewDickens/
- Use the OpenAI command-line tool to convert the CSV file to JSONL format, answering “Y” to all questions:
$ openai tools fine_tunes.prepare_data -f Westminster_Catechism.csv
- You will see a new file with the jsonl extension: “Westminster_Catechism_prepared.jsonl”
- Set your API key, where abc stands for the API key retrieved above:
set OPENAI_API_KEY=abc
- Fine-tune GPT-3 using the ada model:
openai api fine_tunes.create -t Westminster_Catechism_prepared.jsonl -m ada
When fine-tuning has completed, it should show this:
- Use the trained model with any question:
openai api completions.create -m ada:ft-personal-2022-12-07-17-10-04 -p "What are the punishments of sin in the world to come?"
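To see roughly what the CSV-to-JSONL conversion step produces, the following Python sketch turns prompt/completion rows into JSONL, with one JSON object per line. It is an illustration only and not the implementation of openai tools fine_tunes.prepare_data, which may also adjust the text while converting. The function csv_to_jsonl and the sample row are hypothetical.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text):
    """Convert CSV text with 'prompt' and 'completion' columns to JSONL text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # Keep only the two fields used for fine-tuning.
        record = {"prompt": row["prompt"], "completion": row["completion"]}
        lines.append(json.dumps(record, ensure_ascii=False))
    # One JSON object per line, each terminated by a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical sample input, not taken from the actual dataset.
sample_csv = "prompt,completion\nExample question?,Example answer.\n"
jsonl_text = csv_to_jsonl(sample_csv)
print(jsonl_text, end="")
```

Each output line is an independent JSON object, which is why the format can be processed one line at a time.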
Using M2
Fine-tuning GPT-3 with the OpenAI API on ManeFrame 2 (M2) or any other HPC system works in much the same way.
- After logging in to M2, request a compute node:
$ srun -N1 -p standard-mem-s -c2 --mem=5G --pty $SHELL
- Load Python:
module load python/3
- Create a conda environment with Python 3.8:
$ conda create -n openai python=3.8
- Then activate the conda environment and install openai:
$ source activate openai
$ pip install openai
- Set the API key:
export OPENAI_API_KEY=abc
- Next, go to the directory where you saved the CSV data prepared in the previous step. For example, if you saved the file “Westminster_Catechism.csv” to “/work/users/tuev”, type in the command prompt:
cd /work/users/tuev
- Use the OpenAI command-line tool to convert the CSV file to JSONL format, answering “Y” to all questions:
$ openai tools fine_tunes.prepare_data -f Westminster_Catechism.csv
- You will see a new file with the jsonl extension: “Westminster_Catechism_prepared.jsonl”
- Fine-tune GPT-3 using the ada model:
openai api fine_tunes.create -t Westminster_Catechism_prepared.jsonl -m ada
- Use the trained model with any question:
openai api completions.create -m ada:ft-personal-2022-12-07-17-10-04 -p "What are the punishments of sin in the world to come?"
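Before submitting a fine-tuning job, it can help to check that every line of the prepared file parses as JSON and has non-empty prompt and completion fields. The following is a minimal Python sketch; find_bad_lines and the sample text are hypothetical, and the checks are assumptions about what a usable record looks like rather than OpenAI's own validation.

```python
import json


def find_bad_lines(jsonl_text):
    """Return 1-based numbers of lines that are not usable prompt/completion records."""
    bad = []
    for number, line in enumerate(jsonl_text.splitlines(), start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)  # Not valid JSON (this includes blank lines).
            continue
        if not isinstance(record, dict):
            bad.append(number)  # Valid JSON, but not an object.
            continue
        for field in ("prompt", "completion"):
            value = record.get(field)
            if not isinstance(value, str) or not value:
                bad.append(number)  # Field is missing, empty, or not a string.
                break
    return bad


# Hypothetical sample: one good record, one missing a field, one that is not JSON.
sample = (
    '{"prompt": "Example question?", "completion": "Example answer."}\n'
    '{"prompt": "Missing completion"}\n'
    "not json\n"
)
print(find_bad_lines(sample))  # prints [2, 3]
```

To check a real file, read the text of “Westminster_Catechism_prepared.jsonl” and pass it to find_bad_lines. An empty list means every line passed these basic checks.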
Key Points
OpenAI, API, Python