Fine-tuning GPT-3 using the OpenAI API

Overview

Teaching: 15 min
Exercises: 0 min

Questions

How do I fine-tune GPT-3 using the OpenAI API platform?

Objectives

Introduction

GPT-3 is very popular nowadays, and OpenAI gives users the option to fine-tune it in OpenAI's cloud. This tutorial walks you step by step through fine-tuning GPT-3 using the OpenAI API.
You can download Q&A data in csv or tsv format and later convert it to jsonl format.
This is a PhD project in which we fine-tune GPT-3 using the OpenAI API from a Linux terminal, with a given API token.

First go to https://openai.com/api/
Sign up for an account
With a newly created account, you will have $18 of credit to use for 3 months.
The pricing for using GPT-3 for fine-tuning and in applications can be found here
At the top right, click on your user account, request an API key, and save it somewhere. Note that you can request as many keys as you want, but use only the latest key.

Prepare the data, for example from here, in csv format with 2 columns, one for the question and one for the answer.
For Q&A data, name the two columns prompt (the question) and completion (the answer).
Sometimes the prompt can be the heading of a paragraph and the completion can be the content of the paragraph that supports the prompt.
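To make the target format concrete, here is a minimal Python sketch of turning a two-column table (prompt, completion) into jsonl, one JSON object per line. It only illustrates the idea and is not what the openai data-preparation tool does internally. The function name csv_to_jsonl and the two sample rows are made up for this example; pass delimiter="\t" for tsv input.

```python
import csv
import io
import json


def csv_to_jsonl(text: str, delimiter: str = ",") -> str:
    """Convert delimited text with 'prompt' and 'completion' headers to JSONL."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    out = []
    for row in reader:
        # Keep only the two columns that matter for fine-tuning.
        record = {"prompt": row["prompt"], "completion": row["completion"]}
        out.append(json.dumps(record, ensure_ascii=False) + "\n")
    return "".join(out)


# Hypothetical sample rows; a real file would be read from disk instead.
sample_csv = (
    "prompt,completion\n"
    '"What is 2 + 2?","4"\n'
    '"Name a primary color, any one.","Red"\n'
)
jsonl_text = csv_to_jsonl(sample_csv)
print(jsonl_text, end="")
```

Quoted fields may contain commas (as in the second sample row), which is why the sketch uses the csv module rather than splitting lines by hand.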

Here we will use a Linux terminal to work with the OpenAI API.
You can use a Linux terminal from your M2 account, via ssh or the Open OnDemand platform, or work locally via Anaconda Navigator.
I will show both ways:

Once you have downloaded and installed Anaconda Navigator on Windows, macOS, or Linux, open it, select CMD.exe Prompt, and install the openai package:

pip install openai

Next, go to where you saved the csv data prepared in the previous step. For example, if you saved the file “Westminster_Catechism.csv” to “c:\SMU\PROJECTS\DrewDickens”, then in the command prompt, type:

cd c:/SMU/PROJECTS/DrewDickens/

openai tools fine_tunes.prepare_data -f Westminster_Catechism.csv

You will see that a new file with the jsonl extension has been created: “Westminster_Catechism_prepared.jsonl”
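Before spending credit on a fine-tuning run, it can help to confirm that every line of the prepared file is a JSON object with non-empty prompt and completion strings. The following is a minimal sketch of such a check; the function name check_jsonl and the sample lines are hypothetical, and the check covers only the basic structure, not every requirement OpenAI may impose.

```python
import json


def check_jsonl(text: str) -> list:
    """Return a list of problems; an empty list means every line looks fine."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {number}: not valid JSON")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {number}: not a JSON object")
            continue
        for key in ("prompt", "completion"):
            value = record.get(key)
            if not isinstance(value, str) or not value:
                problems.append(f"line {number}: missing or empty '{key}'")
    return problems


# Hypothetical sample: line 1 is fine, line 2 lacks a completion,
# line 3 is not JSON at all.
sample = (
    '{"prompt": "Q1", "completion": " A1"}\n'
    '{"prompt": "Q2"}\n'
    "not json\n"
)
problems = check_jsonl(sample)
for problem in problems:
    print(problem)
```

To check the real file, read it with open("Westminster_Catechism_prepared.jsonl", encoding="utf-8").read() and pass the result to check_jsonl.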
Set your API key, where abc stands for the API key retrieved above:

set OPENAI_API_KEY=abc
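A common stumbling block is that the variable was set in a different terminal window than the one running the commands. The short Python sketch below checks that OPENAI_API_KEY is visible to the current process and prints it with all but the last four characters masked. The helper names get_api_key and mask, and the sample key, are hypothetical and used only for illustration.

```python
import os


def get_api_key(env=os.environ) -> str:
    """Return the API key from the environment, or raise if it is missing."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set in this terminal session")
    return key


def mask(key: str) -> str:
    """Hide all but the last 4 characters so the key is never printed in full."""
    return "*" * max(len(key) - 4, 0) + key[-4:]


# Demonstration with a made-up key passed in a plain dict;
# call get_api_key() with no argument to check the real environment.
demo_env = {"OPENAI_API_KEY": "sk-hypothetical1234"}
masked = mask(get_api_key(demo_env))
print(masked)
```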

openai api fine_tunes.create -t Westminster_Catechism_prepared.jsonl -m ada

When fine-tuning completes, the output should show a command like the following, which you can run with your own prompt:

openai api completions.create -m ada:ft-personal-2022-12-07-17-10-04 -p "What are the punishments of sin in the world to come?"

The process is very much the same when using ManeFrame 2 or any other HPC system to fine-tune GPT-3 with the openai API. First request a compute node, load Python, and set up a conda environment:

$ srun -N1 -p standard-mem-s -c2 --mem=5G --pty $SHELL

module load python/3

$ conda create -n openai python==3.8

$ source activate openai
$ pip install openai

export OPENAI_API_KEY=abc

Next, go to where you saved the csv data prepared in the previous step. For example, if you saved the file “Westminster_Catechism.csv” to “/work/users/tuev”, then in the command prompt, type:

cd /work/users/tuev

$ openai tools fine_tunes.prepare_data -f Westminster_Catechism.csv

You will see that a new file with the jsonl extension has been created: “Westminster_Catechism_prepared.jsonl”
Fine-tune GPT-3 using the ada model:

openai api fine_tunes.create -t Westminster_Catechism_prepared.jsonl -m ada

openai api completions.create -m ada:ft-personal-2022-12-07-17-10-04 -p "What are the punishments of sin in the world to come?"
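Prompts often contain spaces, quotes, or question marks, which the shell can misinterpret when typed by hand. As a minimal sketch, the Python snippet below builds the same completions.create command as an argument list and prints a safely quoted version of it. The helper name completion_command is hypothetical, the flags are exactly the ones used above, and the snippet does not check whether your installed openai version still ships this command.

```python
import shlex


def completion_command(model: str, prompt: str) -> list:
    """Build the argument list for the completions.create command shown above."""
    return ["openai", "api", "completions.create", "-m", model, "-p", prompt]


argv = completion_command(
    "ada:ft-personal-2022-12-07-17-10-04",
    "What are the punishments of sin in the world to come?",
)
# shlex.join (Python 3.8+) quotes each argument for a POSIX shell.
command_line = shlex.join(argv)
print(command_line)
```

Passing argv to subprocess.run(argv, check=True) would run the command without going through the shell at all, which avoids quoting problems entirely; that call contacts the API and uses credit, so it is left out of the sketch.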

Key Points

OpenAI, API, Python