This lesson is being piloted (Beta version)

SMU Research Computing

Using ArcPy to partition a grid into smaller boxes

Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • How to use ArcPy to divide a shapefile into smaller boxes?

Objectives

1. Use ArcPy, the Python API for ArcGIS, to create input GIS data

Introduction

Step 1: Create the boundary

image

Step 2: Split the big polygon into a grid of 49 mi² cells:

image

image

Step 3: Use GeoPandas to export each grid cell to its own shapefile:

import geopandas as gpd
dir = "/home/tuev/Projects/Makris/GIS/"
shape = gpd.read_file(dir+"DFW77.shp")
for i in shape.PageName:
    shapeout = shape[shape.PageName==i]
    shapeout.to_file(dir+i+".shp")
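
The loop above writes one shapefile per value of PageName. The underlying partition-by-attribute logic can be sketched on plain Python rows, with hypothetical PageName values standing in for shapefile features:

```python
# Hypothetical rows standing in for shapefile features
rows = [{"PageName": "A1", "geom": 1},
        {"PageName": "A1", "geom": 2},
        {"PageName": "B2", "geom": 3}]

# Group features by their PageName attribute, one group per output file
groups = {}
for row in rows:
    groups.setdefault(row["PageName"], []).append(row)

print(sorted(groups))      # ['A1', 'B2']
print(len(groups["A1"]))   # 2 features would land in A1.shp
```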

Step 4: Apply the same Grid Index Features tool to split the 300 grid cells into 0.16 mi² cells

# Import modules
import arcpy, os
from arcpy import env
import numpy as np
from zipfile import ZipFile

# Set environment settings to folders:
dir="c:/SMU/PROJECTS/Makris_cellphone/GIS/Miami/49mi2/" # <== This needs to be changed and make sure "/" is used instead of "\"
os.chdir(dir)

os.mkdir("../output0404")
arcpy.env.workspace = dir
output_folder = "../output0404/"

# Create the list of name of unique shapefile:
List1 = os.listdir(dir)
List2 = list()
for i in List1:
    pathname,extension = os.path.splitext(dir+i)
    filename = pathname.split('/')
    List2.append(filename[-1])

FinalList = np.unique(List2)

# Create the output folder output0404 and use ArcPy to generate the files

for i in FinalList:
    print(i)
    #Set local variables
    outFeatureClass = output_folder+i
    inFeatures = i
    
    polygonWidth = "0.4 miles"
    polygonHeight = "0.4 miles"
    
    # Execute GridIndexFeatures:
    arcpy.GridIndexFeatures_cartography(outFeatureClass,inFeatures,"","","",
                                        polygonWidth,polygonHeight)
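
As an aside, the deduplication step in the script above works because os.path.splitext strips the extension, so the several sidecar files of each shapefile collapse to one base name. A pure-Python check with a hypothetical listing (no NumPy required):

```python
import os

# Hypothetical directory listing: two shapefiles plus their sidecar files
listing = ["A1.shp", "A1.shx", "A1.dbf", "B2.shp", "B2.dbf"]

# Strip extensions and deduplicate, mirroring the np.unique step above
base_names = sorted({os.path.splitext(name)[0] for name in listing})
print(base_names)  # ['A1', 'B2']
```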

The following grids are created:

image

Step 5: Zip the 1500 component files into 300 zip archives, the input format requested by Vista NEAR, using the same Python notebook as in Step 4

os.chdir(dir+output_folder)
for i in FinalList:
    with ZipFile(i+'_Miami.zip','w') as zipObj:
        zipObj.write(i+'.shp')
        zipObj.write(i+'.shx')
        zipObj.write(i+'.dbf')
        zipObj.write(i+'.sbn')
        zipObj.write(i+'.sbx')
        zipObj.write(i+'.prj')
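
One caveat: optional sidecars such as .sbn/.sbx are not always present, and zipObj.write raises FileNotFoundError on a missing file. A more tolerant sketch that skips absent components (self-contained here via a temporary directory; point it at your output folder in practice):

```python
import os
import tempfile
from zipfile import ZipFile

extensions = [".shp", ".shx", ".dbf", ".sbn", ".sbx", ".prj"]

with tempfile.TemporaryDirectory() as tmp:
    # Create dummy components for one grid cell; .sbn/.sbx deliberately absent
    for ext in [".shp", ".shx", ".dbf", ".prj"]:
        open(os.path.join(tmp, "A1" + ext), "w").close()

    zip_path = os.path.join(tmp, "A1_Miami.zip")
    with ZipFile(zip_path, "w") as zipObj:
        for ext in extensions:
            path = os.path.join(tmp, "A1" + ext)
            if os.path.exists(path):        # skip missing optional sidecars
                zipObj.write(path, arcname="A1" + ext)

    with ZipFile(zip_path) as z:
        names = sorted(z.namelist())

print(names)  # ['A1.dbf', 'A1.prj', 'A1.shp', 'A1.shx']
```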

These zipped shapefiles are now ready to use: image

Key Points

  • ArcPy, ArcGIS Pro


Uploading GIS file to request download from NEAR using API

Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • How to upload GIS data to NEAR website using API?

Objectives

2. Uploading GIS data automatically to NEAR website for job creation

Introduction

POSTMAN

image

image

image

image

https://uberretailapi.uberads.com/v1/uberretailapi/createJobWithFile

image

{
  "pipReportType": "PIN_REPORT",
  "reportName": "F17_2021Q1",
  "polygonInputOptions": {
    "polygonFormat": "ESRI_SHAPEFILE_ZIP",
    "polygonNameAliasElement": "PageName"
  },
  "startDateTime": "2021-01-01 00:00:00",
  "endDateTime": "2021-03-31 23:59:59"
}

image

Note: reportName can be changed to match the input shapefile. polygonNameAliasElement="PageName" must match the attribute name in the shapefile. The start and end DateTime values can be altered.
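
Since the request body is plain JSON, in Python it is safer to build it with json.dumps than to stringify a dict and swap quote characters (as the upload script below does); the reportName and dates here are placeholders:

```python
import json

request = {
    "pipReportType": "PIN_REPORT",
    "reportName": "F17_2021Q1",  # change to match your shapefile
    "polygonInputOptions": {
        "polygonFormat": "ESRI_SHAPEFILE_ZIP",
        "polygonNameAliasElement": "PageName",
    },
    "startDateTime": "2021-01-01 00:00:00",
    "endDateTime": "2021-03-31 23:59:59",
}

# json.dumps always emits valid JSON (double quotes, escaping handled)
payload = {"jsonRequest": json.dumps(request)}
print(payload["jsonRequest"])
```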

image

Python

image

import requests
import os
import json

url = "https://uberretailapi.uberads.com/v1/uberretailapi/createJobWithFile"
headers = {'Authorization': 'Bearer ********'}    
j=0
n=1
dir1 = '/work/group/makris_lab/GIS/shapefile_zip/DFW/'
listfile = os.listdir(dir1)
while j<=len(listfile)-1:    
    dict1 = dict({"pipReportType":"PIN_REPORT",
                  "reportName":f"{listfile[j]}",
                  "polygonInputOptions": { "polygonFormat": "ESRI_SHAPEFILE_ZIP","polygonNameAliasElement": "PageName" },
                  "startDateTime": "2021-03-01 00:00:00",
                  "endDateTime": "2021-03-31 23:59:59"})
    
    payload = {'jsonRequest':str(dict1).replace("'",'"')}
    files=[('polygonFile',(f"{listfile[j]}",open(f'{dir1}{listfile[j]}','rb'),'application/zip'))]    
    response = requests.request("POST", url, headers=headers, data=payload, files=files)            
    if "True" in str(json.loads(response.text).values()):
        print("Succeeded. Submitting job to download ",listfile[j])
        j+=1
        n=1
    else:
        print("Failure. Resubmitting job ", listfile[j], " ", n,  " times")
        n+=1
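
Note that the loop above retries a failed upload indefinitely. If the API keeps rejecting one file, a retry cap avoids an infinite loop; this sketch uses a hypothetical submit() callback in place of the real requests.request call:

```python
def submit_with_retry(submit, filename, max_retries=5):
    """Call submit(filename) until it returns True or retries run out."""
    for attempt in range(1, max_retries + 1):
        if submit(filename):
            return attempt  # number of attempts it took
    raise RuntimeError(f"Giving up on {filename} after {max_retries} attempts")

# Hypothetical submitter that fails twice, then succeeds
calls = {"n": 0}
def fake_submit(filename):
    calls["n"] += 1
    return calls["n"] >= 3

attempts = submit_with_retry(fake_submit, "A1_Miami.zip")
print(attempts)  # 3
```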

Key Points

  • Postman, Python, API, upload


Downloading requested jobs from NEAR using API

Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • How to automatically download jobs from NEAR website using API?

Objectives

3. Using API and job array to download jobs from NEAR website using M2

Once you have submitted the GIS file and requested the jobs, you have several options for downloading them:

Manually Download the submitted jobs:

Using the OnDemand web portal (Remote Desktop), open Firefox, go to the NEAR website, and sign in using Nicos's username and password to retrieve the submitted jobs:

https://vista.um.co/users/sign_in

Check whether your job has generated the report, then download it to your M2 directory:

image

Automatically download the submitted jobs:

When you have a few hundred to a few thousand jobs to download, it is best to use the API to retrieve the data.

Fortunately, there is an option to use API from NEAR to download these jobs automatically.

What do you need?

import requests
import json
from sys import argv

script, rin  = argv
jid = int(rin)

print(jid)
url = "https://uberretailapi.uberads.com/v1/uberretailapi/getJobStatus?jobId="+str(jid)
print(url)
payload={'jobId': str(jid)}
files=[]
headers = {
  'Authorization': 'Bearer *********'
}
response = requests.request("GET", url, headers=headers, data=payload, files=files)
link = json.loads(response.text)['pipReportResults']['reportUrl']
output = json.loads(response.text)['reportName'][:-4]+'.tsv.gz'


open(output, "wb").write(requests.get(link).content)
The Python script above (saved as download_near_api.py) is driven by the following Slurm job-array script, jobs.sh:

#!/bin/bash
#SBATCH -J download       # job name to display in squeue
#SBATCH --array=1-301
#SBATCH -p dtn      # requested partition
#SBATCH -c 1 --mem=5G
#SBATCH -t 1000              # maximum runtime in minutes
#SBATCH --exclusive
#SBATCH --mail-user tuev@smu.edu
#SBATCH --mail-type=end

rin=3096351
argi=$((rin+$SLURM_ARRAY_TASK_ID))
module load python/3
python download_near_api.py $argi

Submit the job array with:

sbatch jobs.sh

A single job can also be downloaded directly, for example:

python download_near_api.py 3096580
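
The job array above maps Slurm task IDs 1-301 onto consecutive NEAR job IDs just above the base ID rin; the same arithmetic in Python (the base ID is the example value from the script):

```python
rin = 3096351                   # base job ID from the script above
task_ids = range(1, 302)        # --array=1-301
job_ids = [rin + t for t in task_ids]
print(job_ids[0], job_ids[-1])  # 3096352 3096652
```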

Key Points

  • Python, API, download


Tunneling Jupyter Lab in SuperPOD

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • How to use Jupyter Lab in SuperPOD?

Objectives
  • Learn the port-forwarding technique to enable Jupyter Lab

4. Jupyter Lab on SuperPOD

The following procedure works for Windows, macOS, and Linux.

Using Visual Studio Code terminal for any systems:

From your local machine (not yet logged into M2), ssh to the M2 login node with "-D 8080" for dynamic port forwarding and "-C" for compression:

$ ssh -C -D 8080 username@m2.smu.edu

Once in login node, request a compute node, load the library and activate conda env as usual, then run Jupyter lab instance:

$ srun -p v100x8 -N1 -c1 --mem=16gb --pty $SHELL
$ module load python/3 
$ conda activate myenv
$ jupyter lab --ip=0.0.0.0 --no-browser

Using MobaXTerm on Windows

On Windows, I use MobaXTerm (https://mobaxterm.mobatek.net/) and Firefox to configure port forwarding.

Setup in MobaXTerm

Open MobaXTerm and Select Tunneling tab:

image

image

The graphical port-forwarding tool appears; click the play button:

image

The Duo screen appears; enter 1 to authenticate with Duo. Once you pass the Duo screen, the port-forwarding tool is enabled:

image

Leave the port-forwarding screen open and switch to Firefox.

Set up Firefox to enable proxy viewing (similar on macOS)

Open Firefox (my version is 104.0.2). Press Alt+T then S to open the Settings tab. Scroll to the bottom and click Settings under Network Settings, then choose Manual proxy configuration with SOCKS Host localhost, Port 8080 (the same port passed to -D), and SOCKS v5:

image

Test Proxy

Go back to MobaXTerm and log into SuperPOD using a regular SSH session. Request a compute node:

$ srun -N1 -G1 -c10 --mem=64G --time=12:00:00 --pty $SHELL

Load cuda and cudnn, and activate any of your conda environments, for example tensorflow_2.9:

$ module load spack conda
$ module load cuda-11.4.4-gcc-10.3.0-ctldo35 cudnn-8.2.4.15-11.4-gcc-10.3.0-eluwegp
$ source activate ~/tensorflow_2.9   

Make sure to install jupyter

$ pip install jupyter   

Next, run one of the following commands:

$ jupyter notebook --ip=0.0.0.0 --no-browser
# or
$ jupyter lab --ip=0.0.0.0 --no-browser   

The following screen appears

image

Copy the highlighted URL into Firefox and you will see Jupyter Notebook forwarded through the proxy:

image

Select the TensorflowGPU29 kernel and check the GPU device:

image

Key Points

  • Jupyter Lab, port forwarding


Using NGC Container in SuperPOD

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • How to use NGC Container in SuperPOD?

Objectives
  • Learn how to master NGC Container usage in SuperPOD

5. Using NVIDIA NGC Container in SuperPOD

What is Container?

Docker Container

NVIDIA NGC Container

ENROOT

It is very convenient to download Docker and NGC containers to SuperPOD. Here I would like to introduce a very effective tool named enroot.

Importing docker container to SuperPOD from docker hub

$ enroot import docker://ubuntu
$ enroot create ubuntu.sqsh
$ enroot start ubuntu

#Type ls to see the content of container:
# ls

bin   dev  home  lib32  libx32  mnt  proc  run   srv  tmp    usr
boot  etc  lib   lib64  media   opt  root  sbin  sys  users  var

Exercise

Go to Docker Hub and search for any container, for example lolcow, then use enroot to construct that container environment:

enroot import docker://godlovedc/lolcow
enroot create godlovedc+lolcow.sqsh
enroot start godlovedc+lolcow

image

Download Tensorflow container

image

The following tag is copied to the clipboard when selecting the 22.12-tf2 version:

nvcr.io/nvidia/tensorflow:22.12-tf2-py3
$ cd $WORK/sqsh
$ enroot import docker://nvcr.io#nvidia/tensorflow:22.12-tf2-py3

The sqsh file nvidia+tensorflow+22.12-tf2-py3.sqsh is created.

$ enroot create nvidia+tensorflow+22.12-tf2-py3.sqsh

Working with NGC container in Interactive mode:

Once the container is imported and created in your folder on SuperPOD, you can simply activate it from the login node when requesting a compute node:

$ srun -N1 -G1 -c10 --mem=64G --time=12:00:00 --container-image $WORK/sqsh/nvidia+tensorflow+22.12-tf2-py3.sqsh --container-mounts=$WORK --pty $SHELL

Check that the GPU is enabled:

$ python
>>> import tensorflow as tf
>>> tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Exit the container using the exit command.

Working with NGC container in Batch mode

#!/bin/bash
#SBATCH -J Testing       # job name to display in squeue
#SBATCH -o output-%j.txt    # standard output file
#SBATCH -e error-%j.txt     # standard error file
#SBATCH -p batch -c 12 --mem=20G --gres=gpu:1     # requested partition
#SBATCH -t 1440              # maximum runtime in minutes
#SBATCH -D /link-to-your-folder/

srun --container-image=/work/users/tuev/sqsh/nvidia+tensorflow+22.12-tf2-py3.sqsh --container-mounts=$WORK python testing.py

where testing.py contains:

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))

Working with NGC container in Jupyter Lab

root@bcm-dgxa100-0001:/workspace# jupyter lab --allow-root --no-browser --ip=0.0.0.0

The following URL appears with its token:

Or copy and paste this URL:
        http://hostname:8888/?token=fd6495a28350afe11f0d0489755bc3cfd18f8893718555d2

Note that you must replace hostname with the name of the node you are on, in this case bcm-dgxa100-0001.

Therefore, change the address above to the following and paste it into Firefox:

http://bcm-dgxa100-0001:8888/?token=fd6495a28350afe11f0d0489755bc3cfd18f8893718555d2
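
The hostname substitution is a simple string replacement, which can be scripted if you do this often (the token below is the example token from above):

```python
url = "http://hostname:8888/?token=fd6495a28350afe11f0d0489755bc3cfd18f8893718555d2"
node = "bcm-dgxa100-0001"  # the compute node you landed on
fixed = url.replace("hostname", node, 1)
print(fixed)
```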

Note: you should select the default Python 3 (ipykernel) instead of any other kernels for running the container.

image

Tip: once forwarded into Jupyter Lab, you are placed in the container's root directory. It is recommended to create a symlink to your own folder so you can navigate away from it:

$ ln -s $WORK work

Key Points

  • NGC Container


Fine-tuning GPT3 using OpenAI API

Overview

Teaching: 15 min
Exercises: 0 min
Questions
  • How to train GPT-3 using the OpenAI API platform?

Objectives

6. Using OpenAI API to train and verify the AI generative model

Introduction

Request OpenAI API Key

image

image

Download the data as CSV format

image

Using the OpenAI API to train a GPT-3 model

Using Anaconda Navigator:

image

image

$ pip install openai
$ cd c:/SMU/PROJECTS/DrewDickens/
$ openai tools fine_tunes.prepare_data -f Westminster_Catechism.csv
$ set OPENAI_API_KEY=abc
$ openai api fine_tunes.create -t Westminster_Catechism_prepared.jsonl -m ada

Once fine-tuning completes, it should show this:

image

openai api completions.create -m ada:ft-personal-2022-12-07-17-10-04 -p "What are the punishments of sin in the world to come?"

image

Using M2

Fine-tuning GPT-3 with the openai API on ManeFrame 2 (or any other HPC system) works very much the same way.

$ srun -N1 -p standard-mem-s -c2 --mem=5G --pty $SHELL
$ module load python/3
$ conda create -n openai python==3.8
$ source activate openai
$ pip install openai
$ conda install jupyter -y
$ python -m ipykernel install --user --name openai --display-name "OpenAI"
$ export OPENAI_API_KEY=abc
$ cd /work/users/tuev
$ openai tools fine_tunes.prepare_data -f Westminster_Catechism.csv
$ openai api fine_tunes.create -t Westminster_Catechism_prepared.jsonl -m ada
$ openai api completions.create -m ada:ft-personal-2022-12-07-17-10-04 -p "What are the punishments of sin in the world to come?"

and the output is:

Sin is the loss of righteousness before the throne of God in Christ

More information on fine-tuning OpenAI can be found here

Key Points

  • OpenAI, API, Python


Creating an AI chatbot using a HuggingFace pretrained model

Overview

Teaching: 15 min
Exercises: 0 min
Questions
  • How to create an AI chatbot

Objectives

7. Creating an AI chatbot using a HuggingFace pretrained model on your ManeFrame or SuperPOD

Introduction

image

Download a pretrained model from HuggingFace and create your own chatbot:

import transformers
nlp = transformers.pipeline("conversational",
                            model="microsoft/DialoGPT-large")

while True:
    input_text = input("Ask me a question!")
    print(nlp(transformers.Conversation(input_text), pad_token_id=57007))
    if input("Continue?").lower()=="no":
        print("Goodbye")
        break

Running the model

python chatbot_DialoGPT-large.py

Following is the inference from running the chatbot model:

Downloading config.json: 100%|███████████████████████████████████████████████████████████████████| 1.47k/1.47k [00:00<00:00, 1.40MB/s]
Downloading pytorch_model.bin: 100%|███████████████████████████████████████████████████████████████| 334M/334M [00:06<00:00, 58.0MB/s]
Downloading tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████| 205/205 [00:00<00:00, 171kB/s]
Downloading vocab.json: 100%|██████████████████████████████████████████████████████████████████████| 941k/941k [00:00<00:00, 5.09MB/s]
Downloading merges.txt: 100%|██████████████████████████████████████████████████████████████████████| 337k/337k [00:00<00:00, 2.30MB/s]
Downloading special_tokens_map.json: 100%|██████████████████████████████████████████████████████████| 99.0/99.0 [00:00<00:00, 100kB/s]
Ask me a question!do you believe in God
Conversation id: 7417a2a5-412b-4117-8dd5-7cf2237d5811 
user >> do you believe in God 
bot >> i don't believe in god, but i do believe in the existence of a god. 

Continue?yes
Ask me a question!what is the meaning of life
Conversation id: 421afd4c-8bcc-400a-8e86-5a97a61685a6 
user >> what is the meaning of life 
bot >> i don't know. i'm not sure. what is the purpose of life?

Continue?no

You can exit the conversation at any time by replying "no" to the "Continue?" question.

Which models can I run?

microsoft/DialoGPT-small
microsoft/DialoGPT-medium
microsoft/DialoGPT-large
facebook/blenderbot_small-90M
facebook/blenderbot-400M-distill
facebook/blenderbot-1B-distill

or even personal models can be used:

vuminhtue/DialoGPT-large-HarryPotter3
rlatt/DialoGPT-large-King-James-Bible-test

Key Points

  • HuggingFace, chatbot


Question Answering with BERT using content

Overview

Teaching: 15 min
Exercises: 0 min
Questions
  • How to create an AI chatbot with content using BERT

Objectives

8. Question Answering with BERT using content

In this problem we use the given text content and construct Q&A based on that.

8.1 Q&A with Adam and Eve content.

Assume we have the story of Adam and Eve as follows, saved as Story1_Adam_Eve.txt:

The story of Adam and Eve is a well-known biblical narrative that appears in the Book of Genesis. According to the story, God created Adam, the first man, and placed him in the Garden of Eden, a paradise where all of his needs were provided for. However, God saw that Adam was alone and decided to create a partner for him, so he created Eve, the first woman, from one of Adam's ribs.
Adam and Eve lived in the Garden of Eden and enjoyed a close relationship with God, but they were given one commandment: they were not allowed to eat from the tree of the knowledge of good and evil. However, one day, a serpent came to Eve and convinced her to eat from the forbidden tree, telling her that it would make her wise. Eve ate the fruit and gave some to Adam, who also ate it.
After they ate from the tree, Adam and Eve became aware of their nakedness and were ashamed. They tried to hide from God, but God knew what they had done and cursed them, expelling them from the Garden of Eden and condemning them to a life of toil and hardship. The story of Adam and Eve is often interpreted as an allegory for the fall of humanity and the origin of sin and suffering.

We have a list of questions for the model to answer, saved as Questions-Adam.txt:

"Who is Adam",
"Where is Garden of Eden?",
"Who invented Apple?",
"Do you believe in God?"

We build the model, saved as qabert-Adam-Eve.py:

from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering
import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
qa_model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")

def answer_question(questions, text):
  # Function that answers each question in the list against the given text
  for question in questions:
    # Concatenate the question and the text
    inputs = tokenizer(question, text, add_special_tokens = True, return_tensors = 'tf')
    # Get the input ids (numbers) and convert to tokens (words)
    input_ids = inputs["input_ids"].numpy()[0]
    text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
    # Run the pretrained model to get the logits (raw scores) for the scores
    output = qa_model(inputs)

    # Get the most likely beginning and end
    answer_start = tf.argmax(output.start_logits, axis = 1).numpy()[0]
    answer_end = (tf.argmax(output.end_logits, axis = 1)+1).numpy()[0]
    # Turn the tokens from the ids of the input string, indexed by the start and end tokens back into a string
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))

    print("Question {} \nAnswer: {}".format(question, answer))

with open("Story1_Adam_Eve.txt") as f:
    bow1 = f.read()
    
with open('Questions-Adam.txt') as f1:
    q1 = f1.read()
questions = q1.split("\n")    

answer_question(questions, bow1)

The response would be:

image
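
The span selection inside answer_question simply takes the argmax of the start logits and the argmax of the end logits (plus one), then joins the tokens in between. The same mechanics on toy logits (hypothetical values, pure Python):

```python
# Toy start/end logits over six token positions
start_logits = [0.1, 0.2, 3.5, 0.4, 0.1, 0.0]
end_logits   = [0.0, 0.1, 0.3, 0.2, 2.9, 0.1]
tokens = ["[CLS]", "adam", "is", "the", "first", "man"]

# Most likely start index, and most likely end index plus one
answer_start = max(range(len(start_logits)), key=start_logits.__getitem__)
answer_end = max(range(len(end_logits)), key=end_logits.__getitem__) + 1
answer = " ".join(tokens[answer_start:answer_end])
print(answer)  # is the first
```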

We can see that the responses closely follow the given content. What if we provide another story, about Apple Inc., and ask the same questions?

8.2 Q&A with Apple content.

Here is our content copied from Wikipedia for Apple Inc:

Apple Computers, Inc. was founded on April 1, 1976, by college dropouts Steve Jobs and Steve Wozniak, who brought to the new company a vision of changing the way people viewed computers.
Jobs and Wozniak wanted to make computers small enough for people to have them in their homes or offices. Simply put, they wanted a computer that was user-friendly.
Jobs and Wozniak started out building the Apple I in Jobs' garage and sold them without a monitor, keyboard, or casing (which they decided to add on in 1977). 
The Apple II revolutionized the computer industry with the introduction of the first-ever color graphics. Sales jumped from $7.8 million in 1978 to $117 million in 1980, the year Apple went public.
Wozniak left Apple in 1983 due to a diminishing interest in the day-to-day running of Apple Computers. Jobs then hired PepsiCo's John Sculley to be president. However, this move backfired and after much controversy with Sculley, Jobs left in 1985 and went on to new and bigger things.
He founded his own company NeXT Software and he also bought Pixar from George Lucas, which would later become a huge success in computer animation of such movies as Toy Story, A Bug's Life, Monsters, Inc., and Finding Nemo, but not the bible Adam and Eve

Running a similar Python file with the same questions from Questions-Adam.txt, we get the following answers:

image

Comparing the same question, "Who invented Apple?", under different contexts, we get different responses from the model.

Key Points

  • Question Answering, chatbot, BERT


Install ArcPy on M2 JupyterHub

Overview

Teaching: 5 min
Exercises: 0 min
Questions
  • How to install ArcPy on ManeFrame 2?

Objectives

In order to install ArcPy on the ManeFrame 2 HPC, you need to install ArcGIS Server. Users must be added to an ESRI account (using the smudallas domain) to be authorized to run ArcPy. SMU users: check with Jessie Zarazaga to be added to the ESRI SMU server.

The following are the steps to install ArcGIS Server.

Step 1. Create a license file:

Go to https://my.esri.com/ and log in using your SMU username/password. Once done, navigate to My Organization (smudallas) and click Licensing:

image

Select the appropriate license:

image

Download the license file:

image

Save it somewhere, for example: ArcGISImageServer_ArcGISServer_1007035.prvc

image

Step 2. Download ArcGIS Server:

Hover to Downloads tab and select ArcGIS Enterprise Linux to download to M2:

image

Here I download version 11.0, the current version at the time of writing: image

Step 3. Setup ArcGIS Server:

Request a compute node; here I always use va001.

$ ssh -X va001

Sometimes you may need to change the soft and hard limits of the node (check with Amit or Richard should you need help with the node):

$ ulimit -Hu 26000
$ ulimit -Su 26000

Navigate to installation folder and run Setup file:

$ ./Setup

Step 4. Authorize the license file:

There are 2 ways to authorize the license file from step 1.

Method 1: Using GUI:

$ ./authorizeSoftware -s

Method 2: using silent mode.

$ ./authorizeSoftware <-f .prvc> <-e email> <-o filename.txt>

Upload filename.txt to the ESRI website (following its instructions) to obtain the license file authorization.ecp. Then validate the license:

$ ./authorizeSoftware -f authorization.ecp

Make sure it works:

$ ./authorizeSoftware -s

The GUI appears, allowing you to manually install ArcGIS Server into your /home/username/arcgis directory.

Step 5. Install conda environment:

Request a compute node (this time without -X):

$ module load python/3
$ conda create -n arcpy_env -c esri arcgis-server-py3=11.0
$ export ARCGISHOME=/work/users/tuev/arcgis/server
$ source activate arcpy_env
$ python -c "import arcpy"

Test to make sure it works

Step 6. Create Jupyter Kernel:

$ source activate arcpy_env
$ conda install -y -c conda-forge kernda
$ python -m ipykernel install --user --name arcpy_env --display-name "ArcPy11"
$ kernda /users/tuev/.local/share/jupyter/kernels/arcpy_env/kernel.json -o

Modify /users/tuev/.local/share/jupyter/kernels/arcpy_env/kernel.json and make sure the following lines are added:

{
  "argv": [
    "bash",
    "-c",
    "source \"/software/spackages/linux-centos8-x86_64/gcc-8.3.1/anaconda3-5.1.0-c3p5et4cpo7jaiahacqa3pqwhop7tiik/bin/activate\" \"/home/tuev/.conda/envs/arcpy1\" && exec /home/tuev/.conda/envs/arcpy1/bin/python -m ipykernel_launcher -f '{connection_file}' "
  ],
  "env": {"ARCGISHOME":"/work/users/tuev/arcgis/server"},
  "display_name": "MyArcPy510",
  "language": "python"
}
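
After hand-editing kernel.json it is easy to leave a trailing comma or an unbalanced brace, which silently breaks the kernel. A quick parse check catches that; here against a minimal inline spec (for the real file, use json.load(open(path)) instead):

```python
import json

# Minimal inline kernel spec standing in for the edited kernel.json
kernel_spec = """
{
  "argv": ["bash", "-c", "exec python"],
  "display_name": "MyArcPy510",
  "language": "python"
}
"""

spec = json.loads(kernel_spec)  # raises json.JSONDecodeError if malformed
print(spec["display_name"])     # MyArcPy510
```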

Step 7. Use port forwarding and request a Jupyter Notebook running on M2/SuperPOD

image

Key Points

  • ArcPy