From b7f57d55d64b88e1b73163f2333d3401e3852935 Mon Sep 17 00:00:00 2001
From: mngom2 <50598247+mngom2@users.noreply.github.com>
Date: Thu, 8 Feb 2024 10:03:07 -0600
Subject: [PATCH 01/10] Create gpytorch_on_Polaris.md

Gpytorch on Polaris
---
 gpytorch_on_Polaris.md | 107 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 107 insertions(+)
 create mode 100644 gpytorch_on_Polaris.md

diff --git a/gpytorch_on_Polaris.md b/gpytorch_on_Polaris.md
new file mode 100644
index 0000000..796f955
--- /dev/null
+++ b/gpytorch_on_Polaris.md
@@ -0,0 +1,107 @@
+# Guide of Login to Polaris Load Environments, Run Jobs, and Install GPytorch
+
+#### 1. Login and queue a job
+Login to Polaris
+```
+ssh alcfusername@polaris.alcf.anl.gov
+```
+
+
+Start a sessiom, for example Interactive ssh Session on a Compute Node
+```
+qsub -I -l nodes=1:ppn=4 -l walltime=1:00:00 -q debug -l filesystems=eagle:home -A datascience
+```
+ where:
+* -A for project name
+* -q for single-gpu or full-node. If using full-node, it would be better to add `-M ` as well, so that you will receive an email when your job is starting.
+* -n number of resources (n gpu or n node)
+* -t number of minutes you want to have
+* -I indicates an interactive session. One can also remove -I and specify a executable bash script for it to run directly on the compute node
+
+If queueing for an interactive session, once it is running, we can use `qstat -u ` **on a service node** to see our job id and allocated node.
+
+
+#### 2. Once on a Compute Node, Load Modules
+
+Load the Conda Environment (Module) with Pytorch, since our GPytorch has Pytorch dependency
+```
+module load conda/pytorch
+conda activate
+```
+* Notice that we can check the available modules with "module avail" and check the loaded modules with "module list"
+
+Create a virtual environment with python and activate it
+```
+python -m venv --system-site-packages path_to_myenv
+source path_to_myenv/bin/activate
+```
+Now the bash prompt should show that we're in the environment we just created, and we're good to use pip install
+```
+pip install gpytorch==version
+```
+
+### After the First Time
+After the first time, to run the files, simply activate the python_venv on a compute node with
+```
+module load conda
+source path_to_myenv/bin/activate
+```
+
+
+## Using Jupyter Notebook to Run GPytorch on Polaris
+Here is the guide:
+
+### Approach 1 - Use Jupyter Hub
+1. Go to [Jupyter Hub of ALCF](https://jupyter.alcf.anl.gov/), click Login Polaris
+2. Queue up on a debug node
+
+**For the first time only, one needs to set up the environment and Kernel by follow these extra steps**
+
+4. Once Jupyter Notebook is launched on a compute node, click "New" and open a **terminal**
+5. Run
+```
+module load conda
+conda activate
+source /bin/activate
+python -m ipykernel install --user --name python_venv
+```
+Note: depending on the system and environment, you might need to install the "ipykernel" package first. The python_venv that I just created has the ipykernel module.
+
+Go back to your .ipynb file, change kernel to python_venv from the dropdown menu, and we'll be good to run GPytorch!
+
+### Approach 2 - Use ssh tunnel
+To use ssh tunnel, we first need to be in an interactive session on a compute node. See Part 1, "Log in and queue a job" for more details on this.
+
+After on a compute node, follow these steps:
+1. On the compute node terminal, do
+```
+module load conda
+conda activate
+jupyter notebook
+```
+You should see a line like `http://localhost:XXXX/`, where XXXX is the port number that jupyter notebook is launched on the compute node, usually the default 8888. If it is not 8888, replace 8888 in the following with your port number.
+
+2. Then, on a **new, local terminal**, do
+```
+export PORT_NUM=8889
+ssh -L $PORT_NUM:localhost:8888
+ssh -L 8888:localhost:8888 your_compute_node
+
+
+3. Finally, navigate to localhost:8889 in your browser, and you should see a jupyter notebook
+Notice that for the first time doing this, one might need to input some password or weird key. Just follow the direction on that page.
+
+(Essentially, the above steps, using ssh, sets the local port 8889 to listen to the allocated compute node port 8888 which we initiated a jupyter notebook.)
+
+**For the first time only, one needs to set up the environment and Kernel by follow these extra steps**
+
+Click "New" and open a **terminal**, and run
+```
+module load conda
+conda activate
+source /bin/activate
+python -m ipykernel install --user --name python_venv
+```
+Note: depending on the system and environment, you might need to install the "ipykernel" package first. The python_venv that I just created has the ipykernel module.
+
+Go back to your .ipynb file, change kernel to python_venv from the dropdown menu, and we'll be good to run GPytorch!
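
The two-hop port forwarding described under "Approach 2 - Use ssh tunnel" can be sketched as a small helper that only assembles the two `ssh -L` command strings, so the port numbers stay consistent between the hops. This is a minimal sketch under stated assumptions: the function name `build_tunnel_commands` and its parameters are hypothetical, and the destination of the first `ssh` command is not shown in the guide, so it is taken as a parameter here. The example value reuses the login address from Part 1 purely as an assumption.

```python
import shlex


def build_tunnel_commands(login_host, compute_node, local_port=8889, remote_port=8888):
    """Return the two ssh commands of the two-hop tunnel as printable strings.

    login_host is an assumption: the guide does not show the destination
    of the first ssh command, so the caller has to supply it.
    """
    # Hop 1, typed on the local machine: local_port -> remote_port on the first host.
    first_hop = ["ssh", "-L", f"{local_port}:localhost:{remote_port}", login_host]
    # Hop 2, typed inside that ssh session: remote_port -> remote_port on the compute node.
    second_hop = ["ssh", "-L", f"{remote_port}:localhost:{remote_port}", compute_node]
    return shlex.join(first_hop), shlex.join(second_hop)


if __name__ == "__main__":
    # Hypothetical example values; replace them with your own host and node name.
    for command in build_tunnel_commands("alcfusername@polaris.alcf.anl.gov", "your_compute_node"):
        print(command)
```

If Jupyter reports a port other than 8888, passing it as `remote_port` updates both commands at once, which is the substitution the guide asks the reader to make by hand.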
From f4d01579c840fb4a9ab3e6d60391097be6492b9d Mon Sep 17 00:00:00 2001
From: mngom2 <50598247+mngom2@users.noreply.github.com>
Date: Thu, 8 Feb 2024 10:04:41 -0600
Subject: [PATCH 02/10] Update gpytorch_on_Polaris.md

---
 gpytorch_on_Polaris.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gpytorch_on_Polaris.md b/gpytorch_on_Polaris.md
index 796f955..aed3286 100644
--- a/gpytorch_on_Polaris.md
+++ b/gpytorch_on_Polaris.md
@@ -86,9 +86,10 @@ You should see a line like `http://localhost:XXXX/`, where XXXX is the port numb
 export PORT_NUM=8889
 ssh -L $PORT_NUM:localhost:8888
 ssh -L 8888:localhost:8888 your_compute_node
+navigate to localhost:8889 in your browser
+```
 
-
-3. Finally, navigate to localhost:8889 in your browser, and you should see a jupyter notebook
+You should see a jupyter notebook
 Notice that for the first time doing this, one might need to input some password or weird key. Just follow the direction on that page.
 
 (Essentially, the above steps, using ssh, sets the local port 8889 to listen to the allocated compute node port 8888 which we initiated a jupyter notebook.)
From 21bf1b3aec2f7ab0e098b4c02178f7aba337640a Mon Sep 17 00:00:00 2001
From: mngom2 <50598247+mngom2@users.noreply.github.com>
Date: Thu, 8 Feb 2024 10:22:33 -0600
Subject: [PATCH 03/10] Create gpytorch_on_aurora

Aurora
---
 gpytorch_on_aurora | 54 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)
 create mode 100644 gpytorch_on_aurora

diff --git a/gpytorch_on_aurora b/gpytorch_on_aurora
new file mode 100644
index 0000000..2b489a7
--- /dev/null
+++ b/gpytorch_on_aurora
@@ -0,0 +1,54 @@
+# Guide of Login to Aurora Load Environments, Run Jobs, and Install GPytorch
+
+#### 1. Login and queue a job
+Login to Aurora
+```
+ssh @bastion.alcf.anl.gov
+```
+Then, after entering your passcode
+ssh @login.aurora.alcf.anl.gov
+
+(We are supposion you already set the environment variables that provide access to the proxy host. )
+
+Start a sessiom, for example Interactive ssh Session on a Compute Node
+```
+qsub -I -q EarlyAppAccess -l select=1,walltime=60:00 -A Aurora_deployment
+```
+
+
+
+#### 2. Once on a Compute Node, Load Modules
+
+```
+module use /soft/modulefiles
+module load frameworks
+python3 -m venv --system-site-packages env_gpytorch
+source env_gpytorch/bin/activate
+python3 -m pip install gpytorch
+```
+
+Create a 'activation_env.sh' file that contains
+```
+module use /soft/modulefiles
+module load frameworks
+source env_gpytorch/bin/activate
+```
+and do `source activation_env.sh` to activate your environment for subsequent runs.
+
+## Running on GPUs
+To run on GPUs, one needs to add
+```
+import intel_extension_for_pytorch as ipex
+ ```
+and
+set the device as follows:
+
+```
+if torch.cuda.is_available():
+    device = torch.device('cuda')
+elif torch.xpu.is_available():
+    device = torch.device('xpu')
+else:
+    device = torch.device('cpu')
+```
+(One might need to install an earlier version of GPytorch for multiple GPUs running.)
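
The device fallback in "Running on GPUs" can also be written as a helper that tolerates a PyTorch build without an `xpu` attribute. This is a minimal sketch, not the guide's own code: the name `select_device_name` is hypothetical, and whether `torch.xpu` exists before `intel_extension_for_pytorch` is imported may depend on the installed PyTorch version, which is why the lookup goes through `getattr`. The demonstration uses stand-in objects so it runs without PyTorch installed.

```python
from types import SimpleNamespace


def select_device_name(torch_like):
    """Return 'cuda', 'xpu', or 'cpu', checking the backends in that order.

    torch_like is the imported torch module, or any object with the same
    cuda/xpu attributes; a missing backend is treated as unavailable.
    """
    for name in ("cuda", "xpu"):
        backend = getattr(torch_like, name, None)
        is_available = getattr(backend, "is_available", None)
        if callable(is_available) and is_available():
            return name
    return "cpu"


if __name__ == "__main__":
    # Stand-in for a node that exposes only an XPU backend (hypothetical values).
    fake_torch = SimpleNamespace(
        cuda=SimpleNamespace(is_available=lambda: False),
        xpu=SimpleNamespace(is_available=lambda: True),
    )
    print(select_device_name(fake_torch))  # prints "xpu"
```

With PyTorch installed, the result could be used as `device = torch.device(select_device_name(torch))`, after importing `intel_extension_for_pytorch` as the guide describes.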
From bb8399802860621d0077d5623ff8dedafcb25c72 Mon Sep 17 00:00:00 2001
From: mngom2 <50598247+mngom2@users.noreply.github.com>
Date: Thu, 8 Feb 2024 10:23:20 -0600
Subject: [PATCH 04/10] Rename gpytorch_on_aurora to gpytorch_on_aurora.md

---
 gpytorch_on_aurora => gpytorch_on_aurora.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename gpytorch_on_aurora => gpytorch_on_aurora.md (100%)

diff --git a/gpytorch_on_aurora b/gpytorch_on_aurora.md
similarity index 100%
rename from gpytorch_on_aurora
rename to gpytorch_on_aurora.md
From 8bfb0a6664dfc5223e4f5cee6f4dbced1596e822 Mon Sep 17 00:00:00 2001
From: mngom2 <50598247+mngom2@users.noreply.github.com>
Date: Thu, 8 Feb 2024 10:23:44 -0600
Subject: [PATCH 05/10] Update gpytorch_on_aurora.md

---
 gpytorch_on_aurora.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gpytorch_on_aurora.md b/gpytorch_on_aurora.md
index 2b489a7..9488dd5 100644
--- a/gpytorch_on_aurora.md
+++ b/gpytorch_on_aurora.md
@@ -6,7 +6,9 @@ Login to Aurora
 ssh @bastion.alcf.anl.gov
 ```
 Then, after entering your passcode
+```
 ssh @login.aurora.alcf.anl.gov
+```
 
 (We are supposion you already set the environment variables that provide access to the proxy host. )
 
From ad708954d6bfa2234b9350d2c45312d95367c2e3 Mon Sep 17 00:00:00 2001
From: mngom2 <50598247+mngom2@users.noreply.github.com>
Date: Thu, 8 Feb 2024 10:24:14 -0600
Subject: [PATCH 06/10] Update gpytorch_on_aurora.md

---
 gpytorch_on_aurora.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gpytorch_on_aurora.md b/gpytorch_on_aurora.md
index 9488dd5..9e6504d 100644
--- a/gpytorch_on_aurora.md
+++ b/gpytorch_on_aurora.md
@@ -38,12 +38,12 @@ source env_gpytorch/bin/activate
 and do `source activation_env.sh` to activate your environment for subsequent runs.
 
 ## Running on GPUs
-To run on GPUs, one needs to add
+To run on GPUs, one needs to add to code
 ```
 import intel_extension_for_pytorch as ipex
  ```
 and
-set the device as follows:
+set the device as follows in the code:
 
 ```
 if torch.cuda.is_available():
From fa6ec3bdfdb579df831f52036d7f0859b4205153 Mon Sep 17 00:00:00 2001
From: mngom2 <50598247+mngom2@users.noreply.github.com>
Date: Thu, 8 Feb 2024 14:48:47 -0600
Subject: [PATCH 07/10] Update gpytorch_on_Polaris.md

---
 gpytorch_on_Polaris.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gpytorch_on_Polaris.md b/gpytorch_on_Polaris.md
index aed3286..3d203c7 100644
--- a/gpytorch_on_Polaris.md
+++ b/gpytorch_on_Polaris.md
@@ -25,7 +25,7 @@ If queueing for an interactive session, once it is running, we can use `qstat -u
 
 Load the Conda Environment (Module) with Pytorch, since our GPytorch has Pytorch dependency
 ```
-module load conda/pytorch
+module load conda
 conda activate
 ```
 * Notice that we can check the available modules with "module avail" and check the loaded modules with "module list"
From 6e95539cb7683b7749807b3f2a5c8b790a6c2c50 Mon Sep 17 00:00:00 2001
From: Marieme Ngom <50598247+mngom2@users.noreply.github.com>
Date: Tue, 17 Dec 2024 12:16:07 -0600
Subject: [PATCH 08/10] Update gpytorch_on_aurora.md

---
 gpytorch_on_aurora.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gpytorch_on_aurora.md b/gpytorch_on_aurora.md
index 9e6504d..a4ee539 100644
--- a/gpytorch_on_aurora.md
+++ b/gpytorch_on_aurora.md
@@ -12,7 +12,7 @@ ssh @login.aurora.alcf.anl.gov
 
 (We are supposion you already set the environment variables that provide access to the proxy host. )
 
-Start a sessiom, for example Interactive ssh Session on a Compute Node
+Start a session, for example Interactive ssh Session on a Compute Node
 ```
 qsub -I -q EarlyAppAccess -l select=1,walltime=60:00 -A Aurora_deployment
 ```
From da10df434f2153648f626de3d7542624b3738ba5 Mon Sep 17 00:00:00 2001
From: Marieme Ngom <50598247+mngom2@users.noreply.github.com>
Date: Wed, 18 Dec 2024 14:06:11 -0600
Subject: [PATCH 09/10] Update gpytorch_on_aurora.md

---
 gpytorch_on_aurora.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/gpytorch_on_aurora.md b/gpytorch_on_aurora.md
index a4ee539..e9b3728 100644
--- a/gpytorch_on_aurora.md
+++ b/gpytorch_on_aurora.md
@@ -10,15 +10,13 @@ Then, after entering your passcode
 ssh @login.aurora.alcf.anl.gov
 ```
 
-(We are supposion you already set the environment variables that provide access to the proxy host. )
+(We are supposing you already set the environment variables that provide access to the proxy host. )
 
 Start a session, for example Interactive ssh Session on a Compute Node
 ```
-qsub -I -q EarlyAppAccess -l select=1,walltime=60:00 -A Aurora_deployment
+qsub -I -q [your_Queue] -l select=1,walltime=60:00 -A [your_ProjectName]
 ```
 
-
-
 #### 2. Once on a Compute Node, Load Modules
 
 ```
@@ -29,6 +27,7 @@ source env_gpytorch/bin/activate
 python3 -m pip install gpytorch
 ```
 
+#### Optional
 Create a 'activation_env.sh' file that contains
 ```
 module use /soft/modulefiles
From 1f42608bafdb3383abfd2841fb1d6b1db2b39fa7 Mon Sep 17 00:00:00 2001
From: Marieme Ngom <50598247+mngom2@users.noreply.github.com>
Date: Wed, 18 Dec 2024 14:08:22 -0600
Subject: [PATCH 10/10] Update gpytorch_on_aurora.md

---
 gpytorch_on_aurora.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gpytorch_on_aurora.md b/gpytorch_on_aurora.md
index e9b3728..f1091d7 100644
--- a/gpytorch_on_aurora.md
+++ b/gpytorch_on_aurora.md
@@ -10,7 +10,7 @@ Then, after entering your passcode
 ssh @login.aurora.alcf.anl.gov
 ```
 
-(We are supposing you already set the environment variables that provide access to the proxy host. )
+(We are supposing you already set the environment variables that provide access to the proxy host. Go to docs/aurora/getting-started-on-aurora.md for more information)
 
 Start a session, for example Interactive ssh Session on a Compute Node
 ```