The Ultimate Beginner’s Guide to FAU HPC: From Zero to A100

So, you’ve received an invitation to use the High-Performance Computing (HPC) cluster at FAU (likely a Tier3 project). You want to run Deep Learning, VLM, or RL experiments, but you are staring at a black terminal screen and don't know where to start.
Don't worry. I went through the exact same struggle—configuring SSH keys, getting "Permission Denied" errors, and wondering why PyTorch couldn't see the GPU.
This guide will take you from receiving the email to training on an NVIDIA A100, step-by-step.
Step 1: Accept the Invitation
Log in to the FAU HPC Portal using your standard IdM credentials (e.g., abc1234d).
Go to User -> Your Invitations and accept the project invitation.
Wait overnight. The system runs a synchronization script every night, so you usually cannot log in immediately after accepting; your home directory needs time to be created.
Step 2: Generate and Upload Your SSH Key
HPC systems don't use passwords; they use "keys." You need to generate a lock (Public Key) and a key (Private Key).
For Windows (PowerShell) / Mac / Linux:
Open your terminal and run:
ssh-keygen -t rsa -b 4096
Press Enter to save it in the default location.
Press Enter twice more to skip setting a passphrase (useful for automation).
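If your OpenSSH is reasonably recent, an Ed25519 key is a shorter, modern alternative to RSA that most clusters accept (this is optional; if you use it, substitute id_ed25519 for id_rsa in every later step):

```shell
# -f sets the output file, -N "" sets an empty passphrase (non-interactive,
# equivalent to pressing Enter at both prompts).
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N ""
```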
Display your public key:
Windows:
type $env:USERPROFILE\.ssh\id_rsa.pub
Mac/Linux:
cat ~/.ssh/id_rsa.pub
Copy everything (starting with ssh-rsa and ending with your username).
Upload to Portal:
Go back to the HPC Portal.
Go to User -> Your Accounts.
Find your HPC account (e.g., abc1234d), click on it, and paste your key into the "Add new SSH Key" section.
Wait 15-20 minutes for the key to sync to the servers.
Step 3: Configure VS Code (The "Pro" Setup)
Do not try to use the raw terminal for everything. Use VS Code with the Remote - SSH extension. It allows you to edit code on the server as if it were on your laptop.
The "ProxyJump" Trick
Direct access to GPU nodes (like tinyx) is often blocked from outside the university network. We need to jump through a "gatekeeper" server called csnhr.
In VS Code, install the Remote - SSH extension.
Click the blue >< icon (bottom left) -> Open Configuration File -> Select your .ssh/config.
Paste the following configuration (replace abc1234d with YOUR HPC username):
# 1. The Gatekeeper (Jump Host)
Host csnhr
HostName csnhr.nhr.fau.de
User abc1234d
IdentityFile ~/.ssh/id_rsa
IdentitiesOnly yes
PasswordAuthentication no
# 2. Woody (CPU Frontend - Good for data transfer)
Host woody
HostName woody.nhr.fau.de
User abc1234d
ProxyJump csnhr
IdentityFile ~/.ssh/id_rsa
IdentitiesOnly yes
# 3. TinyX (Tier3 GPU Frontend - RUN YOUR EXPERIMENTS HERE)
Host tinyx
HostName tinyx.nhr.fau.de
User abc1234d
ProxyJump csnhr
IdentityFile ~/.ssh/id_rsa
IdentitiesOnly yes
Save the file.
Click the blue >< icon -> Connect to Host -> Select tinyx.
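Before letting VS Code at it, it is worth sanity-checking the setup from a plain terminal (this assumes the configuration above was saved to ~/.ssh/config; these commands only work once your key has synced):

```shell
# Should hop through csnhr automatically and print the frontend's hostname.
# The first connection will ask you to confirm each host's fingerprint.
ssh tinyx hostname

# If you get "Permission denied", verbose mode shows which key is offered:
ssh -v tinyx exit
```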
Step 4: Know Your Territory ($HOME vs $WORK)
Once logged in, you need to know where to put your files. This is the most common mistake beginners make.
$HOME (/home/hpc/...):
Size: Very small (100GB).
Use for: Config files, scripts, source code.
NEVER put: Datasets, Conda environments, or Model checkpoints here. You will run out of space immediately.
$WORK (/home/woody/...):
Size: Large (terabyte-scale; the exact quota depends on your project).
Use for: EVERYTHING BIG. Install Miniforge here. Download datasets here.
How to find it: Run echo $WORK in the terminal.
Always switch to WORK before doing anything:
cd $WORK
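A quick way to see both locations and how full each filesystem is (a generic df-based sketch; the cluster may also provide its own quota tools, so check the FAU HPC docs for exact limits):

```shell
# $WORK is only defined on the cluster, so fall back gracefully elsewhere.
echo "HOME: $HOME"
echo "WORK: ${WORK:-(not set - are you on the cluster?)}"

# Filesystem size and usage for each location.
df -h "$HOME" ${WORK:+"$WORK"}
```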
Step 5: Setting Up the Environment (The Right Way)
Do not use the default Python. Do not use Anaconda (it's too bloated). Use Miniforge.
Download and Install (in the terminal on tinyx):
cd $WORK
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh
Crucial: When asked for the installation path, ensure it says /home/woody/.... If it says /home/hpc/..., edit it manually!
Type yes to initialize.
Restart your terminal (close and reopen the terminal pane in VS Code).
Create your Environment: (e.g., vlm_env)
mamba create -n vlm_env python=3.10
mamba activate vlm_env
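One more $HOME-saver: conda/mamba caches downloaded packages under your home directory by default, which quietly eats the small $HOME quota. A ~/.condarc along these lines keeps caches and environments on WORK (the /home/woody/&lt;group&gt;/&lt;user&gt; path is a placeholder; substitute your real $WORK path):

```yaml
# ~/.condarc -- keep bulky conda data out of the small $HOME quota
pkgs_dirs:
  - /home/woody/<group>/<user>/conda/pkgs
envs_dirs:
  - /home/woody/<group>/<user>/conda/envs
```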
Step 6: Installing PyTorch (The Trap!)
Here is where many fail.
Compute nodes (GPUs) have NO INTERNET. You must install packages on the Login Node (tinyx).
Mamba sometimes defaults to CPU versions. Use pip to force the CUDA version.
The Golden Command (run this on tinyx):
mamba activate vlm_env
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers accelerate bitsandbytes
Note: You install and verify packages on the Login Node, but the actual training runs on a Compute Node.
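A quick way to confirm you got a CUDA build (run on tinyx; there is no GPU on the login node, so torch.cuda.is_available() will be False there, and that's expected at this stage):

```shell
# A CUDA wheel reports a version like "2.x.x+cu121" plus a CUDA build string;
# "None" as the second value means you accidentally got a CPU-only wheel.
python -c "import torch; print(torch.__version__, torch.version.cuda)"
```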
Step 7: Accessing GPUs
You are currently on tinyx (a shared login node). DO NOT run training here. You must request a Compute Node.
Option A: Interactive Mode (Debugging/Testing)
Use salloc to get a GPU for a short time (e.g., 30 mins).
To check what GPUs are available:
sinfo -o "%20P %G"
(You might see partitions like a100, v100, rtx3080).
To request an A100 (The Beast):
salloc --partition=a100 --gres=gpu:a100:1 --time=00:30:00
To request a V100 (Reliable):
salloc --partition=v100 --gres=gpu:v100:1 --time=00:30:00
Once inside (prompt changes to tgXXX), reactivate your environment and test:
mamba activate vlm_env
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}'); print(f'Device: {torch.cuda.get_device_name(0)}')"
If it says True and NVIDIA A100, you win!
Option B: Batch Jobs (Real Training)
For long training runs (e.g., 24 hours), create a script called run.sh:
#!/bin/bash
#SBATCH --job-name=vlm_train
#SBATCH --output=logs/%j.out
#SBATCH --partition=a100 # or v100
#SBATCH --gres=gpu:a100:1 # or gpu:v100:1
#SBATCH --time=24:00:00
source $WORK/miniforge3/bin/activate vlm_env
python train.py
First create the log directory (mkdir logs; Slurm will not create it for the --output path), then submit with: sbatch run.sh
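After sbatch, Slurm only hands back a job ID. These standard Slurm commands let you watch and manage the run (the <jobid> placeholders stand for whatever ID sbatch printed):

```shell
squeue -u $USER           # list all your queued and running jobs
squeue -j <jobid>         # status of one specific job
tail -f logs/<jobid>.out  # follow the live output of a running job
scancel <jobid>           # cancel a job you no longer need
```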
Summary Cheat Sheet
Connect: VS Code -> tinyx.
Workspace: cd $WORK.
Install: Run installs on tinyx (Login Node).
Debug: salloc ... to get an interactive GPU.
Train: sbatch run.sh for long jobs.
Good luck with your experiments! 🚀




