
3. Train Adaptive Router

PartiNet's architecture requires a two-step training regime. This step assumes you have already completed Step 1 and trained PartiNet to identify particles. You will now train PartiNet's adaptive router, which discriminates between hard and easy micrographs so they can be routed in real time during detection. The workflow is almost identical to Step 1, except that the --weight argument MUST be a .pt checkpoint produced by your Step 1 training.

(Figure: Step 2 training workflow)

Quick Start

Apptainer/Singularity
apptainer exec --nv --no-home \
-B /data oras://ghcr.io/wehi-researchcomputing/partinet:main-singularity partinet train step2 \
--weight /data/partinet_trainstep1/exp/weights/last.pt \
--data /data/cryo_training.yaml \
--project /data/partinet_trainstep2 \
--epochs 10
Docker
docker run --gpus all -v /data:/data \
ghcr.io/wehi-researchcomputing/partinet:main partinet train step2 \
--weight /data/partinet_trainstep1/exp/weights/last.pt \
--data /data/cryo_training.yaml \
--project /data/partinet_trainstep2 \
--epochs 10
Local Installation
partinet train step2 \
--weight /data/partinet_trainstep1/exp/weights/last.pt \
--data /data/cryo_training.yaml \
--project /data/partinet_trainstep2 \
--epochs 10
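Because Step 2 refuses to start without a valid Step 1 checkpoint, it can be worth verifying the weights file before launching a long container job. A minimal preflight sketch; the `check_weight` helper and the path below are illustrative, not part of PartiNet:

```shell
#!/usr/bin/env bash
# Hypothetical preflight check: confirm the Step 1 checkpoint exists
# and is a .pt file before launching Step 2 training.
check_weight() {
  local w="$1"
  if [ ! -f "$w" ]; then
    echo "missing: $w" >&2
    return 1
  fi
  case "$w" in
    *.pt) echo "ok" ;;
    *)    echo "not a .pt file: $w" >&2; return 1 ;;
  esac
}

# Adjust the path to match your Step 1 --project directory.
check_weight /data/partinet_trainstep1/exp/weights/last.pt \
  || echo "fix --weight before launching training" >&2
```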

Parameters

Required Parameters

--weight: Path to the trained weights file from Step 1. You must use the .pt file from your completed Step 1 training (e.g., last.pt); the pre-trained public weights cannot be used for this step.
--data: Path to the YAML configuration file describing your training dataset (see Data Preparation for format details).
--project: Output directory where training results, checkpoints, and logs will be saved.
--epochs: Number of training epochs. For the adaptive router, 10-20 epochs are sufficient.
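The --data file follows the format described in Data Preparation. As a rough illustration only: PartiNet's run layout (exp directories, last.pt weights) suggests a YOLO-style dataset YAML, but the exact keys may differ, so treat every key and path below as an assumption and defer to the Data Preparation page:

```yaml
# Illustrative dataset YAML; the real schema is defined in Data Preparation.
path: /data/cryo_dataset   # dataset root (assumed key)
train: images/train        # training micrographs, relative to path (assumed)
val: images/val            # validation micrographs (assumed)
names:
  0: particle              # single particle class (assumed)
```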

Optional Parameters

--workers (default: 8): Number of data-loading workers. Adjust based on your CPU cores and I/O performance.
--device (default: None): GPU device(s) to use, e.g., 0, 1, or 0,1 for multiple GPUs. If not specified, all available GPUs are used.
--batch (default: 16): Training batch size. If you encounter out-of-memory (OOM) errors, reduce this value (try 8 or 4).
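The optional flags compose with the required ones. For example, a local-install run pinned to a single GPU with a reduced batch size after an OOM error might look like the following (paths as in the Quick Start above; the specific values are illustrative):

```shell
partinet train step2 \
  --weight /data/partinet_trainstep1/exp/weights/last.pt \
  --data /data/cryo_training.yaml \
  --project /data/partinet_trainstep2 \
  --epochs 10 \
  --device 0 \
  --batch 8 \
  --workers 4
```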
Step 2 Requirement

The --weight parameter for Step 2 training must point to a checkpoint from Step 1 training (e.g., last.pt or best.pt). You cannot use the pre-trained public weights or train from scratch for this step.

Training Duration

Step 2 training runs much slower than Step 1, but the adaptive router only needs 10-20 epochs to train effectively; additional epochs are unlikely to improve performance significantly.

Training Output

See Training Output Reference for details about the files generated during training, monitoring progress, and resuming interrupted runs.

To resume this step from a checkpoint:

Apptainer/Singularity
apptainer exec --nv --no-home \
-B /data oras://ghcr.io/wehi-researchcomputing/partinet:main-singularity partinet train step2 \
--weight /data/partinet_trainstep2/exp3/weights/last.pt \
--data /data/cryo_training.yaml \
--project /data/partinet_trainstep2 \
--epochs 10
Docker
docker run --gpus all -v /data:/data \
ghcr.io/wehi-researchcomputing/partinet:main partinet train step2 \
--weight /data/partinet_trainstep2/exp3/weights/last.pt \
--data /data/cryo_training.yaml \
--project /data/partinet_trainstep2 \
--epochs 10
Local Installation
partinet train step2 \
--weight /data/partinet_trainstep2/exp3/weights/last.pt \
--data /data/cryo_training.yaml \
--project /data/partinet_trainstep2 \
--epochs 10
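Each run creates a new numbered directory under --project (exp, exp2, exp3, ... as in the resume example above), so the latest checkpoint path changes between runs. A small helper to locate it; the `latest_checkpoint` function is illustrative, not part of PartiNet, and assumes GNU ls for natural version sorting:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last.pt path of the most recent run
# directory (exp, exp2, exp3, ...) under a PartiNet project folder.
latest_checkpoint() {
  local project="$1" latest
  # GNU ls -v sorts in natural version order, so exp10 follows exp9.
  latest=$(ls -dv "$project"/exp* 2>/dev/null | tail -n 1)
  [ -n "$latest" ] || return 1
  echo "$latest/weights/last.pt"
}

latest_checkpoint /data/partinet_trainstep2 \
  || echo "no runs found under /data/partinet_trainstep2" >&2
```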