Guideline
Welcome to the TADD repository >_<
All pretrained checkpoints are available through the button below.
Abnormal Detection Benchmark
To conduct abnormal detection benchmarks on the TADD dataset using STMixer, please follow the official instructions below:
- Clone the STMixer repository
Clone the official STMixer repository to your local machine:
git clone https://github.com/MCG-NJU/STMixer.git
- Prepare the environment
Install the required dependencies as outlined in the STMixer documentation (compatible with PyTorch, CUDA, etc.).
- Download and configure YAML files
We provide officially curated YAML configuration files tailored for TADD abnormal detection:
- TADD Abnormal Detection YAMLs – Designed for detecting anomalies in cabin and platform settings
Key parameters to check in each file:
- data_root: path to your local TADD dataset directory
- batch_size: number of samples per batch (e.g., 8 or 16), based on your hardware capacity
- num_workers: number of data-loading workers (e.g., 4 or 8, depending on CPU cores)
- pretrained: path to pretrained weights (optional; use the checkpoints from our website)
- Modify configuration for TADD
Within the selected YAML file, please update the following parameters (a minimal sketch follows this list):
- data_root: set to your local TADD dataset directory
- ann_file_train / ann_file_val: set to the paths of the TADD annotation files downloaded from the GitHub repository below
- pretrained: point to the correct checkpoint from the URL below if using our weights
- input_size: adjust to match the TADD video resolution (e.g., 224x224)
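The snippet below is a minimal sketch of patching these fields programmatically with PyYAML. The key names are taken from the lists above and are assumed to sit at the top level of the YAML; the actual STMixer configs may nest them differently, and all paths are placeholders.

```python
# Sketch: patch a provided TADD abnormal-detection YAML in place (key names assumed to be flat).
import yaml

cfg_path = "configs/tadd_abnormal/cabin_abnormal.yaml"  # TADD config referenced below
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

cfg["data_root"] = "/data/TADD/videos"                      # local TADD dataset directory
cfg["ann_file_train"] = "/data/TADD/annotations/train.csv"  # placeholder annotation paths
cfg["ann_file_val"] = "/data/TADD/annotations/val.csv"
cfg["pretrained"] = "/data/checkpoints/tadd_cabin.pth"      # optional checkpoint from the website
cfg["input_size"] = 224                                     # match TADD video resolution
cfg["batch_size"] = 8                                       # adjust to your hardware
cfg["num_workers"] = 4

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```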
- Download pretrained weights (optional)
You may use the pretrained weights provided on our project website:
https://yanc3113.github.io/TADD/checkpoints.html
These checkpoints are compatible with the STMixer configurations listed above.
- Download TADD annotation files
Official annotation files for abnormal detection are hosted on GitHub:
https://github.com/Yanc3113/TADD/.../AnomalyDetection/annotations
These files include labels for normal and abnormal events within TADD. Developers are encouraged to customize the labels based on their specific anomaly detection needs.
- Launch training or evaluation
Use the following command to start evaluation:
python tools/test.py --config configs/tadd_abnormal/cabin_abnormal.yaml --checkpoint path/to/checkpoint.pth
Or for training:
python tools/train.py --config configs/tadd_abnormal/cabin_abnormal.yaml
- Hardware & environment
All experiments were conducted using:
- 4 × NVIDIA A100 (L20) GPUs
- CUDA ≥ 11.6, PyTorch ≥ 1.12
- Python ≥ 3.8
Multi-GPU runs are launched through the torch.distributed.launch or torchrun interfaces.
We encourage the community to adapt and extend the STMixer model for custom abnormal detection tasks using the TADD dataset.
Few-shot Learning Benchmark
To conduct few-shot action recognition (FSAR) benchmarks on the TADD dataset, please follow the official instructions below:
- Clone the baseline repositories
We support benchmarking with the following few-shot learning frameworks:
- MoLo (Alibaba): Transformer-based few-shot learner with motion-level modeling.
- Knowledge Prompting for FSAR: Meta-learning framework with semantic prompting and temporal modeling.
- Frame rate and temporal settings
The TADD dataset is provided at a standard 30 FPS (frames per second). The frame rate directly affects the following key YAML parameters:
- clip_len: number of frames per video clip (e.g., 8, 16, 32)
- frame_interval: interval between consecutive sampled frames (e.g., 1, 2)
- num_segments: used for TSN-style temporal sampling (optional)
With clip_len: 16 and frame_interval: 2, a clip spans 16 × 2 = 32 raw frames, so at 30 FPS the total clip covers ≈ 1.0 second of real time.
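A small helper makes it easy to check the temporal coverage of other settings as well (plain Python; values match the example above):

```python
# How much real time does a sampled clip cover for a given configuration?
def clip_seconds(clip_len: int, frame_interval: int, fps: int = 30) -> float:
    """Duration in seconds spanned by clip_len frames sampled every frame_interval raw frames."""
    return clip_len * frame_interval / fps

print(clip_seconds(16, 2))  # ~1.07 s -- the example above
print(clip_seconds(8, 1))   # ~0.27 s -- a much shorter temporal context
print(clip_seconds(32, 2))  # ~2.13 s
```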
- Reference YAML configuration files
We provide officially curated few-shot configuration files tailored for TADD:
- MoLo YAMLs – Focused on 3-way / 5-way / N-way × K-shot settings
- FSAR YAMLs – Designed for semantic-prompt-based learning
Key parameters in these files include:
- n_shots: number of support samples per class (e.g., 1, 5)
- n_ways: number of classes per episode (e.g., 3, 5)
- query_per_class: number of query samples per class used for evaluation
- max_epochs: number of training iterations (optional)
- video_format: either rgb or rgb+depth for TADD
- num_workers: number of workers per dataloader
- backbone: e.g., ViT, MoLo, or TSN
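To see how these settings translate into episode sizes, here is a quick sanity check (example values only; the actual episode composition is determined by the YAMLs above):

```python
# Rough episode-size check for an N-way x K-shot setting (example values).
n_ways = 5           # classes per episode
n_shots = 1          # support samples per class
query_per_class = 5  # query samples per class

support_size = n_ways * n_shots        # 5 support clips per episode
query_size = n_ways * query_per_class  # 25 query clips per episode
print(f"each episode samples {support_size} support and {query_size} query clips")
```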
- Precomputed features and custom generation
Due to GitHub storage limitations, the raw image dataset extracted from TADD videos exceeds 300 GB and is not hosted in this repository. Developers must locally extract video frames and optionally generate feature `.pt` files for few-shot tasks.
- Precomputed proposal-level features (proposals_fea_pth)
- Cabin: car_combined_proposals_fea.pt
- Platform: plat_combined_proposals_fea.pt
- CLIP region-level visual features (CLIP_visual_fea_reg)
- Cabin: VitOutput_RGB_20s_frame_fps_5
- Platform: VitOutput_plat
- Custom feature generation
You may generate your own `.pt` feature files by using the provided script i2pt.py. This script uses CLIP to extract proposal-level and region-level visual features from image frames (a minimal extraction sketch appears after the note at the end of this list).
- Video-to-frame conversion (required)
Since raw frame images are not provided due to size constraints (>300 GB), please use the following shell script to extract frames at 5 FPS using ffmpeg. This step is required before running i2pt.py:

```bash
IN_DATA_DIR="/path/to/your/raw/videos/"
OUT_DATA_DIR="/path/to/output/frames_fps5/"

if [[ ! -d "${OUT_DATA_DIR}" ]]; then
  echo "${OUT_DATA_DIR} does not exist. Creating it."
  mkdir -p "${OUT_DATA_DIR}"
fi

for category_dir in "${IN_DATA_DIR}"*/; do
  category_name=$(basename "${category_dir}")
  for video in "${category_dir}"*; do
    if [[ -f "${video}" ]]; then
      # Strip the container extension to name the per-video output directory.
      video_name=${video##*/}
      if [[ $video_name == *.webm ]]; then
        video_name=${video_name:0:-5}
      elif [[ $video_name == *.mp4 ]]; then
        video_name=${video_name:0:-4}
      elif [[ $video_name == *.avi ]]; then
        video_name=${video_name:0:-4}
      else
        echo "error: ${video_name}"
        continue
      fi
      out_video_dir="${OUT_DATA_DIR}/${category_name}/${video_name}"
      mkdir -p "${out_video_dir}"
      # Sample frames at 5 FPS into numbered PNGs.
      ffmpeg -i "${video}" -vf fps=5 "${out_video_dir}/frame_%06d.png"
    fi
  done
done
```
Note: If you are using our provided `.pt` features, frame extraction is not required. However, for custom tasks or updated CLIP models, we strongly encourage generating your own features from extracted frames.
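For reference, here is a minimal sketch of this kind of CLIP-based frame-feature extraction. It is not the provided i2pt.py: it assumes the openai/CLIP package, a ViT-B/16 backbone, and frames laid out as produced by the ffmpeg script above, and the resulting tensor layout may differ from the official `.pt` files.

```python
# Minimal CLIP frame-feature extraction sketch (NOT the provided i2pt.py).
# Assumes: the openai/CLIP package is installed and frames were extracted by the ffmpeg step above.
from pathlib import Path

import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)  # backbone choice is an assumption

frame_dir = Path("/path/to/output/frames_fps5/category/video_name")  # one extracted video
features = []
with torch.no_grad():
    for frame_path in sorted(frame_dir.glob("frame_*.png")):
        image = preprocess(Image.open(frame_path)).unsqueeze(0).to(device)
        features.append(model.encode_image(image).squeeze(0).cpu())

# One tensor of shape [num_frames, feature_dim]; the official .pt files may be organized differently.
torch.save(torch.stack(features), frame_dir.with_suffix(".pt"))
```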
- Customize your own few-shot task
The annotation files used for training and evaluation are located at:
https://github.com/Yanc3113/TADD/.../Annotation4VideoClassification
These include full video-to-label mappings. You may easily construct custom tasks by:
- Using item categories only (e.g., drinks, packages, wallets)
- Using a narrower range of categories during evaluation
- Isolating specific camera viewpoints (platform vs. cabin)
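As an illustration, here is a sketch of carving a narrowed split out of a video-to-label listfile; the "one `relative_path label` pair per line" format assumed here is a guess, so adapt the parsing to the actual annotation files in the repository:

```python
# Sketch: build a custom few-shot split from a TADD annotation listfile.
# Assumes a simple "relative_path label" text format -- check the real files and adjust the parsing.
from pathlib import Path

keep_labels = {"drinks", "packages", "wallets"}  # hypothetical category names

src = Path("annotations/train_full.txt")        # placeholder: full listfile from the repo
dst = Path("annotations/train_items_only.txt")  # placeholder: filtered split to write

kept = []
for line in src.read_text().splitlines():
    if not line.strip():
        continue
    _, label = line.rsplit(maxsplit=1)
    if label in keep_labels:
        kept.append(line)

dst.write_text("\n".join(kept) + "\n")
print(f"kept {len(kept)} videos across {len(keep_labels)} categories")
```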
- Run the benchmark
For MoLo, navigate to the root directory and run:
cd /root/MoLo-master
python runs/run.py --cfg configs/projects/MoLo/ucf101/3way_1shot_car_rgb.yaml
Make sure the paths in the YAML are set to your local TADD directory.
- Hardware & environment
All experiments were conducted using:
- 4 × NVIDIA A100 (L20) GPUs
- CUDA ≥ 11.6, PyTorch ≥ 1.12
- Python ≥ 3.8
Multi-GPU runs are launched through the torch.distributed.launch or torchrun interfaces.
We encourage the community to design custom few-shot splits or meta-datasets using the flexible TADD label structure and provided baselines.
Video Classification Benchmark
To evaluate video classification models on the TADD dataset, please follow the official steps below.
- Clone the mmaction2 repository
git clone https://github.com/open-mmlab/mmaction2.git
- Prepare the environment
Install the required dependencies following the mmaction2 documentation (compatible with PyTorch, MMEngine, etc.).
- Download and select configuration files
We use the official Swin-based recognition configs released in 2025. The supported base configs are:
- swin_base_p244_w877_in1k_pre_8xb8_amp_32x2x1_30e_kinetics400_rgb.py
- swin_large_p244_w877_in22k_pre_8xb8_amp_32x2x1_30e_kinetics400_rgb.py
- swin_tiny_p244_w877_in1k_pre_8xb8_amp_32x2x1_30e_kinetics400_rgb.py
These files are located under configs/recognition/swin in mmaction2.
- Modify configuration for TADD
Within the selected config file, please update (a minimal sketch follows this list):
- data_root: set to your local TADD dataset directory.
- ann_file_train / ann_file_val: use the official TADD annotation files listed below.
- pretrained: if using our weights, point to the correct checkpoint from the URL below.
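Here is a minimal sketch of such a TADD override config, assuming the mmengine-style `_base_` inheritance used by current mmaction2 configs; the dataset keys (ann_file, data_prefix) and all paths are assumptions to be checked against the chosen base config:

```python
# tadd_swin_tiny.py -- hypothetical override on top of one of the released Swin base configs.
_base_ = ['swin_tiny_p244_w877_in1k_pre_8xb8_amp_32x2x1_30e_kinetics400_rgb.py']

data_root = '/data/TADD/videos'  # local TADD dataset directory

# Point the dataloaders at the TADD annotation files (paths are placeholders).
train_dataloader = dict(
    dataset=dict(
        ann_file='/data/TADD/annotations/train_list.txt',
        data_prefix=dict(video=data_root),
    ))
val_dataloader = dict(
    dataset=dict(
        ann_file='/data/TADD/annotations/val_list.txt',
        data_prefix=dict(video=data_root),
    ))

# Optionally start from a checkpoint downloaded from the project website.
load_from = '/data/checkpoints/tadd_swin_tiny.pth'
```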
- Download pretrained weights (optional)
You may use the pretrained weights provided on our project website:
https://yanc3113.github.io/TADD/checkpoints.html
These checkpoints are compatible with the Swin-based configurations listed above.
- Use TADD annotation files
Official video-level annotation files (train/val) are hosted on GitHub:
https://github.com/Yanc3113/TADD/.../Annotation4VideoClassification
These include labels for all actions, objects, and scenes within TADD. Developers are encouraged to customize the labels based on their task needs, for example focusing only on specific action types or object categories.
- Launch training or evaluation
Use the following command to start evaluation:
bash tools/dist_test.sh <path/to/config.py> <path/to/checkpoint.pth> <num_gpus> --eval top_k_accuracy
Or for training:
bash tools/dist_train.sh <path/to/config.py> <num_gpus>
We encourage the community to build custom video classification benchmarks based on TADD by adapting the YAML files and annotation formats.