Guideline

Welcome to the TADD repository >_<
All pretrained checkpoints are available from the project website linked in the steps below.

Abnormal Detection Benchmark

To conduct abnormal detection benchmarks on the TADD dataset using STMixer, please follow the official instructions below:

  1. Clone the STMixer repository
    Clone the official STMixer repository to your local machine:
    git clone https://github.com/MCG-NJU/STMixer.git

  2. Prepare the environment
    Install the required dependencies as outlined in the STMixer documentation (compatible with PyTorch, CUDA, etc.).

  3. Download and configure YAML files
    We provide officially curated YAML configuration files tailored for TADD abnormal detection. Download these files and adjust the following key parameters to suit your setup:
    • data_root: Path to your local TADD dataset directory
    • batch_size: Number of samples per batch (e.g., 8, 16) based on your hardware capacity
    • num_workers: Number of workers for data loading (e.g., 4 or 8 depending on CPU cores)
    • pretrained: Path to pretrained weights (optional, use checkpoints from our website)
    Ensure the YAMLs are compatible with TADD’s video directory and label format.

  4. Modify configuration for TADD
    Within the selected YAML file, please update the following parameters:
    • data_root: Set to your local TADD dataset directory
    • ann_file_train / ann_file_val: Set to the paths of the downloaded TADD annotation files from the GitHub repository below
    • pretrained: Point to the correct checkpoint from the URL below if using our weights
    • input_size: Adjust to match TADD video resolution (e.g., 224x224)
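    If you prefer to apply these edits programmatically, the following Python sketch rewrites the keys with PyYAML. It assumes the key names listed above sit at the top level of the downloaded YAML and that PyYAML is installed; note that re-dumping the file discards comments, and every path below is a placeholder for your own setup.
    import yaml  # pip install pyyaml

    cfg_path = "configs/tadd_abnormal/cabin_abnormal.yaml"
    with open(cfg_path) as f:
        cfg = yaml.safe_load(f)

    cfg["data_root"] = "/path/to/TADD"                         # local TADD dataset directory
    cfg["ann_file_train"] = "/path/to/TADD/annotations/train"  # placeholder annotation paths
    cfg["ann_file_val"] = "/path/to/TADD/annotations/val"
    cfg["batch_size"] = 8                                      # lower this if GPU memory is tight
    cfg["num_workers"] = 4                                     # roughly one per available CPU core
    cfg["pretrained"] = "/path/to/checkpoint.pth"              # optional: weights from the project website
    cfg["input_size"] = 224                                    # match the resolution you train at

    with open(cfg_path, "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)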

  5. Download pretrained weights (optional)
    You may use the pretrained weights provided on our project website:
    https://yanc3113.github.io/TADD/checkpoints.html
    These checkpoints are compatible with the STMixer configurations listed above.

  6. Download TADD annotation files
    Official annotation files for abnormal detection are hosted on GitHub:
    https://github.com/Yanc3113/TADD/.../AnomalyDetection/annotations
    These files include labels for normal and abnormal events within TADD. Developers are encouraged to customize the labels based on their specific anomaly detection needs.

  7. Launch training or evaluation
    Use the following command to start evaluation:
    python tools/test.py --config configs/tadd_abnormal/cabin_abnormal.yaml --checkpoint path/to/checkpoint.pth
    Or for training:
    python tools/train.py --config configs/tadd_abnormal/cabin_abnormal.yaml

  8. Hardware & environment
    All experiments were conducted using:
    • 4 × NVIDIA A100 (L20) GPUs
    • CUDA ≥ 11.6, PyTorch ≥ 1.12
    • Python ≥ 3.8
    Multi-GPU training is supported via torch.distributed.launch or torchrun interfaces.

We encourage the community to adapt and extend the STMixer model for custom abnormal detection tasks using the TADD dataset.

Few-shot Learning Benchmark

To conduct few-shot action recognition (FSAR) benchmarks on the TADD dataset, please follow the official instructions below:

  1. Clone the baseline repositories
    We support benchmarking with the following few-shot learning frameworks; clone each official repository to your local machine:
    • MoLo – motion-augmented long-short contrastive learning for few-shot action recognition
    • FSAR – the semantic-prompt-based few-shot action recognition baseline used for the FSAR YAMLs below

  2. Frame rate and temporal settings
    The TADD dataset is provided at a standard 30 FPS (frames per second). This is directly related to the following key configuration parameters in YAML:
    • clip_len: Number of frames per video clip (e.g., 8, 16, 32)
    • frame_interval: Interval between consecutive frames (e.g., 1, 2)
    • num_segments: Used for TSN-style temporal sampling (optional)
    For example, with clip_len: 16 and frame_interval: 2, a sampled clip spans 16 × 2 = 32 source frames, i.e. ≈ 1.07 seconds of real time at 30 FPS.
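    As a quick sanity check (plain arithmetic, not tied to any particular framework), the real-time span of one clip is clip_len × frame_interval / FPS:
    clip_len, frame_interval, fps = 16, 2, 30
    print(clip_len * frame_interval / fps)  # 1.0666... ≈ 1.07 s per clip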

  3. Reference YAML configuration files
    We provide officially curated few-shot configuration files tailored for TADD:
    • MoLo YAMLs – Focused on 3-way / 5-way / N-way × K-shot settings
    • FSAR YAMLs – Designed for semantic-prompt-based learning
    Each YAML file includes key parameters such as:
    • n_shots: Number of support samples per class (e.g., 1, 5)
    • n_ways: Number of classes per episode (e.g., 3, 5)
    • query_per_class: Number of query samples used per class for evaluation
    • max_epochs: Number of training iterations (optional)
    • video_format: Either rgb or rgb+depth for TADD
    • num_workers: Number of workers per dataloader
    • backbone: e.g., ViT, MoLo, or TSN
    All YAMLs are fully compatible with TADD’s frame directory and label format.
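    To see how these episode parameters interact, the short Python sketch below counts the clips sampled per episode for a hypothetical 5-way 1-shot run with 5 queries per class; substitute the values from your own YAML.
    # Clips per episode = n_ways * (n_shots + query_per_class).
    n_ways, n_shots, query_per_class = 5, 1, 5
    support = n_ways * n_shots        # 5 support clips
    query = n_ways * query_per_class  # 25 query clips
    print(f"{support} support + {query} query = {support + query} clips per episode")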

  4. Precomputed features and custom generation
    Due to GitHub storage limitations, the raw image dataset extracted from TADD videos exceeds 300GB and is not hosted in this repository. Developers must locally extract video frames and optionally generate feature `.pt` files for few-shot tasks.
    • Precomputed proposal-level features (proposals_fea_pth)
      - Cabin: car_combined_proposals_fea.pt
      - Platform: plat_combined_proposals_fea.pt

    • CLIP region-level visual features (CLIP_visual_fea_reg)
      - Cabin: VitOutput_RGB_20s_frame_fps_5
      - Platform: VitOutput_plat

    • Custom feature generation
      You may generate your own `.pt` feature files with the provided script i2pt.py, which uses CLIP to extract proposal-level and region-level visual features from image frames (a minimal illustration of this step is sketched after the note below).

    • Video-to-frame conversion (required)
      Since raw frame images are not provided due to size constraints (>300GB), please use the following shell script to extract frames at 5 FPS using ffmpeg. This step is required before running i2pt.py:
      IN_DATA_DIR="/path/to/your/raw/videos/"
      OUT_DATA_DIR="/path/to/output/frames_fps5/"

      if [[ ! -d "${OUT_DATA_DIR}" ]]; then
        echo "${OUT_DATA_DIR} does not exist; creating it."
        mkdir -p "${OUT_DATA_DIR}"
      fi

      # Walk each category directory and extract frames for every video file.
      for category_dir in "${IN_DATA_DIR}"*/; do
        category_name=$(basename "${category_dir}")
        for video in "${category_dir}"*; do
          if [[ -f "${video}" ]]; then
            video_name=${video##*/}
            # Strip the container extension (.webm, .mp4, .avi) from the file name.
            if [[ $video_name == *.webm ]]; then
              video_name=${video_name:0:-5}
            elif [[ $video_name == *.mp4 ]]; then
              video_name=${video_name:0:-4}
            elif [[ $video_name == *.avi ]]; then
              video_name=${video_name:0:-4}
            else
              echo "error: unsupported file ${video_name}"
              continue
            fi
            out_video_dir="${OUT_DATA_DIR}/${category_name}/${video_name}"
            mkdir -p "${out_video_dir}"
            # Extract frames at 5 FPS as zero-padded PNGs.
            ffmpeg -i "${video}" -vf fps=5 "${out_video_dir}/frame_%06d.png"
          fi
        done
      done

    Note: If you are using our provided `.pt` features, frame extraction is not required. However, for custom tasks or updated CLIP models, we strongly encourage generating your own features from extracted frames.
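    For reference, the Python sketch below illustrates the core of what i2pt.py does: encoding extracted frames with CLIP and saving the result as a `.pt` file. It is a simplified stand-in rather than the actual script; the real pipeline also handles proposal/region cropping, and the backbone, paths, and output layout here are placeholder assumptions.
    # Simplified CLIP feature extraction for one extracted-frame directory.
    # Requires: pip install torch pillow git+https://github.com/openai/CLIP.git
    from pathlib import Path

    import clip
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)  # i2pt.py may use a different CLIP backbone

    frame_dir = Path("/path/to/output/frames_fps5/some_category/some_video")  # placeholder path
    features = {}
    for frame_path in sorted(frame_dir.glob("frame_*.png")):
        image = preprocess(Image.open(frame_path)).unsqueeze(0).to(device)
        with torch.no_grad():
            features[frame_path.name] = model.encode_image(image).squeeze(0).cpu()

    # Dict of per-frame CLIP features, analogous in spirit to the provided *_fea.pt files.
    torch.save(features, "example_video_fea.pt")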


  5. Customize your own few-shot task
    The annotation files used for training and evaluation are located at:
    https://github.com/Yanc3113/TADD/.../Annotation4VideoClassification
    These include full video-to-label mappings. You may easily construct custom tasks by:
    • Using only item-category labels (e.g., drinks, packages, wallets)
    • Using a narrower range of categories during evaluation
    • Isolating specific camera viewpoints (platform vs. cabin)

  6. Run the benchmark
    For MoLo, navigate to the MoLo root directory and run:
    cd /root/MoLo-master
    python runs/run.py --cfg configs/projects/MoLo/ucf101/3way_1shot_car_rgb.yaml
    Make sure the paths in the YAML are set to your local TADD directory.

  7. Hardware & environment
    All experiments were conducted using:
    • 4 × NVIDIA A100 (L20) GPUs
    • CUDA ≥ 11.6, PyTorch ≥ 1.12
    • Python ≥ 3.8
    Multi-GPU training is supported by MoLo via torch.distributed.launch or torchrun interfaces.

We encourage the community to design custom few-shot splits or meta-datasets using the flexible TADD label structure and provided baselines.

Video Classification Benchmark

To evaluate video classification models on the TADD dataset, please follow the official steps below.

  1. Clone the mmaction2 repository
    git clone https://github.com/open-mmlab/mmaction2.git

  2. Prepare the environment
    Install the required dependencies following the mmaction2 documentation (compatible with PyTorch, MMEngine, etc.).

  3. Download and select configuration files
    We use the official Video Swin recognition configs shipped with mmaction2. The supported base config files are:
    • swin_base_p244_w877_in1k_pre_8xb8_amp_32x2x1_30e_kinetics400_rgb.py
    • swin_large_p244_w877_in22k_pre_8xb8_amp_32x2x1_30e_kinetics400_rgb.py
    • swin_tiny_p244_w877_in1k_pre_8xb8_amp_32x2x1_30e_kinetics400_rgb.py
    These can be found under configs/recognition/swin in mmaction2.

  4. Modify configuration for TADD
    Within the selected config file, please update the following (a minimal override sketch follows this list):
    • data_root: Set to your local TADD dataset directory.
    • ann_file_train / ann_file_val: Use the official TADD annotation files listed below.
    • pretrained: If using our weights, point to the correct checkpoint from the URL below.
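    Because the mmaction2 configs are Python files, a convenient pattern is a small override config that inherits from the chosen base. The sketch below is illustrative only: the file is assumed to live next to the base config under configs/recognition/swin, and the annotation paths, checkpoint path, and dataloader fields are placeholders that may need adapting to your mmaction2 version and TADD layout.
    # tadd_swin_tiny.py -- hypothetical override config inheriting a base Swin config.
    _base_ = ['swin_tiny_p244_w877_in1k_pre_8xb8_amp_32x2x1_30e_kinetics400_rgb.py']

    data_root = '/path/to/TADD/videos/'                          # local TADD dataset directory
    ann_file_train = '/path/to/TADD/annotations/train_list.txt'  # placeholder annotation paths
    ann_file_val = '/path/to/TADD/annotations/val_list.txt'
    load_from = '/path/to/tadd_checkpoint.pth'                   # optional: weights from the project website

    train_dataloader = dict(dataset=dict(ann_file=ann_file_train, data_prefix=dict(video=data_root)))
    val_dataloader = dict(dataset=dict(ann_file=ann_file_val, data_prefix=dict(video=data_root)))
    test_dataloader = val_dataloader  # reuse the validation list for testing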

  5. Download pretrained weights (optional)
    You may use the pretrained weights provided on our project website:
    https://yanc3113.github.io/TADD/checkpoints.html
    These checkpoints are compatible with the Swin-based configurations listed above.

  6. Use TADD annotation files
    Official video-level annotation files (train/val) are hosted on GitHub:
    https://github.com/Yanc3113/TADD/.../Annotation4VideoClassification
    These include labels for all actions, objects, and scenes within TADD. Developers are encouraged to customize the labels based on their task needs—for example, focusing only on specific action types or object categories.
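    For example, a narrower benchmark that keeps only a subset of classes can be built by filtering the annotation list. The Python sketch below assumes the common one-line-per-video 'path label' list format; check the actual TADD annotation files and adjust the parsing and the placeholder label IDs accordingly.
    # Keep only selected label IDs and remap them to a contiguous 0..N-1 range.
    keep = [3, 7, 12]  # placeholder IDs for the classes of interest
    remap = {old: new for new, old in enumerate(keep)}

    with open("train_list.txt") as fin, open("train_list_subset.txt", "w") as fout:
        for line in fin:
            line = line.strip()
            if not line:
                continue
            path, label = line.rsplit(maxsplit=1)
            if int(label) in remap:
                fout.write(f"{path} {remap[int(label)]}\n")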

  7. Launch training or evaluation
    The standard mmaction2 distributed scripts take the config file, the checkpoint (for testing), and the GPU count as positional arguments. To start evaluation:
    bash tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM}
    Or for training:
    bash tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}
    Top-1 / top-5 accuracy is reported by the default accuracy metric in these configs.

We encourage the community to build custom video classification benchmarks based on TADD by adapting the YAML files and annotation formats.