Guideline

Welcome to the TADD repository >_<
All pretrained checkpoints are available from the project website linked in the steps below.

Abnormal Detection Benchmark

To conduct abnormal detection benchmarks on the TADD dataset using STMixer, please follow the official instructions below:

  1. Clone the STMixer repository
    Clone the official STMixer repository to your local machine:
    git clone https://github.com/MCG-NJU/STMixer.git

  2. Prepare the environment
    Install the required dependencies as outlined in the STMixer documentation (compatible with PyTorch, CUDA, etc.).

  3. Download and configure YAML files
    We provide officially curated YAML configuration files tailored for TADD abnormal detection. Download these files and adjust the following key parameters to suit your setup:
    • data_root: Path to your local TADD dataset directory
    • batch_size: Number of samples per batch (e.g., 8, 16) based on your hardware capacity
    • num_workers: Number of workers for data loading (e.g., 4 or 8 depending on CPU cores)
    • pretrained: Path to pretrained weights (optional, use checkpoints from our website)
    Ensure the YAMLs are compatible with TADD’s video directory and label format.

  4. Modify configuration for TADD
    Within the selected YAML file, please update the following parameters:
    • data_root: Set to your local TADD dataset directory
    • ann_file_train / ann_file_val: Set to the paths of the downloaded TADD annotation files from the GitHub repository below
    • pretrained: Point to the correct checkpoint from the URL below if using our weights
    • input_size: Adjust to match TADD video resolution (e.g., 224x224)
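    If you prefer to apply these edits programmatically, the following Python sketch rewrites the keys with PyYAML. It assumes the key names listed above sit at the top level of the downloaded YAML and that PyYAML is installed; note that re-dumping the file discards comments, and every path below is a placeholder for your own setup.
    import yaml  # pip install pyyaml

    cfg_path = "configs/tadd_abnormal/cabin_abnormal.yaml"
    with open(cfg_path) as f:
        cfg = yaml.safe_load(f)

    cfg["data_root"] = "/path/to/TADD"                         # local TADD dataset directory
    cfg["ann_file_train"] = "/path/to/TADD/annotations/train"  # placeholder annotation paths
    cfg["ann_file_val"] = "/path/to/TADD/annotations/val"
    cfg["batch_size"] = 8                                      # lower this if GPU memory is tight
    cfg["num_workers"] = 4                                     # roughly one per available CPU core
    cfg["pretrained"] = "/path/to/checkpoint.pth"              # optional: weights from the project website
    cfg["input_size"] = 224                                    # match the resolution you train at

    with open(cfg_path, "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)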

  5. Download pretrained weights (optional)
    You may use the pretrained weights provided on our project website:
    https://yanc3113.github.io/TADD/checkpoints.html
    These checkpoints are compatible with the STMixer configurations listed above.

  6. Download TADD annotation files
    Official annotation files for abnormal detection are hosted on GitHub:
    https://github.com/Yanc3113/TADD/.../AnomalyDetection/annotations
    These files include labels for normal and abnormal events within TADD. Developers are encouraged to customize the labels based on their specific anomaly detection needs.

  7. Launch training or evaluation
    Use the following command to start evaluation:
    python tools/test.py --config configs/tadd_abnormal/cabin_abnormal.yaml --checkpoint path/to/checkpoint.pth
    Or for training:
    python tools/train.py --config configs/tadd_abnormal/cabin_abnormal.yaml

  8. Hardware & environment
    All experiments were conducted using:
    • 4 × NVIDIA A100 (L20) GPUs
    • CUDA ≥ 11.6, PyTorch ≥ 1.12
    • Python ≥ 3.8
    Multi-GPU training is supported via torch.distributed.launch or torchrun interfaces.

We encourage the community to adapt and extend the STMixer model for custom abnormal detection tasks using the TADD dataset.

Few-shot Learning Benchmark

To conduct few-shot action recognition (FSAR) benchmarks on the TADD dataset, please follow the official instructions below:

  1. Clone the baseline repositories
    We support benchmarking with the following few-shot learning frameworks; clone each official repository to your local machine:
    • MoLo – motion-augmented long-short contrastive learning for few-shot action recognition
    • FSAR – the semantic-prompt-based few-shot action recognition baseline used for the FSAR YAMLs below

  2. Frame rate and temporal settings
    The TADD dataset is provided at a standard 30 FPS (frames per second). This is directly related to the following key configuration parameters in YAML:
    • clip_len: Number of frames per video clip (e.g., 8, 16, 32)
    • frame_interval: Interval between consecutive frames (e.g., 1, 2)
    • num_segments: Used for TSN-style temporal sampling (optional)
    For example, with clip_len: 16 and frame_interval: 2, a sampled clip spans 16 × 2 = 32 source frames, i.e. ≈ 1.07 seconds of real time at 30 FPS.
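    As a quick sanity check (plain arithmetic, not tied to any particular framework), the real-time span of one clip is clip_len × frame_interval / FPS:
    clip_len, frame_interval, fps = 16, 2, 30
    print(clip_len * frame_interval / fps)  # 1.0666... ≈ 1.07 s per clip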

  3. Reference YAML configuration files
    We provide officially curated few-shot configuration files tailored for TADD:
    • MoLo YAMLs – Focused on 3-way / 5-way / N-way × K-shot settings
    • FSAR YAMLs – Designed for semantic-prompt-based learning
    Each YAML file includes key parameters such as:
    • n_shots: Number of support samples per class (e.g., 1, 5)
    • n_ways: Number of classes per episode (e.g., 3, 5)
    • query_per_class: Number of query samples used per class for evaluation
    • max_epochs: Number of training iterations (optional)
    • video_format: Either rgb or rgb+depth for TADD
    • num_workers: Number of workers per dataloader
    • backbone: e.g., ViT, MoLo, or TSN
    All YAMLs are fully compatible with TADD’s frame directory and label format.
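    To see how these episode parameters interact, the short Python sketch below counts the clips sampled per episode for a hypothetical 5-way 1-shot run with 5 queries per class; substitute the values from your own YAML.
    # Clips per episode = n_ways * (n_shots + query_per_class).
    n_ways, n_shots, query_per_class = 5, 1, 5
    support = n_ways * n_shots        # 5 support clips
    query = n_ways * query_per_class  # 25 query clips
    print(f"{support} support + {query} query = {support + query} clips per episode")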

  4. Precomputed features and custom generation
    Due to GitHub storage limitations, the raw image dataset extracted from TADD videos exceeds 300GB and is not hosted in this repository. Developers must locally extract video frames and optionally generate feature `.pt` files for few-shot tasks.
    • Precomputed proposal-level features (proposals_fea_pth)
      - Cabin: car_combined_proposals_fea.pt
      - Platform: plat_combined_proposals_fea.pt

    • CLIP region-level visual features (CLIP_visual_fea_reg)
      - Cabin: VitOutput_RGB_20s_frame_fps_5
      - Platform: VitOutput_plat

    • Custom feature generation
      You may generate your own `.pt` feature files with the provided script i2pt.py, which uses CLIP to extract proposal-level and region-level visual features from image frames (a minimal illustration of this step is sketched after the note below).

    • Video-to-frame conversion (required)
      Since raw frame images are not provided due to size constraints (>300GB), please use the following shell script to extract frames at 5 FPS using ffmpeg. This step is required before running i2pt.py:
      IN_DATA_DIR="/path/to/your/raw/videos/"
      OUT_DATA_DIR="/path/to/output/frames_fps5/"

      if [[ ! -d "${OUT_DATA_DIR}" ]]; then
        echo "${OUT_DATA_DIR} does not exist; creating it."
        mkdir -p "${OUT_DATA_DIR}"
      fi

      # Walk each category directory and extract frames for every video file.
      for category_dir in "${IN_DATA_DIR}"*/; do
        category_name=$(basename "${category_dir}")
        for video in "${category_dir}"*; do
          if [[ -f "${video}" ]]; then
            video_name=${video##*/}
            # Strip the container extension (.webm, .mp4, .avi) from the file name.
            if [[ $video_name == *.webm ]]; then
              video_name=${video_name:0:-5}
            elif [[ $video_name == *.mp4 ]]; then
              video_name=${video_name:0:-4}
            elif [[ $video_name == *.avi ]]; then
              video_name=${video_name:0:-4}
            else
              echo "error: unsupported file ${video_name}"
              continue
            fi
            out_video_dir="${OUT_DATA_DIR}/${category_name}/${video_name}"
            mkdir -p "${out_video_dir}"
            # Extract frames at 5 FPS as zero-padded PNGs.
            ffmpeg -i "${video}" -vf fps=5 "${out_video_dir}/frame_%06d.png"
          fi
        done
      done

    Note: If you are using our provided `.pt` features, frame extraction is not required. However, for custom tasks or updated CLIP models, we strongly encourage generating your own features from extracted frames.
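    For reference, the Python sketch below illustrates the core of what i2pt.py does: encoding extracted frames with CLIP and saving the result as a `.pt` file. It is a simplified stand-in rather than the actual script; the real pipeline also handles proposal/region cropping, and the backbone, paths, and output layout here are placeholder assumptions.
    # Simplified CLIP feature extraction for one extracted-frame directory.
    # Requires: pip install torch pillow git+https://github.com/openai/CLIP.git
    from pathlib import Path

    import clip
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)  # i2pt.py may use a different CLIP backbone

    frame_dir = Path("/path/to/output/frames_fps5/some_category/some_video")  # placeholder path
    features = {}
    for frame_path in sorted(frame_dir.glob("frame_*.png")):
        image = preprocess(Image.open(frame_path)).unsqueeze(0).to(device)
        with torch.no_grad():
            features[frame_path.name] = model.encode_image(image).squeeze(0).cpu()

    # Dict of per-frame CLIP features, analogous in spirit to the provided *_fea.pt files.
    torch.save(features, "example_video_fea.pt")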


  5. Customize your own few-shot task
    The annotation files used for training and evaluation are located at:
    https://github.com/Yanc3113/TADD/.../Annotation4VideoClassification
    These include full video-to-label mappings. You may easily construct custom tasks by:
    • Using only item-category labels (e.g., drinks, packages, wallets)
    • Using a narrower range of categories during evaluation
    • Isolating specific camera viewpoints (platform vs. cabin)

  6. Run the benchmark
    For MoLo, navigate to the MoLo root directory and run:
    cd /root/MoLo-master
    python runs/run.py --cfg configs/projects/MoLo/ucf101/3way_1shot_car_rgb.yaml
    Make sure the paths in the YAML are set to your local TADD directory.

  7. Hardware & environment
    All experiments were conducted using:
    • 4 × NVIDIA A100 (L20) GPUs
    • CUDA ≥ 11.6, PyTorch ≥ 1.12
    • Python ≥ 3.8
    Multi-GPU training is supported by MoLo via torch.distributed.launch or torchrun interfaces.

We encourage the community to design custom few-shot splits or meta-datasets using the flexible TADD label structure and provided baselines.

Video Classification Benchmark

To evaluate video classification models on the TADD dataset, please follow the official steps below.

  1. Clone the mmaction2 repository
    git clone https://github.com/open-mmlab/mmaction2.git

  2. Prepare the environment
    Install the required dependencies following the mmaction2 documentation (compatible with PyTorch, MMEngine, etc.).

  3. Download and select configuration files
    We use the official Video Swin recognition configs shipped with mmaction2. The supported base config files are:
    • swin_base_p244_w877_in1k_pre_8xb8_amp_32x2x1_30e_kinetics400_rgb.py
    • swin_large_p244_w877_in22k_pre_8xb8_amp_32x2x1_30e_kinetics400_rgb.py
    • swin_tiny_p244_w877_in1k_pre_8xb8_amp_32x2x1_30e_kinetics400_rgb.py
    These can be found under configs/recognition/swin in mmaction2.

  4. Modify configuration for TADD
    Within the selected config file, please update the following (a minimal override sketch follows this list):
    • data_root: Set to your local TADD dataset directory.
    • ann_file_train / ann_file_val: Use the official TADD annotation files listed below.
    • pretrained: If using our weights, point to the correct checkpoint from the URL below.
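    Because the mmaction2 configs are Python files, a convenient pattern is a small override config that inherits from the chosen base. The sketch below is illustrative only: the file is assumed to live next to the base config under configs/recognition/swin, and the annotation paths, checkpoint path, and dataloader fields are placeholders that may need adapting to your mmaction2 version and TADD layout.
    # tadd_swin_tiny.py -- hypothetical override config inheriting a base Swin config.
    _base_ = ['swin_tiny_p244_w877_in1k_pre_8xb8_amp_32x2x1_30e_kinetics400_rgb.py']

    data_root = '/path/to/TADD/videos/'                          # local TADD dataset directory
    ann_file_train = '/path/to/TADD/annotations/train_list.txt'  # placeholder annotation paths
    ann_file_val = '/path/to/TADD/annotations/val_list.txt'
    load_from = '/path/to/tadd_checkpoint.pth'                   # optional: weights from the project website

    train_dataloader = dict(dataset=dict(ann_file=ann_file_train, data_prefix=dict(video=data_root)))
    val_dataloader = dict(dataset=dict(ann_file=ann_file_val, data_prefix=dict(video=data_root)))
    test_dataloader = val_dataloader  # reuse the validation list for testing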

  5. Download pretrained weights (optional)
    You may use the pretrained weights provided on our project website:
    https://yanc3113.github.io/TADD/checkpoints.html
    These checkpoints are compatible with the Swin-based configurations listed above.

  6. Use TADD annotation files
    Official video-level annotation files (train/val) are hosted on GitHub:
    https://github.com/Yanc3113/TADD/.../Annotation4VideoClassification
    These include labels for all actions, objects, and scenes within TADD. Developers are encouraged to customize the labels based on their task needs—for example, focusing only on specific action types or object categories.
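    For example, a narrower benchmark that keeps only a subset of classes can be built by filtering the annotation list. The Python sketch below assumes the common one-line-per-video 'path label' list format; check the actual TADD annotation files and adjust the parsing and the placeholder label IDs accordingly.
    # Keep only selected label IDs and remap them to a contiguous 0..N-1 range.
    keep = [3, 7, 12]  # placeholder IDs for the classes of interest
    remap = {old: new for new, old in enumerate(keep)}

    with open("train_list.txt") as fin, open("train_list_subset.txt", "w") as fout:
        for line in fin:
            line = line.strip()
            if not line:
                continue
            path, label = line.rsplit(maxsplit=1)
            if int(label) in remap:
                fout.write(f"{path} {remap[int(label)]}\n")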

  7. Launch training or evaluation
    The standard mmaction2 distributed scripts take the config file, the checkpoint (for testing), and the GPU count as positional arguments. To start evaluation:
    bash tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM}
    Or for training:
    bash tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}
    Top-1 / top-5 accuracy is reported by the default accuracy metric in these configs.

We encourage the community to build custom video classification benchmarks based on TADD by adapting the YAML files and annotation formats.