A COMPREHENSIVE QUALITY ASSESSMENT FRAMEWORK FOR A SYNTHETIC VIDEO DATA IN ACTION RECOGNITION SYSTEMS

D.M. Galstyan

Authors

D.M. Galstyan, National Polytechnic University of Armenia

Keywords:

synthetic video quality, action recognition, video generation evaluation, temporal consistency, motion realism

Abstract

Synthetic video generation has become essential for training action recognition systems when real data is scarce. However, evaluating whether generated videos are actually useful for training remains challenging. Current methods rely on metrics such as Fréchet Video Distance (FVD), which measure only distribution similarity and miss critical aspects such as physical realism, temporal consistency, and actual performance improvement. This research presents a multi-dimensional quality assessment framework designed specifically for synthetic action videos. The framework evaluates six dimensions: perceptual quality, temporal consistency, motion realism, semantic correctness, diversity, and downstream task utility. It was tested on over 50,000 synthetic videos generated by GANs, diffusion models, and flow-based approaches across UCF-101, HMDB-51, and Kinetics-400. The framework's scores correlate strongly with human expert judgments (Spearman correlation of 0.91) and accurately predict model performance improvements (Pearson correlation of 0.89). Most importantly, temporal consistency and motion realism are far better predictors of usefulness than perceptual quality, a finding that challenges current practices. The framework identifies specific failures (temporal jitter, physics violations, semantic drift) that traditional metrics miss, providing actionable insights for improving synthetic data quality.
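
To make the validation step in the abstract concrete, the following is a minimal sketch of how per-dimension scores might be combined into one number and then compared with human ratings using Spearman and Pearson correlation. It is illustrative only: the abstract does not say how the six dimensions are aggregated, so the equal-weight mean in `composite_score`, the dimension key names, and the toy numbers are all assumptions and are not taken from the paper.

```python
from math import sqrt

# The six dimensions named in the abstract (key names are hypothetical).
DIMENSIONS = (
    "perceptual_quality",
    "temporal_consistency",
    "motion_realism",
    "semantic_correctness",
    "diversity",
    "downstream_utility",
)


def composite_score(scores, weights=None):
    """Weighted mean of per-dimension scores, each assumed to lie in [0, 1].

    Equal weighting is an assumption; the abstract does not specify
    how the dimensions are combined.
    """
    if weights is None:
        weights = {d: 1.0 for d in DIMENSIONS}
    total = sum(weights[d] for d in DIMENSIONS)
    return sum(weights[d] * scores[d] for d in DIMENSIONS) / total


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Toy, made-up values for illustration only (not results from the paper).
    framework_scores = [0.42, 0.55, 0.61, 0.78, 0.90]
    human_ratings = [2.0, 3.0, 2.5, 4.0, 4.5]
    print("Spearman:", spearman(framework_scores, human_ratings))
    print("Pearson:", pearson(framework_scores, human_ratings))
```

Spearman correlation compares rankings, so it suits ordinal human ratings, while Pearson correlation measures linear agreement, which is the natural choice when predicting a continuous quantity such as an accuracy gain.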
