A COMPREHENSIVE QUALITY ASSESSMENT FRAMEWORK FOR SYNTHETIC VIDEO DATA IN ACTION RECOGNITION SYSTEMS

Authors

  • D.M. Galstyan, National Polytechnic University of Armenia

Keywords:

synthetic video quality, action recognition, video generation evaluation, temporal consistency, motion realism

Abstract

Synthetic video generation has become essential for training action recognition systems when real data is scarce. However, evaluating whether generated videos are actually useful for training remains challenging. Current methods rely on metrics such as Fréchet Video Distance (FVD), which measure only distribution similarity and miss critical aspects such as physical realism, temporal consistency, and actual downstream performance improvement. This research presents a multi-dimensional quality assessment framework designed specifically for synthetic action videos. The framework evaluates six dimensions: perceptual quality, temporal consistency, motion realism, semantic correctness, diversity, and downstream task utility. It was tested on over 50,000 synthetic videos generated by GANs, diffusion models, and flow-based approaches across UCF-101, HMDB-51, and Kinetics-400. The framework's scores correlate strongly with human expert judgments (Spearman 0.91) and accurately predict model performance improvements (Pearson 0.89). Most importantly, temporal consistency and motion realism are far better predictors of usefulness than perceptual quality, a finding that challenges current practice. The framework identifies specific failures (temporal jitter, physics violations, semantic drift) that traditional metrics miss, providing actionable insights for improving synthetic data quality.
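
The two agreement figures quoted in the abstract are standard correlation coefficients. As a minimal sketch of how such a validation could be computed (not the author's implementation), the Python below derives a Spearman coefficient between per-video framework scores and human ratings, and a Pearson coefficient between framework scores and accuracy gains. The function names and all numbers in the toy data are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data (illustrative only, not results from the paper).
framework_scores = [0.2, 0.5, 0.4, 0.9, 0.7]   # aggregate score per video set
human_ratings = [1, 3, 2, 5, 4]                # expert rating per video set
accuracy_gain = [0.1, 1.2, 0.8, 3.0, 2.1]      # downstream gain, in points

if __name__ == "__main__":
    print("Spearman vs. human ratings:", spearman(framework_scores, human_ratings))
    print("Pearson vs. accuracy gain:", pearson(framework_scores, accuracy_gain))
```

Spearman is the natural choice for comparison with human judgments because expert ratings are ordinal, while Pearson suits the accuracy-gain comparison, where the interest is in a linear relationship between score and improvement.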

Published

21.02.2026

Section

Articles
