EVALUATING OPEN-SOURCE IMAGE CAPTIONING MODELS WITH MULTIPLE METRICS ON THE IAPR TC-12 DATASET
Keywords:
Artificial Intelligence, image captioning, natural language processing, evaluation metrics, computer vision

Abstract
In recent years, the development of image captioning models has been a focal point at the intersection of computer vision and natural language processing (NLP). This paper presents a thorough comparative analysis of several state-of-the-art open-source image captioning models, employing a diverse array of evaluation metrics: CIDEr-D, BLEU-4, METEOR, ROUGE-L, SPICE, and Wu-Palmer similarity. The study centers on the IAPR TC-12 dataset, a well-established benchmark for assessing visual content understanding. Leveraging multiple evaluation metrics yields a multifaceted view of model performance, encompassing both the syntactic and semantic dimensions of the generated captions. The comparative analysis shows that different metrics capture distinct facets of caption quality, each shedding light on a specific aspect of model performance.
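To illustrate the surface-level, n-gram-based family of metrics discussed above, the core of sentence-level BLEU (clipped n-gram precision combined with a brevity penalty) can be sketched as follows. This is a simplified, unsmoothed version written for exposition; the function name and details are illustrative, and it is not the exact implementation used in the evaluation, which should rely on a standard toolkit such as NLTK or the COCO caption evaluation suite.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Illustrative sentence-level BLEU with uniform n-gram weights.

    candidate:  list of tokens for the generated caption
    references: list of token lists for the reference captions
    No smoothing is applied, so any zero n-gram precision gives 0.0.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        # Clip each candidate n-gram count by its maximum count
        # across all references ("modified" precision).
        max_ref = Counter()
        for ref in references:
            for gram, count in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty: penalize candidates shorter than the reference
    # whose length is closest to the candidate's length.
    ref_len = min((len(r) for r in references),
                  key=lambda l: (abs(l - len(candidate)), l))
    bp = 1.0 if len(candidate) >= ref_len else math.exp(1 - ref_len / len(candidate))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 1.0, while a caption sharing no unigrams with any reference scores 0.0; metrics such as METEOR, SPICE, and Wu-Palmer similarity go beyond this by crediting semantic rather than purely lexical overlap.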
In summary, this paper offers a valuable resource for researchers in computer vision and natural language processing. This comprehensive assessment of image captioning models using multiple evaluation metrics on the IAPR TC-12 dataset provides a deeper understanding of the current capabilities and limitations of AI-driven approaches to generating descriptive image captions, and it paves the way for future advances in this rapidly evolving domain.