DOIONLINE

DOIONLINE NO - IJAECS-IRAJ-DOIONLINE-18013

Publish In
International Journal of Advances in Electronics and Computer Science-IJAECS
Issue
Volume-8,Issue-5  ( May, 2021 )
Paper Title
Image Caption Generation with Novel Application Domains
Author Name
Hetal Tiwari, Gaurav Upadhyay, Dharmendra Mangal
Affiliation
Student, Department of Computer Science and Engineering, Medi-Caps University; Assistant Professor (S.G), Department of Computer Science and Engineering, Medi-Caps University
Pages
63-67
Abstract
Generating a natural language description of an image is attracting considerable interest, both because of its practical applications and because it connects two major fields of artificial intelligence: computer vision and natural language processing. Existing approaches are either top-down, starting from the gist of an image and converting it into words, or bottom-up, first producing words that describe various aspects of an image and then combining them. This paper employs a top-down approach through a hybrid system that uses the ResNet50 architecture (a multi-layer Convolutional Neural Network (CNN)) for image feature extraction and a Long Short Term Memory (LSTM) network to structure meaningful sentences. The effectiveness of the proposed model is demonstrated on the Flickr 8K dataset, where it was found to generate sensible and accurate captions in a majority of cases. Once the base caption-generation model is ready, the experiment is extended into two applications: image retrieval using Pearson's Correlation Coefficient and text-to-speech conversion for the visually impaired. For the first application, the generated captions are stored in a database, and an image retrieval system uses Pearson's Correlation Coefficient to retrieve images from the database that match the query caption. For the second application, the text of the captions is converted into speech for the visually impaired using Google's TTS API.
Keywords
Image Captioning, Convolutional Neural Networks (CNN), Residual Neural Network (ResNet), Recurrent Neural Network (RNN), Long Short Term Memory (LSTM).
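The caption-based retrieval step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how captions are turned into numeric vectors before Pearson's Correlation Coefficient is computed, so the bag-of-words counts used here are an assumption, and the function names (pearson, tokenize, retrieve) and the sample captions are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # The coefficient is undefined for a constant vector; treat as no match.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption, not the paper's method)."""
    return text.lower().split()


def retrieve(query, captions, top_k=3):
    """Rank stored images by the correlation of their caption with the query."""
    # Shared vocabulary over all stored captions plus the query.
    vocab = sorted(
        {w for c in captions.values() for w in tokenize(c)} | set(tokenize(query))
    )

    def vectorize(text):
        counts = Counter(tokenize(text))
        return [counts[w] for w in vocab]

    query_vec = vectorize(query)
    scored = [
        (pearson(query_vec, vectorize(caption)), image_id)
        for image_id, caption in captions.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(image_id, score) for score, image_id in scored[:top_k]]


# Hypothetical stand-in for the database of generated captions.
captions = {
    "img_001.jpg": "a dog runs through the grass",
    "img_002.jpg": "two children play on a beach",
    "img_003.jpg": "a brown dog jumps over a log",
}

results = retrieve("a dog runs in the grass", captions, top_k=2)
for image_id, score in results:
    print(image_id, round(score, 3))
```

Pearson's coefficient centres each vector on its mean before comparing, so two captions score highly when they use the same words more than their average word frequency would predict. A learned sentence embedding could replace the word counts without changing the ranking logic.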