
Silent Speech Recognition

Adithi H S, Pratheeksha K N, Aaliya Waseem

Abstract


Lip reading using artificial intelligence aims to convert visual speech movements into readable text without relying on audio signals. This project presents an AI-based lip reading system that predicts spoken text from video input using deep learning techniques. The system processes uploaded video files and extracts visual features of lip movements, which are analyzed with a pre-trained LipNet-style model that combines LSTM layers with CTC-based decoding. A Streamlit-based web interface allows users to upload videos and view the predicted text. The proposed system demonstrates the ability to recognize visual speech and transcribe it into character sequences, making it useful for assisting hearing-impaired individuals and for speech recognition in noisy environments.
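To make the described pipeline concrete, the sketch below shows one way a LipNet-style recognizer can be wired up: 3D convolutions over a clip of mouth-region frames, a bidirectional LSTM, a per-frame softmax over characters, and greedy CTC decoding. This is a minimal illustration only; the clip shape, layer sizes, and character vocabulary are assumptions, not the exact configuration used in this work.

```python
# Minimal sketch of a LipNet-style pipeline: 3D conv features over a clip of
# mouth-region frames, a bidirectional LSTM, and greedy CTC decoding.
# All shapes, layer sizes, and the character vocabulary are illustrative
# assumptions, not the project's actual configuration.
import numpy as np
import tensorflow as tf

VOCAB = "abcdefghijklmnopqrstuvwxyz '"      # assumed character set
NUM_CLASSES = len(VOCAB) + 1                # +1 for the CTC blank token
FRAMES, H, W, C = 75, 46, 140, 1            # assumed clip shape (GRID-like)

def build_model():
    # Spatiotemporal features -> per-frame vectors -> recurrent sequence model
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(FRAMES, H, W, C)),
        tf.keras.layers.Conv3D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPool3D(pool_size=(1, 2, 2)),
        tf.keras.layers.Conv3D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPool3D(pool_size=(1, 2, 2)),
        tf.keras.layers.TimeDistributed(tf.keras.layers.Flatten()),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True)),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # per-frame char probs
    ])

def ctc_greedy_decode(probs):
    # probs: (batch, time, NUM_CLASSES) softmax output; collapse repeats/blanks
    seq_len = np.full((probs.shape[0],), probs.shape[1])
    decoded, _ = tf.keras.backend.ctc_decode(probs, input_length=seq_len, greedy=True)
    chars = decoded[0].numpy()[0]
    return "".join(VOCAB[i] for i in chars if i >= 0)  # -1 padding is dropped

if __name__ == "__main__":
    model = build_model()                                       # untrained demo model
    dummy_clip = np.random.rand(1, FRAMES, H, W, C).astype("float32")
    print(ctc_greedy_decode(model.predict(dummy_clip)))
```

On the interface side, the Streamlit app described above only needs st.file_uploader to accept the video clip and st.write (or st.text) to display the decoded string, with the uploaded file passed to a decoding function such as the one sketched here.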



References


Y. Assael et al., "LipNet: End-to-end sentence-level lipreading," arXiv:1611.01599, 2016.

J. S. Chung and A. Zisserman, "Lip reading in the wild," in Proc. Asian Conference on Computer Vision (ACCV), 2016.

S. Fenghour et al., "Lip reading sentences using deep learning," IEEE Access, 2020.

B. Martinez, P. Ma, S. Petridis, and M. Pantic, "Lipreading using temporal convolutional networks," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 6319-6323.

Y. Lu and H. Li, "Automatic lip-reading system based on deep convolutional neural network and attention-based long short-term memory," Applied Sciences, vol. 9, p. 1599, Apr. 2019.

N. Deshmukh, A. Ahire, S. H. Bhandari, A. Mali, and K. Warkari, "Vision-based lip reading system using deep learning," in Proc. International Conference on Computing, Communication and Green Engineering (CCGE), 2021, pp. 1-6.

X. Zhao et al., "Mutual information maximization for effective lip reading," in Proc. IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2020.

S. M. H. Chowdhury, M. Rahman, M. T. Oyshi, and M. A. Hasan, "Text extraction through video lip reading using deep learning," in Proc. 8th International Conference on System Modeling and Advancement in Research Trends (SMART), 2019, pp. 240-243.

