Audio-Visual Speech Recognition Using 3D Convolutional Neural Networks

Belhan C.; Fikirdanis D.; Cimen O.; Pasinli P.; Akgun Z.; Yayci Z.O.; Turkan M.

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14365/3510

Full metadata record

DC Field	Value	Language
dc.contributor.author	Belhan C.	-
dc.contributor.author	Fikirdanis D.	-
dc.contributor.author	Cimen O.	-
dc.contributor.author	Pasinli P.	-
dc.contributor.author	Akgun Z.	-
dc.contributor.author	Yayci Z.O.	-
dc.contributor.author	Turkan M.	-
dc.date.accessioned	2023-06-16T14:59:33Z	-
dc.date.available	2023-06-16T14:59:33Z	-
dc.date.issued	2021	-
dc.identifier.isbn	9.78E+12	-
dc.identifier.uri	https://doi.org/10.1109/ASYU52992.2021.9599016	-
dc.identifier.uri	https://hdl.handle.net/20.500.14365/3510	-
dc.description	IEEE SMC Society;IEEE Turkey Section	en_US
dc.description	2021 Innovations in Intelligent Systems and Applications Conference, ASYU 2021 -- 6 October 2021 through 8 October 2021 -- 174400	en_US
dc.description.abstract	Lip reading, described as extracting speech data from the observable movements of the face, particularly the jaws, lips, tongue and teeth, is a very challenging task. It is a beneficial skill that helps people comprehend and interpret the content of other people's speech when audio or facial expression alone is not sufficient. Even experts require a certain level of experience and an understanding of visual expressions to interpret spoken words, and this may still not be efficient enough for the process. Nowadays, lip sequences can be converted into expressive words and phrases with the aid of computers, and the use of neural networks (NNs) has therefore increased rapidly in this field. The main contribution of this study is to use Short-Time Fourier Transformed (STFT) audio data as an extra image input and to employ 3D Convolutional NNs (CNNs) for feature extraction. This generates features that account for the change across consecutive frames and makes use of visual and auditory data together with the attributes from the image. After testing several experimental scenarios, the proposed method shows strong promise for further development in this research domain. © 2021 IEEE.	en_US
dc.language.iso	en	en_US
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	en_US
dc.relation.ispartof	Proceedings - 2021 Innovations in Intelligent Systems and Applications Conference, ASYU 2021	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	3D convolutional neural network	en_US
dc.subject	audio-visual speech recognition	en_US
dc.subject	automatic speech recognition	en_US
dc.subject	Lip reading	en_US
dc.subject	short-time Fourier Transform	en_US
dc.subject	Convolution	en_US
dc.subject	Speech	en_US
dc.subject	Speech recognition	en_US
dc.subject	3d convolutional neural network	en_US
dc.subject	Audiovisual speech recognition	en_US
dc.subject	Automatic speech recognition	en_US
dc.subject	Convolutional neural network	en_US
dc.subject	Fourier	en_US
dc.subject	Lip reading	en_US
dc.subject	Neural-networks	en_US
dc.subject	Short time Fourier transforms	en_US
dc.subject	Speech data	en_US
dc.subject	Spoken words	en_US
dc.subject	Convolutional neural networks	en_US
dc.title	Audio-Visual Speech Recognition Using 3D Convolutional Neural Networks	en_US
dc.type	Conference Object	en_US
dc.identifier.doi	10.1109/ASYU52992.2021.9599016	-
dc.identifier.scopus	2-s2.0-85123175238	-
dc.authorscopusid	57224918896	-
dc.authorscopusid	57419656500	-
dc.authorscopusid	57420178400	-
dc.authorscopusid	57420002100	-
dc.authorscopusid	57419831800	-
dc.authorscopusid	14069326000	-
dc.relation.publicationcategory	Conference Item - International - Institutional Faculty Member	en_US
dc.identifier.scopusquality	N/A	-
dc.identifier.wosquality	N/A	-
item.openairetype	Conference Object	-
item.grantfulltext	reserved	-
item.languageiso639-1	en	-
item.cerifentitytype	Publications	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.fulltext	With Fulltext	-
crisitem.author.dept	05.10. Mechanical Engineering	-
Appears in Collections:	Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
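
The abstract above describes feeding Short-Time Fourier Transformed audio to the network as an extra image input. The following is a minimal sketch of how a 1-D audio signal can be turned into such a 2-D magnitude "image" (time frames by frequency bins). It is not the authors' implementation: the function name `stft_magnitude`, the frame length, the hop size, the Hann window, and the 1 kHz test tone are all assumptions chosen for illustration, and the 3D CNN stage is not shown.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a list of magnitude spectra, one per Hann-windowed frame.

    frame_len and hop are illustrative assumptions, not values from the paper.
    """
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies of a real signal
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            # Naive DFT of the windowed frame at frequency bin k.
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Hypothetical test signal: a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = stft_magnitude(tone)

# Each row is one time frame; each column is one frequency bin.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin * sample_rate / 64)
```

The resulting list of rows can be treated as a single-channel image, so it can be handled alongside video frames by the same kind of convolutional front end. A production pipeline would normally use an FFT routine rather than this naive DFT.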

Files in This Item:

File	Size	Format
2605.pdf (Restricted Access)	5.87 MB	Adobe PDF

Scopus citations: 2 (checked on Mar 26, 2025)
Page view(s): 72 (checked on Mar 31, 2025)
Download(s): 6 (checked on Mar 31, 2025)