M.Sc. Thesis View

Student: Erkan GÜNERHAN
Supervisor: Prof. Dr. Cemal KÖSE
Department: Computer Engineering
Institution: Graduate School of Natural and Applied Sciences
University: Karadeniz Technical University, Turkey
Title of the Thesis: SPEAKER RECOGNITION WITH LONG SHORT TERM MEMORY TYPE DEEP NEURAL NETWORKS
Level: M.Sc.
Acceptance Date: 27/2/2023
Number of Pages: 57
Registration Number: i4110
Summary:

      The human voice is a personal biometric trait, like a fingerprint, face shape or retina. It contains fixed resonance points that do not change whether a person is talking or singing. This study uses that property to identify a speaker from among many different speakers. Analogue sound is converted into digital form by sampling, and a sampling rate of 16 kHz is sufficient to convey the characteristics of the human voice. The VoxForge and LibriSpeech datasets were used. Mel Frequency Cepstral Coefficients (MFCC), which are widely used for feature extraction in speaker recognition, were chosen as features. The features were used to train an Artificial Neural Network (ANN), a machine learning model, as well as Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) networks, which are deep learning models. Results were compared separately for 20 speakers and 30 speakers. The LSTM network, which performs well on time series, was additionally trained with extra hyperparameters to increase its accuracy. In general, the LSTM network gave higher accuracy than the other networks. With the additional hyperparameters, its accuracy for 20 speakers increased from 95.2% to 99.5%.

      Keywords: Speaker Recognition, MFCC, Artificial Neural Network, Long Short Term Memory, Deep Learning, Hyperparameter.
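
The feature extraction step named in the summary (16 kHz audio converted to MFCC features) can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. The 25 ms frame length, 10 ms hop, 512-point FFT, 26 mel filters, 13 coefficients and 0.97 pre-emphasis factor are common defaults assumed here for illustration, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    # All default parameter values are assumptions, not taken from the thesis.
    # Pre-emphasis boosts high frequencies before framing.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum, then log mel filterbank energies.
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    energies = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # Unnormalized DCT-II decorrelates the log energies; keep the first n_coeffs.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1)
                   / (2 * n_filters))
    return energies @ basis.T

# Usage: one second of a synthetic 220 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 220.0 * t)
features = mfcc(tone)
print(features.shape)  # (98, 13): 98 frames, 13 coefficients each
```

Each row of the result is one time step, so the matrix can be passed as a sequence to a recurrent model such as an LSTM. In practice an audio library would normally replace this hand-written version.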