M.Sc. Thesis View

Student: GÖKHAN DİLEK
Supervisor: Asst. Prof. Dr. İbrahim SAVRAN
Department: Computer Engineering
Institution: Graduate School of Natural and Applied Sciences
University: Karadeniz Technical University, Turkey
Title of the Thesis: DETECTION AND CORRECTION OF READ ERRORS IN SEQUENCES OBTAINED WITH THE NEW GENERATION SEQUENCE METHOD
Level: M.Sc.
Acceptance Date: 19/3/2021
Number of Pages: 111
Registration Number: i3868
Summary:

      Bioinformatics studies have focused on genetic disease research, disease detection, and DNA sequencing methods in order to find solutions for detected diseases. The main problems in the analysis of genetic data arise from the size and complexity of the data sequences. This complexity is compounded by the read errors made by Next-Generation Sequencing (NGS) devices. Because large genetic sequences cannot be read in a single pass, the data are sequenced in chunks, and NGS devices have made it possible to read large genetic data sets in this way. However, NGS devices have an error rate between 1% and 3% when reading genetic data sequences. In this study, a method is proposed for detecting mutation of the B-Raf murine sarcoma viral oncogene homolog B1 (BRAF) gene, one of the most common cancer-related mutations today. The method uses the healthy BRAF gene published by the National Center for Biotechnology Information (NCBI). Synthetic data were generated in proportion to the error rates by simulating NGS reads. The faulty gene was read at the specified depth and saved in FASTA format. The reads were compared with the reference sequence, errors were detected, and corrections were applied.
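
      The pipeline summarized above (simulate reads at a given error rate and depth, save them in FASTA format, compare them with the reference, correct mismatches) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the substitution-only error model, the assumption that each read's start position is known, and the toy reference sequence are all hypothetical choices made for illustration. The k-mer and Bloom filter techniques listed in the key words are not implemented here; the sketch uses a naive position-by-position comparison instead.

```python
import random

BASES = "ACGT"

def simulate_reads(reference, read_len, depth, error_rate, seed=0):
    """Sample reads from the reference and inject substitution errors.

    Assumption: only substitutions are modeled (no insertions/deletions),
    and each base is misread independently with probability error_rate.
    """
    rng = random.Random(seed)
    n_reads = depth * len(reference) // read_len  # reads needed for the target depth
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        bases = list(reference[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Replace with a different base to simulate a misread.
                bases[i] = rng.choice([b for b in BASES if b != base])
        reads.append((start, "".join(bases)))
    return reads

def to_fasta(reads):
    """Serialize (start, sequence) pairs as FASTA records."""
    return "".join(f">read_{i}_pos_{start}\n{seq}\n"
                   for i, (start, seq) in enumerate(reads))

def detect_and_correct(reference, start, read):
    """Compare a read with the reference at a known start position.

    Returns the reference coordinates of mismatching bases and the read
    with every mismatch replaced by the reference base.
    """
    window = reference[start:start + len(read)]
    mismatches = [start + i for i, (r, w) in enumerate(zip(read, window)) if r != w]
    return mismatches, window

if __name__ == "__main__":
    # Hypothetical toy reference; the thesis uses the BRAF gene from NCBI.
    ref_rng = random.Random(1)
    reference = "".join(ref_rng.choice(BASES) for _ in range(200))
    reads = simulate_reads(reference, read_len=50, depth=10, error_rate=0.02)
    total_errors = sum(len(detect_and_correct(reference, s, r)[0]) for s, r in reads)
    print(f"{len(reads)} reads, {total_errors} mismatching bases detected")
```

      Note that correcting every mismatch to the reference base would also erase a genuine mutation, so a real pipeline would need a way to tell read errors from true variants, for example by checking whether a mismatch recurs across many overlapping reads.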

      

Key Words: Big Data, Bioinformatics, Next-Generation Sequencing, DNA Sequencing, k-mer Method, Bloom Filter, Hash Functions, BRAF Gene.