Implementation of the LSTM Algorithm in a Website-Based Optical Character Recognition Application Using Tesseract OCR

Authors

  • Alpha Fausta Ikrar Setyadi, Universitas Kristen Satya Wacana
  • Yeremia Alfa Susetyo, Universitas Kristen Satya Wacana

Keywords:

Optical Character Recognition, Long Short-Term Memory, Tesseract

Abstract

Because digital documents are more practical to process, many companies and organizations have converted their physical documents to digital form. However, extracting data from physical documents manually takes considerable effort and is prone to input errors caused by human error. Optical Character Recognition (OCR) technology can address this problem: OCR recognizes the letters or characters in an image and stores them as text data on a computer. This research implements OCR technology in a web-based application using the Long Short-Term Memory (LSTM) method. In accuracy testing, the average error rate was 6.56% at the character level and 9.98% at the word level. These results indicate that applying OCR with the LSTM method in a web application can be an appropriate solution for extracting data from physical documents.
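Character-level and word-level error rates for OCR are conventionally computed as the Levenshtein edit distance (insertions, deletions and substitutions) between the OCR output and a ground-truth transcription, divided by the length of the reference. The abstract does not state the exact formula or averaging used, so the Python sketch below assumes this standard definition and a simple mean of per-document rates; the function names `levenshtein`, `cer`, `wer` and `mean_error_rates` are hypothetical illustrations, not the authors' code.

```python
from typing import Iterable, Sequence, Tuple


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn sequence `ref` into sequence `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edit distance / reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance over whitespace-split words."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def mean_error_rates(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Average CER and WER over (reference, hypothesis) pairs.
    Assumption: a plain mean of per-document rates."""
    pairs = list(pairs)
    cers = [cer(r, h) for r, h in pairs]
    wers = [wer(r, h) for r, h in pairs]
    return sum(cers) / len(cers), sum(wers) / len(wers)
```

For example, `cer("total amount 12500", "tota1 amount 12500")` gives 1/18 (about 5.6%), because one of the 18 reference characters is substituted, while `wer` on the same pair gives 1/3, because one of three words is wrong. This illustrates why word-level error rates are typically higher than character-level ones.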

Published

2023-04-30

How to Cite

Setyadi, A. F. I., & Susetyo, Y. A. (2023). Implementasi Algoritma LSTM pada Aplikasi Optical Character Recognition Berbasis Website Menggunakan Tesseract OCR. Jurnal Teknologi Sistem Informasi Dan Aplikasi, 6(2), 63–71. Retrieved from https://openjournal.unpam.ac.id/index.php/JTSI/article/view/29235