Implementation of the LSTM Algorithm in a Website-Based Optical Character Recognition Application Using Tesseract OCR

Alpha Fausta Ikrar Setyadi; Yeremia Alfa Susetyo

Authors

Alpha Fausta Ikrar Setyadi, Universitas Kristen Satya Wacana
Yeremia Alfa Susetyo, Universitas Kristen Satya Wacana

Keywords:

Optical Character Recognition, Long Short-Term Memory, Tesseract

Abstract

Because digital documents are more practical to process, many agencies and organizations are converting their physical documents to digital form. However, manually extracting data from physical documents is laborious and prone to input mistakes caused by human error. Optical Character Recognition (OCR) technology can address this problem. OCR recognizes the letters or characters in an image and stores them as text data on a computer. In this study, OCR is implemented in a website-based application using the Long Short-Term Memory (LSTM) method. Accuracy testing yielded an average error rate of 6.56% at the character level and 9.98% at the word level. These results indicate that applying OCR with the Long Short-Term Memory method in a web application can be a suitable solution for extracting data from physical documents.
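The abstract reports error rates at the character and word level but does not say how they were computed. A common convention, assumed here, is to divide the Levenshtein edit distance between the OCR output and the ground-truth text by the length of the ground truth, counted in characters for the character error rate (CER) and in whitespace-separated words for the word error rate (WER). The sketch below illustrates that convention only; the function names and sample strings are hypothetical and are not taken from the paper.

```python
def levenshtein(ref, hyp):
    """Minimum number of insertions, deletions and substitutions
    needed to turn the sequence `ref` into the sequence `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from an empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalised by the length of the reference."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return levenshtein(ref_units, hyp_units) / len(ref_units)


def cer(reference, hypothesis):
    """Character error rate: units are single characters."""
    return error_rate(list(reference), list(hypothesis))


def wer(reference, hypothesis):
    """Word error rate: units are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


# Hypothetical ground truth and OCR output, for illustration only.
reference = "optical character recognition"
hypothesis = "optical charactor recogniton"

print(f"CER: {cer(reference, hypothesis):.2%}")
print(f"WER: {wer(reference, hypothesis):.2%}")
```

Under this definition a single wrong character makes the whole word count as an error, which is one reason a word-level rate is typically higher than the character-level rate on the same text.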

Jurnal Teknologi Sistem Informasi dan Aplikasi Vol. 6 No. 2 April 2023

Published

2023-04-30

How to Cite

Setyadi, A. F. I., & Susetyo, Y. A. (2023). Implementasi Algoritma LSTM pada Aplikasi Optical Character Recognition Berbasis Website Menggunakan Tesseract OCR. Jurnal Teknologi Sistem Informasi Dan Aplikasi, 6(2), 63–71. Retrieved from https://openjournal.unpam.ac.id/index.php/JTSI/article/view/29235

Issue

Vol 6 No 2 (2023): Jurnal Teknologi Sistem Informasi dan Aplikasi

Section

Article

License

This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).

Jurnal Teknologi Sistem Informasi dan Aplikasi has CC BY-NC or an equivalent license as the optimal license for the publication, distribution, use, and reuse of scholarly work.

In developing strategy and setting priorities, Jurnal Teknologi Sistem Informasi dan Aplikasi recognizes that free access is better than priced access, libre access is better than free access, and libre under CC BY-NC or the equivalent is better than libre under more restrictive open licenses. We should achieve what we can when we can. We should not delay achieving free in order to achieve libre, and we should not stop with free when we can achieve libre.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License

YOU ARE FREE TO:

Share - copy and redistribute the material in any medium or format
Adapt - remix, transform, and build upon the material, for non-commercial purposes only
The licensor cannot revoke these freedoms as long as you follow the license terms
