Web Harvesting for Data Retrieval on Scientific Journal Sites

Authors

  • I Gede Surya Rahayuda, Institute of Technology and Business STIKOM Bali
  • Ni Putu Linda Santiari, Institute of Technology and Business STIKOM Bali

DOI:

https://doi.org/10.32493/informatika.v6i1.10077

Keywords:

web harvesting, web mining, parsing, bootstrap, journal

Abstract

Publishing scientific articles in online journals is a must for researchers and academics. When choosing a target journal, a researcher must check important information on the journal's website, such as indexing, scope, fees, quartile, and other details. This information is generally not collected on a single page but is spread over several pages of the journal's website. The task becomes more complicated when researchers have to compare several journals, and the information in those journals may change at any time. In this research, a web harvesting system is designed to retrieve information from journal websites. With web harvesting, information spread across several pages can be collected in one place, and researchers do not need to worry about changes, because the information collected is always the latest version. The harvesting technique takes three inputs: the URL of the page, the point in the page's source code where the desired information starts, and the point in the source code where it ends. The harvesting tool was successfully developed on the Bootstrap web framework. The test data were taken from several scientific journal websites. The information collected includes name, description, accreditation, indexing, scope, publication rate, publication charge, template, and quartile. Black box testing showed that all of the implemented features work as expected.
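
The start-and-end extraction described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' implementation. The function names `extract_between` and `harvest`, the sample page fragment, and the marker strings are all assumptions made for the example.

```python
from typing import Optional
from urllib.request import urlopen


def extract_between(source: str, start_marker: str, end_marker: str) -> Optional[str]:
    """Return the text between start_marker and end_marker, or None if either is missing."""
    start = source.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)           # skip past the start marker itself
    end = source.find(end_marker, start)  # search for the end marker after the start
    if end == -1:
        return None
    return source[start:end].strip()


def harvest(url: str, start_marker: str, end_marker: str) -> Optional[str]:
    """Fetch a page and extract one field from it (hypothetical helper, needs network access)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_between(html, start_marker, end_marker)


# Hypothetical page fragment standing in for a journal's "Focus and Scope" page.
sample_page = (
    '<div id="scope"><h2>Focus and Scope</h2>'
    "<p>Informatics and computer science.</p></div>"
)
scope = extract_between(sample_page, "<h2>Focus and Scope</h2><p>", "</p>")
print(scope)  # Informatics and computer science.
```

Because each field is fetched from the live page at request time, the result reflects the page's current content. The trade-off of plain string markers is that they break if the journal changes its page layout, so the markers would need to be maintained per journal.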

Published

2021-03-31