Web Harvesting for Data Retrieval on Scientific Journal Sites
DOI: https://doi.org/10.32493/informatika.v6i1.10077
Keywords: web harvesting, web mining, parsing, bootstrap, journal
Abstract
Publishing scientific articles in online journals is a must for researchers and academics. When choosing a target journal, a researcher must check important information on the journal's website, such as indexing, scope, fees, and quartile. This information is generally not collected on one page but spread over several pages of the journal's website. The task becomes more complicated when a researcher has to compare several journals, and the information in these journals may change at any time. In this research, a web harvesting system is designed to retrieve information from journal websites. With web harvesting, information spread across several pages can be collected in one place, and researchers do not need to worry about changes, because the information collected is always the latest version. The harvesting technique takes three inputs: the URL of the page, the start source code that marks where retrieval begins, and the end source code that marks where retrieval stops. The technique was successfully implemented in a web application based on the Bootstrap framework. The test data were taken from several scientific journal websites. The information collected includes name, description, accreditation, indexing, scope, publication rate, publication charge, template, and quartile. Black box testing showed that all features work as expected.
References
Aleryani, A. Y. (2016). Comparative Study between Data Flow Diagram and Use Case Diagram. International Journal of Scientific and Research Publications, 6(3).
Chifu, E. S., & Letia, T. S. (2015). Web Harvesting and Sentiment Analysis of Consumer Feedback. Acta Technica Napocensis Electronics and Telecommunications, 56(3).
Chong, H.-Y., & Diamantopoulos, A. (2020). Integrating Advanced Technologies to Uphold Security of Payment: Data Flow Diagram. Automation in Construction, 114.
Eason, O. K. (2016). Information Systems Development Methodologies Transitions: An Analysis of Waterfall to Agile Methodology. University of New Hampshire.
Gaikwad, S. S., & Adkar, P. (2019). A Review Paper on Bootstrap Framework. IRE Journals, 2(10).
Haralson, D. (2016). Automating Website Crawling using Web Scraping Techniques Provided by PHP. Helsinki Metropolia University of Applied Sciences.
Henard, C., & Papadakis, M. (2016). Comparing White-Box and Black-Box Test Prioritization. International Conference on Software Engineering (ICSE).
Indra, E., Steffanily, & Dinesh, T. (2019). Designing Android Gaming News & Information Application Using Java-Based Web Scraping Technique. Journal of Physics: Conference Series.
Irhamn, F., & Siahaan, D. (2019). Object-Oriented Data Flow Diagram Similarity Measurement Using Greedy Algorithm. International Conference on Cybernetics and Intelligent System (ICORIS).
Johnson, P. A., & Sieber, R. E. (2012). Automated Web Harvesting to Collect and Analyse User Generated Content for Tourism. Current Issues in Tourism, 15(3).
Josi, A., Abdillah, L. A., & Suryayusra. (2014). Penerapan Teknik Web Scraping pada Mesin Pencari Artikel Ilmiah. Jurnal Sistem Informasi, 5, 6.
Krause, J. (2020). Introduction to Bootstrap. Apress, Berkeley, CA.
Rabby, S. I. (2017). The Web Application Based On Web Scraping. Stamford University Bangladesh.
Rani, S. B. A. S. U. (2017). A detailed study of Software Development Life Cycle (SDLC) Models. International Journals of Engineering and Computer Science, 6(7).
Roman, A. (2018). Black-Box Testing Techniques. Springer, Cham.
Sahria, Y. (2020). Implementasi Teknik Web Scraping pada Jurnal SINTA untuk Analisis Topik Penelitian Kesehatan Indonesia. Proceeding of The 11th University Research Colloquium 2020: Bidang Sains Dan Teknologi.
Saurkar, A. V., & Pathare, K. G. (2018). An Overview On Web Scraping Techniques And Tools. International Journal on Future Revolution in Computer Science and Communication Engineering, 4(4).
T, K. (2019). Personalized Content Extraction and Text Classification Using Effective Web Scraping Techniques. International Journal of Web Portals, 12.
Xu, S., & Chen, L. (2016). A Comparative Study on Black-Box Testing with Open Source Applications. International Conference on Software Engineering.
Zhao, B. (2017). Web Scraping. Springer International Publishing AG.
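The harvesting technique summarized in the abstract (a page URL, a start source code marker, and an end source code marker) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and the language is a choice made here for illustration. The function names `extract_between`, `strip_tags`, and `harvest`, as well as the sample page fragment and its markers, are hypothetical.

```python
import re
from html import unescape
from urllib.request import urlopen


def extract_between(source, start_marker, end_marker):
    """Return the text between the two markers, or None if either is missing."""
    start = source.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = source.find(end_marker, start)
    if end == -1:
        return None
    return source[start:end]


def strip_tags(fragment):
    """Remove HTML tags and collapse whitespace (rough cleanup, not a full parser)."""
    text = re.sub(r"<[^>]+>", " ", fragment)
    return " ".join(unescape(text).split())


def harvest(url, start_marker, end_marker, timeout=10.0):
    """Fetch a page and extract one field delimited by the two markers."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        source = response.read().decode(charset, errors="replace")
    fragment = extract_between(source, start_marker, end_marker)
    return strip_tags(fragment) if fragment is not None else None


# Offline demonstration on a made-up page fragment (hypothetical markup).
sample_page = (
    '<div class="scope"><h2>Focus and Scope</h2>'
    "<p>Software engineering, data mining &amp; networks</p></div>"
)
scope = extract_between(sample_page, "<h2>Focus and Scope</h2>", "</div>")
print(strip_tags(scope))
```

Because the markers are matched against the raw page source, a field is found only while the journal site keeps that markup unchanged. Returning `None` when a marker is missing lets a caller detect that the stored markers need updating.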
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
Jurnal Informatika Universitas Pamulang has CC-BY-NC or an equivalent license as the optimal license for the publication, distribution, use, and reuse of scholarly work.
In developing strategy and setting priorities, Jurnal Informatika Universitas Pamulang recognizes that free access is better than priced access, libre access is better than free access, and libre under CC-BY-NC or the equivalent is better than libre under more restrictive open licenses. We should achieve what we can when we can. We should not delay achieving free in order to achieve libre, and we should not stop with free when we can achieve libre.
Jurnal Informatika Universitas Pamulang is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
YOU ARE FREE TO:
- Share : copy and redistribute the material in any medium or format
- Adapt : remix, transform, and build upon the material for non-commercial purposes.
- The licensor cannot revoke these freedoms as long as you follow the license terms