Document Type: Research Paper

Authors

1 Department of Computer Science, Technology University, Baghdad, Iraq

2 Department of Computer Science, Technology University, Baghdad, Iraq

Abstract

The World Wide Web (WWW) is a vast repository of knowledge,
including intellectual, social, financial, and security-related data, and
online information is often accessed for instructional purposes. Because this
information is published in many different formats and through many different
access interfaces, indexing or semantically processing data drawn from websites
can be difficult. Web data scraping is one approach to this problem: it
converts unstructured web data into structured data that can be stored and
analyzed in a central local database or spreadsheet. This paper presents a
metadata scraping approach built on a programmable Customized Search Engine
(CSE) system, which extracts metadata from web pages (HTML pages) in the Google
database and saves it in XML format for later analysis and retrieval. Documents
that contain metadata are a relatively recent phenomenon on the web, and they
increase the likelihood that users will find the information they need.
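The metadata-to-XML step described above can be illustrated with a minimal sketch. It assumes the HTML of each page has already been retrieved (for example, from pages located through the CSE) and uses only Python's standard library (`html.parser` and `xml.etree.ElementTree`); it is not the authors' CSE implementation and does not query Google. The class and function names (`MetaTagParser`, `page_to_xml`, `build_pages_xml`, `save_pages_xml`) and the `<pages>/<page>/<meta>` XML layout are hypothetical choices made for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class MetaTagParser(HTMLParser):
    """Collects the <title> text and <meta name|property=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Standard meta tags use "name"; Open Graph tags use "property".
            key = attrs.get("name") or attrs.get("property")
            content = attrs.get("content")
            if key and content is not None:
                # A repeated key keeps only its last value in this sketch.
                self.metadata[key.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_to_xml(url, html):
    """Build a hypothetical <page> element holding one page's metadata."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    page = ET.Element("page", {"url": url})
    ET.SubElement(page, "title").text = parser.title.strip()
    for key, value in sorted(parser.metadata.items()):
        ET.SubElement(page, "meta", {"name": key}).text = value
    return page


def build_pages_xml(pages):
    """Wrap (url, html) pairs in a single <pages> root element."""
    root = ET.Element("pages")
    for url, html in pages:
        root.append(page_to_xml(url, html))
    return root


def save_pages_xml(pages, path):
    """Write the metadata of all pages to an XML file for later retrieval."""
    ET.ElementTree(build_pages_xml(pages)).write(
        path, encoding="utf-8", xml_declaration=True
    )


if __name__ == "__main__":
    sample = """<html><head><title>Example Page</title>
<meta name="description" content="A sample page">
<meta name="keywords" content="scraping, metadata">
<meta property="og:type" content="article" />
</head><body><p>Hello</p></body></html>"""
    root = build_pages_xml([("https://example.com/", sample)])
    print(ET.tostring(root, encoding="unicode"))
```

Keying each `<meta>` element on its `name` or `property` attribute keeps common conventions such as `description`, `keywords`, and Open Graph `og:*` tags in one flat list that is easy to index or query later. A production scraper would also have to handle fetching, character encodings, and repeated keys, which this sketch deliberately ignores.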
