Document Type: Research Paper
Authors
1 Department of Computer Science, Technology University, Baghdad, Iraq
2 Department of Computer Science, Technology University, Baghdad, Iraq
Abstract
The World Wide Web (WWW) is a vast repository of knowledge, including
intellectual, social, financial, and security-related data. Online
information is typically accessed for instructional purposes. However, this
information is published in a wide variety of formats and behind different
access interfaces, which can make indexing or semantic processing of website
data difficult. Web data scraping is one approach to resolving this issue.
Scraping converts unstructured web data into structured data that can be
stored and examined in a central local database or spreadsheet. This paper
presents a metadata scraping approach based on a programmable Customized
Search Engine (CSE) system that extracts metadata from web pages (HTML pages)
in the Google database and saves it in XML format for later analysis and
retrieval. Documents that carry metadata are a relatively recent phenomenon on
the web, and this metadata increases the likelihood that users will find the
information they need.