Surveying Information Technologies Currently in Use on the World Wide Web

Institution

Morehead State University

Abstract

The goal of this research is to assess the use of technologies and protocols across the World Wide Web, both today and in the future. This will be done through a robot agent, which will programmatically download, store, and analyze websites and web server information. The results of these site examinations will be held in a database. The first step was to set up the server hardware and software to host the database and to run the robot agent program used to examine the websites. The server was set up with a Linux operating system and hosts both a MySQL database server and an Apache web server. The robot agent was written in Perl and has been successfully implemented in preliminary testing. A key element was the development of a database model to hold the complex data that describes a website, its hosting server(s), its domain, and the owners of those items, as well as the structure and contents of the web pages at that site. Seed URLs to begin examining were extracted programmatically from online ranking sites; over 20,000 URLs, along with rating information for the sites, have been extracted and written to the database. In the next stage, more URLs will be gathered and the robot agent will begin to spider the list of collected URLs, collecting data on each site. The next major step will be to parse all of the HTML pages and server responses and analyze what types of technologies are being implemented and how they are being used.
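To illustrate the approach, the following is a minimal sketch in Perl of the fetch-and-store loop at the heart of a robot agent of this kind, using LWP::UserAgent and DBI. The database name, table name, column names, and credentials shown here are assumptions made for the example, not the project's actual schema.

    #!/usr/bin/perl
    # Minimal sketch of a fetch-and-store loop for the robot agent.
    # Database name, table, columns, and credentials are assumptions for
    # illustration only, not the project's actual schema.
    use strict;
    use warnings;
    use LWP::UserAgent;
    use DBI;

    # Connect to the MySQL database that will hold the survey results.
    my $dbh = DBI->connect(
        'DBI:mysql:database=websurvey;host=localhost',  # assumed database name
        'surveyor', 'secret',                           # assumed credentials
        { RaiseError => 1, AutoCommit => 1 },
    );

    # Identify the robot to the web servers it visits.
    my $ua = LWP::UserAgent->new(
        agent   => 'WebTechSurveyBot/0.1',
        timeout => 30,
    );

    # One row per examined URL: the page itself plus the Server response header.
    my $insert = $dbh->prepare(
        'INSERT INTO sites (url, http_server, html) VALUES (?, ?, ?)'
    );

    for my $url (@ARGV) {
        my $response = $ua->get($url);
        next unless $response->is_success;

        $insert->execute(
            $url,
            $response->header('Server') || 'unknown',   # e.g. "Apache/2.0.52"
            $response->decoded_content,
        );
    }

    $dbh->disconnect;

Recording the raw Server response header at fetch time is what later allows the web server software in use to be tallied without re-contacting each site; in the full system, the stored rows would feed the richer database model of sites, servers, domains, and owners described above.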
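Likewise, the later analysis pass over the stored pages could tally technology indicators with simple pattern matching, as in the sketch below. The signatures shown are illustrative assumptions only, not the detection rules used in the study.

    #!/usr/bin/perl
    # Minimal sketch of an analysis pass over stored HTML files: count how
    # many pages match a few technology indicators. The signatures are
    # illustrative assumptions, not the study's actual detection rules.
    use strict;
    use warnings;

    # Map a technology label to a pattern that suggests its presence in markup.
    my %signatures = (
        'JavaScript'     => qr{<script\b}i,
        'CSS stylesheet' => qr{<link[^>]+rel=["']?stylesheet}i,
        'Flash'          => qr{\.swf\b}i,
        'PHP (by URL)'   => qr{\.php\b}i,
    );

    my %counts;
    local $/;                                  # slurp each file whole
    for my $file (@ARGV) {
        open my $fh, '<', $file or next;
        my $html = <$fh>;
        close $fh;
        for my $tech (keys %signatures) {
            $counts{$tech}++ if $html =~ $signatures{$tech};
        }
    }

    # Report how many of the examined pages showed each indicator.
    printf "%-16s %d\n", $_, $counts{$_} for sort keys %counts;

A fuller analysis would parse the pages with a real parser such as HTML::Parser rather than regular expressions, but the overall tallying structure would be the same.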
