Wednesday, July 1, 2009

Crawler or Web Spider

A web spider or crawler is a program that inspects World Wide Web pages in a methodical, automated way. One of its most common uses is to create a copy of every visited web page for later processing by a search engine, which indexes the pages to provide a fast search system. Web spiders are the most widely used kind of bot.

Web spiders begin by visiting a list of URLs; they identify the hyperlinks on those pages and add them to the list of URLs to visit, repeating the process according to a certain set of rules. In normal operation the program is given an initial group of addresses; the spider downloads those addresses, analyzes the pages, and looks for links to new pages. It then downloads those new pages, examines their links, and so on.
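As an illustration of that download-parse-enqueue loop, here is a minimal sketch of a breadth-first crawler written in Python using only the standard library. The seed URL, page limit, and politeness delay are illustrative assumptions, not part of any particular crawler.

```python
# Minimal sketch of a breadth-first web crawler (standard library only).
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags found in an HTML page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_urls, max_pages=20, delay=1.0):
    """Download pages starting from seed_urls, following links breadth-first."""
    to_visit = list(seed_urls)   # queue of URLs still to download
    visited = set()              # URLs already processed
    while to_visit and len(visited) < max_pages:
        url = to_visit.pop(0)
        if url in visited:
            continue
        visited.add(url)
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                html = response.read().decode("utf-8", errors="replace")
        except Exception as error:
            print(f"failed: {url} ({error})")
            continue
        # Parse the page and queue any new, absolute links it contains.
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in visited:
                to_visit.append(absolute)
        print(f"visited: {url} ({len(parser.links)} links found)")
        time.sleep(delay)        # politeness delay between requests

if __name__ == "__main__":
    crawl(["https://example.com/"])  # illustrative seed URL
```

A real crawler would also respect robots.txt, deduplicate the queue, and limit how far it follows links away from the seed addresses.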

Among the most common uses of web spiders are:
* Creating the index of a search engine.
* Analyzing the links of a site to find broken ones (see the sketch after this list).
* Collecting information of a certain type, for example product prices, to compile a catalog.
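For the broken-link use case, the following sketch checks every link found on a single page and reports those that fail or return an HTTP error status. The page URL and the regex-based link extraction are simplifying assumptions; a real checker would parse the HTML properly and walk the whole site.

```python
# Minimal sketch of a broken-link checker for a single page.
import re
import urllib.error
import urllib.request
from urllib.parse import urljoin

def check_links(page_url):
    """Report links on page_url that fail or return an HTTP error status."""
    with urllib.request.urlopen(page_url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    # Very rough href extraction, enough for an illustration.
    hrefs = re.findall(r'href="([^"]+)"', html)
    for link in hrefs:
        absolute = urljoin(page_url, link)
        if not absolute.startswith("http"):
            continue
        try:
            with urllib.request.urlopen(absolute, timeout=10) as reply:
                status = reply.status
        except urllib.error.HTTPError as error:
            status = error.code
        except Exception:
            status = None
        if status is None or status >= 400:
            print(f"broken link: {absolute} (status {status})")

if __name__ == "__main__":
    check_links("https://example.com/")  # illustrative URL
```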
