Webcrawler and Search Engine

We first designed a webcrawler in C++ that used a provided HTTP interface to “crawl” outward from a list of initial URLs, following links up to a maximum depth. Along with archiving each URL, the crawler stored the words on every page for searching and a 100-word description of each page.
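
A minimal sketch of a depth-limited crawl loop, assuming a breadth-first traversal (the post doesn't say which order the crawler used). The provided HTTP interface isn't shown, so a hypothetical in-memory `web` map stands in for fetching pages; `Page`, `CrawlRecord`, `tokenize`, and `crawl` are illustrative names, not the original project's code.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the provided HTTP interface: a fetched page
// is modeled as its text plus the URLs it links to.
struct Page {
    std::string text;
    std::vector<std::string> links;
};

// What the crawler keeps for each visited URL.
struct CrawlRecord {
    std::vector<std::string> words;  // every word on the page, for searching
    std::string description;         // first (up to) 100 words of the page
};

// Split page text on whitespace.
std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string w;
    while (in >> w) words.push_back(w);
    return words;
}

// Breadth-first crawl from the seed URLs (depth 0), following links no
// deeper than maxDepth. `web` simulates the network in this sketch.
std::map<std::string, CrawlRecord> crawl(
        const std::map<std::string, Page>& web,
        const std::vector<std::string>& seeds, int maxDepth) {
    std::map<std::string, CrawlRecord> archive;
    std::set<std::string> seen(seeds.begin(), seeds.end());
    std::queue<std::pair<std::string, int>> frontier;
    for (const auto& s : seeds) frontier.push({s, 0});

    while (!frontier.empty()) {
        std::string url = frontier.front().first;
        int depth = frontier.front().second;
        frontier.pop();
        auto it = web.find(url);
        if (it == web.end()) continue;  // "fetch" failed: skip this URL

        CrawlRecord rec;
        rec.words = tokenize(it->second.text);
        std::size_t n = std::min<std::size_t>(rec.words.size(), 100);
        for (std::size_t i = 0; i < n; ++i) {
            if (i) rec.description += ' ';
            rec.description += rec.words[i];
        }
        archive[url] = std::move(rec);

        if (depth == maxDepth) continue;  // don't expand past the limit
        for (const auto& link : it->second.links)
            if (seen.insert(link).second) frontier.push({link, depth + 1});
    }
    return archive;
}
```

The `seen` set keeps pages that link to each other from being queued more than once.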

Next, we designed a search engine that preprocessed the webcrawler's results and stored the data in one of four data structures:

• Array
• Hash Table
• AVL Dictionary
• Binary Search Dictionary
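
As a rough illustration of the hash table option, here is a separate-chaining table mapping each word to the URLs it appears on. `WordIndex`, the 0.75 load-factor threshold, and the doubling rehash are assumptions made for this sketch, not details from the original project.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Minimal separate-chaining hash table: word -> URLs containing it.
class WordIndex {
public:
    explicit WordIndex(std::size_t buckets = 16)
        : table_(buckets ? buckets : 1) {}

    // Record that `word` appears on page `url`. Pages are assumed to be
    // indexed one at a time, so a repeated word on the same page shows up
    // as a consecutive duplicate and is skipped.
    void add(const std::string& word, const std::string& url) {
        for (auto& entry : table_[index(word)])
            if (entry.first == word) {
                if (entry.second.empty() || entry.second.back() != url)
                    entry.second.push_back(url);
                return;
            }
        // New word: grow first if this insert would exceed a 0.75 load factor.
        if (size_ + 1 > table_.size() * 3 / 4) rehash(table_.size() * 2);
        table_[index(word)].emplace_back(word, std::vector<std::string>{url});
        ++size_;
    }

    // Return the URLs for `word`, or nullptr if the word was never added.
    const std::vector<std::string>* find(const std::string& word) const {
        for (const auto& entry : table_[index(word)])
            if (entry.first == word) return &entry.second;
        return nullptr;
    }

    std::size_t size() const { return size_; }  // number of distinct words

private:
    using Entry = std::pair<std::string, std::vector<std::string>>;

    std::size_t index(const std::string& word) const {
        return std::hash<std::string>{}(word) % table_.size();
    }

    // Swap in a larger bucket array and redistribute every entry.
    void rehash(std::size_t newBuckets) {
        std::vector<std::list<Entry>> old(newBuckets);
        old.swap(table_);
        for (auto& chain : old)
            for (auto& entry : chain)
                table_[index(entry.first)].push_back(std::move(entry));
    }

    std::vector<std::list<Entry>> table_;
    std::size_t size_ = 0;
};
```

Lookups hash the query word once and scan a single short chain, which is why average search time stays roughly constant as the index grows.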

We used a provided HTTP server to generate HTML pages showing query results, along with search times so the efficiency of the different data structures could be compared. We called the search engine Boogle, a combination of “Boilermaker” (Purdue Boilermakers) and Google.
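
Measuring search times per structure can be sketched with `std::chrono::steady_clock`. This assumes the binary search dictionary is a sorted array searched with `std::lower_bound`, and uses `std::unordered_map` as a stand-in for a hash table; `SortedDict`, `sortedFind`, and `timeQueries` are hypothetical names.

```cpp
#include <algorithm>
#include <chrono>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Binary search dictionary modeled as (word, value) pairs sorted by word.
using SortedDict = std::vector<std::pair<std::string, int>>;

// O(log n) lookup; returns a pointer to the value, or nullptr if absent.
const int* sortedFind(const SortedDict& d, const std::string& key) {
    auto it = std::lower_bound(d.begin(), d.end(), key,
        [](const std::pair<std::string, int>& e, const std::string& k) {
            return e.first < k;
        });
    return (it != d.end() && it->first == key) ? &it->second : nullptr;
}

// Run `lookup` on every query and return {elapsed microseconds, hits}.
// `Lookup` is any callable taking a query string and returning true on a hit.
template <typename Lookup>
std::pair<long long, int> timeQueries(const std::vector<std::string>& queries,
                                      Lookup lookup) {
    int hits = 0;
    auto start = std::chrono::steady_clock::now();
    for (const auto& q : queries)
        if (lookup(q)) ++hits;
    auto end = std::chrono::steady_clock::now();
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    return {static_cast<long long>(us.count()), hits};
}
```

Calling `timeQueries` once per structure with the same query list gives comparable numbers; absolute times depend on the machine and data size, so they are only meaningful relative to one another.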