A Project report on
                     COMPUTER BASICS
                         Submitted to
  The Institute of Chartered Accountants of India
                      XXXXXXX Branch
     In Partial Fulfillment of the ITT 100 hours
                       Training




Guided By:                               Submitted By:
XXXXXXXXXXXX                                XXXXX XXXX
Center in Charge                            EROXXXXX
XXXXX
Faculty
ITT, Institute of
Chartered Accountants of India,
XXXXXXX Branch


                             Year 2012

  ITT 100 Hours Training under the Institute of Chartered Accountants of India
CERTIFICATE


This is to certify that XXXXX XXXX, a student of the ITT 100 hours course of
the Institute of Chartered Accountants of India, has prepared a project on
“COMPUTER BASICS” under my guidance.

She has fulfilled all the requirements needed for preparing the project report.

I wish her all success in life.




Date:-                                     _________________________

Authorised Signature
                                            ITT Branch, XXXXXX
ACKNOWLEDGEMENT


There is no project that can be completed through individual effort alone.
It always takes the contribution of many people, some direct and some
indirect. I express my sincere gratitude towards all those who helped me,
directly and indirectly, throughout the project.


First and foremost, I would like to express my sincere appreciation and
gratitude to XXXXXXXX and XXXXXXX who, in the role of institutional guides,
offered me their precise guidance, motivation and suggestions in completing
this project work.


My sincere thanks also go to my parents, who have continuously supported me
in this effort.


Finally, I offer my thanks to my fellow group members, as without their
co-operation it would not have been possible for this project report to
materialize.


                                        Registration No.: ERO-XXXXXXX
INTRODUCTION



The Internet is a global network of networks. People and organizations
connect to the Internet so they can access its massive store of shared
information. It is an inherently participative medium: anybody can publish
information or create new services. The Internet is a cooperative
endeavor -- no single organization is in charge of the net.

By the turn of the century, information, including access to the
Internet, will be the basis for personal, economic, and political
advancement. The popular name for the Internet is the information
superhighway. Whether we want to find the latest financial news,
browse through library catalogs, exchange information with
colleagues, or join in a lively political debate, the Internet is the tool
that will take us beyond telephones, faxes, and isolated computers to a
burgeoning networked information frontier.


The Internet supplements the traditional tools we use to gather
information, data, graphics and news, and to correspond with other people.
Used skillfully, the Internet shrinks the world and brings information,
expertise, and knowledge on nearly every subject imaginable straight
to your computer.
CONTENTS
Chapter 1:
  Introduction
  Executive Summary
  Objective of the Study
  Research Methodology


Chapter 2:
    Search Engine
    How search engine works
    Web Crawling
    Indexing
    Searching
    Bibliography
EXECUTIVE SUMMARY
Topic:

     Internet

Sources of Data:

     Internet
     Books

Location of study:

     Guwahati

Institutional Guide:

     Himanshu Haloi (Centre-in-Charge)
     Sagar Nath (Faculty)

Objective:

     The project was prepared to examine the working and importance of
     the Internet.

Data Source:

     Secondary
OBJECTIVE OF THE STUDY

To gain in-depth knowledge about the Internet, an important tool in the
areas of software development and computer programming.



To understand the structure of flowcharts, the symbols and steps required
to prepare them, and the different types of flowcharts.



To analyse the advantages and limitations of the Internet and its extensive
use in different fields.



To comprehend the meaning and types of decision tables and the steps in the
process of making them.



To understand the applications of decision tables in various

fields.
RESEARCH METHODOLOGY
Data is one of the most important and vital aspects of any research study.
Research conducted in different fields of study may differ in methodology,
but every study is based on data that is analyzed and interpreted to obtain
information.

Data is the basic unit in statistical studies. Statistical information such
as census figures, population variables, health statistics and road accident
records is all developed from data.

Data is also important in computer science: numbers, images and figures in a
computer are all data.

Primary Data:

Data that has been collected from first-hand experience is known as primary
data. Primary data has not yet been published and is more reliable,
authentic and objective. It has not been changed or altered by other people,
therefore its validity is greater than that of secondary data.

Following are some of the sources of primary data.

Experiments: Experiments require an artificial or natural setting in which
to perform a logical study to collect data. Experiments are more suitable
for medicine, psychological studies, nutrition and other scientific studies.
In experiments the experimenter has to control the influence of any
extraneous variable on the results.

Survey: The survey is the most commonly used method in the social sciences,
management, marketing and, to some extent, psychology. Surveys can be
conducted by different methods.

     Questionnaire: The questionnaire is the most commonly used method in
     surveys. Questionnaires are lists of open-ended or close-ended
     questions to which the respondent gives answers. A questionnaire can
     be conducted via telephone, mail, in person in a public area or an
     institute, through electronic mail, or through fax and other methods.
     Interview: An interview is a face-to-face conversation with the
     respondent. In an interview the main problem arises when the
     respondent deliberately hides information; otherwise it is an in-depth
     source of information. The interviewer can not only record the
     statements the interviewee makes but can also observe body language,
     expressions and other reactions to the questions. This enables the
     interviewer to draw conclusions easily.
     Observations: Observation can be done while letting the observed
     person know that he is being observed, or without letting him know.
     Observations can be made in natural settings as well as in
     artificially created environments.



Secondary Data:

    Data collected from a source that has already been published in any
    form is called secondary data. The review of literature in any
    research is based on secondary data. It is mostly drawn from books,
    journals, periodicals, the internet and electronic media.




The methodology used in the preparation of this project is mostly
secondary, drawing on books and the internet.
Search Engine


The World Wide Web is "indexed" through the use of search engines,
which are also referred to as "spiders," "robots," "crawlers," or
"worms". These search engines comb through the Web documents,
identifying text that is the basis for keyword searching.

The list below describes several search engines and how each one gathers
information, along with resources that evaluate the search engines:


Alta Vista
Alta Vista, maintained by The Digital Equipment Corp., indexes the full
text of over 16 million pages including newsgroups. Check out the Alta
Vista Tips page.

Excite NetSearch
Excite includes approximately 1.5 million indexed pages, including
newsgroups. Check out the Excite NetSearch handbook.

InfoSeek Net Search
Indexes full text of web pages, including selected newsgroups and
electronic journals. Just under one-half million pages indexed. Check out
the InfoSeek Search Tips.

Inktomi
As of December 1995, the Inktomi search engine
offers a database of approximately 2.8 million indexed Web
documents and promises very fast search retrievals. Results are
ranked in order of how many of your searched terms are used on the
retrieved pages.

Lycos
Lycos indexes web pages (1.5 million +), web page titles, headings,
subheadings, URLs, and significant text.
Search results are returned in a ranked order.

Magellan
Magellan indexes over 80,000 web sites. Search results are ranked and
annotated.

Open Text Index
Indexes full text of approximately 1.3 million pages. Check out the
Open Text Help pages for tips on using this search engine.

WebCrawler
Maintained by America Online, WebCrawler indexes over 200,000
pages on approximately 75,000 web servers. URLs, titles, and
document content are indexed.

WWWW -- World Wide Web Worm
Approximately 250,000 indexed pages; indexed content includes
hypertext, URLs, and document titles.

Yahoo
A favorite directory and search engine, Yahoo has organized over
80,000 Web sites (including newsgroups) into 14 broad categories.
How a Search Engine Works
Each search engine works in a different way. Some engines scan
for information in the title or header of the document; others look
at the bold "headings" on the page for their information. However a
search engine operates, it follows three basic steps:


  • Web Crawling: Special software robots called spiders build lists of
    the words found on millions of web sites. When a spider is building
    its list, the process is called web crawling.

  • Indexing: After crawling, the contents of each page are analyzed to
    determine how the page should be indexed.

  • Searching: Building a query and submitting it through the search
    engine.
Web Crawling



A Web crawler is a computer program that browses the World Wide Web
in a methodical, automated manner or in an orderly fashion. This
process is called Web crawling or spidering. Many sites, in
particular search engines, use spidering as a means of providing
up-to-date data. Web crawlers are mainly used to create a copy of all
the visited pages for later processing by a search engine that will
index the downloaded pages to provide fast searches. Crawlers can
also be used for automating maintenance tasks on a Web site, such as
checking links or validating HTML code. Also, crawlers can be used to
gather specific types of information from Web pages, such as
harvesting e-mail addresses (usually for sending spam).
A Web crawler is one type of bot, or software agent. In general, it
starts with a list of URLs to visit, called the seeds. As the crawler
visits these URLs, it identifies all the hyperlinks in the page and
adds them to the list of URLs to visit, called the crawl frontier.
URLs from the frontier are recursively visited according to a set of
policies.
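
As a rough illustration of the seeds-and-frontier process described
above, the Python sketch below crawls a handful of pages, adds the
hyperlinks it finds to a frontier, and keeps the downloaded pages for
later indexing. The seed URL, the page limit and the helper names are
assumptions made for this example only; a real crawler would also
respect robots.txt and handle many more cases.

    # Minimal web-crawler sketch: visit seed URLs, follow hyperlinks,
    # and store the fetched pages for later indexing. All names and
    # limits here are illustrative assumptions.
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen


    class LinkExtractor(HTMLParser):
        """Collects the href targets of <a> tags found on a page."""

        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)


    def crawl(seeds, max_pages=10):
        frontier = deque(seeds)    # URLs still to visit (the crawl frontier)
        visited = set()
        pages = {}                 # url -> downloaded HTML, for later indexing
        while frontier and len(pages) < max_pages:
            url = frontier.popleft()
            if url in visited:
                continue
            visited.add(url)
            try:
                html = urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
            except Exception:
                continue           # skip unreachable or non-text pages
            pages[url] = html
            extractor = LinkExtractor()
            extractor.feed(html)
            for link in extractor.links:
                frontier.append(urljoin(url, link))   # resolve relative links
        return pages


    if __name__ == "__main__":
        downloaded = crawl(["https://example.com/"], max_pages=3)
        print(list(downloaded))
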
Indexing


Search engine indexing collects, parses, and stores data to
facilitate fast and accurate information retrieval. An alternate name
for the process, in the context of search engines designed to find
web pages on the Internet, is web indexing. The purpose of storing an
index is to optimize speed and performance in finding relevant
documents for a search query. Without an index, the search engine
would scan every document in the corpus, which would require
considerable time and computing power. For example, while an index of
10,000 documents can be queried within milliseconds, a sequential
scan of every word in 10,000 large documents could take hours. The
additional computer storage required to store the index, as well as
the considerable increase in the time required for an update to take
place, are traded off for the time saved during information
retrieval.
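
To make the speed trade-off concrete, the sketch below builds a tiny
inverted index in Python: each document is parsed into words and
every word is mapped to the set of documents that contain it, so a
query term can be looked up directly instead of scanning the whole
corpus. The tokenizer and the sample documents are illustrative
assumptions, not part of the original report.

    # Minimal inverted-index sketch: parse each document into words and
    # map every word to the set of documents containing it.
    import re
    from collections import defaultdict


    def tokenize(text):
        """Lower-case the text and split it into alphanumeric words."""
        return re.findall(r"[a-z0-9]+", text.lower())


    def build_index(documents):
        """documents: doc_id -> raw text. Returns word -> set of doc_ids."""
        index = defaultdict(set)
        for doc_id, text in documents.items():
            for word in tokenize(text):
                index[word].add(doc_id)
        return index


    if __name__ == "__main__":
        corpus = {                      # illustrative sample documents
            "doc1": "The Internet is a global network of networks.",
            "doc2": "Search engines index web pages for fast retrieval.",
            "doc3": "A web crawler downloads pages for the search engine.",
        }
        index = build_index(corpus)
        # A lookup touches one dictionary entry, however large the corpus is.
        print(sorted(index["web"]))     # -> ['doc2', 'doc3']
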
Searching


When a user enters a query into a search engine, the engine examines
its index and provides a listing of best-matching web pages according
to its criteria, usually with a short summary containing the
document's title and sometimes part of the text. Most search engines
support the use of the Boolean operators AND, OR and NOT to further
specify the search query. Some search engines provide an advanced
feature called proximity search, which allows users to define the
distance between keywords. The usefulness of a search depends on the
relevance of the result set it gives back. While there may be
millions of web pages that include a particular word or phrase, some
pages may be more relevant, popular, or authoritative than others.
Most search engines employ methods to rank the results to provide the
“best” results first.
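
A minimal sketch of the query side, continuing the inverted-index
example above: it applies the Boolean operators AND, OR and NOT to
the index and ranks candidate pages by how many of the query terms
they contain. The ranking rule is a simplifying assumption for
illustration, not a description of how any particular engine works.

    # Minimal search sketch over an inverted index (word -> set of doc_ids),
    # with simple Boolean operators and a naive ranking by matched-term count.


    def search_and(index, terms):
        """Documents containing every term."""
        sets = [index.get(t, set()) for t in terms]
        return set.intersection(*sets) if sets else set()


    def search_or(index, terms):
        """Documents containing at least one of the terms."""
        result = set()
        for t in terms:
            result |= index.get(t, set())
        return result


    def search_not(index, all_docs, term):
        """Documents that do not contain the term."""
        return all_docs - index.get(term, set())


    def rank(index, terms, candidates):
        """Order candidate documents by how many query terms they contain."""
        def score(doc_id):
            return sum(1 for t in terms if doc_id in index.get(t, set()))
        return sorted(candidates, key=score, reverse=True)


    if __name__ == "__main__":
        # Tiny hand-built index for demonstration.
        index = {
            "web": {"doc2", "doc3"},
            "crawler": {"doc3"},
            "internet": {"doc1"},
        }
        all_docs = {"doc1", "doc2", "doc3"}
        hits = search_or(index, ["web", "crawler"])
        print(rank(index, ["web", "crawler"], hits))   # -> ['doc3', 'doc2']
        print(search_not(index, all_docs, "crawler"))  # -> doc1 and doc2 (set order varies)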
