A search engine is a computer program that searches the internet for documents containing the terms entered by a user. This study discusses search engines as an effective tool for library professionals. It covers various aspects of search engines, including their background and how they work. Further, it analyses internet search techniques, i.e., basic, advanced and refined search. The paper highlights the effective use of the Boolean operators AND, OR and NOT, proximity searching, and related techniques in searching for information on the internet. Finally, it outlines the categories of search engines.

Keywords: Internet search engines, information retrieval, library professional, user interface

The world of information and communication technology (ICT) is changing at a pace rarely seen in the history of civilisation. With the rapid advance of technology and the growth in the amount of content on the internet, it has become difficult for users to find and utilise information and for content providers to classify and catalogue documents. Browsing the net to find the required information has become very time consuming for users1. The internet has become a worldwide data communication system, changing the way people look for information. It has brought new forms of social interaction, networks, and online activities because of its accessibility and availability2. The growth of the internet has had a revolutionary effect on society: it has removed distance as an obstacle to communication. The web is a practical interface to a complex network of computers and data. According to Internet World Stats (IWS), Asia alone has about 1386.2 million internet users, the largest regional share of the world's internet users (45.7 %). The growth in the world total of internet users over 2000-2014 is 741.0 % (Table 1)3.

Search engines play a vital role in providing exact or nascent digital information to users. With the passage of time, several techniques and technologies have emerged for handling information more speedily and effectively. Kumar & Kumar4 examined how Indian academics used search engines to retrieve information. Jansen & Spink5 examined characteristics of, and changes in, web searching across nine studies of five web search engines based in the US and Europe. Singh6 analysed search engines as promoters of information diffusion. Brin & Page7 designed a search engine to crawl and index the web efficiently and to produce much more satisfying search results than existing systems. McCown8, et al. evaluated three search engines, namely Google, MSN, and Yahoo, for harvesting the OAI-PMH resource corpus, using 10 million records from 776 OAI-PMH repositories. Holscher & Strube9 pointed out that experienced and novice users construct searches differently. Hellgren10 explored the implementation of the OAI-PMH and revealed that users have come to expect instant and simple access to quality information resources through internet search engines. Boston11 explored the application of technologies such as the OAI-PMH to share deep web content through search engines and showed that users can easily find information from the deep web using popular search engines. Cole & Warner12 provided an overview of emerging guidelines and best practices for OAI data providers or source providers.

Keane, O'Brien & Smyth13 focused on search engines as one of the most used resources on the internet. Xiang & Morgan14 described the design and implementation of light-weight protocols and open source tools that are employed to collect, organise, archive and disseminate information freely available on the internet. Web users should also be aware that limiting searches to a single search engine (SE) results in missing substantial pieces of information ranked highly by other SEs and directories15. GLT’s cross-language search consists of three major processes: query translation, search, and machine translation of result pages. Google is the most used web search engine16. Sompel17 described how OAI-PMH repositories have been directly overlaid with an interface that allows users to navigate the contained metadata by means of a web browser.

The study examines search engines as an effective tool for library professionals. It covers various aspects of search engines, including their background, how they work, and their categories. Internet search techniques, i.e., basic, advanced and refined search, are analysed. Furthermore, the study examines the effective use of the Boolean operators AND, OR and NOT, proximity searching, and related techniques in searching for information on the internet.

Theoretical and hypothetical methods were used for information gathering. Data and information were collected from national as well as international research journals in information and computer science, and from various recognised websites. The study was carried out from March to August 2015.

Search engines are tools for finding, classifying, and storing information on various websites on the internet. They can help in locating relevant information on a particular subject by using various search methods18.

Encyclopaedia Britannica defines a search engine as a ‘computer program to find answers to queries in a collection of information, which might be a library catalogue or a database but is most commonly the world wide web. A Web search engine produces a list of ‘pages’ (computer files listed on the Web) that contain the terms in a query. Most search engines allow the user to join terms with ‘and’, ‘or’, and ‘not’ to refine queries. They may also search specifically for images, videos, or news articles or for names of websites’19.

According to the Computing Dictionary, a ‘search engine is a program that allows users to locate specified information from a database or mass of data. Search engine sites are extremely popular on the world wide web because they allow users to quickly sift through millions of documents on the internet’20.

Dictionary of Computing and Digital Media defines search engine as, ‘a database from that allows a user to seek information on the internet by keyword. Search engines may look for titles of documents, URLs, headers, or text’21

Que's Computer & Internet Dictionary defines a search engine as ‘a program that locates needed information in a database, but especially an internet-accessible search service that enables users to search for information on the internet’22. According to Beiser, ‘a search engine enables a user of e-data resources to quickly locate the specific information desired from within a large volume of mostly unrelated extraneous information’23.

Moreover, one can say that search engines have become an essential tool in internet usage, and particularly in searching the world wide web. The Internet World Stats usage and population statistics for June 30, 2014 estimated that 3,035,749,340 people around the world, or 42.3 % of the total world population, were online (Table 1). StatCounter Global Stats (SCGS) reported that Google held 92.2 % of the total search engine market, followed by Bing (3.73 %), Yahoo! (3.43 %), Baidu (0.53 %), Ask (0.42 %) and other search engines (0.13 %). Google’s world dominance is clear, but in leading markets such as China, Japan, Russia, and South Korea, local favourites draw many more searchers than Google (Fig. 1)24.

The creators of each search engine try to develop mechanisms that allow their search engine to work more efficiently than others and thus make it more popular among users. However, some general rules apply to the way every search engine works. Each search engine performs three main tasks: (a) it searches for web pages available on the www and stores information about them, (b) it indexes the retrieved information about the web pages found, thereby creating a database, and (c) it allows users to search its database/index through an interface providing searching facilities and options which the user can use at his or her discretion16.

For the first task, search engines use computer programs (i.e., software) called bots, also known as robots, spiders, (web) crawlers, worms, intelligent agents, knowledgebots or knowbots. Whatever name is used to refer to them, they all perform the same function: they ‘surf’ or ‘crawl’ the web by following links from one webpage or website to the next and collect information which they store in their database25. Further, new websites are made available constantly, and search engines have to make sure that the results they display to their users are up-to-date in order to remain competitive in the search engine market. Usually, a search engine runs several spiders at the same time to cover its needs. For example, the early Google system described by Brin and Page could, at its peak performance and using four spiders, crawl over 100 pages per second, generating around 600 kilobytes of data each second26.

Spiders collect data that are analysed to produce the indexes kept within the search engine’s database. What is indexed depends on how each search engine has decided to use the information available on each of the web pages collected. Some search engines use the full text provided, some keep part of the original mark-up tags, and others take into consideration both content and links when building indexes based on the three most popular models for information retrieval: Boolean, vector space and probabilistic. Each search engine emphasises and bases its indexes on different aspects and content of websites, following a different strategy during both the gathering and the indexing of information16.
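The indexing step described above can be illustrated with a minimal sketch of an inverted index, assuming a tiny set of already crawled pages. The URLs, page texts, and the function names `tokenize` and `build_index` are hypothetical and chosen only for illustration; real search engines use far more elaborate indexing strategies.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages (URL -> page text), invented for illustration.
pages = {
    "http://example.org/a": "Library automation in India",
    "http://example.org/b": "Search engines index the web",
    "http://example.org/c": "Library search tools for the web",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


index = build_index(pages)
print(sorted(index["library"]))
```

Looking up a word in such an index returns the pages that contain it without rescanning every page, which is why the index, rather than the live web, is what a query is checked against.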

Some important categories of search engines can be summarised as:

Crawler-based search engines compile their own searchable databases of the web. Crawler or worm programs generate these databases by means of web robots, which are programs that reside on a host computer and retrieve information from sites on the web using standard protocols. In effect, they automatically travel the internet, following links from documents and collecting information about the resources they come across according to the HTML structure of the documents (i.e., URL, document title, keywords, etc.) (Table 2).

Directories are the yellow pages of the internet. They contain information that has been submitted to them by their indexers or by users who submit entries. Subject directories are often manually maintained, browsable, and searchable web-based interfaces27.

Yahoo! is the most famous subject directory. Yahoo! has several subject headings. A subject directory contains information that is organised into categories and subcategories or topics and subtopics. Like a search engine, one can search a subject directory for all entries that contain a particular set of keywords. Although directories can be searched using keywords, it is often as easy to click on a category, and then click through specific subdirectories until one finds the desired subject. These directories offer access to the information that has been classified into certain categories. They group the internet websites into categories such as Internet tutorials, universities, museum, etc. (Table 3)28.

Meta-search engines, also known as mega indexes, do not have their own searchable database. They utilise databases maintained by other individual search engines. A meta-search engine accepts a single query from the user and sends it to multiple search engines in parallel. Such search engines are also called multi-threaded search engines. Ask Jeeves, MetaCrawler, Savvy Search, @Once!, All-in-One Search Page, Internet Sleuth, Magellan, Net Search, Dogpile, Metafind, Metasearch, and ixquick.com are some of the better-known meta-search engines (Table 4).
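The way a meta-search engine forwards one query to several engines in parallel can be sketched as follows. This is only an illustrative sketch: `engine_a` and `engine_b` are hypothetical stand-ins that return fixed result lists, not calls to any real search service, and the merging rule (keep the first occurrence of each URL) is an assumption made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real search engines; each returns a fixed list.
def engine_a(query):
    return ["http://example.org/1", "http://example.org/2"]


def engine_b(query):
    return ["http://example.org/2", "http://example.org/3"]


def meta_search(query, engines):
    """Send one query to every engine in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # pool.map preserves the order of the engines list.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:  # drop duplicates reported by several engines
                seen.add(url)
                merged.append(url)
    return merged


print(meta_search("library automation", [engine_a, engine_b]))
```

Because the two stand-in engines overlap on one URL, the merged list contains three distinct results, which reflects the point that a single engine would have missed some of them.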

Software search tools, classified as ‘browsing companions’ or ‘browser search bots’, are similar to meta-search engines. One of their features is that results can be saved to hard disk for subsequent retrieval. Examples include Web sleuth, Copernic 98, Mata Hari, etc.

Subject-specific search engines concentrate on one particular topic and often provide better access to information than powerful keyword indexes. Besides, subject-specific search engines serve as an organised and structured guide to internet-based e-information resources that are carefully selected after a predefined process of evaluation and filtration in a subject area or specialty. Examples include Health A to Z, Math, Agri, Surf, Law Crawler, etc. Furthermore, subject portals include LibrarySpot, Librarian’s Index to Internet, Argus Clearing House, BIOME, BUBL, etc. (Table 5).

Geographic web search engines allow users to query on a particular geographic region. Restricting the geographic area narrows down the searches. The web is also being customised in different national and regional languages, giving space to webpages in different languages. This too indicates the need for search engines specific to a geographic area, since languages are bound to geographic areas. For example, as Chinese language content becomes more readily available on the web, there are more than 10 Chinese search engines available on the net (Table 6).

In the past, a search engine located either crawler-based results or human-powered listings. Nowadays, it is very common for search engines to present both types of results. Usually, a hybrid search engine favours one type of listings over the other. For example, MSN search is more likely to present human-powered listings from LookSmart. However, it does also present crawler-based results, especially for more obscure queries29.

Basic features of internet search techniques are:

Subject directories are created and maintained by the directory's staff. They allow users to browse internet resources by subject category and to search by keywords within the contents of the directories. For example, if a search is conducted on the Google search engine and one of the results happens to be in Google’s Directory, Google will offer a link to that section of the directory. Similarly, results for a search conducted in the Google directory are arranged according to PageRank, which is Google’s all-important measure of ‘link popularity’.

Searching is one of the most popular activities on the internet, and search engines have become an essential part of everyone’s lives. The three major search engines are Google, Yahoo! and Bing. When a user submits a query to a search engine, the query is checked against the search engine’s index of web pages, and relevant documents are returned with their URLs as hits. These hits are ranked in order of relevance, with the best results at the top. Most search engines offer two types of interfaces to search their databases, i.e., basic search and advanced search.
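The idea of returning hits ranked by relevance can be shown with a deliberately simple sketch. It assumes that relevance is just the number of times the query terms occur in a page, which is an illustrative simplification and not the ranking method of any particular search engine; the documents and the names `score` and `search` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical indexed documents (URL -> text), invented for illustration.
docs = {
    "http://example.org/a": "library search and library services",
    "http://example.org/b": "web search engines",
    "http://example.org/c": "cooking recipes",
}


def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, text):
    """Assumed relevance measure: total occurrences of the query terms."""
    counts = Counter(words(text))
    return sum(counts[term] for term in words(query))


def search(query, documents):
    """Return URLs of matching documents, best score first."""
    scored = [(score(query, text), url) for url, text in documents.items()]
    hits = [(s, url) for s, url in scored if s > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in hits]


print(search("library search", docs))
```

Here the first document matches the query terms three times and the second only once, so it is listed first, while the third document does not match and is not returned as a hit.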

Most search engines offer a dialog box, pane or a dialog line where search terms can be keyed-in followed by options to either submit or clear the search. Most search engines query their database using keywords. A user enters a word or words called ‘keywords or search terms’ that he or she would like to search. The search engine then looks through its indexes in the database for matches. It might look in the title, description or entire text of a webpage.

Different search engines have different methods of refining queries. Advanced or refined search options were identified in different search engines such as Google, Yahoo! and MSN; in Yahoo!, for example, they are described under the help page titled ‘Refining Your Search Results-Yahoo Help’. Options for advanced search differ from one search engine to another, but some of the common features include the ability to search on more than one word, to confine the search to a specified field, and to exclude a word that the user does not want in a search. A user may also search for proper names, for phrases, and for words that are found within a certain proximity of other search terms30.

Boolean logic was devised by George Boole in 184731. The Boolean operators AND, OR and NOT link search terms. These are usually explained using Venn diagrams, such as:

(i) ‘AND’ Connector

If two terms in the search strategy are linked using the logical operator AND, the output will be the items in which both terms are used in the title or abstract. This increases specificity and ensures greater precision, which means that only relevant documents will be listed on the computer screen as a result of the search (Fig. 2).

Moreover, the Boolean AND connection is sometimes referred to as intersection (∩) or conjunction (∧). For example, a user searching for documents on "Potato Blight" needs to select the combination search option and enter the two words connected by the term AND.

(ii) ‘OR’ Connector

The OR connector helps the user to search for documents using alternate terms. If two terms are linked using OR logic in a search, the output will be a list of documents which contain any one of these terms in the title or abstract (Fig. 3). Moreover, OR is used to link together synonyms, lexical and morphological variants, and terms that are close in meaning in the context of a particular search. Alternative terms and notation for OR are: union (∪); disjunction (∨).

(iii) ‘NOT’ Connector

The NOT connector is used to exclude particular terms. The output of such a search will exclude documents which contain the term to the right of the operator NOT in the search strategy. The NOT connector thus helps to avoid retrieving irrelevant documents. For example, ‘promotion NOT advertising’ might be used to exclude items concerning the promotion of goods and services from a search on job promotion (Fig. 4). Alternative terms and notation for NOT are: complement (–); negation (~).
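The three Boolean connectors correspond directly to set operations on the lists of documents that contain each term. The sketch below assumes a small hypothetical index in which each term maps to a set of document numbers; the index contents and the function names are invented for illustration.

```python
# Hypothetical index: each term maps to the set of document IDs containing it.
index = {
    "potato": {1, 2, 3},
    "blight": {2, 3, 4},
    "promotion": {5, 6},
    "advertising": {6},
}


def search_and(index, a, b):
    """Documents containing both terms (set intersection)."""
    return index.get(a, set()) & index.get(b, set())


def search_or(index, a, b):
    """Documents containing at least one of the terms (set union)."""
    return index.get(a, set()) | index.get(b, set())


def search_not(index, a, b):
    """Documents containing the first term but not the second (set difference)."""
    return index.get(a, set()) - index.get(b, set())


print(sorted(search_and(index, "potato", "blight")))
print(sorted(search_or(index, "potato", "blight")))
print(sorted(search_not(index, "promotion", "advertising")))
```

In this toy index, AND narrows the result to the two documents shared by both terms, OR widens it to all four, and NOT removes the one promotion document that also mentions advertising.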

The ability to query on phrases is very important in a search engine. A phrase is a group of words that must appear next to each other in a specified order. Most search engines support this feature. It can be used when the search terms appear in an exact order. To indicate a phrase, surround it in double quotation marks (“ ”). For example: “Web-based library services”
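A phrase match can be sketched as a check that the words of the phrase occur next to each other, in the same order, in a document. This is a minimal illustration under the assumption that punctuation and letter case are ignored; the helper names `tokenize` and `contains_phrase` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text, phrase):
    """Return True if the words of the phrase appear adjacently and in order."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return False
    # Slide a window of n words over the text and compare it with the phrase.
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


print(contains_phrase("A guide to web-based library services in India",
                      "Web-based library services"))
print(contains_phrase("Library services that are web based",
                      "Web-based library services"))
```

The second text contains all the words of the phrase but not in the required order, so it is not treated as a match.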

Proximity searching is the technique used to find two words next to, near, or within a specified distance of each other within a document. It is used to specify the relative location of words in a document. Proximity operators facilitate searching for words that must be in the same phrase, paragraph, or sentence in a record. For example, a search may require that two concepts be in the same sentence but not necessarily next to each other, as in a phrase. One such operator is NEAR, which means that the terms entered should be within a certain number of words of each other.
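A NEAR operator can be approximated by comparing the word positions of the two terms. The sketch below is an assumption-laden illustration: distance is counted in words, the order of the two terms does not matter, and the function name `near` and its `max_distance` parameter are hypothetical rather than taken from any search engine.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(text, term_a, term_b, max_distance):
    """Return True if the two terms occur within max_distance words of each other."""
    words = tokenize(text)
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)


sentence = "Library automation in India"
print(near(sentence, "library", "india", 3))
print(near(sentence, "library", "india", 2))
```

In the sample sentence the two terms are three word positions apart, so the search succeeds with a distance of 3 and fails with a distance of 2.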

Most search engines permit the use of parentheses to group related terms. This is particularly useful for clustering synonyms or for searching specific terms together before other terms are searched. Parentheses may be used in combination with other search techniques. For example: (Library Computerisation OR Library NEAR Automation) AND India.

Truncation (stemming) is a technique that facilitates searching for multiple endings of a word. Most search engines that support this feature use certain symbols (called ‘wild cards’), such as *, ?, $ or #, at the end of the word root to indicate a truncated search.

The following examples show how to construct a search using truncation:

(a) class* (retrieves class, classification, classify, class no.)

(b) catalogu* (retrieves catalogue, catalogues, cataloguing, catalogued)

It is generally a good idea to truncate longer word roots. Truncating shorter roots, such as cat*, will usually produce a very large number of hits, many of them irrelevant. Wildcards can also be used for internal truncation, which is generally used to search for words that may differ by one or two letters within the word, for instance words with British and American spellings. The following examples show how to use the wildcard feature:

(a) wom*n (retrieves woman or women)

(b) colo*r (retrieves colour or color)

(c) hono*r (retrieves honour or honor)

The wildcard symbol (* in the examples) may vary from one search engine to another.
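Truncation and internal wildcards can be tried out with Python's standard `fnmatch` module, in which `*` matches any run of characters, including none. The sketch assumes a small hypothetical vocabulary of indexed words; the function name `expand_wildcard` is invented for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical vocabulary of indexed words, invented for illustration.
vocabulary = ["cat", "class", "classification", "classify",
              "color", "colour", "glass", "woman", "women"]


def expand_wildcard(pattern, words):
    """Return the words matched by the pattern, where * stands for any characters."""
    pattern = pattern.lower()
    return [w for w in words if fnmatchcase(w.lower(), pattern)]


print(expand_wildcard("class*", vocabulary))
print(expand_wildcard("wom*n", vocabulary))
print(expand_wildcard("colo*r", vocabulary))
```

The first pattern retrieves the words beginning with the root, while the other two show internal truncation picking up both spelling variants.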

Case sensitive searching allows searches for words that differ in meaning depending on the use of uppercase and lowercase letters. For example, the month of March is very different from what a marching band does, and a person from Poland is Polish, but you polish silver. Most search engines are not case sensitive and will simply read all letters as lowercase. Others may distinguish between the word ‘aids’ and the disease ‘AIDS’. Using lower case is advised, because a lower case search term will always retrieve upper case matches as well32.

A typical web page is composed of the following major fields: title, host (or site), domain, URL, and link. Where available, field searching on the web is a very powerful tool. It allows users to specify exactly where they want the search engine to look in the web document33. Further, ‘fields’ are the various pieces of information that library databases (like Summon, EBSCO, ProQuest, etc.) keep for each item that they search. Each time users search for a keyword, the database looks for it in the fields.
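Field searching can be sketched by storing each page as a record with named fields and restricting the match to one of them. The records, field values, and the function name `field_search` below are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical page records with a few of the fields named above.
records = [
    {"title": "Library automation in India",
     "domain": "example.edu",
     "url": "http://example.edu/library/automation"},
    {"title": "History of India",
     "domain": "example.org",
     "url": "http://example.org/history/library"},
    {"title": "Search engine basics",
     "domain": "example.com",
     "url": "http://example.com/search"},
]


def field_search(records, field, term):
    """Return the records whose chosen field contains the term (case-insensitive)."""
    term = term.lower()
    return [r for r in records if term in r.get(field, "").lower()]


print([r["title"] for r in field_search(records, "title", "library")])
print([r["title"] for r in field_search(records, "url", "library")])
```

The same keyword gives one hit when the search is confined to the title field and two hits when it is confined to the URL field, which shows how the choice of field changes the result set.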

A concept search is an automated information retrieval method that is used to search electronically stored unstructured text (for example, digital archives, email, scientific literature, etc.) for information that is conceptually similar to the information provided in a search query34. Unlike keyword search systems, concept-based search systems try to determine what a user means. In the best circumstances, a concept-based search returns hits on documents that are ‘about’ the subject/theme that is being explored, even if the words in the document do not precisely match the query35,6.

The natural language searching feature allows a user to phrase a query in the same way as it would be spoken. Suppose a user wants to know who Mahatma Gandhi was. His/her query would be: “Who was Mahatma Gandhi?” A search engine that supports this technique retrieves relevant webpages that would answer this question36.

Millions of people around the world use search engines almost every day. Search engines make it easy to expand one's knowledge, since everyone encounters things he or she does not know. In earlier times, when people wanted to know something beyond their own knowledge, they turned to books or knowledgeable persons. Nowadays, one can simply type some keywords in a search box and, in less than a second, thousands of useful answers appear on the computer screen. This saves the precious time of users.

But search engines also have disadvantages. First, search engines return far too many useless results. Sometimes one cannot find anything useful among the search results, and much time is wasted in picking useful information out of an ocean of results. Second, those who use search engines frequently may become lazy: every time they meet a difficulty, they simply turn to a search engine. Third, search engines may lead people to various pornographic websites.

Although search engines offer many benefits, one needs to use them carefully to obtain what one wants and to avoid harmful information.

Search engines are among the most used resources on the internet. Google, for example, now hosts over eight billion items and returns answers to queries in a fraction of a second, thus realising some of the most incredible predictions envisioned by the pioneers of the world wide web37,7. Further, the internet is considered the biggest source of information and has found an important place in libraries as the quickest means of accessing information at any time, but its effective and optimum use requires the help of search engines. A search engine is thus an aid to finding pin-pointed information and saving the time of users.

1. Kadyan, S. & Singroha, R. Web 3.0 in library services: An utilitarian effects. J. of Inf. Manag., 2014, 1(2),159-66.

2. Zhang, J. & Fei, W. Search engines’ responses to several search feature selections. The Int. Inf. & Lib. Rev., 2010, 42(3), 212-25.

3. World Internet Usage Statistics. 2014. www.internetworldstats.com/stats.htm

4. Kumar, B.T.S. & Kumar, G.T. Search engines and their search strategies: The effective use by Indian academics. Program: Elec. Lib. and Inf. 2013, 47(4), 437-49.

5. Jansen, B.J. & Spink, A. How are we searching the World Wide Web? A comparison of nine search engine transaction logs. Inf. Proc. & Manag., 2006, 42(1), 248-63.

6. Singh, J. Search engines: The propellers of information diffusion. Inter. J. of Inf. Dissem. & Tech., 2011, 1(1), 44-6.

7. Brin, S. & Page, L. Reprint of: The anatomy of a large-scale hypertextual web search engine. Computer Networks, 2012, 56(18), 3825-33.

8. McCown, F.; Liu, Xiaoming; Nelson, M.L. & Zubair, M. Search engine coverage of the OAI-PMH corpus. Internet Computing, 2006, 10(2), 66-73.

9. Hölscher, C. & Strube, G. Web search behavior of internet experts and newbies. Computer Networks, 2000, 33(1), 337-46.

10. Hellgren, T. OAI compatibility: Exposing metadata of scientific publications, 2004. http://www2.db.dk/NIOD/hellgren.pdf.

11. Boston, Tony. Exposing the deep web to increase access to library collections, 2005. http://ausweb.scu.edu.au/aw05/index

12. Cole, Tim & Warner, Simeon, M. OAI-PMH repositories: Quality issues regarding metadata and protocol compliance, 2005. http://eprints.rclis.org/archive/00005502/

13. Keane M.T.; O'Brien, M. & Smyth, B. Are people biased in their use of search engines. Communications of the ACM, 2008, 51(2), 49-52.

14. Xiang, X. & Morgan, E.L. Light-weight protocols and open source tools to implement digital library collections and services. D-Lib Magazine, 2005, 11(10). http://www.dlib.org/dlib/october05/morgan/lOmorgan.html

15. Isfandyari, Moghaddam, A. & Parirokh, M. A comparative study on overlapping of search results in metasearch engines and their common underlying search engines. Library Review, 2006, 55(5), 301-06.

16. Dudek, D.; Mastora, A. & Landoni, M. Is google the answer? A study into usability of search engines. Library Review, 2007, 56(3), 224-33.

17. Sompel, Herbert Van de; Young, Jeffrey A. & Hickey, Thomas B. Using the OAI-PMH differently. D-Lib Magazine, 2003, 9(7/8). http://www.dlib.org/dlib/july03/young/07young.html

18. Rowley, J. The Electronic Library, Ed. 4. Facet Publishing, London, 1998, pp.186-7.

19. Encyclopaedia Britannica. Search engines,1926. http://global.britannica.com/EBchecked/topic/1017484/search-engine

20. Ovlce, P.C. Computing dictionary, Ed. 4. Sandhills, Lincoln, NE., 1999. pp. 247-48.

21. Hansen, B. Dictionary of computing & digital media: Terms & acronyms. BPB Publications, New Delhi, 2000. pp. 276.

22. Pfaffenberger, B. Que's computer & Internet dictionary. Ed. 4, Prentice-Hall of India, New Delhi, 1996. pp. 465.

23. Beiser, K. Search engines for library websites. Lib. Tech. Rep., 2000, 36(3), 2-60.

24. Global State Counter. 2015. http://gs.statcounter.com/#desktop-search_engine-ww-yearly-2010-2015-bar

25. Barker, J. The BEST search engines, The Library-University of California, Berkeley, 2003. www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html

26. Franklin, C. How internet search engines work: How stuff works, 1998. http://computer.howstuffworks.com/search-engine.htm/printable

27. Winship, I.R. World wide web searching tools: An evaluation. Vine, 1995, 25(2), 49-54.

28. Sugunavaty, C. Web search engines. In Calliber-99: Academic Libraries in Internet Era, edited by PSG. Kumar & C.P. Vashishth, Nagpur, 1999. INFLIBNET, Ahmedabad. pp. 317.

29. Dong, X. & Su, L.T. Search engines on the www and information retrieval from the internet: A review and evaluation. Online and CD-ROM Rev., 1997, 2(2), 67-82.

30. IGNOU. Search engines. IGNOU Booklet, Unit-13, New Delhi, pp-348-55.

31. Hussain, A. & Raza, M.M. Online public access catalogue: It’s development, utility and limitations. IASLIC Bulletin, 2002, 47(4), 204-09.

32. Case sensitive searching. http://casesensitivesearch.com/2015

33. Field searching. http://www.siumed.edu/mrc/research/fieldsearching.html 2015

34. Wikipedia, Free Encyclopaedia. Concept search. 2015. http://en.wikipedia.org/wiki/Concept_search

35. Hussain, A. & Kumar, K. Search engines: An overview. ILA Bulletin, 2006, 42(3), 21-26.

36. Jansen, B.J. & Spink, A. How are we searching the www? A comparison of nine search engine transaction logs. Inf. Proc. & Manag., 2006, 42(1), 248-63.

37. Berners-Lee, T.; Cailliau, R.; Groff, J. & Pollermann, B. WWW: The information universe. Elec. Net. Res., Appli. & Poli., 1992, 2(1), 52-58.

Dr Akhtar Hussain is working as Librarian and Information Officer at King Saud University, Riyadh, Kingdom of Saudi Arabia. He received his doctoral degree in Library and Information Science in 2011 from the Department of Library & Information Science, Aligarh Muslim University, Aligarh, India. He has published in peer-reviewed international as well as national journals, conference proceedings and books. His research areas include: Library automation, library management, IT application in libraries, web-based library and information services, etc.