Search Engine Article --- Peter Hayes

PETER HAYES EXAMINES THE POLITICS AND USES OF SEARCH ENGINES

We all know that the Internet is a mine of information, but as with its industrial namesake, dividing the raw rock from the precious minerals can sometimes take quite some time and effort!

Naturally the giant mining machines of the Internet are the so-called "search engines" that help the user reach the kind of site that will either answer a question or give them background information on a topic or theme.

We will look more at the strengths and weaknesses of the individual service providers in the second part of this series, but today we look at good searching practice and the best ways to look for information on the Internet.

The first thing that we need to explore is the obvious and the easy. Most major businesses have web addresses that sound like the company name (Sun computers says that it enjoys extra UK hits because people are looking for the newspaper site!) and I've simply guessed a few in my time.

Failing that, they should come up on ALL the engines by their name and country alone. Equally some companies even buy up sound-alikes and near misses to help the poor typist - although these will simply jump to the correct location.

(On a similar note I've come across sites that are dead, but have been sold to others who use the address to make the user jump to them. Whether you approve of this - or want to go there - is quite another matter!)

If you were to divide search engines into two groups, you could divide them into those that use robots and those that use humans. Robots add sites to the service by looking for included words (including those in special page headers officially called meta tags) and key phrases. They are good at doing work in bulk, but tend to store the minor next to the major.
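
As a rough illustration of the robot's side of the job, the sketch below pulls the keywords out of a page's meta tags using Python's standard html.parser module. The sample page, the class name MetaTagReader and the helper read_meta are invented for this example; real robots do a great deal more than this.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collects the content of <meta name="..."> tags from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags are of interest here.
        if tag != "meta":
            return
        fields = dict(attrs)
        name = fields.get("name")
        content = fields.get("content")
        if name and content:
            self.meta[name.lower()] = content


def read_meta(html):
    """Return a dict mapping meta tag names to their content strings."""
    reader = MetaTagReader()
    reader.feed(html)
    return reader.meta


# Hypothetical page header, made up for this example.
SAMPLE_PAGE = """
<html><head>
<title>Hull Music Shop</title>
<meta name="keywords" content="musical instruments, Hull, guitars">
<meta name="description" content="A made-up shop selling instruments in Hull.">
</head><body>Welcome!</body></html>
"""

meta = read_meta(SAMPLE_PAGE)
keywords = [word.strip() for word in meta.get("keywords", "").split(",")]
print(keywords)
```

A robot working this way only knows what the page says about itself, which is one reason the minor can end up filed next to the major.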

Perhaps the key to the success of Yahoo! (the number one and most visited engine) is the fact that all its sites are vetted by humans. This means that you will rarely come across a page of just pictures and basic hellos; equally, it rarely includes pages from free cyberspace providers. It does, however, include small sites as long as they have some business connection or social function.

The first thing to realise when priming your "search box" is that words are prioritised: the first word carries more weight than the second. Equally you should start with the most specific set of words you can think of, such as (musical+instruments+Hull+uk), and work your way down by clipping words off.
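
The "clipping" routine can be sketched in a few lines of Python. This assumes that the words are listed most important first and that you clip from the end, which is one reasonable reading of the advice rather than a rule any engine lays down; the function name narrowing_queries is made up for the example.

```python
def narrowing_queries(words):
    """Yield '+'-joined queries, dropping the last word each time."""
    for end in range(len(words), 0, -1):
        yield "+".join(words[:end])


queries = list(narrowing_queries(["musical", "instruments", "Hull", "uk"]))
for query in queries:
    print(query)
# musical+instruments+Hull+uk
# musical+instruments+Hull
# musical+instruments
# musical
```

You would try each query in turn and stop at the first one that brings back a manageable list.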

The next thing to remember is that not all engines are the same. In some cases Yahoo!'s exclusive policy works against it and the basic information is best found on the robotic sites. Equally there is often a backlog of material to be registered and the correct site may be locked up in that.

(This is the problem that many webmasters forget anyway: they register too late, thinking that their site will appear in a few days. Five or six weeks if you are lucky!)

Being too general can lead to a whole mountain of information: The last thing you want is 10,000 sites to swim through for that vital piece of information or data!

The first thing that most people do is turn on a computer and then think. Too late. You need a strategy in advance. I was talking (with a friend) about a piece of exploitation television (on Bravo), "Adventures of a Taxi Driver", and the basic irony that the star, Barry Evans, was murdered while working as a real taxi driver. However, I didn't know the full story - until I hit the keypad.

Typing something like "Barry+Evans+actor" came back empty on all the engines I tried, although there is little harm in trying such a long shot in any search. What I needed was a directory of actors and the films that they were in. I therefore worked through these key words and came across a site about British actors (www.uk.imdb.com) which included details of his career and death.

(He was hit over the head by a burglar at his home and died from his injuries; to the best of my knowledge the case remains unsolved.)

The problem with names is that you must be sure that you have the right one. Several authors use my name without my permission, trying to cash in on my name and reputation no doubt. Even more curiously, some of them take on subjects close to my heart!

Never ever use words such as "and", "or" or "the" because this will queer your pitch. Equally words such as "sex" or "mp3" will simply leave you with too much data. Be very careful when looking for sexual education material or health issues - obscene spoofs are not unknown.
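
If you find it hard to break the habit, the filler words can be stripped out before the query is sent. The sketch below is only an illustration: the list of words is the three named above, and the function name clean_query is made up for the example.

```python
# The three filler words named in the article; a real list would be longer.
STOP_WORDS = {"and", "or", "the"}


def clean_query(text):
    """Drop filler words and join what is left with '+' signs."""
    words = [word for word in text.split() if word.lower() not in STOP_WORDS]
    return "+".join(words)


print(clean_query("the piano and the organ"))
# piano+organ
```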

(Spoofs are often used as links and form a kind of Internet humour - the White House site has several spoofs that appear to be the real thing, until they are explored...)

Typing "not" will take out examples that don't fit the bill (Arsenal not soccer, for example), but this is a hard word to use and control. In Yahoo! double meanings are automatically divided out. Also the engine can easily come up with ties to words that you would never think of in a million years - including simple names.
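
To see why "not" is hard to control, here is a small sketch of what an exclusion does to a list of results. The page titles are invented for the example, and the function exclude is a simple stand-in for whatever the engines really do behind the scenes.

```python
def exclude(titles, keep, drop):
    """Return titles that mention `keep` but never mention `drop`."""
    keep, drop = keep.lower(), drop.lower()
    return [
        title for title in titles
        if keep in title.lower() and drop not in title.lower()
    ]


# Made-up page titles, for illustration only.
SAMPLE_TITLES = [
    "Arsenal FC soccer fixtures",
    "Royal Arsenal, Woolwich: a history",
    "Arsenal of weapons through the ages",
    "Woolwich history: the Royal Arsenal and its soccer team",
]

results = exclude(SAMPLE_TITLES, "Arsenal", "soccer")
print(results)
```

The last title is a history page that merely mentions soccer in passing, yet it is thrown out along with the fixture list. That is the price of a blunt exclusion.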

Naturally there is a difference between information and correct information. I trust material that has been published in print and from news agencies far more than material from a fifteen year old in his back bedroom. Nevertheless I've come across things in heavyweight encyclopaedias that are plain wrong or are simply opinion presented as fact.

Equally information can go out of date, or the site falls into disuse and the information is no longer valid. The claims of vested interests should also be judged as such, and the Internet has many wild claims about commercial interests that would be challenged in other media.

Search engines can be dangerous, even if you are not looking for dodgy information. Without getting into too much detail, there are sexual practices that can be classified under their more innocent references. Most such sites have warnings that the content is for those over 18 - however it is not unknown for the engines to bypass this page.

If you are going to make commercial use of what you find you need to be careful. The biggest joke in this business is that "stealing from one is plagiarism, stealing from many is research!" The real truth is that we journalists borrow from each other all the time, and it is a common device, when interviews do not go so well, to include references made to others!

Naturally if you are presenting work on the Internet itself there is no harm in linking to the source of the material and giving credit to the person/organisation that first said it. Links are a different kettle of fish and you can include someone else's collection all you want - there is no copyright issue there.

One of the new ways of searching for information is through multimedia CDs. These can save a lot of time and are ideal for children, because you know that the content has been checked out and given some form of stamp of approval. With all the goodwill in the world even mainstream media can dip into obscene language and show unpleasant scenes.

Tim Berners-Lee invented the hypertext system so that you could leap from document to document with the minimum of effort. However this breaks down on commercial sites, which are hardly likely to plug a rival. Equally the most promising name or title can lead to a dead end of weak and unhelpful sites. Technical subjects are by far the worst culprits.

The one thing that is debated is how much of the Internet is registered with search engines. The most popular theories say that about a sixth of all sites have some kind of a listing and about half could be reached by links. However this is probably just a guess. Nevertheless very little of the unmapped world is significant and a lot of it is just personal sites that will be largely irrelevant to those that don't know the people in question.

Having said that, there have been times when a site has been next to useless in itself - but the links it gave have saved me hours of work.

Trinity 2002 (C)
