PETER HAYES EXAMINES THE POLITICS AND USES OF SEARCH ENGINES
We all know that the Internet is a mine of information,
but, as with its industrial namesake, separating the raw
rock from the precious minerals can sometimes take quite
some time and effort!
Naturally, the giant mining machines of the Internet are
the so-called "search engines", which help users reach
the kind of site that will either answer a question or
give them background information on a topic or theme.
We will look more closely at the strengths and weaknesses
of the individual service providers in the second part of
this series, but today we look at good searching practice
and the best ways to find information on the Internet.
The first thing that we need to explore is the obvious
and the easy. Most major businesses have web addresses
that sound like the company name (Sun, the computer
company, says that it enjoys extra UK hits from people
looking for The Sun newspaper's site!), and I've simply
guessed a few in my time.
Failing that, they should come up on ALL the engines if
you search by name and country alone. Some companies even
buy up sound-alike and near-miss addresses to help the
poor typist; these simply send you on to the correct
location.
(On a similar note, I've come across dead sites whose
addresses have been sold to others, who use them to send
visitors on to their own sites. Whether you approve of
this, or want to go there, is quite another matter!)
Search engines can be divided into two kinds: those that
use robots and those that use humans. Robots add sites to
the service by scanning pages for words and key phrases
(including those in special page headers, officially
called meta tags). They are good at working in bulk, but
tend to file the minor next to the major.
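For the technically minded, here is a rough Python sketch of what a robot does with a page: it pulls out the visible words plus the contents of the "keywords" and "description" meta tags, and files each page under every word it contains (a so-called inverted index). It is a toy built on Python's standard html.parser module, and the pages and example.com addresses are invented; it is not a description of how Yahoo! or any particular engine actually works.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


def tokenise(text):
    """Split text into lower-case words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


class WordCollector(HTMLParser):
    """Collects the words a simple robot would index from one page."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Meta tags carry words the page's author wants it filed under.
        if tag == "meta":
            attrs = dict(attrs)
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.words.extend(tokenise(attrs.get("content") or ""))

    def handle_data(self, data):
        # Ordinary text between tags.
        self.words.extend(tokenise(data))


def build_index(pages):
    """Map every word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, html in pages.items():
        collector = WordCollector()
        collector.feed(html)
        collector.close()
        for word in collector.words:
            index[word].add(url)
    return index


# Hypothetical pages: one small business, one personal "hello" page.
pages = {
    "http://example.com/shop": (
        '<html><head><meta name="keywords" content="guitars, drums">'
        "</head><body>Musical instruments in Hull</body></html>"
    ),
    "http://example.com/home": "<html><body>Hello, welcome to my page</body></html>",
}
index = build_index(pages)
print(sorted(index["drums"]))   # ['http://example.com/shop']
print(sorted(index["hello"]))   # ['http://example.com/home']
```

Notice that the robot files the personal "hello" page just as readily as the shop: it has no way of judging which one matters, which is exactly the minor-next-to-major problem.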
Perhaps the key to the success of Yahoo! (the number one
and most visited engine) is the fact that all its sites
are vetted by humans. This means that you will rarely
come across a page of just pictures and basic hellos;
equally, it rarely includes pages from free web space
providers. It does, however, include small sites, as
long as they have some business connection or social
function.
The first thing to realise when priming your "search
box" is that words are prioritised: the first word
carries more weight than the second. You should also
start with the most specific combination of words you
can think of, such as (musical+instruments+Hull+uk), and
work your way down by clipping words off.
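The "start narrow, then clip" strategy can be written down as a small Python sketch. It assumes, as above, that the most important word comes first, so it drops words from the end. The `search` function passed in is a hypothetical stand-in for whichever engine you are using, and the fake engine in the example is invented purely for illustration.

```python
def broaden(query):
    """Yield queries from most to least specific by clipping the last word.

    Assumes words are joined with '+' and ordered by importance,
    so the least important word is always the one dropped.
    """
    words = [w for w in query.split("+") if w]
    for n in range(len(words), 0, -1):
        yield "+".join(words[:n])


def narrowest_hit(query, search):
    """Return the most specific query that gets any results, plus the results.

    `search` is a hypothetical function that takes a query string and
    returns a list of URLs (empty if nothing matched).
    """
    for q in broaden(query):
        results = search(q)
        if results:
            return q, results
    return None, []


# A made-up "engine" that only knows one fairly broad query.
fake_engine = {"musical+instruments": ["http://example.com/shop"]}
q, hits = narrowest_hit("musical+instruments+Hull+uk",
                        lambda text: fake_engine.get(text, []))
print(q)   # musical+instruments
```

Stopping at the first query that returns anything keeps you as close as possible to what you actually asked for, rather than jumping straight to a mountain of general results.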
The next thing to remember is that not all engines are
the same. In some cases Yahoo!'s selective policy works
against it, and basic information is best found on the
robot-driven sites. There is also often a backlog of
material waiting to be registered, and the site you want
may be stuck in it.
(This is a problem that many webmasters forget: they
register too late, thinking that their site will appear
within a few days. Five or six weeks is more like it, and
that's if you are lucky!)
Being too general can lead to a whole mountain of
information: the last thing you want is 10,000 sites to
swim through for that vital piece of information or data!
The first thing that most people do is turn on a computer
and then think. Too late. You need a strategy in advance.
I was talking with a friend about a piece of exploitation
television on Bravo, "Adventures of a Taxi Driver", and
the grim irony that its star, Barry Evans, was murdered
while he was working as a real taxi driver. However, I
didn't know the full story until I hit the keypad.
Typing something like "Barry+Evans+actor" came back empty
on all the engines I tried, though there is little harm
in trying such a long shot in any search. What I needed
was a directory of actors and the films that they were
in. I therefore worked through those keywords and came
across a film and actor database (www.uk.imdb.com), which
included details of his career and death.
(He was hit over the head by a burglar at his home and
died from his injuries; to the best of my knowledge the
case remains unsolved.)
The problem with names is that you must be sure that you
have the right one. Several authors use my name without
my permission, no doubt trying to cash in on my
reputation. Even more curiously, some of them take on
subjects close to my heart!
Never, ever use words such as "and", "or" or "the",
because they will queer your pitch. Equally, words such
as "sex" or "mp3" will simply leave you with too much
data.
Be very careful when looking for sex education material
or information on health issues: obscene spoofs are not
unknown.
(Spoofs are often used as links and form a kind of
Internet humour: the White House site has several spoofs
that appear to be the real thing, until they are
explored...)
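Going back to the advice on "and", "or" and "the": the tidying-up can be sketched in a few lines of Python. The stop-word set holds just those three words, and the set of over-broad words holds only the two examples given ("sex" and "mp3"); both are assumptions you would extend yourself, and each real engine treats such words in its own way.

```python
STOP_WORDS = {"and", "or", "the"}   # the words the article warns against
TOO_BROAD = {"sex", "mp3"}          # words that bring back far too much


def tidy_query(query):
    """Drop stop words from a '+'-joined query and flag over-broad words.

    Returns the cleaned query and a list of words worth reconsidering.
    """
    words = [w for w in query.split("+") if w]
    kept = [w for w in words if w.lower() not in STOP_WORDS]
    warnings = [w for w in kept if w.lower() in TOO_BROAD]
    return "+".join(kept), warnings


cleaned, warnings = tidy_query("the+Hull+music+shops+and+mp3")
print(cleaned)    # Hull+music+shops+mp3
print(warnings)   # ['mp3']
```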
Typing "not" will take out examples that don't fit the
bill (Arsenal not soccer, for example), but it is a hard
word to use and control. In Yahoo! double meanings are
separated out automatically. Also, the engine can easily
come up with ties to words that you would never think of
in a million years, including simple names.
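Here is a small Python sketch of why "not" is hard to control. It assumes each result is a dictionary holding a URL and some page text (all invented for the example), and it simply throws away any result whose text contains the unwanted word. Real engines are more sophisticated, but the weakness illustrated is the same: the filter sees words, not meanings.

```python
import re


def exclude(results, unwanted):
    """Keep only results whose text does not contain the unwanted word."""
    unwanted = unwanted.lower()
    kept = []
    for result in results:
        words = re.findall(r"[a-z0-9]+", result["text"].lower())
        if unwanted not in words:
            kept.append(result)
    return kept


# Invented search results for the query "Arsenal".
results = [
    {"url": "http://example.com/gunners", "text": "Arsenal soccer news"},
    {"url": "http://example.com/armoury", "text": "An arsenal of medieval weapons"},
    {"url": "http://example.com/club", "text": "Arsenal Football Club fixtures"},
]
for r in exclude(results, "soccer"):
    print(r["url"])
# http://example.com/armoury
# http://example.com/club
```

The football club's own page slips through because it says "football" rather than "soccer", which is exactly the kind of gap that makes "not" tricky to rely on.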
Naturally, there is a difference between information and
correct information. I trust material that has been
published in print or by news agencies far more than
anything from a fifteen-year-old in his back bedroom.
Nevertheless, I've come across things in heavyweight
encyclopaedias that are plain wrong, or that are simply
opinion presented as fact.
Equally, information can go out of date, or the site can
fall into disuse so that the information is no longer
valid. The claims of vested interests should also be
judged as such, and the Internet carries many wild claims
about commercial interests that would be challenged in
other media.
Search engines can be dangerous, even if you are not
looking for dodgy information. Without going into too
much detail, there are sexual practices that share their
names with far more innocent things. Most such sites
carry a warning that the content is for those over 18,
but it is not unknown for the engines to bypass this
page.
If you are going to make commercial use of what you find,
you need to be careful. The biggest joke in this business
is that "stealing from one is plagiarism, stealing from
many is research!" The real truth is that we journalists
borrow from each other all the time, and when an
interview does not go so well it is a common device to
include remarks the subject made to others!
Naturally, if you are presenting work on the Internet
itself, there is no harm in linking to the source of the
material and giving credit to the person or organisation
that first said it.
Links are a different kettle of fish: you can include
someone else's collection all you want, as there is no
copyright issue there.
One of the new ways of searching for information is
through multimedia CDs. These can save a lot of time and
are ideal for children, because you know that the content
has been checked out and given some form of stamp of
approval. With all the goodwill in the world, even
mainstream media can dip into obscene language and show
unpleasant scenes.
Tim Berners-Lee invented the World Wide Web so that you
could leap from document to document via hypertext links
with the minimum of effort. However, this breaks down on
commercial sites, which are hardly likely to plug a rival.
Equally, the most promising name or title can lead to a
dead end of weak and unhelpful sites, with technical
subjects by far the worst culprits.
One thing that is still debated is how much of the
Internet is registered with search engines. The most
popular estimates say that about a sixth of all sites
have some kind of listing and that about half could be
reached by following links. However, these figures are
probably just guesses.
Nevertheless, very little of the unmapped world is
significant, and a lot of it is just personal sites that
will be largely irrelevant to anyone who doesn't know the
people in question.
Having said that, there have been times when a site has
been next to useless in itself, but the links it gave
have saved me hours of work.
Trinity 2002 (C)