The Dark Mod Forums

Why do search engines not show all results?


Fidcal


Do you really want to dig through 5,800,000 results? ;)

 

But honestly: I think the number of "results found" is just an estimate. The results shown on Google are sorted by the number of sites linking to them. This means that if you really wanted to see all results, the engine would have to search all sites, then check all links on all sites to count how many refer to which site, and then sort them. Even a fast sort algorithm would need quite a while to sort 5,800,000 numbers. I mean, they could also be sorted during the search. But if the position of each result were stored, for example in a 4-byte integer, this would require more than 20 MB of space, which has to be sent to you.

 

Now think of how long that would take, and how long a normal google search takes.
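To put rough numbers on that, here is a minimal Python sketch. It assumes one 4-byte integer per result (my assumption, not anything Google has published) and shows that picking the best few results does not require sorting the whole list. The function names are made up for illustration.

```python
import heapq

RESULT_COUNT = 5_800_000   # figure quoted in the post above
BYTES_PER_POSITION = 4     # assumption: one 32-bit integer per result


def position_list_megabytes(count, bytes_each=BYTES_PER_POSITION):
    """Size in MB (10**6 bytes) of a list holding one integer per result."""
    return count * bytes_each / 1_000_000


def top_k(scores, k):
    """Return the k highest scores without fully sorting the input."""
    return heapq.nlargest(k, scores)


print(position_list_megabytes(RESULT_COUNT))   # size of the full position list
print(top_k([3, 1, 4, 1, 5, 9, 2, 6], 3))      # only the best three are kept
```

So the full list would indeed be a bit over 20 MB, while a "top 10" can be selected in a single pass.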

FM's: Builder Roads, Old Habits, Old Habits Rebuild

Mapping and Scripting: Apples and Peaches

Sculptris Models and Tutorials: Obsttortes Models

My wiki articles: Obstipedia

Texture Blending in DR: DR ASE Blend Exporter

Link to comment
Share on other sites

Google, and I believe Yahoo! as well, both filter your results, also. If you were to search for something, and I were to search for the same thing, we'd probably get two different results. (Most of it would match, but you'd have hits I don't have, and vice versa.) They try to filter out the stuff they don't think is relevant to you, based on your previous search history. They're probably also hoping that your answer lies in the first 400 hits, and they won't have to bother with the extra task of offering you 5,800,000 hits.

Link to comment
Share on other sites

Another thing that might be funny. I just did a little "google benchmark", the results:

 

Search word          Hits (millions)   Time (seconds)
tits                 402               0.14
thief                71.2              0.18
quantum mechanics    17.6              0.29

 

This does not always fit, of course, but I think it shows that once the engine has found a certain number of results, it stops searching, shows the results found, and extrapolates the total in proportion to the amount of web space searched.
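The "stop early and extrapolate" idea can be sketched in a few lines of Python. This is only an illustration of the guess above, not a description of how any real engine works; `search_with_early_stop` and its parameters are hypothetical.

```python
def search_with_early_stop(documents, term, enough=3):
    """Scan documents until `enough` hits are found, then extrapolate.

    Returns (hits, estimated_total). The estimate simply scales the hits
    found by the share of the collection that was actually scanned.
    """
    hits = []
    scanned = 0
    for doc in documents:
        scanned += 1
        if term.lower() in doc.lower():
            hits.append(doc)
            if len(hits) >= enough:
                break  # stop searching once we have enough to show
    if scanned == 0:
        return hits, 0
    estimate = round(len(hits) * len(documents) / scanned)
    return hits, estimate


docs = ["thief a", "x", "thief b", "y", "thief c", "z", "thief d", "w"]
found, estimated_total = search_with_early_stop(docs, "thief", enough=2)
print(found, estimated_total)
```

Here the scan stops after three documents with two hits, so the "results found" figure is a projection rather than a count.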


Link to comment
Share on other sites

Okay guys, let me rephrase my question:

 

Why do search engines not show all results? For instance, Google might say it has 1000 results but only shows about 400.

 

When doing serious research, browsing over a list of 400 to see what may be relevant might only take a few minutes. I want to see the other 600.

 

Why do they brag about how many they have found but refuse to show them? For what purpose?

 

I have moderate filtering set, I think (hard to say, because Google is so obscure about this), but how can this have any effect when searching for normal search terms?

 

Presumably it means some web sites will never show, never be seen by anyone because they are not in the top 400. The Information Highway has been reduced to Sample Street - whatever Google choose for you to see.

 


Link to comment
Share on other sites

Why do they brag about how many they have found but refuse to show them? For what purpose?

Because they haven't found them, they just assume they are there. A real search on all the content in the internet would probably take hours.


Link to comment
Share on other sites

Okay guys, let me rephrase my question:

 

Why do search engines not show all results? For instance, Google might say it has 1000 results but only shows about 400.

 

When doing serious research, browsing over a list of 400 to see what may be relevant might only take a few minutes. I want to see the other 600.

 

Why do they brag about how many they have found but refuse to show them? For what purpose?

 

 

Bigger numbers are better for marketing.

If their search is limited to 400 results, this may look like a big limitation to surfers (who are also their customers). So they stamp a bigger number at the top of the page that is only an estimate of the cardinality of the search, and then you get a subset with your 400 results.

 

And there is also the whole filter bubble in their results:

 

https://en.wikipedia.org/wiki/Filter_bubble

Link to comment
Share on other sites

Search engines don't search the internet live. They use spiders to do the crawling, and the search engine searches a database of what the spiders found. The spiders run all the time, and whatever they find is added to what was found before: as long as an item is unique, it is added as a new item. They don't seem to subtract an old item if it is no longer found. (I have hit loads of dead links and sites saying the page is no longer available when doing a search, but Google still sees it as there because the link is stored in their search database.) The trouble is that some sites actually pay Google and Yahoo to put them at the top of their results, so you can sometimes find the useful page you were looking for around page 8 to page 24 of the results. Beyond page 24 you hit duplicates and dead links. I think the search engines are restricted to showing a maximum of 40 pages. Even if the results go higher than 40, they seem to have a cut-off point and won't go higher; even if the page number at the bottom of the page says page 41, clicking the link will just send you to a blank page.
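Here is a toy Python model of that description: an index that only ever adds what the spiders find, plus a hard cap on result pages. The class name, the 10-per-page default, and the 40-page cap are taken from the post or made up for illustration; none of this is a real engine's API.

```python
class CrawlIndex:
    """Add-only word index with capped pagination (illustrative only)."""

    def __init__(self):
        self._urls_by_word = {}

    def add_page(self, url, text):
        """Record every word on the page; nothing is ever removed."""
        for word in set(text.lower().split()):
            urls = self._urls_by_word.setdefault(word, [])
            if url not in urls:  # only unique items are added
                urls.append(url)

    def search(self, word, page=1, per_page=10, max_pages=40):
        """Return (total_count, urls_on_page); pages past the cap are empty."""
        matches = self._urls_by_word.get(word.lower(), [])
        if page < 1 or page > max_pages:
            return len(matches), []
        start = (page - 1) * per_page
        return len(matches), matches[start:start + per_page]


index = CrawlIndex()
for n in range(500):
    index.add_page(f"http://example.com/{n}", "siberian tiger facts")

print(index.search("tiger", page=40))  # last page that still shows results
print(index.search("tiger", page=41))  # count is still 500, but the page is blank
```

Note that 40 pages of 10 results is exactly 400 results, which lines up with the "only shows about 400" figure mentioned earlier in the thread.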

Link to comment
Share on other sites

I think you're all missing the point. Let's try a couple of examples.

 

Say I want news about Syria or the spelling of platypus. I get lots of results and quickly find what I want on the first page.

 

But suppose I'm a university student researching say, the siberian tiger? It says there's about a million results but I only get 300 odd results. It adds there are more but they are similar and I can repeat the search with them included. It doesn't say you can continue the search or that it will highlight the ones I've not just been studying all morning. No, it does the entire search again.

 

Now it shows 900-odd. Okay, I work forward. (Note that they have long since removed the option that is available on every other 'previous-next' control you will see by the billion everywhere, namely to jump forward in large leaps or go to the end. No, you have to manually go forward ten at a time. Yes, you can change that, but then you need to enable cookies, probably JavaScript, etc.)

 

Right, I work to the end. These look very relevant to me and I'd really like to continue sifting through the results... But I can't; the other 999,000 are denied me.

 

How do they know there are a million results if they've not searched? They might have an index of their index listing occurrences but they can't do that for every occurrence of every combination of words and filters.
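One way such a number could be produced without storing every combination is to keep a count per single word and estimate the overlap. The Python sketch below assumes the words occur independently of each other, which is a textbook simplification and my assumption, not a claim about what Google does; the function name and figures are invented.

```python
def estimate_joint_count(total_docs, per_term_counts):
    """Estimate how many documents contain ALL terms.

    Assumes each term appears independently, so the joint probability is
    the product of the individual document frequencies.
    """
    if total_docs <= 0:
        return 0
    probability = 1.0
    for count in per_term_counts:
        probability *= count / total_docs
    return round(total_docs * probability)


# Hypothetical figures: 1,000,000 indexed pages, "siberian" on 10,000 of
# them and "tiger" on 50,000 of them.
print(estimate_joint_count(1_000_000, [10_000, 50_000]))
```

Only one counter per word is needed, and the "about N results" line falls out of simple arithmetic, which would also explain why it is only approximate.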

 

Why don't they store the position they reached in the search then offer a 'continue search' option? Why does dumb software decide for me what is relevant. Maybe I just want to see how people refer to the siberian tiger no matter how irrelevantly?
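The "continue search" idea amounts to handing the user a small cursor. A minimal Python sketch, assuming the ranked results already exist as a list (the function and parameter names are hypothetical):

```python
def fetch_page(ranked_results, cursor=0, page_size=10):
    """Return (page, next_cursor); next_cursor is None when nothing is left.

    The cursor is just the position reached, so the caller can resume
    without starting the search again from the beginning.
    """
    page = ranked_results[cursor:cursor + page_size]
    next_cursor = cursor + len(page)
    if next_cursor >= len(ranked_results):
        next_cursor = None
    return page, next_cursor


results = [f"result {n}" for n in range(25)]
cursor = 0
while cursor is not None:
    page, cursor = fetch_page(results, cursor)
    print(len(page), "results, continue from:", cursor)
```

The state that has to be remembered per user is a single integer, which supports the point that storage is unlikely to be the obstacle.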

 

And how does anyone know they are irrelevant? Do you trust Google? Do you trust monopoly-controlled commercial software to make decisions for you with no override? Is the entire English-speaking half of planet Earth reliant on these idiots? (erm... yes.)

 

What harm is there in them showing what they think is irrelevant anyway? The more I continue searching the more adverts they can present to me and the more money they can make.

 

It can't be storage space. We're not talking about masses of images nor even text, nor even compressed text. We're talking about index links to the position they reached in their search.

 

Shall we give them the benefit of the doubt? Do we trust commerce in general to be open and honest with us?

 

Search engines are our only real window onto the internet. There is no other way of looking at what is there. This window is only open a tiny crack.

 

Listen, if I don't find the antidote for my siberian tiger then it's google's fault the species died out...

Link to comment
Share on other sites

There's also a possibility that the search engine is trying for an exact match. So say you do a search on "siberian tiger" and that is what you enter.

 

You receive results stating, "Page 1 of about 2,890,000 results (0.25 seconds) "

 

This is for "siberian tiger"

 

At the bottom of the page, there is a separate list containing:

 

 

Searches related to siberian tiger

 

siberian tiger facts

siberian tiger habitat

white siberian tiger

bengal tiger

 

siberian tiger population

siberian tiger diet

siberian tiger pictures

white tiger

 

Many of these contain the words "siberian tiger" but also contain an additional descriptor like population, diet, habitat, facts, etc. Although the terms you entered DO appear in these other searches, e.g. siberian tiger diet, it could be that, because of the way the information is parsed or grouped, all of these results are counted together in the number found, while the initial pages display only those that are an exact match without extra descriptors. That would be why the others are linked as related searches at the bottom of the page.

 

If you're searching for academic documentation, there are "other" ways of searching that you may be aware of. They display specific results in specific places.

 

You can also try:

 

http://scholar.google.com/ (---->click "metrics" at the top before you begin searching)

 

you can also use a site modifier in the search box:

 

site:berkeley.edu "physics" (which turns up About 68,600 results (0.16 seconds) )

 

you can use an "excluded word" modifier (-)

 

particle physics -string -theory which won't include results containing "string theory"

 

you can also use a filetype modifier

 

"siberian tiger" filetype:pdf OR filetype:doc

 

you can use AND and OR also as in any boolean search

 

you can also use the tilde (~) to locate related items to your search term

 

~siberian tiger
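If you build such queries often, the operators above can be assembled programmatically. The Python sketch below only covers the site:, filetype:, quoted-phrase and minus operators; `build_query` and `search_url` are made-up helper names, and the base URL with its `q` parameter is the ordinary Google search address.

```python
from urllib.parse import urlencode


def build_query(terms="", exact_phrases=(), site=None, filetype=None, exclude=()):
    """Assemble a search string from plain terms and common operators."""
    parts = [terms] if terms else []
    parts += [f'"{phrase}"' for phrase in exact_phrases]
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    for item in exclude:
        # Multi-word exclusions are quoted so they are treated as one phrase.
        parts.append(f'-"{item}"' if " " in item else f"-{item}")
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Return a URL with the query properly percent-encoded."""
    return f"{base}?{urlencode({'q': query})}"


print(build_query(exact_phrases=["physics"], site="berkeley.edu"))
print(build_query(exact_phrases=["siberian tiger"], filetype="pdf"))
print(search_url('site:berkeley.edu "physics"'))
```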

Edited by Lux
Link to comment
Share on other sites

Thanks, Lux. The main thing there I didn't know is the scholar version of google which is worth checking out now and again.

 

I note that scholars are allowed to up the results to 20 on a page. Gee, thanks google, that's greyed out for the other two billion users unless (I think) they sign in so google can track them. Their excuse is that it slows things down - yeah, by milliseconds. But Mr Google, don't you know we spend well over a full second jumping from page to page instead? More if I have to reach for my mouse to click the link and my siberian tiger has knocked it on the floor.

 

The related searches I always assumed were included in the general search - logically they should be but then again, this is google so I'll check them out. I have, of course, previously clicked on one if it catches my eye as absolutely nailing what I'm looking for.

 

Your particle physics -string -theory will also exclude anything with string in it and anything with theory in it. You should use particle physics -"string theory"
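The difference is easy to demonstrate on a couple of made-up page titles. This Python sketch is a toy word matcher, not how Google parses queries; `matches` and its parameters are hypothetical.

```python
def matches(text, required=(), excluded_words=(), excluded_phrases=()):
    """Return True if the text passes the required and excluded filters."""
    lowered = text.lower()
    words = set(lowered.split())
    if any(word.lower() not in words for word in required):
        return False
    if any(word.lower() in words for word in excluded_words):
        return False
    if any(phrase.lower() in lowered for phrase in excluded_phrases):
        return False
    return True


pages = [
    "particle physics and string theory",
    "particle physics and the theory of everything",
]

# particle physics -string -theory
print([p for p in pages
       if matches(p, ["particle", "physics"], excluded_words=["string", "theory"])])

# particle physics -"string theory"
print([p for p in pages
       if matches(p, ["particle", "physics"], excluded_phrases=["string theory"])])
```

The first filter throws away both pages; the second keeps the one that mentions "theory" without being about string theory.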

 

I wonder what the 'related' tilde does? I mean, what could be related to siberian tiger that doesn't include the 'siberian tiger' anyway? Again, it might be something that logically ought to be in the general search so might be worth exploring.

 

Put it another way, if there were another independent search engine identical to google but they provide all results for the user to decide what's relevant and what isn't, then I would use that service instead. :)

 

 

 

 

 


Link to comment
Share on other sites

This should not be a concern, because anything that is beyond those 400 results can be found with a different search phrase because it's that irrelevant. Searching on the internet is an interactive process; you might want to research the Siberian tiger, so you type in "Siberian Tiger" on Google. This gives you a couple of pages with information, which leads you to pursue new avenues of research such as "Siberian Tiger Reproduction". Digging through more than 400 results is a horribly ineffective strategy for finding the information you want. Basically, Google trusts their users to have at least some semblance of Google-fu.

 

And, in fact, Google very likely has the capacity to find 5,800,000 results. This is partly mathematics and clever algorithms, but they also have massive (ABSOLUTELY MASSIVE) datacenters.

You can call me Phi, Numbers, Digits, Ratio, 16, 1618, or whatever really, as long as it's not Phil.

Link to comment
Share on other sites

Of course Google's increasing power is a huge problem, but here, as you can see, it's a computation problem.

Indexing and ranking results is not a trivial problem. Of course Google also has an interest in manipulating information in some way. But returning results within a fraction of a second while handling an enormous number of requests every second is primarily a computation/resource limit.

 

You say "it's not possible to have a deeper search even if I'm willing to wait longer", and that's true. But you can search with more than just Google. Google is not indexing the entire web; it's only one source on the net. A big source, of course, but not the only one.

 

To avoid this kind of monopoly, it's better to use metasearch.

 

Have you tried, e.g., http://duckduckgo.com ?
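For anyone wondering what metasearch means in practice: the core step is merging ranked lists from several engines and dropping duplicates. A minimal Python sketch, using round-robin interleaving as one simple merging rule (my choice for illustration, not a description of any particular service):

```python
from itertools import zip_longest


def merge_results(*ranked_lists):
    """Interleave several ranked URL lists, keeping each URL only once."""
    merged = []
    seen = set()
    for tier in zip_longest(*ranked_lists):
        for url in tier:
            # zip_longest pads shorter lists with None; skip the padding.
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


engine_a = ["a.example", "b.example", "c.example"]
engine_b = ["b.example", "d.example"]
print(merge_results(engine_a, engine_b))
```

Each engine's top hit comes first, then the second hits, and so on, so no single source decides the whole ordering.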

Edited by Ladro
Link to comment
Share on other sites

It might be that their spiders have found over a million references to 'siberian tiger' over the years and have kept that count, but they've only kept some 900 references in their database because of the storage space needed for everything. A while back there was a picture of the four large rooms where Yahoo kept their search engine database; they were moving the system to better storage devices and reducing it to fill two rooms. (Each of these rooms takes up the entire floor space of a medium-sized skyscraper, so one room per floor.)

Link to comment
Share on other sites

If you're a university student, you're going to be searching a library database using something like this: http://ebscohost.com/ . I guess Google Scholar works too.

 

Put it another way, if there were another independent search engine identical to google but they provide all results for the user to decide what's relevant and what isn't, then I would use that service instead. :)

 

DuckDuckGo has explicitly stated that their method of stitching results from various sites together is a way of avoiding the "filter bubble", an emerging concern about personalized searches. The idea is that if Google customizes results based on your history, geo IP, or whatever info they can get on you, they are segregating users, and subtly affecting their perception of the Web.

 

As for Google limiting to 400 results, they are probably correct that you don't want to see past that 400, and that you should instead refine the search using exact phrases and more search terms. In fact, I'm sure they have studies showing that almost all users don't need to look past the first 10-40 results, and I rarely need to look far past 20.

 

https://www.ixquick.com/

http://archive.org/

http://eol.org/

http://dp.la/

Link to comment
Share on other sites

Yeah, DuckDuckGo is good. I use them now and again but miss the non-symbol advanced input and other easily-applied filters. I mean, it's like returning to DOS from Windows. Enter args instead of clicking an option.....

 

It returns about 900 results. It doesn't number them nor claim there are any more. Likely it's getting the same 900 from google.........

 

Trying to think of another example. Suppose you collect limericks? Google claims about 2 million but returns about 500. Only then can you continue in which case it starts right at the beginning again with the ones you've already seen. Then it returns 900. Sound familiar? There may be another 50,000 or 500,000 or even 2 million of interest but you will never be allowed to discover if they are relevant because papa Google knows what's best for us little children........

Link to comment
Share on other sites

Even with an endless list at some point the quality of the results is going to degrade to the point that it's no longer useful. You have to keep in mind that these lists are compiled by computers and not people. Any page that has a word you're looking for in it will get indexed. It doesn't matter if the context is relevant or not.

 

A good example is this forum discussion. It's now a "result" and counted against that X billion total when you search for "Siberian tiger" or "limericks" and yet you'd be no more clued in after having read this. That said, for anyone reading this who is looking for Siberian tigers or limericks, at this point in your search you're better off typing a random URL or IP address into your address bar.

Link to comment
Share on other sites

I know all this. But I don't want a world where a friggin' computer algorithm designed by a commercial enterprise whose sole purpose is to make as much money as possible decides what is no longer useful for me. I don't accept it. I will not accept it. I want the choice. Suppose I am a social scientist not interested in the life of the tiger, its habitat, its biology, but I am interested in how people talk about it. How often does it come up in conversation? Or a dictionary creator interested in common usage of words? Or a researcher in English investigating how many people speak about the Siberian Tiger but don't understand what the term means? Or I know an acquaintance put up an article about Siberian Tigers and I can't find it amongst the first 400? Or suppose some other circumstance that neither I nor Google have thought of?

 

As I have said, how do you know the rest is not relevant? How does the software know what is in my mind? What is relevant for me? I want the option to see EVERY occurrence of "siberian tiger" that is anywhere on the internet. And I, myself, will decide what is relevant.

 

Not so long ago we did have the choice. It was never a problem to stop reading at the point I chose nor do I remember anyone else complaining about being forced at gunpoint to keep reading irrelevant material.

 

I ask everyone here: would you object to or be offended if Google provided an option to see all the results? If so, why? And if no one, then where is the harm in including that option like it always used to be?...

Link to comment
Share on other sites

I wouldn't object to the option but Google, Bing, Yahoo, etc. are as you've said commercial enterprises. If these providers decide to change their service to cut costs or make larger profits so be it. They are not the Internet. Your usage of the web may be affected but it is not dictated by what results a search engine spits out.

 

The Internet is a dataset. Odds are the machine you're sitting in front of right now is programmable. You can complain about the software and services others provide you but you're only a slave to them so long as you choose to be.

Link to comment
Share on other sites

There are some search engines where users promote or demote relevant search results, rather than a computer. However these can suffer from spamming, trolling, etc. Then again, SEO is all about tricking Google's PageRank. So you can either make a really good algorithm, or have a really good user community rating results. Or both.
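The "or both" option could look something like the Python sketch below: a blend of an algorithmic score and smoothed user votes. The weights, the smoothing prior, and the function names are all invented for illustration; real engines do not publish their formulas.

```python
def blended_score(algo_score, upvotes, downvotes, vote_weight=0.5, prior_votes=5):
    """Combine an algorithmic score (0..1) with a smoothed vote ratio.

    The prior pulls pages with few votes toward a neutral 0.5, so a
    handful of spam votes cannot swing the ranking on their own.
    """
    total_votes = upvotes + downvotes
    vote_ratio = (upvotes + 0.5 * prior_votes) / (total_votes + prior_votes)
    return (1 - vote_weight) * algo_score + vote_weight * vote_ratio


def rank(results):
    """Sort (name, algo_score, upvotes, downvotes) tuples, best first."""
    return sorted(results, key=lambda r: blended_score(r[1], r[2], r[3]), reverse=True)


candidates = [
    ("seo-stuffed page", 0.9, 0, 50),   # fools the algorithm, hated by users
    ("useful page", 0.6, 40, 2),        # modest score, liked by users
]
print([name for name, *_ in rank(candidates)])
```

In this made-up example the community votes push the useful page above the one that merely gamed the algorithm.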

Link to comment
Share on other sites
