18

Google claims to be fair, and it is in the company's interest (most of the time) to scour the Internet for anything and everything that its spiders can access. I want to know:

  • What type of (publicly accessible) content does Google fail to deliver?
  • Is there a specific type of content that Google cannot retrieve?

References, particularly to Google's own documentation, would be especially awesome.

ale
samthebrand

7 Answers

21

A few ideas on the type of things:

  1. Content explicitly disallowed by a domain's robots.txt file is not crawled, so it stays out of the Google index (although a blocked URL can still be listed without its content if other pages link to it).
  2. Websites that are not linked from any website Google already knows. There are probably a lot of websites that never get linked from visible pages, and the Google spider will never find them unless they are manually submitted to Google via the Webmaster Tools.
  3. Websites that are behind web forms that you need to fill out.
  4. Census images. Since the content consists of images that are often manually indexed, they are usually found on paid-for sites like ancestry.com.
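
To make the first point concrete, here is a minimal sketch of how a well-behaved crawler decides whether it may fetch a URL. It uses Python's standard `urllib.robotparser`; the robots.txt content, the `is_crawlable` helper, and the example.com URLs are made up for illustration, and Google's own parser may differ in details.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (not taken from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from a string so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

blocked = is_crawlable(ROBOTS_TXT, "Googlebot", "https://example.com/private/page.html")
allowed = is_crawlable(ROBOTS_TXT, "Googlebot", "https://example.com/public/page.html")
print(blocked, allowed)
```

Anything under the disallowed path is skipped by a compliant crawler, which is why its content cannot show up in search results.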

Learn more about the Deep Web

Daniel Standage
amh
6

Aside from Twitter, Google does not index Tumblr all that well. Blog posts on Tumblr are easier to find using Tumblr search. Also, content on Google Sites is hardly indexed, if at all. If you start a Google site, get your own domain.

Smaller blogs that aren't regularly updated are often dumped from search results, as is anything that Google thinks is a splog (spam blog).

Peter Mortensen
David
5

Well, most of the Twitter content is not indexed by Google, even if it’s public. It used to be available to Google, but that’s no longer the case since their agreement expired.

Source.

Alex
4

It depends on which country you are in. In Germany, Google does not show thousands of sites that the government thinks are not good for you, and the list grows by the thousands every year.

Google is the motor of Internet censorship. If you want a free Internet, use some non-evil companies, like DuckDuckGo or others.

Peter Mortensen
Hellagot
4

You cannot search for a keyword with special characters in Google Search:

Generally, punctuation is ignored, including @#$%^&*()=+[]\ and other special characters

This is especially annoying when Googling for code.
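
To see why this hurts code searches, here is a toy sketch. It is not Google's actual tokenizer (which is not public); the `toy_tokens` helper is hypothetical and simply assumes that the characters in the quoted list are treated as separators.

```python
import re

# Characters from the quoted list above. Treating them as plain separators
# is an assumption made only for this illustration.
IGNORED = re.escape("@#$%^&*()=+[]\\")

def toy_tokens(query: str) -> list:
    """Replace the ignored characters with spaces, then split into lowercase words."""
    cleaned = re.sub(f"[{IGNORED}]", " ", query)
    return cleaned.lower().split()

# Two different code snippets collapse to the same search terms.
print(toy_tokens("a += b") == toy_tokens("a = b"))
```

Under that assumption, queries that differ only in operators or brackets become indistinguishable, so the search engine cannot tell which snippet you meant.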

Franck Dernoncourt
3

Google removes search results deemed to infringe intellectual property rights following DMCA take-down and similar requests. See Google's search result removal request form (it may have an additional URL btw).

einpoklum
1

Sites with so much content that Google simply hasn't had the time (or the inclination) to index it all.

Sites that don't have a crawlable site map and that only expose their content through a search form, so Google would have to supply search terms to reach it, might not be fully indexed.
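
For a site owner who wants to avoid this, a minimal sketch of generating an XML sitemap follows. It uses the sitemaps.org `urlset` format and Python's standard `xml.etree.ElementTree`; the `build_sitemap` helper and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls) -> str:
    """Build a minimal sitemap document listing each URL in a <loc> element."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages that are otherwise reachable only through a search form.
sitemap_xml = build_sitemap([
    "https://example.com/items/1",
    "https://example.com/items/2",
])
print(sitemap_xml)
```

Listing such pages explicitly gives a crawler a way to discover them without having to guess search terms.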