Google stopped counting, or at least publicly displaying, the number of pages it indexed in September 2005, after a schoolyard "measuring contest" with rival Yahoo. That count topped out around 8 billion pages before it was removed from the homepage. News broke recently through various SEO forums that Google had suddenly, over the past few weeks, added another few billion pages to the index. This might sound like a reason for celebration, but this "accomplishment" does not reflect well on the search engine that achieved it.
What had people buzzing was the nature of these fresh few billion pages. They were blatant spam, containing Pay-Per-Click (PPC) ads and scraped content, and in many cases they were showing up well in the search results, pushing out far older, more established sites in doing so. A Google representative responded to the issue via forums by calling it a "bad data push," an explanation that met with groans throughout the SEO community.
How did someone manage to dupe Google into indexing so many pages of spam in such a short period of time? I'll provide a high-level overview of the process, but don't get too excited. Just as a diagram of a nuclear explosive isn't going to teach you how to make the real thing, you're not going to be able to run off and do it yourself after reading this article. Yet it makes for an interesting tale, one that illustrates the ugly problems cropping up with ever-increasing frequency in the world's most popular search engine.
A Dark and Stormy Night
Our story begins deep in the heart of Moldova, sandwiched scenically between Romania and Ukraine. In between fending off local vampire attacks, an enterprising local had a brilliant idea and ran with it, presumably away from the vampires... His idea was to exploit how Google handled subdomains, and not just a little bit, but in a big way.
The heart of the issue is that Google currently treats subdomains much the same way as it treats full domains: as unique entities. This means it will add the homepage of a subdomain to the index and return at some point later to do a "deep crawl." A deep crawl is simply the spider following links from the domain's homepage deeper into the site until it finds everything or gives up and comes back later for more.
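To make the idea of a deep crawl concrete, here is a minimal sketch of a breadth-first crawl over a tiny, made-up link graph. The `LINKS` table, the `deep_crawl` function and the `max_pages` budget are all illustrative assumptions; this is not how GoogleBot is actually implemented, only the general shape of "follow links until you find everything or give up."

```python
from collections import deque

# Hypothetical in-memory "site": each page maps to the pages it links to.
LINKS = {
    "/": ["/about", "/products"],
    "/about": ["/"],
    "/products": ["/products/a", "/products/b"],
    "/products/a": ["/products"],
    "/products/b": [],
}


def deep_crawl(links, start="/", max_pages=1000):
    """Follow links breadth-first from the homepage.

    Stops when no unseen pages remain, or when the page budget is spent
    (the "gives up and comes back later" case).
    """
    seen = {start}
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


print(deep_crawl(LINKS))
print(deep_crawl(LINKS, max_pages=3))
```

With a small budget the crawl stops early and leaves the deeper pages for a later visit, which is the behavior the paragraph above describes.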
Briefly, a subdomain is a "third-level domain." You've probably seen them before; they look something like this: subdomain.domain.com. Wikipedia, for instance, uses them for languages: the English version is "en.wikipedia.org" and the Dutch version is "nl.wikipedia.org." Subdomains are one way to organize large sites, as opposed to multiple directories or even separate domain names altogether.
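As a rough illustration of that naming scheme, the sketch below splits a hostname into its subdomain and its domain. The `split_host` helper is hypothetical, and it assumes the domain is always the last two labels; that holds for names like wikipedia.org but not for suffixes such as .co.uk, which need a proper public-suffix list.

```python
def split_host(hostname):
    """Split a hostname into (subdomain, domain).

    Assumption: the domain is the last two dot-separated labels.
    """
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2:
        # A bare name such as "localhost" has no subdomain part.
        return "", ".".join(labels)
    domain = ".".join(labels[-2:])
    subdomain = ".".join(labels[:-2])
    return subdomain, domain


print(split_host("en.wikipedia.org"))
print(split_host("nl.wikipedia.org"))
print(split_host("domain.com"))
```

The first two calls return different subdomains for the same domain, which is exactly the distinction that matters in the rest of this story.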
So, we have a kind of page Google will index virtually "no questions asked." It's a wonder no one exploited this situation sooner. Some commentators believe the reason may be that this "quirk" was introduced after the recent "Big Daddy" update. Our Eastern European friend got together some servers, content scrapers, spambots, PPC accounts, and some all-important, very inspired scripts, and mixed them all together thusly...
Five Billion Served - And Counting...
First, our hero here crafted scripts for his servers that would, when GoogleBot dropped by, start generating an essentially endless number of subdomains, each with a single page containing keyword-rich scraped content, keyworded links, and PPC ads for those keywords. Next, spambots are sent out to put GoogleBot on the scent via referral and comment spam to tens of thousands of blogs around the world. The spambots provide the broad setup, and it doesn't take much to get the dominoes to fall.
GoogleBot finds the spammed links and, as is its purpose in life, follows them into the network. Once GoogleBot is sent into the web, the scripts running the servers simply keep generating pages - page after page, all with a unique subdomain, all with keywords, scraped content, and PPC ads. These pages get indexed and suddenly you've got yourself a Google index 3-5 billion pages heavier in under 3 weeks.
Reports indicate that, at first, the PPC ads on these pages were from AdSense, Google's own PPC service. The ultimate irony, then, is that Google benefits financially from all the impressions being charged to advertisers as their ads appear across these billions of spam pages. The AdSense revenues from this endeavor were the point, after all: cram in so many pages that, by sheer force of numbers, people would find and click on the ads in those pages, making the spammer a nice profit in a very short amount of time.
Billions or Millions? What is Broken?
Word of this achievement spread like wildfire from the DigitalPoint forums, at least within the SEO community. The "general public" is, as of yet, out of the loop, and will probably remain so. A response by a Google engineer appeared on a Threadwatch thread about the topic, calling it a "bad data push". Basically, the company line was that they have not, in fact, added 5 billion pages. Later claims include assurances that the issue will be fixed algorithmically. Those following the situation (by tracking the known domains the spammer was using) see only that Google is removing them from the index manually.
The tracking is accomplished using the "site:" command, which theoretically displays the total number of indexed pages from the site you specify after the colon. Google has already admitted there are problems with this command, and "5 billion pages", they seem to be claiming, is merely another symptom of it. These problems extend beyond the site: command to the displayed number of results for many queries, which some feel is highly inaccurate and in some cases fluctuates wildly. Google admits they have indexed some of these spammy subdomains, but so far haven't provided any alternate numbers to dispute the 3-5 billion shown initially via the site: command.
Over the past week the number of the spammy domains and subdomains indexed has steadily dwindled as Google personnel remove the listings manually. There's been no official statement that the "loophole" is closed. This poses the obvious problem that, since the way has been shown, there will be a number of copycats rushing to cash in before the algorithm is changed to deal with it.
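For readers wondering what watching for this pattern might look like, here is a small sketch that counts how many distinct subdomains appear per domain in a list of URLs and flags the domains that exceed a threshold. It is purely illustrative: the function names, the threshold, the sample URLs and the "last two labels" rule for finding the domain are all assumptions, and nothing here reflects how Google's algorithm works or will be changed.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def subdomains_per_domain(urls):
    """Count distinct hostnames seen under each domain.

    Assumption: the domain is the last two labels of the hostname.
    """
    hosts = defaultdict(set)
    for url in urls:
        host = urlsplit(url).hostname
        if not host:
            continue  # skip anything without a hostname
        domain = ".".join(host.split(".")[-2:])
        hosts[domain].add(host)
    return {domain: len(names) for domain, names in hosts.items()}


def flag_suspicious(urls, threshold=3):
    """Return domains with more distinct hostnames than the threshold."""
    counts = subdomains_per_domain(urls)
    return sorted(d for d, n in counts.items() if n > threshold)


# Made-up sample: five one-page subdomains on a placeholder domain,
# plus two ordinary language subdomains.
sample = [f"http://kw{i}.spam.example/" for i in range(5)]
sample += ["http://en.wikipedia.org/wiki/X", "http://nl.wikipedia.org/wiki/Y"]

print(subdomains_per_domain(sample))
print(flag_suspicious(sample))
```

A raw count like this would also flag legitimate sites with many subdomains, so any real check would need far more signals than this one.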
Conclusions
There are, at minimum, two things broken here: the site: command and the obscure, tiny bit of the algorithm that allowed billions (or at least millions) of spam subdomains into the index. Google's current priority should probably be to close the loophole before they're buried in copycat spammers. The issues surrounding the use or misuse of AdSense are just as troubling for those who might be seeing little return on their advertising budget this month.
Do we "keep the faith" in Google in the face of these events? Most likely, yes. It is not so much whether they deserve that faith, but that most people will never know this happened. Days after the story broke there's still very little mention in the "mainstream" press. Some tech sites have mentioned it, but this isn't the kind of story that will end up on the evening news, mostly because the background knowledge required to understand it goes beyond what the average citizen is able to muster. The story will probably end up as an interesting footnote in that most esoteric and neoteric of worlds, "SEO History."
Imagine getting a telephone call in the middle of the night, informing you that a family member was in the hospital with only minutes to live. What would you do to be able to spend the final seconds of a loved one's life at their bedside? Would you do everything you possibly could to be there for a person who had been there for you throughout your entire life?
Well, that is what Houston Texans running back Ryan Moats, 26, and his wife Tamishia Moats, 27, did on the night of March 18 in the Dallas suburb of Plano, Texas, as they rushed through the streets of Dallas in a desperate attempt to make it to the bedside of Tamishia's mother, Jonetta Collinsworth, who was dying of breast cancer.
However, a time that should have been sentimental for Ryan Moats and his family turned dangerous when they were confronted and verbally assaulted in the parking lot of the Baylor Regional Medical Center in Plano by Dallas police officer Robert Powell, 25, a three-year member of the force. Alleged incidents of police brutality and abuse of power are widespread in minority communities, but this incident has outraged an entire nation. Many critics, including Moats, have wondered if this incident was racially motivated, but it is also possible that this incident was age-related, as the young officer may have overreacted in an attempt to prove his power, despite his young age.
Powell allegedly drew his weapon on Tamishia when she opened the door of their SUV. She initially pleaded with Powell to let her go see her dying mother, but when the officer refused to listen to her pleas, she entered the hospital anyway to spend the last moments with her. "He was pointing a gun at me as soon as I got out of the car," Tamishia told The Dallas Morning News.
The Dallas police officer had been following Ryan Moats after he ran a red light in an attempt to make it to the hospital in time. Despite the traffic violation, Moats said he waited to see that there was no traffic before he ran the red light. While in the hospital parking lot, the Texans running back also pleaded with Powell to let them go, saying that Powell was wasting his time.
"I can screw you over," Powell responded at one point. Dallas police Chief David Kunkle immediately apologized to Tamishia and Ryan Moats, placing Powell on paid administrative leave on March 26. Powell has since resigned.
"When we at the command staff reviewed the tape, we were embarrassed, disappointed," Kunkle said. "It's hard to find the right words and still be professional in my role as the police chief. But the behavior was not appropriate."
After hospital staff and another officer came out of the hospital to validate Ryan Moats' story, he was ticketed and allowed to see his mother-in-law in the hospital, but she had already passed away.
"(Powell's) behavior in my opinion did not exhibit the common sense, the discretion, the compassion that we expect our officers to exhibit," Kunkle added.
Race may have played a role in the Ryan Moats incident: Powell, who is White, has recently been accused of jailing Maritza Thomas, the Latina wife of a former Dallas Cowboys linebacker, for three hours and giving her five tickets (four of which were dismissed) for making an illegal U-turn. But he may also have been attempting to prove that he was a powerful police officer, despite being only 25 years old.
Often, when people in their twenties take jobs, they believe that they are not getting the respect they deserve from their older counterparts because of their lack of experience. Many times, young professionals will try to prove that they are someone to be respected by abusing their power and demanding that everyone know they are in charge. That insecurity leads to incidents like the one involving Tamishia and Ryan Moats, and to embarrassment for Powell and his colleagues.
Both Eric Lester & Todd A. Smith are contributors for EditorialToday. The above articles have been edited for relevancy and timeliness.
Eric Lester has written articles on various topics, including Web Development, Search Engines and The Internet. Mr. Lester served for 5 years as webmaster for ApolloHosting.com and previously worked in the IT industry for an additional 5 years.
Todd A. Smith has written articles on various topics, including Education, Entertainment Guide and Facts about Barack Obama. Todd A. Smith is the webmaster for The Preeminent Online Magazine for African American Men.