Does Google really have two indexes: a main one and a supplemental one where all the trash of the web is parked?
SEO CONSPIRACY S01E21
THE WORD OF DIXON JONES
It used to be very obvious, like five or six years ago, that there was a main index and a supplemental index. Google kind of showed it to you, but then it got lost; it was around Caffeine. You couldn’t see the difference anymore. But I still think that there is a quality threshold, and that you’re not really going to get your pages recrawled on a regular basis unless they get into what you might consider the main index.
I think there is still a whole load of rubbish out there on the internet and Google says: “Okay, I know it’s there, but I can’t be bothered to go back”, because it has to control its resources.
And once you say “okay, I don’t really care about 95% of the Internet” (and let’s face it, it’s going to be at least 95% of the Internet, and I would say it’s probably 99% of the URLs on the Internet that are completely worthless), you can concentrate on the 1%.
Then you can do a lot more algorithmic maths, you can send over natural language understanding routines, and you can work out PageRank metrics better.
And you can try to do a lot more hard programming and analysis on a very small amount of the data, rather than trying to do that on all the wasted stuff.
It makes absolute sense for Google to maintain an index which is just there for storage, a supplemental one, and a main one on which they can spend some more energy analyzing.
DIXON JONES
How to see if your website is in the main or supplemental index of Google?
Some websites use the real Google. They pay to use it; it’s not the add-on Google Search box you can find on some websites.
So those websites that pay for access to the Google index have access only to the main index.
For example, in France we have the phone companies, which all use Google, and if you do a site: query on one of those portals, you will access the main index only.
So if you run the site: query on Google, you’re going to find everything, and on the Google used by that portal you see only the main index. There you go: you see the difference.
I think as well, you know, a pretty good example is if they haven’t cached the page. They don’t bother to cache, or to show caches for, stuff that hasn’t really made it into some kind of notoriety, whether that’s a signal of the main index or not.
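The site: comparison described above can be sketched as a simple set difference. This is only an illustration: it assumes you have already collected the two URL lists by hand (one from a site: query on Google, one from the same query on a partner portal), and the function names `normalize` and `compare_site_results` are hypothetical, not part of any Google tool.

```python
def normalize(url):
    """Trim whitespace and a trailing slash so near-identical URLs compare equal."""
    return url.strip().rstrip("/")


def compare_site_results(google_urls, partner_urls):
    """Split the URLs seen on Google into those the partner portal also shows
    and those that only the full Google results show."""
    partner = {normalize(u) for u in partner_urls}
    in_both, only_google = [], []
    for url in google_urls:
        clean = normalize(url)
        if clean in partner:
            in_both.append(clean)
        else:
            only_google.append(clean)
    return in_both, only_google
```

Under the theory discussed here, the second list would be the candidates for the supplemental index; that reading is the speakers’ interpretation, not something Google documents.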
GOOGLE KNOWLEDGE BASE
What I would say is that I think that in recent years the knowledge base has become another layer, so they’ve got supplemental data, which is basically lots & lots of data that they don’t think about very much.
They’ve got the main index, which is what SEOs are really keen to enter because that’s the rankings and stuff, but there are huge amounts of other data sources as well.
So there’s a knowledge base: the Knowledge Graph, Google Images, YouTube, etc.
There are all these other different data sources which are all incredibly valuable for SEO.
If we just concentrate on the text in that main index, then the more Google moves towards voice search & answers & images and relies on those kinds of things, the less real estate they’re going to be giving to the old main index, which is becoming less and less important for Google as a business.
If Googlebot hasn’t come to visit the page in the last ten days, you’ve got a problem.
I don’t know from Google’s perspective, but certainly from what I know about search engine retrieval and indexing technologies, and trying to calculate things, that’s absolutely right. After a while a page is naturally going to drop out of an index if it’s not seen again, so it does need to get seen reasonably regularly for it to stay current.
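One way to check the ten-day rule of thumb is to look at your own server logs. The sketch below assumes the common "combined" access log format and identifies Googlebot only by its user-agent string, which can be spoofed; the function names `last_googlebot_visit` and `stale_pages` are made up for this example, and the ten-day threshold is the one quoted above, not an official Google figure.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches the "combined" access log format (an assumption about your server setup).
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<ts>[^\]]+)\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def last_googlebot_visit(log_lines):
    """Return {path: datetime of the most recent request whose user agent mentions Googlebot}."""
    last_seen = {}
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue
        when = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        path = m.group("path")
        if path not in last_seen or when > last_seen[path]:
            last_seen[path] = when
    return last_seen


def stale_pages(last_seen, known_paths, now=None, max_age_days=10):
    """List known paths that Googlebot has not requested within max_age_days (or ever)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [p for p in known_paths if p not in last_seen or last_seen[p] < cutoff]
```

Feed it the lines of an access log and the list of paths you care about; whatever comes back from `stale_pages` is where to look first.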
Should every page of your website be accessible 3 clicks from the home page?
Ok, it is important that people can get to the content they need quickly, and so can Google and the bot. But the thing is, you can have a completely orphaned webpage on your website that got a link from the home page of the New York Times. That page is going to be a really important page. A page has to get links from authority sources; usually those authority sources are other pages on your site, but that’s not always the case.
There are definitely occasions where you can have pages that are a long way from the home page, or that can only be found by a search, for example, but that search is linked directly in URL form from other pages, and so those pages can appear higher in the results.
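To see how many clicks each page really is from the home page, a breadth-first walk over the internal link graph is enough. This is a minimal sketch: the `links` dictionary (page to list of pages it links to) is assumed to come from your own crawl, and `click_depths` is a hypothetical helper name.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search from the home page.

    links maps each page to the pages it links to. Returns {page: clicks from home}.
    Pages missing from the result are orphaned from the home page's point of view.
    """
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths
```

A page that does not appear in the result is not necessarily unimportant: as noted above, an orphaned page can still carry a strong external link.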
Mindmapping & Hierarchy of the pages
I use a mind mapping tool to represent the site, and every single node has a number: one, two, three, four. It’s not based on the level; it’s based on the ambition, the importance of the page. Some of those pages might be 15 levels down. If one of them is a number one, I’m going to figure out a way to bring it back up, but if it’s a three or four close to the home page, I couldn’t care less if the bot comes once every 30 days; it doesn’t matter to me.
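The idea of comparing importance with depth can be sketched in a few lines. Everything here is an assumption made for illustration: `priority` holds the one-to-four numbers from the mind map, `depth` holds the number of clicks from the home page for each page, and the three-click limit is just the rule of thumb discussed earlier.

```python
def pages_to_promote(priority, depth, max_depth=3):
    """Return number-one pages that sit deeper than max_depth or have no known depth.

    priority: {page: 1..4}, where 1 is the most important.
    depth: {page: clicks from the home page}; missing pages are treated as unreachable.
    """
    flagged = []
    for page, rank in priority.items():
        d = depth.get(page)
        if rank == 1 and (d is None or d > max_depth):
            flagged.append(page)
    return sorted(flagged)
```

Pages ranked three or four are deliberately ignored, whatever their depth, which mirrors the point that a rare crawl of a low-priority page does not matter.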
What I’m fighting against is people putting every page basically at the same level and tossing a coin.
You’ve got levels; your content has different levels. Just think about the Pareto principle, the 80/20 rule.
I even have an extreme Pareto principle, which is that five percent of your pages are really doing maybe 60 or 70% of the business. Then if we bring back the other 15%, we go back to that principle, which is the law of nature, the law of business. It’s going to be difficult to fight against the Pareto principle.
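You can test the "extreme Pareto" claim against your own numbers. The sketch below assumes you can export some business metric per page (revenue, leads, visits); `top_share` and its parameters are hypothetical names, and the 5% figure is the one quoted above, not a measured constant.

```python
def top_share(value_by_page, top_percent=5):
    """Fraction of the total value produced by the top `top_percent` percent of pages."""
    values = sorted(value_by_page.values(), reverse=True)
    total = sum(values)
    if not values or total == 0:
        return 0.0
    # Integer ceiling of len(values) * top_percent / 100, with at least one page.
    k = max(1, (len(values) * top_percent + 99) // 100)
    return sum(values[:k]) / total
```

Calling it with `top_percent=5` and then `top_percent=20` shows how far your site is from the 5% and 20% figures mentioned here.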
Back in the days when we built websites with our ten fingers, with HTML & Dreamweaver, we had to structure the website well.
Then CMSs like WordPress, Magento and PrestaShop appeared, and they didn’t care about the structure, so now it’s a big mess.
I’m trying to bring back the spirit of the old days and build a very good structure, with well-isolated silos. It’s about trying to understand structure, but the internal links on the page are also real gold mines of information, if a machine can properly interpret the meaning of those links.
A lot of people got tricked into this siloing thing which is not the secret.
The secret is who is in relation to what and why.
In the page, around the page, inside the site and around the website. If you are able to understand everything that’s beyond that, that’s it.
The way that HTML is written as well: it’s boxes and boxes and boxes and boxes. It’s ideas within ideas within ideas within ideas, and most code is done the same way.
To finish up on this main versus supplemental index: I don’t think it’s a myth. I think it’s real; it makes sense from a technological point of view.
See you next week!