XML vs HTML Sitemap

Since Google launched Search Console and started advising site owners to upload an XML sitemap, nobody cares about the old-school HTML sitemap anymore. If you believed an XML sitemap has any value for SEO, we’ve got news for you. It might help you monitor your website’s health in Search Console, but it won’t help you rank better in Google. At best, it might help new URLs get discovered, but it won’t guarantee that they get indexed. Besides, we knew how to get web pages indexed long before XML sitemaps appeared.

SEOCONSPIRACY S01E30

Today, we’re going to talk about sitemaps. But which one are we talking about? Which sitemap should you build? An HTML sitemap, an XML sitemap, no sitemap at all? Upload a CSV file as a sitemap in Google Search Console? Upload the HTML sitemap to Search Console?

A word from Dixon Jones

Here’s the very first thing for me on sitemaps.

I don’t use them very often. My logic is that if Google can’t crawl your site and easily find your pages through internal links, then you’ve got a problem. Uploading a sitemap will help Google discover the content, but it gives you no clue about whether you’ve got a crawl issue on your website.
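To make the crawl-issue point concrete, here’s a minimal sketch (a breadth-first search over internal links) that measures how many clicks each page sits from the homepage. The link graph, the URLs, and the `click_depths` function are hypothetical; a real audit would build the graph from a crawler’s export. Any page missing from the result can’t be reached through internal links at all, which is exactly what a sitemap won’t tell you.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search from the homepage over internal links only.

    Returns the minimum number of clicks needed to reach each page.
    Pages absent from the result are unreachable via internal links.
    """
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # in BFS, the first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site: /old-article exists, but no page links to it.
site = {
    "/": ["/news", "/about"],
    "/news": ["/news/story-1"],
    "/news/story-1": ["/"],
    "/about": [],
    "/old-article": ["/"],
}
depths = click_depths(site, "/")
unreachable = sorted(set(site) - set(depths))
print(depths)       # {'/': 0, '/news': 1, '/about': 1, '/news/story-1': 2}
print(unreachable)  # ['/old-article']
```

Listing `/old-article` in a sitemap would get it discovered, but the depth map is what reveals that the site itself never links to it.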

I’m sure that with massive websites you’re going to have to put up sitemaps, because Google does need to prioritize. But unless you’re using them to tell Google which pages on your site are essential, I’m not a big fan of sitemaps myself.

How does Google discover your content?

There are several steps: Google discovers your content, then decides whether or not to crawl it, then whether to index it, and then you’ve got the mystery box.

We don’t see anything until the page shows up in an interface like the search engine results page.

But the word “discover” is very interesting.

All the URLs you put into that sitemap will get discovered, or at least should get found by Google pretty easily.

Google is so aggressive now that it discovers your content very quickly anyway, and if it doesn’t discover your content, you’ve got an issue.

It should be easy for Google to see the content, but I don’t think uploading a sitemap, particularly as a CSV file, is a great plan.

I want Google to crawl the site naturally, because then I can see which pages Google thinks are the most important, rather than trying to convince Google.

The myth here is that an XML or CSV sitemap is useful for SEO. It’s not. I’ve audited big press websites that had every kind of sitemap: a sitemap index and then dozens of other sitemaps. Still, 80% of the content was not indexed because there was no internal linking.

Sometimes Google picks up the links from somewhere, maybe a backlink from a forum or a category page, but 80% of the content was still not indexed, and that’s before we even talk about ranking.
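The press-site audit boils down to finding orphan pages: URLs listed in the sitemaps that no other page links to. Here’s a minimal sketch of that check, assuming you already have the sitemap URLs and a list of internal (source, target) link pairs from a crawl; `find_orphans` and all URLs are hypothetical.

```python
from collections import Counter


def find_orphans(sitemap_urls: list[str], internal_links: list[tuple[str, str]]) -> list[str]:
    """Return sitemap URLs that no other page links to internally."""
    # Count internal inlinks per target page, ignoring self-links.
    inlinks = Counter(target for source, target in internal_links if source != target)
    return [url for url in sitemap_urls if inlinks[url] == 0]


sitemap_urls = [
    "https://example.com/",
    "https://example.com/politics/article-1",
    "https://example.com/politics/article-2",
    "https://example.com/sport/article-3",
    "https://example.com/sport/article-4",
]
internal_links = [
    ("https://example.com/", "https://example.com/politics/article-1"),
    ("https://example.com/politics/article-1", "https://example.com/"),
]
orphans = find_orphans(sitemap_urls, internal_links)
share = len(orphans) / len(sitemap_urls)
print(orphans)
print(f"{share:.0%} of sitemap URLs have no internal links")  # 60% of sitemap URLs have no internal links
```

Every orphan here is “in the sitemap”, yet nothing on the site points to it, which is the situation the audit describes.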

Does Google explain why we should have a sitemap?

Google likes you to have a sitemap so that you can monitor things in Search Console.

For example, say you’ve got 1,000 URLs in your sitemap, and Search Console reports that Google reached only 500. Then you know there’s a problem.
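The 1,000-versus-500 comparison is simple arithmetic once you have both lists. This sketch parses a small XML sitemap with Python’s standard `xml.etree.ElementTree` (sitemaps.org 0.9 namespace) and compares it with a set of URLs known to be indexed. The `indexed` set is a hypothetical stand-in for whatever you export from Search Console, and the helper names are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Extract every <loc> value from a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def coverage(submitted: list[str], indexed: set[str]) -> tuple[float, list[str]]:
    """Return the share of submitted URLs that are indexed, plus the missing ones."""
    missing = [url for url in submitted if url not in indexed]
    if not submitted:
        return 0.0, missing
    return (len(submitted) - len(missing)) / len(submitted), missing


xml_text = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/page-a</loc></url>
  <url><loc>https://example.com/page-b</loc></url>
  <url><loc>https://example.com/page-c</loc></url>
</urlset>"""

submitted = sitemap_urls(xml_text)
# Hypothetical export of URLs reported as indexed.
indexed = {"https://example.com/", "https://example.com/page-a"}
ratio, missing = coverage(submitted, indexed)
print(f"{ratio:.0%} indexed, missing: {missing}")
```

The ratio only tells you that something is wrong; the internal-linking checks are what tell you why.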

If you upload a sitemap to Google instead of hosting it at a URL, the other problem is that you’re giving Google information that other search engines don’t get, and I think that’s a long-term disadvantage for SEOs.

I think that even though Google is the god of all things search, and even if it’s hardly worth looking at anything else, it’s certainly not in your interest to actively make it harder for other search engines to see your content if you ever want to get out of the stranglehold Google has over the industry.

So if you do have a sitemap, declare it in your robots.txt file so that other search engines can find it too.
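Declaring the sitemap in robots.txt takes a single `Sitemap:` line with an absolute URL, as defined by the sitemaps.org protocol. As a quick sanity check, Python’s standard `urllib.robotparser` (`site_maps()` needs Python 3.8+) can confirm the line is picked up; the domain and paths below are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt; the Sitemap directive must use an absolute URL.
robots_txt = """\
User-agent: *
Disallow: /cart/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']
print(parser.can_fetch("Bingbot", "https://www.example.com/cart/checkout"))  # False
```

Because the sitemap location lives in a public file rather than in one search engine’s console, every crawler that reads robots.txt gets the same information.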

A sitemap doesn’t often come with context.

Sitemaps tend to reduce context.

The great thing about content being found through a link in body text is that the conversation around the link gives machine learning the ability to get context. So that’s another advantage of having content found naturally rather than through a sitemap submitted to Google.

If you’ve got a spammy link-building technique, then instead of building the links directly to your page, you can build them to a page on somebody else’s site that already links to you, especially if that’s a robust website that’s unlikely to get banned. That site has some protection; its owner might get a message in Search Console saying, “Hey, these sites seem to be producing unnatural links towards you.”

But it may be a form of protection for you.

We’re Search Engine Hackers, not Black Hat SEOs

We are not black hat SEOs. My bio reads “search engine hacker,” which is a little bit different.

In the hacking world, there are black hats and white hats: there are the hackers who destroy sites, and there are the ones who discover vulnerabilities and report what’s wrong with the security.

So a lot of it is about how you want to be perceived as well.

The beautiful thing about SEOConspiracy is that we can be our true selves. We’re passionate about trying to understand how things work and putting out, at least, our truth. Again, we’re just two guys who see so much nonsense out there, and this XML versus HTML sitemap debate is one example.

Our advice would be to build a user-friendly HTML sitemap. You also have to take a good look at precisely what’s going through the user’s head at that particular point, because there are so many different ways in which e-commerce pages might display content.

On some occasions, it might be appropriate to put a canonical on those pages to say, “Look, Google, to be honest with you, the next time somebody comes to this page, they’ll see something else, so this should be the page.”

If you think of the word “canonical” as meaning “preferred” (the canonical page is the preferred page), then Google can work things out. But that doesn’t work if you’re trying to manipulate Google: you’re trying to get people from the search engine to buy a Samsung phone, and then you tell them they’ve just got to go to this page for all phones. In that case, you maybe need a category canonical or something like that.
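For reference, a canonical is usually declared with a `<link rel="canonical" href="...">` element in the page’s `<head>`. This minimal sketch uses Python’s standard `html.parser` to read which preferred URL a page declares. The URLs are hypothetical, `CanonicalFinder` and `canonical_of` are illustrative names, and the sketch ignores canonicals sent as an HTTP `Link` header.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "canonical" in rel_values and href:
            self.canonicals.append(href)


def canonical_of(page_html: str) -> Optional[str]:
    """Return the declared canonical URL, or None if absent or ambiguous."""
    finder = CanonicalFinder()
    finder.feed(page_html)
    finder.close()
    # More than one canonical tag counts as "no clear preferred page" here.
    return finder.canonicals[0] if len(finder.canonicals) == 1 else None


page = """<html><head>
<title>Samsung Galaxy phone</title>
<link rel="canonical" href="https://example.com/phones/samsung-galaxy">
</head><body><p>Product details</p></body></html>"""
print(canonical_of(page))  # https://example.com/phones/samsung-galaxy
```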

Take pride in making your sitemap beautiful; don’t just make a list of all the URLs. Most of the time, my clients pass an XML feed that brings in the image, the description, the title, and the update date, so the result is something nice, and it’s generated automatically.

But some of them do it by hand. I ask them: “Could you show this to your mom? Are you proud enough of this page to show it to your mom?”

Customizing your sitemap makes it much better for the user and much better for any machine learning algorithm to understand the context, and I think that’s a good thing to have.
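Here’s a minimal sketch of the automatically generated approach: turn feed entries into an HTML sitemap grouped by section, showing each page’s title, description, image, and update date instead of a bare list of URLs. The `Entry` fields, the markup, and `render_html_sitemap` are assumptions for illustration, not any client’s actual feed format; `html.escape` keeps titles and descriptions from breaking the markup.

```python
from dataclasses import dataclass
from html import escape
from itertools import groupby


@dataclass
class Entry:
    """One page from a hypothetical content/product feed."""
    section: str
    url: str
    title: str
    description: str
    image: str
    updated: str  # ISO date such as "2021-03-01"


def render_html_sitemap(entries: list[Entry]) -> str:
    """Render feed entries as an HTML sitemap, grouped and sorted by section."""
    parts = ["<h1>Site map</h1>"]
    # groupby only merges adjacent items, so sort by section first.
    ordered = sorted(entries, key=lambda e: (e.section, e.title))
    for section, items in groupby(ordered, key=lambda e: e.section):
        parts.append(f"<h2>{escape(section)}</h2>")
        parts.append("<ul>")
        for e in items:
            parts.append(
                f'  <li><a href="{escape(e.url)}">'
                f'<img src="{escape(e.image)}" alt="" width="80"> {escape(e.title)}</a>'
                f"<p>{escape(e.description)}</p>"
                f'<time datetime="{escape(e.updated)}">Updated {escape(e.updated)}</time></li>'
            )
        parts.append("</ul>")
    return "\n".join(parts)


entries = [
    Entry("Phones", "https://example.com/phones/galaxy", "Galaxy phone",
          "Our best-selling phone", "https://example.com/img/galaxy.jpg", "2021-03-01"),
    Entry("Accessories", "https://example.com/cases", "Phone cases",
          "Cases & covers", "https://example.com/img/cases.jpg", "2021-02-15"),
]
sitemap_page = render_html_sitemap(entries)
print(sitemap_page)
```

Grouping by section gives the page the same structure a visitor would expect from the site’s navigation, which is the kind of context a flat URL list throws away.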

So please run the test yourself: take two websites, one with powerful, excellent internal linking and one with no internal linking, where all the pages are listed only in the XML sitemap. See the results for yourself and call us. We already know what’s going to happen 🙂

Next week we’re going to talk about Cloaking: is it only for Black Hat SEO?
