In this session we talk about how to deal with duplicate content in WordPress. WordPress can index the same content several times, for example as a single post and again on category, tag, or date archive pages, which raises the possibility of duplication issues, so it’s important to choose what gets indexed. When the same content appears in more than one place, Google chooses the version it considers most relevant and delivers only that result. We discuss how to change the settings in the Google XML Sitemaps plugin so that only posts and pages get indexed, not archives.
Member: Hi Rick, how are you?
Rick: I’m doing well. How are you?
Member: Good. It’s afternoon here, actually. It’s half 5 in the evening.
Rick: Oh yeah, you’re in the UK?
Rick: So what can I do for you?
Member: It has to do with Thesis and there’s a website. I could give you the URL. It’s narcissisticbehavior.net.
Rick: Narcissisticbehavior.net. That sounds like a fun site.
Member: There’s a www in it.
Rick: Yeah, it might… I probably even misspelled it. Oh there we go. Is this it here?
Member: It’s The Roadshow for Therapists.
Rick: Okay perfect.
Member: I’m actually not looking at the same thing as you. One second. Yeah, that’s the one.
Member: And there are a few problems with it at the moment, which I’m just about to work on. But the one thing I’m not sure about: the categories still kind of confuse me, and so does the way it puts articles under categories. If you click on Articles, it shows a list of all the articles, and then it also shows each article separately. So, for example, “Is there a relation between narcissism and shame?” That’s a different… that’s an article. So I’m wondering, do I have duplicate content on this website?
Rick: Well, you have the potential for duplicate content, depending on what you choose to index. In Thesis, you can choose whether or not something is set to noindex. Now, in this theme, you may not have that choice.
Member: Yeah, unfortunately I’m not at home at the moment, so I don’t have the passwords to get into it, but I’m going to change it over anyway. I’m going to move away from that theme and put on either Thesis or… we’re starting a new course on, what is it?
Rick: Genesis, yes.
Member: Yeah, I’m considering maybe trying that one.
Rick: Okay. So the answer to your question is that even though you’ve written the article once, it is possible in WordPress to have it indexed a number of times, which does raise the possibility of duplicate content issues.
Member: Yeah, I believe there is. This isn’t my website, but there are about 20-odd articles and I believe there are 40-odd pages indexed.
Rick: So that would mean that… let’s see what Google has in its index. Google has 51.
Member: 51, yeah, so some of it definitely has to be coming up twice, then.
Rick: So you’ve got About Me, About Me, About, Understanding Narcissistic Injury. Yes, so that’s what’s happening. Here it is: you’ve got two results that are exactly the same, right? This Understanding Narcissistic Injury is indexed with the category in its URL, and this Understanding Narcissistic Injury up here is indexed without the category.
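Rick spots the duplicate by eye in Google’s results. As a rough illustration of the same check, here is a minimal Python sketch that groups a list of indexed URLs by their last path segment (the post slug) and reports any slug that shows up under more than one URL. It assumes you have already collected the indexed URLs yourself; the function name `find_duplicate_slugs` and the example.com URLs are hypothetical, not taken from the site discussed.

```python
from collections import defaultdict
from urllib.parse import urlparse


def find_duplicate_slugs(urls):
    """Group URLs by their last path segment (assumed to be the post slug)
    and return only the slugs that appear under more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        # Strip the trailing slash so ".../slug/" and ".../slug" compare equal.
        path = urlparse(url).path.rstrip("/")
        slug = path.rsplit("/", 1)[-1]
        if slug:  # the bare home page has an empty path; skip it
            groups[slug].append(url)
    return {slug: found for slug, found in groups.items() if len(found) > 1}


# Hypothetical indexed URLs, for illustration only.
indexed = [
    "http://www.example.com/understanding-narcissistic-injury/",
    "http://www.example.com/articles/understanding-narcissistic-injury/",
    "http://www.example.com/about-me/",
    "http://www.example.com/",
]

for slug, found in find_duplicate_slugs(indexed).items():
    print(slug, "->", found)
```

Grouping by slug is a deliberately simple heuristic: it catches the “same article with and without the category in the URL” pattern from the call, but it won’t catch duplicates whose URLs share no slug.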
Member: Yes, I noticed that, and this is why I’m a bit confused, because the dashboard doesn’t really seem to show how you can do anything about that. I know I can stop indexing certain pages, but working out which ones shouldn’t be indexed is going to be confusing.
Rick: Well, in Thesis you could choose not to index your… and you’ll be able to do this in Genesis as well. You choose not to index your archive pages or…
Member: Yeah, that’s the thing. I don’t get the archive. What exactly is the archive?
Rick: This is an archive page. An archive page is a page that displays all of the posts of a certain category or of a certain tag.
Rick: They can also be for a certain date or a certain author or whatever; there are all kinds of different archives. But an archive page is a page that displays a group of posts that have something in common, whether that’s a taxonomy such as a tag or a category, or a date or an author. I guess what I would recommend is that you look at my videos on Google XML Sitemaps, because you can change the setting in your sitemap. Even if your theme doesn’t have this ability, you can do it in Google XML Sitemaps so that the archives don’t get indexed.
Member: That plugin is actually already on that website, Google XML Sitemaps? So I can do it from there, then.
Rick: Yeah, so you just go down to the archives and say don’t index the category archives, and then it won’t. Then it will only index the posts and the pages, and it won’t index any archives.
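To make the distinction between archive pages and single posts concrete, here is a minimal Python sketch that classifies a URL as a category, tag, author, or date archive, and suggests a `noindex, follow` robots meta tag for archives, which is the kind of setting Rick describes Thesis and Genesis offering. It assumes WordPress’s default permalink bases (`/category/`, `/tag/`, `/author/`, and year/month date paths); sites that change those bases will need different patterns. The function names and URLs are hypothetical, and this is not how either theme implements the option.

```python
import re
from urllib.parse import urlparse

# Assumes WordPress's default permalink bases; adjust these if the site
# uses a custom category or tag base.
ARCHIVE_PATTERNS = {
    "category": re.compile(r"^/category/"),
    "tag": re.compile(r"^/tag/"),
    "author": re.compile(r"^/author/"),
    # Date archives such as /2012/ or /2012/05/ (not /2012/05/some-post/).
    "date": re.compile(r"^/\d{4}(/\d{2}){0,2}/?$"),
}


def archive_type(url):
    """Return the archive type for a URL, or None for a single post or page."""
    path = urlparse(url).path
    for kind, pattern in ARCHIVE_PATTERNS.items():
        if pattern.match(path):
            return kind
    return None


def robots_meta(url):
    """Suggest a robots meta tag: keep archives out of the index, but let
    crawlers follow their links through to the individual posts."""
    if archive_type(url):
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'


print(robots_meta("http://www.example.com/category/articles/"))
print(robots_meta("http://www.example.com/understanding-narcissistic-injury/"))
```

Using `follow` alongside `noindex` is a common choice: the archive page itself stays out of the results, while the posts it links to can still be discovered.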
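The plugin handles this through its own settings page; the sketch below is not the plugin’s code, only a hypothetical illustration of the result Rick describes: a sitemap, in the standard sitemaps.org format, that lists posts and pages and leaves archives out. It assumes each URL comes tagged with a content type; the `(url, content_type)` pairs, `INCLUDED_TYPES`, and `build_sitemap` are made up for the example. Note that a sitemap tells search engines which URLs you want crawled, so it works best alongside a noindex setting on the archives themselves.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical setting: only these content types go into the sitemap;
# category, tag, date, and author archives are left out.
INCLUDED_TYPES = {"post", "page"}


def build_sitemap(entries):
    """Build a sitemap XML string from (url, content_type) pairs,
    keeping only posts and pages."""
    urlset = ET.Element("urlset", attrib={"xmlns": SITEMAP_NS})
    for url, content_type in entries:
        if content_type not in INCLUDED_TYPES:
            continue  # skip archive pages
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical site content, for illustration only.
entries = [
    ("http://www.example.com/understanding-narcissistic-injury/", "post"),
    ("http://www.example.com/about-me/", "page"),
    ("http://www.example.com/category/articles/", "category"),
    ("http://www.example.com/tag/shame/", "tag"),
]

print(build_sitemap(entries))
```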
Member: My understanding is that duplicate content is really only a problem on your own website. Your previous caller was asking whether he would have duplicate content. My understanding is that it’s not a problem if it’s on other people’s websites, because that’s just, you know…
Rick: No. If this article is published here, and as an article in some article directory, and on another blog, Google does consider that to be duplicate content. It will choose whichever version it considers the most authoritative and deliver only that result.
Member: But what about syndication? A lot of news websites put up exactly the same article; many different news sites run the same piece. It’s just syndication, and you wouldn’t get a slap or anything like that, would you? But if it’s duplicated on your own website, my understanding is that’s when it becomes a problem.
Rick: Well, if you were to take a syndicated article from another site, let’s say you had an AP article that was on 100 other news sites. If you put it on your own site, Google’s not going to return your site in the search results. Google is going to return what it considers to be the most relevant version of that duplicate content.
Member: The thing is, this particular website is all original content, and quite a few websites have taken 4 or 5 different articles and posted them on their own sites. They have given a link back to narcissisticbehavior.net, so you can get to the original author, but they’re still ranking for that particular term as well. So there’s more than one website with some of these articles.
Rick: Yeah, so if somebody else is ranking better for your stuff than you are, then you’d want to send them a cease-and-desist letter, and you probably want to work a bit harder at developing your author identity with Google. If you set up your author identity with Google and connect your original articles to your authorship in your Google profile, Google deliberately attempts to return the author’s result rather than the syndicated result. I don’t have any videos on how to do that yet, but it’s definitely something I plan on working on: a set of tutorials on how to establish yourself as the author of an article and have your authorship recognized by Google.
Member: Yeah, that would be very useful actually, because the writer of these articles doesn’t really mind; there are links going back to her website and so on. But I know there are a lot…
Rick: I bet you they’re nofollow links, though.
Member: Yes, they’re nofollow links.
Rick: So what that person is doing is attributing you, but you aren’t getting any benefit out of that link from Google’s standpoint, right? I mean, what’s happened is essentially spam: it’s a spam site scraping content that it thinks will help it in search results. Now, Google’s working really hard to kill those things, and, you know, eventually it will get killed.
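If you want to check for yourself whether a copied article links back with `nofollow`, here is a minimal sketch using Python’s standard `html.parser` module. It collects every link whose `href` contains a given domain and records whether its `rel` attribute includes `nofollow`. The class name `BacklinkChecker`, the domain, and the sample HTML are hypothetical, and the plain substring match on the domain is a simplification.

```python
from html.parser import HTMLParser


class BacklinkChecker(HTMLParser):
    """Collect links to a given domain and record whether each is nofollow."""

    def __init__(self, domain):
        super().__init__()
        self.domain = domain
        self.links = []  # list of (href, is_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        if self.domain not in href:
            return
        # rel can hold several space-separated values, e.g. "external nofollow".
        rel_values = (attrs.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_values))


# Hypothetical HTML from a site that has copied an article.
page = """
<p>Originally published at
<a href="http://www.example.com/understanding-narcissistic-injury/"
   rel="external nofollow">example.com</a>.</p>
<a href="http://other.example.org/">unrelated link</a>
"""

checker = BacklinkChecker("www.example.com")
checker.feed(page)
for href, nofollow in checker.links:
    print(href, "nofollow" if nofollow else "followed")
```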
Member: One of the previous speakers was asking whether he was getting penalized for anything he had done on his website. The only thing I’ve heard about is being part of what are called blog networks, where a lot of the blogs have been sharing content and sending links to each other, that kind of thing.
Member: If you’ve been in a blog network, that may cause you problems in the future, if it hasn’t already.
Rick: Yeah, absolutely, whether it’s a private network, which isn’t really a network at all; it’s a fake network owned by a single person. You used to find people out there promising to help you score big in Google search results by creating these little things called mini-nets, which were essentially fake: networks of sites designed specifically to boost a particular site’s ranking through linking. Google’s caught on to that and has hammered away at that scheme. Then you have other schemes, large collections of blogs where you can buy links or have content published that links back to you, and Google’s working on beating those too. Any link-building scheme that tries to simulate the fact that people know about you and like you is targeted by Google, and one way or another, Google’s going to figure out how to beat it.
Member: Good. You know, it has to be done.
Rick: Absolutely, especially if you are the kind of person who creates original content then that’s what you want. You want your effort to be rewarded.