Originality.ai founder Jon Gillham joins the podcast to help online publishers better understand the dynamic world of AI and SEO!
The episode covers a range of topics, including:
- How the detection of AI-generated content works
- Addressing false positives in AI detection tools
- The implications for review sites
- Giving power back to online publishers
- And – of course, lots more…
Watch The Interview
Jon first got into online business when he was an engineer.
Like many of us, niche websites and content marketing were a means to escape his 9 to 5.
He’s gone through every update since starting his first niche site back in 2008. And with this wealth of experience, Jon has some interesting insights into AI-generated content and the ever-evolving challenges Google faces in maintaining the quality of its search results.
And a decent portion of the discussion also centers around a recent study that examines sites that have been de-indexed due to AI spam.
These sites were often low DR, practiced mass publishing in a short period, included lots of ads – and shared other notable characteristics. And Jon highlights the importance of balancing the creation of user-centric content with considerations for SEO strategies – regardless of how the content is created.
As a long-time niche site and content marketing agency owner himself though, he does stress the need for transparency and ethical use of AI in content creation. This is both for the end user and the site owner responsible for and paying for the content published on their site. And Jon shares some tips for ensuring content authenticity, navigating false positives, and ensuring authors are being honest in their work.
There are also interesting discussions on the implications of the ever-increasing use of AI on user-generated content platforms like Reddit, review sites, and more – enjoy!
Topics Jon Gillham Covers
- Evolution of AI tools in content creation
- Google’s ambiguous stance on AI-generated content
- His recent study on sites de-indexed for AI spam
- Balancing mass publishing and site authority
- Importance of fact-checking and valuable content
- Differentiating between AI-generated and human-written content
- Strategies for navigating false positives in AI detection
- Impact of government legislation on AI content
- Ethical considerations in AI content creation
- Authorship in a post-AI world
- AI’s influence on user-generated content platforms
- Transparency in AI content creation
Links & Resources
Transcription
Jared: All right. Welcome back to the Niche Pursuits podcast. My name is Jared Bauman. Today we are joined by Jon Gillham. Jon, welcome on board.
John: Yeah. Thanks, Jared. Great to be here. Been on a few times, but, uh, first time with you as the host. So yeah, great to, great to be here.
Jared: Welcome back. It’s always good to have a returning guest.
And today we’re talking about a lot of stuff happening today, in the past, and going forward in the world of AI as it relates to building websites, creating content, and whatnot. Um, I can’t wait to dive in because this is ever changing. So you guys have a lot of different perspectives from where you’re coming from.
Uh, for those who, you know, maybe don’t know much about you or haven’t heard previous episodes, give us a little bit of a backstory on who you are and what you’re about.
John: Yeah, sounds good. Um, yeah. So my background is engineering school, and then I went, uh, worked at a refinery, wanted to leave the day job, moved my family back to my hometown, and then started sort of discovering niche websites and building content sites and other sort of online businesses at the time. Um, built that sort of portfolio of sites, portfolio of little businesses up, uh, left the day job, um, seven, eight years ago now.
And then, uh, off the back of sort of that skill set, built a content marketing agency, sold that, and then most recently we, uh, built, um, an AI detection tool called Originality.ai, which helps, um, publishers ensure they’re publishing content that meets the specifications that they’re after.
Um, so sort of always been in this sort of content game. Um, and various, um, businesses off the back of that.
Jared: Yeah, boy, you have quite a long story. When were you, like, when were you building your first websites? I’m just curious how far back, what era we go back to in website creation.
John: Yeah. It’s 2008 probably. Okay. Yeah. So yeah, way, way, way back. Like, yeah, EzineArticles days, way back.
Jared: Well, article spinner action. Were you, uh, did you have any sites during the Panda or Penguin updates?
John: For sure, for sure. Got like, yeah, a ton, a ton got, uh, nuked then. Um, and then I think I’ve been pretty clean since.
I mean, certainly ups and downs, but no, no sort of mass, mass pain. Um, like, like Panda Penguin days.
Jared: Well, you’ll be well qualified as we venture into some of the things happening in today’s current environment with Google, the HCU and past. I mean, I always tell people like, you know, um, uh, content creators have been through, you know, massive shifts before.
It’s just been a while since we’ve had one, uh, or some like those back in the days, so you’ll be well qualified to talk about it. Well, you are with Originality.ai. So you have a lot of experience in AI as it relates to content creation. Certainly. There’s so many ways we can go with it today.
There’s, uh, just ever since, I’d say, ChatGPT got released in November of 2022, that definitely changed the game a bit. Granted, we were working with AI before that, and we were working with companies and tools like, maybe, a Jasper, but certainly a lot of the game changed in 2022. And then obviously we’ve had the evolution of that.
Plus we’ve had now what’s happening in the current Google environment as it relates to AI. Um, maybe AI today, AI in the past, and tomorrow. Like, let me just throw a very broad question at you so you can kind of set the stage with where we’re going to end up talking about it and maybe frame it out for us a little bit.
John: Sure. Yeah. So I’d say, um, You know, and I think when AI first came out, it was a pretty, you know, there’s a, there’s a great cliff, I think from image of like a graph saying like current capabilities and it’s like, it’s like, Oh cute. The monk, the, the, the robot can do monkey tricks. And then it’s like, Oh crap.
This thing is now way more capable at whatever the task is than we were. And, you know, I think GPT-2 days, like 2020 to 2022, it was like GPT-2, not really awesome. GPT-3 came out like 2021, 2022. Jasper really burst onto the scene. Um, we were extremely heavy users of that tool, creating AI-generated content, um, for clients within our agency, communicating that we were using it,
um, fact checking it, publishing it. Um, and then ChatGPT came along and, you know, the world changed. Um, in the context of Google, there’s always been a question on, like, does Google want this? Does Google not want this? And Google needs to try and thread this needle of being an AI-forward company while ensuring their search results are not massively overrun by AI content, because why would anyone go and use, you know, Google, if they could just go to the AI and get the answer.
And so Google has got a really tricky, tricky line to walk. And so that’s why their communication sometimes feels pretty, no, we don’t want AI, no, we do want AI, uh, just spam. We don’t care how it’s created. Um, and then, and then I think the update and the manual actions and some of their strategy around it in terms of trying to what, what feels like instill, not like there’s a bit of a psyops, um, component to this update.
Um, different than some of their past updates, and I think that’s that’s where we’re at now, where it’s they’re trying to, um, really communicate that they don’t want AI spam, leaving it ambiguous about AI in general.
Jared: As it relates to AI in today’s environment, um, what are some of the scenarios that are at play that content creators should be paying attention to?
I’m sure everyone’s going to think of one or two, but at the same time, like, let’s kind of frame out some of the different scenarios on the table right now. So we can start to wander into where we go from here and try to, like you said, like, it’s really confusing to try to listen to Google because they kind of flip flop a bit.
Right. And then they have ulterior motives and they have multiple things at play. But currently, right now, what kind of scenarios are we looking at? And then we can kind of move forward from that.
John: Yeah. So I think, and I think this is kind of what you’re getting at, but it’s like, if you’re like plugging something into your WordPress site that is mass publishing a thousand posts a day based off of prompts and not being human reviewed, you’re going to get smoked.
Um, that, that is, Google does not want that. It might work for a period of time, the same way as other black hat strategies can work for a period of time. Um, and then if it’s, um, you know, on the other end of the spectrum, and I’ll use an example that we use internally: some of our research team are English-as-a-second-language individuals.
They do, you know, ridiculously intelligent research and then use ChatGPT to assist them in communicating that information in English. Um, I think that’s a fine use of AI in the eyes of Google. Um, and so I’d say there’s that spectrum, the same as exists, um, you know, going back in the history of sort of SEO around backlinks.
There’s a range. There’s absolute crap that is spam and will get you punished. And then there’s probably some effort that you can put into getting links that is a really useful, um, effective strategy to get your site more visibility. Um, and I think that spectrum exists within AI generated content.
Um, what I think site owners need to be careful of is making sure that they’re the ones that are choosing where on that spectrum they want to be landing.
Jared: We have, obviously, the way the algorithm has been treating AI up until this point. And, you know, we’ve seen plenty of scenarios where algorithmically a site will explode from a lot of AI content.
And then, oftentimes, it tends to fall off a cliff if there aren’t additional inputs or things being associated with it. So you’ll see it grow. You’ll see it grow. And then you’ll see at some point, the algorithm catches up to it. Um, and obviously that’s not the case with all AI sites, or some sort of component that makes it do that.
There’s lots of, lots of other examples of sites that have a lower velocity of content being published with AI. Or an edited component of content being published with AI or more than just AI, right? Like internal linking and graphics and imagery and other things added. There’s been lots of success stories around that, those scenarios.
Um, like, have you seen any sort of recipe that uses AI in a way that the Google algorithm doesn’t seem to mind a bit and still has, uh, more potential for long term success?
John: So, so I’d say, I think when words are not the core value out of the page, I think that is a great time for AI generated content to be used systematically.
And so if it’s like, you’re creating a bunch of free tools or free calculators, and then you’re putting words underneath those free calculators, or you have unique images and the focus of the story is around images, um, and that’s the value that is being created, provided to the user on the page, and the words are just sort of supplemental.
I think those are great long term strategies for sort of a systematic approach to the use of AI to create words that are published on a page. I think when the main value add of the page is words, it’s pretty hard to sort of systematically inject AI generated words into a page and that be, um, you know, a net gain for, uh, for Google and the end user.
Jared: So we have then March rolls along and we have a core update, a spam update, and we have, out of the blue, I would qualify it as tons of manual actions and de-indexing of sites through Google Search Console with the label of AI spam. Now you guys did a huge study on this at Originality.ai, quickly, might I add.
Well done. We featured it on the news podcast. Spencer and I talked about it. But I mean, my last question was algorithmic; this is manual, right? So for those of you listening who aren’t aware, the algorithm can penalize a site or just remove a site for the most part from search.
But then a manual action is something done manually by someone on the Google, um, anti-spam team. So, I mean, talk about the correlations you found in the study and any other insights from what you guys, um, kind of uncovered there.
John: Yeah. So I think, I mean, we think of the internet as this like infinitely large place.
Um, that’s just incredibly big, like there’s no one that’s going to find us. Um, you know, one thing that we’ve seen as we’ve been doing these studies is it’s not that massive in terms of the number of sites that are getting meaningful traffic. Um, you know, there’s 70,000 websites that are, um, connected to Raptive, Mediavine, or Ezoic.
Um, another million that are sort of on the platform for AdSense. Um, you know, those are big numbers, but those aren’t crazy numbers for Google to sort of sift through and deal with. Um, and so that sort of is a preamble into the study. So we, yeah, we looked at, um, it was about 5,000 websites that we were able to identify that had been de-indexed.
So a total of about 2 percent of all the sites that we looked at. Yeah. Um, and, uh, 1,400, 1,500 websites were de-indexed, which represented 2 percent of all the sites that we had looked at. They were on Mediavine, Ezoic, or Raptive. Um, and, you know, some of the interesting takeaways that we saw: none of the sites had a really high DR rating.
Um, so it seemed to be very weighted to the lower DR score sites, um, that got de-indexed. Some had some really impressive traffic, like a handful were over a million a month in organic visitors, um, down to zero. A lot of them were pretty obvious. Um, like, just manually looking at them, I didn’t see many that I’m like, Oh, they got this one wrong.
This is like, Oh yeah, you got, you got caught. Um, not a lot. Like they were optimizing for publishing a lot of content and not optimizing for, um, any other means around it. Some had some attempts at programmatic SEO where there were tables that were being injected and then words around that, which I thought,
I won’t say surprised by, but I thought was a reasonable strategy to try and sort of combine programmatic SEO plus AI generated content to produce what might be a more valuable page than AI would produce on its own, and they were still getting hit. Um, so it’s low DR, aggressive, um, publishing of AI generated content.
All the sites that we looked at had published some content that was AI generated. I don’t think this was a, you’re AI content, you’re banned. But, you know, this goes back to what I talked about earlier. You know, I’m a pretty dumb guy. There’s a lot smarter people than me at Google.
If Google hired me and said, hey, how can you identify what sites should get a manual action? Look at sites that are getting traffic from Google. Look at which they have that information. Look at the number of pages that have been indexed on those sites. Look at sites that are outliers in terms of an increased number of site pages, a reasonable amount of traffic, and then run it through an AI detector, and this result of 1500 sites would probably be pretty similar to the result that, that I would have produced using that same, that same method.
And so I think when the information is in Google’s hands and the number of sites that get meaningful traffic is not infinitely large, it becomes a pretty manageable problem for Google to attack manually.
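The three-step filter Jon speculates about here (sites with meaningful traffic, then outliers in indexed-page growth, then an AI-detection check) could be sketched roughly as follows. This is only an illustration of his thought experiment, not a description of how Google actually works: the `Site` fields, the thresholds, the use of the median as a baseline, and the `ai_likelihood` score are all assumptions made up for the example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Site:
    domain: str
    monthly_organic_visits: int   # hypothetical traffic figure
    pages_indexed_now: int        # hypothetical indexed-page counts
    pages_indexed_year_ago: int
    ai_likelihood: float          # 0.0-1.0 score from some AI detector (assumed)


def flag_for_review(sites, min_visits=10_000, growth_multiple=5.0, ai_threshold=0.8):
    """Return domains that pass all three illustrative filters."""
    # Step 1: keep only sites that get a reasonable amount of traffic.
    with_traffic = [s for s in sites if s.monthly_organic_visits >= min_visits]
    if not with_traffic:
        return []

    # Step 2: measure page growth and compare it with the typical (median) site.
    growth = {
        s.domain: s.pages_indexed_now / max(s.pages_indexed_year_ago, 1)
        for s in with_traffic
    }
    typical_growth = median(growth.values())

    flagged = []
    for s in with_traffic:
        is_outlier = growth[s.domain] >= growth_multiple * typical_growth
        # Step 3: only outliers that also score high on AI detection are flagged.
        if is_outlier and s.ai_likelihood >= ai_threshold:
            flagged.append(s.domain)
    return flagged
```

Under these assumptions, a site that grew its indexed pages far faster than its peers but scores low on the detector is not flagged, which matches Jon's point that the de-indexed sites combined aggressive publishing with AI-generated content rather than showing either trait alone.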
Jared: The big question I hear a lot of people asking is this reference to mass publishing, right?
And like, it’s easy to see on one side of the spectrum, like, Oh yeah, somebody who’s published 700, 000 articles, that’s mass publishing. And then it’s easy to see on the other side of the spectrum, someone who’s not using AI. And so they’re limited by the finite capabilities of how many articles they or a small team of writers is able to crank out in a day.
And that’s usually somewhat related to how successful and big the site is, meaning you don’t have five writers for a site that isn’t earning a dollar and doesn’t have much traffic typically, right? So we see both sides of the spectrum, but how does someone who’s using AI to help them out? Um, how does someone who maybe is in a niche that has the potential to crank out a good amount of content?
I’m using air quotes for these, those of you listening on the podcast. Like how do these people find what massive amounts of publishing is in Google’s eyes versus what is reasonable given the tools they have at their disposal?
John: Yeah, I think what we’re going to see is that there’s going to be some correlation with DR. I don’t know enough yet to be able to say this with certainty, but I think your ability to mass publish is increased based on your DR.
So the more authority your domain has, the more leeway you get before you potentially hit an internal trigger. And again, this is totally theory at this stage; we don’t have enough data to know how it’s going to work. Um, but I think there’s some correlation with DR. So if you’re a new site and you spin it up and you publish 100,000 articles, clear.
So then how do you decide? And I think it comes back to, depends what you’re trying to do. If you are optimizing, trying to stay on the end of the spectrum of, I know I’m adding value, I know I fact check these, I know these are useful articles, I would be happy to send them, you know, it passes the family check.
I’d be happy to send them to my brother or mother to help them with whatever question that they have. Then I think whatever your capacity is to produce content like that, you’re probably safe. If you’re trying to manipulate Google, which I mean, I know it’s a hard thing to do because like, well, we are all writing for the search.
Like if we weren’t getting traffic from Google, a lot of us wouldn’t be doing it. Um, so, you know, it’s a funny wording from Google to say, don’t do it for the SERPs. It’s like, well, I think most of us are. Yeah. Um, and so I think if you know you’re not doing it for people, then you’re trying to manipulate
search results. Google doesn’t like that. And they’re probably going to be more aggressive, um, on that type of content. And so I think if you know you’re producing content you’d be happy to send to your family to help them with that question, whatever the question is, then whatever capacity you can publish that, you’re probably safe.
And if you’re on the other camp of trying to identify the right frequency to publish content so that you don’t trigger any alarms. I think that’s going to be, that’s hard to know right now. And, and likely DR related.
Jared: There are also stories of people getting manual actions who had a very limited amount of AI content on their site.
And I’ll say that there’s enough of them going around that it seems like there are other factors that, maybe in a minority of sites, came into play. Um, any thoughts on what the other factors could have been for sites that got manual actions that maybe were using, I’ve heard of a hybrid of AI and, um, uh, written content, um, or a very low amount of AI content, you know, um, under a thousand in some cases?
You know, any theories around that that your study might have found, or just in general for you?
John: Yeah, so we’re, we’re trying it. So we looked at the publicly identified sites that had been identified at the time that we did that to share the findings. And 100 percent of those sites had some AI generated content.
The quantity of sites that we were able to look at with that study was only 14. And so we’re now doing a much more in-depth study, looking at at least 200 sites and several hundred articles off of those sites to try and identify, get some more detail. I think Google, um, I don’t know enough right now about what the other factors are, other than to say that I’m sure there would be collateral damage.
Has Google ever gotten an update perfect? And that answer is no. There’s always going to be collateral damage. Um, and, you know, I don’t know how Google would sample the sites, and I don’t know what Google would look at, but let’s say they were using some level, some amount of AI detection.
Are they going to expend resources across all the sites or across all the content on the site? Are they going to look at a sampling of the more trafficked articles and say, yep, these are likely, we suspect these to be AI, hits a bunch of other triggers? Significant amount of ad placement was another one that we saw on a lot of these sites, again, potentially a problem with our sample size because we were looking at sites based off of the ad platforms that they were on.
Um, but we saw a lot of sites that had a very aggressive use of, of AI that we, they, it was obvious that that site cared about that site for how much money it could put in their pocket, not the user.
Jared: Right. Yep. And that’s been correlated with other studies. I know Cyrus Shepard did a study of Google’s, you know, algorithmic updates in 2023 and found a high correlation of negative, uh, negativity to the update as it related to ad density and stuff. But certainly with the manual action, that’s a far different thing.
And that kind of brings me to, I think, my last question on this specific topic, but Spencer posed it, um, a bit ago. And so I’m curious to get your take on it, especially as it relates to someone who’s running an AI detection software. Like, why does Google need to send manual actions out when they’re releasing a spam update that’s supposed to remove 40 percent of all spam from the internet?
You would think that these mass purposed or mass created article, uh, websites with, with tons and tons of articles would fall easily into that spam filter that they’re releasing right now. So why the manual actions?
John: Yeah, I don’t know. I think, I hope we’ll find out eventually. Um, I buy into that this is a bit of a psyops, um, in terms of like, they’re trying to send a message.
Um, I think I believe that. It adds up, right? It adds up. They’ve done this before. You know, Spencer’s been on the receiving end of sort of like when they attacked PBNs of people that publicly used them. And, um, I think there’s a component of this where they attempted to, you know, potentially attack sites of people that publicly talk about how they use AI to build their sites.
Um, where those sites just happen to be connected to the rest, potentially. But I think it’s the fact that it’s a manual action, the fact that they communicated, you know, two blog posts about how big these updates are going to be, and then at the exact same time, rolled out the manual action on the day of the launch.
Um, this, this feels like whether it was marketing or, you know, uh, you know, they, they tried to, they tried to send a message with this update. And I think what that tells me is that their update is not going to be as effective at attacking AI generated content as they wish it was. And so they did this other strategy to try and drive the message home in a very dramatic, um, and sensational way.
Um, And I think it sends a clear message on what they want to do, but I also think it sends a clear message on what their capabilities are going to be related to the related to the update. That’s my, that’s my current theory. Um, yeah, but I think only Google knows
Jared: it’d be hard to argue against it. That’s for sure.
It’s hard to find other reasons for it. Um, let’s wade ourselves into a couple other AI, you know, buzzworthy events or stories a bit. Because I do want to get into. But I mean, I would be remiss if we didn’t touch on a few of these stories, and tackle whichever one you think is most appropriate to the conversation.
I mean, in the last couple of months, we’ve obviously not just had the manual actions related to AI and quote unquote AI spam, but we’ve also had other things that have come up along the way. In our industry, we’ve had the Sports Illustrated author example, where, you know, authors never even existed for the AI content that was being created.
We’ve, um, had, uh, obviously, many would say that this is probably what led to some of this, which is that whole AI heist or, uh, the concept of stealing other people’s content, sitemap URL by URL. Um, and even, I suppose, the topic of parasite SEO could play into the role of AI as it’s related to, kind of,
to some degree what you talked about, like high DR sites just keep winning because there just is a higher priority and preference given to them from a trust standpoint with the mass production of AI. And that kind of leans into that, but lots of topics there. Like, do you think any of those have more relation to the larger concepts of ranking with AI these days than others?
John: Yeah, I think, I think the AI theft one was a fun one that got blown out, where it’s like, yeah, that’s kind of what everyone has already been doing forever. Whether it was a human writer or an AI writer, you know? What’s the competition doing? And, I mean, it got sensationalized for sure. Um, you know, I think as a society, we’re going to be wrestling with how do we use AI content ethically?
Um, and for better or worse, Google organizes the world’s knowledge, um, in the form of search results, and they’re going to be a leading factor: how they evaluate this type of content, um, is going to have a significant impact on how society as a whole evaluates it. Um, you know, I think that the Sports Illustrated one is quite interesting, and I think will play out.
Where I think we’re going to see an increased weight placed on authors. Um, you know, I think he has continued to move us in that direction or hasn’t has moved us heavily in that direction, but I think high dr still like high authority sites still kind of didn’t matter. Um, I think their Google is now communicating that they’re going to in a very nice way, not just attack parasite SEO sites off the bat that are off the back of high authority sites, which will all will all cheer and as their little like indie publishers, um, when when Forbes no longer ranks for everything.
Okay. Um, we’ll be happy about that. Um, and I think that the, the authorship is going to mean more and more in a world where we don’t know who created it. If you are the author behind, if you’re putting your name behind as the author on that, that’s going to mean, mean more in a world, uh, sort of a, as we move through this post post AI, um, world.
So, yeah, I’d say that’s my that’s I think what’s probably the most relevant to I think the updates that are currently happening and will continue to happen and the update that’s going to be rolled out in two months attacking parasite SEO. Um, yeah, I’m excited for that. I think that’s going to help level the playing field.
Um, and I think right now they have to rely on authority of a domain. I think that’s going to continue to, um, I hope get diminished as they evaluate more on the on the author and authorship will mean more. Yeah.
Jared: Last question before we get into AI detection. Um, and you know, all this is ever changing.
I should say all this is so dynamic. But, um, what about the role of government legislation? I mean, we’re coming on the back of the EU weighing in heavily on this recently. Uh, obviously different countries have had different stances on it previously. Uh, at some point, the US is probably going to weigh in on it.
Like, to your point, we’ve seen Canada and Italy weigh in on it. Uh, you know, more and more that’s coming to the forefront. And, um, you know, Google’s caught up in a lot of, you know, the antitrust lawsuit and trying to make sure they’re making things happy. And again, I don’t want to get into too big of a theoretical conversation here, but as site owners and as publishers, do we need to pay a lot of attention to all that noise?
Or do you think it’s best to just ignore it? And we’ll see it play out in the SERPs. And that’s where we pay attention to it.
John: That’d be my thoughts. I mean, I guess on my end within AI detection, I probably need to be more focused, but from a portfolio standpoint, I mean, the judge, jury, and executioner is Google for organic traffic.
So, um, I mean, you know, what Google does is what I care about. Not what legislation does, not what Google says, but what actually happens is more what I care about. The rest is all information. I also think the legislation is going to be more focused on, um, the type of AI content that can cause societal harm.
And I think that is more heavily focused on the images and the videos that will come from, um, AI models than text. Text alone, I think, has less of a chance of producing societal harm than voice that can turn into, like, scam calls, political, you know, I think, especially as politicians make the laws, videos of them doing things that they didn’t actually do could be extremely harmful to them.
So I think we’re going to see laws get passed on the other forms of content first, um, before we see it on text. Yeah.
Jared: Kind of the whole screaming baby syndrome, right? You gotta take care of the screaming baby before you can take care of everything else. Yeah. Um, okay.
Let’s talk about AI detection and let’s talk about it from your standpoint. And again, um, uh, there’s so many ways to come at it, but I want to come at it from the voice of the publisher and how AI detection can help. You know, we talked about flags already
that are trending for manual actions. I also mentioned just in general, algorithmically, um, AI sites that tend to rank and then go bust. But on an individual article level, how important is AI detection software to be using? Knowing that you’re a bit biased, make a bit of a case for it.
John: Yeah. So I’m biased. On some of my sites, I use AI and I don’t use AI detection, because I know I’m using AI content. I think there’s a use case for it on those sites. I think a lot of people are happy to pay a writer $100. No one’s happy to pay a writer $100 for an article that they just copied and pasted from ChatGPT.
Whatever side of the fence you sit on, whether it’s “AI content is good to go, Google doesn’t care, just hammer the SERPs with it,” which I think is overaggressive, or “No, I never want AI to touch my site,” we want publishers to be the ones that make that decision, not the writers.
And so that’s where we see AI detection sitting inside the content production ecosystem for publishers: we want publishers to be the ones that decide what content goes on their site and what risks they’re accepting. Everyone wants non-plagiarized, fact-checked content. Whether it’s AI generated or not, that’s their decision.
But we want them, not the writer, to be the one making that decision. So that’s how we view it in the world of site publishers.
Jared: I want to tackle it from two different sides. I’m going to say it out loud so I don’t forget, because I didn’t have time to write it down in my notes.
The first side is just the publisher trying to make sure that they’re getting the handwritten content that they’re paying for. Not that there’s anything wrong, necessarily, with getting AI content, as long as you know you’re paying for AI content, right? So that’s scenario one. Scenario number two would be the publisher, and I hear this a lot, who wants to make their AI-then-human-edited content look less like AI to detection software. So maybe we’ll circle back on that one, and I want to hear your thoughts. But going back to that first one, the publisher who’s hiring writers and wanting to make sure they’re getting the content they’re paying for: the scenario that comes up for people that I hear is false positives.
You know, “Hey, my writer says they wrote it. It’s showing up as AI.” What are some ways to navigate that as it relates to the software and to conversations, either from a tactical standpoint or just from a personal standpoint, when you’ve got a writer you’ve been working with for a while and to some degree have an element of trust with?
John: Yeah. So false positives happen. I’d say that the framework most people are attempting to use AI detection in is the framework of plagiarism. That’s what we’ve used for the last 20 years: plagiarism detection. Does it pass plagiarism or not?
Yes or no. A go/no-go decision. Simple. AI detection is tougher because all AI detectors are probability machines. A detector says: here is the probability that it was AI generated versus the probability that it was human generated. And so although it can be very, very accurate, you know, on non-adversarial prompts it’s 99 percent accurate with a 1 to 2 percent false positive rate, that still means when we’re running thousands of scans a day, we are calling some human-generated content AI generated. And that causes pain. We know that, we hate it, it sucks. We’re trying to reduce it. Tactically, what can you do when you are working with writers, or you are a writer, and you have a false positive or a potential false positive?
We like to work with writers on a series of articles, not just on an individual article-by-article basis. If we have a writer whose content usually hits a 30 or 40 percent probability of AI, and then there’s one that hits a 60 percent probability of AI, and then it drops back down to 30 or 40, and you believe them and you have trust with them, that’s just a false positive.
Carry on. I think that’s the right play in that scenario. If you have a writer that used to have a 0 percent probability of AI-generated content and then switched in a week, and is now getting a hundred percent probability of AI content, that’s probably because they started using AI.
And if you don’t want them using it... yeah, they discovered ChatGPT and said, “Oh, I can do a lot more contracts for this amount.” We have a lot more people talking to us about the number of people that they’ve caught than about the false positives they need to navigate. The other thing: we have a free Chrome extension to help with false positives that recreates a visualization of the creation of a document. So if the writer wrote in a Google Doc, you get editor access to the Google document and then use the free Chrome extension, and it recreates a visualization of the creation process. You know, that can be tricked, but there are a lot easier ways to steal 100 bucks than to go through that entire process. So those are some of the tactical things. We have live support within Originality to try and help people navigate false positives, and that’s greatly reduced the number of... People used to think of a false positive like this: if it says it’s a 25 percent chance of AI, 75 percent chance of human, and they know it’s human written, they would call that a false positive.
It should show up as 100 percent, they’d say. That’s not the way the classifiers work. The detectors say: here is the probability it’s AI versus the probability it’s human. So a result that says it’s a 75 percent chance it’s human correctly identified that article as human generated. And then also, people will use AI and then edit it heavily and then say, “This is a human article.”
It’s like, well, it’s tough. Back to that spectrum: there’s the full human and there’s the full AI, but it gets tricky in between. And that’s a problem that is not yet fully solved. But with transparency, the writers and us as site publishers need to be on the same page on what is allowed and not allowed on our sites.
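The series-of-articles approach described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `review_writer_scores`, the 20-point margin, and the three-article window are all hypothetical choices, not Originality.ai settings or anything the tool exposes. The idea is simply to compare a writer’s recent AI-probability scores against their own earlier baseline.

```python
from statistics import median


def review_writer_scores(scores, spike_margin=0.20, window=3):
    """Compare a writer's recent AI-probability scores to their own baseline.

    scores: AI probabilities in [0, 1], oldest first.
    spike_margin and window are illustrative assumptions, not vendor settings.
    """
    if len(scores) < window + 1:
        return "insufficient_history"

    # Baseline: the writer's typical score before the most recent articles.
    baseline = median(scores[:-window])
    recent = scores[-window:]

    # Recent articles that sit well above the writer's usual level.
    high = [s for s in recent if s - baseline >= spike_margin]

    if len(high) == window:
        # Every recent article is elevated: a sustained change in behavior.
        return "sustained_shift"
    if high:
        # An isolated jump that did not persist: likely a false positive.
        return "one_off_spike"
    return "consistent"


# A writer who usually scores 30 to 40 percent, with a single 60 percent article.
print(review_writer_scores([0.35, 0.30, 0.40, 0.35, 0.60, 0.35, 0.30]))
# A writer who went from roughly 0 percent to roughly 100 percent.
print(review_writer_scores([0.0, 0.02, 0.01, 0.0, 1.0, 0.98, 1.0]))
```

A result of `one_off_spike` corresponds to the “carry on” case, while `sustained_shift` is the case where a conversation with the writer is worth having.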
Jared: Yeah. It’s very complicated in practicality, I find, or in application. I guess it doesn’t need to be, but it can be. My business partner and I have had this discussion a couple of times. The best way we’ve found to liken it is that it’s a bit like the weather report, right? It can say 20 percent chance of rain.
And that doesn’t necessarily mean that it’s not going to rain or that it is going to rain. It just means that two out of 10 times when the algorithm ran the model for today’s weather, it showed up with rain. And it can be a hundred percent chance of rain and only rain for 10 minutes that day, and it still was accurate. Right? Versus it can be a 30 percent chance of rain, but then rain sporadically all day.
And all these scenarios are accurate and exist inside the same prediction. Right?
John: Exactly. And the fact that it didn’t rain once doesn’t mean you’re not going to trust the weather report.
It doesn’t mean you’re not going to bring the umbrella when it says a hundred percent the next day. These things have some amount of accuracy and also some amount of inaccuracy, by the nature of being a predictive machine.
Jared: I’m glad you clarified the percentages there, because I’ve also heard people get this wrong.
When it says 25 percent, that doesn’t mean 25 percent of the article is AI. It means the article has a 25 percent chance of being AI generated, right?
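To make the percentages concrete, here is a minimal Python sketch of the two points made so far: a score is a probability for the whole article, and even a small false positive rate adds up at volume. The 0.5 cutoff, the function names, and the figure of 5,000 human-written scans per day are assumptions chosen for illustration. The conversation only says “thousands of scans a day” and a 1 to 2 percent false positive rate.

```python
def verdict(p_ai, threshold=0.5):
    """Return the call for a whole article given its probability of being AI.

    The 0.5 threshold is an assumed cutoff, not a documented tool setting.
    """
    return "ai" if p_ai >= threshold else "human"


def expected_false_positives(human_scans, false_positive_rate):
    """Expected number of human-written articles that get flagged as AI."""
    return human_scans * false_positive_rate


# A 25 percent AI score means a 75 percent chance of human: the call is "human".
print(verdict(0.25))  # human

# Hypothetical volume: 5,000 human-written articles scanned in a day.
for rate in (0.01, 0.02):
    print(rate, expected_false_positives(5000, rate))
```

At that assumed volume, a 1 to 2 percent rate works out to roughly 50 to 100 human-written articles flagged per day, which is why false positives still cause real pain despite high accuracy.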
John: Yeah, exactly. There’s also a lot of misinformation that has come out. We’ve done a ton of work to try and communicate the limitations of our tool on different data sets.
We’ve run our tool through every publicly available data set so that we can transparently communicate the efficacy, even when those numbers are not where we wish they were. I think there’s a lot of misinformation out there as a result of OpenAI communicating that detectors don’t work, because their detector was so tuned to reducing false positives that it became useless.
And there are other detectors out there that are claiming accuracy rates with no communication of their data set. And it just leads to this world of people saying detectors don’t work, and people saying detectors are perfect and not accepting any article that has any AI in it.
And both of those are wrong. Unfortunately for all of us, we need to navigate a more complex world now.
Jared: Yeah. Another challenge we’ve had a couple of times is when we’re using optimization software. Inevitably, whenever you use optimization software, you’re trying to make your article more like other content, from a density standpoint certainly, but in other ways as well. Theoretically, that’s the content that’s ranking better than you.
Well, in essence, and I haven’t created any software around AI detection, but it stands to reason, you’re making your article look more and more like what’s already on the web, and therefore potentially, and we’ve seen it play out, getting a higher AI detection score.
John: Agreed. Yeah.
So anytime AI is used in the creation of content, it increases the chances that the detector is going to identify it as AI generated. Heavy use of Grammarly and heavy use of SEO optimization tools both lead to an increased likelihood that the content will look like it was AI generated, which potentially is okay and potentially isn’t.
Again, it comes back to that agreement between publisher and writer. We’ve seen some publishers work with writers to say, “Submit your pre-optimized content, and then we know you’re going to go and optimize it.” So they do the AI check at that pre-optimization stage, and then they go and optimize it.
That’s smart.
Jared: Okay. Well, that dovetails nicely. What about that second scenario, where you’ve got people out there who, and I don’t know where on the spectrum they are in terms of how much of their content is AI produced versus human produced, are trying to make their content look less AI? They’re trying, at a very tactical level, to get the AI percentage that comes back from Originality.ai to be lower, right?
So how do you navigate that? How do you talk about that? What do you say to that? If it’s something that you support, what tips do you have for it?
John: Yeah. So I’d say it’s something that we don’t support. I mean, we support people using our tool.
That’s great. But I don’t think it’s useful. We’re not Google. Google will have their own algorithm for identifying whether content was AI generated. As for creating content with AI and then trying to use other AI to bypass detection: the only methods we have seen that achieve that reduce the quality of the content.
In the end, that does not serve the users, and it still leaves fingerprints. We’ve seen no method that is consistently effective at bypassing detection, with the exception of turning the content into absolute gibberish. And so if you know you’ve used AI and you’re comfortable with using AI, I think the energy that you’d put into trying to trick a tool that isn’t Google is better spent finding ways to make that piece of content a net add to the Internet, versus tricking Originality.
If you know you used AI, accept it. You know you’re going to get a high AI score. Publish the best possible piece of content you can, and spend the energy you would have spent on tricking Originality on making the piece of content more useful to the readers, because that’s ultimately what Google and your readers want.
It’s fun to try and trick Originality. I mean, we have a red team, and that’s what they do all day: try and find ways of tricking Originality. And every time they find a way that is marginally effective, we build a data set off the back of that and train our detector on it.
So I get it. It’s fun to game systems. That’s to some extent what a lot of SEO is about. But yeah, I don’t recommend it, because I think it’s an effort that doesn’t lead to any net benefit to anyone.
Jared: Right.
Let’s say you’re someone out there who has an article that’s scoring really high in Originality.ai, for whatever reason. Does doing things like adding unique imagery, putting unique tables in, or pulling in different data sets that you’ve gone and found on your own actually help reduce that score?
Or is that a score that, once you have the base of the article created, is going to trigger and swing that way no matter what?
John: Yes, once you have... So, one of the funny things with AI is that when people ask us, “What triggered this article to be flagged as AI generated?”, the kind of crazy answer is that we don’t know. Our AI sat in the equivalent of a warehouse that had human articles and millions of AI articles.
And it had this giant brain and learned to tell the difference between the two and recognize patterns. We don’t know what all those patterns are that it recognizes. That’s where AI is so powerful. And so once it’s been triggered, it can be very hard to identify what it was that triggered it.
And so all those things that you just talked about, adding unique data, are awesome. I think if you know that it was human created and it got a high AI score, we have our Chrome extension to ensure that it can be communicated to the customer that this was human written, and here’s how you can see that.
And if that is a one-off case for that writer, we would hope that the person purchasing that piece of content would say, “Great. We trust you. Carry on.” And then the rest of that effort is spent adding in all the things that you just talked about that make that piece of content more useful.
Jared: Well, that’s good. I’m really glad we had a conversation around the best practices for using something like Originality.ai, because there’s a lot of confusion both in how to interpret the tool, and we sorted through that, and in the best way to utilize the tool. I think a lot of people will hopefully better understand the tool: where it’s best applied, where it’s not best applied, and where they’re potentially wasting their time trying to modify and adjust things.
And I think you’ve drawn a good line that I want to underscore again: it’s not about AI versus not AI. It’s about having a tool to help you understand what you are and aren’t getting in terms of content creation. It’s not making a judgment on the validity of the content for the internet.
It’s making a judgment on its likelihood of being AI created or not. That’s all.
John: Yeah, exactly. It’s just about providing that. And we talked about the section where we have the plagiarism detection, fact checking, and readability. It’s about letting publishers make sure that they’re able to hit publish with a piece of content that meets the standards they’re trying to achieve for their site.
Jared: Yeah. We talked at the beginning about a lot of companies that are using AI in a way that’s maybe not as open with their audience. But certainly a company that wants to be open with their audience still has to make sure they can actually attain that and actually hit that every single time.
So, yeah, it doesn’t make the news as much. But hey, we’ve got a few more minutes left. I know we talked a lot about your study of the manual penalties, but you and I had gone back and forth about a number of studies that you guys have done: some case studies, some cool results, some cool things.
We probably have about five or 10 minutes. Anything that comes to mind that you think would be fun to close on and share?
John: Yeah. I think what’s interesting is that we’re using our tool a ton to look at where AI content is, I’ll use the word polluting, not necessarily the right word for it, but where is it polluting the Internet?
And what we’ve seen is some really fascinating places. Some of the software review sites, like G2 and TrustRadius, have had up to 30 percent of their reviews since the launch of ChatGPT suspected of being AI generated. So when you’re going online to read a review, you need to complete a Turing test, where basically you’re trying to figure out: is this review that I’m reading a human that I’m interacting with, or an AI?
We’ve also seen review sites where the AI-generated number has gone from about a 2 percent rate before GPT-3, which falls in line with our false positive rate, to climbing up to about 10 percent, and then, when ChatGPT launched, jumping to 30 percent.
And then we’ve seen some sites being able to effectively bring that back down, so some sites are trying to work on reducing that. We’ve also looked at Reddit, one of SEOs’ current favorite whipping boys online, with people complaining about how much organic traffic Reddit gets compared to all our sites.
And we’ve seen a significant increase in the number of posts that are AI generated on Reddit. Even though, I think, the theory around why Reddit and all these user-generated sites have gotten such a lift in Google is in part that Google is trying to prioritize human-first content, and these sites have a decent filter of human versus spam already cooked into them.
So it’s an extra layer of that human-versus-machine filtering on the user-generated sites. So that study, we found, was interesting.
Jared: I guess, what can the individual publisher take from that? Aside from it being fascinating, by the way. I’m fascinated by the whole thing.
John: I think it says that society as a whole hasn’t worked out where it’s okay and not okay to use AI-generated content. I think a lot of us would agree that we don’t want to read a review that was AI generated unless we know that there was a human behind it that reviewed that feedback and communicated it.
What we don’t want is someone telling an AI, “Hey, write a review on this water bottle,” and that’s the review we’re reading when making a purchasing decision. I think that’s bad. We don’t like that. And I think what we’re also seeing is that Google, by prioritizing user-generated sites, is also trying to wrestle with its as yet incomplete ability to manage AI-generated spam.
So I think my takeaway from it is that the world is still wrestling with how we want to live in a generative AI world, and that is not yet finalized. Just because mass-producing AI-generated content is working now doesn’t mean that it’s going to be working in the future.
Jared: It’s very interesting. The whole concept of user generated, if you literally look at the words, is that it’s not AI. And if AI is flooding the UGC platforms, then it almost flies in the face of what people originally wanted.
So they’re going to have a bit of a crux on their hands here pretty soon, especially with some of the data you just shared. John, that was fun. That hour flew by. Where can people catch up with you? You’re very active in this industry, I know, and have been for quite some time, but where can people catch up, follow along, touch base with you, anything like that?
John: Yeah, I’m on X and use it a bit. I’m on LinkedIn and, again, use it a bit. But my main focus right now is on Originality. You can reach out to Jon, J-O-N, at originality.ai. If anyone has any questions related to this, or to best practices around working AI detection into their content creation workflow, I’m happy to chat.
Jared: John, thank you so much. It’s been great to have you on. Welcome back. Thank you again. It’s my first time interviewing you, though, so it definitely has been a couple of years. Thanks again, and we’ll catch up with you again soon.
John: Sounds great. Awesome. Thanks, Jared.