We all know that the internet is full of bots, crawlers and spiders that gather data for search engines and for search engine optimisation (SEO) purposes. They are essentially tools that bring information back to be analysed and used in online business planning.
For the most part, site owners are unaware of these robots, as they do not show up in analytics or site data. Website owners and managers can also ask bots to behave in certain ways, such as staying out of particular pages or folders, by using robots.txt, a plain-text file that follows a long-established web standard (the Robots Exclusion Protocol).
Update: More and more information has been coming out about Semalt and security issues. While we are not in a position to research this ourselves, it would be remiss not to let others know that there are concerns beyond Semalt invading Analytics and skewing data. Please read the rest of the information below with this in mind.
Is Semalt.com hurting your site?
Since its launch in January this year, Semalt has made its way around the internet to seemingly every site, big or small, leaving its mark in your analytics. In essence it appears harmless: it's not malicious and there's no spyware (as far as we can tell). And while they claim they are not 'hurting' your site, which they probably aren't, your data is now skewed.
The Semalt crawler does not behave like a normal robot
Normal robots can be asked not to access certain parts of a website through robots.txt. Even without such a directive, they usually don't show up in website analytics, and they certainly don't register as multiple visits a day.
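To make the contrast concrete, here is a minimal sketch of how a well-behaved crawler consults robots.txt before requesting a page, using Python's standard-library `urllib.robotparser`. The robots.txt content, the bot names `ExampleBot` and `GenericCrawler`, and the example.com URLs are all hypothetical. Semalt's crawler does not respond to robots.txt, so it never makes this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /private/,
# and a bot calling itself "ExampleBot" is asked to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """A well-behaved crawler asks this before requesting any URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("GenericCrawler", "https://example.com/blog/"),
        ("GenericCrawler", "https://example.com/private/report.html"),
        ("ExampleBot", "https://example.com/blog/"),
    ]:
        print(agent, url, polite_fetch_allowed(rp, agent, url))
```

Note that robots.txt is only a request: nothing forces a crawler to call `can_fetch` at all.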
Semalt's crawler does not simulate real user behaviour. A real user comes to a site, spends some time there and (usually) visits more than one page. The Semalt crawler spends no time on the site, visits only one page and has a 100% bounce rate (it leaves immediately).
As you can see from the Google Analytics information below, this particular site recorded 50 visits from this crawler in one month. At about 1.67 visits per day, it's clearly not random.
Why is a 100% bounce rate bad?
Bounce rate is the percentage of sessions in which a visitor views a single page and leaves. SEO practitioners who use Google Analytics generally treat it as a measure of how relevant and useful a page is to the user: the lower the bounce rate, the more useful the page is to the potential customer.
Google itself describes bounce rate as a powerful metric. According to a video produced by Google, if we are good at direct marketing our bounce rate will fall between 40 and 60%, and the video lists suggestions for improving it.
The first of these suggestions is to analyse your data… and that data is now skewed.
For comparison, six months ago (before Semalt) this site's sitewide bounce rate was 37.14% for the month; it is now 46.77% for the month.
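Because bounce rate is just bounced sessions divided by total sessions, every bot session that always bounces pushes the figure upward. The minimal Python sketch below illustrates that arithmetic. The function names and session counts are invented for illustration and are not this site's real data.

```python
def bounce_rate(bounced_sessions: int, total_sessions: int) -> float:
    """Bounce rate in percent: single-page sessions divided by all sessions."""
    if total_sessions == 0:
        return 0.0
    return 100.0 * bounced_sessions / total_sessions


def bounce_rate_with_bots(bounced: int, total: int, bot_sessions: int) -> float:
    """Bounce rate after adding bot sessions that all bounce (a 100% bounce rate)."""
    return bounce_rate(bounced + bot_sessions, total + bot_sessions)


# Hypothetical figures, not taken from the site discussed above.
# Small site: 400 human sessions, 150 of them bounced, plus 50 bot sessions.
print(f"Small site, humans only:    {bounce_rate(150, 400):.2f}%")  # → 37.50%
print(f"Small site, with bots:      {bounce_rate_with_bots(150, 400, 50):.2f}%")  # → 44.44%

# Larger site with the same human bounce rate and the same 50 bot sessions.
print(f"Larger site, with bots:     {bounce_rate_with_bots(3750, 10000, 50):.2f}%")  # → 37.81%
```

The same 50 bot sessions lift the small site from 37.50% to 44.44% but barely move the larger one, so the skew is most visible on low-traffic sites.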
Does the Semalt removal tool work?
Because their bot ignores robots.txt, does not behave like a regular crawler and is 'only' skewing your stats, the only real way to stop them is to filter or block their visits. Semalt advertises a removal tool on its website, but in our experience it doesn't work.
I submitted the website below to their removal tool on June 6th. There was one hit on the 9th and nothing until the 16th… then the visits started again. I resubmitted it on the 17th and the 20th. Finally, I added a filter on the 18th of July. So far, so good.
I also recently discussed with @JustinCutroni, Analytics Evangelist at Google, on Twitter what to do about a site that is skewing analytics globally. He said that running a filter is the best option for accurately representing the data being collected.
How can you stop Semalt coming to your site?
Two methods are currently being used to stop Semalt appearing in your stats and to return your data to normal. While they cannot and do not stop Semalt visiting, at least your stats will be accurate.
Before making any changes to your analytics, I recommend you set up a raw data view (an unfiltered copy of your data) along with a Master profile, so you keep control over the insights your data provides.
- Method one is to block Semalt through your .htaccess file on an Apache server (this is for advanced users).
- Method two is to create an exclusion filter in your Google Analytics view.
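Both methods come down to recognising Semalt's referrer. The Python sketch below shows that matching logic under stated assumptions: that Semalt's visits arrive with a referrer hostname of semalt.com or a subdomain of it (for example semalt.semalt.com), and that the same test can be written as a regular expression in a Google Analytics exclusion filter or a referrer-based server rule. `SEMALT_HOST`, `is_semalt_referrer` and `block_semalt` are hypothetical names; the WSGI middleware is only a Python analogue of an .htaccess block, not the .htaccess method itself.

```python
import re
from urllib.parse import urlsplit

# Assumption: Semalt referrers use semalt.com or any subdomain of it.
SEMALT_HOST = re.compile(r"(?:^|\.)semalt\.com$", re.IGNORECASE)


def is_semalt_referrer(referrer: str) -> bool:
    """True if the referrer URL's hostname is semalt.com or one of its subdomains.

    Expects a full URL, as browsers and crawlers send in the Referer header.
    """
    host = urlsplit(referrer).hostname or ""
    return bool(SEMALT_HOST.search(host))


def block_semalt(app):
    """Hypothetical WSGI middleware: answer 403 Forbidden to requests whose
    Referer header points at Semalt, and pass everything else through."""

    def middleware(environ, start_response):
        if is_semalt_referrer(environ.get("HTTP_REFERER", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)

    return middleware
```

Matching on the hostname rather than the whole referrer string avoids blocking an innocent page whose query string merely mentions semalt.com.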
If you run a WordPress site, you can also mark Semalt as a spam referrer.
So while we cannot stop them, and they do seem to be harmless, you can at least regain control over what you see in your analytics and have accurate information at hand for business planning and decisions about your website.
Have you any other methods that work or other information? Let us know below.