My hosting company is running two different web stats programs on the log files for this site – Webalizer and Awstats.
Same log files – you’d think they’d come up with broadly similar results – not a chance!
In the Webalizer stats I am wildly popular:
Ok, maybe not wildly popular but when you compare the figures with the Awstats results, you can see I am far less popular according to Awstats:
For instance, look at the number of Visits for the month of July (the most recent month for which we have all the data). According to Webalizer this site had 135,352 visits in July whereas Awstats only has 30,035! Look at the number of pages served in July. Webalizer has this site serving 333,461 whereas Awstats has it only serving 178,395.
How can such enormous differences be accounted for? Which one do I believe? I honestly have no idea!
By the way, the peak in May/June has to do with the Web 2.0 controversy!
Tom,
why don’t you install Google’s free web analytics?
You could then add up all three and calculate the average
I guess the algorithms used to compute “visits” from “hits” vary. There really should be a standard algorithm to do that…
Is one of those apps the de-facto standard for traffic comparisons? Surely advertisers would require a particular visit-computation algorithm as a key feature?
I’ve been struggling with log analysis software for years and used to cringe when my boss asked why they were so different.
As for the visits, that could be simple. One package may treat a visit as ending after 15 minutes of inactivity, while the other uses a 30-minute timeout.
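For what it’s worth, here’s a rough Python sketch of that idea (my own illustration with made-up timestamps, not how either package is actually implemented): the very same hits give different visit counts depending only on the inactivity timeout you pick.

```python
from datetime import datetime, timedelta

def count_visits(hits, timeout_minutes):
    """Count visits: a new visit starts when the same IP has been idle longer than the timeout."""
    last_seen = {}          # ip -> timestamp of that ip's previous hit
    visits = 0
    for ip, ts in sorted(hits, key=lambda h: h[1]):
        prev = last_seen.get(ip)
        if prev is None or ts - prev > timedelta(minutes=timeout_minutes):
            visits += 1     # idle gap exceeded the timeout, so count a new visit
        last_seen[ip] = ts
    return visits

# Toy log: one IP hitting the site at 0, 20 and 50 minutes past the hour.
t0 = datetime(2006, 7, 1, 12, 0)
hits = [("1.2.3.4", t0),
        ("1.2.3.4", t0 + timedelta(minutes=20)),
        ("1.2.3.4", t0 + timedelta(minutes=50))]

print(count_visits(hits, 15))   # 3 visits with a 15-minute timeout
print(count_visits(hits, 30))   # 1 visit with a 30-minute timeout
```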
Thanks for the suggestion Lar. I had Analytics on the site before but I could never make head or tail of half their graphs! I’ll have another look though.
On the advertising front, the only advertiser is Google and their ad service is based on the number of times their ad is clicked on. Because they are hosting the ad, they know how often it is clicked.
Donncha, Matt used to give you a hard time over stats, did he?
That may account for the differing Visits, Donncha, but not the Page Views, which are also poles apart.
Sorry if this is a dup – I’m having some problems with the comment form.
I think this is because Webalizer doesn’t distinguish between human visitors and automated requests (bots etc.), whereas AWStats *tries* to distinguish between the two.
If you compare the two packages to Google Analytics you will have even fewer visits again.
Still, I wouldn’t mind stats like those!
tom – the differences are weird, but i’ve always believed that the most important aspect of log analysis is trend, and not actual numbers.
different engines work differently. awstats, i think, removes a certain amount of robot traffic, which is why it is usually lower than other stats packages on all values.
on some installations of webalizer i’ve been faced with, i’ve noticed that it doesn’t always know what a page actually is – on one site, pages were defined as .*htm*, meaning .php pages weren’t counted as pages at all.
personally, my preference is always to download the log files and run a copy of webtrends i have over them. there is a degree more configurability with that than online stats packages offer.
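To illustrate the page-definition point fmk makes above, here is a minimal Python sketch (my own, with hypothetical request paths, not Webalizer’s actual logic): the same requests produce different page counts depending purely on which file extensions a package treats as “pages”.

```python
# Hypothetical request paths pulled from a log.
requests = ["/index.htm", "/about.html", "/blog/post.php",
            "/blog/post.php", "/images/logo.gif", "/style.css"]

def count_pages(requests, page_extensions):
    """Count only the requests whose extension is on the 'page' list."""
    return sum(1 for path in requests
               if path.rsplit(".", 1)[-1] in page_extensions)

print(count_pages(requests, {"htm", "html"}))          # 2: .php not treated as a page
print(count_pages(requests, {"htm", "html", "php"}))   # 4: .php requests now count
```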
Thanks fmk,
I have a copy of Analog – I might try downloading the raw logs and running them through Analog for yet another view on the stats (!).
Or maybe I’ll just watch the trends, as you suggest. Actual numbers aren’t that important to me anyway and vary with which package you ask in any case!
I would trust AWStats to deliver a picture of the real people visiting, because the gigabyte transfer it documents is more closely related to the page weight transferred when whole pages load.
I shift around 7GB of transfer each month to serve around 38,000 monthly visitors. I use a combination of Six Apart and Statcounter monitoring to keep the two tracking methods honest.
Tom
If you look at the summary in Awstats you should see a section at the very top with “Viewed traffic”, i.e. humans, and “Not viewed traffic”, i.e. robots and other non-humans.
Webalizer doesn’t even try to differentiate between the two and just gives you a total.
For January of 2006, for example, my blog did 20.24 GB of “human” traffic while it did a further 164.31 GB of bot and other non-human traffic.
Michele
I’ve been using Webtrends for years, though they charge an arm and a leg for their software (PLUS the latest release needed outrageously expensive hardware to run on [mind you – the analyzing software, NOT the webserver!]). Webtrends does a pretty good job of differentiating between hits and visits. They take a split approach, setting cookies and tracking IPs. The results SEEM pretty straightforward. However, they charge you based on traffic, and once I crossed the first threshold (and needed to pay 600 EUR for the upgrade plus 80 EUR for the mandatory maintenance) it seems I cross each new traffic threshold faster and faster. Honi soit qui mal y pense…
However, you seem to have missed a paradigm shift. As somebody mentioned before, it’s not about hits anymore, it’s about conversions. Hits – especially page hits rather than visits – are 20th-century stuff to impress suits and get a larger budget. It’s all about humans and patterns nowadays. Have a look at Clicktracks – they even have a free version nowadays. They not only help you streamline your pages, but also do a great job of identifying visitor trends. You might even want to look at Visitorville. Rather expensive too (last time I looked at it), but it adds a fun element and does a pretty good job of showing patterns too.
I would be reluctant to use Google Analytics. Call me paranoid, but Google already knows enough about me and my sites. Thankfully there are plenty of alternatives.
I agree about trust and Google Analytics. Something about Google Analytics feels like yet another piece of my data belonging to a Google data silo.
I too get wonderful numbers with Webalizer. I use StatCounter for key blogs to identify trends.
Now, the question is where can we get the real analysis? I have been using Google Analytics for quite some time now and have pretty much trusted its results.
At my company I have to switch from a Windows web server to Linux, which is OK with me on the one hand.
On the other hand I have problems with awstats. With the same log file (imported), the same log format, the same awstats.conf and everything else identical, it shows far fewer visitors and visits (roughly 40% of the numbers under Windows).
The other numbers (pages, hits, bandwidth) are exactly the same, which tells me the log files are being recognized properly.
I have to admit that on SuSE I use version 6.6; on Windows it had been 6.5. That can’t be such a big difference.
Has anyone experienced the SAME weird thing, or does anyone have any advice or guesses about what I can do?
I’ve just found out that the DNSLookup setting was different (1 vs 2); now the numbers are (more or less) the same again.
Another question:
If a domain includes a port number (for whatever reason) and I can see this URL in “Links from an external page (other web sites except search engines)” with about 100,000 pages and 1,000,000 hits (total numbers are 230,000 and 2,200,000) for one month… can it be that one request is counted twice?
1.) the URL with the port number (which isn’t really external)
2.) the redirection without the port number
We’ve already gotten rid of the port number… but the inflated stats for the earlier months keep on confusing me.
I have been using Webalizer for years, awstats for the last two years, and Google Analytics for the last year. Wouldn’t Analytics be more accurate since it tracks users via a script?
Personally I prefer the webalizer stats because they are higher…….
Dan
I’d stay away from Webalizer, although it does have a major log analysis speed advantage over the more accurate Awstats solution.
However, accuracy beats speed, at least for most sites.
why do webalizer and google analytics have different results?
these stats give me different numbers, and i can’t understand why?!
One reason why Google’s counts may be a little lower is the placement of the tracking code on your pages. Google suggests placing it just before the closing </body> tag. A partial page load, a click to another page before the script can contact Google, or a slow connection could all cause some missed page views.
Check out the books Web Analytics Demystified by Eric Peterson and Web Analytics by Jim Sterne. They are each pioneers and titans in the field of web analytics, and these books are fantastic.
As for why different analytics packages don’t match up — they really don’t want to. They need to differentiate in order to prove they’re better or at least a better value. All need to make certain assumptions about what should count as what, and what should not. These assumptions are part of the differentiation — still, they are assumptions. Some really do more things better than others (data correlations, display, ease of use, etc.), but no package I’ve seen or used couldn’t be augmented by features of another.
However, in the past 8 years, I’ve used almost everything from analog to omniture in helping to improve client websites. They’re all tools, they all have limitations, they all are more or less appropriate for specific websites, and all can be very useful when used properly. (Though “very useful” may be a bit generous for some…)
Script tracking is not better or more accurate than log file analysis — it’s just different; more appropriate in some situations, less so in others, sometimes it’s ideal to have both.
IMO – The bottom line is that only trends and conversions matter. Trends give you indication of what’s going on and conversions are truly the only hard facts. Hits, visits, page views, etc. can only be approximations, regardless of assumptions, because not every visit to your website happens on your webserver — that’s the Net.
As for Google Analytics — believe me, I understand the reluctance of giving your data away, especially to the people who set the prices you pay for advertising. Still, Google Analytics is on par with other packages that are worth $1000’s per month — it is far and away more powerful than any other “free” analytics tool out there, and much better than a lot of commercial ones, too. Plus, chances are, your data is not so much different from your competitors who may be benefiting tremendously from use of Google’s tool.
And finally, none of these tools will do anything for your business — at best, they can tell you what’s happening on your website, but you still will need to figure out what that actually means and what to do about it. Graphs can only give you clues and look pretty – whether you pay for them or not.
I have experienced exactly the same problems, but it’s not always awstats that reports the lower numbers; sometimes it’s the reverse.
Anyway, I agree with one of the first comments: use Google Analytics and take the average.
I DO NOT like WebAlizer.
Main reasons:
1) Not very accurate stats
2) Referrer spam showing up through the Webalizer reports
Refspam is popular in my country, and if it happens often enough a site running WebAlizer can effectively be DDoSed.
That’s what I think.
While this post is old and the comments on it date back to 2006, it is still found by Google as relevant when it comes to comparing at least these two packages.
The 2 cents I wanted to put in has to do with the differences between server-side stats (awstats, webalizer, and analog) and script-based stats (Google Analytics). Web server software like IIS and Apache keeps a log of the IP addresses that request information (html files, php files, image files… whatever) and stores that information in log files. Server-side stats read those log files and crunch them into graphs and information that is more easily understood by us humans. The server-side packages differ from one another because, as explained above, each one has made its own somewhat arbitrary decisions about what counts as relevant and what doesn’t.
On the other hand are script-based statistics, like Google Analytics, which put a special JavaScript snippet on the pages of your site; if a page doesn’t get that script, no data can be gathered for it. But there is a large advantage to script-based stats in that languages like JavaScript only run inside a browser. Spiders and search engines that pull your site do not execute JavaScript and therefore do not count towards a “visit” or a “hit”. This means the data in Google Analytics can be more relevant, because you know real eyeballs had to hit that page: a real browser with JavaScript turned on had to open the page and execute the script for it to be counted. Server-side stats show all requests, including those from search engine spiders and the like.
While packages like AWStats try to differentiate real people from non-people, the issue is that server-side stats somehow have to keep a list of which IP addresses spiders come from, or which user-agent strings belong to a spider rather than some new browser like Google’s new Chrome browser. If a new spider appeared and the server-side stats didn’t know it was a spider, how would they classify its requests?
A script-based stats package, on the other hand, would not see a spider hit, since spiders hardly ever execute JavaScript while browsers do, and so it gets a more accurate count of real eyeballs looking at the page.
Hope that helps some people understand the differences.
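To make the spider-detection problem described above concrete, here is a minimal Python sketch (my own; the bot list, log lines and regex are illustrative, not AWStats’ actual robot database): it filters an Apache-style combined log by user agent, and any crawler not on the known-bot list slips through and gets counted as a “human” hit.

```python
import re

# Tokens that identify *known* crawlers; a brand-new spider won't be on this list.
KNOWN_BOTS = ("googlebot", "slurp", "msnbot", "bingbot")

# Very loose pattern for an Apache "combined" log line:
#   IP ... "request" status size "referer" "user-agent"
LOG_RE = re.compile(r'^(\S+) .* "(?:[^"]*)" \d+ \S+ "[^"]*" "([^"]*)"')

def human_hits(lines):
    """Yield (ip, user_agent) for lines whose user agent doesn't match a known bot."""
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ip, agent = m.group(1), m.group(2)
        if not any(bot in agent.lower() for bot in KNOWN_BOTS):
            yield ip, agent

sample = [
    '1.2.3.4 - - [01/Jul/2006:12:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '66.249.65.1 - - [01/Jul/2006:12:00:05 +0000] "GET / HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '5.6.7.8 - - [01/Jul/2006:12:00:10 +0000] "GET / HTTP/1.1" 200 5120 "-" "ShinyNewSpider/0.1"',
]
print(list(human_hits(sample)))  # the unknown spider is still counted as "human"
```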
I would think that if you want to be completely in charge of the data, you would need to write your own scripts to generate your own series of amalgamated data. Depending on someone else’s apps has some trade-offs despite the convenience…
Alternatively, you can install a third app to see which two are closer to one another, in order to eliminate the bad seed (that is, the most out-of-whack one).
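Purely as an illustration of the “third app as tie-breaker” idea above (a sketch of my own; the Analytics figure is hypothetical, while the other two are the July numbers from the post), you could compare the three visit counts, keep the pair that agree most closely, and flag the remaining package as the likely bad seed.

```python
from itertools import combinations

# Visit counts from three packages: the Analytics figure is hypothetical.
counts = {"webalizer": 135352, "awstats": 30035, "analytics": 27500}

# Find the pair of packages whose counts are closest to each other.
closest_pair = min(combinations(counts, 2),
                   key=lambda pair: abs(counts[pair[0]] - counts[pair[1]]))

outlier = (set(counts) - set(closest_pair)).pop()
consensus = sum(counts[name] for name in closest_pair) / 2

print(f"closest pair: {closest_pair}, consensus ~ {consensus:.0f}")
print(f"probable bad seed: {outlier}")
```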
I agree with kotoponus. I made a personal script to collect data for my customers’ websites and I use Google Analytics just to make sure that everything is OK.
Good post.
Google Analytics collects statistics ONLY for the Google search engine, while AWStats collects statistics for all search engines (at least in theory!).