Web statistics are a very important part of website development and marketing. Every day you collect statistics, you gather valuable information to feed into your marketing strategy. There is nothing better than looking at a document full of graphs showing increases in users or traffic; however, reading those graphs and understanding how to use them effectively are two very different things.
When collecting stats, you could install several different statistics systems over the same set of data, but more often than not no two systems will agree exactly, and this is where the biggest problem lies.
There are two main ways of collecting and analysing traffic data to generate reports:
1. Server-Side Reporting
Using the raw log files generated by the web server, it is possible to get an accurate representation of how much traffic a site is pushing, a rough estimate of the number of users on that site, and other information. By default, log files tend to record the address the user is viewing from (their IP address), the request they made (the complete file path), a timestamp, and a string that identifies their browser – its “User Agent”. This user agent is often used to identify bots, or spiders, for reporting, but it can very easily be faked, meaning it is not as reliable as it first seems.
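To make those fields concrete, here is a minimal sketch of pulling them out of one line in Apache's “combined” log format; the sample line and the exact pattern are illustrative assumptions, as real servers can be configured to log different fields.

```python
import re

# Rough pattern for Apache's "combined" log format: client IP, identity,
# user, timestamp, request line, status, size, referrer, and User-Agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Invented sample line for illustration.
sample = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
          '"GET /index.html HTTP/1.1" 200 2326 '
          '"http://example.com/start" "Mozilla/5.0 (Windows NT 10.0)"')

entry = LOG_PATTERN.match(sample).groupdict()
print(entry['ip'])          # the address the user is viewing from
print(entry['request'])     # the request they made
print(entry['user_agent'])  # the "User Agent" string
```

Every report generator starts from a parse like this; the differences between packages come from how they aggregate the parsed entries afterwards.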
There are plenty of software packages available that will process log files into readable reports, but they vary in how they function, so two reports generated from the same set of logs will differ slightly in the final statistics they produce.
This method of log reporting does, however, have some disadvantages:
- Proxied requests will all appear to originate from the same internet address (IP address), so multiple users who share an internet connection could appear as a single user in the logs if they also share a user agent.
- Log files won’t record files loaded directly from the browser cache, as the browser never makes a request back to the server for them. With much of the web now serving dynamic pages that explicitly forbid caching, this isn’t as much of an issue as it once was, but it can still cause inaccuracies in your log reports.
2. Client Side Reporting
Knowing more about the user’s browser is beneficial, as it allows you to target your site’s features to technology that you know is available. For example, nearly every browser on a PC supports Flash, but the browser on an iPhone doesn’t, so if you have a large number of visitors using iPhones, you could supply a more compatible version of the site.
Client based reporting has the massive advantage of being able to track a user’s path through the site completely, start to finish. Some systems, like Google Analytics, take this data and allow you to plot heat maps of where users click through on a site.
Like log reporting, this method has a different set of problems:
- It’s very easy to block, either by disabling the scripting language at the browser end or by blocking the download of the script itself.
- It doesn’t cover every file request. With log reporting, every request made to the server gets its own entry in the logs; with an embedded script, only the “pages” themselves get logged. If a user is directed straight to a resource on your site, say a Word document or a PDF download, it will never show as a visit in client-based logging, whereas it will in server-side logging.
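The last point is easy to demonstrate. In the sketch below, the file names are invented and the page tag is assumed to fire only on HTML pages, which is how script-based trackers typically behave:

```python
# One visit as the server sees it: every request, page or not.
requests = ["/index.html", "/style.css", "/report.pdf", "/about.html"]

server_log = list(requests)  # every request gets a log entry

# Client-side tracking only fires where the tag is embedded -- the pages.
page_tags = [r for r in requests if r.endswith(".html")]

print(len(server_log))  # 4 -- includes the stylesheet and the PDF download
print(len(page_tags))   # 2 -- the PDF never shows in client-based reports
```

A visitor who arrives via a direct link to the PDF would register in the server logs but be completely invisible to the client-side report.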
Are they useful?
Website reports are, and will continue to be, very useful for designing, developing and marketing websites. It is very important to know who you are targeting, how users reach your site, and what they do once they are there. However, it is equally important to understand the limitations of your chosen collection method, and how the report is presented, to avoid building errors into the analysis of your site.
To improve the reliability of your reports, it is a good idea to draw on more than one source of stats; for example, a log-based report used in combination with Google Analytics covers both types of data collection.