ClickStream Analysis Tool
For those of you who want a free, easy clickstream analysis tool, have a look at StatViz. If you’re running Apache with the standard log format, plugging in this tool is very easy.
- Download and install Graphviz. There’s an RPM for Linux.
- Download and install StatViz in a directory. It’s basically one PHP file. The README explains how to customize the configuration file and run it.
- I don’t run many PHP apps, so there are a couple of other things you may need to do. First, you’ll need PEAR::Config. Once you have it, untar it; the easiest thing to do is move Config.php and the Config directory to /usr/share/pear. Second, StatViz uses a lot of memory, so you may need to increase the memory_limit configuration parameter in your /etc/php.ini.
That’s pretty much it.
You can run it using
./statviz.php --config configfile
and then create a GIF from the output with something like
dot -Tgif -o OutputGifFileName InputDotFile
If you put the output GIF in a web-accessible directory, you’ll be able to view it in your browser.
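If you want to script the two steps above, here is a minimal Python sketch that runs them back to back with subprocess. It assumes statviz.php is executable from the current directory and accepts the config option exactly as shown above, and that you already know which dot file your StatViz configuration writes to. The function names and any file names you pass in are placeholders, not StatViz defaults.

```python
import subprocess
from pathlib import Path
from typing import List


def statviz_command(config_file: str) -> List[str]:
    """Argument list for ./statviz.php --config configfile (assumes the script is executable)."""
    return ["./statviz.php", "--config", config_file]


def dot_command(dot_file: str, gif_file: str) -> List[str]:
    """Argument list for dot -Tgif -o OutputGifFileName InputDotFile."""
    return ["dot", "-Tgif", "-o", gif_file, dot_file]


def render(config_file: str, dot_file: str, gif_file: str) -> None:
    """Run StatViz, then render its dot output to a GIF.

    dot_file must match wherever your StatViz configuration writes its
    output; the paths you pass in are placeholders, not StatViz defaults.
    """
    # check=True raises CalledProcessError if a step fails, so dot is
    # never run on a stale dot file after a failed StatViz run.
    subprocess.run(statviz_command(config_file), check=True)
    # Make sure the (ideally web-accessible) output directory exists.
    Path(gif_file).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(dot_command(dot_file, gif_file), check=True)
```

For example, render("statviz.conf", "statviz.dot", "/var/www/html/stats/today.gif") would write the GIF into a web-accessible directory (all three paths are hypothetical). Passing argument lists instead of a shell string avoids quoting problems with odd file names.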
Things To Look For
There are a number of things you’ll need to consider if you want accurate results:
- Review the bot extensions and do your best to filter bot traffic out.
- Make sure all non-page requests (graphics, JavaScript, CSS) are filtered out.
- If possible, filter out requests from internal users. StatViz doesn’t have a filter for this, so I scrubbed them out of the logs myself with grep -v.
- If your site has long URLs, you will most certainly want to clean them up before processing. The tool lets you create an alias file, but you may need or want to do some log scrubbing of your own.
- Play around with the GraphNReferrerPairs parameter. Higher numbers show a lot more detail about site activity, but the graph then becomes much more complex to digest. If you decide on a large graph, you may need to modify the source to change the graph size: it defaults to 10, 8, and there isn’t a parameter to configure it. I changed it to 20, 16 for most of my smaller graphs (lower GraphNReferrerPairs values) and to 40, 32 for larger graphs.
- Very long URLs are a hassle, especially when they come from external referrers outside your control. I added some checks to the code to clip very long URLs.
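Most of the checklist above can be handled by a pre-filter that runs before StatViz ever sees the log. The following is a minimal Python sketch, not StatViz functionality. It assumes Apache’s combined log format, and the bot markers, file extensions, internal address prefixes and 80-character clip length are hypothetical values you would tune for your own site. It drops internal, bot and non-page requests, then strips query strings from the requested path and the referrer and clips very long URLs.

```python
import re
from typing import Iterable, Iterator, Optional

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] '
    r'"\S+ (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical filter settings; tune them for your own site and logs.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")
NON_PAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico", ".css", ".js")
INTERNAL_PREFIXES = ("10.", "192.168.")  # same idea as grep -v on internal IPs
MAX_URL_LEN = 80


def clean_url(url: str) -> str:
    """Drop the query string, then clip URLs longer than MAX_URL_LEN."""
    url = url.split("?", 1)[0]
    return url[:MAX_URL_LEN]


def filter_line(line: str) -> Optional[str]:
    """Return a cleaned log line, or None if the request should be ignored."""
    m = LOG_RE.match(line)
    if m is None:
        return None  # not a combined-format line
    if m["host"].startswith(INTERNAL_PREFIXES):
        return None  # internal user
    if any(marker in m["agent"].lower() for marker in BOT_MARKERS):
        return None  # user agent looks like a bot
    if m["path"].split("?", 1)[0].lower().endswith(NON_PAGE_EXTENSIONS):
        return None  # graphics, JavaScript, CSS
    # Rewrite the referer first: it lies to the right of the path,
    # so the path's original span is still valid afterwards.
    out = line
    for group in ("referer", "path"):
        start, end = m.span(group)
        out = out[:start] + clean_url(m[group]) + out[end:]
    return out


def filter_log(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the cleaned lines worth feeding to StatViz."""
    for line in lines:
        cleaned = filter_line(line)
        if cleaned is not None:
            yield cleaned
```

You could write the cleaned log with something like sys.stdout.writelines(filter_log(open("access_log"))) and point the StatViz configuration at the result (access_log is a placeholder path). Note that stripping query strings also collapses search-engine referrers into one node per engine, which may or may not be what you want.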
I’ve automated a few things on my site:
- A report on today’s activity that updates hourly.
- A daily GIF archive. (I will add weekly and monthly archives in the future.)
- A ‘full report’ showing activity for the last 30 days, which I update daily.
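The automated reports mostly differ in which slice of the log is fed to StatViz. Below is a minimal Python sketch, under the assumption that each report is built by selecting log lines by their Apache timestamp: days=1 gives today’s activity for the hourly report, days=30 gives the 30-day full report, and archive_name produces a dated file name for the daily GIF. The function names and the clickstream- prefix are hypothetical; this illustrates the idea rather than describing the setup above.

```python
import re
from datetime import date, datetime, timedelta
from typing import Iterable, Iterator, Optional

# The first [...] field in a combined-format line is the request timestamp.
TIME_RE = re.compile(r"\[([^\]]+)\]")
# e.g. 10/Oct/2007:13:55:36 -0700 (month names assume the default C locale)
APACHE_TIME_FMT = "%d/%b/%Y:%H:%M:%S %z"


def log_time(line: str) -> Optional[datetime]:
    """Parse the timestamp of a log line, or return None if there isn't one."""
    m = TIME_RE.search(line)
    if m is None:
        return None
    try:
        return datetime.strptime(m.group(1), APACHE_TIME_FMT)
    except ValueError:
        return None


def window_start(now: datetime, days: int) -> datetime:
    """Midnight today for days=1; midnight (days - 1) days earlier otherwise."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=days - 1)


def lines_since(lines: Iterable[str], start: datetime) -> Iterator[str]:
    """Yield log lines stamped at or after start (start must be timezone-aware)."""
    for line in lines:
        t = log_time(line)
        if t is not None and t >= start:
            yield line


def archive_name(day: date, prefix: str = "clickstream") -> str:
    """Dated file name for the daily GIF archive (hypothetical naming scheme)."""
    return f"{prefix}-{day.isoformat()}.gif"
```

A scheduler such as cron could run the hourly and daily jobs, writing the selected lines to a temporary log for StatViz and copying each day’s GIF to archive_name(date.today()). Midnight is computed in the timezone of now, so pass a timezone-aware datetime that matches your server’s log offset.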
I’ll put out another entry with a quick 101 on interpreting the results.