analyzing at all. That said, it still won’t hurt to poke into even a small data set, given that...
Lack of tools: ...the price of analytics tools is coming down. Way down. Like, free, thanks to Google Analytics (though you won’t be able to use a hosted service for your intranet). It’s not perfect, but it’s pretty useful, especially given the price. And if you’re working with simple data, Excel will do in a pinch.
But these barriers to taking advantage of SSA don’t explain why it’s still something of an unknown in most circles. So why has SSA fallen through the cracks?
Who Is Responsible for SSA?
Frankly, in smaller, less advanced organizations, SSA receives little or no attention. It’s just one of a few dozen non-urgent aspects of maintaining a Web presence—like meeting accessibility standards or keeping content fresh—that often gets pushed aside as assorted fires get put out. And when it does get done in these settings, it’s by a webmaster who already wears nine other hats.
In more advanced settings, where there are entire business units devoted to web analytics and user research, SSA still falls through the cracks. That’s because when SSA comes up, it seems just different enough from each unit’s existing daily responsibilities that each unit assumes it’s someone else’s job. Why is that? It comes down to what people are comfortable with, and usually we’re comfortable with the familiar.
For example, web analytics people tend to prefer analyzing “cleaner” types of data, like conversion data, that have a clearer impact on the bottom line. (Monitor the Web Analytics Forum Yahoo Group for a week or two, and you’ll see what we mean.)[6] The successful conversion of a search is far more difficult to determine, much less measure, as language (and therefore searching) is so ambiguous. So, in a sense, the semantic richness of search query data is a double-edged sword: while the data might be quite interesting, it can be relatively difficult to analyze.
User experience people, on the other hand, tend to be less comfortable with numbers in general and data analysis in particular. They more typically rely upon qualitative analyses, where there are fewer expectations of conclusive, measurable outcomes and more is open to interpretation. And they may assume that analyzing data requires sophisticated expertise in statistical analysis. So, for UX people, SSA is usually on someone else’s table.
Let’s face it, in most situations today, SSA is no one’s job, but it should belong to someone (hence this book). Whatever your perspective (whether you’re a web analytics expert, a UX researcher, or a wearer of nine hats), you’ll want to have a clear picture of your site’s most common queries and how well your site is performing. And you’ll want to have that clear picture this month, next month, next season, and next year. Seeing SSA as part of your ongoing work (for example, 5% of your normal week) rather than as a one-off project (for example, a 12-hour assignment) will enable you to continually improve your site and make sure that it keeps up with the changes in its environment. The world around it changes, and like a living organism, your site must change as well in order to survive and thrive. And don’t lose sleep over when during the process (research, design, development, or maintenance) you tackle SSA. You’ll glean small but good things at each point, none of which will likely take you off on a radical tangent.
Finally, if you’re one of those wearers of many hats, don’t fret: as mentioned earlier, SSA scales wonderfully. Even if you spend 15 minutes per month looking over the simplest reports—the most frequent queries list and the null results query list—you’ll get something useful out of your analysis. This month’s 15 minutes of tuning can gently grow to 30 minutes next month, and so on. The work is the same—it will fill whatever time you can make or justify for it.
[6] http://tech.groups.yahoo.com/group/webanalytics/
Your Secret Weapon
Thank your lucky stars: SSA remains safely under the radar. No one owns it, and the people in most organizations who are closest to it—the IT folks who manage the search engine—aren’t likely to worry much about things like user intent. So if you can crack open the data, you (and your organization) will own the keys to a very powerful secret weapon. Read ahead.
Anatomy of a Search Log Entry
Avi Rappoport, Search Tools Consulting— http://searchtools.com/
Though most of us are now using analytics applications that provide some SSA reporting functionality, you may be in a situation where you’ll have to create your own reports, either because the analytics application doesn’t support your specific needs or because you don’t have access to an analytics application. In both cases, you’ll need to process the data yourself.
Working with search engine transaction logs, you’ll find the search query, any search parameters (such as language or date), and the number of matches retrieved by the search engine. Most also contain the date and time, and some kind of searcher identifier. Understanding the format makes it easier to understand search analytics reports, recognize what they can and can’t tell you, and perform special processing for unusual questions.
Many search engines conform to the NCSA extended Web server log format,[7] so that’s what we’ll cover here. These text files list their fields in a standard order, separated by spaces. A field that contains internal spaces is enclosed in double quotes or square brackets.
However, there’s no place in the NCSA extended format for the hit count (the number of items matched in the search), so search engines tend to slide it in the middle or hang it off the end. If your search log format is not documented, you may need to do some sleuthing: you can figure this out by entering several unique searches that you know will generate no matches, and then looking in the search log for those terms. The field that reads 0 on each of those lines is the hit count.
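The sleuthing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the probe terms (`zzqxprobe1`, `zzqxprobe2`), and the second and third sample log lines are all made up for the example, and your own log may quote or order its fields differently.

```python
import re

# A field is a [bracketed] run, a "quoted" run, or a run of non-space characters.
FIELD_RE = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def split_fields(line):
    """Split one log line into fields, keeping quoted/bracketed fields whole."""
    return FIELD_RE.findall(line)


def candidate_hit_positions(log_lines, probe_terms):
    """Return the 1-based field positions that read "0" on every probe line.

    probe_terms are searches you entered yourself and know match nothing.
    """
    candidates = None
    for line in log_lines:
        if not any(term in line for term in probe_terms):
            continue  # not one of our probe searches
        fields = split_fields(line)
        zeros = {pos for pos, value in enumerate(fields, start=1) if value == "0"}
        candidates = zeros if candidates is None else candidates & zeros
    return sorted(candidates or [])


# Hypothetical log: the two probe lines are invented for illustration.
sample_log = [
    'XX.XX.XX.14 - - [10/Jul/2010:10:24:13 -0800] "GET /search?q=noise HTTP/1.1" 200 9429 111',
    'XX.XX.XX.14 - - [10/Jul/2010:10:25:02 -0800] "GET /search?q=zzqxprobe1 HTTP/1.1" 200 512 0',
    'XX.XX.XX.14 - - [10/Jul/2010:10:25:40 -0800] "GET /search?q=zzqxprobe2 HTTP/1.1" 200 530 0',
]

positions = candidate_hit_positions(sample_log, ["zzqxprobe1", "zzqxprobe2"])
print(positions)
```

If more than one position comes back, run a few more probe searches, or compare against a search whose match count you already know.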
BASIC FIELDS
A simple query entry in this log format looks like this:
XX.XX.XX.14 - - [10/Jul/2010:10:24:13 -0800] "GET /search?q=noise HTTP/1.1" 200 9429 111
We can break that down into fields for better analysis, as shown in Table 2-2.
Table 2-2. Fields by position
http://www.flickr.com/photos/rosenfeldmedia/5826101122/
position | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 |
---|---|---|---|---|---|---|---|---|
meaning | ip | - | - | date/timestamp | search request | response code | bytes | hits |
example | XX.XX.XX.14 | - | - | [10/Jul/2010:10:24:13 -0800] | "GET /search?q=noise HTTP/1.1" | 200 | 9429 | 111 |
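If you need to process entries like this yourself, the breakdown in Table 2-2 might be sketched in Python as follows. Treat it as a starting point, not a definitive parser: it assumes the hit count is the last field (as in the sample entry) and that the search term travels in a `q` parameter. The names `parse_entry`, `ident`, and `user` are our own labels; the last two stand for the fields shown as `-` in positions #2 and #3.

```python
import re
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

# One named group per field in Table 2-2; assumes hits is the final field.
ENTRY_RE = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) (?P<hits>\d+)$'
)


def parse_entry(line, query_param="q"):
    """Parse one search log line into a dict, or return None if it doesn't fit."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # %b expects English month abbreviations under the default locale.
    entry["timestamp"] = datetime.strptime(
        entry["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    # The request field reads "METHOD URL PROTOCOL"; the query is in the URL.
    parts = entry["request"].split()
    url = parts[1] if len(parts) >= 2 else ""
    params = parse_qs(urlsplit(url).query)
    entry["query"] = params.get(query_param, [""])[0]
    entry["status"] = int(entry["status"])
    entry["bytes"] = None if entry["bytes"] == "-" else int(entry["bytes"])
    entry["hits"] = int(entry["hits"])
    return entry


entry = parse_entry(
    'XX.XX.XX.14 - - [10/Jul/2010:10:24:13 -0800] '
    '"GET /search?q=noise HTTP/1.1" 200 9429 111'
)
print(entry["query"], entry["hits"])
```

Returning `None` for lines that don't match lets you count and inspect the misfits separately, which is often the quickest way to learn where your engine's format departs from this one.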