2-3 provides even more detail on each field.
Table 2-3.
Details About Fields (http://www.flickr.com/photos/rosenfeldmedia/5826101190/) | |||
---|---|---|---|
Position | Field | Example | Meaning |
#1 | IP or host name | XX.XX.XX.14 | ID of the computer sending the search. |
#2 | auth. user | - | usually empty, RFC931 authentication |
#3 | user name | - | usually empty |
#4a | date | [10/Jul/2010 | date of the query in standard form |
#4b | time | :10:24:13 | time of the query in standard form |
#4c | offset | -0800] | offset time from GMT[a] |
#5a | request | “GET | HTTP request method (form action) |
#5b | URL | /search.html | search results page URL |
#5c | parameters | ?query=noise | search terms and other options |
#5d | version | HTTP/1.1” | version (always the same) |
#6 | response code | 200 | server response code (if it’s not 200, you are in trouble) |
#7 | bytes | 9249 | bytes returned (the size of the search results HTML page) |
#8 (non-standard but widely used) | hit count | 111 | number of matches found[b] |
[a] The GMT offset is important because you must have accurate timestamps to look for patterns of usage, such as spikes of traffic at lunchtime. Tracking the time relative to GMT lets analytics systems merge search logs from multiple time zones, which is especially important when adjusting for Daylight Saving Time. |
[b] Some search engines return the approximate number of hits, rather than provide a definitive number. This is usually because they are reserving the option to check whether the user has security access to additional documents. If you don’t have confidential documents, you may be able to disable the access check and get a real number. |
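As a rough illustration of how the fields in Table 2-3 line up in an actual entry, the following Python sketch splits one log line into named fields and normalizes the timestamp to GMT (UTC). The regular expression, the function name `parse_entry`, and the sample line (assembled from the Example column of the table) are assumptions made for illustration; real logs vary, so treat this as a starting point rather than a definitive parser.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern for the fields in Table 2-3. The trailing hit count
# (field #8) is non-standard, so it is treated as optional here.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<auth_user>\S+) (?P<user_name>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: (?P<hits>\d+))?'
)


def parse_entry(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # %b assumes English month abbreviations (the default C locale).
    local = datetime.strptime(entry["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    # Converting to UTC lets entries from different time zones be merged.
    entry["utc"] = local.astimezone(timezone.utc)
    entry["status"] = int(entry["status"])
    entry["hits"] = int(entry["hits"]) if entry["hits"] else None
    return entry


# Sample line assembled from the example values in Table 2-3.
sample = ('XX.XX.XX.14 - - [10/Jul/2010:10:24:13 -0800] '
          '"GET /search.html?query=noise HTTP/1.1" 200 9249 111')
entry = parse_entry(sample)
print(entry["host"], entry["status"], entry["hits"])
print(entry["utc"].isoformat())  # 2010-07-10T18:24:13+00:00
```

The -0800 offset moves the 10:24 local time to 18:24 GMT, which is the kind of normalization footnote [a] describes.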
WHAT EXTENDED LOG ENTRIES LOOK LIKE
Optional fields can be quite helpful as well. These include the “referer” field (it should be “referrer,” but the spec spelled it wrong, so now we’re stuck with this misspelling), which can offer insights into site navigation problems; the user-agent for recognizing various platforms using the search; and an optional cookie, which is better than IP address for tracking searchers. To conform to other Web log formats, these fields might come before the hit count and time taken fields.
An extended log entry could look like this (detailed below in Table 2-4):
XX.XX.XX.14 - - [10/Jul/2010:10:24:13 -0800] "GET /search?q=noise HTTP/1.1" 200 9429 111 0028 "http://search.example.com/search?q=sound" "Mozilla/5.0 (iPhone; U; CPU iPhone OS 2_2 like Mac OS X; en-us) AppleWebKit/525.18.1 (KHTML, like Gecko) Version/3.1.1 Mobile/5G77 Safari/525.20" "USERID=CustomerACooke;IMPID=01234"
Table 2-4.
Extended Fields (http://www.flickr.com/photos/rosenfeldmedia/5826101254/) | |||
---|---|---|---|
Position | Field | Example | Meaning |
#9 | referer URL | http://search.example.com/search?q=sound | The page the user was on when searching; in this case, a search results page for the query “sound”. |
#10 | user-agent | “Mozilla/5.0 (iPhone; U; CPU iPhone OS 2_2 like Mac OS... | The browser or app that sent the query. These are most useful for getting client metrics (especially mobile) and recognizing robot crawlers. |
#11 | cookie | “USERID=CustomerA; IMPID=01234” | Cookie for server session (rare). |
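To make the extended fields concrete, here is a minimal Python sketch that pulls the referer, user-agent, and cookie off the end of an entry and then reads the previous query out of the referer. The pattern, the function name `parse_extended_tail`, and the sample line (built from the example values in Table 2-4) are assumptions for illustration; it also assumes the three fields are double-quoted and appear last on the line, which may not hold for every server configuration.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical pattern: three double-quoted fields at the end of the line,
# corresponding to positions #9 to #11 in Table 2-4.
EXTENDED_TAIL = re.compile(
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" "(?P<cookie>[^"]*)"\s*$'
)


def parse_extended_tail(line):
    """Return referer, user-agent, and cookie (as a dict), or None if absent."""
    match = EXTENDED_TAIL.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Turn "NAME=value; NAME=value" into a dictionary of cookie values.
    fields["cookie"] = dict(
        pair.strip().split("=", 1)
        for pair in fields["cookie"].split(";")
        if "=" in pair
    )
    return fields


# Sample line assembled from the example values in Table 2-4.
sample = (
    'XX.XX.XX.14 - - [10/Jul/2010:10:24:13 -0800] '
    '"GET /search?q=noise HTTP/1.1" 200 9429 111 0028 '
    '"http://search.example.com/search?q=sound" '
    '"Mozilla/5.0 (iPhone; U; CPU iPhone OS 2_2 like Mac OS X; en-us) '
    'AppleWebKit/525.18.1 (KHTML, like Gecko) Version/3.1.1 Mobile/5G77 '
    'Safari/525.20" '
    '"USERID=CustomerA; IMPID=01234"'
)
fields = parse_extended_tail(sample)

# The referer shows what the searcher tried just before this query.
previous_query = parse_qs(urlsplit(fields["referer"]).query).get("q", [None])[0]
is_iphone = "iPhone" in fields["user_agent"]
print(previous_query, is_iphone, fields["cookie"]["USERID"])
```

Pairing the previous query ("sound") with the current one ("noise") is one way the referer can hint at how searchers refine their queries.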
SEARCH PARAMETERS
Most search engines stick to the common format for passing additional options and settings (such as language) in the search part of the request. The parameters start after the results page URL with a question mark. Each one is a code followed by an equal sign followed by a value, and successive parameters are delimited by an ampersand (or comma or semicolon), like this:
search.html?qq=noise&zone=all
There’s no standard, so the query parameter might be q, qq, qt, qry, query, w, words, s, st, search, or something else entirely. This, and all the other codes, should be documented by the search vendor or open-source group. (We’ve provided an example below, as well as details in Table 2-5.) You’ll find this information useful if you need to “teach” your analytics application what to look for to identify, and parse out, actual queries from your logs. Here is an example of a search request with several parameters:
search?q=noise&l=fi&s=21&p=20&v=housewares&i=1
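One way to “teach” a script where the query lives is sketched below in Python, using the standard library's `urllib.parse`. The function name `extract_query` and the idea of checking candidate names in a fixed order are assumptions for illustration; the sketch also assumes ampersand-delimited parameters. In practice you would configure the single parameter name your search engine documents.

```python
from urllib.parse import urlsplit, parse_qs

# Candidate names taken from the list in the text. Order matters when a
# request contains more than one of them (for example, both q and s).
QUERY_PARAM_NAMES = ["q", "qq", "qt", "qry", "query", "w", "words", "s", "st", "search"]


def extract_query(target, names=QUERY_PARAM_NAMES):
    """Return the search terms from a request target, or None if not found."""
    # parse_qs splits on "&", decodes percent-escapes, and turns "+" into a space.
    params = parse_qs(urlsplit(target).query)
    for name in names:
        if name in params:
            return params[name][0]
    return None


print(extract_query("search?q=noise&l=fi&s=21&p=20&v=housewares&i=1"))  # noise
print(extract_query("search.html?qq=noise&zone=all"))                   # noise
print(extract_query("search.html?zone=all"))                            # None
```

Note that the first request also contains an `s` parameter, which is on the candidate list; checking `q` first avoids mistaking it for the query, which is why the vendor's documentation of each code matters.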
Table 2-5.
Query Parameters (http://www.flickr.com/photos/rosenfeldmedia/5826101316/) | |||
---|---|---|---|
Code | Field | Example | Meaning |
q | query | q=noise | The search terms, in this case “noise” |
l | language | l=fi | The searcher’s language; here, Finnish |