Louis Rosenfeld

Search Analytics for Your Site



Table 2-3 provides even more detail on each field.

      Table 2-3.

http://www.flickr.com/photos/rosenfeldmedia/5826101190/

Details About Fields
Position | Field | Example | Meaning
#1 | IP or host name | XX.XX.XX.14 | ID of the computer sending the search.
#2 | auth. user | - | usually empty, RFC931 authentication
#3 | user name | - | usually empty
#4a | date | [10/Jul/2010 | date of the query in standard form
#4b | time | :10:24:13 | time of the query in standard form
#4c | offset | -0800] | offset time from GMT[a]
#5a | request | “GET | HTTP request method (form action)
#5b | URL | /search.html | search results page URL
#5c | parameters | ?query=noise | search terms and other options
#5d | version | HTTP/1.1” | HTTP version (always the same)
#6 | response code | 200 | server response code (if it’s not 200, you are in trouble)
#7 | bytes | 9249 | bytes returned (the size of the search results HTML page)
#8 (non-standard but widely used) | hit count | 111 | number of matches found[b]
[a] The GMT offset is important because you must have accurate timestamps to look for patterns of usage, such as spikes of traffic at lunchtime. Tracking the time relative to GMT lets analytics systems merge search logs from multiple time zones, which is especially important when adjusting for Daylight Saving Time.

[b] Some search engines return the approximate number of hits, rather than provide a definitive number. This is usually because they are reserving the option to check whether the user has security access to additional documents. If you don’t have confidential documents, you may be able to disable the access check and get a real number.
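To make the field layout in Table 2-3 concrete, here is a minimal Python sketch that splits one log line into those fields and normalizes the timestamp to GMT/UTC, as footnote [a] recommends. It rests on several assumptions: the sample line is assembled from the table's example values (reading the seconds as 13 and the hit count as 111), real logs use straight quotes rather than the typeset curly ones, the month abbreviation is parsed under an English locale, and the names `LOG_PATTERN` and `parse_entry` are hypothetical, not part of any search engine or analytics product.

```python
import re
from datetime import datetime, timezone

# One named group per field in Table 2-3. The hit count (#8) is
# non-standard, so it is treated as optional.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<auth_user>\S+) (?P<user_name>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: (?P<hits>\d+))?'
)


def parse_entry(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Fields #4a-#4c: date, time, and GMT offset form one timestamp.
    local = datetime.strptime(entry["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    # Converting to UTC lets logs from several time zones be merged.
    entry["utc_time"] = local.astimezone(timezone.utc)
    entry["status"] = int(entry["status"])
    entry["hits"] = int(entry["hits"]) if entry["hits"] is not None else None
    return entry


# Sample line assembled from the example column (an assumption, see above).
SAMPLE = ('XX.XX.XX.14 - - [10/Jul/2010:10:24:13 -0800] '
          '"GET /search.html?query=noise HTTP/1.1" 200 9249 111')

entry = parse_entry(SAMPLE)
print(entry["url"], entry["utc_time"].isoformat(), entry["hits"])
```

Storing the UTC time next to the original timestamp keeps both views available: the local time for "lunchtime spike" questions, and the UTC time for merging logs across servers.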

       WHAT EXTENDED LOG ENTRIES LOOK LIKE

      Optional fields can be quite helpful as well. These include the “referer” field (it should be “referrer,” but the spec spelled it wrong, so now we’re stuck with this misspelling), which can offer insights into site navigation problems; the user-agent for recognizing various platforms using the search; and an optional cookie, which is better than IP address for tracking searchers. To conform to other Web log formats, these fields might come before the hit count and time taken fields.

      An extended log entry could look like this (detailed below in Table 2-4):

      Table 2-4.

http://www.flickr.com/photos/rosenfeldmedia/5826101254/

Extended Fields
Position | Field | Example | Meaning
#9 | referer URL | http://search.example.com/search?q=sound | The page that the user was on when he searched: in this case, from a search results page for the query “sound”.
#10 | user-agent | “Mozilla/5.0 (iPhone; U; CPU iPhone OS 2_2 like Mac OS... | The browser or app that sent the query. These are most useful for getting client metrics (especially mobile) and recognizing robot crawlers.
#11 | cookie | “USERID=CustomerA; IMPID=01234” | Cookie for server session (rare).
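As an illustration of what the extended fields can yield once they have been split out of a log line, the following Python sketch pulls the previous query from a referer URL, turns a cookie string into name/value pairs, and applies a crude robot check to a user-agent. The helper names and the `ROBOT_HINTS` list are hypothetical, and the robot check is only a rough heuristic under the assumption that crawlers identify themselves in the user-agent string.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

# Hypothetical substrings that often appear in crawler user-agents.
ROBOT_HINTS = ("bot", "crawler", "spider")


def referer_query(referer, param="q"):
    """Return the search terms carried in a referer URL, or None."""
    values = parse_qs(urlsplit(referer).query).get(param)
    return values[0] if values else None


def cookie_values(cookie_field):
    """Turn a cookie field such as 'A=1; B=2' into a plain dict."""
    jar = SimpleCookie()
    jar.load(cookie_field)
    return {name: morsel.value for name, morsel in jar.items()}


def looks_like_robot(user_agent):
    """Rough check: does the user-agent contain a known crawler hint?"""
    lowered = user_agent.lower()
    return any(hint in lowered for hint in ROBOT_HINTS)


print(referer_query("http://search.example.com/search?q=sound"))
print(cookie_values("USERID=CustomerA; IMPID=01234"))
print(looks_like_robot("ExampleBot/1.0"))  # made-up user-agent
```

The `param` argument defaults to `q` only because that is the code used in the referer example; another search engine may use a different code.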

       SEARCH PARAMETERS

      Most search engines stick to the common format for additional options and settings (such as language) in the search part of the request. The search part starts after the results page URL with a question mark, followed by a code, an equal sign, and a value. Successive code-value pairs are delimited by an ampersand (or comma or semicolon), like this:

      search.html?qq=noise&zone=all

      There’s no standard, so the query parameter might be q, qq, qt, qry, query, w, words, s, st, search, or something else entirely. This, and all the other codes, should be documented by the search vendor or open-source group. (We’ve provided an example below, as well as details in Table 2-5.) You’ll find this information useful if you need to “teach” your analytics application what to look for to identify—and parse out—actual queries from your logs. Here is an example of a query parameter:

      Table 2-5.

http://www.flickr.com/photos/rosenfeldmedia/5826101316/



Query Parameters
Code | Field | Example | Meaning
q | query | q=noise | The search terms, in this case “noise”
l | language | l=fi | The searcher’s language, here it’s Finnish
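The parameter format described above can be sketched in a few lines of Python. This is a minimal illustration of how an analytics script might be "taught" to find the query, not the method of any particular product. The function names are hypothetical, the list of candidate codes is simply the one given in the text, and splitting on commas assumes that values never contain an unencoded comma.

```python
import re
from urllib.parse import unquote_plus

# Candidate query codes listed in the text; a given engine uses only one.
QUERY_CODES = ("q", "qq", "qt", "qry", "query", "w", "words", "s", "st", "search")


def split_parameters(url):
    """Return the code=value pairs that follow the question mark as a dict."""
    _, _, search_part = url.partition("?")
    params = {}
    # Pairs may be delimited by an ampersand, semicolon, or comma.
    for pair in re.split(r"[&;,]", search_part):
        if not pair:
            continue
        code, _, value = pair.partition("=")
        params[code] = unquote_plus(value)  # decode %xx escapes and '+'
    return params


def find_query(url, codes=QUERY_CODES):
    """Return the value of the first known query code, or None."""
    params = split_parameters(url)
    for code in codes:
        if code in params:
            return params[code]
    return None


print(split_parameters("search.html?qq=noise&zone=all"))
print(find_query("search.html?qq=noise&zone=all"))
```

In practice it is safer to pass only the one code documented by your search vendor, because short codes such as `s` or `w` may mean something else on another engine.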