Jannik Strötgen

Domain-Sensitive Temporal Tagging


March 2015 (“Tuesday” and “March”, respectively), and they thus satisfy the information need.

      Figure 1.4: Temporal information retrieval example. Given the query 〈“Germanwings”, “1st of March 2015 to 30th of April 2015”〉, both documents can be identified as relevant if a temporal tagger is used to extract and normalize the temporal expressions in the documents’ content.

      A further interesting observation from Figure 1.4 is that the term “Tuesday” in the first document refers to a date within the time interval of interest (March 24, 2015) while the same term in the second document does not (here, it refers to November 10, 2015).
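
      This observation can be sketched in a few lines of Python. The sketch is only an illustration under stated assumptions, not the method of any particular temporal tagger: it assumes that an underspecified weekday refers to its most recent occurrence on or before the document creation time, and the two creation dates are hypothetical values, since the text does not state them. The function names are likewise chosen for illustration.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def resolve_weekday(expression, creation_date):
    """Map a weekday name to its most recent occurrence on or before creation_date.

    Assumption: the expression refers to the past, as is common in news text.
    """
    target = WEEKDAYS.index(expression.lower())
    days_back = (creation_date.weekday() - target) % 7
    return creation_date - timedelta(days=days_back)

def in_interval(day, start, end):
    """Check whether a normalized date lies within the query's time interval."""
    return start <= day <= end

# Time interval of interest from the query in Figure 1.4.
query_start, query_end = date(2015, 3, 1), date(2015, 4, 30)

# Hypothetical document creation times (not given in the text).
doc1_created = date(2015, 3, 25)
doc2_created = date(2015, 11, 11)

tuesday_doc1 = resolve_weekday("Tuesday", doc1_created)
tuesday_doc2 = resolve_weekday("Tuesday", doc2_created)

print(tuesday_doc1, in_interval(tuesday_doc1, query_start, query_end))
print(tuesday_doc2, in_interval(tuesday_doc2, query_start, query_end))
```

      With these assumed creation dates, the same surface term “Tuesday” is normalized to two different dates, and only the first one falls into the queried interval.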

       TEMPORAL TAGGING FOR QUESTION ANSWERING

      A further area in which time is a crucial dimension is question answering. While this is one commonality with information retrieval, the two tasks share further aspects: in both areas, a user is faced with an information need, and the goal of both information retrieval and question answering is to satisfy this information need. The main difference between them is that in information retrieval, the information need is typically formulated as a query consisting of keywords (possibly enriched with time intervals of interest, as in temporal information retrieval), whereas in question answering, the information need is formulated as a natural language question. Analogously, the presentation of results also differs: in information retrieval, a ranked list of relevant documents is typically presented to the user, while in question answering, the answer to the information need is provided directly.

      On the border between both areas lies so-called entity-oriented search [Balog et al., 2012]. A typical information retrieval query is to ask for a specific entity or fact about an entity. Thus, the goal of entity-oriented search is—as in question answering—to directly provide an answer, in the ideal case together with a justification, e.g., in the form of small text nuggets rather than full-length documents [Pasca, 2008]. An example of such a query with a temporal dimension is the query “Golden Gate bridge built” with the answer “1937”.

      A research competition dealing with temporal (and geographic) information needs is NTCIR GeoTime [Gey et al., 2010, 2011]. As in question answering, the information needs are formulated as natural language questions. Due to the temporal and geographic focus of the competition, the questions contain “where” and “when” aspects. However, unlike in standard question answering, systems are not evaluated based on whether they provide the correct answer, but on whether or not the documents ranked in a result list answer the question and are thus relevant. That is, the evaluation is performed in an information retrieval fashion.

      In contrast to entity-oriented search and GeoTime, which both benefit directly from extracted and normalized temporal expressions, time-related question answering often deals with more complex temporal phenomena [Pustejovsky et al., 2005]. In such cases, temporal tagging alone is not sufficient, and temporal reasoning is often necessary, for example, to answer questions of the form “did event x happen before event y?”. Automatically answering such questions requires the full task of temporal information extraction, including the subtasks of temporal tagging, event extraction, and temporal relation extraction. In the recent QA TempEval challenge at SemEval 2015, for which temporal information extraction systems were to be developed, the systems were evaluated solely on how well they answer such time-related questions for which temporal reasoning is important [Llorens et al., 2015]. In Chapter 3, we will detail how temporal taggers can be evaluated in general.
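
      The “before” questions mentioned above can be illustrated with a deliberately simplified sketch. It assumes that events have already been extracted and anchored to normalized time intervals, which is exactly the part that requires event extraction and temporal relation extraction in practice. The events, their dates, and the function name are hypothetical.

```python
from datetime import date

def happened_before(interval_x, interval_y):
    """Compare two events given as (start, end) date intervals.

    Returns True if x ends before y starts, False if y ends before x starts,
    and None if the intervals overlap, so the order cannot be determined.
    """
    x_start, x_end = interval_x
    y_start, y_end = interval_y
    if x_end < y_start:
        return True
    if y_end < x_start:
        return False
    return None

# Hypothetical events with normalized time intervals (illustrative values only).
event_x = (date(2015, 3, 24), date(2015, 3, 24))  # anchored to a single day
event_y = (date(2015, 4, 1), date(2015, 4, 30))   # anchored to "April 2015"
event_z = (date(2015, 3, 1), date(2015, 3, 31))   # anchored to "March 2015"

print(happened_before(event_x, event_y))
print(happened_before(event_x, event_z))
```

      The third return value matters: when an event is only known at month granularity, the normalized values alone may not determine the order, which is one reason why temporal reasoning over extracted relations is needed.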

       TEMPORAL TAGGING FOR SUMMARIZATION

      While the value of temporal tagging for the above examples is quite straightforward, there are further application scenarios in which temporal tagging provides more indirect benefits. One such scenario is document summarization.

      In the text summarization community, it is well known that coreference resolution helps to create better summaries [Azzam et al., 1999, Steinberger et al., 2007]. Similar to coreference relations between (proper) nouns and pronouns, the relations between temporal expressions could also be taken into account to improve summaries. Assume that the document to be summarized contains the following two consecutive sentences:

      • s1 = 〈In 2010, something unimportant happened.〉

      • s2 = 〈One year later, something important happened.〉

      Obviously, a good summary should contain the important information; in our example, s2 should be part of the summary, but s1 should not. However, without proper context, the meaning of s2 is unclear because “One year later” is a relative expression. To fully understand s2, the reader requires a reference time to resolve this relative temporal expression. Unfortunately, the reference time is given only in s1 (“2010”).

      One way to address this issue is to include both sentences in the summary. However, the summary then contains unimportant content. A better approach is to exploit the information provided by a temporal tagger, namely that “One year later” in s2 refers to 2011. In this way, the unimportant sentence s1 can be skipped, and s2 can become part of the summary in a slightly modified form, for instance, starting with “One year later (2011), something important happened”. Note that even the first solution requires some information about the temporal expressions that occur, namely that s1 contains the reference time of s2.
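
      The second solution can be sketched as follows. This is a minimal illustration for the two example sentences only, assuming that the reference time is the four-digit year found in s1 and that the relative expression has the form “N year(s) later”. The helper functions are hypothetical and do not correspond to the interface of an actual temporal tagger.

```python
import re

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

def resolve_years_later(expression, reference_year):
    """Resolve expressions such as 'One year later' against a reference year.

    Returns the resolved year, or None if the expression is not understood.
    """
    match = re.fullmatch(r"(\w+) years? later", expression.strip(), re.IGNORECASE)
    if match is None:
        return None
    token = match.group(1).lower()
    offset = NUMBER_WORDS.get(token)
    if offset is None and token.isdigit():
        offset = int(token)
    return None if offset is None else reference_year + offset

def annotate(sentence, expression, value):
    """Add the normalized value in parentheses after the first occurrence."""
    return sentence.replace(expression, f"{expression} ({value})", 1)

s1 = "In 2010, something unimportant happened."
s2 = "One year later, something important happened."

# Assumption: the reference time is the first four-digit year in s1.
reference_year = int(re.search(r"\b(\d{4})\b", s1).group(1))
resolved = resolve_years_later("One year later", reference_year)
summary_sentence = annotate(s2, "One year later", resolved)

print(summary_sentence)
```

      The summary then needs only the annotated version of s2, while s1 is used solely as the source of the reference time.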

      In the context of temporal tagging, two tasks can be distinguished: the extraction and the normalization of temporal expressions. In several NLP-related research areas, and thus in many applications, the output of a temporal tagger can be exploited to improve existing approaches. Note that for almost all applications and research topics that exploit temporal information, the normalization subtask is crucial.

      1 Timeline: Cross-Document Event Ordering, http://alt.qcri.org/semeval2015/task4/ [last accessed: Nov 9, 2015].

      2 Question Answering TempEval, http://alt.qcri.org/semeval2015/task5/ [last accessed: Nov 9, 2015].

      3 Clinical TempEval, http://alt.qcri.org/semeval2015/task6/ [last accessed: Nov 9, 2015].

      4 Diachronic Text Evaluation, http://alt.qcri.org/semeval2015/task7/ [last accessed: Nov 9, 2015].

      CHAPTER 2

       The Concept of Time

      In the previous chapter, we implicitly relied on some characteristics of temporal information to explain the motivating examples. Now, we formulate the key characteristics of temporal information precisely (Section 2.1). Then, we highlight the differences between the types of temporal expressions that occur in textual documents (Section 2.2) and analyze their possible textual realizations (Section 2.3).

      There are three key characteristics of temporal information that make this kind of information highly valuable for many search and exploration tasks. They can be formulated as follows [Alonso et al., 2011].

       TEMPORAL INFORMATION