Linked Lexical Knowledge Bases. Iryna Gurevych


on the synset level, i.e., between synsets, such as hypernymy or meronymy. Other sense relations, such as antonymy, are defined between individual senses, rather than between synsets. For example, while evil and unworthy are synonymous (“morally reprehensible” according to WordNet), their antonyms are different; good is the antonym of evil and worthy is the antonym of unworthy.
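The difference between synset-level and sense-level relations can be sketched with a toy data model. This is a minimal illustration, not WordNet's file format or any library's API: the classes `Sense` and `Synset`, the function `are_synonyms`, the synset IDs, and the glosses for good and worthy are hypothetical placeholders. Only the evil/unworthy example comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous senses sharing one definition (toy model)."""
    synset_id: str
    definition: str


@dataclass
class Sense:
    """One lemma in one synset; sense-level relations such as antonymy live here."""
    lemma: str
    synset: Synset
    antonyms: list = field(default_factory=list)


# The evil/unworthy synset follows the text; IDs and the other glosses are placeholders.
reprehensible = Synset("syn-1", "morally reprehensible")
evil = Sense("evil", reprehensible)
unworthy = Sense("unworthy", reprehensible)
good = Sense("good", Synset("syn-2", "placeholder gloss"))
worthy = Sense("worthy", Synset("syn-3", "placeholder gloss"))

# Antonymy links individual senses, so synonyms can have different antonyms.
evil.antonyms.append(good)
unworthy.antonyms.append(worthy)


def are_synonyms(a: Sense, b: Sense) -> bool:
    """Synonymy is implicit: two senses are synonyms iff they share a synset."""
    return a.synset is b.synset


print(are_synonyms(evil, unworthy))          # True
print([s.lemma for s in evil.antonyms])      # ['good']
print([s.lemma for s in unworthy.antonyms])  # ['worthy']
```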

The Princeton WordNet for English [Fellbaum, 1998a] was the first such wordnet. It has since become the most popular wordnet and is today the most widely used LKB. Its creation is psycholinguistically motivated, i.e., it aims to represent real-world concepts and the relations between them as they are commonly perceived. Version 3.0 contains 117,659 synsets. Apart from its richness in sense relations, WordNet also contains coarse information about the syntactic behavior of verbs in the form of sentence frames (e.g., Somebody ----s something).
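A sentence frame is essentially a template with a slot for the verb. The sketch below fills such a template; it assumes the slot is written as four hyphens followed by an optional inflectional suffix, as in the example above. The function name `instantiate_frame` is hypothetical, and the naive suffix handling ignores irregular spellings.

```python
def instantiate_frame(frame: str, verb: str) -> str:
    """Fill the verb slot of a WordNet-style sentence frame.

    Assumes the slot is a run of four hyphens ("----"), optionally followed
    by a suffix such as "s"; the suffix is simply appended to the verb,
    so irregular spellings (e.g. "try" -> "tries") are not handled.
    """
    return frame.replace("----", verb, 1)


for frame in ["Somebody ----s something", "Somebody ----s somebody"]:
    print(instantiate_frame(frame, "kill"))
# Somebody kills something
# Somebody kills somebody
```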

There are various resources based on the Princeton WordNet, such as the eXtended WordNet [Mihalcea and Moldovan, 2001a], in which all open-class words in the sense definitions have been annotated with their WordNet senses to capture further relations between senses; WordNet Domains [Bentivogli et al., 2004], which adds domain labels to senses; and SentiWordNet [Baccianella et al., 2010], which assigns sentiment scores to each WordNet synset.
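In SentiWordNet, each synset receives a positivity, a negativity, and an objectivity score, and the three sum to 1. The sketch below models this, deriving objectivity as the remainder. The synset keys, the scores, and the `polarity` rule (pick the dominant score) are hypothetical illustrations, not values or methods taken from SentiWordNet.

```python
from typing import NamedTuple


class SentiScores(NamedTuple):
    """Positivity and negativity scores for one synset, each in [0, 1]."""
    pos: float
    neg: float

    @property
    def obj(self) -> float:
        # Objectivity is the remainder, so pos + neg + obj == 1.
        return 1.0 - (self.pos + self.neg)


def polarity(scores: SentiScores) -> str:
    """Label a synset by its dominant score (a simple illustrative rule)."""
    best = max(("positive", scores.pos), ("negative", scores.neg),
               ("objective", scores.obj), key=lambda pair: pair[1])
    return best[0]


# Hypothetical synset keys and scores, not actual SentiWordNet values.
lexicon = {
    "good#a#1": SentiScores(pos=0.75, neg=0.0),
    "evil#a#1": SentiScores(pos=0.0, neg=0.625),
    "table#n#1": SentiScores(pos=0.0, neg=0.0),
}

for key, scores in lexicon.items():
    print(key, scores.obj, polarity(scores))
```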

Wordnets in Other Languages

The Princeton WordNet for English inspired the creation of wordnets in many other languages worldwide, and many of them also link their senses to the Princeton WordNet. Examples include the Italian wordnet [Toral et al., 2010a], the Japanese wordnet [Isahara et al.], and the German wordnet GermaNet [Hamp and Feldweg, 1997].⁵
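Linking senses to the Princeton WordNet lets it act as a pivot between languages: senses linked to the same synset are cross-lingual equivalents. The sketch below assumes a simple dictionary of links; the synset IDs, the word pairs, and the function `equivalents` are hypothetical and do not reflect the format of any particular wordnet.

```python
# Hypothetical links from (language, lemma) senses to Princeton WordNet synset IDs.
LINKS = {
    ("en", "dog"): "pwn-dog-n-1",
    ("de", "Hund"): "pwn-dog-n-1",
    ("it", "cane"): "pwn-dog-n-1",
    ("de", "Katze"): "pwn-cat-n-1",
    ("it", "gatto"): "pwn-cat-n-1",
}


def equivalents(lang: str, lemma: str, links: dict = LINKS) -> list:
    """Return senses in any language linked to the same Princeton WordNet synset."""
    pivot = links.get((lang, lemma))
    if pivot is None:
        return []
    return sorted(key for key, synset in links.items()
                  if synset == pivot and key != (lang, lemma))


print(equivalents("de", "Hund"))   # [('en', 'dog'), ('it', 'cane')]
print(equivalents("it", "gatto"))  # [('de', 'Katze')]
```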

Wordnets in other languages often have particular characteristics that distinguish them from the Princeton WordNet. GermaNet, for example, which contains around 70,000 synsets in version 7.0, originally contained very few sense definitions but, unlike most other wordnets, provides detailed information on the syntactic behavior of verbs: for each verb sense, it lists the possible subcat frames, distinguishing more than 200 different types.

In general, however, the Princeton WordNet provides richer information than other wordnets. For example, it includes not only derivational morphological information but also, through its associated tools, inflectional morphological analysis. It also orders the senses of each word by their frequency in the sense-annotated SemCor corpus. This ordering is very useful for word sense disambiguation, as many systems using WordNet rely on it; see also the examples in Chapter 4.
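A frequency-based sense ordering directly yields a simple disambiguation heuristic: choose the most frequent sense. The following is a minimal sketch under the assumption that per-sense corpus counts are available as a dictionary; the sense IDs, the counts, and the function names are hypothetical, not SemCor data.

```python
def order_senses(sense_counts: dict) -> list:
    """Order senses by corpus frequency, most frequent first (ties by sense ID)."""
    return sorted(sense_counts, key=lambda sense: (-sense_counts[sense], sense))


def most_frequent_sense(sense_counts: dict):
    """Pick the top-ranked sense, or None if no counts are available."""
    ranked = order_senses(sense_counts)
    return ranked[0] if ranked else None


# Hypothetical SemCor-style counts for three senses of "bank".
counts = {"bank-n-2": 5, "bank-n-1": 20, "bank-n-3": 5}
print(order_senses(counts))         # ['bank-n-1', 'bank-n-2', 'bank-n-3']
print(most_frequent_sense(counts))  # bank-n-1
```

Because this heuristic ignores context entirely, it is typically used as a baseline or a fallback rather than as a full disambiguation method.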

Information Types

The lexical information types that prevail in wordnets can be summarized as follows.

• Sense definition—Wordnets provide sense definitions at the synset level, i.e., all senses in a synset share the same sense definition.

• Sense examples—These are provided for individual senses.

• Sense relations—Sense relations in wordnets are given either at the synset level or between individual senses.

– Synonymy is a special case, because it is represented via synsets rather than via relations between senses.

– Most other sense relations, e.g., hyponymy, are given at the synset level, i.e., all senses in a synset participate in such a relation.

– Only a few sense relations, e.g., antonymy, are defined between individual senses, because they do not always generalize to all members of a synset.

• Syntactic behavior—The level of detail regarding syntactic behavior varies from wordnet to wordnet. While the Princeton WordNet distinguishes only a few subcat frames, the German wordnet GermaNet distinguishes about 200 very detailed subcat frames.

• Related forms—The Princeton WordNet is rich in information about senses that are related via morphological derivation. Not all wordnets provide this information type.
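Storing definitions and most relations once per synset implies that every sense in the synset inherits them. The sketch below shows this with two toy synsets: a shared-definition lookup, plus the expansion of a synset-level hypernym relation into the sense pairs it implies. The dictionary layout, IDs, glosses, and function names are hypothetical.

```python
from itertools import product

# Toy synsets (hypothetical IDs and glosses): the definition and the hypernym
# relation are stored once per synset, not once per sense.
SYNSETS = {
    "syn-dog": {"members": ["dog", "domestic dog"],
                "definition": "placeholder gloss for dog",
                "hypernyms": ["syn-canine"]},
    "syn-canine": {"members": ["canine", "canid"],
                   "definition": "placeholder gloss for canine",
                   "hypernyms": []},
}


def definition_of(lemma: str) -> list:
    """All senses of a lemma inherit the definition of their synset."""
    return [s["definition"] for s in SYNSETS.values() if lemma in s["members"]]


def sense_pairs(relation: str) -> list:
    """Expand a synset-level relation into the sense pairs it implies."""
    pairs = []
    for synset in SYNSETS.values():
        for target_id in synset[relation]:
            pairs.extend(product(synset["members"], SYNSETS[target_id]["members"]))
    return pairs


print(definition_of("domestic dog"))  # ['placeholder gloss for dog']
print(sense_pairs("hypernyms"))
```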

1.1.2 FRAMENETS

LKBs modeled according to the theory of frame semantics [Fillmore, 1982] focus on word senses that evoke particular scenes or situations, so-called frames, which are schematic representations of these scenes or situations. For instance, the “Killing” frame specifies a scene where “A Killer or Cause causes the death of the Victim.” It can be evoked by verbs such as assassinate, behead, or terminate, or by nouns such as liquidation or massacre.

The participants in these scenes (e.g., “Killer” and “Victim” in the “Killing” frame), as well as other important elements (e.g., “Instrument,” defined as “The device used by the Killer to bring about the death of the Victim,” or “Place,” defined as “The location where the death took place”), constitute the semantic roles of the frame, called frame elements in frame semantics. They are typically realized in a sentence together with the frame-evoking element, as in: Someone[Killer] tried to KILL him[Victim] with a parcel bomb[Instrument].
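An annotated example like the one above can be represented as labeled token spans over the sentence. The sketch below assumes half-open token spans and, as an assumption, lets the Instrument span cover the whole prepositional phrase. The `Span` type and the function `realized` are hypothetical and do not follow FrameNet's actual XML format.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # first token index (inclusive)
    end: int    # last token index (exclusive)
    label: str


tokens = "Someone tried to kill him with a parcel bomb".split()
target = Span(3, 4, "Killing")  # the frame-evoking element
frame_elements = [
    Span(0, 1, "Killer"),
    Span(4, 5, "Victim"),
    Span(5, 9, "Instrument"),  # assumes the span covers the whole PP
]


def realized(tokens: list, spans: list) -> dict:
    """Map each frame element label to the text of its annotated span."""
    return {s.label: " ".join(tokens[s.start:s.end]) for s in spans}


print(target.label, "evoked by", " ".join(tokens[target.start:target.end]))
print(realized(tokens, frame_elements))
# {'Killer': 'Someone', 'Victim': 'him', 'Instrument': 'with a parcel bomb'}
```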

The inventory of semantic roles used in FrameNet is very large and subject to further extension as FrameNet grows. Many semantic roles have frame-specific names, such as the “Killer” semantic role defined in the “Killing” frame.

Frames are the main organizational unit in framenets: each frame contains the senses (represented by their lemmas) that evoke it. Most frame-evoking words are verbs and other predicate-like lexemes. These can naturally be represented by frames, since predicates take arguments that can be characterized both syntactically (e.g., subject, direct object) and semantically via their semantic roles.
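Because frames group the senses that evoke them, finding the frames a word can evoke amounts to inverting a frame-to-senses mapping. The sketch below uses only the Killing frame and the words named in the text, written in a `lemma.pos` style resembling FrameNet's lexical unit names. The dictionary layout and the function `build_lemma_index` are hypothetical.

```python
from collections import defaultdict

# Lexical units for the "Killing" frame mentioned in the text;
# a real framenet contains many frames.
FRAMES = {
    "Killing": ["kill.v", "assassinate.v", "behead.v", "terminate.v",
                "liquidation.n", "massacre.n"],
}


def build_lemma_index(frames: dict) -> dict:
    """Invert frame -> lexical units into (lemma, POS) -> frames.

    A polysemous lemma can evoke several frames, one per sense, so the
    index maps to a set of frame names.
    """
    index = defaultdict(set)
    for frame, units in frames.items():
        for unit in units:
            lemma, _, pos = unit.rpartition(".")
            index[(lemma, pos)].add(frame)
    return index


index = build_lemma_index(FRAMES)
print(index[("behead", "v")])    # {'Killing'}
print(index[("massacre", "n")])  # {'Killing'}
```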

Semantic relations hold between frames (e.g., the “Is_Causative_of” relation between “Killing” and “Death,” or the “Precedes” relation between “Being_born” and “Death” or “Dying”) and also between frame elements.
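Frame-to-frame relations form a labeled directed graph. A minimal sketch, assuming relations are stored as (source, relation, target) triples and indexed in both directions, using only the relations named in the text; the variable names are hypothetical, and frame-element relations could be stored the same way.

```python
from collections import defaultdict

# Frame-to-frame relations mentioned in the text, as (source, relation, target).
FRAME_RELATIONS = [
    ("Killing", "Is_Causative_of", "Death"),
    ("Being_born", "Precedes", "Death"),
]

outgoing = defaultdict(list)
incoming = defaultdict(list)
for source, relation, target in FRAME_RELATIONS:
    outgoing[source].append((relation, target))
    incoming[target].append((relation, source))

print(outgoing["Killing"])  # [('Is_Causative_of', 'Death')]
print(incoming["Death"])    # [('Is_Causative_of', 'Killing'), ('Precedes', 'Being_born')]
```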

The English FrameNet [Baker et al., 1998, Ruppenhofer et al., 2010] was the first frame-semantic LKB and is still the best-known one. Version 1.6 of FrameNet contains 1,205 frames. In FrameNet, senses are called lexical units. FrameNet does not provide explicit information about the syntactic behavior of word senses. However, the sense examples are annotated with syntactic information (FrameNet annotation sets), and subcat frames can be induced from these annotations.
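One simple way to induce subcat frames is to abstract away from the frame elements and keep only the sequence of syntactic realizations in each annotated example. This is a sketch under stated assumptions: each example is a list of (frame element, grammatical function, phrase type) entries modeled loosely on FrameNet's annotation layers. The annotation data and the function `induce_subcat_frames` are hypothetical, not the authors' procedure.

```python
from collections import Counter

# Hypothetical annotation sets for "kill.v": each entry is
# (frame element, grammatical function, phrase type).
ANNOTATION_SETS = [
    [("Killer", "Ext", "NP"), ("Victim", "Obj", "NP"), ("Instrument", "Dep", "PP[with]")],
    [("Killer", "Ext", "NP"), ("Victim", "Obj", "NP")],
    [("Killer", "Ext", "NP"), ("Victim", "Obj", "NP")],
    [("Victim", "Ext", "NP")],
]


def induce_subcat_frames(annotation_sets: list) -> Counter:
    """Drop frame element labels and count the (GF, PT) sequences."""
    frames = Counter()
    for layers in annotation_sets:
        frames[tuple((gf, pt) for _fe, gf, pt in layers)] += 1
    return frames


for frame, count in induce_subcat_frames(ANNOTATION_SETS).most_common():
    print(count, frame)
```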

FrameNet is particularly rich in sense examples, which are selected based on lexicographic criteria, i.e., the sense examples are chosen to illustrate typical syntactic realizations of the frame elements. The sense examples are enriched with annotations of the frame and its elements, and thus provide information about the relative frequencies of the syntactic realizations of a particular frame element. For example, for the verb kill, a noun phrase with the grammatical function object is the most frequently used syntactic realization of the “Victim” role.
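The relative frequencies of a frame element's realizations can be read off the annotated examples by counting. A minimal sketch, assuming annotations are flattened into (frame element, grammatical function, phrase type) triples; the counts below are invented for illustration and are not FrameNet statistics, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical frame element annotations for "kill.v": (FE, GF, phrase type).
ANNOTATIONS = [
    ("Victim", "Obj", "NP"), ("Victim", "Obj", "NP"), ("Victim", "Obj", "NP"),
    ("Victim", "Ext", "NP"),
    ("Killer", "Ext", "NP"), ("Killer", "Ext", "NP"),
]


def realization_distribution(annotations: list, frame_element: str) -> dict:
    """Relative frequency of each (GF, PT) realization of one frame element."""
    counts = Counter((gf, pt) for fe, gf, pt in annotations if fe == frame_element)
    total = sum(counts.values())
    return {realization: n / total for realization, n in counts.most_common()}


print(realization_distribution(ANNOTATIONS, "Victim"))
# {('Obj', 'NP'): 0.75, ('Ext', 'NP'): 0.25}
```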

Framenets in Other Languages

The English FrameNet has spawned the construction of framenets in multiple other languages. For example, there are framenets for Spanish⁶ [Subirats and Sato, 2004], Swedish⁷ [Friberg Heppin and Toporowska Gronostaj, 2012], and Japanese⁸ [Ohara, 2012]. For Danish, there is an ongoing effort to build a framenet based on a large-scale valency LKB that is being manually extended with frame-semantic information [Bick, 2011]. For German, there is SALSA [Burchardt et al., 2006], a corpus annotated with FrameNet frames.

Information Types

The following information types in the English FrameNet are the most salient.

• Sense definition—For individual senses, FrameNet provides sense definitions, either taken from the Concise Oxford Dictionary or created by lexicographers. In addition, each frame has a sense definition, given as a textual description and shared by all senses in that frame.

• Sense examples—FrameNet is particularly rich in sense examples which are selected based on lexicographic criteria.

• Sense relations—FrameNet specifies sense relations on the frame level, i.e., all senses in a frame participate in the relation.

• Predicate argument structure
