An even greater expansion of these semantic regions can be found in more recent work (Huth et al., 2016).
So how does an encoding model work? The model uses linear regression to map from a vector representation of a word to the intensity of a single voxel measured during an fMRI scan (and representing the activity in a bit of brain). This approach can be generalized to fit multiple voxels (representing the whole brain) and trained on a subset of word embeddings and brain scans, before being tested on unseen data in order to evaluate the model’s ability to generalize beyond the words it was trained on.

But what do vector representation and word embedding mean? This field is rather technical and jargon rich, but the key ideas are relatively easy to grasp. Vector representations, or word embeddings, represent each word by a vector, effectively a list of numbers. Similarly, brain states can be quantified by vectors, or lists of numbers, that represent the amount of activity seen in each voxel. Once we have these vectors, using linear regression methods to identify relationships that map one onto the other is mathematically quite straightforward.

So the maths is not difficult and the brain activity vectors are measurable by experiment, but how do we obtain suitable vector representations for each word that we are interested in? Let us assume a vocabulary of exactly four words:
1 airplane
2 boat
3 celery
4 strawberry
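Before turning to the encodings themselves, the regression step described above can be sketched in code. This is a minimal illustration, not a real analysis: the three-number embeddings and the single voxel’s responses below are invented numbers, where a real experiment would use learned embeddings and measured fMRI data.

```python
import numpy as np

# Hypothetical 3-number embeddings for the four words (invented for illustration).
embeddings = np.array([
    [1.0, 0.2, 0.0],   # airplane
    [0.9, 0.3, 0.1],   # boat
    [0.1, 0.8, 0.9],   # celery
    [0.0, 0.9, 1.0],   # strawberry
])

# Simulated activity of one voxel while each word was presented.
# In a real study these values would come from an fMRI scan.
voxel = np.array([0.8, 0.7, 0.3, 0.2])

# Linear regression: find weights w such that embeddings @ w ≈ voxel.
w, *_ = np.linalg.lstsq(embeddings, voxel, rcond=None)

# Predict the voxel's response to each word from its embedding.
predicted = embeddings @ w
print(predicted.round(2))
```

Fitting one such weight vector per voxel generalizes this from a single voxel to the whole brain, and evaluating the predictions on words held out of the fit tests how well the model generalizes.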
One way to encode each of these as a list of numbers is to simply assign one number to each word: ‘airplane’ = [1], ‘boat’ = [2], ‘celery’ = [3], and ‘strawberry’ = [4]. We have enclosed the numbers in square brackets to mean that these are lists. Note that it is possible to have only one item in a list. A good thing about this encoding of the words, as lists of numbers, is that the resulting lists are short and easy to decode: we only have to look them up in our memory or in a table. But this encoding does not do a very good job of capturing the differences in meanings between the words. For example, ‘airplane’ and ‘boat’ are both manufactured vehicles that you could ride inside, whereas ‘celery’ and ‘strawberry’ are both edible parts of plants. A more involved semantic coding might make use of all of these descriptive features to produce the following representations.
In Table 3.1, a 1 has been placed under a semantic description if the word along the row satisfies it. For example, an airplane is manufactured, so the first number in its list is 1, but ‘celery,’ even if grown by humans, is not manufactured, so the first number in its list is 0. The full list for the word ‘boat’ is [1, 1, 1, 0, 0], which is five numbers long. Is this a good encoding? It is certainly longer than the previous encoding (boat = [2]), and, unlike the previous encoding, it no longer distinguishes ‘airplane’ from ‘boat’ (both have the identical five-number code [1, 1, 1, 0, 0]). Finally, the codes are redundant in the sense that, as far as a linear-regression model is concerned, representing the word ‘boat’ as [1, 1, 1, 0, 0] is no more expressive than representing it as [1, 0]. Still, we might prefer the more verbose listing, since we can interpret the meaning of each number, and we can solve the problem of ‘airplane’ not differing from ‘boat’ by adding another number to the list. That is, if we represented the words with six-number lists, then ‘airplane’ and ‘boat’ could be distinguished: airplane = [1, 1, 1, 0, 0, 0] and boat = [1, 1, 1, 0, 0, 1]. Now the last number of ‘airplane’ is a 0 and the last number of ‘boat’ is a 1.
Table 3.1 Semantic‐field encodings for four words.
Word | Manufactured | Vehicle | Ride inside | Edible | Plant part |
---|---|---|---|---|---|
airplane | 1 | 1 | 1 | 0 | 0 |
boat | 1 | 1 | 1 | 0 | 0 |
celery | 0 | 0 | 0 | 1 | 1 |
strawberry | 0 | 0 | 0 | 1 | 1 |
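The point about identical codes can be checked directly. The lists below simply transcribe Table 3.1; the sixth number is the extra feature discussed above, and its interpretation (something like “travels on water”) is our own hypothetical label.

```python
# Semantic-field embeddings transcribed from Table 3.1:
# [manufactured, vehicle, ride inside, edible, plant part]
embeddings = {
    "airplane":   [1, 1, 1, 0, 0],
    "boat":       [1, 1, 1, 0, 0],
    "celery":     [0, 0, 0, 1, 1],
    "strawberry": [0, 0, 0, 1, 1],
}

# The five-number codes fail to distinguish 'airplane' from 'boat'.
print(embeddings["airplane"] == embeddings["boat"])  # True

# Adding a sixth number (a hypothetical feature such as "travels on water")
# separates the two words, as in the text.
extended = {
    "airplane": embeddings["airplane"] + [0],
    "boat":     embeddings["boat"] + [1],
}
print(extended["airplane"] == extended["boat"])  # False
```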
So far, our example may seem tedious and somewhat arbitrary: we had to come up with attributes such as “manufactured” or “edible” and then judge their merit as semantic feature dimensions without any obvious objective criteria. However, there are many ways to search for word embeddings automatically, without needing to dream up a large set of semantic fields. An incrementally more complex way is to rely on the context words that each of our target words occurs with in a corpus of sentences. Consider a corpus that contains exactly four sentences.
1 The boy rode on the airplane.
2 The boy also rode on the boat.
3 The celery tasted good.
4 The strawberry tasted better.
Our target words are, again, ‘airplane,’ ‘boat,’ ‘celery,’ and ‘strawberry.’ The context words are ‘also,’ ‘better,’ ‘boy,’ ‘good,’ ‘on,’ ‘rode,’ ‘tasted,’ and ‘the’ (ignoring capitalization). If we create a table with target words in rows and context words in columns, we can count how many times each context word occurs in a sentence with each target word. This produces a new set of word embeddings (Table 3.2).
Unlike the previous semantic-field embeddings, which were constructed using our “expert opinions,” these context-word embeddings were learned from data (a corpus of four sentences). Learning a set of word embeddings from data can be very powerful. Indeed, the procedure can be automated, and even a modest computer can process very large corpora of text to produce embeddings for hundreds of thousands of words in seconds. Another strength of creating word embeddings like these is that the procedure is not limited to concrete nouns, since context words can be found for any target word – whether an abstract noun, verb, or even a function word. You may be wondering how context words are able to represent meaning, but notice that words with similar meanings are bound to co-occur with similar context words. For example, an ‘airplane’ and a ‘boat’ are both vehicles that you ride in, so they will both occur quite frequently in sentences with the word ‘rode’; however, one will rarely find sentences that contain both ‘celery’ and ‘rode.’ Compared to ‘airplane’ and ‘boat,’ ‘celery’ is more likely to occur in sentences containing the word ‘tasted.’ As the English phonetician Firth (1957, p. 11) wrote: “You shall know a word by the company it keeps.”
Table 3.2 Context-word encodings of four words.
Word | also | better | boy | good | on | rode | tasted | the |
---|---|---|---|---|---|---|---|---|
airplane | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 2 |
boat | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 2 |
celery | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
strawberry | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
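The counting procedure behind Table 3.2 can be automated. The sketch below builds the co-occurrence counts from the four-sentence corpus, ignoring capitalization and punctuation; the variable names are our own.

```python
import re
from collections import Counter

corpus = [
    "The boy rode on the airplane.",
    "The boy also rode on the boat.",
    "The celery tasted good.",
    "The strawberry tasted better.",
]

targets = ["airplane", "boat", "celery", "strawberry"]
context = ["also", "better", "boy", "good", "on", "rode", "tasted", "the"]

# For each target word, count the context words in every sentence
# that contains the target.
counts = {}
for target in targets:
    row = Counter()
    for sentence in corpus:
        words = re.findall(r"[a-z]+", sentence.lower())
        if target in words:
            row.update(w for w in words if w in context)
    counts[target] = [row[w] for w in context]

for target in targets:
    print(target, counts[target])
```

Running this reproduces the rows of Table 3.2 – for instance, ‘airplane’ co-occurs twice with ‘the’ and once each with ‘boy,’ ‘rode,’ and ‘on.’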