
2” is more relevant to “Doc 2.” Identifying latent concepts thus improves the accuracy of feature categorization.

      For the LDA model, the number of topics K has to be fixed a priori. LDA assumes the following generative process for a document w = (w1, …, wN) of a corpus D, where the document contains N words drawn from a vocabulary of V distinct terms, wi ϵ {1, …, V} for all i ϵ {1, …, N}. LDA consists of the following steps [12] (a minimal code sketch of this process appears after the notation list below):

      (1) For each topic k, draw a distribution over words Φ(k) ~ Dir(β).

      (2) For each document d:
      (a) Draw a vector of topic proportions θ(d) ~ Dir(α).
      (b) For each word i:
      (i) Draw a topic assignment zd,i ~ Mult(θd), zd,i ϵ {1, …, K}.
      (ii) Draw a word wd,i ~ Mult(Φzd,i), wd,i ϵ {1, …, V}.

      where α is a Dirichlet prior on the per-document topic distribution, and β is a Dirichlet prior on the per-topic word distribution. Let θtd be the probability of topic t for document d, zd,i be the topic assignment of word i in document d, and Φtw be the probability of word w in topic t. The probability of generating word w in document d is:

      P(w | d) = ∑t θtd · Φtw, with the sum taken over topics t = 1, …, K

      where:

       D—number of documents

       N—number of words or terms

       K—number of topics

       α—a Dirichlet prior on the per-document topic distribution

       β—a Dirichlet prior on the per-topic word distribution

       θtd—probability of topic t for document d

       Φtw—probability of word w in topic t

       zd,i—topic assigned to the i-th word of document d

       wd,i—the i-th word of document d

       C—correlation between the terms
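
      To make the generative steps concrete, the following is a minimal sketch of the process in Python/NumPy. The values chosen for K, V, D, N, α, and β are illustrative placeholders, not parameters from the chapter.

```python
# A minimal sketch of the LDA generative process; all numeric values
# below are illustrative assumptions, not the chapter's settings.
import numpy as np

rng = np.random.default_rng(0)
K, V, D, N = 3, 10, 2, 8       # topics, vocabulary size, documents, words per document
alpha, beta = 0.1, 0.01        # Dirichlet hyper-parameters

# (1) Per-topic word distributions: Phi[k] ~ Dir(beta)
Phi = rng.dirichlet(np.full(V, beta), size=K)

for d in range(D):
    # (2a) Per-document topic proportions: theta(d) ~ Dir(alpha)
    theta = rng.dirichlet(np.full(K, alpha))
    for i in range(N):
        z = rng.choice(K, p=theta)     # (2b-i) topic assignment z_{d,i}
        w = rng.choice(V, p=Phi[z])    # (2b-ii) word w_{d,i}
        print(f"doc {d}, word {i}: topic {z}, term {w}")
```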

Schematic illustration of the plate notation of the CFSLDA model.

      LDA associates documents with a set of topics, where each topic is a distribution over words. Using the LDA model, each word is generated by first selecting a random topic from the K topics, then choosing a random word from that topic's distribution over the V vocabulary terms. The hidden variables θ and Φ are determined by fitting the LDA model to the corpus documents. The CFSLDA model uses Gibbs sampling to perform the topic modeling of text documents. Given values for the Gibbs sampler settings (b, n, iter), the LDA hyper-parameters (α, β, and K), and the term-document (TD) matrix M, the Gibbs sampler produces n random observations from the inferred posterior distribution of θ and Φ [60].
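
      The following is a minimal sketch of a generic collapsed Gibbs sampler for plain LDA in Python/NumPy, intended only to illustrate the sampling loop. The burn-in/thinning settings (b, iter) are simplified to a single iteration count, and the CFSLDA-specific context-feature step is not reproduced; this is not the authors' implementation.

```python
# A minimal collapsed Gibbs sampler for plain LDA (illustrative sketch).
import numpy as np

def gibbs_lda(docs, V, K, alpha, beta, iters=200, rng=None):
    rng = rng or np.random.default_rng(0)
    D = len(docs)
    ndk = np.zeros((D, K))   # topic counts per document
    nkw = np.zeros((K, V))   # word counts per topic
    nk = np.zeros(K)         # total word count per topic
    z = [rng.integers(K, size=len(doc)) for doc in docs]  # random init

    for d, doc in enumerate(docs):   # seed the count tables
        for i, w in enumerate(doc):
            ndk[d, z[d][i]] += 1; nkw[z[d][i], w] += 1; nk[z[d][i]] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]          # remove the current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # full conditional p(z_{d,i} = k | rest), up to a constant
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    # posterior point estimates of theta and Phi from the final counts
    theta = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + K * alpha)
    Phi = (nkw + beta) / (nk[:, None] + V * beta)
    return theta, Phi

# Example: two tiny "documents" over a 5-term vocabulary.
theta, Phi = gibbs_lda([[0, 1, 1, 2], [3, 4, 3, 2]], V=5, K=2,
                       alpha=0.1, beta=0.01)
```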


      Ontology development includes approaches such as Formal Concept Analysis (FCA) and Ontology learning. FCA applies a user-driven, step-by-step methodology for creating domain models, whereas Ontology learning refers to the task of automatically creating a domain Ontology by extracting concepts and relations from a given data set [27]. This chapter focuses on building an automatic semantic indexer for online product/service reviews using Ontology. The representation of documents is semantically and contextually enriched using the Context Feature Selection LDA (CFSLDA) topic modeling technique. Search queries then yield more relevant results, thereby increasing the recall value [26, 27].
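
      As one way to picture this enrichment step, the sketch below maps each document to its most probable topics and attaches those topics' top terms to the document's index entry. The θ, Φ, vocabulary, and field names here are hypothetical illustrations, not the chapter's data.

```python
# A minimal sketch of topic-based semantic enrichment of a document index.
import numpy as np

def semantic_index(theta, Phi, vocab, top_topics=2, top_terms=5):
    """Map each document to its strongest topics and those topics' top terms."""
    index = {}
    for d, topic_probs in enumerate(theta):
        entry = []
        for k in np.argsort(topic_probs)[::-1][:top_topics]:
            terms = [vocab[w] for w in np.argsort(Phi[k])[::-1][:top_terms]]
            entry.append({"topic": int(k), "weight": float(topic_probs[k]),
                          "terms": terms})
        index[f"doc{d}"] = entry   # topic terms enrich the document's index entry
    return index

# Example with a tiny hand-made model (2 documents, 2 topics, 4 terms).
theta = np.array([[0.9, 0.1], [0.2, 0.8]])
Phi = np.array([[0.4, 0.4, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]])
vocab = ["battery", "screen", "service", "delivery"]
print(semantic_index(theta, Phi, vocab, top_terms=2))
```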

Schematic illustration of the ontology-based semantic indexing (OnSI) model.

      The semantic indexing module includes topic mapping and term indexing. The Ontology development module populates the Ontology with these terms and their weights (LDA weights). The OnSI evaluation module evaluates the built Ontology.
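
      A minimal sketch of the Ontology-population step using rdflib is shown below; the namespace and the Topic/Term/hasTerm/ldaWeight names are illustrative assumptions, not the authors' OnSI schema.

```python
# A minimal sketch of populating an ontology with topic terms and their
# LDA weights; all class and property names are hypothetical.
from rdflib import Graph, Literal, Namespace, RDF, URIRef
from rdflib.namespace import XSD

ONSI = Namespace("http://example.org/onsi#")   # illustrative namespace
g = Graph()
g.bind("onsi", ONSI)

def add_topic_term(g, topic_id, term, weight):
    topic = URIRef(ONSI[f"topic{topic_id}"])
    term_node = URIRef(ONSI[term])
    g.add((topic, RDF.type, ONSI.Topic))
    g.add((term_node, RDF.type, ONSI.Term))
    g.add((topic, ONSI.hasTerm, term_node))
    # attach the term's LDA weight as a datatype property
    g.add((term_node, ONSI.ldaWeight, Literal(weight, datatype=XSD.double)))

add_topic_term(g, 0, "battery", 0.12)
print(g.serialize(format="turtle"))
```

      Attaching the weight directly to the term node is a simplification: a weight is topic-specific, so a faithful model would use a reified statement or an intermediate topic-term node.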