calculates the proportion of actual positives that were correctly identified by our model, i.e., TP/(TP + FN), where TP and FN denote true positives and false negatives.
Unfortunately, a trade-off was observed between precision and recall: the higher the recall, the lower the precision, and vice versa. As a result, the two measures give different impressions of the outcome for the two datasets (Figures 3.5 and 3.6). To overcome this trade-off, the F1-score, which is the harmonic mean of precision and recall, has been calculated to find an optimum point at which both precision and recall are high.
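The trade-off can be illustrated with a small sketch. The labels and scores below are hypothetical toy values, not data from this study, and we assume a classifier that calls a gene positive when its score is at or above a threshold. Raising the threshold makes the model more conservative, which raises precision at the cost of recall.

```python
# Minimal sketch of the precision/recall trade-off on hypothetical toy data.
from typing import List, Tuple

# Hypothetical ground-truth labels (1 = cancer-mediating gene) and model scores.
LABELS: List[int] = [1, 1, 1, 0, 1, 0, 0, 0]
SCORES: List[float] = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]


def confusion_counts(labels: List[int], scores: List[float], threshold: float) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) when every score >= threshold is predicted positive."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    return tp, fp, fn


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = their harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    for threshold in (0.75, 0.45):
        p, r, f1 = precision_recall_f1(*confusion_counts(LABELS, SCORES, threshold))
        print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

With these toy values, the threshold of 0.75 gives precision 1.00 and recall 0.50, while the threshold of 0.45 gives precision 0.80 and recall 1.00, so neither measure alone tells the whole story.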
Figure 3.5 F-score for lung and colon datasets using precision.
Figure 3.6 F-score for lung and colon datasets using recall.
Figure 3.7 F1-score for lung and colon datasets.
Further, we computed the F1-score using Equation (3.9) for the two datasets considered here and observed that the PC-LR method yields a better result for the colon dataset than for the lung dataset (Figure 3.7).
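For reference, the standard definition of the F1-score as the harmonic mean of precision P and recall R is written out below, together with its equivalent form in terms of true positives (TP), false positives (FP) and false negatives (FN). We assume this is the form denoted by Equation (3.9), which is not reproduced in this section.

```latex
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}

F_1 = 2\,\frac{P \cdot R}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
```

Because the harmonic mean is dominated by the smaller of its two arguments, F1 is high only when precision and recall are both high.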
Note that the generalized form of the F-score is known as the Fβ-score. By adjusting the parameter β, the Fβ-score lets us weight recall and precision according to their relative importance for our working model, and the equation changes slightly as a result.
In the Fβ-score [Equation (3.10)], β indicates the extent to which recall is given more importance than precision. For instance, setting β = 2 means that recall is treated as twice as important as precision. The standard practice is to set β = 1, which reduces Equation (3.10) to the F1-score of Equation (3.9). By comparing the results obtained with different weights on precision and recall, we conclude that the F1-score measures the performance of the working model more accurately.
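A minimal sketch of the standard Fβ-score, Fβ = (1 + β²)·P·R / (β²·P + R), is given below; we assume it matches Equation (3.10). The function name `f_beta` and the precision and recall values in the usage lines are hypothetical. Setting `beta=1` reproduces F1, `beta > 1` pulls the score toward recall, and `beta < 1` pulls it toward precision.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score: recall is treated as beta times as important as precision."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero, so the score is defined as 0.
        return 0.0
    return (1 + b2) * precision * recall / denominator


if __name__ == "__main__":
    # Hypothetical model with high precision but low recall.
    p, r = 0.8, 0.5
    for beta in (0.5, 1.0, 2.0):
        print(f"beta={beta}: F={f_beta(p, r, beta):.3f}")
```

For this high-precision, low-recall example the score falls from about 0.714 (β = 0.5) to 0.615 (β = 1) and 0.541 (β = 2), reflecting the growing weight placed on the weaker recall value.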
The proposed method identified 102 genes for lung cancer and 85 genes for colon cancer as cancer-mediating genes. The results were validated against the NCBI database, which contains genes already identified as cancer-associated. Some of the gene symbols produced by our methodology are listed in the tables below: Table 3.1 contains some of the significant true positive (TP) gene symbols for lung cancer, and Table 3.2 contains the same for colon cancer.
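The validation step can be pictured as a set comparison between the predicted genes and a reference list of known cancer genes, for example one exported from NCBI. This is a minimal illustration under our own assumptions: the function `validate_against_reference` and the gene lists below are hypothetical, and the code does not query NCBI itself.

```python
from typing import Iterable, List, Tuple


def validate_against_reference(
    predicted: Iterable[str], reference: Iterable[str]
) -> Tuple[List[str], float, float]:
    """Compare predicted gene symbols with a reference set of known genes.

    Returns the sorted true-positive symbols, precision and recall.
    Symbols are upper-cased so that case differences do not cause mismatches.
    """
    pred = {g.strip().upper() for g in predicted}
    ref = {g.strip().upper() for g in reference}
    true_positives = pred & ref
    precision = len(true_positives) / len(pred) if pred else 0.0
    recall = len(true_positives) / len(ref) if ref else 0.0
    return sorted(true_positives), precision, recall


if __name__ == "__main__":
    # Hypothetical inputs: GENE_X is a placeholder, not a real prediction.
    predicted_genes = ["KRAS", "TP53", "GENE_X"]
    reference_genes = ["KRAS", "TP53", "EGFR"]
    tps, p, r = validate_against_reference(predicted_genes, reference_genes)
    print(tps, round(p, 2), round(r, 2))
```

Here the genes found in both lists play the role of the true positives reported in the tables; a real pipeline would also need to normalize gene aliases to official symbols before comparing.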
Table 3.1 Resultant genes (gene symbols) identified by PC-LR method.
Significant true positive genes for lung cancer | |||
---|---|---|---|
KRAS | CHRNA3 | MIF | GSTP1 |
TP53 | SOX9 | MAP2K1 | VDR |
IGF1R | TNF | RET | SYK |
IGFBP3 | CDH2 | MET | PGR |
STAT3 | CDH1 | TGFB1 | IL10 |
Table 3.2 Resultant genes (gene symbols) identified by PC-LR method.
Significant true positive genes for colon cancer | |||
---|---|---|---|
MSH2 | IGF1 | PKM | IL33 |
TP53 | CCND1 | MIF | ITGA5 |
VEGFA | VDR | TERT | CSK |
PTGS2 | IGF1R | TAC1 | SDC2 |
AKT1 | HIF1A | CDKN1A | EGFR |
3.5 Application in Cloud Domain
Cloud computing is a term widely used in information technology. It describes how an end user can access different types of IT resources, such as software services and hardware resources. Although there is no universally accepted definition of cloud computing, it can be described as a set of interconnected, virtualized computers that are provisioned dynamically and made available as computing resources according to a service level agreement (SLA). Several categories of service are available in the cloud computing domain.
Infrastructure service: Computational resources such as processors and storage are provided to end users in raw form. In this model, users may install their own operating system and applications on the infrastructure provided to them, which can be thought of as renting computing capacity. The cloud domain can therefore be used effectively as a research tool for genomic studies.
Platform service: This type of service is important when developing and deploying new software applications, because such applications need a suitable platform on which to be implemented.
Software service: Users rely on this type of service for ready-made applications, for example Dropbox when storage is needed, or Google Docs when a word-processing application is required.
In