the present article, our focus is on infrastructure services in cloud computing, specifically storage as a service. Our study processes biological data and generates biomarkers as its result. We have developed a model that analyzes gene expression data from both normal and cancerous states and identifies cancer-mediating genes; these genes can be valuable to medical science and can help biologists from several perspectives.
The biggest challenge for the genomics research community is building an infrastructure with a large number of computers and efficient software tools, so that genomic datasets can be analyzed more exhaustively in biomedical research and, to some extent, in clinical practice. Researchers in this domain are therefore moving toward the cloud. Solving biomedical problems depends on analyzing data effectively, so integrating data from genomics, systems biology, and biomedical data mining is a promising direction [24]. In our proposed model, the input dataset is a file in .csv format; after processing by the developed methodology, the resultant dataset is stored in a text file. Our concern is how these data can be made available in a cloud environment so that other users in the research community can access them for further work. Several parameters need attention here [25].
In cloud computing, data security and confidentiality are major concerns that deserve the highest priority. Here we are concerned only with the confidentiality of the data, treated in a simplified manner without going into architectural detail. Keeping the data in the cloud also brings the other benefits of cloud computing, such as lower cost and greater efficiency. Many commercial cloud offerings exist, but they are heterogeneous and address different needs depending on the customer. The main contenders in this field include Microsoft Azure, Google App Engine, Amazon Web Services (AWS), and IBM Cloud. Amazon Simple Storage Service (Amazon S3) is an object storage service that AWS describes as offering industry-leading scalability, security, performance, and data availability. Since our requirement is to store files and secure the dataset, AWS is a good choice because of S3. S3 provides software developers and IT teams with object storage that is secure, scalable, and durable. It offers an easy-to-use web interface for storing and retrieving any amount of data from anywhere on the web. Dropbox can be viewed as a layer built on top of S3 that simplifies its user interface for storing files in the cloud. Data is spread across multiple devices and facilities. Although S3 serves many purposes, in the present context it is used to store files securely in buckets (folders).
Because security is a major issue, keeping the data stored in Amazon S3 secure from other users is a key requirement. This is achieved by applying S3's encryption features together with its access management tools.
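As an illustration of the encryption step, the following minimal sketch prepares an upload that asks S3 to encrypt the object at rest. It assumes the boto3 SDK, which the text does not name; the function names and the bucket and key in the usage note are hypothetical, and server-side encryption with S3-managed keys ("AES256") is only one of the options S3 offers.

```python
from pathlib import Path


def build_encrypted_upload(bucket: str, key: str, body: bytes) -> dict:
    """Build keyword arguments for boto3's S3 `put_object` call.

    ServerSideEncryption="AES256" requests encryption at rest with
    S3-managed keys (SSE-S3). This is an assumed choice, not the
    chapter's stated configuration.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ServerSideEncryption": "AES256",
    }


def upload_result_file(bucket: str, key: str, local_path: str) -> None:
    """Upload a local result file; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    body = Path(local_path).read_bytes()
    s3 = boto3.client("s3")
    s3.put_object(**build_encrypted_upload(bucket, key, body))
```

A call such as `upload_result_file("my-research-bucket", "results/genes.txt", "genes.txt")` would then store the resultant text file as an encrypted object; both names are placeholders.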
Figure 3.8 Storing and accessing the data values in Amazon S3.
AWS presents S3 as the only object storage service that can block public access to all objects at the bucket level or at the account level, through the S3 Block Public Access mechanism. To ensure that objects never have public access, now or in the future, S3 Block Public Access provides controls at two levels: the entire AWS account or an individual bucket. Objects and buckets are granted public access through access control lists (ACLs), bucket policies, or both. To block public access to all S3 buckets and objects, "Block all public access" should be switched on at the account level. These settings then apply account-wide to all buckets, present and future. Although AWS recommends turning on this option, it must first be verified that all applications work correctly without public access. Where some degree of public access to particular buckets or objects is required, the individual settings can be configured to fit that specific storage use case. S3 Block Public Access settings override S3 permissions that allow public access. This makes it easy for the administrator to set up a centralized control that prevents changes to the security configuration, regardless of how an object is added or a bucket is created. If an object is written to a bucket or AWS account that has S3 Block Public Access enabled, and the object specifies any kind of public permission through an ACL or a policy, those public permissions remain blocked. Figure 3.8 shows how data is stored in and accessed from AWS S3.
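The account-level and bucket-level controls described above can be sketched as follows. This is a hedged example that assumes the boto3 SDK (not named in the text); the function names are hypothetical, while the four flag names are those of the S3 `PublicAccessBlockConfiguration` structure.

```python
def block_all_public_access() -> dict:
    """Return the four S3 Block Public Access flags, all switched on."""
    return {
        "BlockPublicAcls": True,        # reject requests that add public ACLs
        "IgnorePublicAcls": True,       # ignore public ACLs already present
        "BlockPublicPolicy": True,      # reject bucket policies that grant public access
        "RestrictPublicBuckets": True,  # restrict access to buckets that have public policies
    }


def apply_to_bucket(bucket: str) -> None:
    """Apply the settings to one bucket; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the config builder works without it

    boto3.client("s3").put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=block_all_public_access(),
    )


def apply_to_account(account_id: str) -> None:
    """Apply the settings to every bucket in the account, present and future."""
    import boto3

    boto3.client("s3control").put_public_access_block(
        AccountId=account_id,
        PublicAccessBlockConfiguration=block_all_public_access(),
    )
```

Setting individual flags to `False` at the bucket level is how a deployment would allow a limited degree of public access, provided the account-level settings permit it.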
3.6 Conclusion
The proposed method, PC-LR, uses a hybrid ML approach to generate the target output. Since our task is to select the driver genes related to certain cancers, it is appropriate to design an algorithm that acts as a binary classifier and identifies the relevant genes. LR works well as the classifier in this model. However, gene expression data is very high dimensional, so the curse of dimensionality must be addressed before LR is applied; this is done using PCA. The implemented approach has identified, with high precision, a group of genes that are differentially expressed and correlated with certain cancers. The experiments were carried out on two datasets, colon and lung, and a gene set was determined. The originality and robustness of the system are thereby established. Gene mutations may be correlated with one another, and these correlations may or may not vary across stages of cancer, so identifying them is also a challenging task. Research scientists and biologists can examine the selected genes in laboratory tests, focusing on a small number of genes instead of the whole genome.
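The two stages of PC-LR can be summarized in generic form as PCA followed by LR. The notation below is assumed for illustration and is not taken from the chapter: x is one sample's expression vector over p genes, μ is the training mean, W_k holds the top k principal directions, and y is the normal/cancer label.

```latex
% Generic PCA + logistic regression; notation is assumed, not the authors'.
\begin{align}
  z &= W_k^{\top} (x - \mu), \\
  P(y = 1 \mid x) &= \sigma\!\left(\beta_0 + \beta^{\top} z\right),
  \qquad \sigma(t) = \frac{1}{1 + e^{-t}}.
\end{align}
```

Since β^T z = (W_k β)^T (x − μ), the product W_k β assigns one coefficient to each gene. Ranking genes by these coefficients is one possible selection rule; the text here does not state whether PC-LR selects genes in this way.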
Our work can be extended in the future to identify more genes that may be correlated with mutations. Identifying interactions among those genes can further help in prognosis, cancer prevention, and treatment. Analyzing gene-gene interactions will help find more TP genes that play a key role in mediating cancer. Extending the study to other omics data might help researchers and biologists study cancer in a more targeted way.
References
1. Soh, K.P., Szczurek, E., Sakoparnig, T., Beerenwinkel, N., Predicting cancer type from tumour DNA signatures. Genome Med., 9, 104, 2017.
2. Hao, X., Luo, H., Krawczyk, M., Wei, W., Wang, W., Wang, J. et al., DNA methylation markers for diagnosis and prognosis of common cancers. Proc. Natl. Acad. Sci. U.S.A., 114, 28, 7414–19, 2017.
3. Yang, Z., Jin, M., Zhang, Z., Lu, J., Hao, K., Classification based on feature extraction for hepatocellular carcinoma diagnosis using high-throughput DNA methylation sequencing data. Proc. Comput. Sci., 107, 412–417, 2017.
4. Rachman, A.A. and Rustam, Z., Cancer classification using Fuzzy C-Means with feature selection. 12th International Conference on Mathematics, Statistics, and Their Applications (ICMSA), pp. 31–34, 2016.
5. Ghosh, A. and De, R.K., Fuzzy Correlated Association Mining: Selecting altered associations among the genes, and some possible marker genes mediating certain cancers. Appl. Soft Comput., 38, 587–605, 2015.
6. Mao, Z.-Y., Cai, W.-S., Shao, X.-G., Selecting significant genes by randomization test for cancer classification using gene expression data. J. Biomed. Inform., 46, 4, 549–601, 2013.
7.