Analyzing cancer data in North Vietnam by complex network technique

  • Duc-Tinh Pham Hanoi University of Industry
  • Minh-Tan Nguyen Hanoi University of Industry
  • Ha-Nam Nguyen Vietnam Institute for Advanced Study in Mathematics
  • Tien-Dzung Tran Hanoi University of Industry

Abstract

Data-clustering tools can be employed to generate new knowledge for the diagnosis and treatment of cancer. However, traditional clustering methods, such as K-means, often require input parameters such as the number of clusters and the initial centers to be chosen in advance. In this study, we present a network science-based clustering method that requires fewer parameters and use it to mine a cancer-screening dataset containing over 177,000 records. We propose an algorithm that computes the similarity between pairs of records to create a complex network in which each node represents a record, and two nodes are connected by an edge if their similarity is greater than a given threshold, which is determined by experimental observation. On the resulting network, we employ a network modularity optimization algorithm to detect modules (clusters). Each cluster contains records that are similar to one another in terms of some attributes; from each cluster we can therefore derive rules that give insight into the cancer situation in Vietnam. These rules reveal that some cancer types are more widespread in specific families and living environments in Vietnam. Clustering data based on network science can therefore be a good option for large-scale relational data-mining problems in the future.
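
The pipeline described in the abstract (pairwise similarity, thresholded network, modularity-based clustering) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract specifies neither the similarity measure nor the optimization algorithm, so the sketch assumes simple matching similarity (the fraction of attributes on which two records agree) and a naive greedy agglomerative merge that maximizes Newman modularity. The function names, the sample records and the threshold of 0.5 are all hypothetical.

```python
from itertools import combinations


def similarity(a, b):
    """Assumed measure: fraction of attributes on which two records agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_network(records, threshold):
    """Link two record indices when their similarity exceeds the threshold."""
    return {(i, j) for i, j in combinations(range(len(records)), 2)
            if similarity(records[i], records[j]) > threshold}


def modularity(edges, communities):
    """Newman modularity Q of a partition of an unweighted, undirected graph."""
    m = len(edges)
    if m == 0:
        return 0.0
    label = {node: c for c, members in enumerate(communities) for node in members}
    internal = [0] * len(communities)  # edges with both ends inside community c
    degree = [0] * len(communities)    # sum of node degrees in community c
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2
               for c in range(len(communities)))


def greedy_modularity(n_nodes, edges):
    """Start from singletons and repeatedly apply the merge that raises Q most."""
    communities = [{i} for i in range(n_nodes)]
    best_q = modularity(edges, communities)
    while len(communities) > 1:
        best_merge = None
        for a, b in combinations(range(len(communities)), 2):
            merged = [c for k, c in enumerate(communities) if k not in (a, b)]
            merged.append(communities[a] | communities[b])
            q = modularity(edges, merged)
            if q > best_q + 1e-12:  # keep only strict improvements
                best_q, best_merge = q, merged
        if best_merge is None:      # no merge improves Q: stop
            break
        communities = best_merge
    return communities, best_q


# Hypothetical screening records: (sex, smoking, family history, water source).
records = [
    ("M", "smoker", "family_history", "well_water"),
    ("M", "smoker", "family_history", "tap_water"),
    ("M", "smoker", "no_history", "well_water"),
    ("F", "non_smoker", "no_history", "tap_water"),
    ("F", "non_smoker", "no_history", "well_water"),
    ("F", "non_smoker", "family_history", "tap_water"),
]

edges = build_network(records, threshold=0.5)
communities, q = greedy_modularity(len(records), edges)
print(sorted(sorted(c) for c in communities), round(q, 3))
```

On these six toy records the sketch separates the first three records from the last three. The attributes shared within each group (here sex and smoking status) are the kind of regularity from which rules could then be derived. Both the pairwise comparison and the naive merge loop are far too slow for a dataset of 177,000 records; a study at that scale would need a scalable modularity optimizer and some way of limiting the number of record pairs compared.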

Published
2021-12-31
How to Cite
PHAM, Duc-Tinh et al. Analyzing cancer data in North Vietnam by complex network technique. Journal of Science and Technology: Issue on Information and Communications Technology, [S.l.], v. 19, n. 12.2, p. 17-25, dec. 2021. ISSN 1859-1531. Available at: <http://ict.jst.udn.vn/index.php/jst/article/view/140>. Date accessed: 25 may 2022. doi: https://doi.org/10.31130/ict-ud.2021.140.