Data mining approach to predict BRCA1 gene mutation

Olegas Niakšu, Jurgita Gedminaitė, Olga Kurasova


Breast cancer is the most common form of cancer in women and one of the leading causes of death among women worldwide. Patients carrying a pathogenic mutation in a BRCA gene have a 65% lifetime probability of developing breast cancer, and the illness in such patients is known to have a different cause. In this study, we propose a new approach to predicting BRCA mutation carriers that methodically applies the steps of knowledge discovery and uses data mining methods. An alternative BRCA risk assessment model has been built using a decision tree classifier. The biggest challenge was the very small and imbalanced initial dataset, which clinicians had collected over four years of a clinical trial. Iterative optimization of the initial dataset, selection of the most suitable algorithms, and tuning of their parameters improved classifier performance, yielding prediction accuracy acceptable for clinical use. In this study, three data mining problems have been analyzed using eleven data mining algorithms.
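The abstract combines two technical ingredients: a decision tree classifier and a very small, imbalanced dataset. A minimal sketch of how the two interact follows. It assumes a binary label (1 = mutation carrier) and numeric features, and it uses inverse-frequency class weights so that the rare class is not simply outvoted when splits and leaf labels are chosen. The Gini criterion, the weighting scheme, the depth limit, the feature names (`age_at_diagnosis`, `affected_relatives`), and the toy data are all illustrative assumptions. This is not the authors' model, which the abstract describes only as a decision tree classifier.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: each class gets the same total weight (assumed scheme)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_gini(labels, weights):
    """Gini impurity in which every sample contributes its class weight."""
    totals = Counter()
    for y in labels:
        totals[y] += weights[y]
    s = sum(totals.values())
    if s == 0:
        return 0.0
    return 1.0 - sum((v / s) ** 2 for v in totals.values())


def majority(labels, weights):
    """Leaf label: the class with the largest total weight, not the largest raw count."""
    totals = Counter()
    for y in labels:
        totals[y] += weights[y]
    return max(totals, key=totals.get)


def best_split(X, y, weights):
    """Return the (feature, threshold) that most lowers weighted impurity, or None."""
    best, best_score = None, weighted_gini(y, weights)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            wl = sum(weights[v] for v in left)
            wr = sum(weights[v] for v in right)
            score = (wl * weighted_gini(left, weights) + wr * weighted_gini(right, weights)) / (wl + wr)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, weights, depth=0, max_depth=3, min_samples=2):
    """Grow a depth-limited binary tree; a node is (feature, threshold, left, right) or a label."""
    if depth >= max_depth or len(y) < min_samples or len(set(y)) == 1:
        return majority(y, weights)
    split = best_split(X, y, weights)
    if split is None:
        return majority(y, weights)
    f, t = split
    left = [(row, yi) for row, yi in zip(X, y) if row[f] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [v for _, v in left], weights, depth + 1, max_depth, min_samples),
        build_tree([r for r, _ in right], [v for _, v in right], weights, depth + 1, max_depth, min_samples),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Per-class rates; plain accuracy hides errors on the rare class in imbalanced data."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for a, b in pairs if a == positive and b == positive)
    fn = sum(1 for a, b in pairs if a == positive and b != positive)
    tn = sum(1 for a, b in pairs if a != positive and b != positive)
    fp = sum(1 for a, b in pairs if a != positive and b == positive)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


if __name__ == "__main__":
    # Hypothetical toy data: [age_at_diagnosis, affected_relatives]; label 1 = carrier.
    X = [[35, 3], [38, 2], [62, 0], [55, 1], [48, 0], [70, 0], [41, 0], [66, 1], [59, 0], [52, 1]]
    y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    w = class_weights(y)
    tree = build_tree(X, y, w)
    print(tree)  # (0, 38, 1, 0)
    # Training-set scores only; a real evaluation would use held-out data or cross-validation.
    print(sensitivity_specificity(y, [predict(tree, r) for r in X]))  # (1.0, 1.0)
```

Reporting sensitivity and specificity rather than overall accuracy is deliberate. With two carriers among ten samples, a model that always predicts "non-carrier" would already reach 80% accuracy while missing every carrier.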



This work is licensed under a Creative Commons Attribution 3.0 License.

eISSN: 2029-9966