A Resampling Approach for Customer Gender Prediction Based on E-Commerce Data

  • Duong Tran Duc Posts and Telecommunications Institute of Technology
  • Pham Bao Son University of Engineering and Technology, Vietnam National University, Hanoi
  • Tan Hanh Posts and Telecommunications Institute of Technology
  • Le Truong Thien University of Engineering and Technology, Vietnam National University


Demographic attributes of customers, such as gender and age, provide important information for e-commerce service providers in marketing and in personalizing web applications. However, online customers often do not provide this information because of privacy concerns and other reasons. In this paper, we propose a method for predicting the gender of customers from their catalog viewing data on e-commerce systems, such as the date and time of access and the products viewed. The main idea is to extract features from the catalog viewing information and apply classification methods to predict the gender of the viewers. Experiments were conducted on the datasets provided by the PAKDD’15 Data Mining Competition and obtained promising results with a simple feature design, especially with the Bayesian Network method combined with supporting techniques such as resampling, cost-sensitive learning, and boosting.
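As a rough illustration of the pipeline described in the abstract, the sketch below turns one viewing session into a few simple features and then balances the classes by random oversampling before any classifier is trained. It is not the authors' implementation: the feature set, the function names `extract_features` and `oversample`, the session layout (start time, end time, list of product IDs), and the sample data are all assumptions made for this example.

```python
import random
from collections import Counter
from datetime import datetime


def extract_features(start, end, products):
    """Build a flat feature dict from one viewing session (hypothetical feature set)."""
    t0 = datetime.fromisoformat(start)
    t1 = datetime.fromisoformat(end)
    return {
        "hour": t0.hour,                          # hour of day the session began
        "weekday": t0.weekday(),                  # Monday = 0 ... Sunday = 6
        "duration_s": (t1 - t0).total_seconds(),  # session length in seconds
        "n_products": len(products),              # products viewed, with repeats
        "n_unique": len(set(products)),           # distinct products viewed
    }


def oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Made-up sessions: (start, end, product IDs, gender label).
sessions = [
    ("2014-11-14 00:02:14", "2014-11-14 00:02:20", ["P1"], "female"),
    ("2014-11-14 09:15:00", "2014-11-14 09:20:30", ["P2", "P3", "P2"], "female"),
    ("2014-11-15 21:40:10", "2014-11-15 21:41:00", ["P4", "P5"], "female"),
    ("2014-11-16 13:05:00", "2014-11-16 13:06:00", ["P6"], "male"),
]

rows = [extract_features(s, e, p) for s, e, p, _ in sessions]
labels = [g for _, _, _, g in sessions]
balanced_rows, balanced_labels = oversample(rows, labels)
```

The balanced rows could then be passed to any classifier. Random oversampling is only one resampling option; undersampling the majority class or assigning per-class misclassification costs would address the same imbalance in a different way.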




How to Cite
DUC, Duong Tran et al. A Resampling Approach for Customer Gender Prediction Based on E-Commerce Data. Journal of Science and Technology: Issue on Information and Communications Technology, [S.l.], v. 3, n. 1, p. 76-81, mar. 2017. ISSN 1859-1531. Available at: <http://ict.jst.udn.vn/index.php/jst/article/view/40>. Date accessed: 29 may 2020. doi: https://doi.org/10.31130/jst.2017.40.