A Resampling Approach for Customer Gender Prediction Based on E-Commerce Data
Demographic attributes of customers, such as gender and age, provide important information to e-commerce service providers for marketing and for personalizing web applications. However, online customers often do not provide this kind of information because of privacy concerns and other reasons. In this paper, we propose a method for predicting customers' gender from their catalog viewing data on e-commerce systems, such as the date and time of access and the products viewed. The main idea is to extract features from the catalog viewing information and apply classification methods to predict the gender of the viewers. Experiments on the datasets provided by the PAKDD’15 Data Mining Competition obtained promising results with a simple feature design, especially with the Bayesian Network method combined with supporting techniques such as resampling, cost-sensitive learning, and boosting.
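To make the pipeline described in the abstract concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. The session record format, the feature thresholds, and all names (`extract_features`, `random_oversample`, `CategoricalNaiveBayes`) are hypothetical. It uses random oversampling as one form of resampling, and it substitutes a categorical Naive Bayes classifier, which is the simplest Bayesian network structure, for the Bayesian Network method used in the paper. Cost-sensitive learning and boosting are not shown.

```python
import math
import random
from collections import Counter, defaultdict
from datetime import datetime

# ASSUMED record format (hypothetical, for illustration only): a session start
# time, end time, and a list of viewed-product category paths such as
# "A00001/B00001/C00001/D00001", where the first segment is the top-level category.
TIME_FMT = "%Y-%m-%d %H:%M:%S"


def extract_features(start, end, paths):
    """Map one catalog-viewing session to a dict of categorical features."""
    t0 = datetime.strptime(start, TIME_FMT)
    t1 = datetime.strptime(end, TIME_FMT)
    minutes = (t1 - t0).total_seconds() / 60
    top_cats = Counter(p.split("/")[0] for p in paths)
    return {
        "hour_block": t0.hour // 6,            # 0-3: four 6-hour blocks of the day
        "weekend": t0.weekday() >= 5,          # Saturday or Sunday
        "long_session": minutes > 10,          # hypothetical 10-minute threshold
        "n_products": min(len(paths), 5),      # capped count of products viewed
        "top_category": top_cats.most_common(1)[0][0] if paths else "none",
    }


def random_oversample(X, y, seed=0):
    """Resample with replacement from each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


class CategoricalNaiveBayes:
    """Naive Bayes (the simplest Bayesian network) over categorical features, Laplace-smoothed."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.value_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.domains = defaultdict(set)           # feature -> values seen in training
        for x, label in zip(X, y):
            for f, v in x.items():
                self.value_counts[(label, f)][v] += 1
                self.domains[f].add(v)
        return self

    def log_posterior(self, x, label):
        """Unnormalized log P(label | x) under the feature-independence assumption."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        for f, v in x.items():
            k = len(self.domains[f]) + 1          # +1 reserves mass for unseen values
            count = self.value_counts[(label, f)][v]
            score += math.log((count + 1) / (self.class_counts[label] + k))
        return score

    def predict(self, x):
        return max(self.class_counts, key=lambda c: self.log_posterior(x, c))


# Made-up toy sessions (not competition data); "female" is the majority class here.
train = [
    ("2014-11-14 09:00:00", "2014-11-14 09:20:00", ["A00001/B00001", "A00001/B00002"], "female"),
    ("2014-11-15 10:00:00", "2014-11-15 10:30:00", ["A00001/B00003"], "female"),
    ("2014-11-16 20:00:00", "2014-11-16 20:15:00", ["A00001/B00001"], "female"),
    ("2014-11-17 21:00:00", "2014-11-17 21:02:00", ["A00002/B00010"], "male"),
]
X = [extract_features(s, e, p) for s, e, p, _ in train]
y = [label for *_, label in train]

# Resample the training data only; evaluation data should keep its natural class ratio.
X_bal, y_bal = random_oversample(X, y)
model = CategoricalNaiveBayes().fit(X_bal, y_bal)

print(model.predict(extract_features("2014-11-18 22:00:00", "2014-11-18 22:01:00",
                                     ["A00002/B00011"])))  # male
print(model.predict(extract_features("2014-11-15 11:00:00", "2014-11-15 11:40:00",
                                     ["A00001/B00005", "A00001/B00006"])))  # female
```

Naive Bayes assumes the features are independent given the class; a general Bayesian network learner would also model dependencies between features, for example between time of day and product category. A cost-sensitive variant would keep the original class distribution and instead assign a higher cost to misclassifying the minority class, rather than duplicating its rows.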