A Detailed Guide on How to Use Statistical Software R for Text Mining


  • Wing-Keung Wong Chair Professor
  • Kim-Hung Pho
  • Ngoc-Hien Nguyen
  • Huu-Nhan Huynh




Keywords: Guide, Text Mining, Statistics, software R


Text mining is an important topic in Statistics, Applied Mathematics, and many other areas of Science, Engineering, and Business because its applications are rich and varied. It can help academics and practitioners with specific tasks such as spam filtering, personal background matching, sentiment analysis, and document classification. The statistical software R is widely used in Science because of its outstanding features and because it is completely free. To contribute to the literature on text mining, this study provides detailed instructions on how to use R for text mining. We first introduce an algorithm for text mining and then discuss in detail how to use R to carry out each step of the algorithm. As an application, the proposed algorithm is applied to an actual data set. The results of this study will help both academics and practitioners understand how to use R to perform text mining.





How to Cite

Wong, W.-K., Pho, K.-H., Nguyen, N.-H., & Huynh, H.-N. (2021). A Detailed Guide on How to Use Statistical Software R for Text Mining. Advances in Decision Sciences, 25(3), 92–110. https://doi.org/10.47654/v25y2021i3p92-110