Predicting hotel booking cancellation: A comparative analysis of models

DOI:

https://doi.org/10.17072/1994-9960-2021-4-327-345

Abstract

Booking a hotel room is an integral part of any trip. Recent years have therefore seen growing popularity of and demand for online travel agencies, which save clients the time and effort of communicating with hotels and allow them to cancel a booking without fines or charges. Hotel booking cancellations have been on the rise in recent years, which adversely affects the financial position and reputation of hotels. To reduce the risks, hotels have to follow a strict booking policy and an overbooking strategy. The problem is particularly acute today because of the significant decrease in tourist flows caused by the coronavirus pandemic. It can be addressed by developing predictive models of hotel booking cancellation that offer high confidence and high prediction accuracy. An overview of existing solutions shows that the following machine learning methods give the best predictive results: Random Forest, neural networks, CatBoost, and XGBoost. The purpose of the research is thus to develop several machine learning based predictive models for hotel booking cancellation and to compare them, in order to justify the choice of the best model using such metrics as Accuracy, Precision, Recall, F-measure, and the area under the ROC curve. The data source for the research was the Hotel Booking Demand Dataset prepared by N. Antonio, A. de Almeida and L. Nunes and published on the ScienceDirect platform. The research found that the Random Forest model gives the best prediction of hotel booking cancellation. For example, for this model the share of correct answers among all predictions on the test set is 84.5%, and 87.3% is the share of bookings that were actually cancelled and were labelled as cancelled by the classifier. Further research is expected to focus on improving the Random Forest model and the other machine learning models by tuning additional hyperparameters that were not taken into account in this study.
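As an illustration of the comparison metrics named in the abstract, the following minimal sketch computes Accuracy, Precision, Recall, the F1 variant of the F-measure, and the area under the ROC curve for a binary classifier. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for this example, and the label 1 is assumed to mean "cancelled".

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, Precision, Recall and F1 for binary labels (1 = cancelled, by assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancelled, predicted cancelled
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # kept, predicted kept
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # kept, predicted cancelled
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancelled, predicted kept

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example receives
    a higher score than a randomly chosen negative one (ties count as 0.5).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true outcomes and predicted cancellation probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, Precision, Recall and F1 depend on the chosen threshold, whereas the area under the ROC curve is computed from the raw scores and therefore summarizes the classifier across all thresholds, which is one reason to report it alongside the others when comparing models.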

Keywords

hotel booking, predictive methods for booking cancellation, machine learning methods, random forest, neural networks, CatBoost classification, XGBoost classification, prediction

For citation

Rusakova E.I., Radionova M.V. Predicting hotel booking cancellation: A comparative analysis of models. Perm University Herald. Economy, 2021, vol. 16, no. 4, pp. 327–345. DOI 10.17072/1994-9960-2021-4-327-345

References

1. Smith S.J., Parsa H.G., Bujisic M., van der Rest J-P. Hotel cancelation policies, distributive and procedural fairness, and consumer patronage: A study of the lodging industry. Journal of Travel and Tourism Marketing, 2015, no. 32 (7), pp. 886–906. doi: 10.1080/10548408.2015.1063864.
2. Talluri K.T., van Ryzin G.J. The theory and practice of revenue management. New York, Kluwer Academic Publishers, 2004. 745 p.
3. Chen C.-C., Schwartz Z., Vargas P. The search for the best deal: How hotel cancellation policies affect the search and booking decisions of deal-seeking customers. International Journal of Hospitality Management, 2011, no. 30 (1), pp. 129–135. doi: 10.1016/j.ijhm.2010.03.010.
4. Huang H.-C., Chang A. Y., Ho C.-C. Using artificial neural networks to establish a customer-cancellation prediction model. Przeglad Elektrotechniczny, 2013, no. 89 (1b), pp. 178–180.
5. Yoon M.G., Lee H.Y., Song Y.S. Linear approximation approach for a stochastic seat allocation problem with cancellation and refund policy in airlines. Journal of Air Transport Management, 2012, no. 23, pp. 41–46.
6. Antonio N., Almeida A., Nunes L. Predicting hotel booking cancellations to decrease uncertainty and increase revenue. Tourism and Management Studies, 2017, no. 13 (2), pp. 25–39. doi: 10.18089/tms.2017.13203.
7. Zeytinci E. Predicting hotel reservation cancellations with machine learning. Available at: https://towardsdatascience.com/predicting-hotel-cancellations-with-machine-learning-fa669f93e794 (accessed 29.02.2021).
8. Wingen M. EDA of bookings and ML to predict cancelations. Available at: https://www.kaggle.com/marcuswingen/eda-of-bookings-and-ml-to-predict-cancelations (accessed 30.06.2021).
9. Denyse T. Learning Pitstop: Predicting hotel booking cancellations using Classification Techniques. 2020. Available at: https://medium.com/tech4she/investigating-factors-affecting-hotel-booking-cancelations-9ec9bf81b0a8 (accessed 20.06.2021).
10. Michta M., Wojciechowski K. Story hotel booking cancellations: eXplainable predictions for booking cancellation. Available at: https://pbiecek.github.io/xai_stories/story-hotel-booking-cancellations-explainable-predictions-for-booking-cancellation.html#bias-correction (accessed 10.05.2021).
11. Banza M. Predicting hotel booking cancellations using machine learning – Step by step guide with real data and python. 2020. Available at: https://www.hospitalitynet.org/opinion/4099297.html (accessed 30.06.2021).
12. Kelman J. Predicting hotel booking cancellations using customer segmentation and neural networks. 2020. Available at: https://medium.com/@julkel/predicting-hotel-booking-cancellations-using-customer-segmentation-and-neural-networks-8a31c2755f5c (accessed 30.06.2021).
13. Antonio N., Almeida A., Nunes L. Hotel booking demand datasets. Data in Brief, 2019, vol. 22, pp. 41–49. doi: 10.1016/j.dib.2018.11.126.
14. Breiman L. Random forests. Machine Learning, 2001, vol. 45, pp. 5–32. doi: 10.1023/A:1010933404324.
15. Morde V. XGBoost algorithm: Long may she reign! 2019. Available at: https://towardsdatascience.com/https-medium-com-vishalmorde-xgboost-algorithm-long-she-may-rein-edd9f99be63d (accessed 25.07.2021).
16. Dorogush A.V., Ershov V., Gulin A. CatBoost: Gradient boosting with categorical features support. Workshop on ML Systems at NIPS. 2017.
17. Sharma A.V. Understanding activation functions in neural networks. 2017. Available at: https://medium.com/the-theory-of-everything/understanding-activation-functions-in-neural-networks-9491262884e0 (accessed 30.06.2021).
18. Powers D.M.W. Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness and Correlation. Journal of Machine Learning Technologies, 2011, vol. 2, iss. 1, pp. 37–63.

Information about the Authors

  • Elena I. Rusakova, National Research University “Higher School of Economics”, Perm Branch

    Faculty of Economics, Management and Business Informatics

  • Marina V. Radionova, Perm State University

    Candidate of Physics and Mathematics, Associate Professor, Assistant Professor at the Department of Information Systems and Mathematical Methods in Economics

Published

2021-12-30

Section

Economic-Mathematical Modeling