Comparative Machine Learning Analysis for Rate of Penetration Prediction in Drilling Performance Optimization

Authors:
Abdul Kareem Noor, Irfan Alam, A. Mohammed Arif, G. Agalya, R. Arasi, K. Pradeep

Addresses:
Department of Petroleum Engineering, Dhaanish Ahmed College of Engineering, Chennai, Tamil Nadu, India.

Abstract:

Rate of Penetration (ROP) is the primary cost driver in well construction, yet conventional empirical models, including the Bourgoyne-Young formulation, struggle with the nonlinear, formation-dependent interactions that govern it. In this work, seven machine learning regression algorithms (Linear Regression, Decision Tree, Random Forest, Gradient Boosting, XGBoost, SVR, and KNN) were evaluated for predicting drilling ROP on a 267-sample, 16-feature dataset built from WITSML drilling records and petrophysical logs from Volve Well 15/9-F-15, North Sea. The dataset was prepared using a depth-based train–test split, followed by one-hot encoding of lithology, leakage-free scaling, and median imputation of missing formation log values. A Bourgoyne-Young log-linear proxy was included as a conventional baseline. Among the tested models, Random Forest (RF) produced the highest test R² of 0.4799 (MAE = 6.53×10⁻⁴ m/h), ahead of XGBoost (R² = 0.4259) and Gradient Boosting (R² = 0.4004); all three substantially outperformed the empirical proxy (R² = −0.014), which RF beat by 44% in MAE. Variance inflation factor (VIF) analysis uncovered near-perfect collinearity between BIT_RPM and SURF_RPM (VIF > 4,900), attributable to the downhole motor configuration. Feature importance analysis showed that BIT_RPM and depth were the most influential parameters across all ensemble models. A seed-sensitivity experiment indicated that R² varied by ±0.12 across stratified splits, providing realistic uncertainty bounds for single-well deployment.
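
The leakage-free preparation described above can be illustrated with a minimal sketch. It is not the authors' code: the rows are synthetic, and the `gamma_ray` feature name, the 80/20 split fraction, and all helper names are assumptions made for illustration only. The point it shows is that the imputation median and the scaling statistics are learned from the shallower (training) interval and then reused unchanged on the deeper (test) interval.

```python
from statistics import mean, median, pstdev

def depth_split(rows, depth_key="depth", train_frac=0.8):
    """Sort by depth; the shallow part trains, the deeper part tests.
    train_frac=0.8 is an assumed value, not taken from the paper."""
    ordered = sorted(rows, key=lambda r: r[depth_key])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

def fit_preprocessor(train_rows, feature):
    """Learn the imputation median and scaling statistics
    from the training rows only."""
    observed = [r[feature] for r in train_rows if r[feature] is not None]
    med = median(observed)
    filled = [med if r[feature] is None else r[feature] for r in train_rows]
    mu = mean(filled)
    sigma = pstdev(filled) or 1.0  # guard against a constant column
    return med, mu, sigma

def transform(rows, feature, params):
    """Impute with the training median, then standardize with
    the training mean and standard deviation."""
    med, mu, sigma = params
    return [((med if r[feature] is None else r[feature]) - mu) / sigma
            for r in rows]

# Synthetic rows; "gamma_ray" is a hypothetical log name.
rows = [
    {"depth": 1000.0, "gamma_ray": 40.0},
    {"depth": 1010.0, "gamma_ray": None},   # missing log value
    {"depth": 1020.0, "gamma_ray": 60.0},
    {"depth": 1030.0, "gamma_ray": 80.0},
    {"depth": 1040.0, "gamma_ray": 500.0},  # deepest row, held out
]

train, test = depth_split(rows)
params = fit_preprocessor(train, "gamma_ray")
train_scaled = transform(train, "gamma_ray", params)
test_scaled = transform(test, "gamma_ray", params)
```

Because the held-out row never contributes to the median, mean, or standard deviation, its extreme value cannot shift the statistics used to scale the training data, which is the sense in which the scaling is leakage-free.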

Keywords: Empirical Formula; Random Forest; Gradient Boosting; Volve Field; Multicollinearity; Bourgoyne Young Model; Energy Conservation; Equinor; Scatter Plotting; Cross Validation.

Received on: 28/09/2024, Revised on: 07/12/2024, Accepted on: 14/02/2025, Published on: 03/06/2026

DOI: 10.69888/FTSES.2026.000678

FMDB Transactions on Sustainable Energy Sequence, 2026 Vol. 4 No. 1, Pages: 51-67
