Data quality and quantity strongly influence the performance of machine learning models for predicting the uplift capacity of suction caissons. However, obtaining a large amount of reliable and valid data in geotechnical engineering is not easy, so making effective use of existing datasets becomes crucial. In this study, a database containing 149 experimental data items and 423 numerical simulation data items was compiled. A modified Co-teaching method was adopted to fuse the two types of data, and the impact of different sampling plans on the models' prediction accuracy was investigated. The results indicate that increasing the amount of data helps improve the prediction accuracy of the model. However, the data fusion model inevitably discards some data items. Using a BP-DNN for model comparison, it was found that, compared with directly using all experimental and numerical data, the data fusion model reduced the Mean Absolute Percentage Error (MAPE), Symmetric Mean Absolute Percentage Error (SMAPE) and Mean Squared Percentage Error (MSPE) on the testing set by 2.44%, 3.88% and 2.16%, respectively, and increased the coefficient of determination (R²) by 3.99%. These findings suggest that when additional experimental data cannot be obtained, using high-quality numerical simulation data to extend the data range can effectively enhance model performance.
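The abstract compares models with MAPE, SMAPE, MSPE and R², and describes a Co-teaching-based fusion that discards some samples. The sketch below illustrates both ideas under stated assumptions. It is not the study's implementation. The metric formulas are common textbook definitions; SMAPE and MSPE have several variants, and the paper's exact forms are not given here. The names `co_teach`, `fit_linear` and `forget_rate` are hypothetical. A linear least-squares model stands in for the neural-network peers, and the authors' specific modification of Co-teaching is not reproduced. Keeping all experimental data and filtering only the numerical data is also an assumption.

```python
import numpy as np

# --- Error indices (percentage forms; one common convention each) ---

def mape(y, p):
    y, p = np.asarray(y, float), np.asarray(p, float)
    return 100.0 * np.mean(np.abs((y - p) / y))

def smape(y, p):
    y, p = np.asarray(y, float), np.asarray(p, float)
    return 100.0 * np.mean(2.0 * np.abs(p - y) / (np.abs(y) + np.abs(p)))

def mspe(y, p):
    y, p = np.asarray(y, float), np.asarray(p, float)
    return 100.0 * np.mean(((y - p) / y) ** 2)

def r2(y, p):
    y, p = np.asarray(y, float), np.asarray(p, float)
    return 1.0 - np.sum((y - p) ** 2) / np.sum((y - y.mean()) ** 2)

# --- Co-teaching-style small-loss exchange (hypothetical, illustrative) ---

def _design(X):
    # Append an intercept column to the feature matrix.
    return np.column_stack([X, np.ones(len(X))])

def fit_linear(X, y):
    # Ordinary least squares: a stand-in for a neural-network peer.
    w, *_ = np.linalg.lstsq(_design(X), y, rcond=None)
    return w

def co_teach(X_exp, y_exp, X_num, y_num, forget_rate=0.2, rounds=5, seed=0):
    """Return the numerical-sample indices each peer keeps after training."""
    rng = np.random.default_rng(seed)
    n_keep = int(round((1.0 - forget_rate) * len(y_num)))
    # The two peers start from different random subsets of the numerical data.
    keep_a = rng.permutation(len(y_num))[:n_keep]
    keep_b = rng.permutation(len(y_num))[:n_keep]
    for _ in range(rounds):
        # Experimental data are always used; only numerical data are filtered.
        w_a = fit_linear(np.vstack([X_exp, X_num[keep_a]]),
                         np.concatenate([y_exp, y_num[keep_a]]))
        w_b = fit_linear(np.vstack([X_exp, X_num[keep_b]]),
                         np.concatenate([y_exp, y_num[keep_b]]))
        loss_a = (_design(X_num) @ w_a - y_num) ** 2
        loss_b = (_design(X_num) @ w_b - y_num) ** 2
        # Each peer hands its small-loss samples to the other peer;
        # the largest-loss numerical samples are dropped.
        keep_a, keep_b = np.argsort(loss_b)[:n_keep], np.argsort(loss_a)[:n_keep]
    return keep_a, keep_b
```

With `forget_rate=0.1`, each peer trains on the 90% of numerical samples that its partner fits best. This illustrates how a fused model can end up discarding some data. Exchanging selections between two peers, rather than letting one model filter its own data, is the core Co-teaching idea for limiting self-confirmation of errors.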
5th International Symposium on Frontiers in Offshore Geotechnics (ISFOG2025)
5 - Data Analytics and Machine Learning