Reference List for Machine Learning and its Applications in Geotechnical Engineering - PART II: MACHINE LEARNING ALGORITHMS

1 Supervised learning
1.1 Decision tree learning

[1] Breiman, Leo, Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and regression trees. Monterey, CA: Wadsworth and Brooks/Cole Advanced Books and Software. (Google citation: 37373)
[2] Dattatreya, G. R., and Kanal, L. N. (1985). Decision trees in pattern recognition. University of Maryland. Computer Science.
[3] Safavian, S. R., and Landgrebe, D. (1991). A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics, 21(3), 660-674.
[4] Murthy, S. K., Kasif, S., and Salzberg, S. (1994). A system for induction of oblique decision trees. Journal of Artificial Intelligence Research, 2, 1-32.
[5] Friedl, M. A., and Brodley, C. E. (1997). Decision tree classification of land cover from remotely sensed data. Remote Sensing of Environment, 61(3), 399-409.
[6] Murthy, S. K. (1998). Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4), 345-389.
[7] Kohavi, R., and Quinlan, J. R. (2002). Data mining tasks and methods: Classification: decision-tree discovery. In Handbook of data mining and knowledge discovery (pp. 267-276). Oxford University Press, Inc.
[8] Rokach, L., and Maimon, O. (2005). Top-down induction of decision trees classifiers - a survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 35(4), 476-487.
[9] Barros, Rodrigo C., Basgalupp, M. P., Carvalho, A. C. P. L. F., Freitas, Alex A. (2012). A Survey of Evolutionary Algorithms for Decision-Tree Induction. IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, 42(3), 291-312.
[10] James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning (Vol. 112). New York: Springer. (Google citation: 2673)
[11] Rokach, L., and Maimon, O. (2014). Data mining with decision trees: theory and applications (Vol. 81). World Scientific.
1.2 Artificial neural networks
[12] McCulloch, W. S., and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), 115-133. (Google citation: 15445) 
[13] White, H. (1989). Learning in artificial neural networks: A statistical perspective. Neural Computation, 1(4), 425-464.
[14] Maren, A., Harston, C., and Pap, R. (1990). Handbook of neural computing applications. Academic Press, Inc., San Diego, California.
[15] Ghaboussi, J., Garrett, J. H., and Wu, X. (1991). Knowledge based modeling of material behaviour with neural networks. Journal of Engineering Mechanics, ASCE, 117(1), 132-153.
[16] Hunt, K. J., Sbarbaro, D., Żbikowski, R., and Gawthrop, P. J. (1992). Neural networks for control systems - a survey. Automatica, 28(6), 1083-1112.
[17] Zurada, J. M. (1992). Introduction to artificial neural systems. West Publishing Company, St. Paul.
[18] Yao, X. (1993). A review of evolutionary artificial neural networks. International Journal of Intelligent Systems, 8(4), 539-567.
[19] Garrett, J. H. (1994). Where and why artificial neural networks are applicable in civil engineering. Journal of Computing in Civil Engineering, ASCE, 8(2), 129-130.
[20] Fausett, L. V. (1994). Fundamentals of neural networks: Architectures, algorithms, and applications. Prentice-Hall, Inc., Englewood Cliffs, New Jersey.
[21] Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford University Press.
[22] Ripley, B. D. (1996). Pattern recognition and neural networks. Cambridge University Press. (Google citation: 7483) 
[23] Yao, X., and Liu, Y. (1997). A new evolutionary system for evolving artificial neural networks. IEEE Transactions on Neural Networks, 8(3), 694-713.
[24] Schalkoff, R. J. (1997). Artificial neural networks (Vol. 1). New York: McGraw-Hill.
[25] Gardner, M. W., and Dorling, S. R. (1998). Artificial neural networks (the multilayer perceptron) - a review of applications in the atmospheric sciences. Atmospheric Environment, 32(14), 2627-2636.
[26] Zhang, G., Patuwo, B. E., and Hu, M. Y. (1998). Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14(1), 35-62.
[27] Zhang, G. P. (2000). Neural networks for classification: a survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 30(4), 451-462.
[28] Shahin, M. A., Maier, H. R., and Jaksa, M. B. (2005). Investigation into the robustness of artificial neural network models for a case study in civil engineering. In MODSIM 2005 International Congress on Modelling and Simulation, Zerger, A. and Argent, R. M. (eds), Modelling and Simulation Society of Australia and New Zealand, Melbourne, December 12-15, pp. 79-83. ISBN: 0-9758400-2-9.
[29] Hinton, G. E., and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.
1.3 Deep learning
[30] Dechter, R. (1986). Learning while searching in constraint-satisfaction problems. University of California, Computer Science Department, Cognitive Systems Laboratory.
[31] Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., and Ng, A. Y. (2011). Multimodal deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML-11) (pp. 689-696).
[32] Deng, L., and Yu, D. (2014). Deep Learning: Methods and Applications. Foundations and Trends in Signal Processing, 7(3-4), 1-199.
[33] Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117.
[34] LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep Learning. Nature, 521, 436-444. (Google citation: 6899)
1.4 Inductive logic programming
[35] Muggleton, S. H. (1991). Inductive logic programming. New Generation Computing, 8(4), 295-318.
[36] Muggleton, S., and De Raedt, L. (1994). Inductive Logic Programming: Theory and methods. The Journal of Logic Programming, 19-20, 629-679. (Google citation: 1708)
[37] Lavrac, N., and Dzeroski, S. (1994). Inductive Logic Programming: Techniques and Applications. New York: Ellis Horwood.
[38] De Raedt, L. (1999). A perspective on inductive logic programming. In the Logic Programming Paradigm (pp. 335-346). Springer, Berlin, Heidelberg.
[39] Muggleton, S. (1999). Inductive logic programming: issues, results and the challenge of learning language in logic. Artificial Intelligence, 114(1-2), 283-296. 
1.5 Support vector machines
[40] Cortes, C., and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.
[41] Cristianini, N., and Shawe-Taylor, J. (2000). An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press.
[42] Ben-Hur, A., Horn, D., Siegelmann, H. T., and Vapnik, V. (2001). Support vector clustering. Journal of Machine Learning Research, 2(Dec), 125-137.
[43] Press, William H., Teukolsky, Saul A., Vetterling, William T., and Flannery, B. P. (2007). Numerical Recipes: The Art of Scientific Computing (3rd ed.). New York: Cambridge University Press. (Google citation: 116107)
[44] Steinwart, I., and Christmann, A. (2008). Support Vector Machines. Springer-Verlag, New York.
[45] Chang, C. C., and Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology. 2 (3). (Google citation: 36612) 
1.6 Similarity and metric learning
[46] Davis, J. V., Kulis, B., Jain, P., Sra, S., and Dhillon, I. S. (2007). Information-theoretic metric learning. International Conference on Machine Learning (ICML), 209-216. (Google citation: 1580)
[47] Kulis, B. (2013). Metric learning: A survey. Foundations and Trends in Machine Learning, 5(4), 287-364.
[48] Bellet, A., Habrard, A., and Sebban, M. (2013). A survey on metric learning for feature vectors and structured data. arXiv preprint arXiv:1306.6709.
1.7 Sparse dictionary learning
[49] Engan, K., Aase, S. O., and Hakon Husoy, J. (1999). Method of optimal directions for frame design. IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings, 5, 2443-2446.
[50] Kreutz-Delgado, K., Murray, J. F., Rao, B. D., Engan, K., Lee, T. W., and Sejnowski, T. J. (2003). Dictionary learning algorithms for sparse representation. Neural Computation, 15(2), 349-396. (Google citation: 714)
[51] Aharon, M., Elad, M., and Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), 4311-4322. (Google citation: 6842)
[52] Elad, M., and Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12), 3736-3745.
[53] Mairal, J., Ponce, J., Sapiro, G., Zisserman, A., and Bach, F. R. (2009). Supervised dictionary learning. In Advances in Neural Information Processing Systems (pp. 1033-1040).
[54] Mairal, J., Bach, F., Ponce, J., and Sapiro, G. (2009). Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 689-696). ACM.
[55] Zhang, Q., and Li, B. (2010). Discriminative K-SVD for dictionary learning in face recognition. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on (pp. 2691-2698). IEEE.
[56] Tosic, I., and Frossard, P. (2011). Dictionary learning. IEEE Signal Processing Magazine, 28(2), 27-38.
[57] Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828.
[58] Zhang, Z., Xu, Y., Yang, J., Li, X., and Zhang, D. (2015). A survey of sparse representation: algorithms and applications. IEEE Access, 3, 490-530.
2 Semi-supervised learning
[59] Ratsaby, J., and Venkatesh, S. S. (1995). Learning from a mixture of labeled and unlabeled examples with parametric side information. In Proceedings of The Eighth Annual Conference on Computational Learning Theory (pp. 412-417). ACM.
[60] Chapelle, O., Scholkopf, B., and Zien, A. (2006). Semi-Supervised Learning, MIT Press, Cambridge, MA. (Google citations: 3929)
[61] Belkin, M., Niyogi, P., and Sindhwani, V. (2006). Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7(Nov), 2399-2434.
[62] Zhu, X. (2008). Semi-supervised learning literature survey. Computer Science, University of Wisconsin-Madison, 2(3), 4.
3 Reinforcement learning
[63] Kaelbling, L. P., Littman, M. L., and Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237-285.
[64] Sutton, R. S., and Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press. (Google Citations: 26015)
[65] Dorigo, M., Birattari, M., and Stutzle, T. (2006). Ant colony optimization. IEEE Computational Intelligence Magazine, 1(4), 28-39.
[66] Wiering, M., and Van Otterlo, M. (2012). Reinforcement Learning. Springer.
[67] Mnih, V., Kavukcuoglu, K., Silver, D., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
[68] Sugiyama, M. (2015). Statistical reinforcement learning: modern machine learning approaches. CRC Press.
4 Unsupervised learning
4.1 Clustering

[69] MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability 1(14), 281-297.
[70] Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of The Royal Statistical Society. Series B (methodological), 1-38. (Google Citations: 53387)
[71] Jain, A. K., and Dubes, R. C. (1988). Algorithms for clustering data. Prentice-Hall, Inc.
[72] Zhang, T., Ramakrishnan, R., and Livny, M. (1996). BIRCH: an efficient data clustering method for very large databases. In ACM Sigmod Record, 25(2), 103-114. ACM.
[73] Ester, M., Kriegel, H. P., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Kdd, 96(34), 226-231.
[74] Ankerst, M., Breunig, M. M., Kriegel, H. P., and Sander, J. (1999). OPTICS: ordering points to identify the clustering structure. In ACM Sigmod record, 28(2), 49-60. ACM.
[75] Jain, A. K., Murty, M. N., and Flynn, P. J. (1999). Data clustering: a review. ACM Computing Surveys (CSUR), 31(3), 264-323.
[76] Ng, A. Y., Jordan, M. I., and Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems (pp. 849-856).
[77] Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., ... and Zhou, Z. H. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1-37.
[78] McLachlan, G., and Peel, D. (2004). Finite mixture models. John Wiley and Sons.
[79] Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8), 651-666.
[80] Han, J., Pei, J., and Kamber, M. (2011). Data mining: concepts and techniques. Elsevier. (Google Citations: 39607)
[81] Witten, I. H., Frank, E., Hall, M. A., and Pal, C. J. (2016). Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann.
4.2 Feature learning (Dimensionality reduction)
[82] Jolliffe, I. T. (1986). Principal Component Analysis and Factor Analysis. In Principal Component Analysis (pp. 115-128). Springer New York. (Google Citations: 33332)
[83] Kohonen, T. (1998). The self-organizing map. Neurocomputing, 21(1), 1-6. (Google Citations: 23430)
[84] Tenenbaum, J. B., De Silva, V., and Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319-2323.
[85] Roweis, S. T., and Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323-2326.
[86] Hyvärinen, A., Karhunen, J., and Oja, E. (2004). Independent Component Analysis (Vol. 46). John Wiley and Sons.
[87] He, X., and Niyogi, P. (2004). Locality preserving projections. In Advances in Neural Information Processing Systems (pp. 153-160).
[88] Srebro, N., Rennie, J., and Jaakkola, T. S. (2005). Maximum-margin matrix factorization. In Advances in Neural Information Processing Systems (pp. 1329-1336).
[89] Hinton, G. E., and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.
4.3 Outlier detection
[90] Knorr, E. M., Ng, R. T., and Tucakov, V. (2000). Distance-based outliers: algorithms and applications. The VLDB Journal - The International Journal on Very Large Data Bases, 8(3-4), 237-253.
[91] Ramaswamy, S., Rastogi, R., and Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. In ACM Sigmod Record, 29(2), 427-438. ACM.
[92] Breunig, M. M., Kriegel, H. P., Ng, R. T., and Sander, J. (2000). LOF: identifying density-based local outliers. In ACM Sigmod Record, 29(2), 93-104. ACM. (Google Citations: 3563)
[93] Schölkopf, B., Platt, J. C., Shawe-Taylor, J., Smola, A. J., and Williamson, R. C. (2001). Estimating the support of a high-dimensional distribution. Neural Computation, 13(7), 1443-1471. (Google Citations: 3715)
[94] He, Z., Xu, X., and Deng, S. (2003). Discovering cluster-based local outliers. Pattern Recognition Letters, 24(9), 1641-1650.
[95] Chandola, V., Banerjee, A., and Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3), 15.
[96] Yuen, K.V., and Mu, H.Q. (2012). Novel probabilistic method for robust parametric identification and outlier detection. Probabilistic Engineering Mechanics, 30, 48-59.
4.4 Generative adversarial networks
[97] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27 (NIPS 2014), 2672-2680. (Google Citations: 2829)

5 Bayesian machine learning
5.1 Bayesian network

[98] Pearl, J. (1985). Bayesian Networks: A Model of Self-Activated Memory for Evidential Reasoning (UCLA Technical Report CSD-850017). Proceedings of the 7th Conference of the Cognitive Science Society, University of California, Irvine, CA, pp. 329-334.
[99] Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. San Francisco CA: Morgan Kaufmann.
[100] Neapolitan, R. E. (1989). Probabilistic reasoning in expert systems: theory and algorithms. Wiley.
[101] Heckerman, D., Geiger, D., and Chickering, D. M. (1995). Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3), 197-243. (Google citation: 4819) 
[102] Heckerman, D. (1997). Bayesian networks for data mining. Data Mining and Knowledge Discovery, 1(1), 79-119.
[103] Friedman, N., Geiger, D., and Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29(2-3), 131-163.
[104] Friedman, N., Linial, M., Nachman, I., and Pe'er, D. (2000). Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7(3-4), 601-620.
[105] Neapolitan, R. E. (2004). Learning Bayesian networks. Prentice Hall.
[106] Nasrabadi, N. M. (2007). Pattern recognition and machine learning. Journal of Electronic Imaging, 16(4), 049901. (Google citation: 29894) 
[107] Yuen, K.V., and Ortiz, G.A. (2016). Bayesian nonparametric general regression. International Journal for Uncertainty Quantification, 6(3), 195-213.
5.2 Bayesian neural network
[108] Kononenko, I. (1989). Bayesian neural networks. Biological Cybernetics, 61(5), 361-370.
[109] Wan, E. A. (1990). Neural network classification: A Bayesian interpretation. IEEE Transactions on Neural Networks, 1(4), 303-305.
[110] MacKay, D. J. (1995). Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks. Network: Computation in Neural Systems, 6(3), 469-505. (Google citation: 827)
[111] MacKay, D. J. (1995). Bayesian neural networks and density networks. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 354(1), 73-80.
[112] Thodberg, H. H. (1996). A review of Bayesian neural networks with an application to near infrared spectroscopy. IEEE Transactions on Neural Networks, 7(1), 56-72.
[113] Barber, D., and Bishop, C. M. (1998). Ensemble Learning in Bayesian Neural Networks. In Neural Networks and Machine Learning, pages 215-237. Springer.
[114] Lampinen, J., and Vehtari, A. (2001). Bayesian approach for neural networks - review and case studies. Neural Networks, 14(3), 257-274.
[115] Neal, R. M. (2012). Bayesian learning for neural networks (Vol. 118). Springer Science and Business Media. (Google citation: 2915) 
5.3 Gaussian processes
[116] Barber, D., and Williams, C. K. (1997). Gaussian processes for Bayesian classification via hybrid Monte Carlo. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems NIPS9, pages 340-346, Cambridge, MA, MIT Press.
[117] MacKay, D. J. (1998). Introduction to Gaussian Processes. In Neural Networks and Machine Learning, volume 168 of NATO advanced study institute on generalization in neural networks and machine learning, pages 133-165. Springer, August.
[118] Williams, C. K. I., and Barber, D. (1998). Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12), 1342-1351.
[119] Seeger, M. (2004). Gaussian Processes for Machine Learning. International Journal of Neural Systems, 14(2), 69-106.
[120] Rasmussen, C. E., and Williams, C. K. (2006). Gaussian processes for machine learning (Vol. 1). Cambridge: MIT Press. (Google citation: 1530)
5.4 Relevance vector machine (sparse Bayesian learning)
[121] Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1(Jun), 211-244. (Google citation: 4949) 
[122] Faul, A. C., and Tipping, M. E. (2002). Analysis of sparse Bayesian learning. In Advances in Neural Information Processing Systems (pp. 383-389).
[123] Tipping, M. E., and Faul, A. C. (2003). Fast marginal likelihood maximization for sparse Bayesian models. In AISTATS, January.
[124] Wipf, D. P., and Rao, B. D. (2004). Sparse Bayesian learning for basis selection. IEEE Transactions on Signal Processing, 52(8), 2153-2164.
5.5 Bayesian deep learning
[125] Gal, Y., and Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning (pp. 1050-1059). (Google citation: 302) 
[126] Wang, H., and Yeung, D. Y. (2016). Towards Bayesian deep learning: A framework and some existing methods. IEEE Transactions on Knowledge and Data Engineering, 28(12), 3395-3408.
[127] Wang, H., and Yeung, D. Y. (2016). Towards Bayesian deep learning: A survey. arXiv preprint arXiv:1604.01662.
5.6 Bayesian model class selection and system identification
[128] Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 111-163.
[129] Jacobs, R. A., Peng, F., and Tanner, M. A. (1997). A Bayesian approach to model selection in hierarchical mixtures-of-experts architectures. Neural Networks, 10(2), 231-241.
[130] Beck, J. L., and Katafygiotis, L. S. (1998). Updating models and their uncertainties. I: Bayesian statistical framework. Journal of Engineering Mechanics, 124(4), 455-461.
[131] Katafygiotis, L. S., and Beck, J. L. (1998). Updating models and their uncertainties. II: Model identifiability. Journal of Engineering Mechanics, 124(4), 463-467.
[132] Wasserman, L. (2000). Bayesian model selection and model averaging. Journal of Mathematical Psychology, 44(1), 92-107.
[133] Corduneanu, A., and Bishop, C. M. (2001). Variational Bayesian Model Selection for Mixture Distributions. In T. Jaakkola and T. Richardson, editors, Artificial Intelligence and Statistics, pages 27-34. Morgan Kaufmann.
[134] Beck, J. L., and Yuen, K. V. (2004). Model selection using response measurements: Bayesian probabilistic approach. Journal of Engineering Mechanics, 130(2), 192-203. (Google citation: 407) 
[135] Posada, D., and Buckley, T. R. (2004). Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Systematic Biology, 53(5), 793-808.
[136] Burnham, K. P., and Anderson, D. R. (2004). Multimodel inference: understanding AIC and BIC in model selection. Sociological Methods and Research, 33(2), 261-304. (Google citation: 4977)
[137] Claeskens, G., and Hjort, N. L. (2008). Model selection and model averaging (Vol. 330). Cambridge: Cambridge University Press.
[138] Yuen, K.-V. (2010). Recent developments of Bayesian model class selection and applications in civil engineering. Structural Safety, 32(5), 338-346.
5.7 Simulation-based methods for Bayesian inference
(e.g., MCMC, Adaptive MCMC, TMCMC, DREAM, reversible jump MCMC, sequential Monte Carlo, particle filter, BUS)
[139] Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., and Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087-1092. (Google citation: 37307) 
[140] Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97-109. (Google citation: 12113) 
[141] Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intell., 6, 721-741.
[142] Gelman, A., and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 457-472.
[143] Gilks, W. R., and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41(2), 337-348.
[144] Tierney, L. (1994). Markov chains for exploring posterior distributions. The Annals of Statistics, 1701-1728.
[145] Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4), 711-732.
[146] Gelman, A., Roberts, G. O., and Gilks, W. R. (1996). Efficient Metropolis jumping rules. Bayesian Statistics 5, 599-608.
[147] MacKay, D.J.C. (1998). Introduction to Monte Carlo methods. In M. Jordan, editor, Learning in graphical models. MIT Press.
[148] Gilks, W. R., Richardson, S., and Spiegelhalter, D. (1998). Markov Chain Monte Carlo in Practice. Boca Raton, FL, Chapman and Hall.
[149] Doucet, A., Godsill, S., and Andrieu, C. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10(3), 197-208.
[150] Doucet, A., De Freitas, N., and Gordon, N. (2001). Sequential Monte Carlo Methods in Practice. New York, Springer.
[151] Doucet, A., Gordon, N. J., and Krishnamurthy, V. (2001). Particle filters for state estimation of jump Markov linear systems. IEEE Transactions on Signal Processing, 49(3), 613-624.
[152] Beck, J. L., and Au, S. K. (2002). Bayesian updating of structural models and reliability using Markov chain Monte Carlo simulation. Journal of Engineering Mechanics-ASCE, 128(4), 380-391.
[153] Chopin, N. (2002). A sequential particle filter method for static models. Biometrika, 89(3), 539-552.
[154] Arulampalam, M. S., Maskell, S., Gordon, N., and Clapp, T. (2002). A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2), 174-188.
[155] Haario, H., Saksman, E., and Tamminen, J. (2005). Componentwise adaptation for high dimensional MCMC. Computational Statistics, 20(2), 265-273.
[156] Ter Braak, C. J. (2005). Genetic algorithms and Markov Chain Monte Carlo: Differential Evolution Markov Chain makes Bayesian computing easy (revised) (No. 010404). Wageningen UR, Biometris.
[157] Del Moral, P., Doucet, A., and Jasra, A. (2006). Sequential Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(3), 411-436.
[158] Gamerman, D., and Lopes, H. F. (2006). Markov chain Monte Carlo: stochastic simulation for Bayesian inference. CRC Press.
[159] Ter Braak, C. J. (2006). A Markov Chain Monte Carlo version of the genetic algorithm Differential Evolution: easy Bayesian computing for real parameter spaces. Statistics and Computing, 16(3), 239-249.
[160] Ching, J., and Chen, Y. (2007). Transitional Markov chain Monte Carlo method for Bayesian model updating, model class selection, and model averaging. Journal of Engineering Mechanics, 133(7), 816-832.
[161] Vrugt, J. A., Ter Braak, C. J. F., Diks, C. G. H., Robinson, B. A., Hyman, J. M., and Higdon, D. (2009). Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling. International Journal of Nonlinear Sciences and Numerical Simulation, 10(3), 273-290.
[162] Vrugt, J. A., and Ter Braak, C. J. (2011). DREAM (D): an adaptive Markov Chain Monte Carlo simulation algorithm to solve discrete, noncontinuous, and combinatorial posterior parameter estimation problems. Hydrology and Earth System Sciences, 15(12).
[163] Laloy, E., and Vrugt, J. A. (2012). High-dimensional posterior exploration of hydrologic models using multiple-try DREAM (ZS) and high-performance computing. Water Resources Research, 48(1).
[164] Straub, D., and Papaioannou, I. (2014). Bayesian updating with structural reliability methods. Journal of Engineering Mechanics, 141(3), 04014134.
[165] Betz, W., Papaioannou, I., and Straub, D. (2014). Adaptive variant of the BUS approach to Bayesian updating. Eurodyn 2014, Porto, Portugal.
[166] DiazDelaO, F. A., Garbuno-Inigo, A., Au, S. K., and Yoshida, I. (2017). Bayesian updating and model class selection with Subset Simulation. Computer Methods in Applied Mechanics and Engineering, 317, 1102-1121.
[167] Byrnes, P. G., and DiazDelaO, F. A. (2017). Reliability Based Bayesian Inference for Probabilistic Classification: An Overview of Sampling Schemes. In International Conference on Innovative Techniques and Applications of Artificial Intelligence (pp. 250-263). Springer, Cham.