Gaussian process regression (GPR) models are nonparametric kernel-based probabilistic models. You can train a GPR model using the fitrgp function.
Consider the training set {(xi,yi); i=1,2,...,n}, where xi ∈ ℝd and yi ∈ ℝ, drawn from an unknown distribution. A GPR model addresses the question of predicting the value of a response variable ynew, given the new input vector xnew and the training data. A linear regression model is of the form
y=xTβ+ε,
where ε ∼ N(0,σ2). The error variance σ2 and the coefficients β are estimated from the data. A GPR model explains the response by introducing latent variables f(xi), i=1,2,...,n, from a Gaussian process (GP), and explicit basis functions, h. The covariance function of the latent variables captures the smoothness of the response, and the basis functions project the inputs x into a p-dimensional feature space.
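For the linear regression part on its own, β and σ2 can be estimated by ordinary least squares. The following is a minimal numpy sketch (not the fitrgp workflow itself); the data, true coefficients, and noise level are hypothetical.

```python
import numpy as np

# Hypothetical data: n = 50 observations of a noisy linear response.
rng = np.random.default_rng(0)
n, d = 50, 2
X = rng.normal(size=(n, d))
beta_true = np.array([1.5, -0.7])   # assumed true coefficients
sigma_true = 0.3                    # assumed true noise std
y = X @ beta_true + sigma_true * rng.normal(size=n)

# Estimate beta by ordinary least squares.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimate the error variance sigma^2 from the residual sum of
# squares, using n - d degrees of freedom.
rss = np.sum((y - X @ beta_hat) ** 2)
sigma2_hat = rss / (n - d)
```

With enough data, beta_hat and sigma2_hat recover the values used to simulate y.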
A GP is a set of random variables, such that any finite number of them have a joint Gaussian distribution. If {f(x), x ∈ ℝd} is a GP, then given n observations x1,x2,...,xn, the joint distribution of the random variables f(x1),f(x2),...,f(xn) is Gaussian. A GP is defined by its mean function m(x) and covariance function k(x,x′). That is, if {f(x), x ∈ ℝd} is a Gaussian process, then E(f(x)) = m(x) and Cov[f(x),f(x′)] = E[{f(x)−m(x)}{f(x′)−m(x′)}] = k(x,x′).
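The defining property above — any finite set of GP values is jointly Gaussian — can be illustrated by drawing one such finite-dimensional sample. This numpy sketch uses a squared exponential kernel as one common choice of k(x,x′); the hyperparameter names sigma_f and length are illustrative, not from the text.

```python
import numpy as np

def sq_exp_kernel(a, b, sigma_f=1.0, length=1.0):
    # Squared exponential covariance k(x, x') for 1-D inputs
    # (one common choice; hyperparameters are assumed values).
    d2 = (a[:, None] - b[None, :]) ** 2
    return sigma_f**2 * np.exp(-0.5 * d2 / length**2)

rng = np.random.default_rng(1)
x = np.linspace(0, 5, 20)        # n = 20 input locations
K = sq_exp_kernel(x, x)          # n-by-n covariance matrix
m = np.zeros_like(x)             # zero mean function m(x) = 0

# One draw of f(x1), ..., f(xn) is a sample from N(m, K);
# a small jitter keeps the Cholesky factorization numerically stable.
jitter = 1e-6 * np.eye(len(x))
f = m + np.linalg.cholesky(K + jitter) @ rng.normal(size=len(x))
```

Repeating the last line with fresh normal draws produces different functions, all consistent with the same mean and covariance.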
Now consider the following model.
h(x)Tβ+f(x),
where f(x) ∼ GP(0,k(x,x′)), that is, f(x) are from a zero-mean GP with covariance function k(x,x′). h(x) are a set of basis functions that transform the original feature vector x in ℝd into a new feature vector h(x) in ℝp. β is a p-by-1 vector of basis function coefficients. This model represents a GPR model. An instance of response y can be modeled as
P(yi∣f(xi),xi) ∼ N(yi∣h(xi)Tβ+f(xi), σ2).
Hence, a GPR model is a probabilistic model. There is a latent variable f(xi) introduced for each observation xi, which makes the GPR model nonparametric. In vector form, this model is equivalent to
P(y∣f,X) ∼ N(y∣Hβ+f, σ2I),
where
X = [x1T; x2T; ⋮; xnT],  y = [y1; y2; ⋮; yn],  H = [h(x1T); h(x2T); ⋮; h(xnT)],  f = [f(x1); f(x2); ⋮; f(xn)].
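The stacked quantities X, y, H, and f can be assembled directly. The sketch below uses a constant-plus-linear basis h(x) = [1, x] purely as an illustration (the text does not fix a particular h), and a placeholder f, to show the shapes involved in y ∣ f, X ∼ N(Hβ + f, σ2I).

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 6, 1
X = rng.uniform(0, 5, size=(n, d))   # n-by-d matrix with rows xi^T

# Illustrative basis h(x) = [1, x]: maps R^d into R^p with p = d + 1.
def h(x):
    return np.concatenate(([1.0], x))

H = np.vstack([h(xi) for xi in X])   # n-by-p matrix with rows h(xi)^T
beta = np.array([0.5, 1.0])          # assumed p-by-1 coefficient vector

# f would be one draw from N(0, K(X, X)); zeros stand in here so the
# example focuses on the linear-algebra shapes.
f = np.zeros(n)
sigma = 0.1
y = H @ beta + f + sigma * rng.normal(size=n)
```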
The joint distribution of latent variables f(x1), f(x2), ..., f(xn) in the GPR model is as follows:
P(f?X)~N(f?0,K(X,X)),
which is close to a linear regression model, where K(X,X) looks as follows:
K(X,X) = [k(x1,x1) k(x1,x2) ⋯ k(x1,xn); k(x2,x1) k(x2,x2) ⋯ k(x2,xn); ⋮; k(xn,x1) k(xn,x2) ⋯ k(xn,xn)].
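Entry (i, j) of K(X,X) is simply k(xi, xj) evaluated at the training inputs. A minimal numpy sketch, again using a squared exponential kernel as one assumed choice of k:

```python
import numpy as np

def kernel(xi, xj, length=1.0):
    # Squared exponential kernel, one common choice of k(x, x');
    # the length-scale value is an assumption for illustration.
    return np.exp(-0.5 * np.sum((xi - xj) ** 2) / length**2)

rng = np.random.default_rng(3)
n, d = 5, 2
X = rng.normal(size=(n, d))          # rows are the training inputs xi^T

# K(X, X)[i, j] = k(xi, xj): symmetric, with k(xi, xi) on the diagonal.
K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
```

By construction K is symmetric positive semidefinite, which is what makes N(0, K(X,X)) a valid joint distribution for the latent variables.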