Model-Fitting Methods in Machine Learning – Closed-Form vs. Gradient Descent

To explain the differences between alternative approaches to estimating the parameters of a model, let's look at a concrete example: Ordinary Least Squares (OLS) Linear Regression. The illustration below serves as a quick reminder of the different components of a simple linear regression model:

[Figure: components of a simple linear regression model — data points, the fitted line, and the vertical offsets (residuals).]

In Ordinary Least Squares (OLS) Linear Regression, our goal is to find the line (or hyperplane) that minimizes the vertical offsets (residuals). In other words, we define the best-fitting line as the one that minimizes the sum of squared errors (SSE) or mean squared error (MSE) between our target variable (y) and our predicted output over all samples i in our dataset of size n.


$$\text{SSE} = \sum_{i=1}^{n} \big(y^{(i)} - \hat{y}^{(i)}\big)^2$$
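The SSE and MSE defined above can be computed directly. A minimal sketch using NumPy (the targets and predictions below are made-up numbers for illustration):

```python
import numpy as np

# Hypothetical data: true targets y and model predictions y_pred
y = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

# Sum of squared errors over all n samples
sse = np.sum((y - y_pred) ** 2)

# The mean squared error is simply the SSE averaged over n
mse = sse / y.shape[0]
```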

Now, we can implement a linear regression model for performing ordinary least squares regression using one of the following approaches:

  • Solving the model parameters analytically (closed-form equations)
  • Using an optimization algorithm (Gradient Descent, Stochastic Gradient Descent, Newton’s Method, Simplex Method, etc.)

1) Normal Equations (closed-form solution)

The closed-form solution is generally preferable for "smaller" datasets, where computing the (costly) matrix inverse is not a concern. For very large datasets, or datasets where the inverse of $\mathbf{X}^T \mathbf{X}$ may not exist (the matrix is non-invertible or singular, e.g., in the case of perfect multicollinearity), the GD or SGD approaches are preferable. The linear function (linear regression model) is defined as:

$$y = w_0 x_0 + w_1 x_1 + \dots + w_m x_m = \sum_{j=0}^{m} w_j x_j = \mathbf{w}^T \mathbf{x}$$

where $y$ is the response variable, $\mathbf{x}$ is an $m$-dimensional sample vector, and $\mathbf{w}$ is the weight vector (vector of coefficients). Note that $w_0$ represents the y-axis intercept of the model, and therefore $x_0 = 1$. Using the closed-form solution (normal equation), we compute the weights of the model as follows:


$$\mathbf{w} = \big(\mathbf{X}^T \mathbf{X}\big)^{-1} \mathbf{X}^T \mathbf{y}$$
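As a sketch, the normal equation can be evaluated in a few lines of NumPy. The toy dataset below is hypothetical (an exact linear relationship, y = 2x + 1), chosen so the recovered weights are easy to check:

```python
import numpy as np

# Toy dataset: n = 5 samples, one feature; y = 2x + 1 with no noise
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Prepend a column of ones so that w[0] acts as the intercept (x0 = 1)
Xb = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equation: w = (X^T X)^{-1} X^T y.
# Solving the linear system is numerically safer than forming the inverse explicitly.
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
# w ≈ [1.0, 2.0]  (intercept, slope)
```

Note that for ill-conditioned or singular $\mathbf{X}^T \mathbf{X}$, `np.linalg.lstsq` is a more robust alternative.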

2) Gradient Descent (GD)

Using the Gradient Descent (GD) optimization algorithm, the weights are updated incrementally after each epoch (i.e., pass over the training dataset).

The cost function $J(\cdot)$, the sum of squared errors (SSE), can be written as:

$$J(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{n} \big(y^{(i)} - \hat{y}^{(i)}\big)^2$$

The magnitude and direction of the weight update are computed by taking a step in the opposite direction of the cost gradient:


$$\Delta \mathbf{w} = -\eta \, \nabla J(\mathbf{w})$$

where η is the learning rate. The weights are then updated after each epoch via the following update rule:


$$\mathbf{w} := \mathbf{w} + \Delta \mathbf{w}$$

where $\Delta \mathbf{w}$ is a vector that contains the weight updates for each weight coefficient $w_j$, computed as follows:

$$\Delta w_j = -\eta \frac{\partial J}{\partial w_j} = \eta \sum_{i=1}^{n} \big(y^{(i)} - \hat{y}^{(i)}\big) x_j^{(i)}$$
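Putting the pieces together, a minimal full-batch gradient descent loop for OLS might look as follows (a sketch, not a production implementation; the learning rate and epoch count are hypothetical values that work for the toy data shown):

```python
import numpy as np

def gradient_descent(X, y, eta=0.01, n_epochs=1000):
    """Full-batch GD for OLS with cost J(w) = 1/2 * sum((y - Xw)^2)."""
    n_samples, n_features = X.shape
    # Prepend a bias column so w[0] is the intercept (x0 = 1)
    Xb = np.hstack([np.ones((n_samples, 1)), X])
    w = np.zeros(n_features + 1)
    for _ in range(n_epochs):
        errors = y - Xb @ w            # (y - y_hat) for every sample
        delta_w = eta * Xb.T @ errors  # Δw_j = η * Σ_i (y_i − ŷ_i) x_j^(i)
        w += delta_w                   # w := w + Δw, once per epoch
    return w

# Toy data (y = 2x + 1) as a quick sanity check
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
w = gradient_descent(X, y, eta=0.01, n_epochs=5000)
```

If the learning rate $\eta$ is too large, the updates overshoot the minimum and the weights diverge; if it is too small, convergence becomes very slow.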

Essentially, we can picture GD optimization as a hiker (the weight coefficient) who wants to climb down a mountain (cost function) into a valley (cost minimum), and each step is determined by the steepness of the slope (gradient) and the leg length of the hiker (learning rate). Considering a cost function with only a single weight coefficient, we can illustrate this concept as follows:

[Figure: gradient descent visualized as a hiker stepping down the slope of a one-dimensional cost function toward its minimum.]
