LINEAR REGRESSION WITH MORE THAN ONE PREDICTOR VARIABLE.

Today, I have learned that multiple linear regression is used to understand how several predictor variables affect a single dependent variable. Earlier, we studied simple linear regression, which has only a single predictor variable. Having more than one predictor variable allows for a more comprehensive analysis of how multiple factors collectively influence the dependent variable.

The mathematical equation of multiple linear regression is given by,

Y = A0 + A1X1 + A2X2 + … + AnXn + ε

Where Y is the dependent variable, X1, X2, …, Xn are the predictor variables, A1, A2, …, An are the coefficients of the predictor variables, A0 is the intercept, and ε is the error term.
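To make this concrete, here is a minimal sketch of fitting a multiple linear regression with scikit-learn. The data and numbers are made up purely for illustration:

```python
# Minimal sketch: multiple linear regression with two predictors (scikit-learn).
# The data below is synthetic, purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

# Two predictor variables (X1, X2) and one dependent variable Y
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]])
Y = np.array([8, 7, 16, 15, 24])

model = LinearRegression()
model.fit(X, Y)

print("Intercept (A0):", model.intercept_)
print("Coefficients (A1, A2):", model.coef_)
print("Prediction for X1=6, X2=5:", model.predict([[6, 5]]))
```

The fitted intercept and coefficients play the roles of A0 and A1, …, An in the equation above.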

Overfitting: I would like to explain this concept with an example.

Imagine three students, A, B, and C, preparing for an important exam. Each covered a different percentage of the syllabus before the test: Student A covered 90 percent, Student B covered 40 percent, and Student C covered 80 percent. When the exam results came in, they showed the following:

Student A, who diligently covered 90 percent of the syllabus, secured an outstanding 90 percent on the exam. This is a prime example of a “best fit.” In machine learning, this is akin to a model that is well trained on relevant data and performs exceptionally well on unseen data, striking the right balance.

Student B, who only covered 40 percent of the syllabus, managed to score 50 percent on the exam. This situation exemplifies “underfitting.” Student B was underprepared for the exam, which resulted in a subpar performance. In machine learning, this mirrors a model that is too simplistic and fails to capture essential patterns in the data, leading to poor performance on both training and test data.

Student C is an interesting case. Despite covering 80 percent of the syllabus, they could only secure 55 percent on the exam. This scenario mirrors “overfitting.” Student C might have overcomplicated their preparation or focused on less critical details, much like a model that is too complex. In machine learning, this corresponds to a model that performs exceptionally well on the training data but poorly on the test data, because it has effectively memorized the training data rather than generalized from it.
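The same three situations can be illustrated in code. Here is a small sketch (using scikit-learn on synthetic data; the polynomial degrees are chosen just for demonstration) in which a too-simple model scores poorly everywhere, a balanced model scores well on both sets, and an overly complex model scores well only on the data it has memorized:

```python
# Sketch: underfitting vs. best fit vs. overfitting via polynomial degree.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.1, 30)  # noisy sine wave

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in [1, 4, 15]:  # too simple, balanced, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree={degree:2d}  train R^2={model.score(X_train, y_train):.2f}  "
          f"test R^2={model.score(X_test, y_test):.2f}")
```

Like Student C, the degree-15 model looks impressive on the material it has seen (high training score) but falls apart on the unseen test data.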

To overcome overfitting, we have different strategies and techniques; one of them is cross-validation.

Cross-validation: I have understood that cross-validation helps us evaluate how well a machine learning model can generalize to new data, by training on different parts of the data and testing on the parts it has not seen before. It helps identify overfitting during model development and gives a more reliable estimate of how accurate the model's predictions will be.
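Here is a minimal sketch of k-fold cross-validation using scikit-learn's cross_val_score, again on synthetic data invented for illustration:

```python
# Sketch: 5-fold cross-validation with scikit-learn.
# Each fold takes a turn as unseen test data while the rest is used for training.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(1)
X = rng.uniform(0, 10, size=(50, 2))                  # 50 samples, 2 predictors
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 1, 50)  # known linear relation plus noise

scores = cross_val_score(LinearRegression(), X, y, cv=5)
print("R^2 per fold:", np.round(scores, 3))
print("Mean R^2:", scores.mean().round(3))
```

If the per-fold scores were high on some folds but much lower on others, that would be a warning sign that the model does not generalize consistently.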
