What are the differences between overfitting and underfitting?
In statistics and machine learning, one of the most common tasks is to fit a model to a set of training data so as to make reliable predictions on unseen data. The goal is to model the pattern and ignore the noise. If an algorithm tries to fit the noise in addition to the pattern, it is overfitting.
When an algorithm misses the pattern while trying to avoid fitting the noise, it is underfitting.
Signs of overfitting — Low in-sample error, high out-of-sample error.
Signs of underfitting — High in-sample error, high out-of-sample error.
In overfitting, a statistical model describes random error or noise instead of the underlying relationship. Overfitting occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. A model that has been overfit has poor predictive performance, as it overreacts to minor fluctuations in the training data.
Underfitting occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data. Underfitting would occur, for example, when fitting a linear model to non-linear data. Such a model too would have poor predictive performance.
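The contrast above can be made concrete with a small experiment. The sketch below (hypothetical data: a quadratic pattern plus noise) fits polynomials of increasing degree and compares in-sample and out-of-sample error; the degree-1 fit underfits (both errors high), while the degree-15 fit overfits (low training error, worse test error).

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a quadratic pattern: y = x^2 + noise
x_train = np.linspace(-3, 3, 20)
y_train = x_train**2 + rng.normal(0, 1.0, x_train.size)
x_test = np.linspace(-3, 3, 100)
y_test = x_test**2  # noise-free ground truth for evaluation

def fit_and_errors(degree):
    """Fit a polynomial of the given degree; return (in-sample, out-of-sample) MSE."""
    coeffs = np.polyfit(x_train, y_train, degree)
    mse_in = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    mse_out = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return mse_in, mse_out

for degree in (1, 2, 15):
    mse_in, mse_out = fit_and_errors(degree)
    print(f"degree {degree:2d}: train MSE {mse_in:7.2f}, test MSE {mse_out:7.2f}")
```

The degree-2 model matches the true pattern; the extra parameters of the degree-15 model buy training accuracy only by chasing the noise.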
Classification is a technique for categorizing data into a given number of classes. It can be performed on structured or unstructured data. The main goal of a classification problem is to identify the category/class to which a new data point belongs. Classification is a two-step procedure:
1. Build a model from the training data set.
2. Classify the target data using the classification model.
- Classifier: An algorithm that maps the input data to a specific category. Common classifiers fall into the following categories.
Naive Bayes algorithm
This algorithm is based on Bayes’ theorem, with the assumption that every pair of features is independent. Naive Bayes classifiers handle real-world tasks such as document classification and spam filtering very smoothly.
This algorithm requires only a small amount of training data to estimate the necessary parameters and define a model. Naive Bayes classifiers are extremely fast compared to more sophisticated methods.
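As a sketch of the spam-filtering use case, the toy classifier below (with an invented four-message corpus) counts word frequencies per class and scores a new message by log P(class) plus the sum of log P(word | class), exactly the "naive" independence assumption at work:

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus for illustration
train = [
    ("win money now", "spam"),
    ("limited offer win prize", "spam"),
    ("meeting schedule today", "ham"),
    ("project meeting notes", "ham"),
]

# Count word frequencies per class, class priors, and the vocabulary
word_counts = defaultdict(Counter)
class_counts = Counter()
vocab = set()
for text, label in train:
    words = text.split()
    word_counts[label].update(words)
    class_counts[label] += 1
    vocab.update(words)

def classify(text):
    """Pick the class maximizing log P(class) + sum of log P(word|class),
    with Laplace (add-one) smoothing for unseen words."""
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(classify("win a prize"))      # → spam
print(classify("meeting today"))    # → ham
```

The smoothing term keeps an unseen word (like "a" above) from zeroing out an entire class's probability.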
Logistic regression is a machine learning algorithm used for classification. The probabilities describing the possible outcomes of a single trial are modelled using a logistic function.
Logistic regression is designed for classification, and is most useful for understanding the influence of several independent variables on a single outcome variable.
Decision Trees are a type of supervised machine learning in which the data is continuously split according to a certain parameter. The tree can be explained by two entities: decision nodes and leaves. The leaves are the decisions or final outcomes, and the decision nodes are where the data is split.
Decision Tree models are created in two steps: induction and pruning.
Induction is where we actually build the tree, i.e. set all of the hierarchical decision boundaries based on our data. Because of the way decision trees are trained, they can be prone to major overfitting. Pruning is the process of removing unnecessary structure from a decision tree, reducing its complexity to combat overfitting, with the added bonus of making the tree even easier to interpret.
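The core of the induction step is choosing where to split. The sketch below (one numeric feature, binary 0/1 labels, invented data) scans candidate thresholds and picks the one that minimizes the weighted Gini impurity of the two child nodes:

```python
def gini(labels):
    """Gini impurity of a set of binary (0/1) class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of class 1
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """Induction step for one node: pick the threshold on a single feature
    that minimizes the weighted Gini impurity of the two children."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

# Hypothetical 1-D data: feature value -> class
xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))  # → 3 (splits the two classes perfectly)
```

A full tree is grown by applying this split search recursively to each child; pruning then removes subtrees whose splits do not improve held-out accuracy.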
SVM (Support Vector Machines)
SVM is a classification method for both linear and nonlinear data. It uses a nonlinear mapping to transform the original training data into a higher dimension. In the new dimension, it searches for the linear optimal separating hyperplane (i.e., the “decision boundary”).
With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane.
SVM finds this hyperplane using support vectors.
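The mapping idea can be illustrated without any library. In the sketch below, 1-D data where class 1 sits between the class-0 points cannot be separated by any single threshold; lifting each point with phi(x) = (x, x²) makes the classes linearly separable. Note that the hyperplane here is hand-picked to show the geometry, not one found by actual SVM training on support vectors:

```python
# 1-D data that no single threshold can separate:
# class 1 lies between the class-0 points.
xs = [-2.0, -1.5, 0.0, 0.5, 1.5, 2.0]
ys = [0, 0, 1, 1, 0, 0]

def phi(x):
    """Nonlinear mapping into 2-D: phi(x) = (x, x^2)."""
    return (x, x * x)

def predict(x, w=(0.0, 1.0), b=-1.0):
    """Classify by which side of the hyperplane w·phi(x) + b = 0
    the lifted point falls on. The plane x2 = 1 separates the classes."""
    fx = sum(wi * pi for wi, pi in zip(w, phi(x)))
    return 0 if fx + b > 0 else 1

print([predict(x) for x in xs])  # → [0, 0, 1, 1, 0, 0], matching ys
```

A real SVM would choose w and b to maximize the margin, expressing them in terms of the support vectors; kernels let it do so without ever computing phi(x) explicitly.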
Kernel density estimation is a non-parametric method of estimating the probability density function (PDF) of a continuous random variable. It is non-parametric because it does not assume any underlying distribution for the variable. Essentially, at every datum, a kernel function is created with the datum at its centre – this ensures that the kernel is symmetric about the datum.
The PDF is then estimated by adding all of these kernel functions and dividing by the number of data points, which ensures that it satisfies the two defining properties of a PDF:
§ Every value of the PDF, f(x), is non-negative.
§ The definite integral of the PDF over its support equals 1.
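A minimal sketch with Gaussian kernels (invented sample data, fixed bandwidth): the estimate at any point is the average of the kernels centred on each datum, and a crude Riemann sum confirms the two PDF properties above.

```python
import math

def kde(x, data, bandwidth=1.0):
    """Kernel density estimate at x: the average of Gaussian kernels,
    one centred on each datum."""
    norm = 1.0 / (math.sqrt(2 * math.pi) * bandwidth)
    return sum(norm * math.exp(-0.5 * ((x - d) / bandwidth) ** 2)
               for d in data) / len(data)

data = [1.0, 1.2, 2.8, 3.0, 3.1]  # hypothetical sample

# Property 1: the estimate is non-negative everywhere
assert all(kde(x, data) >= 0 for x in range(-10, 11))

# Property 2: it integrates to ~1 (crude Riemann sum over a wide interval)
step = 0.01
total = sum(kde(-10 + i * step, data) * step for i in range(2000))
print(round(total, 2))  # → 1.0
```

In practice the bandwidth controls the smoothness of the estimate and is the main tuning knob; too small overfits the sample, too large oversmooths it.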
A neural network is a set of connected input/output units where each connection has an associated weight. During the learning phase, the network learns by adjusting the weights so that it can predict the correct class label of the input tuples. This is also referred to as connectionist learning, due to the connections between units.
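The weight-adjustment idea can be shown with a single unit. The sketch below uses the classic perceptron learning rule (a simple, pre-backpropagation update) to learn the AND function: whenever the prediction is wrong, each connection's weight is nudged toward the correct label.

```python
# A single unit learning the AND function with the perceptron rule.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0]   # one weight per input connection
b = 0.0          # bias term
lr = 0.1         # learning rate

for _ in range(20):  # a few passes over the training tuples
    for (x1, x2), target in data:
        out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        err = target - out
        w[0] += lr * err * x1   # adjust each connection's weight
        w[1] += lr * err * x2
        b += lr * err

print([1 if w[0] * x1 + w[1] * x2 + b > 0 else 0 for (x1, x2), _ in data])
# → [0, 0, 0, 1]
```

Modern networks replace the hard threshold with differentiable activations and propagate the error backward through many layers, but the principle is the same: learning is weight adjustment driven by prediction error.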
Recurrent Neural Networks
In a RNN, the information cycles through a loop. When it makes a decision, it takes into consideration the current input and also what it has learned from the inputs it received previously.
A Recurrent Neural Network is able to remember previous inputs because of its internal memory: it produces output, copies that output, and loops it back into the network. This ability to remember earlier inputs is what makes it useful for time-series prediction. A popular RNN variant with an explicit gated memory is the Long Short-Term Memory (LSTM) network. Recurrent neural networks are even used with convolutional layers to extend the effective pixel neighborhood.
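The loop can be sketched with a single recurrent unit and hand-picked (untrained) weights: the new hidden state depends on the current input AND the previous hidden state, so two sequences ending in the same input can still produce different outputs.

```python
import math

def rnn_step(x, h, w_x=0.5, w_h=0.8, b=0.0):
    """One step of a minimal single-unit RNN: the new hidden state mixes
    the current input x with the previous hidden state h (the 'memory').
    Weights here are illustrative, not trained."""
    return math.tanh(w_x * x + w_h * h + b)

def run(sequence):
    h = 0.0  # initial hidden state
    for x in sequence:
        h = rnn_step(x, h)
    return h

# Both sequences end in the same input (0.0), yet the outputs differ,
# because the hidden state carries information about the earlier input.
print(run([1.0, 0.0]))
print(run([0.0, 0.0]))
```

Training such a unit uses backpropagation through time; LSTM cells add gates to decide what the hidden state keeps or forgets over long sequences.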
Modular Neural Network
A modular neural network is one that is composed of more than one neural network model connected by some intermediary. Modular neural networks allow sophisticated use of more basic neural network systems, managed and combined in conjunction.
Modular neural networks are one of the models that can be used for object recognition, classification, and identification. A modular neural network can be viewed as a set of monolithic neural networks that deal with a part of a problem, and then their individual outputs are combined by an integration unit to form a global solution to the complete problem. The main idea is that a complex problem can be divided into simpler subproblems that can be solved by simpler neural networks and then the total solution will be a combination of the outputs of the simple monolithic neural networks.
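The divide-and-combine idea can be sketched in miniature. Below, two trivial stand-in "modules" (hypothetical placeholders for trained subnetworks, each handling half of the input space) are combined by a simple routing integration unit; together they compute |x|, which neither handles alone.

```python
def module_negative(x):
    """Hypothetical expert for negative inputs (stands in for a trained subnetwork)."""
    return -x

def module_positive(x):
    """Hypothetical expert for non-negative inputs."""
    return x

def integrate(x):
    """Integration unit: route the input to the matching module and
    return that module's output as the global answer."""
    return module_negative(x) if x < 0 else module_positive(x)

print([integrate(x) for x in (-2, -1, 0, 3)])  # → [2, 1, 0, 3]
```

Real integration units can be richer than routing, e.g. a gating network that learns a weighted combination of all module outputs (as in mixture-of-experts architectures).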