Topic modeling is an unsupervised machine learning and natural language processing technique that identifies topics in a large corpus of documents. In other words, it creates topic clusters from a large document collection. It has many applications, particularly in discourse analysis. Latent Dirichlet Allocation (LDA) is one of […]
Category: Tech Blog
Extending Linear Regression Models
Linear regression models assume that the relationship between the predictor variables and the output variable is linear. Although linear models are simple and easy to interpret, they often lack predictive power because the true relationship is rarely linear. Ridge regression, the lasso, and principal components regression are improved linear models with better model fitting. Yet those methods still use a […]
Model Fitting in Linear Regression Setting
Model fitting refers to assessing the accuracy of the model. It is achieved by quantifying the extent to which the model fits the data. The most common method of computing regression coefficients for model fitting is “least squares.” However, to improve prediction accuracy and model interpretability, there are alternative model fitting methods too. Prediction Accuracy When […]
An Overview of Contemporary Classification Methods
Classification methods are used when the output variable is qualitative. Predicting a qualitative outcome for an observation is referred to as ‘classifying’. Figure 1 presents some of the most widely used classification methods. 1. Logistic Regression Logistic regression models the probability that an output variable (Y) belongs to a particular category. For example, if you […]
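As a rough illustration of how logistic regression turns a predictor value into a class probability, here is a minimal sketch in plain Python. The function names, the coefficient values, and the 0.5 threshold are assumptions made for this example only; in practice the coefficients would be estimated from training data.

```python
import math


def logistic_probability(x, intercept, slope):
    """Return P(Y = 1 | X = x) under a one-predictor logistic model."""
    z = intercept + slope * x
    # Use the algebraically equivalent form for negative z to avoid overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(x, intercept, slope, threshold=0.5):
    """Assign class 1 when the modeled probability reaches the threshold."""
    return 1 if logistic_probability(x, intercept, slope) >= threshold else 0


# Hypothetical coefficients, chosen only to show the mechanics.
example_intercept = -2.0
example_slope = 1.0
label_high = classify(3.0, example_intercept, example_slope)
label_low = classify(1.0, example_intercept, example_slope)
```

The model always returns a value between 0 and 1, which is why it suits qualitative outcomes better than fitting a straight line to class labels.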
Resampling Techniques
Resampling is the process of repeatedly drawing samples from a training data set and refitting the model on each sample to obtain additional information about the fitted model. Two of the most widely used techniques are cross-validation and the bootstrap. 1. Cross-Validation Cross-validation is used to estimate the test error to evaluate model […]
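The idea behind cross-validation can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the "model" just predicts the mean of the training responses, the folds are contiguous (real data would normally be shuffled first), and the function names are made up for this example.

```python
def k_fold_indices(n, k):
    """Split the indices 0..n-1 into k contiguous folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_mse(y, k):
    """Estimate the test MSE of a mean-only model with k-fold cross-validation."""
    fold_mses = []
    for held_out in k_fold_indices(len(y), k):
        held = set(held_out)
        # Refit on everything except the held-out fold.
        train = [y[i] for i in range(len(y)) if i not in held]
        prediction = sum(train) / len(train)
        # Score the refitted model on the fold it never saw.
        errors = [(y[i] - prediction) ** 2 for i in held_out]
        fold_mses.append(sum(errors) / len(errors))
    return sum(fold_mses) / k


# Made-up responses, used only to exercise the functions.
responses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cv_estimate = cross_validated_mse(responses, k=3)
```

Each observation is held out exactly once, so the averaged error is computed entirely on data the model was not fitted to.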
Multiple Linear Regression
Multiple regression is used when there is more than one predictor or input variable. It extends the simple linear regression model by giving each predictor a separate slope coefficient within a single model. Given p predictors, the equation is $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \epsilon$, where $\beta_j$ is the average effect on Y of a one-unit increase in $X_j$, holding all […]
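To make "a separate slope coefficient for each predictor" concrete, the following sketch fits a multiple regression by solving the least squares normal equations in plain Python. It is a teaching example under stated assumptions: the helper names and the data are invented here, and a numerical library would be the usual choice for real work.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_linear_regression(rows, y):
    """Return [intercept, slope_1, ..., slope_p] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up data generated exactly from Y = 1 + 2*X1 + 3*X2 (no noise).
predictors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
outcomes = [1.0, 3.0, 4.0, 6.0, 8.0]
coefficients = fit_multiple_linear_regression(predictors, outcomes)
```

Because the example data contain no noise, the fitted coefficients recover the intercept and the two slopes used to generate them, up to floating-point rounding.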
Simple Linear Regression
Linear regression is a simple supervised learning approach for dealing with quantitative outcome variables. Mathematically, simple linear regression, which includes only a single input or predictor variable (X), is represented as $Y = \beta_0 + \beta_1 X + \epsilon$. $\beta_0$ and $\beta_1$ are called model coefficients or model parameters. $\beta_0$ represents the intercept and $\beta_1$ represents the slope. $\epsilon$ is the mean-zero random error term. That is […]
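For a single predictor, the least squares estimates of the intercept and slope have a simple closed form, sketched below in plain Python. The function name and the data are assumptions made for illustration.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) that minimize the residual sum of squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Made-up points lying exactly on the line y = 1 + 2x.
x_values = [1.0, 2.0, 3.0, 4.0]
y_values = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_linear_regression(x_values, y_values)
```

With real data the points would scatter around the line, and the error term accounts for that scatter.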
An Overview of Statistical Learning
Statistical learning methods are useful for two main purposes: prediction and inference. Prediction refers to predicting an output variable (also called the response or dependent variable), whereas inference refers to understanding how an output variable is affected by the input variables (also called predictors or independent variables). There are many different linear and non-linear methods that can be […]