Introduction to Topic Modeling

Topic modeling is an unsupervised machine learning technique, used in natural language processing, that identifies topics in a large corpus of documents. In other words, it groups a large document collection into topic clusters. It has many applications, particularly in discourse analysis. Latent Dirichlet Allocation (LDA) is one of […]

Read More

Extending Linear Regression Models

Linear regression models assume that the relationship between the predictor variables and the output variable is linear. Although linear models are simple and easy to interpret, they can lack predictive power because the true relationship is rarely exactly linear. Ridge regression, the lasso, and principal components regression are improved linear models that often fit better. Yet those methods still use a […]

Read More

Resampling Techniques

Resampling is the process of repeatedly drawing samples from a training data set and refitting the model on each sample to obtain additional information about the fitted model. Two of the most widely used techniques are cross-validation and the bootstrap. 1. Cross-Validation: Cross-validation is used to estimate the test error to evaluate model […]

Read More
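As a rough illustration of the two techniques named in the excerpt above, the sketch below estimates test error with k-fold cross-validation and estimates the standard error of a slope with the bootstrap. The helper names (`fit_line`, `k_fold_cv_mse`, `bootstrap_slope_se`), the straight-line model, and the synthetic data are all assumptions made for this example and are not taken from the post.

```python
import random
from statistics import mean, stdev


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def k_fold_cv_mse(x, y, k=5, seed=0):
    """Estimate test MSE: hold out each fold once and refit on the rest."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    fold_mse = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
        errors = [(y[i] - (b0 + b1 * x[i])) ** 2 for i in held_out]
        fold_mse.append(mean(errors))
    return mean(fold_mse)


def bootstrap_slope_se(x, y, n_boot=200, seed=0):
    """Estimate the slope's standard error by resampling rows with replacement."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in sample]
        if len(set(xs)) < 2:
            continue  # degenerate resample: the slope is undefined
        _, b1 = fit_line(xs, [y[i] for i in sample])
        slopes.append(b1)
    return stdev(slopes)


# Synthetic data (assumed for illustration): y = 1 + 2x + Gaussian noise.
data_rng = random.Random(42)
x = [i / 10 for i in range(50)]
y = [1.0 + 2.0 * xi + data_rng.gauss(0, 0.5) for xi in x]

cv_mse = k_fold_cv_mse(x, y, k=5)
slope_se = bootstrap_slope_se(x, y)
print(f"5-fold CV estimate of test MSE: {cv_mse:.3f}")
print(f"Bootstrap standard error of the slope: {slope_se:.3f}")
```

Fixed seeds are used here only so that repeated runs give the same numbers; in practice the choice of k and the number of bootstrap samples are tuning decisions.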

Multiple Linear Regression

Multiple regression is used when there is more than one predictor or input variable. It extends the simple linear regression model by giving each predictor a separate slope coefficient within a single model. Given p predictors, the equation is $Y = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + \epsilon$. $\beta_j$ is the average effect on Y of increasing $X_j$ by one unit, holding all […]

Read More
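To make the idea of one slope coefficient per predictor concrete, here is a minimal sketch that fits a multiple regression by least squares using the normal equations. The function names (`solve_linear_system`, `fit_multiple_regression`) and the noise-free toy data are hypothetical, chosen only for this example; least squares is assumed as the fitting method because the excerpt is truncated before it says how the coefficients are estimated.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_regression(rows, y):
    """Return least-squares coefficients [b0, b1, ..., bp]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy data (assumed): generated exactly from Y = 1 + 2*X1 + 3*X2, no noise.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coefs = fit_multiple_regression(rows, y)
print("intercept and slopes:", [round(c, 6) for c in coefs])
```

Solving the normal equations directly is fine for a small illustration; numerical libraries usually prefer a QR or SVD based solver because it is more stable when predictors are strongly correlated.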

Simple Linear Regression

Linear regression is a simple supervised learning approach for dealing with quantitative outcome variables. Mathematically, simple linear regression, which includes only a single input or predictor variable (X), is represented as $Y = \beta_0 + \beta_1 X + \epsilon$. $\beta_0$ and $\beta_1$ are called the model coefficients or model parameters: $\beta_0$ represents the intercept and $\beta_1$ represents the slope. $\epsilon$ is the mean-zero random error term. That is […]

Read More
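The intercept and slope in the model above can be estimated from data. The sketch below uses the standard closed-form least-squares estimates; the function name `fit_simple_linear_regression` and the five data points are made up for illustration, and least squares is an assumption here because the excerpt is cut off before it describes estimation.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return least-squares estimates (intercept, slope) for y = b0 + b1 * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx                  # estimate of the slope
    intercept = y_bar - slope * x_bar  # estimate of the intercept
    return intercept, slope


# Illustrative data (assumed), roughly following y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = fit_simple_linear_regression(x, y)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```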

An Overview of Statistical Learning

Statistical learning methods serve two main purposes: prediction and inference. Prediction refers to predicting an output variable (also called the response or dependent variable), whereas inference refers to understanding how the output variable is affected by the input variables (also called predictors or independent variables). There are many different linear and non-linear methods that can be […]

Read More