Reddit Sarcasm Classifier

March 18th, 2019

Collaborated with a team of five to build a classifier that predicts whether a Reddit comment is sarcastic, using a Kaggle dataset of over 1,000,000 comments. Implemented Natural Language Processing models (Word2Vec) and data processing tools (MapReduce) to engineer more accurate features for Machine Learning algorithms. Applied Machine Learning methods (Neural Network, Random Forests, Boosting, and SVM) to the dataset and obtained a classification accuracy of over 80%.


Above is an image of a Random Forest we created. It shows the most influential predictors of sarcasm (hasExclamation, score, numVowels) and helps identify useful trends in the predictors.
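The caption names hasExclamation and numVowels as predictors but does not define them. The following is a minimal Python sketch of how such surface features might be computed from a comment's text. The function name `extract_features`, the vowel set (no "y"), and the 0/1 encoding are assumptions for illustration, not the project's actual code. The score predictor is left out, since it is presumably a field of the dataset rather than something derived from the text.

```python
# Hypothetical reconstruction of two hand-crafted predictors named in the
# Random Forest graph. The exact definitions used in the project are unknown.

VOWELS = set("aeiouAEIOU")  # assumption: 'y' is not counted as a vowel


def extract_features(comment: str) -> dict:
    """Return simple surface features for one Reddit comment."""
    return {
        # 1 if the comment contains at least one exclamation mark, else 0
        "hasExclamation": int("!" in comment),
        # number of vowel characters in the comment text
        "numVowels": sum(1 for ch in comment if ch in VOWELS),
    }


if __name__ == "__main__":
    print(extract_features("Oh great, another Monday!"))
    # prints {'hasExclamation': 1, 'numVowels': 8}
```

Binary and count features like these are cheap to compute over a million comments and are easy for tree-based models such as Random Forests to split on.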

Built by Ethan Wu, © 2024