Uncategorized – Page 4

Multi-label Classification: A Guided Tour

Introduction: I recently undertook some work on tagging academic papers with one or more labels based on a training set. A preliminary look through the data revealed about 8000 examples, 2750 features, and…650 labels. To clarify, that’s 2750 sparse binary features (keyword indices for the articles) and 650 labels, not classes. Label cardinality…

January 23, 2017

nickcdryan

Uncategorized

benchmarking, machine learning, model evaluation, multi-label, tutorial
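
The distinction the excerpt draws between labels and classes can be sketched with a toy indicator matrix. This is only an illustration: the data, the function names, and the four-label setup are made up here and are not taken from the post's dataset. Label cardinality is used in its usual sense, the average number of labels per example.

```python
# Toy multi-label data (hypothetical): each row is one paper, each column one label.
# A 1 means the label applies; unlike multi-class data, a row may hold several 1s.
Y = [
    [1, 0, 1, 0],  # paper tagged with labels 0 and 2
    [0, 1, 0, 0],  # paper tagged with label 1 only
    [1, 1, 1, 0],  # paper tagged with labels 0, 1 and 2
]


def label_cardinality(indicator_rows):
    """Average number of labels per example."""
    return sum(sum(row) for row in indicator_rows) / len(indicator_rows)


def label_density(indicator_rows):
    """Label cardinality divided by the number of distinct labels."""
    n_labels = len(indicator_rows[0])
    return label_cardinality(indicator_rows) / n_labels


print(label_cardinality(Y))  # 2.0
print(label_density(Y))      # 0.5
```

In a multi-class problem every row would contain exactly one 1, so the cardinality would always be 1.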

Decision Tree Visualization with pydotplus

A useful snippet for visualizing decision trees with pydotplus. It took some digging to find the proper output and viz parameters among different documentation releases, so I thought I’d share it here for quick reference.

January 23, 2017

nickcdryan

Uncategorized

python, sklearn, visualization

Income Analysis – US Census Data

A couple of months back, I worked on analysis and predictive modeling of US salary given census data. The full Jupyter notebook is here; below are some details and some of the more interesting findings. In general, metadata is below and contains lots of null values (as you might expect of census data).

January 23, 2017

nickcdryan

Uncategorized

benchmarking, model evaluation, preprocessing, python, sklearn, visualization

What Goes First – Speed or Strength?

I recently had access to a lot of baseball data, specifically data on every season of every player in the history of MLB going back to 1871. Here’s some analysis of how baseball players lose speed, strength, or both over their careers. The analysis primarily consisted of variable creation and data queries. Unfortunately, code not available…

January 22, 2017

nickcdryan

Uncategorized

baseball, data analysis, preprocessing, visualization

Trump Tweet Analysis

This project stems from two overarching questions: Which emotions do politicians most frequently appeal to? I recently saw a BuzzFeed presentation on, among other things, the virality of BuzzFeed content. A big part of their business relies on understanding what kind of content goes viral and why, so their data science team understandably spends a lot…

January 22, 2017

nickcdryan

Uncategorized

machine learning, NLP, politics, python, twitter, visualization

Article Classification and News Headlines Over Time

How does front page news track a single topic over a period of time? What’s the media’s attention span for a given story? In general, many find it surprising how quickly major media outlets shift their attention from one story to another. This is partly a reflection of our own attention spans and appetites, and…

January 22, 2017

nickcdryan

Uncategorized

machine learning, NLP, preprocessing, python, sklearn, tutorial, visualization, web scraping

Building a Recurrent Neural Network to Generate Novel Text

Introduction: The purpose of this quick tutorial is to get you a very big, very useful neural network up and running in just a few hours. The goal is that anyone with a computer, some free time, and little-to-no knowledge of what neural networks are or how they work can easily begin playing with this…

January 20, 2017

nickcdryan

Uncategorized

deep learning, model evaluation, neural networks, tutorial, web scraping

Introduction to Regularization

What is regularization? Regularization, as it is commonly used in machine learning, is an attempt to correct for model overfitting by introducing additional information to the cost function. In this post we will review the logic and implementation of regression and discuss a few of the most widespread forms of regularization: ridge, lasso, and elastic net. For simplicity, we’ll…

January 19, 2017

nickcdryan

Uncategorized

machine learning, regularization, tutorial
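
As a rough illustration of "introducing additional information to the cost function", the sketch below adds a penalty term to a plain mean-squared-error cost. The function names, the no-intercept linear model, and the exact scaling of the penalties are assumptions made for this example; libraries differ in how they scale these terms, and this is not necessarily the formulation used in the post.

```python
def mse(weights, X, y):
    """Mean squared error of a linear model without an intercept."""
    errors = [
        sum(w * x for w, x in zip(weights, row)) - target
        for row, target in zip(X, y)
    ]
    return sum(e * e for e in errors) / len(y)


def ridge_penalty(weights, alpha):
    """L2 penalty: alpha times the sum of squared weights."""
    return alpha * sum(w * w for w in weights)


def lasso_penalty(weights, alpha):
    """L1 penalty: alpha times the sum of absolute weights."""
    return alpha * sum(abs(w) for w in weights)


def elastic_net_penalty(weights, alpha, l1_ratio):
    """Weighted mix of the L1 and L2 penalties (one possible convention)."""
    return (l1_ratio * lasso_penalty(weights, alpha)
            + (1 - l1_ratio) * ridge_penalty(weights, alpha))


def regularized_cost(weights, X, y, penalty):
    """Data-fit term plus a penalty that grows with the size of the weights."""
    return mse(weights, X, y) + penalty(weights)


# Hypothetical toy data that the weights [2, -1] fit exactly.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [2.0, -1.0]
w = [2.0, -1.0]

ridge_cost = regularized_cost(w, X, y, lambda v: ridge_penalty(v, alpha=0.1))
lasso_cost = regularized_cost(w, X, y, lambda v: lasso_penalty(v, alpha=0.1))
enet_cost = regularized_cost(
    w, X, y, lambda v: elastic_net_penalty(v, alpha=0.1, l1_ratio=0.5)
)
print(ridge_cost, lasso_cost, enet_cost)
```

Even though these weights fit the toy data perfectly, each regularized cost is greater than zero, which is what pushes an optimizer toward smaller weights.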

Short Introduction to PCA

In Principal Component Analysis (PCA), we would like to project our high-dimensional dataset onto a lower-dimensional space while keeping as much information as possible. Typically, this is done to avoid curse-of-dimensionality effects or for the purposes of data visualization. In broad strokes, PCA reduces the dimensionality of our dataset in a way that…

January 19, 2017

nickcdryan

Uncategorized

dimensionality reduction, PCA, tutorial
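
One common way to carry out the projection described in the PCA excerpt is a singular value decomposition of the mean-centered data. The sketch below assumes NumPy is available; the `pca` helper and the random toy data are hypothetical and are not taken from the post.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the top principal components."""
    # Center each feature so the components describe variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are orthonormal directions sorted by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance along each direction, and the share kept by the chosen ones.
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    projected = X_centered @ components.T
    return projected, components, ratio


# Hypothetical toy data: 200 samples, 5 features, two of them strongly correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * X[:, 1]

projected, components, ratio = pca(X, n_components=2)
print(projected.shape)  # (200, 2)
print(ratio)            # share of total variance kept by each component
```

The explained-variance ratio gives a quick check on how much information survives the reduction to two dimensions.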