Multi-label Classification: A Guided Tour

Introduction

I recently undertook some work that looked at tagging academic papers with one or more labels based on a training set.

A preliminary look through the data revealed about 8000 examples, 2750 features, and…650 labels. To clarify, that’s 2750 sparse binary features (keyword indices for the articles), and 650 labels, not classes. Label cardinality (the average number of labels per example) is about 2, and the majority of labels occur only a few times in the dataset…doesn’t look good, does it? Nevertheless, more data wasn’t available and label reduction wasn’t on the table yet, so I spent a good amount of time in the corners of academia looking at multi-label work.
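For readers who want to run the same sanity check on their own data, here is a minimal sketch of how label cardinality and label rarity can be computed. It assumes the labels are stored as a binary indicator matrix (one row per example, one column per label). The function name `label_stats`, the `rare_threshold` parameter, and the toy matrix are all made up for illustration and are not taken from the original project.

```python
import numpy as np

def label_stats(Y, rare_threshold=5):
    """Summarize a binary label-indicator matrix Y of shape (n_examples, n_labels).

    rare_threshold is a hypothetical cutoff: labels that occur at most this
    many times are counted as "rare".
    """
    Y = np.asarray(Y)
    n_examples, n_labels = Y.shape

    # Label cardinality: average number of labels attached to each example.
    cardinality = Y.sum(axis=1).mean()

    # Label density: cardinality normalized by the number of labels.
    density = cardinality / n_labels

    # How often each label occurs across the whole dataset.
    label_counts = Y.sum(axis=0)

    # Fraction of labels that occur at most rare_threshold times.
    rare_fraction = (label_counts <= rare_threshold).mean()

    return {
        "cardinality": float(cardinality),
        "density": float(density),
        "label_counts": label_counts,
        "rare_fraction": float(rare_fraction),
    }

# Toy data (invented): 4 examples, 4 labels.
Y_toy = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
])

stats = label_stats(Y_toy, rare_threshold=1)
print(stats)
```

On a dataset like the one described above, a cardinality near 2 combined with a high rare fraction is the warning sign: most labels have too few positive examples to learn from.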


Income Analysis – US Census Data

A couple of months back, I worked on analysis and predictive modeling of US salaries using census data. The full Jupyter notebook is here; below are some details and some of the more interesting findings.

The dataset’s metadata is shown below; it contains lots of null values (as you might expect of census data).

[Screenshot: dataset metadata]

Building a Recurrent Neural Network to Generate Novel Text

Introduction

This quick tutorial aims to get a very big, very useful neural network up and running in just a few hours. The goal is that anyone with a computer, some free time, and little-to-no knowledge of what neural networks are or how they work can begin playing with this technology as soon as possible. Technical explanations of what RNNs are abound on the internet, so this tutorial skips the explanation and focuses solely on building.