BERT Fine-Tuning Tutorial with PyTorch

Here’s another post I co-authored with Chris McCormick on how to quickly and easily create a state-of-the-art text classifier by fine-tuning BERT in PyTorch. This transfer learning approach is well worth a look if you’re interested in creating a high-performance NLP model.

BERT Word Embeddings Tutorial

Please check out the post I co-authored with Chris McCormick on BERT word embeddings. In it, we take an in-depth look at the word embeddings produced by BERT, show you how to create your own in a Google Colab notebook, and share tips on how to implement and use these embeddings in your production pipeline. Check it out!

Broyden’s Method in Python

In a previous post we looked at root-finding methods for single-variable equations. In this post we’ll look at the extension of quasi-Newton methods to the multivariable case and at one of the more widely used algorithms today: Broyden’s Method.
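
To make the idea concrete, here is a minimal sketch of Broyden's ("good") method in NumPy. It is not taken from the post itself: the test system `F`, the starting point, the finite-difference initial Jacobian, and the helper names are all assumptions chosen for illustration.

```python
import numpy as np

def F(v):
    # Hypothetical 2-D test system with roots at (0, 3) and (3, 0).
    x, y = v
    return np.array([x + y - 3.0, x**2 + y**2 - 9.0])

def finite_difference_jacobian(func, v, eps=1e-6):
    # Forward-difference estimate, used only to initialise the Jacobian.
    f0 = func(v)
    J = np.zeros((f0.size, v.size))
    for j in range(v.size):
        step = np.zeros_like(v)
        step[j] = eps
        J[:, j] = (func(v + step) - f0) / eps
    return J

def broyden(func, x0, tol=1e-10, max_iter=100):
    x = np.asarray(x0, dtype=float)
    fx = func(x)
    J = finite_difference_jacobian(func, x)
    for _ in range(max_iter):
        if np.linalg.norm(fx) < tol:
            break
        # Quasi-Newton step: solve J dx = -F(x).
        dx = np.linalg.solve(J, -fx)
        x_new = x + dx
        f_new = func(x_new)
        df = f_new - fx
        # Rank-one secant update, so the Jacobian is never recomputed.
        J += np.outer(df - J @ dx, dx) / (dx @ dx)
        x, fx = x_new, f_new
    return x

root = broyden(F, [1.0, 5.0])
print(root, F(root))
```

The point of the rank-one update is that each iteration costs one function evaluation and one linear solve, rather than a full Jacobian evaluation as in Newton's method.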

Root-Finding Algorithms Tutorial in Python: Line Search, Bisection, Secant, Newton-Raphson, Inverse Quadratic Interpolation, Brent’s Method

Motivation

How do you find the roots of a continuous function, such as a polynomial? Well, if we want to find the roots of something like:

f(x) = x^2 + 3x - 4
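
This particular polynomial factors as (x + 4)(x - 1), so its roots are -4 and 1. As a rough illustration of how a numerical method recovers them, here is a minimal bisection sketch. The function name `bisect`, the brackets, and the tolerance are assumptions made for this example, not code from the post.

```python
def f(x):
    return x**2 + 3*x - 4

def bisect(func, lo, hi, tol=1e-10, max_iter=200):
    """Find a root of func in [lo, hi], assuming a sign change on the interval."""
    f_lo, f_hi = func(lo), func(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("func(lo) and func(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if f_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        # Keep the half-interval that still contains the sign change.
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2

root_right = bisect(f, 0, 3)    # bracket around the root at x = 1
root_left = bisect(f, -6, -2)   # bracket around the root at x = -4
print(root_left, root_right)
```

Bisection needs a bracket with a sign change for each root, which is why the two roots are found with two separate calls.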

Statistical Learning Theory: VC Dimension, Structural Risk Minimization

Sometimes our models overfit, sometimes they underfit.

A model’s capacity is, informally, its ability to fit a wide variety of functions. As a simple example, a linear regression model with a single parameter has a much lower capacity than a linear regression model with several polynomial terms. Different datasets demand models of different capacity, and each time we apply a model to a dataset we run the risk of overfitting or underfitting our data.
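
A small sketch can illustrate the capacity idea. It fits polynomials of increasing degree to the same noisy samples and compares training error. The synthetic data (a noisy sine curve), the chosen degrees, and the helper `train_mse` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 20 noisy samples of a smooth nonlinear function.
x = np.linspace(-1, 1, 20)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=x.shape)

def train_mse(degree):
    # Least-squares polynomial fit; a higher degree means higher capacity.
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

mse_by_degree = {d: train_mse(d) for d in (1, 3, 9)}
for degree, mse in mse_by_degree.items():
    print(f"degree {degree}: training MSE = {mse:.5f}")
```

Training error can only go down as the degree grows, because each model contains the lower-degree ones as special cases. That does not mean the highest-degree fit generalizes best: the straight line is likely underfitting this curve, while the degree-9 fit has enough capacity to start chasing the noise.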

DropConnect Implementation in Python and TensorFlow

I wouldn’t expect DropConnect to appear in TensorFlow, Keras, or Theano since, as far as I know, it’s used pretty rarely and doesn’t seem to be as well studied as, or demonstrably more useful than, its cousin, Dropout. However, there don’t seem to be any implementations out there, so I’ll provide a few ways of implementing it.
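
As a rough sketch of the core idea (and not the implementation from the post), DropConnect randomly zeroes individual weights rather than activations. The NumPy function below is a hypothetical illustration: the name `dropconnect_dense`, the inverted scaling by `keep_prob`, and the use of a single mask per batch are simplifying assumptions.

```python
import numpy as np

def dropconnect_dense(x, weights, bias, keep_prob=0.5, training=True, rng=None):
    """Dense layer forward pass with DropConnect applied to the weights."""
    if not training:
        # Assumption: plain dense layer at inference time.
        return x @ weights + bias
    rng = np.random.default_rng() if rng is None else rng
    # Bernoulli mask over weights (Dropout would mask activations instead).
    mask = rng.random(weights.shape) < keep_prob
    # Rescale kept weights so the expected pre-activation is unchanged.
    return x @ (weights * mask / keep_prob) + bias

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))   # batch of 4 inputs with 3 features
w = rng.normal(size=(3, 2))   # 3 inputs -> 2 units
b = np.zeros(2)

train_out = dropconnect_dense(x, w, b, keep_prob=0.5, training=True, rng=rng)
eval_out = dropconnect_dense(x, w, b, training=False)
print(train_out.shape, eval_out.shape)
```

The original DropConnect formulation samples a separate mask per example and uses a sampling-based approximation at inference time, so treat this as a simplification for building intuition.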