Another one! This is nearly the same as the BERT fine-tuning post but uses the updated huggingface library. (There are also a few differences in preprocessing XLNet requires.)

Skip to content
## XLNet Fine-Tuning Tutorial with PyTorch

## BERT Fine-Tuning Tutorial with PyTorch

## BERT Word Embeddings Tutorial

## Broyden’s Method in Python

## Root-Finding Algorithms Tutorial in Python: Line Search, Bisection, Secant, Newton-Raphson, Inverse Quadratic Interpolation, Brent’s Method

## Statistical Learning Theory: VC Dimension, Structural Risk Minimization

## DropConnect Implementation in Python and TensorFlow

Another one! This is nearly the same as the BERT fine-tuning post but uses the updated huggingface library. (There are also a few differences in preprocessing XLNet requires.)

Here’s another post I co-authored with Chris McCormick on how to quickly and easily create a SOTA text classifier by fine-tuning BERT in PyTorch. It’s incredibly useful to take a look at this transfer learning approach if you’re interested in creating a high performance NLP model.

Please check out the post I co-authored with Chris McCormick on BERT Word Embeddings here. In it, we take an in-depth look at the word embeddings produced by BERT, show you how to create your own in a Google Colab notebook, and tips on how to implement and use these embeddings in your production pipeline. Check it out!

In a previous post we looked at root-finding methods for single variable equations. In this post we’ll look at the expansion of Quasi-Newton methods to the multivariable case and look at one of the more widely-used algorithms today: Broyden’s Method.

**Motivation**

How do you find the roots of a continuous polynomial function? Well, if we want to find the roots of something like:

Sometimes our models overfit, sometimes they overfit.

A model’s **capacity** is, informally, its ability to fit a wide variety of functions. As a simple example, a linear regression model with a single parameter has a much lower capacity than a linear regression model with multiple polynomial parameters. Different datasets demand models of different capacity, and each time we apply a model to a dataset we run the risk of overfitting or underfitting our data.

Continue reading “Statistical Learning Theory: VC Dimension, Structural Risk Minimization”

I wouldn’t expect DropConnect to appear in TensorFlow, Keras, or Theano since, as far as I know, it’s used pretty rarely and doesn’t seem as well-studied or demonstrably more useful than its cousin, Dropout. However, there don’t seem to be any implementations out there, so I’ll provide a few ways of doing so. Continue reading “DropConnect Implementation in Python and TensorFlow”