nickcdryan
-
Consistency is underrated
LLMs have a consistency problem. I don’t regard this as an LLM problem or an AI problem but as a systems problem: LLMs create a problem that can occur at, or across, any level of a system, and could be resolved at or across multiple levels. The systems problem is that…
-
TextEvolve: Automated Program Discovery with LLMs
I’m happy to introduce TextEvolve, a system that iteratively generates and tests new programs over your dataset, evolving its approach with LLM evaluation. TextEvolve changes program flow, tries new ideas, and outputs optimized programs as Python scripts. Paper · Github · Twitter thread · YouTube. Initial results are very strong! TextEvolve produces programs that outperform…
-
RefDive: Chrome extension for better paper reading, hover for citation reference
I read a lot of papers. There are a few things that annoy me about reading papers in Chrome. Surely there are enough people reading papers who would benefit from some lightweight improvements. So I built RefDive, a Chrome extension to handle this (demo video). This made for a nice little (heavily LLM-assisted) project. (github)…
-
What’s new in RAG?
RAG first came out in 2020. (I doubt most people know that “RAG” refers to a specific paper from Facebook.) At the time, I was working on question answering systems, and RAG seemed like the obvious path forward. The best models at the time were seriously hobbled by (among other things) memory and context, and…
-
Better than Elo? Experiments show that social choice theory yields more faithful rankings for LLM leaderboards and preference modeling
Summary Most head-to-head LLM rankings (like the LMSYS arena) are calculated with the Elo rating system, based on user comparisons between two LLM responses. Elo was developed to rank chess players’ strength over time. It has been adapted for LLM ranking, but this adaptation has several problems, including: In this post we test out some alternative…
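For background on the system the excerpt compares against: the standard Elo update fits in a few lines. This is the textbook chess formulation, not anything from the post; function and parameter names are mine.

```python
def elo_update(r_a, r_b, score_a, k=32):
    """Standard Elo update for one head-to-head comparison.

    r_a, r_b: current ratings; score_a: 1.0 if A wins, 0.5 draw, 0.0 loss;
    k: update step size (the "K-factor"). Returns the new (r_a, r_b).
    """
    # Expected score for A under the Elo logistic model (400-point scale).
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))
    new_a = r_a + k * (score_a - expected_a)
    # B's expected score is the complement, so the total rating is conserved.
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b
```

Note that Elo updates sequentially, so the final ratings depend on the order in which comparisons arrive — a property worth keeping in mind when the "players" are fixed model checkpoints rather than humans whose strength changes over time.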
-
Introducing a learnable temperature value into the softmax self-attention scores
SUMMARY Adding a per-head parameterized scaling factor to the query-key attention scores (analogous to adding a learnable temperature to the softmax) slightly improves transformer performance. [Update 11/2024: If this interests you, some recent work re-examines the role of softmax in a similar vein. DeepMind’s softmax is not enough proposes adapting softmax temperature based on…
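The summary describes the mechanism concretely enough to sketch. Below is a minimal numpy illustration — not the post’s implementation — where `log_tau` is a hypothetical name for the per-head learnable parameter:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_temperature(q, k, v, log_tau):
    """q, k, v: (heads, seq, d_head); log_tau: (heads,) learnable per head.

    Scores get an extra per-head factor exp(log_tau) on top of the usual
    1/sqrt(d) scaling -- i.e. a learnable softmax temperature per head.
    log_tau = 0 recovers standard scaled dot-product attention.
    """
    d = q.shape[-1]
    scale = np.exp(log_tau)[:, None, None] / np.sqrt(d)
    scores = scale * np.einsum("hqd,hkd->hqk", q, k)
    return softmax(scores, axis=-1) @ v
```

Parameterizing the scale as `exp(log_tau)` keeps it positive; a trained model would learn `log_tau` by backprop alongside the other weights.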
-
Weak recurrent blocks improve language modeling performance
SUMMARY We propose a lightweight and simple mechanism, “weak recurrence,” to incorporate information from previous time steps. This method simply adds a gated weighted sum over the previous n tokens between the attention and feedforward components of a transformer layer. This modification results in improved performance. [Code, Log/notes] INTRODUCTION In recent years there has been renewed…
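A rough sketch of the gated weighted sum the summary describes, applied to the hidden states between attention and feedforward. The parameter names (`w`, `g`) and the choice of a sigmoid gate are my assumptions, not necessarily the post’s:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weak_recurrence(h, w, g):
    """h: (seq, d) hidden states after the attention sublayer.
    w: (n,) learnable weights over the previous n token positions.
    g: (d,) learnable gate logits.

    Adds a gated, weighted sum of each position's previous n hidden
    states: h[t] + sigmoid(g) * sum_i w[i] * h[t - 1 - i].
    With w = 0 this reduces to the identity.
    """
    n = len(w)
    mix = np.zeros_like(h)
    for i in range(1, n + 1):
        shifted = np.zeros_like(h)
        shifted[i:] = h[:-i]          # h[t - i], zero-padded at the start
        mix += w[i - 1] * shifted
    return h + sigmoid(g) * mix
```

Because the sum only looks backward a fixed n positions, it stays causal and adds only n + d parameters per layer — in keeping with the “lightweight” framing.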
-
Adaptive skip connections improve training
SUMMARY Applying a single, linear, learnable weight to the skip (identity) component of residual connections slightly improves training performance. It also reveals interesting training dynamics: during training models will select for strong skip connections in early layers but minimal skip connections in middle and later layers. This can result in learned near-zero skip connections in…
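The modification the summary describes is essentially a one-liner; here is a hedged sketch where `alpha` is my name for the single learnable skip weight:

```python
import numpy as np

def adaptive_residual(x, sublayer_out, alpha):
    """Residual connection with a learnable scalar weight on the skip path:

        out = alpha * x + sublayer_out

    alpha = 1 recovers the standard residual x + f(x); a learned
    near-zero alpha effectively removes the skip connection, as the
    post observes happening in middle and later layers.
    """
    return alpha * x + sublayer_out
```

In a real model there would be one `alpha` per residual connection, trained jointly with the rest of the network.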
-
Improving language modeling loss with multi-token prediction: experiments in multi-token prediction and the new FAIR paper
SUMMARY UPDATE: The FAIR authors were kind enough to discuss their work with me and answer some questions. NWP loss: The main point of interest was that with multi-token prediction they saw an isolated next-word-prediction loss that was slightly worse (~0.1) than a standard model’s. (The author later publicly tweeted this here, so it’s…
-
A state of the art decoder language model
I pulled together all of the current best practices and modifications for LLMs and implemented them in a minimalist style. Useful as a benchmark model to test research ideas against. See here for project link and full description