About

I’m a machine learning engineer and independent researcher working in NLP and language modeling. I am based in New York City.

I’ve built the machine learning engines behind some large search applications and have built or consulted on a bunch of other ML products. I enjoy both applying research and creating new research of my own. I’ve also written and shared a lot of machine learning content (tutorials, webinars, research).

I’m currently interested in retrieval models, LLM-guided discrete program search, synthetic data pipelines, and energy markets. My past work focused on designing and testing novel Transformer architectures to improve pretraining efficiency.

Here are some things I’ve worked on in 2024/2025:

  • A multi-token model that anticipated, and overlapped nicely with, Meta AI’s work on the same idea (a technique now seen in SoTA models)
  • A framework for training retrieval models with LLM-generated labels that outperform human labels, exploring the discrepancy between what LLMs and humans regard as useful information to retrieve
  • Automated program discovery to generate and optimize LLM-based pipelines for your task (paper/code/demo). Parallel to DeepMind’s AlphaEvolve.
  • [WIP] Ava, a data personalization platform I’ve built from the ground up that brings user data to third-party services via OAuth and syncs/updates that data with LLM call trees, RAG, and privacy-preserving personalization features. More on the way.
  • An application of social choice theory to better model preference data (as used in LLM arenas and RLHF); well received and corroborated by recent research at DeepMind
  • An SSM-like architectural block with improved language modeling performance
  • Ticketworld: a synthetic data pipeline that generates a challenging tool-use and multi-hop reasoning environment, where the goal is to resolve customer tickets given a database and a customer service policy document. Used to evaluate TextEvolve.
  • Triton kernels for hardware-specific optimization of NF4 dequantization. Mostly an excuse to add Triton, some GPU programming, and NVIDIA Nsight to my toolkit for low-level optimization
  • A Chrome extension to make reading scientific literature better
  • [WIP] Cross-attending character and token inputs to mitigate tokenization errors (e.g. how many “r”s in “strawberry”)
  • A video series that walks through how to implement research papers
  • [WIP] JEPA-style EBMs for text inference
  • A periodically updated state-of-the-art language model you can benchmark your research against.
  • A study of weighted skip connections for faster training
  • Some notes on learnable per-head temperature scaling for self-attention scores

If you’re interested in collaborating or just want to talk about a project, reach out! (Twitter or email)