Jessie Dong

I'm exploring what's next for me while I study Computer Science and Physics at Stanford. Here's what I've been doing:

More things I've worked on:

Projects at Stanford AI Lab (OVAL - Open Virtual Assistants Lab)

De(ep)Composition: a system for long-horizon reasoning that decomposes unstructured information into hierarchical components and stores it in persistent, queryable memory. Useful in settings that require continuously updated views grounded in accumulated evidence, like hedge-fund portfolio management, corporate teams monitoring markets, and investigative journalists maintaining long-running storylines. (The code can't be made public yet because it's part of an ongoing OVAL project, but here's a fun advertisement we made!)
ScalableTokenizer: a tokenizer that learns a vocabulary with column generation and decodes text with dynamic programming while consulting morphology-aware features, named-entity gazetteers, and regex guards. It is inspired by foundational tokenizers such as SentencePiece and Byte-Pair Encoding, and I see work like this as a path toward more efficient representations of text in LLMs.
POMDP for Investing Portfolios: Optimal Portfolio Rebalancing Under Changing Market Conditions, for CS 238 (decision-making under uncertainty). A partially observable Markov decision process for portfolio allocation under uncertain volatility regimes, using Fama-French 5-factor data and value iteration to maximize mean-variance utility, outperforming 60/40 and volatility-threshold strategies in return, Sharpe ratio, and maximum drawdown.
PreSearch (research opportunity marketplace): a two-sided platform connecting students seeking research positions with PIs and labs posting open project roles, with a compatibility matching algorithm over scraped academic interests. Deployed on AWS EC2 with a React frontend and FastAPI backend; engineered a LinkedIn and university directory scraping pipeline using Selenium and BeautifulSoup.
Atriium (archival document transcription): a web-based system for transcribing scanned historical and archival documents into structured text, combining OCR with LLM-based post-processing for noisy and handwritten sources.
Stanford Archives Database: complete corpus of Regina Twala's letters and archival materials, built via OCR- and LLM-based transcription, translation, and metadata structuring from scanned documents.
Sequence [backend] [frontend] (computer vision safety platform for trades): a real-time video analytics system for hazardous trades, using deep learning-based object detection and activity recognition to identify unsafe behaviors and job-site risks.
State of the Students: an award-winning civic media platform recognized by former presidential candidates Kamala Harris and Pete Buttigieg; had the most views and engagement for high-profile races and local elections from 2019 to 2023.
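
As a rough illustration of the dynamic-programming decoding mentioned for ScalableTokenizer above, here is a minimal sketch. It is not the project's code: the `segment` function, the `vocab_cost` table, the `unknown_cost` and `max_len` parameters, and the cost values are all hypothetical, and the morphology-aware features, gazetteers, and regex guards are left out. It only shows the general idea of picking the lowest-cost split of a string into vocabulary pieces.

```python
import math


def segment(text, vocab_cost, unknown_cost=10.0, max_len=12):
    """Return the minimum-cost split of `text` into pieces.

    vocab_cost maps each known piece to a non-negative cost (hypothetical
    values; lower means more preferred). A single character that is not in
    the vocabulary is allowed as a fallback at `unknown_cost`.
    """
    n = len(text)
    best = [math.inf] * (n + 1)  # best[i]: cheapest cost to segment text[:i]
    back = [0] * (n + 1)         # back[i]: start index of the last piece
    best[0] = 0.0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in vocab_cost:
                cost = vocab_cost[piece]
            elif end - start == 1:
                cost = unknown_cost  # single-character fallback
            else:
                continue  # unknown multi-character pieces are not allowed
            total = best[start] + cost
            if total < best[end]:
                best[end] = total
                back[end] = start

    # Walk the back-pointers from the end of the string to recover the pieces.
    pieces = []
    pos = n
    while pos > 0:
        pieces.append(text[back[pos]:pos])
        pos = back[pos]
    return pieces[::-1]


if __name__ == "__main__":
    toy_vocab = {"un": 1.0, "happi": 1.0, "ness": 1.0, "happiness": 2.5}
    print(segment("unhappiness", toy_vocab))
```

With this toy vocabulary, three cheap pieces (total cost 3.0) beat the two-piece split through "happiness" (total cost 3.5), so the cost table, not the piece count, decides the segmentation.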