City of Boston Office of Participatory Budgeting
Background
The City of Boston’s Office of Participatory Budgeting (OPB) launched a Participatory Budgeting (PB) project to gather input from residents on how to allocate $2 million to city programs, services, and projects. The goal was to ensure that Boston’s diverse communities had a voice in deciding how resources would be distributed. OPB collected proposal ideas through multiple channels, including an online portal, workshops, and PB Corners set up in public libraries.
Challenge
After collecting feedback, OPB needed to turn the public’s comments into an actionable list of priorities. Residents submitted proposals totaling about 60,000 words, roughly the length of an average adult fiction book. Unlike a coherent narrative, however, these proposals were unstructured and varied widely in content and format. They ranged from as few as two words to as many as 300 words, and some included elements such as emojis that complicate text processing. The task resembled sorting through thousands of tweets or product reviews.
Manually processing and categorizing this unstructured data would have been time-consuming and error-prone. OPB needed a systematic way to group similar proposals efficiently. Traditional topic modeling methods such as Non-Negative Matrix Factorization (NMF) and Latent Dirichlet Allocation (LDA) are faster and easier to implement, but they work from word counts rather than meaning, so they often struggle to maintain semantic coherence, especially on short or complex text.
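To illustrate why count-based methods can struggle with short text, here is a minimal sketch in Python. NMF and LDA typically operate on bag-of-words counts, so two proposals that express the same idea in different words can look unrelated. The proposals below are hypothetical and invented for illustration; they are not taken from OPB’s data, and the helper names are assumptions.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each word."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical proposals (illustrative only, not from OPB's data).
p1 = "fix potholes on my street"
p2 = "repair road damage near home"    # same idea, no shared words
p3 = "fix the potholes on our street"  # same idea, mostly shared words

sim_paraphrase = cosine(bag_of_words(p1), bag_of_words(p2))
sim_overlap = cosine(bag_of_words(p1), bag_of_words(p3))

print(f"paraphrase similarity: {sim_paraphrase:.2f}")  # 0.00, no words in common
print(f"overlap similarity:    {sim_overlap:.2f}")     # about 0.73
```

The first pair scores zero even though a reader would group the two proposals together, which is the gap that meaning-based representations aim to close.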
Solution
Voyatek implemented a more advanced topic modeling approach, BERTopic, to capture the full context of each idea. Using OpenAI’s text embeddings, each proposal was transformed into a dense vector that encodes its semantic meaning. Uniform Manifold Approximation and Projection (UMAP) then reduced the dimensionality of these vectors while retaining key information. Finally, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) grouped similar proposals into clusters. This method allowed OPB to identify patterns and connections between ideas, even when they were phrased differently.
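The clustering idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Voyatek’s implementation: hand-written three-dimensional vectors stand in for the OpenAI embeddings, the UMAP step is omitted because the toy vectors are already low-dimensional, and plain DBSCAN is swapped in for HDBSCAN to keep the example self-contained. The proposal texts, vectors, and parameter values are all hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; small values mean similar direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def dbscan(points, eps, min_samples):
    """Plain DBSCAN (a simpler relative of HDBSCAN). Returns one label
    per point; -1 marks noise, i.e. a point that fits no dense group."""
    n = len(points)
    # Each point's neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if cosine_distance(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels

# Hypothetical stand-ins for proposal embeddings (real ones have far
# more dimensions and come from an embedding model).
embeddings = [
    [0.90, 0.10, 0.00],  # "fix potholes on my street"
    [0.80, 0.20, 0.10],  # "repair road damage near home"
    [0.85, 0.15, 0.05],  # "smoother roads please"
    [0.10, 0.90, 0.10],  # "more youth summer jobs"
    [0.00, 0.95, 0.20],  # "teen employment programs"
    [0.15, 0.85, 0.00],  # "paid internships for students"
    [0.10, 0.10, 0.95],  # "community garden" (no similar proposals)
]

labels = dbscan(embeddings, eps=0.1, min_samples=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The road proposals land in one cluster and the youth employment proposals in another despite their different wording, while the lone proposal is labeled as noise instead of being forced into a group. HDBSCAN extends this idea by building a hierarchy over varying density levels, so it does not require a single fixed eps.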