The Daily New York

New York news, every day

New York Launches Major Cleanup of Duplicate Images in City Records

Decades of siloed city agencies, patchwork software upgrades, and emergency digitization drives left municipal databases bloated with redundant files; now a coordinated cleanup is underway.

By New York News Desk · Published 4 July 2026, 2:36 pm

Photo: Higginson, Thomas Wentworth, 1823-1911 / Public domain (Wikimedia Commons)

New York City's digital infrastructure is carrying dead weight — millions of duplicate image files spread across agency servers from the Bronx to Staten Island, slowing public-facing portals, inflating storage costs, and complicating the kind of rapid data-sharing that a FIFA World Cup host city cannot afford to fumble. The problem did not appear overnight. It accumulated across roughly three decades of disjointed technology procurement, and the story of how the city arrived here is, at its core, a story about what happens when 50-plus agencies each build their own digital house without ever agreeing on a shared address.

The timing matters. With the 2026 World Cup already underway and MetLife Stadium in East Rutherford handling matches while New York City serves as the primary host hub — fan zones stretching from Midtown Manhattan down to the piers on the Hudson — city agencies have been under pressure to modernize public-information portals faster than originally planned. The NYC Mayor's Office of Technology and Innovation, known as OTI, identified duplicate image storage as a measurable drag on load times for several high-traffic sites, including NYC.gov property and permit databases that journalists, lawyers, and contractors consult daily.

A Timeline Built From Band-Aids

The roots of the problem stretch back to the 1990s, when agencies like the Department of Buildings and the Department of City Planning began scanning paper records independently of one another. Each agency contracted its own vendors, adopted its own file-naming conventions, and stored images on isolated servers. When Mayor Bloomberg's administration pushed a broader digitization agenda after 2002, the effort increased the volume of scanned records without standardizing how duplicates would be flagged or purged. By the time the same administration enacted the city's Open Data Law, Local Law 11 of 2012, the inherited libraries were already layered with redundant files.

Emergency digitization during the COVID-19 pandemic added another layer. Between March 2020 and the end of 2021, city agencies raced to put paper workflows online, often uploading documents — including permit drawings, inspection photos, and zoning maps — without cross-checking existing inventories. The Department of Records and Information Services, headquartered at 31 Chambers Street in Lower Manhattan, flagged the redundancy problem internally but lacked the budget authority to mandate a citywide purge at that stage.

The financial stakes are not trivial for a city that leases cloud storage capacity across multiple vendors. Industry benchmarks consistently show that large municipal governments can carry duplicate-file rates of 20 to 40 percent in legacy document repositories, though the city has not released a precise figure for its own holdings. Storage costs for enterprise-grade cloud infrastructure have remained between roughly $20 and $30 per terabyte per month depending on contract tier — meaning even a modest reduction in redundant data across a system measured in hundreds of terabytes produces meaningful budget savings.
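As a rough illustration of that arithmetic, the sketch below multiplies a storage footprint by a duplicate rate and a per-terabyte price. The 500-terabyte footprint is a hypothetical stand-in for "hundreds of terabytes", since the city has not published its own figure, and the function name is invented for this example. It also assumes every duplicate is actually removed, which a real cleanup would not guarantee.

```python
def monthly_savings(total_tb: float, duplicate_pct: float, usd_per_tb_month: float) -> float:
    """Estimate monthly savings in USD if all duplicate data were removed.

    total_tb          -- total stored data in terabytes (hypothetical input)
    duplicate_pct     -- share of that data that is duplicated, in percent
    usd_per_tb_month  -- storage price per terabyte per month
    """
    return total_tb * duplicate_pct * usd_per_tb_month / 100


# Hypothetical 500 TB footprint, using the ranges quoted in the article:
# 20 to 40 percent duplicates, $20 to $30 per terabyte per month.
low = monthly_savings(500, 20, 20)
high = monthly_savings(500, 40, 30)
print(f"Estimated savings: ${low:,.0f} to ${high:,.0f} per month")
```

Under those assumptions the estimate runs from about $2,000 to $6,000 a month; a larger footprint or pricier contract tier scales the result proportionally.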

The Cleanup and What Comes Next

OTI began a phased deduplication program earlier this year, starting with the Department of Buildings' BISWeb portal and the image libraries of the Department of Housing Preservation and Development (HPD), which together serve hundreds of thousands of queries each month from residents navigating the city's housing affordability crisis. The program uses hash-matching software to identify bit-for-bit identical files first, then flags near-duplicates for human review, a step intended to avoid accidentally deleting distinct records that happen to look similar.
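The article does not name the software OTI uses, but the general idea of hash matching can be sketched in a few lines. The example below is a minimal illustration, not the city's tool: it assumes SHA-256 as the hash, and the function names are invented for this sketch. It only reports groups of byte-identical files and deletes nothing; near-duplicate detection, which the program leaves to human review, is out of scope.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root that are bit-for-bit identical.

    Returns a mapping of digest -> paths, keeping only digests
    shared by two or more files.
    """
    # Cheap first pass: files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Second pass: hash only files whose size matches another file.
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[file_digest(path)].append(path)

    return {d: sorted(ps) for d, ps in by_digest.items() if len(ps) > 1}
```

Grouping by size before hashing is a common shortcut that avoids reading files that cannot possibly match. Two scans of the same page saved at different resolutions would produce different digests, which is why a hash pass alone cannot catch near-duplicates.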

Community groups in neighborhoods like Sunset Park and the South Bronx, where residents rely heavily on HPD records to track landlord compliance, have pushed for faster portal performance for years. Tenant advocacy organizations working out of offices along Fordham Road in the Bronx and Fifth Avenue in Brooklyn have documented cases where slow or broken image loads delayed housing court proceedings.

The practical upshot for anyone who regularly uses NYC agency portals: response times on image-heavy pages are expected to improve incrementally through the end of 2026 as the deduplication rollout extends to additional agencies. The Department of City Planning's ZoLa mapping tool is listed as a priority target in the next phase. Users who encounter missing images during the transition are advised to use the NYC311 feedback portal to log specific broken links, which OTI says feeds directly into its remediation queue.

About this article

Published by The Daily New York

This article was produced by The Daily New York editorial desk and covers news in New York. See our editorial standards for how we use AI.