New York City's digital record-keeping apparatus holds hundreds of millions of files — property deeds, building permits, zoning maps, subway infrastructure diagrams, court exhibits — and a growing share of that archive has, for years, been quietly riddled with duplicate images. The problem is structural, not accidental, and the city is only now putting formal replacement protocols in place across multiple agencies.
The issue matters now because of scale. The Department of Buildings, the Department of City Planning, and the Metropolitan Transportation Authority have all expanded their public-facing digital portals significantly since 2020, when pandemic-era remote-access demands accelerated uploads that were never properly deduplicated. Files scanned at borough offices in the Bronx and Staten Island, for instance, were often uploaded more than once by different clerks working in different software environments, with no automated system to flag the redundancy.
A Paper Problem That Went Digital
The roots of the issue stretch back further than the pandemic. When the city began digitizing its physical archives in earnest in the early 2000s, the process was fragmented by borough and by department. The Manhattan Municipal Building at 1 Centre Street operated under different scanning protocols than offices in Queens Borough Hall or the Brooklyn property records annex on Joralemon Street. Vendors changed, software changed, and consistency never arrived.
By 2018, the city's Department of Information Technology and Telecommunications — known as DoITT, later rebranded as the NYC Office of Technology and Innovation — had identified image duplication as a Tier 2 data quality problem in internal audits. That designation meant it was acknowledged but not treated as urgent. Budgets for data remediation work were modest, and the political appetite for fixing infrastructure that the public rarely sees directly was limited under successive administrations.
The MTA's situation was particularly acute. The authority's capital program, which has directed tens of billions of dollars into the subway system since 2020, generated an enormous volume of engineering drawings, inspection photographs, and procurement documents. Many of those files, particularly photographs taken during track inspections along the A and C lines in Brooklyn and the 7 line extension to Hudson Yards, ended up duplicated both in the MTA's internal document management system and in submissions to the Federal Transit Administration. Staff time spent manually identifying and removing duplicates ran into the thousands of hours annually, according to the MTA's 2024 capital program oversight documentation.
The Replacement Protocols Now Taking Shape
The Adams administration moved in late 2025 to consolidate city data governance under a revised framework that, for the first time, requires agencies submitting records to the city's open data portal at data.cityofnewyork.us to run automated hash-matching checks before upload. Those checks flag files that are byte-for-byte identical to an existing record or visually similar to one above a defined threshold. Agencies then have 30 days either to confirm a duplicate and replace it with a canonical version or to document why the similar files are genuinely distinct records.
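For readers curious how checks of this kind can work in principle, the following is a minimal Python sketch, not the city's actual tool. It assumes SHA-256 for the byte-for-byte comparison and a simple "average hash" over a small grayscale pixel grid for visual similarity; the article's sources do not name the algorithms or the threshold the city uses. Every function name, file name, and the pixel-grid input format here is hypothetical.

```python
import hashlib
from collections import defaultdict
from datetime import date, timedelta


def sha256_of(data: bytes) -> str:
    """Fingerprint a file's raw bytes; identical bytes give identical digests."""
    return hashlib.sha256(data).hexdigest()


def find_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        groups[sha256_of(data)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def average_hash(pixels: list[list[int]]) -> int:
    """Assumed perceptual hash: one bit per pixel, set when the pixel is
    brighter than the grid's mean. Expects a small grayscale grid (0-255)
    that has already been downscaled, for example to 8x8."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def similarity(hash_a: int, hash_b: int, nbits: int) -> float:
    """Fraction of matching bits between two hashes (1.0 means identical)."""
    differing = bin(hash_a ^ hash_b).count("1")
    return 1 - differing / nbits


def resolution_deadline(flagged_on: date, window_days: int = 30) -> date:
    """Date by which a flagged file must be replaced or documented,
    using the 30-day window described in the article."""
    return flagged_on + timedelta(days=window_days)


if __name__ == "__main__":
    uploads = {"deed_scan_a.tif": b"abc", "deed_scan_b.tif": b"abc", "map.tif": b"abd"}
    print(find_exact_duplicates(uploads))
    print(resolution_deadline(date(2026, 3, 1)))
```

In a sketch like this, the exact-match step is cheap and unambiguous, while the similarity step is where a threshold matters: two scans of the same deed made on different equipment will differ byte for byte but should still score close to 1.0, and a human reviewer decides whether a flagged pair is a true duplicate.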
The Department of City Planning, which manages zoning maps and land-use applications for neighborhoods including Gowanus, Midtown East, and the Bronx's Southern Boulevard corridor, began piloting the new protocol in March 2026. Early results from that pilot — described in planning department briefing materials circulated to the City Council's land use committee — found that roughly 11 percent of image files submitted over a 90-day sample period were flagged as potential duplicates. Not all were confirmed redundancies, but the rate was higher than agency officials had projected.
For residents and researchers who rely on these portals — property lawyers checking ACRIS filings, urban planners pulling permit histories, journalists reviewing building inspections — the practical effect of the new system should be cleaner, faster searches and fewer dead-end file downloads. The 30-day replacement window means the fix is rolling rather than instantaneous. Anyone using city data portals through the summer of 2026 may still encounter legacy duplicate entries, particularly in older property and infrastructure records predating the 2025 governance changes. The Office of Technology and Innovation has published a public-facing status page for the remediation effort, updated monthly, which is the most direct way to track which datasets have been cleared.