New York City's sprawling network of municipal databases contains an estimated tens of thousands of duplicate images: photographs of potholes, building violations, permit applications and public housing units stored multiple times across overlapping systems. The Department of Information Technology and Telecommunications (DoITT), which was folded into the Office of Technology and Innovation in 2022, has been quietly working since early 2025 to address the problem. The effort has accelerated this summer, according to published city budget documents, as City Hall pushes to consolidate legacy data infrastructure ahead of a planned citywide digital modernization rollout.
The issue matters now for a specific reason: the New York/New Jersey region is hosting FIFA World Cup 2026 matches this month at MetLife Stadium in East Rutherford, N.J., and city agencies from the NYPD to the Department of Transportation have been pooling location data, crowd-management imagery and event logistics files at a pace that has strained existing storage systems. The practical consequences are degraded search performance, inflated storage costs and, in some cases, conflicting records: a building's permit photo logged twice under different file IDs can slow inspector workflows at the Department of Buildings on Worth Street in Lower Manhattan.
What New York Is Actually Doing
The city's primary tool is an automated deduplication protocol run through NYC Open Data infrastructure, which flags image files with identical or near-identical hash values. The Department of City Planning has applied a version of this to its BYTES of the Big Apple archive, which catalogues zoning and land-use photography going back decades. The Department of Housing Preservation and Development (HPD), which manages inspection records for more than one million rent-stabilized units citywide, began a parallel audit in March 2026 after internal reviews found significant redundancy in the image libraries of its Housing Quality Enforcement division.
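The article does not describe how the city's protocol computes or compares hashes, but one distinction is worth making concrete. A cryptographic hash such as SHA-256 changes completely when a single byte changes, so it only catches exact copies; "near-identical" matching implies a perceptual hash computed from the image content itself. The Python below is a minimal sketch of both steps under stated assumptions: files arrive as raw bytes, images have already been decoded and shrunk to an 8x8 grayscale grid, and near matches use the average-hash technique, swapped in here for illustration. The function names, file IDs and 5-bit threshold are hypothetical, not NYC Open Data's actual settings.

```python
import hashlib
from collections import defaultdict


def exact_duplicate_groups(files):
    """Group file IDs whose raw bytes share a SHA-256 digest.

    `files` maps a (hypothetical) file ID to that file's bytes.
    Only groups containing two or more IDs are returned.
    """
    by_digest = defaultdict(list)
    for file_id, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(file_id)
    return [sorted(ids) for ids in by_digest.values() if len(ids) > 1]


def average_hash(pixels):
    """Average hash (aHash) of a small grayscale image.

    Assumes the image was already decoded and downscaled to a grid of
    0-255 intensities (8x8 gives a 64-bit hash). Each bit is 1 when the
    pixel is brighter than the grid's mean, so uniform brightness shifts
    or recompression leave most bits unchanged.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Count the bits that differ between two integer hashes."""
    return bin(a ^ b).count("1")


def near_duplicates(hashes, max_distance=5):
    """Return pairs of file IDs whose perceptual hashes differ by few bits.

    Compares every pair, so cost grows quadratically with the file count.
    """
    ids = sorted(hashes)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if hamming(hashes[a], hashes[b]) <= max_distance:
                pairs.append((a, b))
    return pairs
```

Exact matching is cheap and, with SHA-256, effectively free of false positives, so a pipeline built this way would typically run it first and reserve the pairwise Hamming comparison for the files that remain; at the scale of a city archive, that comparison would also need an index rather than a full pairwise scan.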
The effort is not cheap. Municipal technology contracts in New York have historically cost more than those in comparable cities, partly because of union labor agreements and procurement rules. DoITT's fiscal year 2026 technology services budget, as published in the mayor's executive budget released in April, was listed at approximately $800 million across all programs, though the specific allocation for data deduplication work was not broken out as a standalone line item in publicly available documents.
How Other Cities Are Handling It
London's Government Digital Service, operating under the UK Cabinet Office, began a systematic image deduplication program for borough-level planning and housing databases in 2023 and publicly reported reducing redundant file storage across the Greater London Authority by roughly 30 percent within 18 months. Tokyo's Metropolitan Government ran a similar initiative tied to its Smart Tokyo 2030 strategy, focusing heavily on disaster-preparedness image archives. São Paulo's city administration has taken a more fragmented approach, delegating the work to individual secretariats with no central coordination, a model that technology analysts have described in published reports as the least effective among the major global cities they compared.
New York sits somewhere between London's centralized model and São Paulo's decentralized one. DoITT functions as a coordinating body, but individual systems, including the NYPD's Real Time Crime Center, the MTA's capital project documentation teams and the 311 complaint-image system, each maintain their own storage environments. That means deduplication has to happen within each agency first, with the results then reconciled across agencies. The MTA, a state authority rather than a city agency that nonetheless receives city capital dollars, has not publicly disclosed whether its image deduplication efforts are synchronized with DoITT's broader initiative.
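The two-tier process in the paragraph above (deduplicate within each agency, then reconcile across agencies) can be sketched in Python. This illustrates the general pattern under assumed conditions, not DoITT's design: each agency is assumed to share only content digests and file IDs with the coordinator rather than the images themselves, and the function names and data shapes are hypothetical.

```python
from collections import defaultdict


def agency_pass(agency, files):
    """First tier: deduplicate inside one agency's own storage.

    `files` maps a local file ID to a content digest (e.g. a SHA-256 hex
    string). Keeps the lowest-sorting ID per digest as canonical and
    returns ({digest: canonical_id}, [(agency, retired_id), ...]).
    """
    canonical = {}
    retired = []
    for file_id in sorted(files):
        digest = files[file_id]
        if digest in canonical:
            retired.append((agency, file_id))
        else:
            canonical[digest] = file_id
    return canonical, retired


def reconcile(agency_results):
    """Second tier: find digests held by more than one agency.

    `agency_results` maps agency name -> canonical map from agency_pass.
    Returns {digest: [(agency, file_id), ...]} for cross-agency overlaps.
    Nothing is deleted here; the report goes back to the owning agencies.
    """
    holders = defaultdict(list)
    for agency in sorted(agency_results):
        for digest, file_id in agency_results[agency].items():
            holders[digest].append((agency, file_id))
    return {d: h for d, h in holders.items() if len(h) > 1}
```

Keeping the second tier report-only mirrors the arrangement the article describes, in which agencies control their own storage: a coordinator can flag that a Buildings photo and an HPD photo are identical, but retiring either copy stays with the agency that owns it. Exchanging digests instead of images also avoids copying files between separate storage environments.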
For New Yorkers, the practical stakes are modest but real: slower 311 response processing, redundant inspection photos delaying building permits on projects from the South Bronx to Red Hook, and inflated cloud storage costs that ultimately land in the city budget. These are the downstream effects of a problem that sounds technical but touches everyday city services. Government efficiency advocates, including the Citizens Budget Commission, have repeatedly flagged data infrastructure modernization as an underfunded priority in New York's technology spending.
DoITT has not announced a public completion date for the current deduplication audit. City officials are expected to present updated data governance benchmarks to the City Council's Technology Committee in September 2026. Anyone tracking city building permits or HPD inspections can monitor progress through the NYC Open Data portal at data.cityofnewyork.us, where dataset update logs are publicly visible.
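For readers who want to check update times programmatically rather than in a browser, the Python sketch below reads a dataset's metadata. It rests on assumptions worth flagging: that the Socrata platform behind NYC Open Data serves per-dataset metadata at /api/views/<dataset-id>.json, and that its rowsUpdatedAt field holds the last data update as a Unix timestamp in seconds. Verify both against the portal's current documentation. The function names are hypothetical, and the dataset ID in the usage line is a placeholder, not a real HPD or Buildings dataset.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumption: Socrata-style metadata endpoint on the NYC Open Data portal.
PORTAL = "https://data.cityofnewyork.us"


def metadata_url(dataset_id):
    """Build the (assumed) metadata URL for a dataset's four-by-four ID."""
    return f"{PORTAL}/api/views/{dataset_id}.json"


def last_data_update(metadata):
    """Return the last row update as a UTC datetime, or None if absent.

    Assumes `rowsUpdatedAt` is a Unix timestamp in seconds.
    """
    ts = metadata.get("rowsUpdatedAt")
    if ts is None:
        return None
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def fetch_metadata(dataset_id, timeout=10):
    """Download and parse a dataset's metadata (performs a network request)."""
    with urllib.request.urlopen(metadata_url(dataset_id), timeout=timeout) as resp:
        return json.load(resp)
```

Usage, with a placeholder ID: `print(last_data_update(fetch_metadata("abcd-1234")))`. The network call is kept in its own function so the parsing logic can be checked offline.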