New York City government computers store what is estimated to be hundreds of millions of digital image files across dozens of agencies, and a growing share of them are exact or near-exact duplicates. That is not a minor housekeeping issue. It is a data management crisis with measurable dollar consequences, and city technology officials have been wrestling with how to quantify it, let alone fix it, for the better part of three years.
The timing matters because the city is midway through a sweeping digital infrastructure overhaul tied to its hosting obligations for the 2026 FIFA World Cup. Those obligations required New York to upgrade public-facing systems and back-end agency databases to handle the surge in permit requests, credentialing workflows, and event-management documentation that accompanied the tournament. That upgrade exposed redundancy problems that had been papered over for years.
What the Numbers Actually Show
Industry benchmarks for large municipal governments suggest that duplicate files can account for between 20 and 40 percent of total stored data in unmanaged environments. The Department of Citywide Administrative Services has described New York in budget documents as managing petabytes of unstructured data across its roughly 300,000-employee workforce. For a city of that size, even a conservative 25 percent duplication rate translates to enormous unnecessary storage costs. Cloud storage contracts, which the city began expanding under its 2021 technology modernization plan, typically bill by the volume of data stored, so every redundant copy directly inflates annual spending.
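To make that arithmetic concrete, here is a rough back-of-the-envelope sketch. The one-petabyte volume and the $20-per-terabyte monthly price are hypothetical placeholders chosen for illustration, not figures from city documents or any vendor contract. Only the 20, 25, and 40 percent duplication rates come from the benchmarks cited above.

```python
def annual_duplicate_cost(total_tb: float, duplicate_share: float,
                          usd_per_tb_month: float) -> float:
    """Estimate the yearly cost of storing data that is duplicated.

    All three inputs are assumptions supplied by the caller; this is a
    simple linear model that ignores tiered pricing and discounts.
    """
    if not 0.0 <= duplicate_share <= 1.0:
        raise ValueError("duplicate_share must be between 0 and 1")
    duplicate_tb = total_tb * duplicate_share
    return duplicate_tb * usd_per_tb_month * 12


if __name__ == "__main__":
    # Hypothetical inputs: 1 PB (1,000 TB) stored at $20 per TB per month.
    for share in (0.20, 0.25, 0.40):
        cost = annual_duplicate_cost(1000.0, share, 20.0)
        print(f"{share:.0%} duplicates: about ${cost:,.0f} per year")
```

Under those assumed inputs, a 25 percent duplication rate works out to $60,000 a year for each petabyte stored. The real figure depends on contract terms the city has not published.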
The Department of Housing Preservation and Development (HPD), based on Maiden Lane in Lower Manhattan, processes tens of thousands of building-permit image attachments and inspection photographs each year. Staff there, according to city budget testimony reviewed by this reporter, flagged duplicate-image accumulation as a specific workflow problem as far back as fiscal year 2023. The Human Resources Administration (HRA), which operates intake centers including the one on Atlantic Avenue in Brooklyn, similarly handles thousands of scanned identification documents per week, and case workers sometimes upload the same document multiple times across separate case files for the same client.
The city's 311 system logged more than 3.3 million service requests in fiscal year 2024, many of them photo-supported complaints about potholes, illegal dumping, and building violations. Each complaint can generate multiple image attachments, and when cases are merged or reassigned, duplicates accumulate in the underlying database without automatic purging. The Department of Information Technology and Telecommunications (NYC DoITT), now operating as part of the Office of Technology and Innovation, has acknowledged the problem in capital budget filings but has not published a citywide duplicate-image audit with precise figures.
The Fix Is Complicated, and Expensive
Deduplication software exists and is widely deployed in the private sector. Enterprise tools from vendors commonly used in municipal government can scan stored image libraries, identify duplicates using hash-matching and perceptual-similarity algorithms, and flag them for deletion or consolidation. The catch is that city records-retention law, governed by the New York State Archives and the city's own retention schedules, requires human sign-off before mass deletion of government files. That legal requirement slows automated cleanup considerably.
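For readers curious what hash-matching looks like in practice, the following is a minimal sketch, not a description of any tool the city uses. It fingerprints each file with SHA-256 from Python's standard library and groups files whose contents are byte-for-byte identical. The function names are illustrative. In keeping with the sign-off requirement described above, it only reports duplicate groups for human review and deletes nothing.

```python
import hashlib
import sys
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root that have identical contents.

    Returns only groups with two or more members. Nothing is deleted;
    the result is a list for a human reviewer to act on.
    """
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]


if __name__ == "__main__":
    # Usage: python find_duplicates.py /path/to/image/folder
    for group in find_exact_duplicates(Path(sys.argv[1])):
        print("Possible duplicates (review before any deletion):")
        for path in group:
            print(f"  {path}")
```

This approach catches only exact copies. A photograph that has been resized, recompressed, or rescanned produces a different hash, which is why commercial tools pair hash-matching with the perceptual-similarity methods mentioned above. Those are not shown here.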
The practical advice for residents interacting with city agencies is straightforward: when submitting documents through NYC.gov portals — whether for affordable housing lotteries administered through the Housing Connect platform or for benefits applications through ACCESS HRA — uploading the same file multiple times does not speed up processing. It adds to the backlog. Agency staff must manually reconcile duplicate submissions, a step that city technology auditors noted in a 2024 internal review adds measurable processing time to individual cases.
City Council members from districts including Sunset Park and the South Bronx, where residents disproportionately rely on HRA and HPD services, have raised constituent complaints about application delays without knowing that data redundancy is a contributing factor behind the scenes. The Office of Technology and Innovation has said a citywide data governance framework is under development, though no public completion date has been announced. Until that framework is in place, the duplicate files — and the costs attached to them — keep accumulating.