The Daily New York

New York's Digital Archive Problem: The Hidden Toll of Duplicate Images Clogging City Databases

From the Department of Housing Preservation and Development's property photo banks to the MTA's surveillance archives, redundant image files are inflating New York's storage bills and slowing the systems residents depend on.

By New York News Desk · Published 4 July 2026, 2:40 pm

New York City agencies together store what is estimated to be tens of millions of digital image files, and a significant share of those files are exact or near-exact duplicates. The problem has grown quietly more expensive as storage costs compound year over year. The city's Department of Information Technology and Telecommunications, known as DoITT and folded into the Office of Technology and Innovation in 2022, has flagged redundant data management as a priority in its multi-year modernization roadmap, though the full scope of duplicate image accumulation across agencies remains difficult to quantify from public records alone.

The issue matters now because New York is in the middle of a generational infrastructure investment cycle. The MTA's current capital program, a $68.4 billion plan running through 2029, includes substantial spending on digital systems, surveillance cameras, and real-time data platforms across the subway network. At the same time, the city's Department of Housing Preservation and Development (HPD) has been digitizing tens of thousands of property inspection records, including photographic evidence of violations. When those workflows lack automated deduplication, redundant files accumulate faster than administrators can manually audit them.

The Storage Math Is Brutal

Commercial cloud storage at enterprise scale typically runs between $0.02 and $0.08 per gigabyte per month, depending on the provider and tier. A single high-resolution image file from a modern smartphone or inspection camera can run between 4 and 12 megabytes. Multiply that across the roughly 1,100 city-owned buildings managed by the Department of Citywide Administrative Services, add the photo documentation requirements under Local Law 97 compliance inspections, and the numbers accumulate fast. Industry benchmarks from data management firms suggest that between 20 and 30 percent of files in large unmanaged digital repositories are duplicates — meaning for every ten images a city agency stores, two or three are redundant copies providing no additional informational value.
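To make the arithmetic concrete, here is a minimal sketch in Python that applies the ranges quoted above to a single repository. The image count of 10 million is a hypothetical input chosen for illustration, not a reported city figure, and the function name is likewise an assumption of this sketch.

```python
# Illustrative only: the image count is a hypothetical input, not a city figure.
# File sizes, duplicate rates and prices are the ranges quoted in the article.

def duplicate_storage_cost(num_images, avg_mb, dup_rate, usd_per_gb_month):
    """Return (redundant_gb, monthly_usd, annual_usd) for duplicate image files."""
    total_gb = num_images * avg_mb / 1000      # decimal gigabytes, 1 GB = 1,000 MB
    redundant_gb = total_gb * dup_rate         # share of storage holding copies
    monthly_usd = redundant_gb * usd_per_gb_month
    return redundant_gb, monthly_usd, monthly_usd * 12


HYPOTHETICAL_IMAGE_COUNT = 10_000_000  # assumption for illustration

# (average file size in MB, duplicate rate, price in USD per GB per month)
scenarios = {
    "low end": (4, 0.20, 0.02),
    "high end": (12, 0.30, 0.08),
}

for label, (avg_mb, dup_rate, price) in scenarios.items():
    gb, monthly, annual = duplicate_storage_cost(
        HYPOTHETICAL_IMAGE_COUNT, avg_mb, dup_rate, price
    )
    print(f"{label}: {gb:,.0f} GB redundant, ${monthly:,.0f}/month, ${annual:,.0f}/year")
```

The sketch counts only raw storage for still images in one repository. It leaves out video footage, backup copies and the staff time needed for cleanup, all of which would add to the total.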

The NYPD's evidence management system, which covers precincts from the 1st in Lower Manhattan to the 123rd in Staten Island, processes body-worn camera footage and crime scene photography that together generate petabytes of data annually. The department moved to an upgraded digital evidence management platform in recent years, but duplicate file ingestion during legacy data migrations has been a documented challenge for police departments in major cities undertaking similar transitions. Detailed figures on the NYPD's duplicate image volume are not publicly reported.

What the City Is Doing — and What It Isn't

The Mayor's Office of Technology and Innovation has promoted the city's Open Data program, hosted at data.cityofnewyork.us, as a transparency mechanism, but the portal's back-end infrastructure is separate from the agency-level document management systems where duplicate image problems actually accumulate. HPD's online property database, accessible to landlords and tenants in neighborhoods like Bushwick and the South Bronx where inspection volumes are highest, pulls from internal systems that predate modern deduplication tooling.

Several city agencies have begun piloting automated deduplication software as part of broader IT modernization contracts. The city's Fiscal Year 2026 adopted budget, approved in June 2025, allocated funds toward technology infrastructure upgrades across multiple agencies, though line-item detail on storage optimization specifically is not broken out in the publicly available budget documents.

For residents and watchdog groups trying to track city performance, the practical consequence is slower database query times and incomplete search results when duplicate records fragment what should be unified datasets. A tenant in Crown Heights searching HPD's violation database for a landlord's history may encounter the same inspection photograph logged under multiple record IDs, complicating their ability to build a clear timeline.

The fix is not glamorous: agencies need to run hash-based deduplication scans on existing repositories, adopt ingestion protocols that flag identical files before they're written to storage, and conduct periodic audits on file libraries that grow continuously. Several vendors active in the New York municipal contracting market offer these services under existing city procurement frameworks. The longer agencies wait, the larger the repositories grow, and the more expensive the cleanup becomes — both in direct storage costs and in the staff hours required to sort legitimate records from redundant noise.
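The first two steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any agency's or vendor's actual tooling: the names `find_duplicates` and `IngestGate` are hypothetical, SHA-256 is one reasonable choice of hash, and the in-memory index stands in for what would be a database in practice.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Hash a file's contents with SHA-256, reading in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Scan step: group files under root by content hash.

    Returns only the groups that contain more than one file.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


class IngestGate:
    """Ingestion step: flag bytes that are already stored before writing them."""

    def __init__(self):
        self.seen = {}  # digest -> record ID of the first copy (hypothetical index)

    def submit(self, record_id, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.seen:
            # Identical content exists; report the record that already holds it.
            return ("duplicate", self.seen[digest])
        self.seen[digest] = record_id
        return ("stored", record_id)
```

A content hash of this kind catches only byte-identical files. Near-exact copies, such as a photo that has been resized or re-encoded, produce different hashes and would need a separate technique such as perceptual hashing, which is not shown here.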

About this article

Published by The Daily New York

This article was produced by The Daily New York editorial desk and covers news in New York. See our editorial standards for how we use AI.
