The Daily New York

New York news, every day

New York City's Digital Archives Are Drowning in Duplicate Images — and the Numbers Are Staggering

A quiet data crisis inside city agencies is wasting storage dollars and slowing down the public records systems New Yorkers rely on every day.

By New York News Desk · Published 4 July 2026, 2:58 pm

Photo: Mihar kathiriya on Pexels

New York City's municipal agencies collectively store tens of millions of digital images across their servers — property photos, permit documentation, event records, infrastructure inspections — and a growing share of that data is redundant. Duplicate image files have quietly ballooned into a measurable fiscal and operational problem, one that IT administrators at the Department of City Planning and the Department of Buildings have flagged in internal infrastructure reviews over the past two fiscal years.

The timing matters. With the city's capital budget under pressure from MTA subway expansion commitments and the infrastructure demands surrounding FIFA World Cup matches at MetLife Stadium in East Rutherford, New Jersey, this summer, every dollar spent on unnecessary cloud storage is a dollar not going elsewhere. For agencies that have spent years digitizing paper records to serve New Yorkers faster, bloated image repositories threaten to undermine the entire investment.

What the Data Actually Shows

Storage costs for enterprise-grade cloud services used by large municipal governments typically run between $0.02 and $0.08 per gigabyte per month, depending on contract tier and redundancy requirements. For a city the size of New York — which manages data systems across more than 50 mayoral agencies — even a modest 15 percent duplication rate across image archives can translate into hundreds of thousands of dollars in wasted annual expenditure. The Department of Buildings alone processes roughly 500,000 permit applications per year, each of which can generate multiple photo attachments uploaded by contractors, inspectors, and applicants, sometimes recording identical site conditions from different logins on the same day.
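The arithmetic behind that estimate can be sketched as follows. The price range and the 15 percent duplication rate come from the figures above; the archive size is a purely hypothetical placeholder, since no citywide total is given here, and the function name is illustrative.

```python
def wasted_annual_cost(total_gb: float, duplication_rate: float,
                       price_per_gb_month: float) -> float:
    """Return the yearly cost, in dollars, of storing duplicate data."""
    if not 0.0 <= duplication_rate <= 1.0:
        raise ValueError("duplication_rate must be between 0 and 1")
    duplicate_gb = total_gb * duplication_rate
    return duplicate_gb * price_per_gb_month * 12  # 12 billing months


# ASSUMPTION: a 1,000,000 GB (about 1 PB) image archive, chosen only
# for illustration. It is not a reported figure.
ARCHIVE_GB = 1_000_000

low = wasted_annual_cost(ARCHIVE_GB, 0.15, 0.02)   # cheapest tier cited
high = wasted_annual_cost(ARCHIVE_GB, 0.15, 0.08)  # priciest tier cited
print(f"${low:,.0f} to ${high:,.0f} per year")
```

Under that assumed archive size, the duplicate share alone would cost between $36,000 and $144,000 a year. The result scales linearly with archive size, so the real total depends on how much image data the agencies actually hold.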

The NYC Office of Technology and Innovation, which absorbed the former Department of Information Technology and Telecommunications (DoITT), has pushed deduplication as part of its broader citywide data modernization agenda since at least fiscal year 2024. The agency's mandate covers cloud governance for systems that include the 311 service portal, property records accessible through the Department of Finance's ACRIS database on Worth Street in Lower Manhattan, and inspection logs tied to addresses across all five boroughs. ACRIS alone handles document images for more than a million recorded transactions annually.

The problem compounds at the borough level. The Brooklyn Public Library's digital collections team and the New York City Municipal Archives, located at 31 Chambers Street in Civic Center, have both undertaken deduplication audits in recent years as part of preservation grants administered through the Metropolitan New York Library Council. Archivists working with photographic collections from the La Guardia and Wagner administrations discovered duplicate scan batches that in some cases tripled the storage footprint of individual collections, according to documentation submitted as part of past grant applications.

The Fix — and Why It's Slower Than It Should Be

Automated deduplication tools have existed for years, but deploying them across a fragmented municipal IT landscape is not straightforward. New York City operates on a patchwork of legacy systems and newer cloud platforms, meaning a duplicate image in one agency's database may carry a different file name, metadata tag, or compression format than its twin in another agency's system. A different file name alone does not change a file's content hash, but altered embedded metadata or recompression changes the underlying bytes. That makes hash-based detection, the most reliable technical method for exact copies, harder to run at scale without custom configuration work.
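For readers curious what hash-based detection looks like in practice, here is a minimal sketch. It is an illustration only, not the city's actual tooling: the function names and sample file names are hypothetical, and the approach catches only byte-identical copies.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large images fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_hash = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[sha256_of_file(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]


# Hypothetical demo: two uploads with identical bytes but different
# names, plus a third file whose bytes differ (as after recompression).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "inspector_upload.jpg").write_bytes(b"same image bytes")
    (root / "contractor_upload.jpg").write_bytes(b"same image bytes")
    (root / "recompressed.jpg").write_bytes(b"same image, different bytes")
    groups = find_exact_duplicates(root)
    duplicate_names = [[p.name for p in group] for group in groups]

print(duplicate_names)
```

The two identically named-apart uploads are grouped together, while the third file is not flagged because its bytes differ. Catching recompressed or re-tagged copies would require a different technique, such as perceptual hashing, which this sketch does not attempt.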

The Office of Technology and Innovation has piloted deduplication scripts within the city's Azure-based cloud environment, with initial work focused on high-volume agencies. A full rollout requires cooperation from agency chief information officers, budget approval through the city's capital process, and — critically — staff time that smaller agencies often cannot spare. At the Department of Transportation, which manages image records for roughly 6,300 miles of streets, the resources available for data hygiene projects compete directly with operational priorities.

For New Yorkers who use city digital services — pulling property documents through ACRIS, filing 311 complaints with photo attachments, or searching the Municipal Archives for historical records — the practical upshot is slower search returns and occasional system lag during peak demand. The concrete fix is unglamorous: fund the deduplication audits, allocate the staff hours, and set enforceable upload standards for contractor submissions at agencies like Buildings and Transportation. The city's own data governance roadmap calls for exactly that work. Getting it done before the next budget cycle closes would be a start.

About this article

Published by The Daily New York

This article was produced by The Daily New York editorial desk and covers news in New York. See our editorial standards for how we use AI.
