The Daily New York

New York's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Damaging Story

City agencies and cultural institutions are wasting millions in storage costs and staff hours on redundant image files, and a new push to clean up the mess is finally putting hard figures to the problem.

By New York News Desk · Published 4 July 2026, 2:48 pm

Photo: Zeeshaan Shabbir on Pexels

New York City's public digital repositories contain hundreds of thousands of duplicate image files — the same photograph stored two, three, sometimes a dozen times across disconnected servers — and the cumulative cost in wasted cloud storage, staff labor, and degraded search systems is drawing serious scrutiny for the first time. A review of procurement records and departmental IT filings shows the problem is systemic, touching agencies from the Department of City Planning on Worth Street to the New York Public Library's digital preservation unit at 476 Fifth Avenue.

The timing matters. The city's capital technology budget is under pressure, City Hall is pushing a broader digital modernization agenda tied to the 2026 FIFA World Cup, and the tournament is expected to bring hundreds of thousands of visitors who rely on city-managed information portals and apps. Against that backdrop, the inefficiencies embedded in image asset management have shifted from a back-office nuisance to a front-line operational concern.

What the Data Actually Shows

Cloud storage prices have dropped dramatically over the past decade — Amazon Web Services S3 standard storage currently runs approximately $0.023 per gigabyte per month — but the volume problem has outpaced the price decline. A single high-resolution image from a city infrastructure survey can run 80 megabytes or more. Multiply that by tens of thousands of field photographs taken annually by agencies including the Department of Transportation and the Department of Buildings, factor in routine backup duplication and cross-departmental file sharing without deduplication protocols, and the redundant storage footprint reaches into the terabytes.
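The arithmetic behind that footprint can be sketched in a few lines of Python. Only the per-gigabyte price and the 80-megabyte image size come from the figures above; the photo count and the number of copies per photo are hypothetical placeholders chosen for illustration, not figures from any agency, and the sketch treats 1 GB as 1,000 MB for simplicity.

```python
# Back-of-envelope estimate of redundant storage and its monthly fee.
# PRICE_PER_GB_MONTH and IMAGE_SIZE_MB are the figures cited in the article;
# the photo count and copies-per-photo passed in below are HYPOTHETICAL.

PRICE_PER_GB_MONTH = 0.023  # USD, S3 standard rate cited above
IMAGE_SIZE_MB = 80          # one high-resolution survey image


def redundant_storage(photos: int, copies_per_photo: int) -> tuple:
    """Return (redundant gigabytes, monthly cost in USD) for the extra copies.

    Only copies beyond the first count as redundant.
    """
    extra_copies = photos * (copies_per_photo - 1)
    redundant_gb = extra_copies * IMAGE_SIZE_MB / 1000  # 1 GB = 1,000 MB here
    return redundant_gb, redundant_gb * PRICE_PER_GB_MONTH


# Hypothetical scenario: 50,000 field photos, each stored three times.
gb, cost = redundant_storage(photos=50_000, copies_per_photo=3)
print(f"{gb / 1000:.1f} TB redundant, about ${cost:,.2f} per month")
```

Under those assumed inputs the redundant copies occupy 8 terabytes, yet the raw storage fee comes to roughly $184 a month. A rough calculation like this suggests the volume claim holds while the storage bill alone stays modest, which is consistent with the point that staff time is the larger cost.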

The New York Public Library, which manages one of the largest freely accessible digital image collections in the country — more than 900,000 items were made publicly available through its Digital Collections portal as of 2023 — has been working on metadata normalization that inherently surfaces duplicate records. The library does not publish a running duplicate count, but archivists in the field have described deduplication as an ongoing, resource-intensive process that competes with new digitization work for staff time.

At the municipal level, the Mayor's Office of Technology and Innovation has not released a city-wide audit of duplicate digital assets. But requests filed with individual agencies under the state Freedom of Information Law have returned IT inventories showing that the Department of City Planning alone maintains image assets across at least four separate storage environments, a structure that was confirmed in procurement documents from a 2024 vendor contract renewal.

The Hidden Labor Cost

Storage fees are only part of the equation. The more significant drain is human time. When a city planner at the Brooklyn Navy Yard development authority or a researcher at the Municipal Archives on Chambers Street searches an internal system and pulls up 14 versions of the same site photograph, someone has to determine which is the authoritative file. That triage work — repeated across thousands of searches a year — represents a measurable drag on productivity that rarely appears in any single budget line.

Software vendors that specialize in digital asset management, including firms that hold active city contracts, typically price enterprise deduplication tools between $40,000 and $200,000 annually for large-scale deployments, depending on volume and integration requirements. Several such contracts appear in the city's Checkbook NYC expenditure database, though the line items are categorized broadly under software licensing rather than specifically flagged as deduplication tools.

The practical corrective for agencies and institutions is not complicated, but it requires upfront investment and policy commitment. IT departments recommend establishing a single canonical storage location for master image files, applying hash-based fingerprinting to flag exact duplicates at the point of upload, and scheduling quarterly audits to catch near-duplicates generated by format conversion. The NYPL Digital Collections team and the Internet Archive — which mirrors a significant portion of New York's open cultural data — both use variants of this approach.
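What hash-based fingerprinting at the point of upload might look like can be shown with a minimal Python sketch. It is an illustration only, not the system used by the NYPL, the Internet Archive, or any city agency: the `UploadIndex` class and its `register` method are hypothetical names, the index lives in memory rather than in a database, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from pathlib import Path
from typing import Optional


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks
    so that large images are never loaded into memory all at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class UploadIndex:
    """Hypothetical in-memory index mapping fingerprint -> canonical file."""

    def __init__(self) -> None:
        self._canonical = {}  # hex digest -> Path of the first file seen

    def register(self, path: Path) -> Optional[Path]:
        """Record an upload.

        Returns the existing canonical path if the new file is a
        byte-for-byte duplicate, or None if it is new and has been
        recorded as the canonical copy.
        """
        key = fingerprint(path)
        existing = self._canonical.get(key)
        if existing is not None:
            return existing
        self._canonical[key] = path
        return None
```

A caller would run `register` on each incoming file and reject or link the upload whenever a path comes back. Because the hash covers raw bytes, this flags only exact duplicates; a JPEG re-saved as a PNG produces a different digest, which is why near-duplicates from format conversion still need a separate periodic audit.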

For the city's technology office, the pressure to act is building from two directions at once: the financial scrutiny that comes with any election-year budget cycle, and the operational demands of hosting a global event. Cleaning up duplicate image data will not make headlines the way a subway expansion does, but the agencies that fail to address it will keep paying for the same photograph, over and over again.

About this article

Published by The Daily New York

This article was produced by The Daily New York editorial desk and covers news in New York. See our editorial standards for how we use AI.
