The Daily New York


New York's Digital Archives Are Riddled With Duplicate Images — And the Numbers Tell a Costly Story

City agencies and cultural institutions are sitting on millions of redundant digital files, draining storage budgets and slowing public access to historical records.

By New York News Desk · Published 4 July 2026, 2:44 pm


Photo: Satish Kumar on Pexels

New York City's public archives are facing a quiet but expensive crisis: duplicate images. Across municipal agencies, libraries, and cultural repositories, redundant digital files have accumulated into a problem that costs real money and buries genuine historical material under layers of repeated data. The issue has moved from an IT nuisance to a budget line item that administrators can no longer ignore, particularly as the city prepares to host the 2026 FIFA World Cup and ramp up digital documentation of everything from infrastructure upgrades to public events.

The timing matters because the city is midway through a multi-year push to digitize records stored at facilities including the Municipal Archives on Chambers Street in Lower Manhattan and the New York Public Library's Schomburg Center for Research in Black Culture in Harlem. Both institutions have been expanding their online collections, and both are dealing with the downstream consequence of digitizing in batches: the same photograph, map, or document scanned twice, three times, sometimes more, by different staff members using different equipment.

The Scale of the Problem

Studies of large institutional digital repositories have found that duplicate image rates can run as high as 30 percent of total stored files, according to research published in peer-reviewed library and information science journals. For a city archive holding tens of millions of assets, that figure translates directly into server costs, staff hours spent on manual review, and degraded search results that send researchers in circles. Storage costs for municipal agencies in New York have been a recurring concern in city budget negotiations — the Fiscal Year 2026 adopted budget allocated hundreds of millions of dollars across agency technology accounts, though specific line-item breakdowns for digital storage are not publicly itemized at the asset level.

The practical effect is visible to anyone who has used the city's online portals. The NYC Department of Records and Information Services, which oversees the Municipal Archives, maintains a public image database that researchers use to pull historical photographs of neighborhoods like the South Bronx, Red Hook, and Greenpoint. Duplicate entries in that database force users to scroll through repeated results and make automated metadata tagging less reliable. Librarians and archivists have been lobbying for dedicated funding to run deduplication software across these collections since at least 2023, when the issue was flagged internally during a technology audit.

What Deduplication Actually Costs — and Saves

Deduplication software used by comparable institutions, including the Library of Congress and several major European national archives, typically works by generating a hash value for each image file and comparing it against the existing catalog. A cryptographic hash of the file contents catches only exact copies. Tools that also use perceptual hashing, which fingerprints what an image looks like rather than its raw bytes, can flag near-duplicates as well, catching cases where the same photo was scanned at different resolutions or with slightly different crops. Enterprise-level tools from vendors in this space are licensed at costs that typically range from tens of thousands to several hundred thousand dollars annually depending on collection size, according to publicly available vendor pricing sheets.
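For readers curious how hash-based matching works in practice, here is a minimal Python sketch. It is an illustration only, not the method of any institution or vendor named in this story. The file names and pixel values are hypothetical, and the near-duplicate check uses a simple "average hash" on a tiny grayscale grid; a real pipeline would first decode each scan and shrink it to a small grid with an imaging library.

```python
import hashlib
from collections import defaultdict


def exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        # SHA-256 of the raw bytes: identical files give identical digests.
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def average_hash(pixels: list[list[int]]) -> int:
    """Fingerprint a small grayscale grid: one bit per pixel,
    set when the pixel is brighter than the grid's mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


# Hypothetical catalog: two identical scans and one different file.
catalog = {"a.tif": b"scan-1", "b.tif": b"scan-1", "c.tif": b"scan-2"}
dupes = exact_duplicates(catalog)

# Hypothetical 2x2 grids: the same picture, a brighter rescan, and its inverse.
original = [[10, 200], [200, 10]]
rescan = [[20, 210], [210, 20]]
inverse = [[200, 10], [10, 200]]
near = hamming(average_hash(original), average_hash(rescan))
far = hamming(average_hash(original), average_hash(inverse))
```

In this sketch the two identical files land in one group, the brighter rescan produces the same fingerprint as the original (a distance of 0), and the inverted grid differs in every bit. A working system would treat any pair under a chosen distance threshold as a candidate for human review rather than deleting it automatically.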

The New York Public Library, which holds more than 900,000 digitized items in its Digital Collections portal as of mid-2026, has run deduplication projects on specific sub-collections. The library has not publicly released aggregate figures on how many redundant files were removed, but similar projects at peer institutions have reported storage reclamation rates of between 15 and 25 percent after full deduplication passes.

For the city's archives specifically, the stakes are compounded by the World Cup. The Adams administration has committed to extensive photographic and video documentation of infrastructure work connected to the tournament, including upgrades around MetLife Stadium, Midtown Manhattan venue corridors along Seventh Avenue, and transit hubs. That documentation will be added to existing city repositories, which means that if current practices continue, the duplication problem will grow before it shrinks.

Advocates for better digital stewardship say the fix is straightforward: fund a dedicated deduplication and metadata remediation project before the new files arrive. The Municipal Archives accepts public records requests year-round, and researchers with specific concerns about collection access can file inquiries directly through the NYC Department of Records portal at records.nyc.gov. For the city's budget managers, the arithmetic is simple — the longer the cleanup is deferred, the larger the backlog, and the higher the eventual cost to clear it.


About this article

Published by The Daily New York

This article was produced by The Daily New York editorial desk and covers news in New York. See our editorial standards for how we use AI.
