The Daily New York

New York news, every day


New York's Digital Archives Are Drowning in Duplicate Images — And the Numbers Are Staggering

City agencies and cultural institutions are sitting on millions of redundant digital files, burning through storage budgets and slowing public access to records.

By New York News Desk · Published 4 July 2026, 3:16 pm


Photo: jimmy teoh / Pexels

New York City's public digital repositories are estimated to contain tens of millions of duplicate image files, including photographs, scanned documents and architectural drawings, stored redundantly across agency servers. Those copies cost taxpayers money in wasted cloud storage fees and slow the rapid public access that institutions promised when they first digitized their collections. The scale of the problem has come into sharper focus this year as several major city institutions have begun formal deduplication audits.

The timing matters because 2026 has put extraordinary pressure on city IT infrastructure. The FIFA World Cup, with matches at MetLife Stadium in East Rutherford, New Jersey, and fan-zone events concentrated around Hudson Yards and Times Square, pushed the city's Office of Technology and Innovation, which absorbed the former Department of Information Technology and Telecommunications (DoITT) in 2022, to accelerate reviews of data storage contracts whose costs had been quietly ballooning. Meanwhile, the Metropolitan Transportation Authority's ongoing capital program has generated thousands of new engineering drawings and survey images that archivists say are being ingested into shared drives with little deduplication discipline.

What the Numbers Actually Show

Cloud storage is not cheap at institutional scale. Amazon Web Services lists its S3 Standard object storage at $0.023 per gigabyte per month for the first 50 terabytes in its US East region, and Microsoft Azure's comparable hot-tier blob storage is priced in a similar range of roughly two cents per gigabyte. That cost adds up quickly when a single high-resolution scan of a city planning document can run 80 megabytes and the same document has been uploaded by three separate borough offices. The New York Public Library, whose flagship Stephen A. Schwarzman Building stands on Fifth Avenue and which manages its digital collections through digital-preservation partnerships, publicly disclosed in its 2024 annual report that its digital assets exceeded 100 terabytes. Library officials have noted that deduplication is an ongoing operational challenge across large-scale collections, though the library has not published a specific figure for redundant files.
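To put the per-gigabyte price in concrete terms, here is a back-of-the-envelope sketch. It assumes a flat $0.023 per gigabyte-month (the first-tier rate quoted above), decimal units (1 GB = 1,000 MB) and no request, retrieval or egress charges; the function name is illustrative, not any agency's actual tool.

```python
# Back-of-the-envelope monthly cost of redundant copies.
# Assumptions: flat first-tier rate, decimal units (1 GB = 1,000 MB),
# no request, retrieval or egress charges.
RATE_USD_PER_GB_MONTH = 0.023


def redundant_cost_per_month(file_mb: float, copies: int,
                             rate: float = RATE_USD_PER_GB_MONTH) -> float:
    """Monthly storage cost of every copy beyond the first."""
    redundant_copies = max(copies - 1, 0)
    redundant_gb = redundant_copies * file_mb / 1000
    return redundant_gb * rate


# One 80 MB scan uploaded by three borough offices: two copies are redundant.
per_document = redundant_cost_per_month(80, 3)
print(f"${per_document:.5f} per month")  # roughly 0.37 cents

# The same pattern repeated across 100,000 documents for a year.
print(f"${per_document * 100_000 * 12:,.2f} per year")  # about $4,416
```

The per-document figure is tiny; the budget impact comes from multiplying it across millions of files for as long as the copies are retained.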

The city's Department of Records and Information Services, which runs the Municipal Archives on Chambers Street in Lower Manhattan, has been working since 2022 under a multi-year digitization contract to make historical photographs and administrative records accessible online. Archivists familiar with large municipal digitization projects, speaking generally rather than about New York specifically, have noted that duplication rates in batch-scan workflows typically run between 15 and 30 percent of total ingested files. At those rates, a collection of one million images could contain 150,000 to 300,000 redundant copies that consume storage without adding informational value.
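Applying those 15 to 30 percent duplication rates to storage gives a rough sense of scale. The sketch below assumes, purely for illustration, that every file matches the 80-megabyte scan and $0.023 per gigabyte-month rate cited earlier; real collections mix file sizes and storage tiers, so treat the output as an order-of-magnitude estimate.

```python
# Rough redundancy range for a batch-scanned collection.
# Assumptions (illustrative): every file is 80 MB, decimal units
# (1 TB = 1,000,000 MB), and a flat $0.023 per GB-month storage rate.
def redundancy_range(total_files, low_rate, high_rate,
                     avg_file_mb=80.0, usd_per_gb_month=0.023):
    """Return (redundant files, terabytes, USD per month) for each rate."""
    results = []
    for rate in (low_rate, high_rate):
        redundant = round(total_files * rate)
        terabytes = redundant * avg_file_mb / 1_000_000
        monthly_usd = terabytes * 1_000 * usd_per_gb_month
        results.append((redundant, terabytes, monthly_usd))
    return results


# One million images at the 15 and 30 percent duplication rates.
for files, tb, usd in redundancy_range(1_000_000, 0.15, 0.30):
    print(f"{files:,} redundant files: about {tb:.0f} TB, ${usd:,.0f}/month")
```

Under these assumptions the range works out to roughly 12 to 24 terabytes, or about $276 to $552 a month, for every million images; the total grows linearly with collection size and with how long the copies are kept.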

For city agencies under pressure to cut costs, that redundancy translates directly into budget waste. The city's technology spending runs into the billions of dollars annually across agencies; even trimming a fraction of a percent from that total by eliminating unnecessary storage would free up money that administrators say is needed elsewhere, from cybersecurity upgrades to improvements in the 311 system's back end.

What Institutions Are Doing About It

The practical mechanics of image deduplication involve more than simply deleting identical files. Institutions use perceptual hashing algorithms, software that generates a fingerprint for each image from its visual content rather than its file name or exact bytes, to catch near-duplicates: the same photograph saved as both a JPEG and a TIFF, or the same scan uploaded at two different resolutions. The Brooklyn Museum, which has made its open-access digital collection a model for the sector, publishes its collection data openly and has invested in metadata standardization that makes deduplication easier. Columbia University's Avery Index to Architectural Literature, based at Avery Hall on the Morningside Heights campus, faces similar issues managing image-linked records accumulated over decades of contributions.
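The institutions named here have not said which hashing method they use. As an illustration only, here is a minimal sketch of one common perceptual-hashing technique, the average hash ("aHash"). It assumes images have already been decoded to grayscale and are represented as lists of pixel rows (0 to 255); decoding JPEG or TIFF files is left to an imaging library, and the 8×8 grid and five-bit threshold are illustrative choices.

```python
# Minimal average-hash (aHash) sketch for near-duplicate detection.
# Assumes images are already decoded to grayscale: a list of rows of 0-255 ints.
from typing import List

Image = List[List[int]]


def shrink(img: Image, size: int = 8) -> Image:
    """Downscale to size x size by averaging rectangular blocks of pixels."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(size):
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)  # at least one row per cell
        row = []
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)  # at least one column per cell
            block = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(img: Image, size: int = 8) -> int:
    """Fingerprint with one bit per cell, set if the cell is brighter than the mean."""
    cells = [v for row in shrink(img, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def likely_duplicates(a: Image, b: Image, threshold: int = 5) -> bool:
    """Treat two images as near-duplicates if their hashes differ in few bits."""
    return hamming(average_hash(a), average_hash(b)) <= threshold


# A synthetic 64x64 "scan" whose brightness increases left to right.
scan = [[x * 4 for x in range(64)] for _ in range(64)]
# The same content at half resolution, as if uploaded a second time.
half = [row[::2] for row in scan[::2]]
# A genuinely different image: the gradient reversed.
other = [row[::-1] for row in scan]

print(likely_duplicates(scan, half))   # True
print(likely_duplicates(scan, other))  # False
```

Because the hash depends only on coarse brightness patterns, a resized or re-encoded copy lands within a few bits of the original, while unrelated images differ widely. Production pipelines more often rely on maintained libraries (the open-source Python `imagehash` package, for instance, implements average, difference and DCT-based hashes) and tune the threshold against known duplicate pairs.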

For smaller organizations, such as community archives in the Bronx and local historical societies in Staten Island's St. George neighborhood, the challenge is more acute because they lack dedicated digital-preservation staff. Grants from the New York State Council on the Arts and the Institute of Museum and Library Services have historically funded digitization but rarely include line items for ongoing data-hygiene work such as deduplication.

Organizations grappling with this problem have a concrete first step available: the Library of Congress publishes free guidance on digital preservation workflows, including deduplication best practices, through its Digital Preservation resources portal. For city agencies, the path runs through the procurement office of the Office of Technology and Innovation, the successor agency to DoITT, where updated cloud storage contracts are due for review before the end of fiscal year 2027. Writing deduplication requirements into those contracts before they are renewed may be the single most cost-effective move available.


About this article

Published by The Daily New York

This article was produced by The Daily New York editorial desk and covers news in New York. See our editorial standards for how we use AI.
