The Daily New York


New York's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Costly Story

From the Department of City Planning to the Brooklyn Public Library, municipal agencies are sitting on millions of redundant image files that are eating storage budgets and slowing public access to records.

By New York News Desk · Published 4 July 2026, 2:35 pm

Photo: Ross, Frederick, 1816-1893 / Public domain (Wikimedia Commons)

New York City's public agencies collectively manage tens of millions of digital image files, by most estimates, and a significant share of them are exact or near-exact duplicates. The problem is not abstract. Duplicate image data costs real money, consumes finite server capacity, and buries the records that researchers, journalists, and ordinary New Yorkers are trying to find.

The issue has sharpened in 2026 because the city is midway through a major infrastructure push. The Office of Technology and Innovation, which was created under the Adams administration and absorbed the former Department of Information Technology and Telecommunications, has been rolling out a cloud migration program across dozens of agencies. As that migration proceeds, data audits are surfacing redundancy problems that paper-based cataloging never exposed.

What the Numbers Actually Show

Industry benchmarks for large institutional image repositories typically put the duplicate rate somewhere between 20 and 40 percent, depending on how aggressively files are deduplicated at ingestion. For an archive holding, say, 50 million images (a conservative estimate for a city the size of New York), that translates to between 10 million and 20 million files consuming storage space without adding informational value. At current enterprise cloud storage pricing of roughly $0.02 per gigabyte per month for cold storage tiers, even modest file sizes compound fast across that volume.
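To make that arithmetic concrete, here is a rough back-of-envelope sketch. The archive size, duplicate rates, and storage price are the figures cited above; the average file size of 25 MB is purely an assumption for illustration, since no per-file figure has been published, and the function name is hypothetical.

```python
# Back-of-envelope storage cost of duplicate images.
# Archive size, duplicate rates and price per GB come from the figures above.
# AVG_FILE_MB is an ASSUMPTION for illustration, not a reported number.

ARCHIVE_FILES = 50_000_000      # illustrative archive size
PRICE_PER_GB_MONTH = 0.02       # dollars per gigabyte per month
AVG_FILE_MB = 25                # assumed average image size in megabytes


def duplicate_cost(duplicate_rate, total_files=ARCHIVE_FILES,
                   avg_mb=AVG_FILE_MB, price=PRICE_PER_GB_MONTH):
    """Return (duplicate file count, monthly cost, yearly cost) in dollars."""
    dup_files = round(total_files * duplicate_rate)
    dup_gb = dup_files * avg_mb / 1000   # decimal gigabytes
    monthly = dup_gb * price
    return dup_files, monthly, monthly * 12


for rate in (0.20, 0.40):
    count, monthly, yearly = duplicate_cost(rate)
    print(f"{rate:.0%}: {count:,} duplicate files, "
          f"${monthly:,.0f}/month, ${yearly:,.0f}/year")
```

Under the assumed 25 MB average, the 20 percent case works out to about $5,000 a month and the 40 percent case to about $10,000. A different average file size scales the result proportionally, and the estimate covers storage alone, not staff time or retrieval charges.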

The New York Public Library's digital collections, one of the largest publicly accessible municipal repositories in the country, contain more than 900,000 digitized items as of the library's most recently published figures. The Brooklyn Public Library's digital archive program, based at its Grand Army Plaza headquarters in Prospect Heights, has separately catalogued thousands of historical photograph collections since launching an accelerated digitization effort in 2022. Both institutions have publicly acknowledged the challenge of deduplication as collections merge and donors submit overlapping material.

The Department of City Planning maintains aerial photography and survey image sets going back decades, and those layers get re-ingested each time a new contract is awarded. Its offices at 120 Broadway in the Financial District have been a focal point of the current audit process. Each new aerial survey cycle produces raw files that, before any quality control, frequently duplicate segments captured in prior runs. A 2024 procurement filing related to a cloud services contract referenced image data management as a line-item cost driver, though it did not break out deduplication costs separately.
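For readers curious what deduplication at ingestion can look like, the sketch below shows one common approach: recording a SHA-256 checksum for every file and flagging any incoming file whose checksum has been seen before. It is a minimal illustration, not a description of any agency's actual pipeline. The `IngestIndex` class and the file names are hypothetical.

```python
import hashlib


class IngestIndex:
    """Hypothetical in-memory index of content checksums already archived."""

    def __init__(self):
        self._seen = {}  # SHA-256 hex digest -> name of first file with that content

    def ingest(self, name, data):
        """Return None if the content is new, else the name of the file it duplicates."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._seen:
            return self._seen[digest]
        self._seen[digest] = name
        return None


index = IngestIndex()
# File names and contents below are made up for illustration.
print(index.ingest("survey_2022/tile_001.tif", b"raw tile bytes"))   # None: new content
print(index.ingest("survey_2024/tile_001.tif", b"raw tile bytes"))   # survey_2022/tile_001.tif
print(index.ingest("survey_2024/tile_002.tif", b"raw tile bytes, recompressed"))  # None
```

A checksum only matches files that are identical byte for byte, so a tile that has been recompressed or recolored passes through as new content.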

Why It's Getting Worse Before It Gets Better

The FIFA World Cup arriving in the New York-New Jersey metro area this summer has added an unexpected wrinkle. The city's tourism and events apparatus, including NYC Tourism + Conventions, headquartered at 810 Seventh Avenue in Midtown, has been generating promotional image content at an accelerated pace since late 2025. Marketing campaigns, venue documentation, and press kits produced across multiple contractors have created exactly the kind of multi-origin duplication that archivists flag as the hardest to resolve algorithmically. The files are not byte-for-byte identical but are perceptually identical: the same shot, with slightly different compression or color correction.

Perceptual hashing tools, which compare images based on visual fingerprints rather than file checksums, can detect that category of duplicate. The obstacle is cost: deploying those tools at scale requires either significant staff time or a software licensing investment that smaller agencies struggle to justify in a single budget cycle. The city's fiscal year 2026 technology budget, passed in June, allocated funds to OTI's enterprise data management program but did not itemize a dedicated deduplication line.
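The difference between a checksum and a visual fingerprint can be shown in a few lines. The sketch below uses "average hash", one of the simplest perceptual hashing techniques, on tiny made-up grayscale grids. It is an illustration only: production tools first resize and convert real images, and nothing here is drawn from software the city is known to use.

```python
import hashlib


def average_hash(gray):
    """Fingerprint a small grayscale grid: one bit per pixel, set if above the mean."""
    flat = [p for row in gray for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a, b):
    """Count the positions where two fingerprints disagree."""
    return sum(x != y for x, y in zip(a, b))


def checksum(gray):
    """SHA-256 of the raw pixel bytes, as a file-level checksum would see them."""
    return hashlib.sha256(bytes(p for row in gray for p in row)).hexdigest()


# Made-up 4x4 "images" with values from 0 (black) to 255 (white).
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]
# The same shot after a slight, uniform brightness correction.
corrected = [[min(255, p + 12) for p in row] for row in original]
# A genuinely different picture: bright on top, dark on the bottom.
different = [
    [200, 210, 205, 215],
    [200, 210, 205, 215],
    [10, 20, 15, 25],
    [10, 20, 15, 25],
]

print(checksum(original) == checksum(corrected))                   # False
print(hamming(average_hash(original), average_hash(corrected)))    # 0
print(hamming(average_hash(original), average_hash(different)))    # 8
```

The brightened copy fails the checksum comparison yet has an identical fingerprint, while the unrelated image differs in half of its 16 bits. In practice a small distance threshold, chosen by the operator, decides what counts as a near-duplicate.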

The practical upshot for anyone navigating city records: if you submit a Freedom of Information Law request that touches image files — surveillance footage, inspection photos, planning documents — build extra time into your timeline. Agencies responding to FOIL requests on image-heavy matters have increasingly cited data review and file organization as factors in delayed responses. The city's standard FOIL response window is five business days for an acknowledgment, but substantive responses on large file requests routinely run weeks longer. Filing early, being specific about date ranges and file types, and following up through the city's online FOIL portal at records.nyc.gov are the most reliable ways to move the process forward.

About this article

Published by The Daily New York

This article was produced by The Daily New York editorial desk and covers news in New York. See our editorial standards for how we use AI.
