New York City's public agencies collectively manage what is estimated to be tens of millions of digital image files, and a significant share of them are exact or near-exact duplicates. The problem is not abstract. Duplicate image data costs real money, consumes finite server capacity, and buries the records that researchers, journalists, and ordinary New Yorkers are trying to find.
The issue has sharpened in 2026 because the city is mid-cycle on a major infrastructure push. The Adams administration's Office of Technology and Innovation, which absorbed the former Department of Information Technology and Telecommunications, has been rolling out a cloud migration program across dozens of agencies. As that migration proceeds, data audits are surfacing redundancy problems that paper-based cataloging never exposed.
What the Numbers Actually Show
Industry benchmarks for large institutional image repositories typically put the duplicate rate somewhere between 20 and 40 percent, depending on how aggressively files are deduplicated at ingestion. For an archive holding 50 million images (a conservative estimate for a city the size of New York), that translates to between 10 million and 20 million files consuming storage space without adding informational value. At current enterprise cloud storage pricing of roughly $0.02 per gigabyte per month for cold storage tiers, even modest file sizes compound fast across that volume.
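The arithmetic behind that estimate can be sketched in a few lines. This is an illustration, not an audited figure: the 50 million image count, the 20 to 40 percent duplicate range, and the $0.02 price come from the paragraph above, while the 5 MB average file size and the function name are assumptions chosen only to make the example concrete.

```python
def monthly_duplicate_cost(total_images, duplicate_rate, avg_file_mb,
                           price_per_gb_month=0.02):
    """Estimate the monthly storage cost of duplicate files, in dollars.

    avg_file_mb is an assumed average file size; the article does not give one.
    Uses 1 GB = 1,000 MB for simplicity.
    """
    duplicate_files = total_images * duplicate_rate
    duplicate_gb = duplicate_files * avg_file_mb / 1000
    return duplicate_gb * price_per_gb_month


ASSUMED_AVG_FILE_MB = 5  # hypothetical; archival masters can be far larger

low = monthly_duplicate_cost(50_000_000, 0.20, ASSUMED_AVG_FILE_MB)
high = monthly_duplicate_cost(50_000_000, 0.40, ASSUMED_AVG_FILE_MB)
print(f"Estimated duplicate storage cost: ${low:,.0f} to ${high:,.0f} per month")
```

Under those assumptions the range works out to about $1,000 to $2,000 a month. The estimate scales linearly with file size, so an archive of uncompressed masters averaging ten times larger would cost ten times as much.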
The New York Public Library's digital collections, one of the largest publicly accessible municipal repositories in the country, contain more than 900,000 digitized items as of the library's most recently published figures. The Brooklyn Public Library's digital archive program, based at its Grand Army Plaza headquarters in Prospect Heights, has separately catalogued thousands of historical photograph collections since launching an accelerated digitization effort in 2022. Both institutions have publicly acknowledged the challenge of deduplication as collections merge and donors submit overlapping material.
At the city planning level, the Department of City Planning maintains aerial photography and survey image sets going back decades — layers that get re-ingested each time a new contract is awarded. Its offices at 120 Broadway in the Financial District have been a focal point of the current audit process. Each new aerial survey cycle produces raw files that, before any quality control, frequently duplicate segments captured in prior runs. A 2024 procurement filing related to a cloud services contract referenced image data management as a line-item cost driver, though it did not break out deduplication costs separately.
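Byte-for-byte repeats of the kind a re-ingested survey run can produce are the easy case, because a file checksum identifies them exactly. The sketch below shows one way an ingestion step might flag them. It is not a description of any agency's actual pipeline, and the file names and byte strings are made up for illustration.

```python
import hashlib


def find_exact_duplicates(files):
    """Group file names whose contents are byte-for-byte identical.

    `files` maps a file name to its raw bytes. Returns a list of groups,
    each containing two or more names that share a SHA-256 digest.
    """
    by_digest = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        by_digest.setdefault(digest, []).append(name)
    return [names for names in by_digest.values() if len(names) > 1]


# Hypothetical incoming batch: one tile repeated across two survey cycles.
incoming = {
    "survey_2019/tile_041.tif": b"\x00\x01\x02 raster bytes",
    "survey_2023/tile_041.tif": b"\x00\x01\x02 raster bytes",
    "survey_2023/tile_042.tif": b"\x00\x01\x03 raster bytes",
}
duplicate_groups = find_exact_duplicates(incoming)
print(duplicate_groups)
```

A check like this only catches identical bytes. A tile that was re-exported or recompressed between cycles would produce a different digest and pass through unflagged.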
Why It's Getting Worse Before It Gets Better
The FIFA World Cup arriving in the New York-New Jersey metro area this summer has added an unexpected wrinkle. The city's tourism and events apparatus, including NYC Tourism + Conventions, headquartered at 810 Seventh Avenue in Midtown, has been generating promotional image content at an accelerated pace since late 2025. Marketing campaigns, venue documentation, and press kits produced across multiple contractors have created exactly the kind of multi-origin duplication that archivists flag as the hardest to resolve algorithmically. The files are not byte-for-byte identical but are perceptually identical: the same shot with slightly different compression or color correction.
Perceptual hashing tools, which compare images based on visual fingerprints rather than file checksums, can detect that category of duplicate. The obstacle is cost: deploying those tools at scale requires either significant staff time or a software licensing investment that smaller agencies struggle to justify in a single budget cycle. The city's fiscal year 2026 technology budget, passed in June, allocated funds to OTI's enterprise data management program but did not itemize a dedicated deduplication line.
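For readers curious how a visual fingerprint differs from a checksum, here is a minimal sketch of one widely used perceptual hash, the difference hash (dHash). It is a toy, not the method any city agency is known to use. It works on a tiny grayscale grid standing in for an image that has already been converted to grayscale and shrunk; production tools commonly shrink to 9 by 8 pixels to get a 64-bit hash. The pixel values and the +3 brightness shift are invented for illustration.

```python
import hashlib


def dhash(pixels):
    """Difference hash: one bit per pair of horizontally adjacent pixels.

    `pixels` is a list of rows of 0-255 grayscale values. A bit is 1 when
    the left pixel is brighter than its right-hand neighbor.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def checksum(pixels):
    """SHA-256 over the raw pixel bytes, as a file checksum would see them."""
    flat = [value for row in pixels for value in row]
    return hashlib.sha256(bytes(flat)).hexdigest()


original = [
    [10, 40, 90, 160, 250],
    [250, 160, 90, 40, 10],
    [10, 200, 20, 180, 30],
    [120, 110, 100, 90, 80],
]
# Same shot with a slight brightness shift, as color correction might apply.
recolored = [[min(255, p + 3) for p in row] for row in original]
# A different picture: the original mirrored left to right.
mirrored = [list(reversed(row)) for row in original]

print(checksum(original) == checksum(recolored))    # False: the bytes differ
print(hamming(dhash(original), dhash(recolored)))   # 0: same fingerprint
print(hamming(dhash(original), dhash(mirrored)))    # 12 of 16 bits differ
```

The checksum treats the color-corrected copy as a different file, while the difference hash treats it as the same image. In practice a tool would flag pairs whose Hamming distance falls under a chosen threshold rather than require an exact match, and reviewing those borderline pairs is where the staff time goes.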
The practical upshot for anyone navigating city records: if you submit a Freedom of Information Law request that touches image files, such as surveillance footage, inspection photos, or planning documents, build extra time into your schedule. Agencies responding to FOIL requests on image-heavy matters have increasingly cited data review and file organization as factors in delayed responses. The city's standard FOIL response window is five business days for an acknowledgment, but substantive responses on large file requests routinely run weeks longer. Filing early, being specific about date ranges and file types, and following up through the city's online FOIL portal at records.nyc.gov are the most reliable ways to move the process forward.