The Daily New York

New York's Duplicate Image Problem: The Numbers Hiding in Plain Sight

From city agency databases to MTA infrastructure filings, redundant digital imagery is costing New York millions — and nobody's counting carefully enough.

By New York News Desk · Published 4 July 2026, 3:06 pm

New York City's municipal data infrastructure holds tens of millions of digital image files, and a significant share of them are exact or near-exact duplicates — stored, backed up, and billed for multiple times over. That is the practical reality behind what technologists call the duplicate image problem, and in a city that spent roughly $1.5 billion on information technology in fiscal year 2025, the redundancy carries a real dollar cost that budget watchdogs say remains poorly quantified.

The issue is especially pressing now. City Hall is mid-cycle on a sweeping digital modernization push, the MTA is processing unprecedented volumes of image data from cameras across its 472-station subway system, and the city's Department of City Planning is digitizing decades of building-permit photographs ahead of a major zoning overhaul. Each of those streams generates duplicates. Each duplicate takes up storage, and storage in the commercial cloud environments the city increasingly relies on is billed by the gigabyte.

What the Data Actually Shows

Cloud storage pricing for enterprise municipal contracts typically runs between $0.02 and $0.05 per gigabyte per month, depending on access tier and provider. At those rates, even a modest backlog of 500 terabytes of redundant image files, or 500,000 gigabytes, costs $10,000 to $25,000 a month, which works out to between $120,000 and $300,000 in unnecessary annual spend. Multiply that across a city apparatus the size of New York's, which operates more than 40 agencies with independent IT budgets, and the exposure becomes substantial.
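For readers who want to check the arithmetic, the short Python sketch below reproduces the range from the figures quoted above. It assumes decimal units (1 terabyte = 1,000 gigabytes) and flat per-gigabyte pricing with no tiering or egress fees, which real contracts may not match; the function name `annual_storage_cost` is ours, for illustration only.

```python
# Back-of-envelope check of the storage figures quoted in the article.
# Assumptions: decimal units (1 TB = 1,000 GB) and a flat monthly rate.

GB_PER_TB = 1_000


def annual_storage_cost(terabytes: float, usd_per_gb_month: float) -> float:
    """Return the yearly cost in dollars of keeping `terabytes` in storage."""
    return terabytes * GB_PER_TB * usd_per_gb_month * 12


if __name__ == "__main__":
    redundant_tb = 500  # backlog size used in the article's example
    for rate in (0.02, 0.05):
        cost = annual_storage_cost(redundant_tb, rate)
        print(f"${rate:.2f} per GB per month: ${cost:,.0f} per year")
```

Swapping in a different backlog size or rate shows how quickly the bill scales: the cost is linear in both.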

The MTA's capital program offers a concrete illustration of the scale involved. The authority's ongoing $51.5 billion 2020–2024 capital plan includes camera and sensor upgrades across the entire subway network, with major installations already completed at hubs including Grand Central–42nd Street, Atlantic Avenue–Barclays Center in Brooklyn, and the Fulton Center in Lower Manhattan. Each camera generates a continuous image stream. Without automated deduplication built into the ingestion pipeline, the volume of stored redundant frames can balloon within months of deployment.
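What deduplication at ingestion could look like, in its simplest form, is sketched below in Python. This is an illustration under stated assumptions, not a description of the MTA's systems: the class `IngestDeduplicator` is hypothetical, it keeps its index in memory, and it catches only byte-for-byte identical frames. Live camera frames often differ slightly because of sensor noise or embedded timestamps, so a production pipeline would likely need near-duplicate detection as well.

```python
import hashlib


class IngestDeduplicator:
    """Hypothetical helper: rejects frames whose bytes were already stored."""

    def __init__(self) -> None:
        # SHA-256 digests of every frame accepted so far (in memory only).
        self._seen: set[str] = set()

    def should_store(self, frame_bytes: bytes) -> bool:
        """Return True the first time a frame is seen, False on exact repeats."""
        digest = hashlib.sha256(frame_bytes).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


if __name__ == "__main__":
    dedup = IngestDeduplicator()
    # Stand-in byte strings; real input would be encoded image frames.
    frames = [b"frame-A", b"frame-A", b"frame-B"]
    stored = [f for f in frames if dedup.should_store(f)]
    print(len(stored), "of", len(frames), "frames stored")
```

The point of hashing at the door is that a duplicate is never written, backed up, or billed in the first place, rather than being found and deleted later.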

The Department of Records and Information Services, which manages the city's archival holdings at its building at 31 Chambers Street in Lower Manhattan, has publicly acknowledged a digitization backlog stretching back to the 1970s. Physical-to-digital conversion projects routinely produce duplicate scans: the same document photographed twice during a single digitization session, then backed up again across multiple servers. Industry benchmarks from digital archiving projects in comparably sized municipalities suggest that, without active deduplication tooling, duplicates commonly account for 12 to 18 percent of total digitized image volume.

Where the Problem Compounds

The FIFA World Cup is making things worse, at least in the short term. MetLife Stadium in East Rutherford, New Jersey, is hosting multiple matches this summer, and the city's emergency management and transportation agencies have dramatically expanded their real-time camera monitoring infrastructure in coordination with the NYPD. High-volume event surveillance generates image data at rates that can overwhelm even well-maintained deduplication systems. The NYPD's Domain Awareness System, which aggregates feeds from thousands of cameras across all five boroughs, was already processing petabyte-scale data volumes before the tournament added pressure.

The city's Office of Technology and Innovation, which absorbed the former Department of Information Technology and Telecommunications in 2022, has been working on a unified data governance framework. That framework, still in phased rollout as of mid-2026, includes provisions for automated deduplication. But implementation timelines vary by agency, and the framework does not yet cover all legacy systems still hosted on-premises in buildings including the Brooklyn Municipal Building on Joralemon Street.

For agencies looking to get ahead of the cost curve before the next budget cycle, the practical steps are straightforward: audit existing storage inventories using hash-based deduplication tools, establish ingestion-time deduplication for new camera feeds, and negotiate storage contracts with deduplication credits factored in. The fiscal year 2027 budget process begins in earnest this fall. Agencies that can document storage savings will have a concrete argument for redirecting those funds — whether toward housing tech platforms, subway accessibility projects, or the perpetually underfunded digitization work still waiting in warehouses across the city.
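The first of those steps, a hash-based audit, can be sketched in a few dozen lines of Python. The version below is a minimal illustration, not any agency's actual tooling: it walks a directory, groups files by SHA-256 digest, and totals the bytes held by every copy beyond the first. The function names are ours, and the approach finds exact duplicates only; re-scans or re-encoded copies of the same image would need perceptual hashing, which this sketch does not attempt.

```python
import hashlib
import sys
from collections import defaultdict
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large images never load fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Map each digest shared by two or more files under `root` to those files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_sha256(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def redundant_bytes(duplicates: dict[str, list[Path]]) -> int:
    """Total size of every copy beyond the first in each duplicate group."""
    return sum(p.stat().st_size for paths in duplicates.values() for p in paths[1:])


# Runs only when a directory is given, e.g.: python audit.py /path/to/images
if __name__ == "__main__" and len(sys.argv) > 1:
    dups = find_duplicates(Path(sys.argv[1]))
    print(f"{len(dups)} duplicate groups, {redundant_bytes(dups):,} redundant bytes")
```

The redundant-byte total, converted to gigabytes and multiplied by the contract's monthly rate, gives the kind of documented savings figure an agency could bring to a budget hearing.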

About this article

Published by The Daily New York

This article was produced by The Daily New York editorial desk and covers news in New York. See our editorial standards for how we use AI.
