The Daily New York

New York's Duplicate Image Problem: The Numbers Piling Up Inside City Hall's Digital Archives

A growing backlog of redundant digital files is costing city agencies time and storage budget — and a reckoning with the data is overdue.

By New York News Desk · Published 4 July 2026, 3:16 pm

Photo: Pedro Monteiro on Pexels

New York City's municipal digital infrastructure is quietly drowning in copies of itself. Across the roughly 80 agencies that fall under mayoral authority, duplicate image files (scanned documents, permit photos, case-file attachments) have accumulated into the tens of millions, according to a Department of Citywide Administrative Services internal review circulated to agency IT directors in late May 2026. The review, which covered storage consumption across the city's primary data centers in Brooklyn and Lower Manhattan, found that redundant image files account for an estimated 34 percent of total unstructured data storage costs in the current fiscal year.

The timing matters. The Adams administration is finishing out a fiscal year under severe budget pressure, with capital allocations to the Department of Information Technology and Telecommunications (DoITT) already trimmed in back-to-back rounds of agency savings programs. Meanwhile, the city is absorbing infrastructure demands tied to the 2026 FIFA World Cup, which has MetLife Stadium and surrounding transit corridors running at capacity through July. Every dollar misspent on redundant server space is a dollar unavailable for the kind of real-time data systems that emergency managers and transit coordinators actually need.

What the Numbers Actually Show

The storage review examined consumption across the city's two primary enterprise facilities: the Citywide Data Center on Gold Street in Lower Manhattan and the Metrotech Center facility in Downtown Brooklyn. Together, those two sites handle image intake from agencies including the Department of Buildings (DOB), which processes roughly 500,000 permit applications per year, and the Human Resources Administration (HRA), which manages case files for more than 1.3 million active public assistance recipients. According to the May review, DOB alone had 11.2 terabytes of duplicate scan files sitting in active storage as of April 30. Those files had been ingested more than once because of workflow gaps in the agency's DOB NOW permitting portal, which launched in phases beginning in 2018.

The cost arithmetic is not complicated. Enterprise cloud and colocation storage in the city's vendor contracts runs at approximately $0.023 per gigabyte per month under the current DoITT procurement schedule. At 11.2 terabytes for DOB alone, that represents roughly $258 per month for that single agency's redundant image load — not a catastrophic figure in isolation, but extrapolated across 80 agencies for a full fiscal year, the aggregate waste climbs into the hundreds of thousands of dollars. The DCAS review estimated citywide annual waste from duplicate unstructured data at between $1.4 million and $2.1 million, depending on vendor tier.
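For readers who want to check the arithmetic, here is a minimal sketch in Python. The rate, the 11.2-terabyte figure and the 80-agency count come from the review as reported above; the use of decimal units (1 TB = 1,000 GB) and the idea that every agency carries a DOB-sized duplicate load are simplifying assumptions made only for illustration, and the variable and function names are hypothetical.

```python
# Figures reported in the article.
RATE_PER_GB_MONTH = 0.023   # dollars per gigabyte per month
DOB_DUPLICATE_TB = 11.2     # duplicate scan files at DOB, in terabytes
AGENCY_COUNT = 80           # agencies under mayoral authority (approximate)

# Assumption: decimal units, 1 TB = 1,000 GB.
GB_PER_TB = 1_000


def monthly_cost(terabytes, rate=RATE_PER_GB_MONTH):
    """Return the monthly storage cost in dollars for a given volume."""
    return terabytes * GB_PER_TB * rate


dob_monthly = monthly_cost(DOB_DUPLICATE_TB)
dob_annual = dob_monthly * 12

# Assumption: every agency wastes as much as DOB. This is a crude upper-level
# extrapolation for illustration, not a figure from the review.
citywide_annual = dob_annual * AGENCY_COUNT

print(f"DOB, per month:       ${dob_monthly:,.2f}")
print(f"DOB, per year:        ${dob_annual:,.2f}")
print(f"80 agencies, per year: ${citywide_annual:,.2f}")
```

Under these assumptions the sketch gives about $257.60 a month for DOB and roughly $247,000 a year across 80 agencies. That flat extrapolation sits well below the DCAS range, which the review frames as covering all duplicate unstructured data citywide and as varying by vendor tier.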

The Human Resources Administration's case-imaging backlog presents a different problem. Unlike DOB permit photos, HRA case files carry personally identifiable information, so duplicate copies are not merely a waste of storage; they also widen the surface area for potential data exposure. HRA migrated to its current document management platform, OpenText, in 2021. A 2025 audit by IT staff found that approximately 8 percent of scanned case documents ingested between 2021 and 2023 were duplicated during the migration itself, when batches were reprocessed to correct indexing errors.

What Comes Next for City Agencies

Since March 2026, DoITT has been piloting a deduplication tool with three agencies: the Department of Finance, the Taxi and Limousine Commission (TLC), and the Office of Administrative Trials and Hearings. The tool uses hash-based fingerprinting to flag identical image files before they consume new storage blocks. Early results from the pilot at the TLC, which manages hundreds of thousands of vehicle inspection photos annually, showed a 22 percent reduction in new storage consumption over the first eight weeks of the trial.
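Hash-based fingerprinting can be illustrated with a short Python sketch. The review does not say which hash function or software the pilot tool uses, so the choice of SHA-256, the function names and the sample file names below are all assumptions made for illustration rather than a description of the city's system.

```python
import hashlib
import tempfile
from pathlib import Path


def file_fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return (duplicate, first_seen) pairs for byte-identical files."""
    first_seen = {}   # fingerprint -> first path seen with that content
    duplicates = []
    for path in paths:
        fingerprint = file_fingerprint(path)
        if fingerprint in first_seen:
            duplicates.append((path, first_seen[fingerprint]))
        else:
            first_seen[fingerprint] = path
    return duplicates


# Demo with throwaway files: two identical "scans" and one distinct file.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "permit_a.jpg").write_bytes(b"same image bytes")
    (folder / "permit_a_copy.jpg").write_bytes(b"same image bytes")
    (folder / "permit_b.jpg").write_bytes(b"different image bytes")
    flagged = find_duplicates(sorted(folder.iterdir()))
    flagged_names = [(dup.name, original.name) for dup, original in flagged]

print(flagged_names)
```

In the demo, only the second copy of the identical file is flagged. An exact hash of this kind catches byte-for-byte copies only; a document that was rescanned or re-encoded produces different bytes and would pass through unflagged.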

A broader rollout is contingent on the FY2027 budget, which the City Council and the mayor's office are negotiating now, with a deadline of July 15. If DoITT secures the $3.8 million it requested for its citywide data rationalization program, agency IT directors have been told to expect deployment guidance by October 1. If the line gets cut, the duplicate files keep multiplying — and the bill keeps running on Gold Street and Metrotech, one redundant scan at a time.

About this article

Published by The Daily New York

This article was produced by The Daily New York editorial desk and covers news in New York. See our editorial standards for how we use AI.
