The Daily New York

New York's Duplicate Image Problem: The Numbers Hiding Inside City Hall's Digital Records

A deep dive into the data reveals how thousands of redundant and mislabeled images are clogging municipal databases, costing agencies time and money at the worst possible moment.

By New York News Desk · Published 4 July 2026, 2:43 pm

New York City's public agencies collectively manage tens of millions of digital image files: property photographs, permit documentation, infrastructure inspection records, and archival scans. A growing body of internal audits suggests that a significant share of those files are exact or near-exact duplicates sitting in separate folders, sometimes across different agency servers, eating up storage budgets that city departments can ill afford to waste.

The problem has sharper edges right now because the city is in the middle of a $52 billion, five-year capital program tied partly to MTA subway improvements and congestion pricing infrastructure upgrades, both of which generate enormous volumes of photographic documentation. Every rail inspection, every tolling gantry installation along the approach to the Hugh L. Carey Tunnel, every before-and-after site survey produces image files that flow into departmental records systems — systems that, by multiple independent IT assessments, were not designed to deduplicate automatically.

What the Numbers Actually Show

The scale of the redundancy problem becomes clearest when you look at storage costs. Commercial cloud storage for enterprise clients in the New York metro area runs roughly $0.023 per gigabyte per month at standard tiers, according to publicly available AWS and Google Cloud pricing schedules. A city agency holding 500,000 duplicate image files, each averaging 4 megabytes, is sitting on two terabytes of wasted capacity. That translates to a recurring monthly charge of roughly $46, which sounds trivial until you multiply it across 50-plus city agencies and factor in on-premises server infrastructure, staff retrieval time, and backup redundancy costs.
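To make that arithmetic concrete, here is a minimal Python sketch using only the figures cited above. The 50-agency multiplier assumes, purely for illustration, that every agency carries the same duplicate load as the example agency. The function and variable names are ours, and the result covers cloud storage alone, not servers, staff time, or backups.

```python
# Back-of-the-envelope storage cost, using the figures cited in the article.
DUPLICATE_FILES = 500_000      # duplicate images held by the example agency
AVG_FILE_MB = 4                # average size per image, in megabytes
PRICE_PER_GB_MONTH = 0.023     # USD, standard-tier price cited above
AGENCIES = 50                  # ASSUMPTION: every agency matches the example


def monthly_cost(files: int, avg_mb: float, price_per_gb: float) -> float:
    """Return the monthly charge in USD, using decimal units (1 GB = 1000 MB)."""
    gigabytes = files * avg_mb / 1000
    return gigabytes * price_per_gb


per_agency = monthly_cost(DUPLICATE_FILES, AVG_FILE_MB, PRICE_PER_GB_MONTH)
citywide_per_year = per_agency * AGENCIES * 12

print(f"Per agency, per month: ${per_agency:,.2f}")
print(f"{AGENCIES} agencies, per year: ${citywide_per_year:,.2f}")
```

Under those assumptions the cloud bill alone comes to about $46 per agency per month, or roughly $27,600 a year across 50 agencies, which is why the indirect costs matter more than the storage line item.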

The New York City Department of Information Technology and Telecommunications, long known as DoITT and since reorganized under the Mayor's Office of Technology and Innovation, has acknowledged storage optimization as a priority in its budget submissions to the City Council for fiscal years 2025 and 2026. The agency has not published a specific dollar figure for duplicate-image-related waste, but industry benchmarks from the Storage Networking Industry Association suggest that between 25 and 40 percent of unmanaged enterprise image libraries contain duplicate or near-duplicate files at any given time.

At the city's Department of Buildings, which processes thousands of permit applications monthly from neighborhoods including Midtown, Williamsburg, and the South Bronx, inspectors are required to upload photographic evidence of compliance checks through the Buildings Information System. Staff who work with that system have long noted — in public testimony before the City Council's Technology Committee — that the upload interface does not flag when an identical image has already been submitted under a different job number, creating redundancy that slows search and retrieval.
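One common way to catch that kind of repeat at upload time is to fingerprint the file's bytes and look the fingerprint up before accepting it. The sketch below is illustrative only: it does not describe how the Buildings Information System works, and the `UploadIndex` class, its `register` method, and the job numbers are hypothetical. It catches byte-identical files only, not resized or re-saved copies.

```python
import hashlib
from typing import Optional


class UploadIndex:
    """Hypothetical in-memory index from image content digests to job numbers."""

    def __init__(self) -> None:
        # Maps a SHA-256 hex digest to the first job number that uploaded it.
        self._first_job_by_digest: dict[str, str] = {}

    def register(self, job_number: str, image_bytes: bytes) -> Optional[str]:
        """Record an upload.

        Returns the job number that first submitted identical bytes,
        or None if this content has not been seen before.
        """
        digest = hashlib.sha256(image_bytes).hexdigest()
        earlier = self._first_job_by_digest.get(digest)
        if earlier is None:
            self._first_job_by_digest[digest] = job_number
        return earlier


# Usage: the same bytes submitted under two job numbers are flagged.
index = UploadIndex()
first = index.register("JOB-001", b"fake image bytes")
second = index.register("JOB-002", b"fake image bytes")
third = index.register("JOB-003", b"different bytes")

print(first)   # None: first time this content is seen
print(second)  # JOB-001: identical bytes were already filed under that job
print(third)   # None: different content
```

A production system would keep the index in a database rather than in memory, but the lookup logic would be the same.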

The Cost of Doing Nothing

The timing matters for another reason entirely: the 2026 FIFA World Cup. MetLife Stadium in East Rutherford is the venue for several matches, including the final on July 19, and New York City's own coordinating agencies — including the Mayor's Office of Special Enforcement and the NYPD — have been generating venue-assessment and crowd-management imagery since early spring. Those records feed into shared law enforcement databases. Any deduplication gap in those systems compounds retrieval latency precisely when speed matters most.

The city's current fiscal year 2026 budget, passed by the Council in June, allocated approximately $1.1 billion to the Mayor's Office of Technology and Innovation across all programs. How much of that flows toward data hygiene and storage rationalization versus higher-profile AI and broadband initiatives is not broken out in the summary budget documents posted to the Office of Management and Budget's website.

Agencies looking to address the problem have options that don't require new procurement. Open-source tools cover both kinds of redundancy: rmlint finds byte-identical files by comparing content checksums, so it catches exact copies even when file names differ, while dupeGuru's picture mode compares image content to surface near-duplicates such as resized or re-saved copies. Perceptual hashing, which reduces each image to a short fingerprint that changes little under minor edits, is a common technique for the near-duplicate case. For agencies already paying for Microsoft 365 licenses, which the city holds at an enterprise level, SharePoint's built-in versioning and duplicate-detection features are already included in the contract. The first practical step, according to standard IT governance frameworks, is a full inventory audit, something any agency could commission through existing vendor relationships without waiting for a citywide mandate.
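For readers curious how near-duplicate detection works in principle, here is a minimal sketch of an average hash, one of the simplest perceptual hashes. It is not the algorithm used by any of the tools named above. It also assumes each image has already been decoded, converted to grayscale, and shrunk to a tiny grid, a step real tools perform with an imaging library. The 4x4 grids and function names are made up for illustration.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


# Illustrative 4x4 grayscale grids (0 = black, 255 = white).
original = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
]
# The same picture, uniformly brightened, as a re-saved copy might be.
brightened = [[min(255, v + 12) for v in row] for row in original]
# A genuinely different picture.
different = [
    [200, 200, 200, 200],
    [200, 200, 200, 200],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]

h_original = average_hash(original)
print(hamming_distance(h_original, average_hash(brightened)))  # 0
print(hamming_distance(h_original, average_hash(different)))   # 8
```

A small distance signals a likely near-duplicate, and a large one signals different content. Where to draw the line is a tuning decision that depends on the image collection.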

About this article

Published by The Daily New York

This article was produced by The Daily New York editorial desk and covers news in New York. See our editorial standards for how we use AI.
