New York City is sitting on a data-storage crisis hiding in plain sight. Across more than a dozen municipal agencies, duplicate image files — scanned inspection reports, housing violation photographs, permitting documents, event photography — are consuming an estimated 30 to 40 percent of total allocated cloud and on-premises storage, according to a municipal IT audit framework published by the Mayor's Office of Technology and Innovation in spring 2026. The city currently manages more than 4.8 petabytes of digital records across its primary data infrastructure, a figure that has grown by roughly 18 percent annually since fiscal year 2022.
Why does this matter now? The Adams administration has committed to an accelerated digitization push tied in part to the demands of hosting 2026 FIFA World Cup matches at MetLife Stadium, which sits just across the Hudson but draws directly on city permitting, transit coordination, and public-safety logistics systems managed from lower Manhattan. Every redundant file in a shared database slows query times, inflates licensing costs on cloud platforms, and creates compliance headaches under New York State's data retention schedules. The city's Department of Citywide Administrative Services has flagged duplicate-image accumulation as one of three leading drivers of unexpected IT cost overruns in the current fiscal year, which ends June 30, 2026.
The Scale of the Duplication Problem
The numbers are specific and unflattering. The Department of Buildings, whose inspectors photograph construction sites from the Bronx to Staten Island, has logged more than 2.1 million image files in its DOB NOW system since the platform launched. Internal reviews cited in the MOTI framework suggest that between 22 and 28 percent of those images are functionally identical duplicates — the same cracked façade or exposed wiring photographed, uploaded, and uploaded again when a case file migrates between workflow stages. At average cloud-storage pricing for municipal contracts — roughly $0.023 per gigabyte per month under the city's current Microsoft Azure enterprise agreement — redundant image storage alone is projected to cost the Buildings department upward of $340,000 in fiscal year 2027 if left unaddressed.
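To make the per-gigabyte rate concrete, here is a minimal sketch of the raw-capacity arithmetic. The $0.023 rate is the one quoted above. The function name and the 1,000-gigabyte example volume are illustrative assumptions, not figures from the MOTI framework. A real cloud bill also includes charges this sketch ignores, such as data transfer and redundancy tiers.

```python
# Rate quoted in the article for the city's Azure agreement (USD per GB per month).
RATE_PER_GB_MONTH = 0.023


def annual_storage_cost(gigabytes: float, rate_per_gb_month: float = RATE_PER_GB_MONTH) -> float:
    """Return the yearly raw-capacity cost in USD for a given stored volume.

    Hypothetical helper: covers storage capacity only, not transfer or
    transaction fees.
    """
    return round(gigabytes * rate_per_gb_month * 12, 2)


# Illustrative volume only: 1,000 GB (one decimal terabyte) of redundant images.
example_cost = annual_storage_cost(1_000)
print(f"1,000 GB of duplicates costs about ${example_cost:,.2f} per year")
```

At that rate, each terabyte of duplicates left in place costs roughly $276 a year in capacity charges alone.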
The Department of Housing Preservation and Development faces a parallel problem. HPD's photo-evidence database, used to document conditions in rent-stabilized buildings across neighborhoods including East New York, the South Bronx, and Inwood, crossed 900,000 stored images in January 2026. Staff members conducting housing court proceedings at 111 Centre Street have reported that retrieval times for specific unit-condition photographs can exceed four minutes when the database is under load. Multiplied across hundreds of weekly court appearances, that delay represents measurable lost staff time. A 2025 audit by the city's Department of Investigation noted that slow retrieval contributed to at least 14 postponed housing-court hearings between July and September of that year.
What Deduplication Actually Costs — and What It Saves
The technology to fix this is not exotic. Perceptual-hash deduplication, in which software generates a compact numerical fingerprint for each image and flags near-identical copies, is commercially available at prices ranging from roughly $8,000 to $45,000 for an annual enterprise license, depending on database scale. The city's 2026 Preliminary Budget allocated $1.2 million to MOTI for a cross-agency data-quality initiative, a portion of which is earmarked for exactly this kind of redundancy remediation. Pilot programs at two agencies, whose names were redacted in the public version of the MOTI framework, are scheduled to conclude by September 30, 2026, with findings to be presented to the City Council's Committee on Technology.
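For readers curious how a perceptual fingerprint works, the sketch below implements an average hash, one of the simplest members of the perceptual-hash family. The MOTI framework does not say which algorithm the pilots use, so this is an illustration rather than the city's or any vendor's method. It assumes each image has already been decoded and shrunk to an 8×8 grayscale grid, a step a real pipeline would hand to an imaging library. The file names, pixel values, and the distance threshold of 5 are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Grid = Sequence[Sequence[int]]  # grayscale rows, pixel values 0-255


def average_hash(pixels: Grid) -> int:
    """Build a fingerprint with one bit per pixel: 1 if brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def find_near_duplicates(hashes: Dict[str, int], max_distance: int = 5) -> List[Tuple[str, str]]:
    """Return every pair of names whose fingerprints differ by at most max_distance bits."""
    names = sorted(hashes)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if hamming_distance(hashes[first], hashes[second]) <= max_distance:
                pairs.append((first, second))
    return pairs


# Hypothetical 8x8 grids: a facade photo, a slightly brighter re-upload, and an unrelated image.
facade = [[20] * 4 + [200] * 4 for _ in range(8)]
reupload = [[value + 3 for value in row] for row in facade]
other = [[200] * 8 for _ in range(4)] + [[20] * 8 for _ in range(4)]

hashes = {
    "facade.jpg": average_hash(facade),
    "facade_reupload.jpg": average_hash(reupload),
    "other.jpg": average_hash(other),
}
duplicates = find_near_duplicates(hashes)
print(duplicates)
```

Because the fingerprint records only which pixels are brighter than the image's own average, a uniform brightness shift in the re-upload leaves it unchanged, while the unrelated image differs in 32 of 64 bits. The pairwise comparison here is quadratic in the number of images. A database holding millions of files would need an index over the fingerprints instead.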
For New Yorkers filing Freedom of Information Law requests, a process handled through the city's online FOIL portal at records.cityofnewyork.us, the practical consequence is straightforward: requests involving photographic evidence, particularly from HPD or DOB case files, have faced longer-than-average fulfillment times in 2025 and into 2026. The City Record reported in March 2026 that the median fulfillment time for image-heavy FOIL requests citywide had reached 47 business days, against a statutory goal of 20. Advocates at the nonprofit Reinvent Albany have pushed the administration to treat deduplication as a records-access issue, not merely a budgetary one. If the September pilot results support a full rollout, the city estimates it could reclaim roughly 600 terabytes of storage within 18 months, enough to bring annual cloud costs down by approximately $1.9 million and, officials hope, to shorten that 47-business-day median.