The Daily New York


New York Officials Battle Years of Duplicate Files Flooding Public Records System

A backlog of redundant digital files across city agencies has quietly compounded costs and slowed public access to documents for years.

By New York News Desk · Published 4 July 2026, 2:36 pm

Photo: Finegan, Thomas E. (Thomas Edward), 1866-1932 / Public domain (Wikimedia Commons)

Tens of thousands of duplicate image files have accumulated inside New York City's municipal records infrastructure, a problem that archivists, IT administrators, and open-government advocates say has been building since at least the early 2010s, when city agencies began rapid, often uncoordinated migrations to digital document storage. The redundancy has inflated storage costs, slowed database searches, and in some cases made it harder for the public to retrieve accurate, current records through portals like the city's OpenRecords system.

The issue matters particularly now. With the FIFA World Cup bringing an estimated 1.5 million additional visitors to the New York metro area this summer — many of them relying on city-managed venue permits, transit maps, and public safety communications — the pressure on the city's digital infrastructure has sharpened. Simultaneously, the Adams administration has been pushing a broader digitization initiative across agencies, which has surfaced just how disorganized the underlying file systems have become.

How the Duplication Problem Took Root

The roots run back to a period when individual city agencies essentially built their own digital filing systems with little central oversight. The Department of Buildings, the Department of City Planning, and the Mayor's Office of Management and Budget each developed parallel repositories. When the city's 311 system expanded its document intake functions around 2014 and 2015, scanned images of permit applications, inspection reports, and community board filings were often uploaded multiple times — once by the originating agency and again when forwarded through interdepartmental workflows.

The Municipal Archives on Chambers Street, which maintains the official historical record for the city, began flagging the redundancy problem in internal reviews by 2018. Staff there identified a pattern: when agencies upgraded scanning hardware or switched document management vendors, old image batches were frequently re-ingested alongside new ones rather than replaced. A single building inspection photograph could exist under four or five separate file identifiers, each consuming server space and appearing as a distinct result in public search queries.

Community groups in neighborhoods like Sunset Park and the South Bronx, where residents frequently pull city records related to landlord compliance and zoning disputes, have complained for years that OpenRecords searches return pages of visually identical documents, making it difficult to identify which version is authoritative. The Legal Aid Society, which uses city records extensively for housing litigation, has raised the issue in correspondence with the Department of Records and Information Services, according to advocates familiar with the exchanges.

Where Things Stand in 2026

The Department of Records and Information Services, headquartered at 31 Chambers Street in Lower Manhattan, has been working since late 2024 on a deduplication protocol: a technical process that uses hash-matching algorithms to identify identical image files and consolidate them under a single record identifier. The project is part of a broader contract with the city's Office of Technology and Innovation (formerly the Department of Information Technology and Telecommunications), which manages central server infrastructure across the five boroughs.
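The city has not published the details of its protocol, so the following is only a minimal sketch of how hash-matching deduplication generally works, not the department's actual implementation. It assumes SHA-256 as the hash, and the record identifiers, sample contents, and the rule for choosing which identifier survives are all hypothetical. Note that this kind of matching only catches byte-for-byte identical files; a page rescanned on new hardware would produce different bytes and would not match.

```python
import hashlib
from collections import defaultdict


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def consolidate(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group record identifiers whose contents are byte-identical.

    Returns a mapping from one surviving identifier per group (the
    alphabetically first one, an arbitrary choice for illustration)
    to the duplicate identifiers that could be folded into it.
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for record_id, data in files.items():
        by_digest[sha256_of(data)].append(record_id)

    merged: dict[str, list[str]] = {}
    for ids in by_digest.values():
        ids.sort()
        merged[ids[0]] = ids[1:]
    return merged


# Hypothetical sample: three identifiers point at the same scan bytes.
sample = {
    "DOB-2015-0001": b"inspection-photo-bytes",
    "311-2015-7788": b"inspection-photo-bytes",
    "DCP-2016-0420": b"inspection-photo-bytes",
    "DOB-2019-0102": b"a-different-scan",
}
result = consolidate(sample)
```

In this sample, the three identical scans collapse under one identifier while the unrelated scan keeps its own. A production system would also need a policy for which identifier to keep and how to redirect the retired ones, which this sketch does not address.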

The scale is significant. City budget documents from Fiscal Year 2026 allocated funds toward cloud storage consolidation across agencies, though the specific line items covering the deduplication effort fall within broader IT modernization appropriations. Independent estimates from government technology consultants have placed the cost of carrying redundant municipal image files — in storage fees alone — in the range of several million dollars annually for a city the size of New York, though the city has not published a specific figure for its own redundancy costs.

For residents and researchers, the practical upshot is that searches through the city's online portals should gradually return cleaner, more navigable results as the deduplication work proceeds. Anyone pulling records today from the Department of Buildings' BIS portal or the Department of City Planning's ZOLA mapping tool may still encounter duplicate images, particularly for older filings from before 2020. The best current workaround, according to open-government groups like Reinvent Albany, is to note file creation dates and prioritize the most recent version of any scanned document when the content appears identical. The deduplication project has no publicly announced completion date, but city technology officials have described the work as ongoing through at least the end of calendar year 2026.
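For readers who download batches of records, the date-based workaround can be expressed in a few lines. This is a rough sketch under stated assumptions: the record identifiers and dates are invented, and the content_key field is a hypothetical stand-in for whatever a researcher uses to decide that two scans show the same document. The most recent file is not guaranteed to be the authoritative one; the sketch simply applies the rule of thumb described above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScanRecord:
    record_id: str    # hypothetical portal identifier
    content_key: str  # hypothetical label for "same document content"
    created: date     # file creation date shown by the portal


def latest_versions(records: list[ScanRecord]) -> dict[str, ScanRecord]:
    """Keep only the most recently created record for each content key."""
    newest: dict[str, ScanRecord] = {}
    for rec in records:
        current = newest.get(rec.content_key)
        if current is None or rec.created > current.created:
            newest[rec.content_key] = rec
    return newest


# Invented example data: two copies of one permit, one inspection report.
records = [
    ScanRecord("BIS-001", "permit-123", date(2015, 3, 2)),
    ScanRecord("BIS-047", "permit-123", date(2019, 11, 18)),
    ScanRecord("BIS-112", "report-9", date(2017, 6, 5)),
]
picked = latest_versions(records)
```

Here the 2019 copy of the permit is kept over the 2015 copy, and the lone inspection report is kept as is.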


About this article

Published by The Daily New York

This article was produced by The Daily New York editorial desk and covers news in New York. See our editorial standards for how we use AI.
