New York City's public records infrastructure is riddled with redundant digital images: the same photograph catalogued under different file names, the same scanned document stored in three separate agency databases at once. The problem cost the city's Department of Records and Information Services (DORIS) measurable staff hours and storage budget every year before a remediation program quietly launched in late 2024. The scope of the duplication, long acknowledged internally but rarely discussed publicly, shapes how the city now handles everything from housing inspection photographs filed through the Department of Buildings on Worth Street to crime scene evidence logs managed by the NYPD's Records Management Division in Jamaica, Queens.
The issue matters now because New York is deep into a digital-infrastructure overhaul ahead of the 2026 FIFA World Cup, with city agencies under pressure to make public-facing databases faster and more reliable for the millions of visitors and journalists who will query them between June and July. Duplicate image files slow query response times, inflate cloud storage costs, and create legal exposure when two versions of the same document carry different metadata — a problem that surfaced in at least one housing court proceeding in the Bronx in 2023, according to court filings reviewed at the time by local legal advocates.
A Problem Built Into the Original Contracts
The duplication issue has roots in the Bloomberg-era push to digitize city records beginning around 2003. Individual agencies contracted separately with vendors, often without a unified file-naming convention or a central deduplication layer. The Department of City Planning, headquartered at 120 Broadway, built its own image archive. The Landmarks Preservation Commission, based on Vesey Street in Lower Manhattan, built another. Neither talked to the other in any automated way. By the time the de Blasio administration attempted a partial consolidation through the NYC Open Data initiative after 2014, the backlog of duplicated records was already enormous. Estimates circulated internally at DORIS suggested that as many as one in five scanned images in the legacy system had a functional duplicate stored elsewhere — though DORIS has not published a verified official figure.
The problem was compounded by staff turnover. The city's IT workforce saw significant churn between 2020 and 2022, and institutional knowledge about which vendor contracts had created which archive structures largely walked out the door. The NYC Office of Technology and Innovation, which absorbed several legacy IT functions after its reorganization in 2022, inherited a patchwork of storage environments that included both on-premises servers and early cloud migrations with incompatible folder taxonomies.
What the Remediation Program Actually Involves
The current fix, which the Office of Technology and Innovation refers to internally as its Records Deduplication Initiative, relies on automated hash-matching: comparing digital fingerprints of image files across agency databases to identify exact and near-exact copies. Staff at the DORIS facility on Fulton Street in Lower Manhattan have been manually reviewing flagged matches since January 2025, confirming each one before anything is deleted. The process is slow by design. A wrongly deleted evidentiary photograph, or a lost inspection image of a since-demolished building, carries legal and historical consequences that outweigh the cost of extra storage.
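For readers curious what hash-matching looks like in practice, here is a minimal Python sketch of the exact-copy case. It is an illustration only, not the city's actual system: the file names, sample bytes, and the function names `sha256_digest` and `find_exact_duplicates` are all hypothetical. It groups files by SHA-256 fingerprint and flags any group with more than one member for human review, deleting nothing.

```python
import hashlib
from collections import defaultdict


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 fingerprint of a file's raw bytes as hex."""
    return hashlib.sha256(data).hexdigest()


def find_exact_duplicates(records: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical.

    Only groups with two or more members are returned, so every
    returned group is a candidate for manual review.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in records.items():
        groups[sha256_digest(data)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical sample: two agencies hold the same image under different names.
sample = {
    "dob/inspection_0001.jpg": b"\xff\xd8 image bytes A",
    "doris/scan_77.jpg": b"\xff\xd8 image bytes A",
    "lpc/facade.jpg": b"\xff\xd8 image bytes B",
}

flagged = find_exact_duplicates(sample)
for group in flagged:
    print("Flag for manual review:", group)
```

A cryptographic hash such as SHA-256 changes completely if even one byte differs, so this approach catches only exact copies. Finding near-exact copies, such as a rescanned or recompressed image, would require a different kind of fingerprint (a perceptual hash), which this sketch does not attempt.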
Storage costs are not trivial. Cloud storage rates for government contracts typically run several cents per gigabyte per month at scale, and the city maintains petabytes of image data across its agencies. Even a ten-percent reduction in stored volume would represent a meaningful budget line for agencies already squeezed by the housing affordability crisis and competing capital demands from MTA subway investment commitments.
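To give a sense of scale, the arithmetic can be sketched in a few lines of Python. The figures below are placeholders chosen for illustration, not numbers reported by the city: 5 petabytes of stored images and a rate of 3 cents per gigabyte per month are both assumptions, as is the function name `monthly_savings_cents`. Amounts are kept in whole cents to avoid rounding errors.

```python
GB_PER_PB = 1_000_000  # decimal petabyte, as cloud providers typically bill


def monthly_savings_cents(volume_pb: int, rate_cents_per_gb: int,
                          reduction_pct: int) -> int:
    """Monthly savings, in cents, from cutting stored volume by reduction_pct."""
    volume_gb = volume_pb * GB_PER_PB
    monthly_cost_cents = volume_gb * rate_cents_per_gb
    return monthly_cost_cents * reduction_pct // 100


# Assumed inputs (hypothetical): 5 PB stored, 3 cents/GB/month, 10% reduction.
saved = monthly_savings_cents(volume_pb=5, rate_cents_per_gb=3, reduction_pct=10)
print(f"Monthly savings: ${saved / 100:,.2f}")
print(f"Annual savings:  ${saved * 12 / 100:,.2f}")
```

Under those assumed inputs, a ten-percent reduction works out to $15,000 a month, or $180,000 a year. The real figure depends on the city's actual data volume and contract rates, neither of which has been published.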
For New Yorkers, the practical consequence of getting this right is faster, more reliable access to public records — housing inspection histories on buildings in East New York or the South Bronx, landmark designation photographs for contested blocks in Harlem, permit records for businesses along Flushing's Main Street. The World Cup deadline has given the remediation program a hard end-date that prior administrations never imposed. Whether the full backlog gets cleared by June 2026 is an open question city officials have not answered on the record. The program's next formal review is scheduled for September 2025, when the Office of Technology and Innovation is expected to report progress metrics to the City Council's technology committee.