New York City's municipal agencies collectively store what is estimated to be tens of millions of duplicate digital images across their records systems, including scanned documents, permit photos, and inspection snapshots, and the redundancy is costing the city in server capacity, staff hours, and delayed public access to records. The problem did not emerge overnight. It accumulated through years of uncoordinated digitization drives, competing agency software platforms, and a persistent absence of citywide standards for how scanned images should be catalogued and deduplicated before archiving.
The timing matters because city government is under mounting pressure to modernize. The Adams administration has pushed a series of digital-services initiatives, and the 2026 FIFA World Cup — with matches played at MetLife Stadium just across the Hudson in East Rutherford, and fan zones planned throughout Midtown Manhattan — has forced agencies including the Department of Transportation and the NYPD to accelerate document-sharing workflows that expose just how fragmented the city's back-end systems remain.
A Problem With Deep Roots
The duplication crisis traces to at least 2012, when the Bloomberg administration began a major push to scan paper records across agencies ranging from the Department of Buildings to the City Clerk's office at 141 Worth Street in Lower Manhattan. Each agency contracted separately. The Department of Buildings landed on one imaging vendor; the Department of City Planning, headquartered at 120 Broadway, adopted a different platform with incompatible metadata standards. When files moved between systems — or when agencies simply re-scanned documents to meet new request deadlines — duplicates multiplied with no automatic check to catch them.
A 2019 audit by the New York City Comptroller's office flagged fragmented data governance as a systemic risk across multiple agencies, noting that redundant storage was contributing to ballooning IT costs. Storage is not cheap: enterprise-grade archival solutions that meet the city's security requirements run between $8,000 and $15,000 per terabyte annually for managed services, according to publicly available government procurement schedules from the Mayor's Office of Contract Services. Even conservative estimates of the volume of duplicate imagery suggest the waste runs into seven figures annually.
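The seven-figure claim can be sanity-checked with simple arithmetic. The sketch below uses only the per-terabyte rates quoted above; the 100-terabyte volume is a purely illustrative assumption, not a figure from the audit or from any agency.

```python
# Annual managed-storage rates per terabyte, as quoted in the article.
LOW_RATE_USD = 8_000
HIGH_RATE_USD = 15_000

SEVEN_FIGURES_USD = 1_000_000  # smallest seven-figure annual cost


def annual_cost(duplicate_tb: float, rate_usd_per_tb: float) -> float:
    """Yearly cost of storing a given volume of duplicate imagery."""
    return duplicate_tb * rate_usd_per_tb


def tb_for_cost(target_usd: float, rate_usd_per_tb: float) -> float:
    """Volume of duplicates (in TB) that would cost target_usd per year."""
    return target_usd / rate_usd_per_tb


# Duplicate volume at which waste first reaches seven figures.
min_tb = tb_for_cost(SEVEN_FIGURES_USD, HIGH_RATE_USD)  # about 66.7 TB
max_tb = tb_for_cost(SEVEN_FIGURES_USD, LOW_RATE_USD)   # 125.0 TB

# Hypothetical volume, chosen only to illustrate the range.
example_tb = 100
print(f"Seven figures is reached at {min_tb:.1f} to {max_tb:.1f} TB of duplicates")
print(f"{example_tb} TB would cost ${annual_cost(example_tb, LOW_RATE_USD):,.0f} "
      f"to ${annual_cost(example_tb, HIGH_RATE_USD):,.0f} per year")
```

In other words, at the quoted rates roughly 67 to 125 terabytes of redundant imagery would be enough to push annual waste past $1 million.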
The Municipal Archives on Chambers Street, which serves as the official repository for permanent city records, has for years flagged the gap between what agencies say they are digitizing and what actually arrives with clean, deduplicated metadata. Staff there have described processing backlogs that stretch months, in part because incoming files require manual review to catch exact-copy and near-duplicate images before ingestion. The Archives holds records dating to the Dutch colonial period, and its staff treats the integrity of the collection as non-negotiable — which means the deduplication burden largely falls on them rather than on the originating agencies.
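Catching exact copies is the more mechanical half of that manual review. One common approach, shown here as a minimal sketch and not as a description of the Archives' actual workflow, is to group files by a cryptographic digest of their bytes. The file names and contents below are hypothetical placeholders.

```python
import hashlib
from collections import defaultdict


def sha256_digest(data: bytes) -> str:
    """Hex digest of the file's bytes; byte-identical files share a digest."""
    return hashlib.sha256(data).hexdigest()


def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Return groups of file names whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[sha256_digest(data)].append(name)
    # Keep only digests shared by more than one file.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical incoming batch: two agencies submit the same scan.
incoming = {
    "dob/permit_0001.tif": b"scan-bytes-A",
    "dcp/permit_0001_copy.tif": b"scan-bytes-A",
    "dob/inspection_0042.tif": b"scan-bytes-B",
}

duplicates = group_exact_duplicates(incoming)
print(duplicates)
```

A digest match only identifies byte-identical files. A document that was re-scanned, re-compressed, or cropped produces different bytes and would pass this check unnoticed, which is why near-duplicates need a separate technique.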
What Has to Change
The City Council's Committee on Technology passed a non-binding resolution in March 2025 urging the Department of Information Technology and Telecommunications (known as DoITT, now folded into the Mayor's Office of Technology and Innovation) to establish unified image-hashing standards across all agencies by the end of fiscal year 2026, which ends June 30, 2026. That deadline has passed without a published implementation plan, according to the committee's public calendar.
Advocates for open government, including the Reinvent Albany coalition, have long argued that duplicate records don't just waste money — they complicate Freedom of Information Law requests by creating uncertainty about which version of a document is authoritative. A single permit photo that exists in three slightly different scanned versions across three agency servers can stall a FOIL response for weeks while staff determine which copy to release.
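Slightly different scans of the same document are what a perceptual hash is meant to catch. The sketch below implements one simple variant, an average hash compared by Hamming distance, on small grayscale grids. It assumes each image has already been downscaled to a tiny grid by an imaging library, and the 4x4 pixel values and the distance threshold are illustrative assumptions, not any standard the city has adopted.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean.

    `pixels` is a small grayscale grid (values 0-255), assumed already downscaled.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, max_distance: int = 2) -> bool:
    """Treat two hashes as the same image if few bits differ (threshold is an assumption)."""
    return hamming_distance(a, b) <= max_distance


# Hypothetical 4x4 grids: an original scan, a noisier re-scan, and an unrelated image.
original = [[10, 10, 200, 200], [10, 10, 200, 200],
            [200, 200, 10, 10], [200, 200, 10, 10]]
rescanned = [[12, 9, 205, 198], [11, 14, 203, 199],
             [197, 204, 8, 13], [202, 201, 12, 9]]
unrelated = [[200, 200, 10, 10], [200, 200, 10, 10],
             [10, 10, 200, 200], [10, 10, 200, 200]]

h_original = average_hash(original)
h_rescanned = average_hash(rescanned)
h_unrelated = average_hash(unrelated)

print(is_near_duplicate(h_original, h_rescanned))  # re-scan matches the original
print(is_near_duplicate(h_original, h_unrelated))  # unrelated image does not
```

A shared standard would need to fix the details this sketch leaves open, such as the grid size, the hash variant, and the match threshold, so that hashes computed by one agency are comparable with those computed by another.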
For residents and journalists filing records requests, the practical advice is straightforward: be specific. Cite exact document numbers, addresses, and date ranges in any FOIL submission to the relevant agency. The more precise the request, the less likely staff are to pull multiple redundant versions and charge per-page fees, currently capped at 25 cents per page under state law, for duplicates. The longer-term fix requires the city to finally build the cross-agency image registry that auditors have been recommending since the Obama administration was still in its first term.