New York City's Department of Records and Information Services had logged more than 2.3 million digital image files across its municipal archive holdings as of January 2026, and a growing share of that library is duplicated, sometimes dozens of times over. The problem, city archivists have warned in internal working documents, is costing the city real money in storage contracts and slowing public access to land records, permits, and historical photographs through the online portal of the NYC Municipal Archives, which is housed on Centre Street in lower Manhattan.
The issue has become newly urgent this summer. With the FIFA World Cup bringing an estimated 1.5 million additional visitors through New York in June and July 2026, city agencies have leaned harder on digitized records to process everything from vendor permits to historic venue documentation for MetLife Stadium in East Rutherford, New Jersey, and ancillary sites across the five boroughs. Redundant image files, in some cases the same scanned deed appearing under three different catalog entries, have slowed processing and created discrepancies in public-facing databases.
The city's response has centered on two programs. The first is a deduplication initiative run through the Department of Citywide Administrative Services, which contracted with a data management vendor in late 2025 to audit image repositories across more than a dozen agencies. The second is a pilot program embedded within the New York Public Library's digitization partnership, which shares archival scanning infrastructure with the Municipal Archives and has begun flagging redundant files at the point of ingest rather than after the fact — a prevention model, rather than a cleanup operation.
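To make the ingest-level idea concrete, here is a minimal sketch in Python of flagging a file as redundant at the moment it arrives rather than during a later audit. It is an illustration only, not a description of the pilot's actual software: the `IngestIndex` class, the catalog IDs, and the sample byte strings are all hypothetical, and a real system would persist its index rather than hold it in memory.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class IngestIndex:
    """Hypothetical in-memory index of content digests seen so far."""

    # Maps a SHA-256 digest to the catalog ID that first carried that content.
    seen: dict[str, str] = field(default_factory=dict)

    def ingest(self, catalog_id: str, data: bytes) -> str | None:
        """Register a file; return the catalog ID it duplicates, or None if new."""
        digest = hashlib.sha256(data).hexdigest()
        original = self.seen.get(digest)
        if original is None:
            # First time this exact content has been seen: record it.
            self.seen[digest] = catalog_id
        return original


index = IngestIndex()
results = [
    index.ingest("deed-001", b"scanned deed bytes"),
    index.ingest("permit-417", b"vendor permit bytes"),
    index.ingest("deed-001-copy", b"scanned deed bytes"),  # same bytes as deed-001
]
print(results)  # [None, None, 'deed-001']
```

The third file is flagged against its original before it ever enters the catalog, which is the difference between prevention and cleanup. Note that this sketch catches only byte-identical copies.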
How New York Stacks Up Against London and Tokyo
Other major cities have grappled with the same problem, with varying degrees of success. London's Metropolitan Archives, which holds records dating to the twelfth century, rolled out an automated hash-matching system in 2023 under a £4.2 million contract with the Greater London Authority's digital infrastructure unit. By late 2024, the system had reduced duplicate image files in active circulation by roughly 34 percent, according to the GLA's published digital services report. The Tokyo Metropolitan Government launched a similar effort in fiscal year 2024 under its Digital Transformation (DX) Action Plan, targeting municipal photograph libraries held by ward offices across all 23 special wards. Tokyo's approach leaned on AI-assisted visual matching rather than file-hash comparison. Visual matching proved more effective for scanned physical documents, where two scans of the same page are rarely byte-for-byte identical and file metadata is often inconsistent.
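The difference between the two approaches can be shown in a few lines of Python. The sketch below is a toy under stated assumptions, not either city's system: the 4x4 grayscale grids are invented data, and the simple "average hash" stands in for visual matching in general, since the source does not describe the model Tokyo uses. It shows why an exact file hash misses a slightly brighter rescan while a visual fingerprint still matches it.

```python
from __future__ import annotations

import hashlib


def sha256_digest(data: bytes) -> str:
    """Exact match: any single changed byte yields a different digest."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels: list[list[int]]) -> int:
    """Toy perceptual hash: one bit per pixel, set when the pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where the two fingerprints differ."""
    return bin(a ^ b).count("1")


# Two hypothetical 4x4 grayscale scans of the same page.
# The second is uniformly 10 levels brighter, as a rescan might be.
scan_a = [
    [10, 200, 200, 10],
    [10, 200, 200, 10],
    [10, 10, 10, 10],
    [200, 200, 200, 200],
]
scan_b = [[min(255, value + 10) for value in row] for row in scan_a]

bytes_a = bytes(value for row in scan_a for value in row)
bytes_b = bytes(value for row in scan_b for value in row)

exact_match = sha256_digest(bytes_a) == sha256_digest(bytes_b)
visual_distance = hamming_distance(average_hash(scan_a), average_hash(scan_b))

print(exact_match)      # False: the raw bytes differ
print(visual_distance)  # 0: the light/dark pattern is identical
```

In practice a visual matcher would treat any distance below a chosen threshold as a likely duplicate and send borderline cases to a human reviewer. That threshold is a tuning decision the sketch does not attempt to make.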
New York has not yet published a comparable reduction figure for its own deduplication work. The DCAS contract, valued at $1.8 million according to city procurement records posted to the Mayor's Office of Contract Services database, runs through December 2026. Archivists familiar with the project say the audit phase alone — covering agencies including the Department of Buildings and the Landmarks Preservation Commission — took nearly five months to complete.
The Landmarks Preservation Commission is a particular pressure point. Its image library, which documents more than 37,000 individually landmarked properties citywide, has been cited internally as one of the most heavily duplicated repositories, partly because photographs were ingested from multiple sources over two decades without a unified naming convention.
What Comes Next for the Archive
The NYPL partnership pilot, operating out of the library's Stephen A. Schwarzman Building at Fifth Avenue and 42nd Street, is scheduled to expand to three additional borough branches by September 2026. If the ingest-level deduplication model proves out, city officials have indicated it could be written into standard procurement language for any future digitization contract, making it a structural fix rather than a recurring cleanup expense.
For members of the public, the practical effect right now is uneven. Searches on the NYC Municipal Archives online portal can still return multiple versions of the same document, particularly for property records in neighborhoods like Flatbush and Mott Haven where intensive rezoning activity generated heavy scanning volumes between 2018 and 2023. The city advises users encountering suspected duplicates to use the portal's feedback form to flag discrepancies — a manual workaround that archivists themselves acknowledge is far from ideal while the automated systems are still being built out.