The Daily New York

New York Leads on Cracking Down on Duplicate Images in Public Records — But Other Cities Are Closing the Gap

As municipalities worldwide scramble to clean up digitized archives bloated with redundant files, New York's approach offers a model — and a cautionary tale.

By New York News Desk · Published 4 July 2026, 2:48 pm

Photo: Mingyang LIU on Pexels

New York City's Department of Records and Information Services logged more than 2.3 million digital image files across its municipal archive holdings as of January 2026, and a growing share of that library is duplicated, sometimes dozens of times over. The problem, city archivists have warned in internal working documents, is costing the city money in storage contracts and slowing public access to land records, permits, and historical photographs through the online portal of the NYC Municipal Archives, which is located on Centre Street in lower Manhattan.

The issue has become newly urgent this summer. With the FIFA World Cup bringing an estimated 1.5 million additional visitors through New York in June and July 2026, city agencies have leaned harder on digitized records to process everything from vendor permits to historic venue documentation for use at MetLife Stadium in East Rutherford and ancillary sites across the five boroughs. Redundant image files, in some cases the same scanned deed appearing under three different catalog entries, have slowed processing times and created discrepancies in public-facing databases.

The city's response has centered on two programs. The first is a deduplication initiative run through the Department of Citywide Administrative Services, which contracted with a data management vendor in late 2025 to audit image repositories across more than a dozen agencies. The second is a pilot program embedded within the New York Public Library's digitization partnership, which shares archival scanning infrastructure with the Municipal Archives and has begun flagging redundant files at the point of ingest rather than after the fact — a prevention model, rather than a cleanup operation.
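
For readers curious what flagging a duplicate "at the point of ingest" can look like in practice, the following is a minimal sketch only, not a description of the NYPL pilot's actual software. It assumes duplicates are byte-for-byte identical files; the `IngestIndex` class and the catalog IDs are hypothetical names invented for illustration.

```python
import hashlib


class IngestIndex:
    """Hypothetical in-memory index of content hashes seen so far at ingest."""

    def __init__(self):
        # Maps a SHA-256 hex digest to the catalog ID that first used it.
        self._seen = {}

    def ingest(self, catalog_id, data):
        """Return the earlier catalog ID if `data` was already ingested, else None."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._seen:
            return self._seen[digest]  # duplicate: report the original entry
        self._seen[digest] = catalog_id  # new file: remember it
        return None


index = IngestIndex()
first = index.ingest("deed-001", b"scanned deed bytes")    # new file
second = index.ingest("deed-117", b"scanned deed bytes")   # same bytes, new entry
third = index.ingest("permit-042", b"different scan")      # unrelated file
```

Here the second call reports `deed-001` as the existing copy before a redundant catalog entry is created, which is the difference between prevention and a later cleanup pass.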

How New York Stacks Up Against London and Tokyo

Other major cities have grappled with the same problem, with varying degrees of success. London's Metropolitan Archives, which holds records dating to the twelfth century, rolled out an automated hash-matching system in 2023 under a £4.2 million contract with the Greater London Authority's digital infrastructure unit. By late 2024, the system had reduced duplicate image files in active circulation by roughly 34 percent, according to the GLA's published digital services report. Tokyo's Metropolitan Government launched a similar effort in fiscal year 2024 under its DX (Digital Transformation) Action Plan, targeting municipal photograph libraries held by ward offices across all 23 special wards. Tokyo's approach leaned on AI-assisted visual matching rather than file-hash comparison, which proved more effective for scanned physical documents, where file metadata is often inconsistent.
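
The gap between the two approaches can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not either city's system: it compares an exact SHA-256 file hash with a simple "average hash", a basic perceptual technique standing in here for the far more sophisticated AI-assisted matching described above. The two 4x4 grayscale "scans" are made-up data.

```python
import hashlib


def exact_hash(pixels):
    """SHA-256 over the raw pixel bytes: any changed byte yields a new digest."""
    return hashlib.sha256(bytes(pixels)).hexdigest()


def average_hash(pixels):
    """Tiny perceptual hash: one bit per pixel, set when brighter than the mean."""
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)


def hamming(a, b):
    """Count the positions at which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 4x4 grayscale scans of the same page, flattened row by row.
# The second scan is uniformly a little brighter, as a rescan might be.
scan_a = [10, 200, 10, 200, 200, 10, 200, 10,
          10, 200, 10, 200, 200, 10, 200, 10]
scan_b = [p + 5 for p in scan_a]

same_bytes = exact_hash(scan_a) == exact_hash(scan_b)          # False
distance = hamming(average_hash(scan_a), average_hash(scan_b))  # 0
```

The exact hashes differ because every byte changed, while the perceptual hashes are identical, which is one reason visual matching can catch rescanned paper documents that file-hash comparison misses.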

New York has not yet published a comparable reduction figure for its own deduplication work. The DCAS contract, valued at $1.8 million according to city procurement records posted to the Mayor's Office of Contract Services database, runs through December 2026. Archivists familiar with the project say the audit phase alone — covering agencies including the Department of Buildings and the Landmarks Preservation Commission — took nearly five months to complete.

The Landmarks Preservation Commission is a particular pressure point. Its image library, which documents more than 37,000 landmarked properties citywide, has been cited internally as one of the most heavily duplicated repositories, partly because photographs were ingested from multiple sources over two decades without a unified naming convention.

What Comes Next for the Archive

The NYPL partnership pilot, operating out of the library's Stephen A. Schwarzman Building at Fifth Avenue and 42nd Street, is scheduled to expand to three additional borough branches by September 2026. If the ingest-level deduplication model proves effective, city officials have indicated it could be written into standard procurement language for any future digitization contract, making it a structural fix rather than a recurring cleanup expense.

For members of the public, the practical effect right now is uneven. Searches on the NYC Municipal Archives online portal can still return multiple versions of the same document, particularly for property records in neighborhoods like Flatbush and Mott Haven where intensive rezoning activity generated heavy scanning volumes between 2018 and 2023. The city advises users encountering suspected duplicates to use the portal's feedback form to flag discrepancies — a manual workaround that archivists themselves acknowledge is far from ideal while the automated systems are still being built out.

About this article

Published by The Daily New York

This article was produced by The Daily New York editorial desk and covers news in New York. See our editorial standards for how we use AI.