The New York City Department of Buildings has spent the better part of two years scanning and uploading permit records for properties across all five boroughs — and the effort has surfaced a problem that nobody put in the budget: thousands of duplicate scanned images clogging the city's public-facing document portals, some properties showing the same inspection photo filed three or four times across overlapping database entries. It is a mundane-sounding headache with real consequences, particularly for tenants fighting housing court cases and contractors pulling permits in neighborhoods like Bushwick and the South Bronx where renovation activity has surged.
The timing matters. New York is in the middle of an unprecedented push to modernize its municipal data systems — partly driven by congestion pricing revenue commitments and partly by federal infrastructure dollars flowing through the 2021 Bipartisan Infrastructure Law. The city's Office of Technology and Innovation, operating out of 1 Centre Street in Lower Manhattan, has flagged duplicate image management as one of several technical debt problems that need resolving before the broader Open Data expansion slated for late 2026 can proceed cleanly. Bad image data upstream creates bad search results downstream, and in a city where housing court filings at 111 Centre Street run into the tens of thousands annually, the administrative friction adds up fast.
What the City Is Actually Doing
The Department of Buildings began piloting an automated deduplication script in January 2026, initially on records tied to properties in Community Board 3 in Manhattan — covering the Lower East Side and parts of the East Village — before expanding the test to selected blocks in Astoria, Queens. The tool flags images with more than 85 percent pixel-similarity against existing database entries and routes them for human review before deletion. It is not a fast process. As of late June, the city had reviewed roughly 40,000 flagged image pairs, according to internal progress documents discussed at a May 2026 City Council Committee on Technology hearing. The backlog across the full Buildings Department archive runs into the hundreds of thousands.
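The flag-then-review flow described above can be sketched in a few lines of Python. This is an illustration only, not the department's script: the source gives the 85 percent cutoff and the human-review step, but not how similarity is computed. The sketch assumes images have already been reduced to equal-length lists of grayscale values, and the names `pixel_similarity`, `ReviewQueue` and `REVIEW_THRESHOLD` are hypothetical.

```python
from dataclasses import dataclass, field

# The "more than 85 percent" cutoff reported for the pilot tool.
REVIEW_THRESHOLD = 0.85


def pixel_similarity(a, b, tolerance=0):
    """Fraction of pixel positions whose grayscale values differ by at most `tolerance`.

    Assumption: both images were resized to the same dimensions and flattened
    to lists of 0-255 integers before comparison.
    """
    if len(a) != len(b) or not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if abs(x - y) <= tolerance)
    return matches / len(a)


@dataclass
class ReviewQueue:
    """Holds flagged pairs for a human reviewer; nothing is deleted automatically."""
    pending: list = field(default_factory=list)

    def submit(self, new_id, new_pixels, archive):
        """Compare a new image against the archive and queue any pair above the cutoff."""
        flagged = False
        for existing_id, existing_pixels in archive.items():
            score = pixel_similarity(new_pixels, existing_pixels)
            if score > REVIEW_THRESHOLD:
                self.pending.append((new_id, existing_id, round(score, 3)))
                flagged = True
        return flagged


# Hypothetical archive with one 100-pixel scan on file.
archive = {"scan-001": [10] * 100}
queue = ReviewQueue()
near_copy = [10] * 90 + [200] * 10   # 90 of 100 pixels match
unrelated = [10] * 50 + [200] * 50   # 50 of 100 pixels match
queue.submit("scan-002", near_copy, archive)
queue.submit("scan-003", unrelated, archive)
```

In this toy run only `scan-002` lands in `queue.pending`. Comparing every upload against every archived image scales poorly, which is one plausible reason a backlog in the hundreds of thousands takes so long to clear.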
The Metropolitan Transportation Authority has dealt with a parallel version of this problem inside its maintenance documentation systems. MTA technicians photographing subway infrastructure — particularly along the A/C/E lines during the ongoing signal modernization program — generated duplicate image submissions whenever field workers uploaded photos before syncing with the central asset management server. The MTA began addressing this in 2024 using off-the-shelf deduplication software from a vendor already contracted for the Capital Program's digital asset tracking.
How New York Compares to London and Tokyo
London's planning authority, the Greater London Authority, completed a similar deduplication exercise across its development planning portal in 2023, clearing approximately 1.2 million redundant documents from a system covering the capital's 32 boroughs and the City of London. The GLA used a hybrid approach: automated hashing for identical files, and machine-learning similarity scoring for near-duplicates like scanned photographs taken seconds apart. The cleanup took 14 months and cost roughly £2.3 million. Tokyo's metropolitan government, managing property and infrastructure records across 23 special wards, embedded deduplication checks directly into its upload workflow starting in 2021, preventing the problem from accumulating in the first place. It is a prevention-over-remediation model that New York's Office of Technology and Innovation has acknowledged studying.
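A two-stage check of the kind attributed to the GLA, run at upload time as Tokyo is described as doing, might look roughly like the sketch below. It is a simplified stand-in under stated assumptions: exact matches are caught with a SHA-256 digest from Python's standard `hashlib`, while the machine-learning similarity scoring is swapped for a much cruder average-hash fingerprint, since the source does not say which model was used. The function names, the 0.9 threshold and the 64-value thumbnails are all hypothetical.

```python
import hashlib


def exact_digest(file_bytes):
    """SHA-256 of the raw file: byte-identical uploads share a digest."""
    return hashlib.sha256(file_bytes).hexdigest()


def average_hash(pixels):
    """Crude perceptual fingerprint: one bit per pixel, set if at or above mean brightness.

    Assumption: `pixels` is a small grayscale thumbnail (for example 8x8 = 64 values).
    """
    mean = sum(pixels) / len(pixels)
    return [1 if p >= mean else 0 for p in pixels]


def hamming_similarity(h1, h2):
    """Share of fingerprint bits that agree, from 0.0 to 1.0."""
    same = sum(1 for a, b in zip(h1, h2) if a == b)
    return same / len(h1)


def classify(file_bytes, pixels, seen_digests, seen_hashes, near_threshold=0.9):
    """Return 'exact_duplicate', 'near_duplicate' or 'new' for an incoming upload."""
    digest = exact_digest(file_bytes)
    if digest in seen_digests:
        return "exact_duplicate"
    fingerprint = average_hash(pixels)
    for other in seen_hashes:
        if len(other) == len(fingerprint) and \
                hamming_similarity(fingerprint, other) >= near_threshold:
            return "near_duplicate"
    # Only genuinely new images are recorded for future comparisons.
    seen_digests.add(digest)
    seen_hashes.append(fingerprint)
    return "new"


seen_digests, seen_hashes = set(), []
photo = [0] * 32 + [255] * 32       # half dark, half bright thumbnail
retake = [5] * 32 + [250] * 32      # same scene, slightly different exposure
inverted = [255] * 32 + [0] * 32    # a different image entirely

results = [
    classify(bytes(photo), photo, seen_digests, seen_hashes),
    classify(bytes(photo), photo, seen_digests, seen_hashes),
    classify(bytes(retake), retake, seen_digests, seen_hashes),
    classify(bytes(inverted), inverted, seen_digests, seen_hashes),
]
```

The cheap digest lookup runs first so that the costlier fingerprint comparison is only needed for files that are not byte-identical. Running the check before a record is stored is what distinguishes the prevention model from a cleanup after the fact.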
New York is playing catch-up, and officials have not hidden that. The city's five-year Digital Master Plan, published in March 2025, listed legacy data quality issues — duplicate records among them — as a top-tier infrastructure risk. The plan allocated $18 million over three fiscal years to address data integrity across multiple agencies, though the Buildings Department deduplication effort draws from a separate capital line.
For anyone navigating the city's online permit or housing inspection portals right now, the practical advice is straightforward: if a property record on the Department of Buildings' BIS Portal shows conflicting or repeated inspection images, file a data correction request through the agency's online feedback tool, introduced in February 2026. Community organizations like Housing Rights Initiative, based in the Flatiron District, have been advising tenants to document portal inconsistencies as part of housing court case preparation. The city has said it expects the core deduplication backlog to be substantially cleared by the first quarter of 2027 — a deadline that will arrive faster than the bureaucracy probably expects.