Coldwater

Multi-agent simulation of the NYC rental market. Landlords, tenants, and city regulators are all powered by AI systems and grounded in real data.


About

What is Coldwater?

Coldwater is an evaluation framework that puts AI agents through multi-year economic simulations of the NYC housing market. Built on real building data from PLUTO, HPD, and DHCR, it tests whether language models can navigate rent stabilization, lease negotiations, maintenance decisions, and housing court, all under the watchful eye of a legal validator.

Named after the cold-water flats that sparked NYC's tenant protection movement, Coldwater turns a century of housing policy into a living benchmark.


Agents

Three roles, one market

Each simulation month, three types of LLM-powered agents make independent economic decisions.

Landlord

Manages real estate portfolios. Sets rents, allocates capital for maintenance and renovations, and negotiates lease terms with prospective tenants.

Tenant

Searches for housing, evaluates apartments against budget and preferences, negotiates leases, and decides when to renew or move on.

City

Enforces NYC housing regulations with legal validation. Conducts inspections, processes complaints, and adjudicates housing court proceedings.
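
As a rough illustration of the monthly cycle described above, the sketch below steps each role once per simulated month. It is not Coldwater's actual code: `Decision`, `stub_policy`, and `run_simulation` are hypothetical names, and the policy is a deterministic stub standing in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Decision:
    """One agent decision in one simulated month (hypothetical record shape)."""
    role: str
    month: int
    action: str


# A policy maps (role, month) to an action string. In a real run this would
# wrap an LLM call; here it is a stub so the sketch runs offline.
Policy = Callable[[str, int], str]


def stub_policy(role: str, month: int) -> str:
    return f"{role}-noop"


def run_simulation(months: int, policies: Dict[str, Policy]) -> List[Decision]:
    """Step every role once per simulated month and collect the decisions."""
    decisions: List[Decision] = []
    for month in range(1, months + 1):
        for role, policy in policies.items():
            decisions.append(Decision(role, month, policy(role, month)))
    return decisions


policies = {"landlord": stub_policy, "tenant": stub_policy, "city": stub_policy}
decisions = run_simulation(months=24, policies=policies)
```

Two simulated years with three roles yields 72 decisions, one per role per month.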


Features

Built for rigorous evaluation

Real NYC data

Grounded in PLUTO, HPD violations, and DHCR rent stabilization records, not synthetic data.

Legal knowledge base

Encodes NYC housing law including rent stabilization, warranty of habitability, and eviction procedures.
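
To make the idea of a legal check concrete, here is a minimal sketch of one rule a validator might apply: rejecting a renewal offer on a rent-stabilized unit when the increase exceeds a cap. The names (`RenewalOffer`, `validate_renewal`) are hypothetical, and `ASSUMED_MAX_INCREASE` is a placeholder, not an actual Rent Guidelines Board figure.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Tuple

# Placeholder cap for illustration only; NOT a real Rent Guidelines Board rate.
ASSUMED_MAX_INCREASE = Decimal("0.03")


@dataclass(frozen=True)
class RenewalOffer:
    current_rent: Decimal
    proposed_rent: Decimal
    rent_stabilized: bool


def validate_renewal(
    offer: RenewalOffer, max_increase: Decimal = ASSUMED_MAX_INCREASE
) -> Tuple[bool, str]:
    """Return (is_valid, reason) for a proposed renewal rent."""
    if not offer.rent_stabilized:
        return True, "unit is not rent stabilized; cap does not apply"
    ceiling = offer.current_rent * (Decimal(1) + max_increase)
    if offer.proposed_rent <= ceiling:
        return True, "within assumed cap"
    return False, f"proposed rent exceeds assumed ceiling of {ceiling:.2f}"


ok, reason = validate_renewal(
    RenewalOffer(Decimal("2000"), Decimal("2100"), rent_stabilized=True)
)
```

Using `Decimal` rather than floats keeps the comparison exact at the cap boundary.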

Benchmark harness

Structured benchmark profiles with baseline comparisons, regression detection, and audit gates.
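
One way regression detection against a baseline could work is sketched below. This is an assumption about the approach, not the harness's real logic: `find_regressions`, the KPI names, and the 5% tolerance are all hypothetical, and the sketch treats higher values as better.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.05,
) -> List[str]:
    """Return KPI names that dropped more than `tolerance` (relative) below baseline.

    Assumes higher is better for every KPI. A KPI missing from `current`
    is also reported as a regression.
    """
    regressions: List[str] = []
    for name, base in baseline.items():
        if name not in current:
            regressions.append(name)
            continue
        allowed_floor = base - tolerance * abs(base)
        if current[name] < allowed_floor:
            regressions.append(name)
    return regressions


# Hypothetical KPI values, for illustration only.
baseline_kpis = {"legal_compliance": 0.90, "occupancy": 0.80}
current_kpis = {"legal_compliance": 0.91, "occupancy": 0.70}
regressed = find_regressions(baseline_kpis, current_kpis)
```

An audit gate could then fail the run whenever the returned list is non-empty.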

Multi-provider LLMs

Run agents on OpenRouter, OpenAI, Anthropic, or Google Gemini. Mix and match per role.
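
Mixing providers per role might be expressed as a small mapping like the one below. This is only a sketch: the provider keys mirror the list above, but `AgentBackend`, `build_role_config`, and the model names are hypothetical placeholders rather than Coldwater's real configuration schema.

```python
from dataclasses import dataclass
from typing import Dict

# Provider keys assumed from the list above; the exact spelling is a guess.
SUPPORTED_PROVIDERS = {"openrouter", "openai", "anthropic", "gemini"}
ROLES = ("landlord", "tenant", "city")


@dataclass(frozen=True)
class AgentBackend:
    provider: str
    model: str  # placeholder identifier, not a real model name


def build_role_config(assignments: Dict[str, AgentBackend]) -> Dict[str, AgentBackend]:
    """Validate that every role has a backend on a supported provider."""
    missing = [role for role in ROLES if role not in assignments]
    if missing:
        raise ValueError(f"no backend assigned for roles: {missing}")
    for role, backend in assignments.items():
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        if backend.provider not in SUPPORTED_PROVIDERS:
            raise ValueError(f"unsupported provider for {role}: {backend.provider}")
    return dict(assignments)


config = build_role_config({
    "landlord": AgentBackend("openai", "placeholder-model-a"),
    "tenant": AgentBackend("anthropic", "placeholder-model-b"),
    "city": AgentBackend("gemini", "placeholder-model-c"),
})
```

Validating the mapping up front means a typo in a provider name fails before any simulated month runs.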

Multi-year simulations

Leases, renewals, maintenance cycles, rent adjustments, and housing court, all played out over simulated years.

Full audit trail

Append-only event log, reasoning traces, KPI reports, and process tracing for every decision.
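
An append-only event log can be as simple as a JSON Lines file that is only ever opened in append mode. The sketch below assumes that design; `EventLog`, the file name, and the event fields are hypothetical and may not match Coldwater's actual format.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


class EventLog:
    """Minimal JSON Lines log: events are appended, never rewritten."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, event: Dict[str, Any]) -> None:
        # Mode "a" always writes at the end of the file.
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(event, sort_keys=True) + "\n")

    def read_all(self) -> List[Dict[str, Any]]:
        if not os.path.exists(self.path):
            return []
        with open(self.path, "r", encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    event_log = EventLog(os.path.join(tmp, "events.jsonl"))
    event_log.append({"month": 1, "role": "landlord", "action": "set_rent"})
    event_log.append({"month": 1, "role": "tenant", "action": "apply"})
    events = event_log.read_all()
```

Because each event is one line, the log can be replayed in order or tailed while a simulation is running.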