Multi-agent simulation of the NYC rental market. Landlords, tenants, and city regulators, all powered by AI systems, grounded in real data.
Coldwater is an evaluation framework that puts AI agents through multi-year economic simulations of the NYC housing market. Built on real building data from PLUTO, HPD, and DHCR, it tests whether language models can navigate rent stabilization, lease negotiations, maintenance decisions, and housing court, all under the watchful eye of a legal validator.
Named after the cold-water flats that sparked NYC's tenant protection movement, Coldwater turns a century of housing policy into a living benchmark.
Each simulation month, three types of LLM-powered agents make independent economic decisions.
Landlord agents manage real estate portfolios: they set rents, allocate capital for maintenance and renovations, and negotiate lease terms with prospective tenants.
Tenant agents search for housing, evaluate apartments against budget and preferences, negotiate leases, and decide when to renew or move on.
Regulator agents enforce NYC housing regulations with legal validation: they conduct inspections, process complaints, and adjudicate housing court proceedings.
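As a rough illustration of the monthly cycle described above, the sketch below steps three role-specific agents through a two-year run. It is a minimal sketch, not Coldwater's code: `Agent`, `Simulation`, `step`, `run`, and the action names are all hypothetical, and each `decide` callable is a stub standing in for a language-model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout: Coldwater's real classes and method
# signatures are not shown in this document.
Decision = Dict[str, object]


@dataclass
class Agent:
    role: str                          # "landlord", "tenant", or "regulator"
    decide: Callable[[int], Decision]  # stub standing in for an LLM call


@dataclass
class Simulation:
    agents: List[Agent]
    history: List[Decision] = field(default_factory=list)

    def step(self, month: int) -> None:
        # Each agent decides from the month number alone, so no agent
        # sees another agent's decision for the same month.
        for agent in self.agents:
            decision = dict(agent.decide(month))
            decision.update(role=agent.role, month=month)
            self.history.append(decision)

    def run(self, months: int) -> None:
        for month in range(1, months + 1):
            self.step(month)


sim = Simulation(agents=[
    Agent("landlord", lambda month: {"action": "set_rent"}),
    Agent("tenant", lambda month: {"action": "search"}),
    Agent("regulator", lambda month: {"action": "inspect"}),
])
sim.run(months=24)  # two simulated years, three decisions per month
```

In a real run, each stub would presumably be replaced by a model-backed policy, with the regulator's output checked by the legal validator before it is recorded.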
Grounded in PLUTO tax-lot data, HPD violations, and DHCR rent stabilization records rather than synthetic data.
Encodes NYC housing law, including rent stabilization, the warranty of habitability, and eviction procedures.
Structured benchmark profiles with baseline comparisons, regression detection, and audit gates.
Run agents on OpenRouter, OpenAI, Anthropic, or Google Gemini. Mix and match per role.
Simulates leases, renewals, maintenance cycles, rent adjustments, and housing court proceedings over multiple simulated years.
Append-only event log, reasoning traces, KPI reports, and process tracing for every decision.
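To make the append-only event log idea concrete, here is a minimal in-memory sketch. It assumes a JSON-lines style record with a sequence number. The `EventLog` class, its field names, and the sample events (unit "1A", the rent figure) are invented for illustration and are not taken from Coldwater itself.

```python
import json
from typing import Any, Dict, List


class EventLog:
    """In-memory append-only log; hypothetical, not Coldwater's actual API."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    def append(self, month: int, actor: str, kind: str,
               payload: Dict[str, Any]) -> int:
        # Records are serialized on write, so later changes to the
        # caller's payload dict cannot alter what was logged.
        seq = len(self._lines)
        record = {"seq": seq, "month": month, "actor": actor,
                  "kind": kind, "payload": payload}
        self._lines.append(json.dumps(record, sort_keys=True))
        return seq

    def replay(self) -> List[Dict[str, Any]]:
        # Each replay parses fresh copies; the stored lines never change.
        return [json.loads(line) for line in self._lines]


log = EventLog()
log.append(1, "landlord", "rent_set", {"unit": "1A", "rent": 2400})
log.append(1, "tenant", "lease_signed", {"unit": "1A", "term_months": 12})
events = log.replay()
```

The class exposes no update or delete method, which is the property that matters: a KPI report or audit can be recomputed from the same sequence of events at any time.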