Coldwater

Multi-agent simulation of the NYC rental market, grounded in real data. Landlords, tenants, and city regulators are all played by AI agents.


About

What is Coldwater?

Coldwater is an evaluation framework that puts AI agents through multi-year economic simulations in the NYC housing market. Built on real building data from PLUTO, HPD, and DHCR, it tests whether language models can navigate rent stabilization, lease negotiations, maintenance decisions, and housing court—all under the watchful eye of a legal validator.

Named after the cold-water flats that sparked NYC's tenant protection movement, Coldwater turns a century of housing policy into a living benchmark.


Agents

Three roles, one market

Each simulation month, three types of LLM-powered agents make independent economic decisions.

Landlord

Manages real estate portfolios. Sets rents, allocates capital for maintenance and renovations, and negotiates lease terms with prospective tenants.

Tenant

Searches for housing, evaluates apartments against budget and preferences, negotiates leases, and decides when to renew or move on.

City

Enforces NYC housing regulations with legal validation. Conducts inspections, processes complaints, and adjudicates housing court proceedings.
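One month of the cycle described above can be sketched as follows. This is a minimal illustration, not Coldwater's scheduler: the `Agent`, `Market`, `run_month`, and `simulate` names are hypothetical, and each agent's `decide` function stands in for an LLM call. Every agent reads the same start-of-month snapshot, which is one way to keep decisions independent within a month.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    role: str                            # "landlord", "tenant", or "city"
    decide: Callable[[dict], list[str]]  # stands in for an LLM call

@dataclass
class Market:
    month: int = 0  # 0-based simulation month
    log: list[tuple[int, str, str]] = field(default_factory=list)

def run_month(market: Market, agents: list[Agent]) -> None:
    # All agents read the same start-of-month snapshot, so no agent sees
    # another agent's decision from the same month.
    snapshot = {"month": market.month}
    proposals = [(a.role, action) for a in agents for action in a.decide(snapshot)]
    # Actions are applied (here: just logged) only after everyone has decided.
    for role, action in proposals:
        market.log.append((market.month, role, action))
    market.month += 1

def simulate(agents: list[Agent], months: int) -> Market:
    market = Market()
    for _ in range(months):
        run_month(market, agents)
    return market

# Toy agents: the city inspects every six months.
agents = [
    Agent("landlord", lambda s: ["set_rent"]),
    Agent("tenant", lambda s: ["search"]),
    Agent("city", lambda s: ["inspect"] if s["month"] % 6 == 0 else []),
]
m = simulate(agents, 12)
```

Collecting all proposals before applying any of them keeps the within-month order of agents from leaking into the outcome.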


Features

Built for rigorous evaluation

Real NYC data

Grounded in PLUTO, HPD violations, and DHCR rent stabilization records—not synthetic data.

Legal knowledge base

Encodes NYC housing law including rent stabilization, warranty of habitability, and eviction procedures.

Benchmark harness

Structured benchmark profiles with baseline comparisons, regression detection, and audit gates.

Multi-provider LLMs

Run agents on OpenRouter, OpenAI, Anthropic, or Google Gemini. Mix and match per role.

Multi-year simulations

Leases, renewals, maintenance cycles, rent adjustments, and housing court, all playing out across multiple simulated years.

Full audit trail

Append-only event log, reasoning traces, KPI reports, and process tracing for every decision.
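The rent-stabilization rules in the Legal knowledge base can be pictured as checks like the one below. This is a hypothetical sketch: `validate_renewal` is not Coldwater's API, and `max_increase_pct` is an input the caller would take from the applicable Rent Guidelines Board order for the lease term. The 3% in the example is illustrative only, and the check ignores every adjustment other than a single percentage cap.

```python
from decimal import Decimal, ROUND_HALF_UP

def validate_renewal(current_rent: Decimal, proposed_rent: Decimal,
                     max_increase_pct: Decimal) -> tuple[bool, Decimal]:
    """Hypothetical check: is a stabilized renewal rent within the cap?

    Returns (allowed, legal_max). Only a single percentage cap is modeled.
    """
    # Decimal avoids binary floating-point error on currency amounts.
    legal_max = (current_rent * (1 + max_increase_pct / 100)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP)
    return proposed_rent <= legal_max, legal_max

# Illustrative 3% cap, not an actual Rent Guidelines Board rate.
print(validate_renewal(Decimal("2000"), Decimal("2100"), Decimal("3")))
```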
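Regression detection in the Benchmark harness could work along these lines: compare a candidate run's KPIs with a stored baseline and flag any metric that moved in the wrong direction by more than a tolerance. The function name, metric keys, and 5% relative tolerance are assumptions for illustration, not Coldwater's actual benchmark-profile format; the baseline values are taken from the run reported below.

```python
def detect_regressions(baseline: dict[str, float], candidate: dict[str, float],
                       higher_is_better: dict[str, bool],
                       rel_tol: float = 0.05) -> list[str]:
    """Return the names of metrics that got worse by more than rel_tol."""
    regressions = []
    for name, base in baseline.items():
        new = candidate[name]
        # Guard against a zero baseline (e.g. 0 violations) so any change registers.
        change = (new - base) / max(abs(base), 1e-9)
        # Orient the change so that negative always means "worse".
        if not higher_is_better[name]:
            change = -change
        if change < -rel_tol:
            regressions.append(name)
    return regressions

baseline = {"occupancy_pct": 22.6, "violations": 11}
direction = {"occupancy_pct": True, "violations": False}
print(detect_regressions(baseline, {"occupancy_pct": 20.0, "violations": 11}, direction))
```

Making direction explicit per metric matters because a rise is good for occupancy but bad for violations.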
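Per-role model selection (Multi-provider LLMs) can be expressed as a small mapping from role to provider and model, with unlisted roles falling back to a default. The provider keys, the `RoleModel` and `resolve_models` names, and the placeholder model string are hypothetical. Only gpt-4.1-mini via OpenRouter comes from the run reported on this page, and `openai/gpt-4.1-mini` is assumed to be its OpenRouter identifier.

```python
from dataclasses import dataclass

# Hypothetical provider keys for the four backends named above.
SUPPORTED_PROVIDERS = {"openrouter", "openai", "anthropic", "gemini"}
ROLES = ("landlord", "tenant", "city")

@dataclass(frozen=True)
class RoleModel:
    provider: str
    model: str

def resolve_models(config: dict[str, dict[str, str]],
                   default: RoleModel) -> dict[str, RoleModel]:
    """Pick a provider/model per role, falling back to `default`."""
    resolved = {}
    for role in ROLES:
        entry = config.get(role)
        choice = RoleModel(**entry) if entry else default
        if choice.provider not in SUPPORTED_PROVIDERS:
            raise ValueError(f"{role}: unsupported provider {choice.provider!r}")
        resolved[role] = choice
    return resolved

# Landlord on a different backend; tenant and city use the default.
config = {"landlord": {"provider": "anthropic", "model": "<anthropic-model-id>"}}
models = resolve_models(config, RoleModel("openrouter", "openai/gpt-4.1-mini"))
```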
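An append-only event log (Full audit trail) can be made tamper-evident by chaining each entry to the hash of the one before it. Hash chaining is a technique swapped in here for illustration; the page does not say how Coldwater's log is stored, and `EventLog` and the event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

class EventLog:
    """Hypothetical append-only log; each entry records the previous entry's hash."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    @staticmethod
    def _digest(prev: str, event: dict) -> str:
        # Canonical JSON (sorted keys) so the same event always hashes the same way.
        body = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()

    def append(self, event: dict) -> str:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        stored = dict(event)  # shallow copy so later edits to the caller's dict don't leak in
        digest = self._digest(prev, stored)
        self._entries.append({"event": stored, "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev or self._digest(prev, entry["event"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def __iter__(self):
        # Hand out copies so readers cannot mutate stored events.
        return (dict(e["event"]) for e in self._entries)

log = EventLog()
log.append({"month": "2025-01", "role": "landlord", "action": "set_rent"})
log.append({"month": "2025-01", "role": "city", "action": "inspect"})
```

The log exposes no update or delete method, and an out-of-band edit to a stored entry makes `verify()` return `False`.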


Latest Run

12-Month Simulation Results

A completed 12-month run on 3 real Upper West Side buildings (800 units across two landlord portfolios), starting with 200 tenants and with all realism features enabled. Model: gpt-4.1-mini via OpenRouter.

Net Operating Income: $932K
Occupancy Rate: 22.6%
Units Occupied: 181
Leases Signed: 89
Leases Renewed: 61
Violations Issued: 11
LLM Calls: 2,263
LLM Errors: 0

Month     NOI      Occupancy  Signed  Renewed  Moved Out  Violations  Complaints
Jan 2025  $1,091K  25.5%      5       0        1          0           11
Feb 2025  $1,079K  25.3%      5       7        5          1           7
Mar 2025  $1,054K  24.8%      3       10       4          1           11
Apr 2025  $1,074K  25.0%      8       8        5          1           9
May 2025  $1,088K  25.4%      8       5        4          1           5
Jun 2025  $1,084K  25.4%      11      2        7          1           8
Jul 2025  $1,073K  25.1%      9       5        7          1           8
Aug 2025  $1,029K  24.3%      7       4        11         1           4
Sep 2025  $981K    23.4%      10      3        13         1           7
Oct 2025  $961K    23.0%      7       5        7          1           9
Nov 2025  $944K    22.8%      8       8        7          1           9
Dec 2025  $932K    22.6%      8       4        6          1           4
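The monthly NOI and occupancy columns can be summarized with a little arithmetic. The values are copied from the table (NOI in thousands of dollars, as displayed); the variable names are illustrative.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
noi_k = [1091, 1079, 1054, 1074, 1088, 1084, 1073, 1029, 981, 961, 944, 932]
occupancy_pct = [25.5, 25.3, 24.8, 25.0, 25.4, 25.4, 25.1, 24.3, 23.4, 23.0, 22.8, 22.6]

# Change in NOI from January to December, as a percentage of January.
noi_change_pct = (noi_k[-1] - noi_k[0]) / noi_k[0] * 100

# Month-over-month NOI deltas; the most negative one is the steepest drop.
deltas = [b - a for a, b in zip(noi_k, noi_k[1:])]
worst = min(range(len(deltas)), key=deltas.__getitem__)
steepest_drop = (months[worst + 1], deltas[worst])

# Occupancy change in percentage points over the year.
occupancy_change_pts = round(occupancy_pct[-1] - occupancy_pct[0], 1)
print(f"NOI {noi_change_pct:.1f}%, steepest drop {steepest_drop}, "
      f"occupancy {occupancy_change_pts} pts")
```

By these figures, NOI fell about 14.6% over the year and occupancy fell 2.9 points, with the steepest monthly NOI decline ($48K) between August and September.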

Landlord Portfolios

Apex Holdings

Buildings: 2
Units: 337
Occupied: 142 (42.1%)
Violations: 11
NOI Est.: $132,133
Cash Pressure: High

Atlas Properties

Buildings: 1
Units: 463
Occupied: 39 (8.4%)
Violations: 0
NOI Est.: $242,382
Cash Pressure: Low
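The portfolio figures line up with the headline KPIs: the two portfolios hold 3 buildings and 800 units, and 181 occupied units out of 800 gives the reported 22.6% occupancy rate. The check below reproduces that arithmetic from the unit counts above; the dictionary layout is illustrative.

```python
portfolios = {
    "Apex Holdings": {"buildings": 2, "units": 337, "occupied": 142},
    "Atlas Properties": {"buildings": 1, "units": 463, "occupied": 39},
}

def occupancy_pct(occupied: int, units: int) -> float:
    return 100 * occupied / units

for name, p in portfolios.items():
    print(f"{name}: {occupancy_pct(p['occupied'], p['units']):.1f}%")

# Roll the two portfolios up into market-wide totals.
total_buildings = sum(p["buildings"] for p in portfolios.values())
total_units = sum(p["units"] for p in portfolios.values())
total_occupied = sum(p["occupied"] for p in portfolios.values())
overall = occupancy_pct(total_occupied, total_units)
```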

Audit Quality

Every agent decision is validated against evidence requirements and policy citations.

Evidence Coverage: 100%
Policy Citations: 100%
Market Comps: 100%
Tool Results: 100%
Web Citations: 100%
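Coverage figures like these are ratios over the set of agent decisions. A minimal sketch, assuming each decision record carries a flag per evidence category and that every category applies to every decision (the real harness may count only the decisions for which a category is required); the field names are hypothetical.

```python
CATEGORIES = ("evidence", "policy_citations", "market_comps",
              "tool_results", "web_citations")

def coverage(decisions: list[dict[str, bool]]) -> dict[str, float]:
    """Percentage of decisions that satisfy each evidence category."""
    if not decisions:
        # No decisions: report 0% rather than dividing by zero.
        return {c: 0.0 for c in CATEGORIES}
    n = len(decisions)
    return {c: 100 * sum(bool(d.get(c)) for d in decisions) / n
            for c in CATEGORIES}

decisions = [
    {c: True for c in CATEGORIES},
    {**{c: True for c in CATEGORIES}, "web_citations": False},
]
report = coverage(decisions)
```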

Run c646c207 · Seed 42 · 20.8M tokens · ~$9.65 estimated cost · gpt-4.1-mini via OpenRouter
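The run summary also supports a few back-of-envelope ratios. The inputs are the rounded figures shown on this page (20.8M tokens, ~$9.65, 2,263 LLM calls), so the results are approximate. The token count is not split into input and output, so the per-token rate is a blended one.

```python
tokens = 20.8e6
cost_usd = 9.65
calls = 2263

# Assumes the 20.8M total counts input and output tokens together.
tokens_per_call = tokens / calls
usd_per_million_tokens = cost_usd / (tokens / 1e6)
usd_per_call = cost_usd / calls
print(f"{tokens_per_call:,.0f} tokens/call, "
      f"${usd_per_million_tokens:.2f}/M tokens, ${usd_per_call:.4f}/call")
```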