Coldwater — Multi-Agent NYC Rental Market Simulation

About

What is Coldwater?

Coldwater is an evaluation framework that runs AI agents through multi-year economic simulations of the NYC rental housing market. Built on real building data from PLUTO, HPD, and DHCR, it tests whether language models can navigate rent stabilization, lease negotiations, maintenance decisions, and housing court, with a legal validator checking their decisions.

Named after the cold-water flats that sparked NYC's tenant protection movement, Coldwater turns a century of housing policy into a living benchmark.

Agents

Three roles, one market

Each simulation month, three types of LLM-powered agents make independent economic decisions.

Landlord

Manages real estate portfolios. Sets rents, allocates capital for maintenance and renovations, and negotiates lease terms with prospective tenants.

Tenant

Searches for housing, evaluates apartments against budget and preferences, negotiates leases, and decides when to renew or move on.

City

Enforces NYC housing regulations with legal validation. Conducts inspections, processes complaints, and adjudicates housing court proceedings.

Features

Built for rigorous evaluation

Real NYC data

Grounded in PLUTO building data, HPD violations, and DHCR rent stabilization records rather than synthetic data.

Legal knowledge base

Encodes NYC housing law including rent stabilization, warranty of habitability, and eviction procedures.

Benchmark harness

Structured benchmark profiles with baseline comparisons, regression detection, and audit gates.

Multi-provider LLMs

Run agents on OpenRouter, OpenAI, Anthropic, or Google Gemini. Mix and match per role.
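As a rough illustration of mixing providers per role, each agent role could be mapped to a provider and model, with a shared default and per-role overrides. This is a sketch under stated assumptions, not Coldwater's actual configuration format: `RoleModel`, `build_role_models`, the provider identifiers, and the `placeholder-model` name are all hypothetical. Only the provider list and `gpt-4.1-mini` come from this page.

```python
from dataclasses import dataclass

# Providers named on this page; the identifier spellings are assumptions.
SUPPORTED_PROVIDERS = {"openrouter", "openai", "anthropic", "gemini"}


@dataclass(frozen=True)
class RoleModel:
    """Hypothetical pairing of a provider with a model name for one agent role."""
    provider: str
    model: str

    def __post_init__(self) -> None:
        # Reject providers outside the supported set as early as possible.
        if self.provider not in SUPPORTED_PROVIDERS:
            raise ValueError(f"unsupported provider: {self.provider!r}")


def build_role_models(default: RoleModel, overrides: dict) -> dict:
    """Give every role the default model unless an override is supplied."""
    roles = ("landlord", "tenant", "city")
    unknown = set(overrides) - set(roles)
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    return {role: overrides.get(role, default) for role in roles}


default = RoleModel("openrouter", "gpt-4.1-mini")
# "placeholder-model" is not a real model name; substitute your own.
role_models = build_role_models(
    default, {"city": RoleModel("anthropic", "placeholder-model")}
)
```

Here the landlord and tenant agents fall back to the default, while the city agent runs on a different provider.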

Multi-year simulations

Leases, renewals, maintenance cycles, rent adjustments, and housing court cases play out over multiple simulated years.

Full audit trail

Append-only event log, reasoning traces, KPI reports, and process tracing for every decision.
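To make the append-only idea concrete, here is a minimal in-memory sketch. It is an assumption about how such a log might look, not Coldwater's implementation: the `EventLog` class and the record fields (`seq`, `month`, `agent`, `action`, `reasoning`) are hypothetical. The only operations exposed are appending a record and replaying the history, so past entries are never edited.

```python
import json


class EventLog:
    """Hypothetical append-only log: records can be added and read, never changed."""

    def __init__(self) -> None:
        # Each entry is stored as a serialized JSON line so that later
        # mutation of a caller's dict cannot alter the recorded history.
        self._lines = []

    def append(self, month: str, agent: str, action: str, reasoning: str) -> int:
        """Record one agent decision and return its sequence number."""
        seq = len(self._lines)
        record = {
            "seq": seq,
            "month": month,
            "agent": agent,
            "action": action,
            "reasoning": reasoning,
        }
        self._lines.append(json.dumps(record, sort_keys=True))
        return seq

    def replay(self) -> list:
        """Return fresh copies of all records in the order they were written."""
        return [json.loads(line) for line in self._lines]


log = EventLog()
log.append("2025-01", "landlord", "set_rent", "matched comparable listings")
log.append("2025-01", "tenant", "sign_lease", "rent within budget")
events = log.replay()
```

Because `replay` deserializes from the stored lines, changing a returned record does not affect the log itself.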

Latest Run

12-Month Simulation Results

A completed run with 3 real Upper West Side buildings, 200 initial tenants, and all realism features enabled. Model: gpt-4.1-mini via OpenRouter.

Net Operating Income: $932K

Occupancy Rate: 22.6%

Units Occupied: 181

Leases Signed

Leases Renewed

Violations Issued

LLM Calls: 2,263

LLM Errors

Month	NOI	Occupancy	Signed	Renewed	Moved Out	Violations	Complaints
Jan 2025	$1,091K	25.5%	5	0	1	0	11
Feb 2025	$1,079K	25.3%	5	7	5	1	7
Mar 2025	$1,054K	24.8%	3	10	4	1	11
Apr 2025	$1,074K	25.0%	8	8	5	1	9
May 2025	$1,088K	25.4%	8	5	4	1	5
Jun 2025	$1,084K	25.4%	11	2	7	1	8
Jul 2025	$1,073K	25.1%	9	5	7	1	8
Aug 2025	$1,029K	24.3%	7	4	11	1	4
Sep 2025	$981K	23.4%	10	3	13	1	7
Oct 2025	$961K	23.0%	7	5	7	1	9
Nov 2025	$944K	22.8%	8	8	7	1	9
Dec 2025	$932K	22.6%	8	4	6	1	4
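The count columns of the table can be totaled directly to get 12-month figures. The sketch below copies the monthly rows from the table above. Whether the run's headline figures are computed as these sums is an assumption, and `column_totals` is an illustrative helper, not part of Coldwater.

```python
# Monthly rows copied from the table above:
# (month, signed, renewed, moved_out, violations, complaints)
ROWS = [
    ("Jan 2025", 5, 0, 1, 0, 11),
    ("Feb 2025", 5, 7, 5, 1, 7),
    ("Mar 2025", 3, 10, 4, 1, 11),
    ("Apr 2025", 8, 8, 5, 1, 9),
    ("May 2025", 8, 5, 4, 1, 5),
    ("Jun 2025", 11, 2, 7, 1, 8),
    ("Jul 2025", 9, 5, 7, 1, 8),
    ("Aug 2025", 7, 4, 11, 1, 4),
    ("Sep 2025", 10, 3, 13, 1, 7),
    ("Oct 2025", 7, 5, 7, 1, 9),
    ("Nov 2025", 8, 8, 7, 1, 9),
    ("Dec 2025", 8, 4, 6, 1, 4),
]
FIELDS = ("signed", "renewed", "moved_out", "violations", "complaints")


def column_totals(rows):
    """Sum each count column across all months."""
    totals = dict.fromkeys(FIELDS, 0)
    for _month, *counts in rows:
        for name, value in zip(FIELDS, counts):
            totals[name] += value
    return totals


totals = column_totals(ROWS)
print(totals)
```

Summed this way, the table gives 89 leases signed, 61 renewed, 77 move-outs, 11 violations, and 92 complaints. The 11 violations agree with the per-landlord violation counts listed under Landlord Portfolios (11 and 0).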

Landlord Portfolios

Apex Holdings

Buildings: 2

Units: 337

Occupied: 142 (42.1%)

Violations: 11

NOI Est.: $132,133

Cash Pressure: High

Atlas Properties

Buildings: 1

Units: 463

Occupied: 39 (8.4%)

Violations: 0

NOI Est.: $242,382

Cash Pressure: Low
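The portfolio figures can be cross-checked against the headline occupancy with a few lines of arithmetic. The sketch below uses only the unit and occupancy counts listed for the two landlords. The `occupancy_pct` helper and the `PORTFOLIOS` layout are illustrative, and rounding to one decimal place is an assumption about how the percentages were displayed.

```python
# Unit counts copied from the two portfolio cards above.
PORTFOLIOS = {
    "Apex Holdings": {"units": 337, "occupied": 142},
    "Atlas Properties": {"units": 463, "occupied": 39},
}


def occupancy_pct(occupied: int, units: int) -> float:
    """Occupied units as a percentage of all units, rounded to one decimal."""
    return round(100 * occupied / units, 1)


per_landlord = {
    name: occupancy_pct(p["occupied"], p["units"]) for name, p in PORTFOLIOS.items()
}
total_units = sum(p["units"] for p in PORTFOLIOS.values())
total_occupied = sum(p["occupied"] for p in PORTFOLIOS.values())
market_pct = occupancy_pct(total_occupied, total_units)
```

The two portfolios together hold 800 units, of which 181 are occupied, which reproduces the 22.6% December occupancy rate and the 181 occupied units reported for the run.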

Audit Quality

Every agent decision is validated against evidence requirements and policy citations.

Evidence Coverage: 100%

Policy Citations: 100%

Market Comps: 100%

Tool Results: 100%

Web Citations: 100%

Run c646c207 · Seed 42 · 20.8M tokens · ~$9.65 estimated cost · gpt-4.1-mini via OpenRouter
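For a sense of scale, the run summary can be turned into per-call and per-token averages. The inputs below are the rounded figures reported on this page (20.8M tokens, about $9.65, 2,263 LLM calls), so the results are approximate. The variable names are illustrative.

```python
# Rounded figures from the run summary and the LLM Calls count above.
TOTAL_TOKENS = 20_800_000
ESTIMATED_COST_USD = 9.65
LLM_CALLS = 2_263

# Average tokens consumed by one LLM call.
tokens_per_call = TOTAL_TOKENS / LLM_CALLS
# Average estimated cost of one LLM call.
usd_per_call = ESTIMATED_COST_USD / LLM_CALLS
# Blended estimated price per million tokens (input and output combined).
usd_per_million_tokens = ESTIMATED_COST_USD / (TOTAL_TOKENS / 1_000_000)

print(f"{tokens_per_call:,.0f} tokens per call")
print(f"${usd_per_call:.4f} per call")
print(f"${usd_per_million_tokens:.2f} per million tokens")
```

This works out to roughly 9,200 tokens and under half a cent per call on average, at a blended rate of about $0.46 per million tokens.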