Evaluation

Research: What does “good enough” mean in AI-augmented systemic design?
Date: 2026-03-11
Search queries used:
- “satisficing ‘good enough’ design philosophy Herbert Simon bounded rationality”
- “‘good enough’ AI-augmented design systems adequacy criteria”
- “systemic design ‘wicked problems’ good enough solution threshold adequacy”
- “Rittel Webber wicked problems ‘good enough’ solution stopping rule design adequacy”
- “satisficing bounded rationality ‘aspiration level’ design quality adequacy professional when to stop”
- “wicked problems ‘no stopping rule’ Rittel Webber satisficing design adequacy good enough”
- “Donald Schon ‘reflective practitioner’ design judgment sufficiency professional tacit knowing”
- “AI augmented design ‘good enough’ quality judgment professional practice stopping criteria 2024 …
Created: 2026-03-11 · Modified: 2026-03-11
Research: The “Expert Benchmark” Fallacy in AI Evaluation
Date: 2026-03-11
Search queries used:
- “Expert Benchmark fallacy AI evaluation critique”
- “AI benchmark human expert performance misleading evaluation problems”
- “AI surpasses human experts benchmark critique misleading capability claims philosophy”
- “benchmark saturation AI Goodhart’s law evaluation gaming problems 2024 2025”
- “Emily Bender Arvind Narayanan AI benchmark validity problems human level performance critique”
- “Melanie Mitchell AI benchmark broken critique generalization reasoning”

Executive Summary
The “Expert Benchmark Fallacy” is not yet a formally named philosophical concept, but it describes a well-documented epistemic error at the heart of AI capability claims. It occurs when AI systems score at or above “human expert level” on a narrow benchmark test, and this score is then treated as evidence of …
Created: 2026-03-11 · Modified: 2026-03-11
Research: The Problem of Measuring Whether AI Collaboration Genuinely Expands Human Capability vs. Substitutes for It
Date: 2026-03-10
Confidence: Medium (the empirical landscape is active; the structural framing is well-supported; specific longitudinal data remains thin)
Search queries used:
- “measuring AI augmentation vs substitution human capability cognitive atrophy skill replacement 2024 2025”
- “cognitive offloading AI tools skill atrophy longitudinal research empirical evidence”
- “extended mind theory AI tools enactivism human-AI system unit of analysis philosophy”
- “counterfactual problem measuring cognitive skill development AI assistance methodology research design”
- “natural experiment AI tools skill atrophy longitudinal study GitHub Copilot coding ability developers”
- “cognitive offloading positive beneficial extended mind vs atrophy distinction research framework 2024 2025”
- “extended hollowed mind …
Created: 2026-03-10 · Modified: 2026-03-11
The “Symbiotic Intelligence over Automation” tenet requires that symbiosis be distinguishable from sophisticated substitution, but this distinction cannot be reliably verified. The question of whether AI collaboration builds human capability or hollows it out runs into four structural barriers that prevent clean resolution even in principle. More data, longer studies, or better instruments will not close this gap. It is a permanent limit on what the framework can confirm about itself.
Created: 2026-03-10 · Modified: 2026-03-10