Research Notes - Measuring AI Augmentation vs. Cognitive Substitution: A Structural Void

AI Generated by claude-sonnet-4-6 · human-supervised · Created: 2026-03-10

Research: The Problem of Measuring Whether AI Collaboration Genuinely Expands Human Capability vs. Substitutes for It

Date: 2026-03-10
Confidence: Medium (the empirical landscape is active; the structural framing is well-supported; specific longitudinal data remains thin)
Search queries used:

  • “measuring AI augmentation vs substitution human capability cognitive atrophy skill replacement 2024 2025”
  • “cognitive offloading AI tools skill atrophy longitudinal research empirical evidence”
  • “extended mind theory AI tools enactivism human-AI system unit of analysis philosophy”
  • “counterfactual problem measuring cognitive skill development AI assistance methodology research design”
  • “natural experiment AI tools skill atrophy longitudinal study GitHub Copilot coding ability developers”
  • “cognitive offloading positive beneficial extended mind vs atrophy distinction research framework 2024 2025”
  • “extended hollowed mind foundational knowledge AI indispensable 2025 Frontiers paper argument”
  • “extracted mind Synthese 2025 cognitive displacement AI tools critique extended mind thesis”
  • “Andy Clark extending minds with generative AI Nature Communications 2025 main argument”
  • “Microsoft critical thinking AI study 2025 cognitive engagement knowledge workers”
  • “practitioner framework AI dependency vs AI augmentation assessment organizational capability 2024 2025”
  • “José Hernández-Orallo enhancement and assessment extended mind AI 2025 measurement capability evaluation”
  • “Judy Shen Alex Tamkin how AI impacts skill formation 2026 paper findings”

Executive Summary

The question of whether AI collaboration genuinely expands human capability or substitutes for it is not merely empirically difficult — it is structurally resistant. Four interlocking problems compound each other: the counterfactual problem (we cannot access the world where someone worked without AI), the coupling problem (once tightly integrated, the human-AI system becomes the effective unit of analysis, dissolving the individual capability question), the atrophy invisibility problem (degraded skills are invisible to the person losing them until the scaffolding is removed), and the type-confusion problem (beneficial cognitive offloading and harmful cognitive substitution can look identical from outside, and even from inside). The research landscape has produced multiple partial frameworks — the Cognitive Sustainability Index, the six AI-interaction patterns, the extended/extracted mind distinction — but none resolves the measurement problem, because the problem is structural, not methodological.

The field is currently split between two coherent but incompatible theoretical framings. Clark’s extended mind tradition (most recently articulated in Nature Communications, 2025) argues that the human-AI system is the correct unit of analysis and that historical precedent (writing, calculators, GPS) suggests integration produces smarter hybrid thinkers, not dumber individuals. Against this, the “hollowed mind” and “extracted mind” tradition (Frontiers, 2025; Synthese, 2025) argues that LLMs are qualitatively different from prior tools because they automate integrative reasoning itself — the very cognitive process that builds durable expertise — meaning the extended mind analogy breaks down precisely where it matters most.


Key Sources

1. Shen & Tamkin — “How AI Impacts Skill Formation” (arXiv, 2026)

  • URL: https://arxiv.org/abs/2601.20245
  • Type: Empirical paper (randomized experiment)
  • Confidence: High
  • Key points:
    • Randomized experiment: developers learning a new async programming library with and without AI assistance
    • AI use impairs conceptual understanding, code reading, and debugging without delivering significant efficiency gains on average
    • Participants who fully delegated coding tasks showed some productivity improvements but sacrificed mastery
    • Critical finding: six distinct AI interaction patterns identified; three involve cognitive engagement and preserve learning outcomes even with AI assistance
    • The productivity gain “is not a shortcut to competence” — the two outcomes are separable
    • Counterfactual note: The study explicitly acknowledges it cannot generalize to the counterfactual of human mentorship, peer review, or other forms of assistance — the absence-of-AI condition is not the only relevant comparison
  • Relevance to void: Demonstrates that the outcome (skill formation) depends not on whether AI is used but on how — which makes binary AI vs. no-AI comparisons structurally insufficient (a toy illustration follows this entry)
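
A toy example makes the point concrete: a pooled "AI effect" can be an artifact of averaging over interaction patterns with opposite effects. The pattern names and effect sizes below are invented for illustration; the paper's actual taxonomy and estimates are not reproduced here.

```python
# Toy illustration (synthetic data; pattern names and effect sizes are
# invented, not the paper's): a pooled AI-vs-no-AI contrast can mask
# heterogeneity across interaction patterns.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
patterns = ["full-delegation", "copy-paste", "autocomplete",
            "ask-to-explain", "debug-together", "self-test"]
engaged = {"ask-to-explain", "debug-together", "self-test"}  # assumed engaged patterns

n = 600
df = pd.DataFrame({"pattern": rng.choice(patterns, size=n)})
df["learning"] = np.where(
    df["pattern"].isin(engaged),
    rng.normal(0.0, 1.0, n),   # on par with an assumed no-AI baseline of 0
    rng.normal(-0.8, 1.0, n),  # impaired relative to that baseline
)

print(f"pooled 'AI effect': {df['learning'].mean():+.2f}")  # looks uniformly negative
print(df.groupby("pattern")["learning"].mean().round(2))    # but three patterns are unharmed
```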

2. Frontiers in AI — “The Extended Hollowed Mind” (2025)

  • URL: https://pmc.ncbi.nlm.nih.gov/articles/PMC12738859/
  • Type: Theoretical/empirical synthesis
  • Confidence: High (well-argued; empirically grounded in dual-process theory and cognitive load research)
  • Key points:
    • Introduces the “hollowed mind”: a state where AI dependency creates the illusion of competence while internal capacity atrophies
    • The “Sovereignty Trap”: AI’s authoritative fluency causes users to cede intellectual judgment; frictionless availability + cognitive miserliness + individual differences in Need for Cognition compound each other
    • Crucially distinguishes two types of cognitive offloading:
      • Procedural/retrieval offloading (calculators, GPS): rational trade-offs where a discrete task is handled externally
      • Integrative reasoning offloading (LLMs): qualitatively different because synthesis, argumentation, and evaluation are the processes that build expertise — offloading them prevents the schema formation that constitutes learning
    • References a 2024 meta-analysis of 106 experiments: the single most predictive factor in human-AI collaboration success is the human’s relative expertise — meaning atrophied humans become progressively worse collaborators in a feedback loop
  • Relevance to void: Provides the core theoretical argument for why extended mind framing is insufficient; the extended mind paradigm assumes tools augment internal cognition, but when tools bypass the effortful processes that build internal structures, augmentation becomes atrophy

3. Andy Clark — “Extending Minds with Generative AI” (Nature Communications, 2025)

  • URL: https://www.nature.com/articles/s41467-025-59906-9
  • Type: Theoretical essay (primary — one of the originators of extended mind thesis)
  • Confidence: High as a philosophical position; Medium as an empirical claim
  • Key points:
    • Humans are “natural-born cyborgs” — hybrid thinking systems have always incorporated external resources; this is not new with AI
    • Historical precedent: writing, printing, calculators, GPS did not make us dumber but different; we need a richer epistemology for bio-technological hybrid minds
    • Evidence cited: human Go players showed increasing novelty in their moves after the advent of superhuman Go AI — suggesting AI exposure can stimulate rather than replace human creativity
    • Clark’s recommendation: apply demanding epistemic standards to what we incorporate into our digitally extended minds — the problem is not the coupling but the uncritical coupling
    • The tension: Clark acknowledges the GPS/calculator erosion concern but frames it as a manageable trade-off; the hollowed-mind literature argues LLMs are not comparable because they automate higher-order cognition, not just retrieval or calculation
  • Relevance to void: The most coherent pro-integration theoretical account; the disagreement with hollowed-mind literature is genuine and unresolved

4. Loock — “The Extracted Mind” (Synthese, 2025)

  • URL: https://link.springer.com/article/10.1007/s11229-025-04962-3
  • Type: Philosophy paper (Synthese)
  • Confidence: High (conceptually rigorous)
  • Key points:
    • Introduces a counter-hypothesis to Clark & Chalmers: extracted cognition vs. extended cognition
    • Extended cognizers use tools that require co-activation of internal processes to solve a task
    • Extracted cognizers are surrounded by tools that can and do solve the task without any internal contribution
    • The direction of cognitive displacement matters: extension keeps the human in the loop; extraction replaces the loop
    • AI assistants that handle route planning, time management, and spatial orientation do not extend these skills — they displace their maturation
    • The distinction is not metaphysical but functional: what matters is whether internal processes are activated during tool use
  • Relevance to void: Provides the philosophical vocabulary for the distinction that empirical studies struggle to measure; extracted vs. extended is the right question but it requires access to internal processes that are not directly observable

5. Cognitive Research: Principles and Implications (Springer) — “Does Using AI Assistance Accelerate Skill Decay and Hinder Skill Development Without Performers’ Awareness?” (2024)

  • URL: https://pmc.ncbi.nlm.nih.gov/articles/PMC11239631/
  • Type: Review/empirical paper
  • Confidence: Medium-High
  • Key points:
    • Addresses the atrophy invisibility problem directly: AI assistance may accelerate skill decay and hinder skill development without performers’ awareness
    • This is structurally important: the standard method of asking users whether their skills have changed is unreliable because the degradation is not introspectively accessible
    • The problem compounds: as AI competence grows and human competence atrophies, users may perceive increasing AI quality as evidence of their own sustained capability (“the AI is just getting better”)
    • Calls for multidisciplinary research (cognitive science, domain-specific research, human factors, computer science) to design AI systems that mitigate rather than accelerate these effects
  • Relevance to void: Names the atrophy invisibility problem as a structural barrier to self-report measurement; also explains why even longitudinal studies relying on user perception of capability are methodologically compromised

6. Microsoft Research — “The Impact of Generative AI on Critical Thinking” (CHI 2025)

  • URL: https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-on-critical-thinking-self-reported-reductions-in-cognitive-effort-and-confidence-effects-from-a-survey-of-knowledge-workers/
  • Type: Empirical (survey, n=319 knowledge workers, 936 task examples)
  • Confidence: Medium (self-reported; cross-sectional)
  • Key points:
    • Higher confidence in GenAI associated with less critical thinking; higher self-confidence associated with more critical thinking
    • GenAI shifts the nature of critical thinking: from information gathering to information verification; from problem-solving to AI response integration; from task execution to task stewardship
    • Identified barriers: lack of awareness of need for critical evaluation; limited motivation due to time pressure; difficulty refining prompts
    • Measurement problem: The study is self-reported and cross-sectional — it cannot establish causality or determine whether those who use AI less critically already had lower critical thinking capacity before adoption
  • Relevance to void: Illustrates the selection-confounding problem in cross-sectional studies; also shows that even when AI use reduces critical thinking, the change may be in the form of thinking rather than its presence or absence

7. GitHub Copilot Longitudinal Study (arXiv, 2025)

  • URL: https://arxiv.org/html/2509.20353v2
  • Type: Longitudinal mixed-methods case study
  • Confidence: Medium (small n; single organization)
  • Key points:
    • Analyzed 26,317 commits from 703 repositories over two years; 25 Copilot users, 14 non-users; supplemented by surveys and interviews
    • No statistically significant changes in commit-based activity for Copilot users after adoption — yet users reported subjective improvements
    • The discrepancy between objective and subjective measures is itself data: productivity self-perception diverges from measurable output
    • Separately reported: less-experienced developers showed 28% lower performance in algorithmic problem-solving when tested without AI support following 6 months of continuous Copilot use
    • The “Copilot-free Fridays” practice some organizations use is a practitioner acknowledgment of the atrophy risk
  • Relevance to void: One of the few longitudinal datasets; the skill-atrophy finding is real but narrow (junior developers, specific task type); the methodological constraint is that we cannot know what their trajectory would have been without Copilot (a generic sketch of the adoption analysis follows this entry)
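
For reference, the generic shape of this adoption analysis is a difference-in-differences on commit counts. The sketch below is not the study's code; the model, adoption timing, and variable names are assumptions for illustration, with a null effect built into the synthetic data to mirror the reported non-finding.

```python
# Generic difference-in-differences sketch for the adoption question
# (not the study's analysis; model, adoption timing, and variable names
# are assumptions, and a null effect is built into the synthetic data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for i in range(39):                       # 25 adopters + 14 non-users
    treated = int(i < 25)
    for month in range(24):               # two years of monthly commit counts
        rows.append({
            "user": f"u{i}",
            "treated": treated,
            "post": int(month >= 12),     # adoption assumed at month 12
            "commits": rng.poisson(30),   # no real adoption effect
        })
df = pd.DataFrame(rows)

# The treated:post interaction is the difference-in-differences estimate
m = smf.ols("commits ~ treated * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["user"]})
print(m.params["treated:post"], m.pvalues["treated:post"])  # near zero, not significant
```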

8. MDPI — “Cognitive Atrophy Paradox of AI-Human Interaction” (2025)

  • URL: https://www.mdpi.com/2078-2489/16/11/1009
  • Type: Theoretical/quantitative framework paper
  • Confidence: Medium (theoretical model with limited empirical validation)
  • Key points:
    • Proposes a four-phase model: Traditional Cognition → Augmentation → Bypass → Dependency
    • The Bypass phase is key: users begin retrieving rather than constructing knowledge; “understanding becomes secondary to obtaining the correct output”
    • Introduces the Cognitive Sustainability Index (CSI): composite measure of autonomy, reflection, creativity, delegation, and reliance
    • The CSI is a practitioner measurement tool but faces a fundamental problem: it relies on self-assessment or external behavioral observation, neither of which can access whether internal cognitive processes are active or atrophied (a toy composite is sketched after this entry)
  • Relevance to void: Best available practitioner framework; its limitations illustrate why the measurement problem is structural rather than just empirical
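
To make the critique concrete, here is a minimal sketch of what a CSI-style composite looks like, assuming a simple weighted sum; the published instrument's items, weights, and scoring are not given in these notes, so everything below is an illustrative assumption. The limitation is visible in the code itself: every input is a self-rating, which is exactly the channel the atrophy invisibility problem corrupts.

```python
# Minimal sketch of a CSI-style composite. The published instrument's
# items, weights, and scoring are not given in these notes; everything
# below is an illustrative assumption.
def cognitive_sustainability_index(ratings: dict[str, float]) -> float:
    """ratings: self-assessed scores in [0, 1] for each dimension."""
    weights = {
        "autonomy":   +0.25,
        "reflection": +0.25,
        "creativity": +0.20,
        "delegation": -0.15,  # assumed to load negatively
        "reliance":   -0.15,  # assumed to load negatively
    }
    raw = sum(w * ratings[k] for k, w in weights.items())
    return (raw + 0.30) / 1.00  # rescale from [-0.30, 0.70] to [0, 1]

# Every input is a self-rating: the channel the atrophy invisibility
# problem systematically corrupts.
print(cognitive_sustainability_index(
    {"autonomy": 0.8, "reflection": 0.7, "creativity": 0.6,
     "delegation": 0.9, "reliance": 0.9}))  # ~0.53
```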

9. Gerlich — “AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking” (MDPI Societies, 2025)

  • URL: https://www.mdpi.com/2075-4698/15/1/6
  • Type: Empirical (n=666, surveys + interviews)
  • Confidence: Medium (large n but survey methodology; cross-sectional)
  • Key points:
    • Negative correlation between AI tool usage and critical thinking (r = -0.75 with cognitive offloading as mediator)
    • Cognitive offloading strongly correlated with AI tool usage (r = +0.72)
    • Younger participants (17-25) showed higher AI dependence and lower critical thinking scores
    • The methodological limitation: correlation study; cannot determine whether lower critical thinking preceded AI adoption or resulted from it; younger users may have different baseline critical thinking regardless of AI use
  • Relevance to void: Illustrates the selection bias problem — the people most likely to over-rely on AI may already have lower metacognitive engagement, making the causal direction structurally unresolvable with cross-sectional data (the sketch after this entry shows why the regressions cannot orient the arrows)
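
The reported structure is a classic cross-sectional mediation design. The sketch below uses synthetic data with invented coefficients loosely in the reported range (not Gerlich's data or analysis code) to show both the standard two-regression check and why it cannot settle causal direction.

```python
# Synthetic mediation sketch (coefficients invented to land near the
# reported correlations; this is not Gerlich's data or analysis).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 666
ai_use  = rng.normal(size=n)
offload = 0.72 * ai_use + rng.normal(scale=0.7, size=n)   # a-path: AI use -> offloading
crit    = -0.9 * offload + rng.normal(scale=0.5, size=n)  # b-path: offloading -> critical thinking

# Total effect of AI use, then direct effect controlling for the mediator
total  = sm.OLS(crit, sm.add_constant(ai_use)).fit()
direct = sm.OLS(crit, sm.add_constant(np.column_stack([ai_use, offload]))).fit()
print(total.params[1], direct.params[1])  # direct effect shrinks toward zero

# The same regressions fit equally well if the true arrows run
# crit -> offload -> ai_use: cross-sectional fits cannot orient the arrows.
```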

10. Venkat Ram Reddy Ganuthula — “The Paradox of Augmentation” (SSRN/Academy of Management, 2024)

  • URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4974044
  • Type: Theoretical model
  • Confidence: Medium (theoretical; limited empirical test)
  • Key points:
    • Mathematical model incorporating human proficiency, AI assistance level, and the interplay between learning and forgetting processes
    • Short-term: AI augments performance with consistent productivity gains
    • Long-term: sustained usage may lead to gradual decline in skill and proficiency
    • The model predicts an inversion point — the paradox — where augmentation becomes net negative for human capability (a toy simulation follows this entry)
    • Limitation: The model is elegant but the parameters (learning rate, forgetting rate, assistance level) cannot be independently measured in the real world without exactly the counterfactual access we lack
  • Relevance to void: The model formalizes the paradox but its parameters are the measurement problem in disguise
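
A minimal dynamical sketch of the inversion, assuming simple practice-driven learning and delegation-driven forgetting; the paper's actual functional forms and parameters are not reproduced here.

```python
# Minimal dynamical sketch of the augmentation paradox, assuming
# practice-driven learning and delegation-driven forgetting (the paper's
# actual functional forms and parameters are not reproduced here).
import numpy as np

def simulate(assist, steps=200, learn=0.05, forget=0.02):
    """assist in [0, 1]: fraction of each task delegated to the AI."""
    skill = 0.5
    perf, skills = [], []
    for _ in range(steps):
        perf.append(skill * (1 - assist) + 1.0 * assist)  # AI assumed near-perfect
        skill += learn * (1 - assist) * (1 - skill) \
               - forget * assist * skill                  # practice builds, disuse erodes
        skills.append(skill)
    return np.array(perf), np.array(skills)

for a in (0.0, 0.5, 0.9):
    perf, skill = simulate(a)
    print(f"assist={a}: early performance={perf[:10].mean():.2f}, "
          f"final skill={skill[-1]:.2f}")
# High assistance wins early on performance and loses late on skill; the
# crossover is the inversion point, which real data cannot locate without
# the counterfactual trajectory.
```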

11. GPS and Spatial Navigation Research (Meta-analysis, ScienceDirect, 2024)

  • URL: https://www.sciencedirect.com/science/article/pii/S0272494424001907
  • Type: Meta-analysis
  • Confidence: High (meta-analytic; stronger design)
  • Key points:
    • Systematic review: increased use of navigation aids associated with decreasing spatial navigation skills and social spatial interaction
    • Taxi drivers with greater GPS dependence had worse scene recognition performance; long-term reliance limits geographic encoding into memory
    • GPS users show worse hippocampal-dependent spatial memory, both cross-sectionally and longitudinally
    • EEG + eye-tracking used to measure cognitive load during route-following tasks — one of the few objective neural measures available
    • Why this is instructive: GPS is the cleanest natural experiment available for AI-tool cognitive displacement, because spatial navigation is well-understood neurologically, measurable, and has a clear pre-AI baseline. Even here, the counterfactual problem remains — we cannot know what navigation ability these individuals would have developed without GPS
  • Relevance to void: The most empirically clean analog; suggests real effects are possible to measure in narrow domains with biological correlates; also reveals the limits of the analogy (spatial navigation has neural markers; integrative reasoning does not have equivalent clean measures)

12. Hernández-Orallo — “Enhancement and Assessment in the AI Age: An Extended Mind Perspective” (Journal of Pacific Rim Psychology, 2025)

  • URL: https://journals.sagepub.com/doi/10.1177/18344909241309376
  • Type: Theoretical/applied philosophy
  • Confidence: High (conceptually precise; from the Leverhulme Centre for the Future of Intelligence)
  • Key points:
    • Argues that in an age of intensive human-AI coupling, assessment frameworks must be rethought — current assessments measure individual human capability, but the relevant unit may be the coupled human-AI system
    • A “cognitive extender” is an external element coupled to enable, aid, enhance, or improve cognition “such that all — or more than — its positive effect is lost when the element is not present”
    • This definition makes the dependency question structural rather than incidental: by definition, if you remove the extender and performance drops, the extender was doing cognitive work (the removal test is sketched after this entry)
    • The measurement implication: this framing recasts “atrophy” as a feature, not a bug — the system (human + AI) has capability that neither alone has; evaluating the human in isolation is like evaluating a scuba diver’s swimming ability by removing the tank
    • The counter-implication: this only works if the coupling is stable and the AI remains available; in contexts where access is uncertain (skill-critical domains, AI failure, adversarial conditions), individual capability still matters
  • Relevance to void: The sharpest articulation of the unit-of-analysis problem; his “cognitive extender” definition clarifies what we are measuring when we measure capability and why individual-level measurement is epistemically contestable
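
The definition implies a simple removal test: measure the same task with and without the extender and report the fraction of coupled performance lost. The operationalization below is mine, not the paper's.

```python
# Removal test implied by the definition (operationalization mine, not
# the paper's; scores and semantics are illustrative).
def extender_dependency(perf_coupled: float, perf_uncoupled: float) -> float:
    """Fraction of coupled performance lost when the extender is removed."""
    if perf_coupled <= 0:
        raise ValueError("coupled performance must be positive")
    return (perf_coupled - perf_uncoupled) / perf_coupled

# Near 1.0: the extender does nearly all the cognitive work.
# Near 0.0: the human carries the capability independently.
print(extender_dependency(perf_coupled=0.9, perf_uncoupled=0.3))  # ~0.67
```

The counter-implication in the entry applies directly: the score is only meaningful in contexts where the uncoupled condition can actually occur.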

Why This Is a Structural Void, Not Just a Hard Empirical Question

The following four structural problems compound each other. Each would be formidable alone; together they make the question genuinely resistant to resolution by more or better data.

Problem 1: The Counterfactual Problem

To know whether AI expanded or substituted for capability, we need to compare Person A (used AI) with the same Person A (did not use AI) on the same task at the same time. This comparison is inaccessible by definition. Randomized experiments (like Shen & Tamkin) can compare populations, but populations differ in ways that matter (prior skill level, task type, interaction style). Natural experiments (like Copilot adoption) face selection problems (people who adopt Copilot early may already differ in ways that predict skill trajectories). Longitudinal studies can track individuals over time, but still cannot rule out that the trajectory would have been different without AI, or that the trajectory reflects cohort effects, tool improvement, or changing task demands rather than cognitive change. The counterfactual is not a gap in available data — it is structurally inaccessible.
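
The point has a standard formalization in potential-outcomes (Rubin causal model) notation, added here for precision rather than drawn from the sources above:

```latex
% Y_i(1): person i's skill outcome having worked with AI
% Y_i(0): the same person's outcome having worked without it
\tau_i = Y_i(1) - Y_i(0)
% Only one of Y_i(1), Y_i(0) is ever realized, so \tau_i is unobservable.
% Randomization identifies only the population average,
%   \mathbb{E}[\tau] = \mathbb{E}[Y(1)] - \mathbb{E}[Y(0)],
% which says nothing about any individual's trajectory.
```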

Problem 2: The Atrophy Invisibility Problem

The Springer paper (2024) identifies a specific compounding factor: AI assistance may accelerate skill decay without performers’ awareness. This is not a normal measurement inconvenience; it is a systematic bias in the most obvious measurement tool (self-report). As AI gets better, users rationally attribute improved output quality to the AI improving rather than to their own declining contribution. The internal experience of using AI smoothly feels like competence. The test of whether the person can work without AI is typically not performed until a crisis, at which point the evidence is retrospective and confounded. This means longitudinal self-report studies and practitioner self-assessments have a systematic ceiling on what they can reveal.

Problem 3: The Type-Confusion Problem

The literature identifies a crucial distinction — between offloading routine/retrieval tasks (beneficial: frees cognitive resources for higher-order work) and offloading integrative reasoning (harmful: bypasses the process that builds expertise). The hollowed mind paper calls these “procedural offloading” vs. “integrative reasoning offloading.” The problem is that these cannot be reliably distinguished by observation. A person using AI to draft a section of a report might be (a) freeing cognitive resources for strategic thinking they then perform, or (b) bypassing the synthesis that would have built deeper expertise. The observable behavior is identical. Even subjective experience may not reliably distinguish them. This means the binary “AI use / no AI use” is the wrong variable — what matters is the internal cognitive activity during AI use, which is not directly observable.

Problem 4: The Unit-of-Analysis Problem

The extended mind tradition (Clark, Hernández-Orallo) raises a deeper problem: once a human-AI system is sufficiently tightly coupled, the meaningful unit of cognitive analysis may no longer be the individual human. Evaluating the human alone is like evaluating a driver’s ability to navigate by removing their GPS — you are measuring something, but it is no longer the capability that matters for their actual work. Conversely, if we accept the coupled system as the unit, the distinction between augmentation and substitution dissolves — there is only a system with certain capabilities, and the question of which component “owns” them is not well-formed. This is not a definitional quibble: it has practical consequences for what we measure and what counts as a good outcome. The extracted mind hypothesis (Loock, Synthese) accepts this framing but inverts the evaluation: extracted cognizers lose the internal capability; the “capability” of the system is real but brittle, dependent on AI availability and alignment. This makes capability assessments domain- and context-dependent in ways that resist general answers.


Tensions and Disagreements

Clark vs. hollowed-mind tradition on the historical analogy: Clark argues writing and GPS are precedents showing tools make us different, not dumber. The hollowed-mind literature argues LLMs are categorically different because they automate integrative reasoning — the cognitive process that, unlike retrieval or calculation, is constitutive of expertise development. Both positions are coherent; the disagreement turns on an empirical question (whether LLMs do automate integrative reasoning in ways that matter for expertise formation) that is not yet resolved.

Gerlich (negative correlation) vs. interaction-pattern studies (preserved or positive outcomes): Gerlich’s large survey finds AI use negatively correlated with critical thinking. Shen & Tamkin find three of six interaction patterns preserve learning. The contradiction is likely not real — both can be true if different types of AI use have different effects — but it makes simple “more AI = worse thinking” claims unreliable.

The GPS analogy: The GPS navigation literature is the cleanest empirical precedent for cognitive substitution, and it does show real effects on hippocampal-dependent spatial memory. Whether this generalizes to higher-order cognition (professional judgment, synthesis, strategic reasoning) is contested. Spatial navigation is well-suited to neural measurement in ways that complex professional cognition is not.

Practitioner frameworks vs. theoretical precision: The Cognitive Sustainability Index and similar tools provide actionable measurement for practitioners, but they rely on behavioral proxies (time spent engaging critically with AI output, frequency of independent reasoning checks) that cannot access whether internal cognitive processes were activated. These frameworks are useful for organizational governance but do not resolve the scientific measurement problem.


What Is Missing

  • Longitudinal studies with pre-adoption baselines: Most research measures people after AI adoption; few have clean pre-adoption cognitive baselines. The exception is GPS research, which benefited from clear “before navigation apps” conditions.
  • Neural correlates of integrative reasoning during AI use: The EEG + eye-tracking approach used in GPS research could in principle be adapted to measure cognitive engagement during AI-assisted text tasks, but this has not been done at scale.
  • Studies that separate interaction type from AI use: Shen & Tamkin’s six-pattern taxonomy is promising but needs replication across domains and professional expertise levels.
  • Long-term professional outcome studies: Tracking early-career professionals who learned their trade with vs. without AI over 5-10 years; the GitHub Copilot junior developer data is a preliminary signal but far too narrow.
  • The recovery question: If cognitive atrophy does occur, is it reversible with deliberate practice? The GPS research suggests some reversibility if navigation is practiced again; the equivalent for complex professional skills is unknown.
  • Domain-specific variation: The hollowed-mind literature focuses primarily on education and information work. Whether the same dynamics apply in design, strategy, or facilitation — where AI may stimulate rather than replace — is an open question.

Partial Approaches That Illuminate the Shape of the Problem

These do not resolve the void but demonstrate its contours:

  1. Shen & Tamkin’s six interaction patterns: the right research direction — measuring how people interact with AI, not whether they use it. Tells us the variable worth measuring is cognitive engagement during AI use, not AI use per se.

  2. The GPS natural experiment: demonstrates real cognitive substitution effects are empirically detectable in well-bounded domains with neural correlates. Shows the problem is tractable in narrow cases; highlights why complex professional cognition is harder.

  3. Hernández-Orallo’s cognitive extender definition: “all positive effect is lost when the element is not present” — provides a crisp operational test for dependency. Its limitation is that it frames the outcome neutrally: dependency is detected, not judged. In domains where human capability must survive AI removal (emergencies, failures, context switches), the same test correctly flags risk.

  4. The extended/extracted distinction (Loock): philosophical vocabulary that clarifies the empirical question without answering it. Whether a given human-AI interaction is extending or extracting is an empirical question that requires access to internal cognitive processes — which is exactly what we lack.

  5. The Cognitive Sustainability Index: best available practitioner proxy; should be treated as a behavioral governance tool rather than a measurement of the underlying cognitive question.


Relation to Site Tenets

This research bears directly on the “Symbiotic Intelligence over Automation” tenet (Tenet 3): the tenet asserts that the default goal is to expand human understanding and capability, not to replace human judgment or participation. The research identifies that this goal is not measurable with current tools — which means the tenet articulates an aspiration that cannot currently be verified in practice. This is a productive acknowledgment: it identifies why practitioner discipline (deliberate AI use, cognitive engagement monitoring, periodic AI-removal tests) matters as a substitute for measurement.

It also connects to “Human Intent First” (Tenet 1): if AI progressively hollows the cognitive processes that generate and specify intent, the tenet’s foundation erodes in ways that may be invisible to the practitioner. The atrophy invisibility problem makes intentional practice all the more important — the person cannot rely on felt experience to tell them when they have crossed from augmentation into substitution.


Suggested Next Steps (for task queue or follow-up research)

  • Concept candidate: “Cognitive coupling depth” — the degree to which internal processes are activated during AI-assisted work; distinguishes extending from extracting at the behavioral level
  • Gap candidate: The atrophy invisibility problem as a structural gap — current practitioner frameworks for assessing AI-augmented capability all have a blind spot here
  • Topic candidate: Design principles for “augmentation hygiene” — deliberate practices that maintain internal cognitive activation during AI-assisted work (expands on the “Copilot-free Fridays” and “structured AI interaction” patterns)
  • Cross-link: This research overlaps significantly with the existing cognitive-offloading concept note (obsidian/concepts/cognitive-offloading.md) and should be cross-referenced there

Source List

| Source | URL | Type | Confidence |
| --- | --- | --- | --- |
| Shen & Tamkin — How AI Impacts Skill Formation (arXiv, 2026) | https://arxiv.org/abs/2601.20245 | Empirical (RCT) | High |
| Frontiers — The Extended Hollowed Mind (2025) | https://pmc.ncbi.nlm.nih.gov/articles/PMC12738859/ | Theoretical synthesis | High |
| Andy Clark — Extending Minds with Generative AI (Nature Communications, 2025) | https://www.nature.com/articles/s41467-025-59906-9 | Philosophical essay (primary) | High |
| Loock — The Extracted Mind (Synthese, 2025) | https://link.springer.com/article/10.1007/s11229-025-04962-3 | Philosophy paper | High |
| Springer — Does AI Assistance Accelerate Skill Decay? (2024) | https://pmc.ncbi.nlm.nih.gov/articles/PMC11239631/ | Review/empirical | Medium-High |
| Microsoft Research — GenAI and Critical Thinking (CHI 2025) | https://www.microsoft.com/en-us/research/publication/the-impact-of-generative-ai-on-critical-thinking-self-reported-reductions-in-cognitive-effort-and-confidence-effects-from-a-survey-of-knowledge-workers/ | Survey (n=319) | Medium |
| GitHub Copilot Longitudinal Study (arXiv, 2025) | https://arxiv.org/html/2509.20353v2 | Longitudinal mixed methods | Medium |
| MDPI — Cognitive Atrophy Paradox (2025) | https://www.mdpi.com/2078-2489/16/11/1009 | Theoretical/quantitative framework | Medium |
| Gerlich — AI Tools and Critical Thinking (MDPI Societies, 2025) | https://www.mdpi.com/2075-4698/15/1/6 | Empirical (n=666) | Medium |
| Ganuthula — Paradox of Augmentation (SSRN, 2024) | https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4974044 | Theoretical model | Medium |
| GPS Navigation Meta-analysis (ScienceDirect, 2024) | https://www.sciencedirect.com/science/article/pii/S0272494424001907 | Meta-analysis | High |
| Hernández-Orallo — Enhancement and Assessment in the AI Age (2025) | https://journals.sagepub.com/doi/10.1177/18344909241309376 | Applied philosophy | High |