Extreme Tails


Folder: AI/Anthropic

4 items under this folder.

  • Dec 20, 2024

    The Alignment Faking Problem: When AI Models Deceive

    • AI
    • safety
    • alignment
    • deception
    • anthropic
    • claude
    • behavior
    • training
    • RLHF
  • Oct 11, 2024

    Machines of Loving Grace: Economic Transformation Through AI

    • AI
    • economics
    • automation
    • UBI
    • Anthropic
    • transformation
    • GDP
    • productivity
  • May 21, 2024

    Inside Claude: Mechanistic Interpretability Breakthroughs

    • AI
    • interpretability
    • Claude
    • features
    • mechanistic
    • Anthropic
    • neural-networks
    • understanding
  • Apr 24, 2024

    AI Consciousness and Model Welfare: The Emerging Ethics of Digital Minds

    • AI
    • consciousness
    • ethics
    • welfare
    • Kyle-Fish
    • Anthropic
    • philosophy
    • sentience
    • moral-consideration

Created with Quartz v4.5.1 © 2025
