The llms.txt Standard

The llms.txt file is an emerging standard for communicating with AI systems. Just as robots.txt tells search crawlers which parts of your site they may access, llms.txt provides guidance specifically for large language models.

What is llms.txt?

llms.txt is a plaintext file placed at your website's root that provides structured information for AI systems. It communicates:

  • What your site is about
  • How AI may use your content
  • Key entry points and structure
  • Contact and attribution information

Why llms.txt Matters

The Problem

AI systems currently have no standardized way to understand:

  • What content is available and how it's organized
  • Whether they can use content for training vs. inference
  • Whom to credit when citing content
  • What the most important pages are

The Solution

llms.txt provides explicit, machine-readable guidance that helps AI systems:

  • Navigate your site efficiently
  • Respect your usage preferences
  • Properly attribute citations
  • Focus on your most valuable content

Basic llms.txt Structure

# llms.txt
# Machine-readable information for AI systems

# Site Information
site_name: Your Site Name
site_url: https://yoursite.com
site_description: Brief description of your site's purpose and content

# Content Type
content_type: documentation
content_language: en

# Permissions
llm_inference: allow
llm_training: allow
rag_usage: allow

# Entry Points
primary_entry: https://yoursite.com
documentation: https://yoursite.com/docs
api_reference: https://yoursite.com/api

# Sitemap
sitemap_url: https://yoursite.com/sitemap.xml
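
To make the format concrete, here is a minimal TypeScript sketch of a consumer that fetches llms.txt and parses these simple key: value lines. The function name is hypothetical, the comment handling is naive (it assumes "#" never appears inside a value), and nested blocks like the sections: list in the fuller example below are out of scope.

// Minimal llms.txt fetch-and-parse sketch (hypothetical helper).
async function fetchLlmsTxt(siteUrl: string): Promise<Map<string, string>> {
  const res = await fetch(new URL("/llms.txt", siteUrl));
  if (!res.ok) throw new Error(`llms.txt not found: HTTP ${res.status}`);
  const fields = new Map<string, string>();
  for (const raw of (await res.text()).split("\n")) {
    const line = raw.split("#")[0].trim(); // naive comment stripping
    if (!line) continue;                   // skip blanks and comment-only lines
    const sep = line.indexOf(":");
    if (sep === -1) continue;              // ignore malformed lines
    fields.set(line.slice(0, sep).trim(), line.slice(sep + 1).trim());
  }
  return fields;
}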

Complete llms.txt Example

Here's a comprehensive example:

# llms.txt
# Structured information for LLM consumption
# https://llmstxt.org (emerging standard)

# ============================================
# SITE INFORMATION
# ============================================

site_name: AEO.dev
site_url: https://aeo.dev
site_description: Answer Engine Optimization Knowledge Base - comprehensive guides for optimizing content for AI search engines
site_type: knowledge_base
site_language: en
last_updated: 2026-01-05

# ============================================
# CONTACT & ATTRIBUTION
# ============================================

contact_email: contact@aeo.dev
contact_github: https://github.com/AEOdev
preferred_citation: "AEO.dev - Answer Engine Optimization Knowledge Base"
author: AEO.dev Team

# ============================================
# AI USAGE PERMISSIONS
# ============================================

# General permissions
llm_inference: allow          # Allow use in AI responses
llm_training: allow           # Allow use in training data
rag_usage: allow              # Allow retrieval augmented generation
web_scraping: allow           # Allow content scraping
api_access: allow             # Allow API-based access

# Usage notes
usage_attribution: preferred  # Please cite when possible
commercial_use: allowed       # Commercial AI use permitted

# ============================================
# CONTENT STRUCTURE
# ============================================

# Primary entry points
primary_entry: https://aeo.dev
getting_started: https://aeo.dev/quickstart
concepts: https://aeo.dev/what-is-aeo

# Content sections
sections:
  - name: Getting Started
    url: https://aeo.dev/quickstart
    description: Introduction and quickstart guide
  - name: AI Search Engines
    url: https://aeo.dev/ai-search/overview
    description: How different AI search platforms work
  - name: Optimization Strategies
    url: https://aeo.dev/optimization/content-structure
    description: How to optimize content for AI
  - name: Tools
    url: https://aeo.dev/tools/chrome-extension
    description: AEO tools and validators

# ============================================
# TOPICS COVERED
# ============================================

topics:
  - Answer Engine Optimization
  - AI Search Engines
  - ChatGPT optimization
  - Perplexity AI
  - Google AI Overviews
  - Structured data
  - Schema.org
  - JSON-LD
  - Content optimization

# ============================================
# TECHNICAL DETAILS
# ============================================

sitemap_url: https://aeo.dev/sitemap.xml
robots_txt: https://aeo.dev/robots.txt
content_format: markdown, html
update_frequency: weekly

Field Reference

Site Information Fields

Field             Required  Description
site_name         Yes       Human-readable name
site_url          Yes       Canonical URL
site_description  Yes       Brief description
site_type         No        Type: blog, docs, ecommerce, etc.
site_language     No        ISO language code

Permission Fields

Field          Values      Description
llm_inference  allow/deny  Use in AI responses
llm_training   allow/deny  Use in training data
rag_usage      allow/deny  Use in RAG systems
web_scraping   allow/deny  Content scraping
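
A compliant crawler might gate each use of your content on these fields. A small sketch, assuming an absent field is treated as allow (the format shown here does not specify a default, so a conservative consumer may prefer deny):

type PermissionKey =
  | "llm_inference"
  | "llm_training"
  | "rag_usage"
  | "web_scraping";

// Returns true unless the field is explicitly set to "deny".
function isAllowed(fields: Map<string, string>, key: PermissionKey): boolean {
  return (fields.get(key) ?? "allow") !== "deny";
}

// Example: skip a document for training if llm_training is denied.
// if (!isAllowed(fields, "llm_training")) continue;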

Entry Point Fields

Field          Description
primary_entry  Main starting point
documentation  Docs root URL
api_reference  API docs URL
sitemap_url    Sitemap location
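
A crawler could use these fields to seed its fetch queue before falling back to the sitemap; a sketch (the priority order is my assumption, not part of the format):

// Collect entry-point URLs from parsed llms.txt fields,
// in an assumed priority order.
function seedUrls(fields: Map<string, string>): string[] {
  const keys = ["primary_entry", "documentation", "api_reference"];
  return keys
    .map((key) => fields.get(key))
    .filter((url): url is string => url !== undefined);
}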

Implementation

1. Create the File

Create llms.txt at your site root (e.g., https://yoursite.com/llms.txt).

2. Add to Your Build

For Next.js, add the file at public/llms.txt; anything in public/ is served from the site root:

# public/llms.txt
site_name: My Site
...
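
Alternatively, if you want the file generated at request time (for example, to keep a last_updated field accurate), the App Router lets you serve it from a route handler. A sketch at app/llms.txt/route.ts; the field values are placeholders:

// app/llms.txt/route.ts: serve llms.txt dynamically (sketch).
export async function GET(): Promise<Response> {
  const body = [
    "# llms.txt",
    "site_name: My Site",
    "site_url: https://yoursite.com",
    `last_updated: ${new Date().toISOString().slice(0, 10)}`,
  ].join("\n");
  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}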

3. Verify Accessibility

Ensure the file is accessible:

curl https://yoursite.com/llms.txt
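
Beyond confirming that the URL resolves, you can check that the required fields from the reference above are present. A standalone TypeScript sketch (runnable with Node 18+, which ships a global fetch):

// Fetch llms.txt and verify the required fields exist.
const REQUIRED = ["site_name", "site_url", "site_description"];

async function checkLlmsTxt(siteUrl: string): Promise<void> {
  const res = await fetch(new URL("/llms.txt", siteUrl));
  if (!res.ok) throw new Error(`HTTP ${res.status} fetching llms.txt`);
  const text = await res.text();
  const missing = REQUIRED.filter(
    (key) => !new RegExp(`^${key}\\s*:`, "m").test(text),
  );
  if (missing.length > 0) {
    throw new Error(`llms.txt missing fields: ${missing.join(", ")}`);
  }
  console.log("llms.txt looks valid");
}

checkLlmsTxt("https://yoursite.com").catch((err) => console.error(err.message));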

4. Link from robots.txt

Optionally, reference it from robots.txt. Since robots.txt has no standard directive for llms.txt, use a comment:

# robots.txt
User-agent: *
Allow: /

# AI-specific guidance
# LLMs-txt: https://yoursite.com/llms.txt
Sitemap: https://yoursite.com/sitemap.xml

Best Practices

  1. Keep it current — Update when content changes
  2. Be specific — Clear descriptions help AI understand
  3. Include entry points — Guide AI to important content
  4. State permissions clearly — Avoid ambiguity
  5. List topics — Help AI understand coverage
