Does Perplexity respect robots.txt?

Partially. Perplexity operates two crawlers: PerplexityBot (for AI search indexing, respects robots.txt) and Perplexity-User (triggered by user queries, does not follow robots.txt). If you block PerplexityBot in robots.txt, your content won't be indexed proactively, but it may still be fetched when a user's query triggers a direct retrieval. Cloudflare has documented instances of undeclared crawlers from Perplexity bypassing robots.txt restrictions.

How many sources does Perplexity cite per answer?

Perplexity averages roughly 22 citations per response — nearly 3x ChatGPT's rate (~8 citations). Every Perplexity answer includes numbered inline citations, making it the most citation-transparent AI platform. This higher citation volume means more opportunities for your content to appear as a source.

Does Perplexity Pro give different citations than the free version?

Perplexity Pro uses more advanced models and can process more complex queries, but the citation mechanics are similar. Pro may retrieve more sources and provide deeper answers, but the underlying signals that determine citation selection — recency, authority, content structure — remain the same across both tiers.

How quickly can my content appear in Perplexity results?

Perplexity searches the web for every query in real time, so new content can appear within days of publication — sometimes hours for high-authority domains. This is significantly faster than ChatGPT, where visibility depends more on training data cycles. Ensuring PerplexityBot can access your pages (check with Crawl Radar) is the prerequisite for fast indexing.

How to Get Cited by Perplexity

Perplexity is the most citation-transparent AI search platform. Every answer includes numbered inline citations — [1], [2], [3] — with each number linking directly to the source URL. It averages roughly 22 citations per response, nearly 3x ChatGPT's rate. And unlike ChatGPT or Gemini, Perplexity searches the web for every query in real time, meaning fresh content can appear in results within days of publication. For brands looking to build AI visibility, Perplexity is the most accessible entry point.

But getting cited by Perplexity requires passing two gates: retrieval selection (your page is chosen as a source) and answer absorption (your content is actually used in the generated answer). Here's how to pass both.

How PerplexityBot Works

Perplexity operates two crawlers:

PerplexityBot — The primary indexing crawler. User agent: PerplexityBot/1.0. Respects robots.txt. Crawls and indexes your site proactively. IP addresses published at perplexity.com/perplexitybot.json.
Perplexity-User — Triggered when a user's query requires real-time retrieval. Does not follow robots.txt. This crawler fetches content on-demand, similar to how a user clicking a link works.

This dual-crawler system means that blocking PerplexityBot in robots.txt reduces proactive indexation but doesn't prevent your pages from being retrieved when a user asks a relevant question. Proactive indexation is still worth having, because it increases the likelihood that your content is already in the retrieval pool before anyone asks.

The prerequisite: If PerplexityBot can't reach your pages, your content won't be indexed. Check your robots.txt configuration and use Crawl Radar to verify access. Cloudflare WAF rules, bot protection, and CDN settings are common blockers.
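As a rough illustration of the robots.txt part of that check, the sketch below uses Python's standard urllib.robotparser to test whether a given robots.txt would let PerplexityBot fetch a URL. The sample robots.txt, the example.com URLs, and the is_allowed helper are assumptions made up for this example. It only evaluates robots.txt rules, so it says nothing about WAF or CDN blocking, and it does not apply to Perplexity-User.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, used only for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # PerplexityBot has no group of its own here, so the blanket "*" rules apply.
    for path in ("/pricing", "/private/report"):
        url = "https://example.com" + path
        print(path, is_allowed(SAMPLE_ROBOTS_TXT, "PerplexityBot", url))
```

To run this against a live site, you would pass in the text of your own robots.txt instead of the sample string.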

What Perplexity Favours

Recency

Perplexity is the most recency-sensitive AI platform. It searches the web for every query, pulling from live sources rather than relying on training data. A page published in 2024 competes poorly against a 2026 version with updated data and examples.

Action: Update your key product, category, and comparison pages at least quarterly. Add timestamps or "last updated" dates — Perplexity's retrieval system uses freshness as a ranking signal.
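One way to keep track of the quarterly target is to flag sitemap entries whose lastmod date has gone stale. The sketch below is a minimal example under stated assumptions: the sample sitemap, the stale_urls helper, and the 90-day threshold (standing in for "quarterly") are all hypothetical, and lastmod is only a proxy for when a page was actually updated. Nothing here implies Perplexity itself reads lastmod.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap, used only for illustration.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/pricing</loc><lastmod>2026-01-10</lastmod></url>
  <url><loc>https://example.com/compare</loc><lastmod>2025-03-02</lastmod></url>
</urlset>
"""


def stale_urls(sitemap_xml: str, today: date, max_age_days: int = 90) -> list[str]:
    """Return URLs whose lastmod is missing or older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    root = ET.fromstring(sitemap_xml)
    stale = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc is None:
            continue
        # lastmod may carry a time component; the first 10 characters are the date.
        if lastmod is None or date.fromisoformat(lastmod.strip()[:10]) < cutoff:
            stale.append(loc.strip())
    return stale


if __name__ == "__main__":
    print(stale_urls(SAMPLE_SITEMAP, today=date(2026, 2, 1)))
```

Passing today explicitly keeps the function deterministic; in practice you would call it with date.today().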

Answer-First Content Structure

Perplexity prioritises content that immediately addresses the user's question. The BLUF (Bottom Line Up Front) principle applies — put the direct answer in the first 1-2 sentences of each section, then elaborate.

Action: Restructure key pages to lead with the answer, not the context. "The best CRM for small businesses in India is [category] because [reason]" beats "When choosing a CRM, there are many factors to consider."

Factual Density

Perplexity values content with specific, verifiable data points. Quantified claims ("rated 4.8/5 by 2,300 users"), comparison tables, and specification lists give Perplexity concrete information to cite. When you're the only source for a specific data point, any answer that uses it has to cite you.

Action: Add specific numbers, dates, and measurements to your content. Replace vague claims ("industry-leading") with verifiable ones ("processed 2.4M transactions in Q1 2026").

Schema Markup

Properly implemented schema markup can improve your chances of being cited. Article, FAQ, Product, and HowTo schema help Perplexity understand content structure without inferring it from raw text.

Action: Implement at minimum Article schema on blog posts, FAQ schema on FAQ pages, and Product schema on product pages. Use GEO Score to check your schema implementation.
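For reference, here is a minimal sketch of what Article and FAQ markup can look like when generated as JSON-LD. The helper names (article_jsonld, faq_jsonld) and all sample values are hypothetical. The property names come from schema.org, where the FAQ type is called FAQPage. A real page would usually carry more properties than shown, so treat this as a starting point rather than a complete implementation.

```python
import json


def article_jsonld(headline: str, date_published: str, date_modified: str,
                   publisher_name: str) -> str:
    """Build a minimal schema.org Article object as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,
        "dateModified": date_modified,  # also exposes a "last updated" date
        "publisher": {"@type": "Organization", "name": publisher_name},
    }
    return json.dumps(data, indent=2)


def faq_jsonld(qa_pairs: list[tuple[str, str]]) -> str:
    """Build a minimal schema.org FAQPage object from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    # Sample values are placeholders, not real data.
    print(article_jsonld("Example headline", "2026-01-10", "2026-02-01", "Example Co"))
    print(faq_jsonld([("Example question?", "Example answer.")]))
```

The resulting string is meant to be embedded in the page inside a script tag of type application/ld+json.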

Third-Party Presence

Nearly half of top Perplexity citations come from community and third-party sources — Reddit, review sites, industry publications. Perplexity doesn't just cite your website; it cites the ecosystem of sources that discuss you.

Action: Build authentic presence on relevant subreddits, review platforms (G2, Trustpilot, Amazon), and industry publications. Don't promote — contribute genuinely. Perplexity's retrieval system picks up organic mentions.

Technical Checklist

Run through this checklist to ensure PerplexityBot can access and parse your content:

robots.txt — Confirm PerplexityBot is not blocked. Check for blanket User-agent: * Disallow rules that might catch it.
Page load speed — Target under 2 seconds. Slow pages are deprioritised in real-time retrieval.
JavaScript rendering — Perplexity's crawler handles JavaScript, but server-rendered or static HTML content is more reliably indexed. If your key content loads via client-side JavaScript, verify it renders for bots.
Cloudflare / WAF — Bot protection rules may block or challenge PerplexityBot. Check your WAF logs for blocked requests from Perplexity's published IP ranges.
llms.txt — Create an llms.txt file at your domain root. While Perplexity hasn't confirmed reading it, the structured brand context it provides aids all AI platforms.
Sitemap.xml — Ensure your sitemap is current and includes all pages you want indexed by AI crawlers.
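The Cloudflare / WAF item above can be partly automated once you have exported your WAF log and the published IP ranges. The sketch below is illustrative only: the CIDR ranges are placeholder documentation blocks (RFC 5737), not Perplexity's real ranges, and the (ip, action) log shape is a hypothetical simplification of whatever your WAF exports. Loading the ranges from the published JSON file is left out because its layout is not described here.

```python
import ipaddress

# Placeholder ranges from the RFC 5737 documentation blocks.
# These are NOT Perplexity's real ranges; substitute the published ones.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def in_published_ranges(ip: str, cidr_ranges: list[str]) -> bool:
    """Return True if ip falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


def blocked_crawler_hits(log_rows: list[tuple[str, str]],
                         cidr_ranges: list[str]) -> list[str]:
    """Return IPs from the ranges that the WAF blocked.

    log_rows is a list of (ip, action) tuples, a hypothetical log shape.
    """
    return [
        ip
        for ip, action in log_rows
        if action == "block" and in_published_ranges(ip, cidr_ranges)
    ]


if __name__ == "__main__":
    sample_log = [
        ("192.0.2.17", "block"),   # inside a placeholder range, blocked
        ("192.0.2.18", "allow"),   # inside a placeholder range, allowed
        ("203.0.113.5", "block"),  # outside the placeholder ranges
    ]
    print(blocked_crawler_hits(sample_log, PLACEHOLDER_RANGES))
```

Any IP this returns is a request from a listed range that your WAF rejected, which is the case worth investigating first.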

Perplexity vs Other Platforms

Understanding where Perplexity differs from other AI platforms helps you prioritise:

Signal	Perplexity	ChatGPT	Gemini
Recency weight	Very High	Low-Medium	High
Citation transparency	Inline numbered	Collapsible panel	Source cards
Citations per response	~22	~8	~8
Brand-owned site citations	Moderate	Low	~52%
Real-time web search	Every query	Some queries	Via Google index
Schema markup weight	Medium	Low	High

Perplexity rewards fresh, factually dense, answer-first content from accessible pages. If you optimise for Perplexity, you're building habits that transfer well to other platforms — but each platform has its own emphasis. See how the platforms differ for a full breakdown.

Key Takeaways

Perplexity is the most citation-transparent platform: ~22 inline citations per response, nearly 3x ChatGPT's rate
It searches the web for every query in real time, making recency the strongest signal — update key pages quarterly
Two crawlers operate: PerplexityBot (respects robots.txt, proactive indexing) and Perplexity-User (real-time retrieval, ignores robots.txt)
Lead with direct answers (BLUF), add specific data points, and implement schema markup to maximise citation likelihood
Nearly half of top Perplexity citations come from third-party sources — Reddit, review sites, and publications matter as much as your own content
Check PerplexityBot access with Crawl Radar before optimising content — technical accessibility is the prerequisite