📄 PageIndex API Documentation

Welcome to the PageIndex API documentation. PageIndex converts PDF documents into structured hierarchical tree representations optimized for agentic RAG. Learn more here.

See an example of the output structure generated by PageIndex from the example PDF document (2023 Annual Report of the Board of Governors of the Federal Reserve System).

If you have any questions, please join our Discord community.


🔑 API Key

Get your API key here.


🚀 Quickstart

The PageIndex API consists of two main endpoints:

  • Submit Endpoint (/pageindex): Submit a PDF document for processing.
  • Status Endpoint (/pageindex/status): Check processing status and retrieve results when completed.

Python Usage Example

import requests

API_KEY = 'YOUR_API_KEY_HERE'

# Submit PDF for processing
with open('./2023-annual-report.pdf', 'rb') as f:
    submit_response = requests.post(
        "https://api.vectify.ai/pageindex",
        headers={'api_key': API_KEY},
        files={'file': f}
    )
submit_response.raise_for_status()  # stop early on an HTTP error
task_id = submit_response.json()["task_id"]

# Check processing status (the task may not be finished yet;
# repeat this request until the status is "completed")
status_response = requests.post(
    "https://api.vectify.ai/pageindex/status",
    headers={'api_key': API_KEY},
    json={"task_id": task_id}
)
status_response.raise_for_status()
status_data = status_response.json()

# Retrieve results when processing is complete
if status_data["status"] == "completed":
    print("Tree Structure Result:", status_data["result"])
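
The example above checks the status only once, so a freshly submitted document will often not be finished yet. A minimal polling sketch follows. The helper names (fetch_status, wait_for_result) and the interval and timeout defaults are illustrative assumptions, not part of the PageIndex API; the only status value taken from this documentation is "completed", and any other value is simply treated as "not done yet".

```python
import time

STATUS_URL = "https://api.vectify.ai/pageindex/status"  # status endpoint from the example above


def fetch_status(task_id, api_key):
    """Query the status endpoint once and return the decoded JSON body."""
    import requests  # imported here so the polling helper below has no hard dependency

    response = requests.post(
        STATUS_URL,
        headers={"api_key": api_key},
        json={"task_id": task_id},
    )
    response.raise_for_status()
    return response.json()


def wait_for_result(get_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call get_status() until it reports "completed" or the timeout elapses.

    interval and timeout are assumed defaults, not documented service limits.
    """
    deadline = time.monotonic() + timeout
    while True:
        status_data = get_status()
        if status_data.get("status") == "completed":
            return status_data["result"]
        if time.monotonic() >= deadline:
            raise TimeoutError("PageIndex task did not complete in time")
        sleep(interval)
```

With a task_id from the submit step, this could be used as tree = wait_for_result(lambda: fetch_status(task_id, "YOUR_API_KEY_HERE")). Passing the status lookup in as a callable keeps the loop independent of the HTTP client, which also makes it easy to exercise without network access.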

📝 Notes

  • Currently supports PDF files only.
  • Future updates will include additional document formats, improved parsing, and enhanced database integration.

For support or feedback, please join our Discord community.