Document Processing Agent Example

This example demonstrates how to set up and use a document processing agent with the PilottAI framework.

Features

- Text extraction from various document formats
- Content analysis capabilities
- Document summarization
- Configurable processing tools

Setup

Install required dependencies:

pip install pilott

Set your OpenAI API key in the environment:

export OPENAI_API_KEY="your-api-key"
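It can help to fail fast when the key is missing, rather than discovering it on the first LLM call. The sketch below uses only the standard library. `require_api_key` is a hypothetical helper, not part of PilottAI.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error if it is unset.

    Hypothetical helper; PilottAI itself may read the variable differently.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before running the agent.")
    return key
```

Call it once at startup, before creating the Serve instance.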

Tools Included

Text Extractor

Extracts text content from documents:

text_extractor = Tool(
    name="text_extractor",
    parameters={
        "file_path": "str",
        "format": "str"
    }
)
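The Tool declaration above only names the parameters (file_path, format). It does not show the code behind them. Below is a hypothetical stdlib-only backing function with the same parameters, assuming the format strings "txt" and "html". PDF and DOCX need third-party parsers, so this sketch rejects them instead of guessing at a library.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextCollector(HTMLParser):
    """Collects visible text nodes from HTML, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(file_path: str, format: str) -> str:
    """Hypothetical backing function for the text_extractor tool (file_path, format)."""
    fmt = format.lower()
    # Check the format first so binary files are never decoded as UTF-8.
    if fmt not in ("txt", "html"):
        raise NotImplementedError(f"format {format!r} is not handled in this sketch")
    raw = Path(file_path).read_text(encoding="utf-8")
    if fmt == "html":
        collector = _TextCollector()
        collector.feed(raw)
        collector.close()
        return "\n".join(collector.parts)
    return raw
```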

Content Analyzer

Analyzes document content:

content_analyzer = Tool(
    name="content_analyzer",
    parameters={
        "text": "str",
        "analysis_type": "str"
    }
)
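The guide does not list valid analysis_type values. This sketch assumes two illustrative ones, "statistics" and "keywords", to show how a backing function could dispatch on that parameter. `analyze_content` and both type names are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real analyzer would use a fuller one.
_STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}


def analyze_content(text: str, analysis_type: str) -> dict:
    """Hypothetical backing function for the content_analyzer tool (text, analysis_type)."""
    words = re.findall(r"[a-z']+", text.lower())
    if analysis_type == "statistics":
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        return {"words": len(words), "sentences": len(sentences)}
    if analysis_type == "keywords":
        # Most frequent non-stopwords; ties keep first-seen order.
        counts = Counter(w for w in words if w not in _STOPWORDS)
        return {"keywords": [w for w, _ in counts.most_common(5)]}
    raise ValueError(f"unknown analysis_type: {analysis_type!r}")
```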

Summarizer

Generates document summaries:

summarizer = Tool(
    name="summarizer",
    parameters={
        "text": "str",
        "max_length": "int"
    }
)
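The summarizer's max_length parameter is not defined further in this guide. The sketch below assumes it counts words. It is a naive extractive summary: sentences are scored by average word frequency and kept greedily within the budget. This is a stand-in technique for illustration, not the summarizer PilottAI uses.

```python
import re
from collections import Counter


def summarize(text: str, max_length: int) -> str:
    """Naive extractive summary keeping top-scoring sentences within max_length words.

    Assumes max_length counts words; the real tool may interpret it differently.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    # Rank sentences by score, then pick greedily while staying inside the word budget.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        if used + n <= max_length:
            chosen.append(i)
            used += n
    # Restore original order so the summary reads naturally.
    return " ".join(sentences[i] for i in sorted(chosen))
```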

Quick Start

import asyncio

from pilottai import Serve


async def main():
    # Create the orchestrator
    pilott = Serve(name="DocumentProcessor")

    # Add a document processing agent that can use the three tools above
    doc_processor = await pilott.add_agent(
        role="document_processor",
        goal="Process documents efficiently",
        tools=["text_extractor", "content_analyzer", "summarizer"]
    )

    # Describe the document to process
    task = {
        "type": "document_analysis",
        "document": {
            "path": "document.pdf",
            "type": "pdf"
        }
    }

    # execute() takes a list of tasks
    result = await pilott.execute([task])
    print(result)


if __name__ == "__main__":
    asyncio.run(main())

Supported Document Types

- PDF files
- Text documents
- Word documents (docx)
- HTML files
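To validate formats before submitting a task, you can map file extensions to the document "type" values. The Quick Start shows only "pdf", so the other type strings here ("txt", "docx", "html") are assumptions. `detect_format` and `SUPPORTED_FORMATS` are hypothetical names.

```python
from pathlib import Path

# Extension -> document type. Only "pdf" appears in this guide; the rest are assumed.
SUPPORTED_FORMATS = {
    ".pdf": "pdf",
    ".txt": "txt",
    ".docx": "docx",
    ".html": "html",
    ".htm": "html",
}


def detect_format(path: str) -> str:
    """Return the document type for a path, or raise ValueError if unsupported."""
    suffix = Path(path).suffix.lower()
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unsupported document type: {suffix or '(no extension)'}") from None
```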

Common Use Cases

Document Analysis

task = {
    "type": "document_analysis",
    "description": "Analyze quarterly report"
}

Text Extraction

task = {
    "type": "text_extraction",
    "document": {"path": "file.pdf"}
}

Content Summarization

task = {
    "type": "summarization",
    "document": {"path": "article.txt"}
}
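The three task dicts above share one shape. A small builder can catch typos in the task type before anything is sent to the agent. `make_task` is a hypothetical helper, and the allowed types are just the three shown in this section.

```python
from typing import Optional

# The task types used in the examples above.
TASK_TYPES = ("document_analysis", "text_extraction", "summarization")


def make_task(task_type: str, path: Optional[str] = None,
              description: Optional[str] = None) -> dict:
    """Build a task dict in the shape used by this guide (hypothetical helper)."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task type: {task_type!r}")
    if path is None and description is None:
        raise ValueError("provide a document path, a description, or both")
    task = {"type": task_type}
    if description is not None:
        task["description"] = description
    if path is not None:
        task["document"] = {"path": path}
    return task
```

Several tasks built this way can be passed together as a list to pilott.execute, as in the Quick Start.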

Configuration Options

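Customize agent behavior with AgentConfig:

from pilottai.core import AgentConfig

config = AgentConfig(
    role="document_processor",
    goal="Process documents efficiently",
    max_concurrent_tasks=5,  # upper bound on tasks handled at the same time
    task_timeout=300         # per-task time limit
)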
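Conceptually, max_concurrent_tasks and task_timeout combine a concurrency cap with a per-task deadline. The asyncio sketch below shows that pattern in isolation. It is an illustration under that reading of the two settings, not PilottAI's internal scheduler, and `run_bounded` is a hypothetical name.

```python
import asyncio


async def run_bounded(jobs, max_concurrent=5, timeout=300.0):
    """Run coroutine factories with at most max_concurrent in flight, each under a timeout.

    Results come back in input order; a job that times out yields an asyncio.TimeoutError.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job):
        async with semaphore:
            # The deadline starts once the job actually gets a slot.
            return await asyncio.wait_for(job(), timeout=timeout)

    return await asyncio.gather(*(guarded(j) for j in jobs), return_exceptions=True)
```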

Best Practices

Document Handling
- Validate document formats before processing
- Handle large documents in chunks
- Catch and report extraction errors for each document
Performance
- Set task_timeout to fit the largest documents you expect
- Process independent documents concurrently, within the max_concurrent_tasks limit
- Monitor memory usage for large documents
Error Handling
- Validate input documents
- Handle unsupported formats gracefully
- Implement retry logic for failed operations
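Two of the practices above, chunking large documents and retrying failed operations, can be sketched with the standard library alone. The chunk sizes and the exponential backoff with jitter are illustrative choices, and both function names are hypothetical.

```python
import asyncio
import random


def chunk_text(text: str, max_chars: int = 4000, overlap: int = 200) -> list:
    """Split text into chunks of at most max_chars characters, overlapping by `overlap`."""
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by `overlap` so context spanning a boundary appears in both chunks.
        start = end - overlap
    return chunks


async def with_retries(operation, attempts: int = 3, base_delay: float = 0.5):
    """Await operation() up to `attempts` times, backing off exponentially with jitter.

    Catches every Exception for simplicity; in practice narrow this to transient errors.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except Exception:
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
```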

Troubleshooting

Common issues and solutions:

File Access Errors
- Ensure proper file permissions
- Verify file paths are correct
- Check file format compatibility
Processing Timeouts
- Adjust task_timeout in configuration
- Process large documents in smaller chunks
- Monitor system resources
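The file access checks listed above can be run before a task is submitted, so problems surface with a clear message. The sketch below is a hypothetical pre-flight helper. It assumes the allowed extensions follow the supported document types listed earlier.

```python
import os
from pathlib import Path


def check_document(path: str, allowed_suffixes=(".pdf", ".txt", ".docx", ".html")) -> list:
    """Return a list of problems found with the document path (empty list means OK)."""
    problems = []
    p = Path(path)
    if not p.exists():
        problems.append(f"{path}: file not found (check the path)")
        return problems
    if not p.is_file():
        problems.append(f"{path}: not a regular file")
    if not os.access(p, os.R_OK):
        problems.append(f"{path}: not readable (check permissions)")
    if p.suffix.lower() not in allowed_suffixes:
        problems.append(f"{path}: unsupported format {p.suffix or '(none)'}")
    return problems
```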

Example Output

# Example result
{
    'success': True,
    'output': {
        'summary': 'Document summary...',
        'analysis': 'Content analysis...',
        'metadata': {
            'pages': 5,
            'format': 'pdf',
            'processing_time': '2.3s'
        }
    }
}
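Downstream code usually checks success before reading the output. This sketch assumes a single result is a plain dict shaped like the example above. Since execute() receives a list of tasks, you may need to index into its return value first. `describe_result` is a hypothetical helper.

```python
def describe_result(result: dict) -> str:
    """Summarize a result dict shaped like the example output in one line."""
    if not result.get("success"):
        return "Processing failed"
    output = result.get("output", {})
    meta = output.get("metadata", {})
    # processing_time is a string such as '2.3s' in the example; strip the unit.
    seconds = float(str(meta.get("processing_time", "0s")).rstrip("s") or 0)
    return (f"{meta.get('pages', '?')} page(s) of {meta.get('format', 'unknown')} "
            f"in {seconds:.1f}s: {output.get('summary', '')}")
```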

Code

Ready-to-use code: document_processor.py
