PayrollPilot - LLM Parsing + Accountant's World Automation

PayrollPilot is a production-grade automation system that uses LLMs (Google Gemini) and Playwright-based bots to parse messy payroll reports and upload them to Accountant's World. It was used in a live business, processing over 1,000 payrolls, saving $20K+ in labor costs, and generating $2K+ in new revenue.


🚀 Key Features

  • 🔍 LLM-powered Parsing: Converts messy payroll reports (PDF, RTF, Excel) into structured JSON.
  • 📄 CSV Auto-Fill: Populates Accountant's World-compatible CSV templates per client.
  • 🤖 Portal Automation: Automates CSV uploads and tax payments to Accountant's World (AW).
  • 🧠 Smart Field Detection: Dynamically maps earnings, deductions, and tax categories.
  • 📂 Batch Payroll Support: Processes multiple client folders in one click.
  • 📊 Streamlit Interface: Simple UI to review extracted data and approve uploads.
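As a rough illustration of the CSV auto-fill step, the sketch below writes parsed payroll records into a template-shaped CSV using only the standard library. The column names (`employee_name`, `regular_hours`, `gross_pay`), the record layout, and the `fill_template` helper are hypothetical; the real Accountant's World template and the project's own `populate_csv_template.py` may look quite different.

```python
import csv
import io

# Hypothetical template header; the real AW template columns are not shown in this post.
TEMPLATE_COLUMNS = ["employee_name", "regular_hours", "gross_pay"]


def fill_template(records, columns=TEMPLATE_COLUMNS):
    """Return CSV text with a header row and one row per parsed record.

    Keys missing from a record become empty cells; keys that are not
    template columns are ignored, so stray LLM output cannot add columns.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        restval="",              # value used for missing keys
        extrasaction="ignore",   # silently drop unknown keys
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Example records, shaped the way an LLM parsing step might return them.
parsed = [
    {"employee_name": "Jane Doe", "regular_hours": 40, "gross_pay": 1200.0, "note": "ignored"},
    {"employee_name": "John Roe", "gross_pay": 950.5},
]
csv_text = fill_template(parsed)
print(csv_text)
```

Fixing the column list up front means the output always matches the template shape, whatever extra or missing fields the parsing step produces.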

📌 Use Cases

  • Bookkeeping firms handling 50+ client payrolls
  • Automation-first accountants aiming to cut labor costs
  • LLM startups showcasing GenAI for operations
  • Any business tired of manual AW uploads

🧠 Tech Stack

  • Python 3.10+
  • Streamlit – UI
  • Playwright – Headless browser automation
  • Google Gemini API – Large Language Model parsing
  • pandas, PyMuPDF, python-docx, striprtf – Data + doc processing

🧰 Project Structure

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Payroll-LLM-Extractor/
β”‚
β”œβ”€β”€ streamlit_app.py # Main Streamlit app
β”œβ”€β”€ upload_runner.py # Automates payroll CSV uploads to AW
β”œβ”€β”€ upload_tax.py # Automates tax form filling on AW
β”‚
β”œβ”€β”€ src/
β”‚ β”œβ”€β”€ send_chunk_llm.py # Gemini API integration
β”‚ β”œβ”€β”€ excel_raw_text_chunk.py # Raw text extraction from Excel
β”‚ └── populate_csv_template.py # Populates CSV template
β”‚
β”œβ”€β”€ utils/
β”‚ β”œβ”€β”€ gemni_parser.py # Gemini prompt + parsing logic
β”‚ β”œβ”€β”€ extract_rtf_pdf.py # PDF/RTF chunker
β”‚ β”œβ”€β”€ label_analysis.py # Detects suspicious earnings
β”‚ β”œβ”€β”€ populate_csv.py # Final CSV output generator
β”‚
β”œβ”€β”€ agent_project/
β”‚ β”œβ”€β”€ agents.py # Playwright login + upload logic
β”‚ └── opt.py # Browser headless setup
β”‚
β”œβ”€β”€ misc/ # Experimental scripts + prompts
β”œβ”€β”€ requirements.txt
└── README.md

🧪 How It Works

  1. Upload a client's payroll file (PDF, RTF, Excel).
  2. Extract text chunks → src/excel_raw_text_chunk.py
  3. Send chunks to Gemini → src/send_chunk_llm.py
  4. Post-process earnings/deductions → utils/label_analysis.py
  5. Auto-fill the CSV from the template → src/populate_csv_template.py
  6. Upload the CSV and fill taxes on AW → upload_runner.py, upload_tax.py
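The chunk, send, and parse steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `chunk_text`, `parse_llm_json`, `extract_payroll`, and the `{"employees": [...]}` reply schema are hypothetical, and the Gemini call is replaced by a stub (`fake_llm`) so the example runs offline.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence that models often wrap JSON in
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def chunk_text(text, max_chars=2000):
    """Split text on line boundaries into chunks of roughly max_chars.

    A single line longer than max_chars is kept whole rather than cut.
    """
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        if current and len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


def parse_llm_json(reply):
    """Extract a JSON value from a model reply, tolerating a code fence."""
    match = FENCED_JSON.search(reply)
    payload = match.group(1) if match else reply
    return json.loads(payload)


def extract_payroll(text, ask_llm, max_chars=2000):
    """Run every chunk through ask_llm and merge the employee lists."""
    employees = []
    for chunk in chunk_text(text, max_chars):
        data = parse_llm_json(ask_llm(chunk))
        employees.extend(data.get("employees", []))
    return employees


def fake_llm(chunk):
    # Stand-in for the Gemini call: one employee per "name,gross" line.
    rows = [line.split(",") for line in chunk.splitlines() if line.strip()]
    body = json.dumps(
        {"employees": [{"name": n, "gross_pay": float(g)} for n, g in rows]}
    )
    return FENCE + "json\n" + body + "\n" + FENCE


report = "Jane Doe,1200.00\nJohn Roe,950.50\n"
result = extract_payroll(report, fake_llm, max_chars=20)
print(result)
```

Passing the model call in as a function (`ask_llm`) keeps the chunking and JSON handling testable without network access or an API key.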

▢️ Streamlit Demo

pip install -r requirements.txt
streamlit run streamlit_app.py

Make sure you have a .env file with your Gemini API key:

GEMINI_API_KEY=your_gemini_key_here
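The post does not say how the app reads this file, so the following is only one possible approach: a small standard-library loader for `KEY=VALUE` lines plus a fail-fast check. The helper names `load_env_file` and `require_gemini_key` are hypothetical, and the demo writes a throwaway `.env` to a temporary directory instead of touching a real one.

```python
import tempfile
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_gemini_key(env):
    """Return the key, or raise if it is missing or still the placeholder."""
    key = env.get("GEMINI_API_KEY", "")
    if not key or key == "your_gemini_key_here":
        raise RuntimeError("GEMINI_API_KEY is not set; add it to your .env file")
    return key


# Demo against a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("# local secrets\nGEMINI_API_KEY=abc123\n", encoding="utf-8")
    loaded = load_env_file(env_path)

api_key = require_gemini_key(loaded)
print("key loaded:", bool(api_key))
```

Checking for the placeholder value at startup surfaces a forgotten key before any payroll file is sent for parsing.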

AW credentials are managed securely within the agent_project/ automation scripts or set manually before automation.


💡 Created for a live payroll system in NY: 1,000+ payrolls processed, $20K+ labor saved, $2K+ new revenue.


🧭 Future Enhancements

βœ… SharePoint sync for final reports

βœ… AI-driven field mapping & validation

βœ… Full audit trail + override logging

βœ… Email notifications for payroll events

⚖️ License

MIT: free to use and modify with credit.

This post is licensed under CC BY 4.0 by the author.