At my current role, I was tasked with the secure document archival and destruction process. This involved scanning documents before sending them to a shredding company.
- Scanned documents were sent to my workspace in the H: drive via a printer.
- I noticed that this process was time-consuming due to frequent printer jams.
- To minimize the impact of jams, I split documents into smaller stacks, reducing the number of pages needing rescanning if a jam occurred.
- However, the task of combining the multiple PDF files into one was also time-consuming. To address this, I developed a Python script that automated the combining process.
The script reduced this process of combining PDF files, moving to the archived file server, and clearing the H: drive from 1 minute to 9 seconds, resulting in a 85% improvement in efficiency.
This Python script automates the process of managing scanned PDF files. It:
- Combines all PDF files in a specified directory into a single PDF file.
- Saves the combined PDF file to a designated network share (another drive).
- Deletes the original PDF files from the source directory after processing.
- Automatically scans a directory for all PDF files.
- Combines PDFs into a single document with a user-specified filename.
- Moves the final combined file to a secure network archive.
- Cleans up the source directory by deleting the individual PDF files.
Run the script from the command line, passing the document number as an argument.
python scanscript.py 8993This will:
- Combine all PDFs in the source directory.
- Save the resulting file as
8993.pdfin the network archive. - Delete the original PDF files from the source directory.
- Python Version: Python 3.7 or higher
- Dependencies: Install required Python packages using pip:
pip install PyPDF2