SUMMARY

This is a collection of tools that helps me digitize books. In particular, it helps assemble a bunch of random page scans into a book with the correct page order, mainly by using OCR and text (number) recognition. I use it to prepare my book releases on torrents.

SYSTEM REQUIREMENTS

In theory, it should work on any system that supports Python 3.9+ and has the required dependencies, but the code might need some minor modifications. Tested only on FreeBSD 13.

DEPENDENCIES

System utilities:
- tesseract
- pdftoppm

Python packages:
- pytesseract
- Pillow

AUTHORS

rootless (c) 2023

LICENSE

BSD-2-Clause
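
EXAMPLE

The summary describes ordering page scans by the numbers that OCR finds on them. The sketch below illustrates that general idea only; it is not taken from the tools in this collection. The names guess_page_number, order_scans and ocr_text are hypothetical, and the header/footer heuristic is an assumption about where page numbers tend to sit.

```python
import re
from typing import Optional


def ocr_text(path: str) -> str:
    """Run OCR on one scan. Needs the tesseract binary, pytesseract and Pillow."""
    # Imported lazily so the ordering logic below works without OCR installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(path))


def guess_page_number(text: str) -> Optional[int]:
    """Return a page number found on the last or first non-empty line, if any."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return None
    # Assumption: page numbers sit in the footer or header, so check the
    # last line first, then the first one.
    for candidate in (lines[-1], lines[0]):
        # Accept a 1-4 digit number with at most 3 non-digit characters
        # on either side, e.g. "12" or "- 12 -".
        match = re.fullmatch(r"\D{0,3}(\d{1,4})\D{0,3}", candidate)
        if match:
            return int(match.group(1))
    return None


def order_scans(ocr_by_file: dict[str, str]) -> list[str]:
    """Sort file names by recognized page number; unnumbered scans go last."""
    numbered, unnumbered = [], []
    for name, text in ocr_by_file.items():
        number = guess_page_number(text)
        if number is None:
            unnumbered.append(name)
        else:
            numbered.append((number, name))
    return [name for _, name in sorted(numbered)] + sorted(unnumbered)
```

With real scans, the input dictionary could be built as {path: ocr_text(path) for path in paths}. Scans without a recognizable number are kept at the end in file name order so that they can be placed by hand.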