Project Gutenberg Mirror
This project aims to create a simple, searchable, updatable mirror of Project Gutenberg's ebook collection. It downloads the HTML source for all books in the collection, converts them to ePub and Kindle formats, and provides a simple web interface to search for books and download them.
update-mirror.sh
This script downloads books (as HTML, with images) into the html-src directory. As of 13 September 2024, the collection includes 61,878 titles and requires 133.2 GB of disk space.
It also downloads pg_catalog.csv, which provides the input to the next stage:
db_update.py
This script processes pg_catalog.csv, loading its contents into a MariaDB database so that books can be searched by author, subject, title, and so on. Database configuration is passed to it through environment variables.
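To give a rough idea of what this stage involves, here is a minimal sketch (not the actual db_update.py) of reading catalog rows and collecting database settings from the environment. The environment variable names, their defaults, and the helper functions are hypothetical. The column names and the semicolon-separated Authors and Subjects fields are assumptions about the layout of pg_catalog.csv, and the sample row is made up for illustration. The sketch stops short of connecting to MariaDB.

```python
import csv
import io
import os


def db_config_from_env(env=os.environ):
    """Collect database settings from environment variables.

    The variable names and defaults are hypothetical; the real
    db_update.py may use different ones.
    """
    return {
        "host": env.get("GUTENBERG_DB_HOST", "localhost"),
        "port": int(env.get("GUTENBERG_DB_PORT", "3306")),
        "user": env.get("GUTENBERG_DB_USER", "gutenberg"),
        "password": env.get("GUTENBERG_DB_PASSWORD", ""),
        "database": env.get("GUTENBERG_DB_NAME", "gutenberg_mirror"),
    }


def split_multi(value):
    """Split a semicolon-separated catalog field into a clean list."""
    return [part.strip() for part in (value or "").split(";") if part.strip()]


def parse_catalog(handle):
    """Yield one dict per catalog row, with authors and subjects as lists."""
    for row in csv.DictReader(handle):
        yield {
            "id": int(row["Text#"]),  # assumed name of the ID column
            "title": row["Title"],
            "language": row["Language"],
            "authors": split_multi(row["Authors"]),
            "subjects": split_multi(row["Subjects"]),
        }


# A made-up row in the assumed catalog layout, used instead of the real file.
SAMPLE = """Text#,Type,Issued,Title,Language,Authors,Subjects,LoCC,Bookshelves
1,Text,2000-01-01,An Example Title,en,"Doe, Jane; Roe, Richard",Example fiction; Sample stories,PZ,Example Shelf
"""

books = list(parse_catalog(io.StringIO(SAMPLE)))
config = db_config_from_env()
```

Splitting authors and subjects into lists up front makes it straightforward to store them in their own tables, which is one way to support searching on those fields.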
gutenberg_mirror.sql
This is the database schema, which will need to be loaded into MariaDB before running db_update.py.
html_to_epub.sh
This script converts books from HTML to ePub and Kindle formats. It relies on a containerized version of Project Gutenberg's Ebookmaker tool. The server I'm using for this mirror is essentially a Docker host running several containers. At some point, everything here will most likely be bundled into a single container for ease of installation.