Project Gutenberg Mirror

This project aims to create a searchable, updatable mirror of Project Gutenberg's ebook collection. It downloads the HTML source for every book in the collection, converts each book to ePub and Kindle formats, and provides a simple web interface for searching the collection and downloading books.

update-mirror.sh

This script downloads books (in HTML, with images) to the html-src directory. As of 13 September 2024, the collection includes 61,878 titles and needs 133.2 GB of disk space.

It also downloads pg_catalog.csv, which provides the input to the next stage:

db_update.py

This script processes pg_catalog.csv, sorting its contents into a MariaDB database so that books can be searched by author, subject, title, etc. Database configuration is passed to the script through environment variables.
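As a rough illustration of this stage, the sketch below reads connection settings from the environment and turns catalog rows into dictionaries that could then be inserted into database tables. It is not the code in db_update.py: the GUTENBERG_DB_* variable names and their defaults are hypothetical, the column names (Text#, Title, Language, Authors, Subjects) are assumed to match the header row of pg_catalog.csv, multi-valued fields are assumed to be separated by semicolons, and the sample row is only illustrative. The database insert itself is left out.

```python
import csv
import io
import os


def db_config_from_env(env=os.environ):
    """Collect connection settings from environment variables.

    The GUTENBERG_DB_* names and defaults are hypothetical placeholders,
    not necessarily the ones db_update.py reads.
    """
    return {
        "host": env.get("GUTENBERG_DB_HOST", "localhost"),
        "port": int(env.get("GUTENBERG_DB_PORT", "3306")),
        "user": env.get("GUTENBERG_DB_USER", "gutenberg"),
        "password": env.get("GUTENBERG_DB_PASSWORD", ""),
        "database": env.get("GUTENBERG_DB_NAME", "gutenberg_mirror"),
    }


def split_multi(field):
    """Split a semicolon-separated catalog field into a list of clean values."""
    return [part.strip() for part in field.split(";") if part.strip()]


def parse_catalog(lines):
    """Yield one dict per catalog row.

    `lines` is any iterable of text lines, such as an open file.
    Column names are assumed to match the CSV header row.
    """
    for row in csv.DictReader(lines):
        yield {
            "id": int(row["Text#"]),
            "title": row["Title"].strip(),
            "language": row["Language"].strip(),
            "authors": split_multi(row["Authors"]),
            "subjects": split_multi(row["Subjects"]),
        }


# Illustrative sample in the assumed layout of pg_catalog.csv.
SAMPLE = (
    "Text#,Type,Issued,Title,Language,Authors,Subjects,LoCC,Bookshelves\n"
    '1342,Text,1998-06-01,Pride and Prejudice,en,"Austen, Jane, 1775-1817",'
    '"Courtship -- Fiction; Sisters -- Fiction",PR,\n'
)

if __name__ == "__main__":
    config = db_config_from_env()
    print("Would connect to", config["host"], "database", config["database"])
    for book in parse_catalog(io.StringIO(SAMPLE)):
        print(book["id"], book["title"], book["authors"], book["subjects"])
```

Splitting authors and subjects into lists is what would let them go into their own tables, so that a search on one subject or one author does not need to scan a combined text field.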

gutenberg_mirror.sql

This is the database schema. It must be loaded into MariaDB before db_update.py is run.

html_to_epub.sh

This script converts books from HTML to ePub and Kindle formats. It relies on a containerized version of Project Gutenberg's Ebookmaker tool. The server I'm using for this mirror is basically a Docker host with a bunch of containers on it. At some point, everything here will most likely get bundled up into one container for ease of installation.
