2024-09-13 15:12:30 -07:00

Project Gutenberg Mirror

This project aims to create a simple, searchable, updatable mirror of Project Gutenberg's ebook collection. It downloads the HTML source of every book in the collection, converts each one to ePub and Kindle formats, and provides a web interface for searching for and downloading books.

update-mirror.sh

This script downloads the books (as HTML, with images) into the html-src directory. As of 13 September 2024, the collection includes 61,878 titles, which need 133.2 GB of disk space.
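
update-mirror.sh is a shell script whose internals aren't described here. As a rough Python sketch of what makes an incremental update possible, the function below lists catalog IDs that have no local copy yet. It assumes a hypothetical layout in which each book lives in html-src/<ebook number>/; the function name is illustrative only.

```python
from pathlib import Path


def missing_ids(catalog_ids, html_src="html-src"):
    """Return catalog IDs, ascending, that have no directory under html_src.

    Hypothetical layout: each book is assumed to live in html_src/<ebook number>/.
    """
    root = Path(html_src)
    if root.is_dir():
        # Names of the book directories already on disk.
        present = {p.name for p in root.iterdir() if p.is_dir()}
    else:
        # Nothing downloaded yet: every catalog ID is missing.
        present = set()
    return sorted(i for i in set(catalog_ids) if str(i) not in present)
```

Only the IDs returned here would need to be fetched on a later run, which is what keeps repeat updates far cheaper than the initial 133.2 GB download.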

The script also downloads pg_catalog.csv, Project Gutenberg's catalog file, which provides the input to the next stage:

db_update.py

This script processes pg_catalog.csv and loads its contents into a MariaDB database to support searching by author, subject, title, and so on. Database connection settings are passed to it through environment variables.
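
A minimal Python sketch of the parsing side, assuming pg_catalog.csv has a header row that includes Text#, Title, Authors and Subjects columns, and that multi-valued fields such as Authors and Subjects are separated by semicolons. The environment variable names (DB_HOST, DB_PORT, DB_USER, DB_PASSWORD, DB_NAME) and the helper functions are hypothetical, not db_update.py's actual interface.

```python
import csv
import os


def db_config_from_env(env=os.environ):
    """Collect connection settings from environment variables.

    Hypothetical variable names; host and port fall back to MariaDB's usual
    defaults, while user, password and database must be set (KeyError otherwise).
    """
    return {
        "host": env.get("DB_HOST", "localhost"),
        "port": int(env.get("DB_PORT", "3306")),
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
        "database": env["DB_NAME"],
    }


def split_field(value):
    """Split a multi-valued catalog field on ';' (assumed separator)."""
    return [part.strip() for part in value.split(";") if part.strip()]


def parse_catalog(stream):
    """Yield one dict per catalog row with the fields a search needs."""
    for row in csv.DictReader(stream):
        yield {
            "id": int(row["Text#"]),
            "title": row["Title"],
            "authors": split_field(row["Authors"]),
            "subjects": split_field(row["Subjects"]),
        }
```

Splitting authors and subjects into lists is what makes per-author and per-subject lookups possible; the resulting dicts could then be written to the database with any MariaDB client library.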

gutenberg_mirror.sql

This is the database schema, which must be loaded into MariaDB before db_update.py is run.
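
The contents of gutenberg_mirror.sql aren't reproduced here. The following sketch shows one hypothetical way to support author search with a many-to-many join table. It uses Python's built-in sqlite3 module as a stand-in so it runs without a MariaDB server; INSERT OR IGNORE is SQLite syntax, and MariaDB's rough equivalent is INSERT IGNORE. Table, column and function names are illustrative only.

```python
import sqlite3

# Hypothetical schema for illustration; NOT the contents of gutenberg_mirror.sql.
SCHEMA = """
CREATE TABLE books   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE book_authors (
    book_id   INTEGER NOT NULL REFERENCES books(id),
    author_id INTEGER NOT NULL REFERENCES authors(id),
    PRIMARY KEY (book_id, author_id)
);
"""


def make_db():
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def load(conn, books):
    """Insert (id, title, [author names]) tuples, sharing one row per author."""
    for book_id, title, names in books:
        conn.execute("INSERT INTO books (id, title) VALUES (?, ?)", (book_id, title))
        for name in names:
            conn.execute("INSERT OR IGNORE INTO authors (name) VALUES (?)", (name,))
            (author_id,) = conn.execute(
                "SELECT id FROM authors WHERE name = ?", (name,)
            ).fetchone()
            conn.execute(
                "INSERT OR IGNORE INTO book_authors (book_id, author_id) VALUES (?, ?)",
                (book_id, author_id),
            )


def titles_by_author(conn, fragment):
    """Return titles whose author name contains fragment, sorted by title."""
    rows = conn.execute(
        """SELECT b.title FROM books b
           JOIN book_authors ba ON ba.book_id = b.id
           JOIN authors a ON a.id = ba.author_id
           WHERE a.name LIKE ? ORDER BY b.title""",
        (f"%{fragment}%",),
    )
    return [title for (title,) in rows]
```

Storing each author once and linking through book_authors means a prolific author's name is matched in a single row, and books with several authors appear under each of them.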

html_to_epub.sh

This script converts books from HTML to ePub and Kindle formats. It relies on a containerized version of Project Gutenberg's Ebookmaker tool. The server I'm using for this mirror is basically a Docker host with a bunch of containers on it. At some point, everything here will most likely get bundled up into one container for ease of installation.
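
A hedged sketch of how a containerized Ebookmaker could be invoked for one book from Python. The image name ebookmaker:latest, the /in and /out mount points, and the assumption that the image does not already set ebookmaker as its entrypoint are all hypothetical. The --make option (with the epub.images and kindle.images targets, repeated once per target) and --output-dir are assumed from Ebookmaker's documentation; check `ebookmaker --help` for the installed version.

```python
import shlex
from pathlib import Path

# Hypothetical image name; substitute whatever the local Ebookmaker image is called.
IMAGE = "ebookmaker:latest"


def ebookmaker_command(book_dir, html_name, out_dir,
                       targets=("epub.images", "kindle.images")):
    """Build (but do not run) a docker command that converts one book.

    book_dir is mounted read-only at /in and out_dir at /out in the container.
    """
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{Path(book_dir).resolve()}:/in:ro",
        "-v", f"{Path(out_dir).resolve()}:/out",
        IMAGE,
        "ebookmaker",  # assumed: the image does not set this as its entrypoint
    ]
    cmd += [f"--make={target}" for target in targets]  # assumed option syntax
    cmd += ["--output-dir=/out", f"/in/{html_name}"]
    return cmd


def as_shell(cmd):
    """Render the argument list as a shell-quoted string, e.g. for logging."""
    return shlex.join(cmd)
```

To actually run a conversion, the list can be passed to `subprocess.run(cmd, check=True)`. Building the argument list separately from running it keeps the command easy to log and to test on a machine without Docker.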