A side project that I’ve completed recently is a set of Text To Speech (TTS) scripts to generate audiobooks. My work commute is between 1 and 1.5 hours each day, which has given me the opportunity to listen to many books that I would like to read but rarely have time for. I’ve finished 3 audiobooks so far, but I realize that I will soon run out of interesting content to listen to (I’ve been listening to LibriVox free public domain audiobooks).
Exploring the world of TTS software led me to first examine espeak, which had too much of a robotic tone for my liking. I then stumbled upon Pico TTS on my cheap Android tablet, which sounded too good to be true. Looking around, I found a Linux project that uses it, PicoSpeaker. Pico is a TTS solution from the company SVOX Mobile Voices, which apparently specializes in text to speech solutions for devices. I’m not sure how the product ended up on Linux, but the packages PicoSpeaker relies on, sox and libttspico0, are there, and they work reasonably well. The frustrating problem I found was that PicoSpeaker didn’t accept large files. It was frustrating enough that I kept looking around for alternatives.
I then checked out Festival, installed better voices, and still found the quality lacking in comparison to Pico TTS. I played with the gain, rate, and pitch to make the different voices sound better to me, but it failed to make a difference (I tried out the MBROLA and CMU Arctic voices, samples here). Even though I could convert a complete file with these, they didn’t sound as good to my subjective ears.
To cut a long story short, much of my Saturday was spent on getting a text to speech solution that would help me convert text ebooks to audiobooks. To get around the file size limitation, I split the file into 100-line parts with:
split -l 100 -d -a 4 Text_To_Convert.txt Ebook_
This creates a set of text files with no extension, starting at Ebook_0000. Next I created the following script, which I named convert.sh:
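As a quick check of what the split step produces, here is a minimal sketch that runs it on a stand-in file. It assumes GNU split (the -d option for numeric suffixes comes from GNU coreutils); the scratch directory and the 250-line dummy file are only there for illustration. Note that split takes the input file first and the output prefix second.

```shell
#!/bin/bash
# Sketch: split a stand-in text file into 100-line parts and inspect the result.
workdir=$(mktemp -d)                  # scratch directory, only to keep the demo tidy
cd "$workdir" || exit 1
seq 1 250 > Text_To_Convert.txt       # stand-in for the real ebook: 250 numbered lines

# -l 100: 100 lines per part; -d: numeric suffixes; -a 4: four-digit suffixes
split -l 100 -d -a 4 Text_To_Convert.txt Ebook_

ls Ebook_*                            # lists Ebook_0000, Ebook_0001 and Ebook_0002
wc -l < Ebook_0002                    # the last part holds the remaining 50 lines
```

With four-digit suffixes there is room for 10,000 parts, which at 100 lines each covers a text of up to a million lines.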
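#!/bin/bash
# Convert every text part that starts with a base name into a tagged Ogg file.
if [ $# -eq 0 ]
then
    echo "Type the base name of the file to convert, followed by enter:"
    read -r name
    echo "Type name of author: "
    read -r author
    echo "Type name of book: "
    read -r book
else
    name=$1
    author=$2
    book=$3
fi

for f in "$name"*; do
    echo "Converting $f .."
    # picospeaker reads the text to speak from standard input
    ./picospeaker -o "$f.ogg" < "$f"
    echo "Now adding tag information"
    # Tag the audio with artist (-a), album (-A) and title (-t)
    lltag --yes --clear -a "$author" -A "$book" -t "$f" "$f.ogg"
done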
To run it, I make the script executable (chmod +x convert.sh) and pass it the base name (Ebook_ in this case), the name of the author (“Henry Thoreau” for example), and the title of the book, for example: ./convert.sh Ebook_ "Henry Thoreau" "Walden". Note that if any of those contain spaces, you need to put them in quotes.
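One thing to watch for: the loop in convert.sh matches every file that starts with the base name, so a second run in the same directory would also pick up the .ogg files from the first run. A possible way around that is sketched below. The helper name list_parts is my own invention for illustration, and it assumes the parts were created with four-digit numeric suffixes as in the split command above.

```shell
#!/bin/bash
# list_parts PREFIX (hypothetical helper): print only the split output files,
# i.e. PREFIX followed by exactly four digits, and ignore audio files such as
# Ebook_0000.ogg left over from an earlier run.
list_parts() {
    local prefix=$1 f
    for f in "$prefix"[0-9][0-9][0-9][0-9]; do
        # If nothing matches, the pattern is left as-is, so check the file exists
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Dry run: show which parts would be converted, without calling picospeaker
list_parts Ebook_ | while IFS= read -r part; do
    echo "Would convert $part"
done
```

In convert.sh the same pattern could replace "$name"* in the for loop, at the cost of tying the script to the four-digit naming scheme.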
The end result of the text to speech scripts is a pretty decent sounding audiobook, which I speed up to play at 120% (with the -r 20 flag provided to picospeaker) while keeping all of the words intelligible. Here is a 6 minute sample of the audio, uploaded on Picosong (Picosong seems to be like the imgur of audio links, pretty nice service). This is a sample of it as I like to listen to it.
You may need an additional step to convert the audio into MP3 format. To do that, add the following before the lltag line (and point lltag at $f.mp3 instead of $f.ogg if it is the MP3 you want tagged):
ffmpeg -i "$f.ogg" -b:a 128k "$f.mp3"
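If the Ogg files already exist, the MP3 conversion can also be done as a separate pass afterwards. The following is a rough sketch rather than part of my original script: the function name to_mp3 is made up for illustration, and it assumes ffmpeg is installed with an MP3 encoder available.

```shell
#!/bin/bash
# to_mp3 FILE.ogg (hypothetical helper): write a 128 kbit/s MP3 next to the Ogg file.
to_mp3() {
    local ogg=$1
    local mp3="${ogg%.ogg}.mp3"       # Ebook_0000.ogg -> Ebook_0000.mp3
    if [ -e "$mp3" ]; then
        # Do not redo work from an earlier run
        echo "Skipping $ogg, $mp3 already exists"
        return 0
    fi
    # -nostdin keeps ffmpeg from reading the terminal; -b:a sets the audio bitrate
    ffmpeg -nostdin -loglevel error -i "$ogg" -b:a 128k "$mp3"
}

# Convert every Ogg part in the current directory
for ogg in Ebook_*.ogg; do
    [ -e "$ogg" ] || continue         # no matches: the pattern is left unexpanded
    to_mp3 "$ogg"
done
```

Skipping files that already have an MP3 means the pass can be interrupted and restarted on a long book without starting over.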
So far I have listened to Henry Thoreau’s “Walden” and I feel like I could understand 99.9% of the words spoken. I have noticed that the text to speech can be a little buggy when it comes to tables, special characters, or any other strangely formatted text, but if that’s the price to pay to be able to listen to any text, then I’d gladly pay it.
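One way to soften the trouble with tables and special characters might be to clean the text before splitting it. The filter below is only a sketch under my own assumptions: the name clean_text and the list of characters it treats as noise are guesses, not something I have tuned against Pico, so adjust them to the book at hand.

```shell
#!/bin/bash
# clean_text (hypothetical filter): read text on stdin, write a tidier version to stdout.
clean_text() {
    tr -d '\r' |                              # drop Windows carriage returns
    sed -e 's/[|_*#=~^<>+-]\{2,\}/ /g' \
        -e 's/|/ /g' \
        -e 's/[[:space:]]\{1,\}$//' |         # runs of symbols, table bars, trailing blanks
    tr -s ' '                                 # squeeze repeated spaces into one
}

# Example use before the split step:
#   clean_text < Text_To_Convert.txt > Text_Clean.txt
```

Single hyphens and apostrophes are left alone, since only runs of two or more symbols are replaced, so ordinary words like "well-known" should pass through unchanged.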