Text To Speech: Converting Text EBooks into Audiobooks

A side project that I’ve completed recently is a set of Text To Speech (TTS) scripts to generate audiobooks.  My work commute is 1 to 1.5 hours each day, which has given me the opportunity to listen to many books that I would like to read but rarely have time for.  I’ve finished 3 audiobooks so far, but I realize that I will soon run out of interesting content to listen to (I’ve been listening to LibriVox free public domain audiobooks).

Exploring the world of Text To Speech (TTS) software led me to first examine espeak, which had too much of a robotic tone for my liking.  I then stumbled upon Pico TTS on my cheap Android tablet, which sounded too good to be true.  Looking around, I found a Linux project that uses it, PicoSpeaker. Pico is a TTS solution from the company SVOX Mobile Voices, which apparently specializes in text to speech solutions for devices.  I’m not sure how the product ended up in Linux as the packages sox and libttspico0, but they are there, and they work reasonably well.  The frustrating problem I found was that PicoSpeaker didn’t accept large files.  It was frustrating enough that I kept looking around for other options.

I then checked out Festival, installed better voices, and still found the quality lacking in comparison to Pico TTS.  I played with the gain, rate, pitch to make the different voices sound better to me, but it failed to make a difference (I tried out the MBROLA and CMU Arctic voices, samples here). Even though I could convert a complete file with these, they didn’t sound as good to my subjective ears.

To cut a long story short, much of my Saturday was spent on getting a text to speech solution that would help me convert Text Ebooks to Audiobooks.  To fix the file size limitation problem, I split up the file into 100 line parts with:

split -l 100 -d -a 4 Text_To_Convert.txt Ebook_
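To see the naming scheme before running it on a real book, here is a small sketch that splits a throwaway 250-line file in a temporary directory. The sample file and the workdir variable are stand-ins of mine, not part of the original workflow; note that split takes the input file first and the output prefix second.

```shell
#!/bin/sh
# Sketch: split a 250-line sample file into 100-line parts.
# The sample file and temp directory are illustrative stand-ins.
workdir=$(mktemp -d)

# Stand-in for a real ebook: one number per line, 250 lines.
seq 1 250 > "$workdir/Text_To_Convert.txt"

# -l 100: 100 lines per part; -d: numeric suffixes; -a 4: four-digit suffixes.
split -l 100 -d -a 4 "$workdir/Text_To_Convert.txt" "$workdir/Ebook_"

# Count the parts and show how many lines each one holds.
parts=$(ls "$workdir"/Ebook_* | wc -l)
echo "parts: $parts"
wc -l "$workdir"/Ebook_*
```

With 250 input lines this gives three parts: two of 100 lines and a final one of 50.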

This creates a set of text files with no extension, starting at Ebook_0000.  Next I created the following script, which I named convert.sh:

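#!/bin/sh
# Prompt for the details if no arguments were given; otherwise read them from the command line.
if [ $# -eq 0 ]
then
    echo "Type the base name of the file to convert, followed by enter:"
    read name
    echo "Type name of author: "
    read author
    echo "Type name of book: "
    read book
else
    name=$1
    author=$2
    book=$3
fi
# Convert each split part to audio, then tag the result.
for f in "$name"*
do
    echo "Converting $f .."
    cat "$f" | ./picospeaker -o "$f.ogg"
    echo "Now adding tag information"
    lltag --yes --clear -a "$author" -A "$book" -t "$f" "$f.ogg"
done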

I make the script executable (chmod +x convert.sh) and run it with the base name (Ebook_ in this case), the name of the author (“Henry Thoreau” for example), and the title of the book.  Note that if any of those contain spaces, you need to put them in quotes.
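The quoting matters because the shell splits unquoted words into separate arguments. A minimal sketch of the difference, assuming the script reads the base name, author, and title as its first three positional arguments; show_args is a hypothetical stand-in for convert.sh, not part of it:

```shell
#!/bin/sh
# Hypothetical stand-in for convert.sh: reports how the shell split its arguments.
show_args() {
    echo "$# arguments: name=[$1] author=[$2] book=[$3]"
}

# Quoted: the author stays together as one argument.
quoted=$(show_args Ebook_ "Henry Thoreau" "Walden")

# Unquoted: "Henry" and "Thoreau" become two arguments, so the book title is wrong.
unquoted=$(show_args Ebook_ Henry Thoreau Walden)

echo "$quoted"
echo "$unquoted"
```

The first call sees three arguments with the author intact; the second sees four, and "Thoreau" lands in the book slot.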

The end result of the text to speech scripts is a pretty decent sounding audio book, that I speed up to play at 120% (with the -r 20 flag provided to picospeaker) with all of the words intelligible. Here is a 6 minute sample of the audio, uploaded on Picosong (Picosong seems to be like the imgur of audio links, pretty nice service).  This is a sample of it as I like to listen to it.

You may need an additional step to convert the audio to MP3. To do that, add the following line inside the loop, before the lltag call:

ffmpeg -i $f.ogg -ab 128k $f.mp3
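To make the ordering concrete, here is a dry-run sketch that only prints the commands for one part instead of running them, so it needs neither picospeaker nor ffmpeg installed. The plan_part helper is hypothetical, and pointing lltag at the .mp3 rather than the .ogg is my assumption, on the reasoning that the MP3 is the file that ends up on the player.

```shell
#!/bin/sh
# Dry run: print, but do not execute, the per-part commands in order.
# plan_part is a hypothetical helper used only for illustration.
plan_part() {
    f=$1
    author=$2
    book=$3
    echo "cat $f | ./picospeaker -o $f.ogg"
    echo "ffmpeg -i $f.ogg -ab 128k $f.mp3"
    # Assumption: tag the mp3 rather than the ogg.
    echo "lltag --yes --clear -a \"$author\" -A \"$book\" -t \"$f\" $f.mp3"
}

plan=$(plan_part Ebook_0000 "Henry Thoreau" "Walden")
echo "$plan"
```

Once the printed commands look right for your files, the same three lines can go into the loop body of the script.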

So far I have listened to Henry Thoreau’s “Walden” and I feel like I could understand 99.9% of the words spoken.  I have noticed that the text to speech can be a little buggy when it comes to tables, special characters, or any other strangely formatted text, but if that’s the price to pay to be able to listen to any text, then I’d gladly pay it.


11 Responses to Text To Speech: Converting Text EBooks into Audiobooks

  1. coolreader says:

    Another option is to hook your android device to your car and use the free app “coolreader” to load your ebook and tell it to read out loud. coolreader will let you pick the reading speed and save your place, so you don’t need to pregenerate audio files, and the ivona voices sound very nice.

    • Dustin Reynolds says:

      That was an option I considered, but I decided against it since there are nice advantages to playing an audiobook from an MP3 player. For example, if I miss a word, I can easily seek back by pressing a physical button. With an MP3 player I can completely operate the device while keeping my eyes on the road, without missing a beat.

  2. Vagner Rener says:

    Thx a lot 4 your help on making audiobooks. I have managed to and I am making an audio book from the project . But I would like to know how can I make Picospeak to speak out a “txt” file or perhaps a “pdf” one. I have tried it:

    $ picospeak -l en-GB the_linux_command_line.txt

    But picospeak speaks only that bit and not the file inside it. I also tried:

    $ nano the_linux_command_line.txt | picospeak -l en-GB

    But did not work either.

    Finally, I made a test of putting mp3 bits together after splitting them up with cat and as a test it worked, but I have to be sure that they will follow the book order:

    $ cat the_linux_command_line_1.mp3 the_linux_command_line_2.mp3 > the_linux_command_line_all.mp3

    The sound quality of picospeak is much better than espeak and festival! Great article!

  3. Vagner Rener says:

    I forgot to mention the project address:

    The free project > the linux command line > William Shotts > http://linuxcommand.org

    That I am turning into an audio book for me to study

    • Dustin Reynolds says:

      I think you’ll have the most success by first getting all of the text that you’re interested in converting into a single text file. There are a few options you should consider, as discussed here at the askubuntu forums.

      Once you’ve got that, you can split it up into small bits that picospeaker can play:

      split -l 100 -d -a 4 Ebook_Text_To_Convert.txt

      If you noticed in my script, I used cat filename | ./picospeaker -o $f.ogg which pipes all of the text, not the filename, into picospeaker. I got the best results by piping in the text that I want picospeaker to play, into picospeaker.

  4. Vagner Rener says:

    Ok. I got that and after converting the pdf book with pdftotext. Then I split the “txt” file into many pieces and used your script to make “*.ogg” files. Now I am converting them into “*.mp3” files with “winff”. After that I will try to use “cat” to make a single “*.mp3” book file. But, my question was: can I make picospeak to speak the “txt” book file without converting it into “*.ogg” files as I can with “espeak” and “festival”. Like this:

    $ espeak -ven-gb -f the_linux_command_line.txt


    • Dustin Reynolds says:

      You can use mplayer to get it to play: cat Walden_0000 | ./picospeaker | mplayer -

  5. Eric says:


    I tried running this script, but it keeps throwing the “FIXME: File too large” error at me. I broke the files up with the script you wrote and made sure that the individual files were only 100 lines long.

    Any idea on what might be going on?

    • Dustin Reynolds says:

      The picospeaker script responds with the FIXME error when it runs pico2wave and encounters a problem. That problem could be that the file is too large or that the pico2wave binary isn’t installed.

      On Debian Wheezy I needed to install libttspico0, libttspico-data, libttspico-utils, libttspico-dev for the pico2wave binary to be installed.

      First I would recommend that you check that pico2wave works by running it directly: pico2wave -w test.wav "This is a test". You can play the result using: play test.wav.

  6. ron says:

    The current best text to speech software is Text Speaker. It has customizable pronunciation, reads anything on your screen, and it even has talking reminders. It is great for learning languages as it highlights the words as they are being read. The bundled voices are well priced and sound very human. Voices are available in English, French, Italian, Spanish, German, and more. Easily converts blogs, email, e-books, and more to MP3 or for listening instantly.

  7. D says:

    I too have tried picospeaker, and to be frank, the audio quality is *much* better using pico2wave. Indeed, sometimes picospeaker’s rendition is unintelligible. I do not know why this should be.
