Hardware, Software & Product Development | Sparx Engineering

Text To Speech: Converting Text EBooks into Audiobooks

By dreynolds | Miscellaneous, Software | 12 comments | 29 January, 2014

A side project that I’ve completed recently is a set of Text To Speech (TTS) scripts to generate audiobooks.  My work commute is 1 to 1.5 hours each day, which has given me the opportunity to listen to many books that I would like to read but rarely have time for.  I’ve finished 3 audiobooks so far, but I realize that I will soon run out of interesting content to listen to (I’ve been listening to LibriVox free public domain audiobooks).

Exploring the world of Text To Speech (TTS) software led me to first examine espeak, which had too much of a robotic tone for my liking.  I then stumbled upon Pico TTS on my cheap Android tablet, which sounded too good to be true.  Looking around, I found a Linux project that uses it, PicoSpeaker. Pico is a TTS solution from the company SVOX Mobile Voices, which apparently specializes in text to speech solutions for devices.  I’m not sure how the product ended up in Linux, but the packages PicoSpeaker relies on, sox and libttspico0, are there, and they work reasonably well.  The frustrating problem I found was that PicoSpeaker didn’t accept large files.  The problem was frustrating enough that I continued to look around at different fixes.

I then checked out Festival, installed better voices, and still found the quality lacking in comparison to Pico TTS.  I played with the gain, rate, and pitch to make the different voices sound better to me, but it failed to make a difference (I tried out the MBROLA and CMU Arctic voices, samples here). Even though I could convert a complete file with these, they didn’t sound as good to my subjective ears.

To cut a long story short, much of my Saturday was spent on getting a text to speech solution that would help me convert text ebooks to audiobooks.  To fix the file size limitation problem, I split up the file into 100-line parts with:

split -l 100 -d -a 4 Text_To_Convert.txt Ebook_

This creates a set of text files with no extension, starting at Ebook_0000.  Next I created the following script, which I named convert.sh:

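#!/bin/bash
# Convert split text chunks into tagged .ogg files using picospeaker and lltag.
# Usage: ./convert.sh BASENAME AUTHOR BOOK   (prompts if run with no arguments)
if [ $# -eq 0 ]
then
    echo "Type the base name of the file to convert, followed by enter:"
    read -r name
    echo "Type name of author: "
    read -r author
    echo "Type name of book: "
    read -r book
else
    name=$1
    author=$2
    book=$3
fi
# Loop over every chunk whose name starts with the base name (Ebook_0000, ...).
for f in "$name"*
do
    # Skip audio files left over from an earlier run; they match the same glob.
    case $f in *.ogg|*.mp3) continue ;; esac
    echo "Converting $f .."
    # Pipe the text of the chunk (not its file name) into picospeaker.
    cat "$f" | ./picospeaker -o "$f.ogg"
    echo "Now adding tag information"
    lltag --yes --clear -a "$author" -A "$book" -t "$f" "$f.ogg"
done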

I run this script by making it executable (chmod +x convert.sh) and providing it with the base name (Ebook_ in this case), the name of the author (“Henry Thoreau”, for example), and the title of the book.  Note that if any of those contain spaces, you need to put them in quotes.
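
Before committing to a long conversion, it can help to check the split step on a throwaway file. The sketch below assumes GNU split and seq are available; sample.txt and the Demo_ prefix are made-up names for illustration, not part of the workflow above:

```shell
# Dry run of the split step in a temporary directory.
# sample.txt and Demo_ are hypothetical names used only for this check.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 250 > sample.txt                  # 250 short lines standing in for book text
split -l 100 -d -a 4 sample.txt Demo_   # input file first, output prefix second

ls Demo_*                               # expect Demo_0000, Demo_0001 and Demo_0002
wc -l < Demo_0002                       # the last chunk holds the remaining 50 lines
```

The zero-padded numeric suffixes mean a plain glob such as Demo_* lists the chunks in book order, which is what the loop in convert.sh relies on.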

The end result of the text to speech scripts is a pretty decent sounding audio book, that I speed up to play at 120% (with the -r 20 flag provided to picospeaker) with all of the words intelligible. Here is a 6 minute sample of the audio, uploaded on Picosong (Picosong seems to be like the imgur of audio links, pretty nice service).  This is a sample of it as I like to listen to it.

You may need an additional step to convert the audio into MP3 format. To do that, add the following before lltag:

ffmpeg -i "$f.ogg" -ab 128k "$f.mp3"
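
Put together, each chunk then passes through three commands in a fixed order. The sketch below is a dry run that only prints those commands, so the order can be checked without picospeaker, ffmpeg, or lltag installed. print_chunk_commands is a hypothetical helper, and pointing lltag at the .mp3 instead of the .ogg is an assumption about what is wanted once the MP3 exists, not something the script above does:

```shell
# Dry run: print, in order, the commands a single chunk would go through.
# Nothing is executed. print_chunk_commands is a hypothetical helper name.
print_chunk_commands() {
    f=$1; author=$2; book=$3
    # -r 20 is the rate flag mentioned above for 120% playback speed.
    echo "cat \"$f\" | ./picospeaker -r 20 -o \"$f.ogg\""
    echo "ffmpeg -i \"$f.ogg\" -ab 128k \"$f.mp3\""
    # Assumption: tag the MP3, since that is the file copied to the player.
    echo "lltag --yes --clear -a \"$author\" -A \"$book\" -t \"$f\" \"$f.mp3\""
}

print_chunk_commands Ebook_0000 "Henry Thoreau" "Walden"
```

If the .ogg files are the ones being kept, the lltag line would stay as it is in convert.sh.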

So far I have listened to Henry Thoreau’s “Walden” and I feel like I could understand 99.9% of the words spoken.  I have noticed that the text to speech can be a little buggy when it comes to tables, special characters, or any other strangely formatted text, but if that’s the price to pay to be able to listen to any text, then I’d gladly pay it.
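
One way to soften the trouble with special characters is to filter each chunk before it reaches picospeaker. This is not part of the scripts above: clean_text is a hypothetical helper, and the set of characters it keeps is only a guess at what Pico reads reliably. It is also lossy, since anything outside that set, including accented letters and curly quotes, becomes a space:

```shell
# Hypothetical pre-filter: replace everything except letters, digits,
# whitespace and basic punctuation with a space, then squeeze repeated spaces.
clean_text() {
    LC_ALL=C tr -c 'A-Za-z0-9 \n\t.,;:!?'\''"()-' ' ' | tr -s ' '
}

# A table-like line loses its column separators but keeps its words.
printf 'Cost: 5 | 10 | 15 ~ ok\n' | clean_text
```

In convert.sh this would mean piping each chunk through the filter first, for example cat "$f" | clean_text | ./picospeaker -o "$f.ogg".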

Audio, bash, EBook, Linux, open source, scripts, TTS

12 comments

  • coolreader | February 23, 2014 at 11:10 am

    Another option is to hook your Android device to your car and use the free app “coolreader” to load your ebook and tell it to read out loud. coolreader will let you pick the reading speed and save your place. That way you don’t need to pregenerate audio files, and the Ivona voices sound very nice.

    • Dustin Reynolds | February 24, 2014 at 8:38 am

      That was an option I considered, but I decided against it since there are nice advantages to playing an audio book from an MP3 player. For example, if I miss a word, I can easily seek back by pressing a physical button. With an MP3 player I can completely operate the device while keeping my eyes on the road, without missing a beat.

  • Vagner Rener | May 2, 2014 at 3:28 pm

    Thx a lot 4 your help on making audiobooks. I have managed to and I am making an audio book from the project . But I would like to know how can I make Picospeak to speak out a “txt” file or perhaps a “pdf” one. I have tried it:

    $ picospeak -l en-GB the_linux_command_line.txt

    But picospeak speaks only that bit and not the file inside it. I also tried:

    $ nano the_linux_command_line.txt | picospeak -l en-GB

    But did not work either.

    Finally, I made a test of putting mp3 bits together with cat after splitting them up, and as a test it worked, but I have to be sure that they will follow the book order:

    $ cat the_linux_command_line_1.mp3 the_linux_command_line_2.mp3 > the_linux_command_line_all.mp3

    The sound quality of picospeak is much better than espeak and festival! Great article!

  • Vagner Rener | May 2, 2014 at 3:34 pm

    I forgot to mention the project address:

    The free project > the linux command line > William Shotts > http://linuxcommand.org

    That I am turning into an audio book for me to study

    • Dustin Reynolds | May 2, 2014 at 3:55 pm

      I think you’ll have the most success by first getting all of the text that you’re interested in converting into a single text file. There are a few options you should consider, as discussed here at the askubuntu forums.

      Once you’ve got that, you can split it up into small bits that picospeaker can play:

      split -l 100 -d -a 4 Text_To_Convert.txt Ebook_

      If you noticed in my script, I used cat filename | ./picospeaker -o $f.ogg which pipes all of the text, not the filename, into picospeaker. I got the best results by piping in the text that I want picospeaker to play, into picospeaker.

  • Vagner Rener | May 2, 2014 at 5:04 pm

    Ok. I got that and after converting the pdf book with pdftotext. Then I split the “txt” file into many pieces and used your script to make “*.ogg” files. Now I am converting them into “*.mp3” files with “winff”. After that I will try to use “cat” to make a single “*.mp3” book file. But, my question was: can I make picospeak to speak the “txt” book file without converting it into “*.ogg” files as I can with “espeak” and “festival”. Like this:

    $ espeak -ven-gb -f the_linux_command_line.txt

    🙂

    • Dustin Reynolds | May 2, 2014 at 5:18 pm

      You can use mplayer to get it to play: cat Walden_0000 | ./picospeaker | mplayer -

  • Eric | October 28, 2014 at 9:29 pm

    Hi,

    I tried running this script, but it keeps throwing the “FIXME: File too large” error at me. I broke the files up with the script you wrote and made sure that the individual files were only 100 lines long.

    Any idea on what might be going on?

    • Dustin Reynolds | November 3, 2014 at 4:08 pm

      The picospeaker script responds with the FIXME error when it runs pico2wave and encounters a problem. That problem could be that the file is too large or that the pico2wave binary isn’t installed.

      On Debian Wheezy I needed to install libttspico0, libttspico-data, libttspico-utils, libttspico-dev for the pico2wave binary to be installed.

      First I would recommend that you check pico2wave by running it directly: pico2wave -w test.wav "This is a test". You can play the result using: play test.wav.

  • ron | December 11, 2015 at 12:44 am

    The current best text to speech software is Text Speaker. It has customizable pronunciation, reads anything on your screen, and it even has talking reminders. It is great for learning languages as it highlights the words as they are being read. The bundled voices are well priced and sound very human. Voices are available in English, French, Italian, Spanish, German, and more. Easily converts blogs, email, e-books, and more to MP3 or for listening instantly.
    http://www.deskshare.com/text-to-speech-software.aspx

  • D | February 28, 2016 at 10:23 am

    I too have tried picospeaker, and to be frank, the audio quality is *much* better using pico2wave. Indeed, sometimes picospeaker’s rendition is unintelligible. I do not know why this should be.

  • Ben | February 17, 2022 at 2:52 am

    For converting text ebooks into audiobooks, you should try “Text Speaker” – it is the best text to speech app. This app reads aloud my files in human sounding voices. The best feature is the large selection of voices. The MP3 file creation feature is excellent – it sounds awesome and lets me load the audio onto my mobile device for listening on-the-go. I think this is a useful app for everyone.
    https://www.deskshare.com/text-to-speech-software.aspx

Sparx Technologies, LLC. dba Sparx Engineering © 2009 - 2021 | All Rights Reserved