Welcome to Constant Etherpad!

These pads are archived each night (around 4AM CET) @ http://etherdump.constantvzw.org/
An RSS feed from the etherdump also appears on http://constantvzw.org/

To prevent your public pad from appearing in the archive and RSS feed, put or just leave the following (including the surrounding double underscores) anywhere in the text of your pad:


Changes will be reflected after the next update at 4AM.

The Death of the Authors, 1945
Anne Frank for Public Domain Day 2017
Git Repository: https://gitlab.constantvzw.org/death-of-the-authors/1945-Anne_Frank
Sources of texts: 
    Olivier Ertzscheid http://affordance.typepad.com//mon_weblog/2016/01/anne-frank.html

- Check browser type and add a warning if not Chrome or Firefox.
- add VoxForge recording conditions: http://www.voxforge.org/home/dev/mansegaudio
- document & add source code + link on http://publicdomainday.constantvzw.org
- add reference Séverine
- test microphone sound on Steph's computer - ok
- buy a new microphone battery (An)
- finalize css/images http://publicdomainday.constantvzw.org (Femke)

To contribute to VoxForge, we have to make a post on the forum here: http://www.voxforge.org/home/audacity
Template: http://www.voxforge.org/home/audacity/audio-file-submission#rU976vlXOBuEIjX51SpZKA
Example: http://www.voxforge.org/home/audacity/audio-file-submission---spanish#7mzZmIMdnjf-nsu_LcS7FA

All ligatures were lost in the OCR.
List of replacements:
    " jn" = " mooi"
    " lm" = " film"
    " rma" = " firma"
    " ets" = " fiets"
    " a e" = " afle"
    " uister" = " fluister"
    "proe es" = "proefles"
    " its" = " flits"
    " ink " = " flink "
    "a oopt" = "afloopt"
    " auw" = " flauw"
    " ge irt" = " geflirt"
    "a oop" = "afloop"
    "kof e" = "koffie"
    " anel" = " flanel"
    " guurlijk" = " figuurlijk"
    " uks" = " fluks"
    "zel ngenomen" = "zelfingenomen"
    " uweel" = " fluweel"
    " ltreren" = " filtreren"
    "ophef ng" = "opheffing"
    " irt" = " flirt"
    "Ongeloo ijk" = "Ongelooflijk"
    "Magni ek" = "Magnifiek"
    " nesse" = " finesse"
    "philoso e" = "philosofie"
    "biogra e" = "biografie"
    "saf aantjes" = "????" (saffietjes ?, gerolde sigaretjes)
    "in atie" = "inflatie"
    "pam et" = "pamflet"
    "proe anding" = "proeflanding"
    "of cier" = "officier"
    also: aflopen, floot/fluiten, enfin, fles(je)
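A minimal sketch of how such a replacement list could be applied to the OCR text in Python, using only a handful of the pairs above. It assumes plain substring replacement is good enough; the function name `fix_ligatures` is hypothetical. Patterns are applied longest first, because a short pattern such as " irt" is also contained in " ge irt" and would otherwise produce " ge flirt".

```python
# Sketch: repair words whose fi/fl/ffi ligatures were dropped by the OCR.
# Only a few pairs from the replacement list are included here (assumption:
# the full list would be pasted in the same form).
LIGATURE_FIXES = {
    " ge irt": " geflirt",
    " irt": " flirt",
    " ets": " fiets",
    " lm": " film",
    "kof e": "koffie",
    "of cier": "officier",
}


def fix_ligatures(text: str, fixes: dict[str, str] = LIGATURE_FIXES) -> str:
    """Apply plain substring replacements, longest pattern first.

    Longest-first matters: " irt" is a substring of " ge irt", so
    replacing it first would yield " ge flirt" instead of " geflirt".
    """
    for broken in sorted(fixes, key=len, reverse=True):
        text = text.replace(broken, fixes[broken])
    return text


if __name__ == "__main__":
    print(fix_ligatures("Hij heeft ge irt, een irt bij de kof e."))
```

Because the replacements are blind substring matches, the corrected text still needs a manual read-through for unrelated words that happen to contain a pattern.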

Corrected version in TeX: https://github.com/skadge/diary-anne-frank/blob/master/anne-frank.tex

Text-to-Speech tools:
- espeak
- espeak-ng
- festival
- say
- spd-say
- kdeaccessibility-jovie
- kdeaccessibility-kmouth
- Orca
- flite
- gespeaker (frontend for espeak)
- blather (python + gstreamer)
- epos
- marytts
- ivona
- mbrola
- mimic
- python-pyttsx
- python2-pyvona
- python-espeak
- svox-pico-bin (on Android phones)
- praat (nl)

Speech recognition tools:
- julius
- freespeech-vr-devel
- htk
- opensmile
- pocketsphinx

NLP tools:
- Frog
- python-frog
- python-speechrecognition (Google-powered)
- python2-gtts (interface to Google speech)

sudo apt-get install festival
Test your setup by typing in a Terminal 
You will be presented with a > prompt. Type  
The computer should say "hello". 
To listen to a text file named FILENAME, type  
Note: FILENAME must be in quotation marks.
--> does not work for Dutch

ESPEAK & mbrola -> will be the solution
sudo apt-get install espeak
espeak --stdout -f text.txt > text.wav
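The espeak variants tried later in these notes (Dutch, female voice, whispering, mbrola) can be scripted instead of typed by hand. A minimal sketch, assuming espeak's `language+variant` voice syntax, that `nl+f3` and `nl+whisper` are available in the installed espeak data, and that `mb-nl2` only works once mbrola and its nl2 database are installed. It uses `-w` to write the WAV file directly, which is equivalent to the `--stdout` redirect above. The names `VOICES`, `espeak_command` and `render` are hypothetical.

```python
import subprocess

# Voice strings use espeak's "language+variant" syntax.
# "mb-nl2" is an MBROLA voice: it needs mbrola plus the nl2 database.
VOICES = {
    "female": "nl+f3",
    "whisper": "nl+whisper",
    "mbrola": "mb-nl2",
}


def espeak_command(text_file, wav_file, voice="nl+f3", words_per_minute=175):
    """Build an espeak argument list that reads text_file into wav_file."""
    return [
        "espeak",
        "-v", voice,                  # language + voice variant
        "-s", str(words_per_minute),  # speed in words per minute (175 is espeak's default)
        "-f", text_file,              # read the text from this file
        "-w", wav_file,               # write a WAV file instead of playing the audio
    ]


def render(text_file, wav_file, voice="nl+f3"):
    """Run espeak; raises CalledProcessError if espeak exits with an error."""
    subprocess.run(espeak_command(text_file, wav_file, voice), check=True)


if __name__ == "__main__":
    for name, voice in VOICES.items():
        print(" ".join(espeak_command("text.txt", f"text-{name}.wav", voice)))
```

Printing the commands first makes it easy to check them in a terminal before rendering all fragments.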

The Dutch model was very old; it had not been updated for 5 years. I've just uploaded a new model on the cmusphinx website.
It should be more accurate, but it is still trained on only 13 hours of data. English models are trained on 1000+ hours. We need more transcribed Dutch data.

Someone made a language model for Dutch, published here:
    "VoxForge was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac).
We will make available all submitted audio files under the GPL license, and then 'compile' them into acoustic models for use with Open Source speech recognition engines such as CMU Sphinx, ISIP, Julius (github) and HTK (note: HTK has distribution restrictions). "

install Pocketsphinx
1. install Sphinxbase: https://github.com/cmusphinx/sphinx4
add dependencies: libtool, swig
follow the instructions in the README / make install (as root!)
./configure --enable-fixed
2. install Pocketsphinx
3. test installation Pocketsphinx: pocketsphinx_continuous -inmic yes - ok
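Once the microphone test works, the same tool can decode a recorded file instead of live input. A minimal sketch, assuming the `pocketsphinx_continuous` flags `-infile`, `-hmm`, `-lm` and `-dict` of the sphinxbase/pocketsphinx generation installed here (newer releases may ship a different command-line tool). The model paths in `DUTCH_MODEL` are hypothetical placeholders for wherever the Dutch model from the cmusphinx website is unpacked.

```python
import subprocess

# Hypothetical locations of the Dutch acoustic model, language model and dictionary.
DUTCH_MODEL = {
    "hmm": "model/nl/acoustic",
    "lm": "model/nl/nl.lm.bin",
    "dict": "model/nl/nl.dic",
}


def pocketsphinx_command(wav_file, model=DUTCH_MODEL):
    """Argument list for decoding a WAV file instead of the microphone."""
    return [
        "pocketsphinx_continuous",
        "-infile", str(wav_file),  # decode this file rather than -inmic yes
        "-hmm", model["hmm"],      # acoustic model directory
        "-lm", model["lm"],        # language model
        "-dict", model["dict"],    # pronunciation dictionary
    ]


def transcribe(wav_file):
    """Run the decoder and return what it prints on stdout (the hypotheses)."""
    result = subprocess.run(pocketsphinx_command(wav_file),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

The decoder also writes a lot of log output to stderr, which `capture_output=True` keeps out of the returned transcription.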

TODO NEXT: look at Gijs' Obamascript

1. corrected text ok
2. espeak in NL + female voice + recording ok
3. espeak in NL + female voice + whispering + recording ok -> this result is not understandable if you don't see the text
3.bis. record using an mbrola voice with espeak
4. RESULT: we have a WAV file, to be passed to Pocketsphinx
5. write contextualising introduction
6. lay out text file + introduction
7. Print at least 2 books & upload PDF

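Before a WAV file goes into Pocketsphinx, it helps to check its format, since the standard acoustic models expect 16 kHz, 16-bit, mono audio, while espeak normally writes 22050 Hz. A minimal sketch with Python's standard `wave` module; the function name `check_wav_for_pocketsphinx` is hypothetical and the 16 kHz target is an assumption about the model in use.

```python
import wave


def check_wav_for_pocketsphinx(path, expected_rate=16000):
    """Return a list of format problems; an empty list means the file looks usable.

    Assumes the model expects 16-bit mono audio at expected_rate Hz.
    `path` can be a filename or an open binary file object.
    """
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != expected_rate:
            problems.append(f"sample rate {w.getframerate()} Hz, expected {expected_rate} Hz")
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels, expected mono")
        if w.getsampwidth() != 2:
            problems.append(f"{8 * w.getsampwidth()}-bit samples, expected 16-bit")
    return problems
```

If the sample rate is reported as wrong, one option is resampling with sox, for example `sox text.wav -r 16000 text-16k.wav`.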
Options: espeak reading HTML & including breaks, silences...
* names/titles/days-dates/'Lieve Kitty'/'Je Anne' in a 'highlighted', slower voice
* add phonemes for frequently used words that are currently mispronounced
* punctuation: replace with words so we can recover it later?
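The first and last options could be prototyped by generating SSML for espeak's `-m` mode (which interprets SSML markup). A minimal sketch, assuming espeak honours `<break time="...">` and `<prosody rate="slow">` on the installed version. The phrase list `HIGHLIGHT`, the Dutch punctuation words in `PUNCTUATION_WORDS`, and the functions `to_ssml` and `spell_punctuation` are all hypothetical examples.

```python
from xml.sax.saxutils import escape

# Phrases to read more slowly; illustrative, not the full list.
HIGHLIGHT = ["Lieve Kitty", "Je Anne"]

# Hypothetical spoken names for punctuation marks (Dutch).
PUNCTUATION_WORDS = {"?": " vraagteken", "!": " uitroepteken"}


def spell_punctuation(text, words=PUNCTUATION_WORDS):
    """Replace punctuation marks by words so they survive speech recognition."""
    for mark, word in words.items():
        text = text.replace(mark, word)
    return text


def to_ssml(text, highlight=HIGHLIGHT, pause_ms=500):
    """Wrap highlighted phrases in a pause plus a slower prosody element."""
    text = escape(text)  # escape &, < and > so the result stays valid SSML
    for phrase in highlight:
        marked = f'<break time="{pause_ms}ms"/><prosody rate="slow">{escape(phrase)}</prosody>'
        text = text.replace(escape(phrase), marked)
    return f"<speak>{text}</speak>"
```

The result could be saved to a file and read with something like `espeak -m -v nl+f3 -f fragment.ssml -w fragment.wav`.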

Reading the book
duration: 7h of reading (170 fragments)
build Anne Frank's voice
website with 1 random fragment
contact the VoxForge NL developer
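Seven hours split into 170 fragments comes to about 2.5 minutes of reading each (7 × 60 / 170 ≈ 2.47). A minimal sketch of cutting the corrected text into 170 consecutive fragments and picking one at random for the website; it balances fragments by paragraph count rather than reading time (weighting by word count would be closer) and assumes the text has more paragraphs than fragments. The function names are hypothetical.

```python
import random


def split_into_fragments(paragraphs, n_fragments=170):
    """Group consecutive paragraphs into n_fragments chunks of similar size.

    The first `extra` fragments get one paragraph more, so every paragraph
    is used exactly once and order is preserved.
    """
    size, extra = divmod(len(paragraphs), n_fragments)
    fragments, start = [], 0
    for i in range(n_fragments):
        end = start + size + (1 if i < extra else 0)
        fragments.append("\n\n".join(paragraphs[start:end]))
        start = end
    return fragments


def random_fragment(fragments, rng=random):
    """Pick one fragment, e.g. for each visitor of the recording page."""
    return rng.choice(fragments)
```

Passing a seeded `random.Random` as `rng` makes the choice reproducible for testing.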

When you need to train
    You want to create an acoustic model for a new language/dialect
    OR you need a specialized model for a small-vocabulary application
    AND you have plenty of data to train on:
        1 hour of recordings for command and control for a single speaker
        5 hours of recordings of 200 speakers for command and control for many speakers
        10 hours of recordings for single-speaker dictation
        50 hours of recordings of 200 speakers for many-speaker dictation
    AND you have knowledge of the phonetic structure of the language
    AND you have time to train the model and optimize parameters (1 month)

When you don't need to train
    You need to improve accuracy - do acoustic model adaptation instead
    You don't have enough data - do acoustic model adaptation instead
    You don't have enough time
    You don't have enough experience
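The data thresholds in the "When you need to train" list can be turned into a quick check. A minimal sketch that encodes only the hours and speaker counts from that list (the other conditions, phonetic knowledge and a month of time, are not modelled); the names `MINIMUM_DATA` and `enough_data_to_train` are hypothetical.

```python
# Rule-of-thumb minimums from the "When you need to train" list.
# Key: (task, many_speakers) -> (minimum hours, minimum number of speakers)
MINIMUM_DATA = {
    ("command", False): (1, 1),      # command and control, single speaker
    ("command", True): (5, 200),     # command and control, many speakers
    ("dictation", False): (10, 1),   # dictation, single speaker
    ("dictation", True): (50, 200),  # dictation, many speakers
}


def enough_data_to_train(hours, speakers, task="dictation"):
    """True if the recordings meet the minimum amount of data for training."""
    min_hours, min_speakers = MINIMUM_DATA[(task, speakers > 1)]
    return hours >= min_hours and speakers >= min_speakers
```

By this rule of thumb, 7 hours read by 170 people, or the 13 hours behind the current Dutch model, falls short of the many-speaker dictation minimum, which points towards acoustic model adaptation rather than training from scratch.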

Mail to Kmaclean@voxforge.org
We would like to produce a series of sound files recorded by at least 170 people, based on Anne Frank's diary.
It would be great to contribute them to VoxForge. Could you let us know in what form you would prefer them?
Single sentences? Sentences at random? Or fragments of the text?

For your information:
We are two artists from Brussels, working exclusively with Free Software and Free Licenses.
For the Belgian Public Domain Day we are contributing an installation based on Anne Frank's diary, in the series 'The Death of the Authors':
Although the first version of the diary is officially in the public domain, it is claimed by the 2 foundations on the basis of specific copyright conventions. We would therefore like to invoke the right of quotation in order to present the book as a collection of speech fragments read by a large group of people, thus reconstructing Anne Frank's voice through people who defend her work as a call for peace.
Ideally, in a second stage, we would like to use these recordings to create an Anne Frank language model for speech-to-text. For this we are looking for someone who has the right skills, who might be happy to include this in her/his research, or who could guide us somehow.

Instructions for useful recordings - to mention on the homepage/recording:
In Dutch: