*****************************
********** README ***********
*****************************

writing-with-film
=================

NOTE: all scripts are written in Python 2.7, because they depend on the Pattern library for Python.

First install the dependencies:

>>> sudo apt-get install mongodb libsphinxbase1 swig



* virtual environment *


>>> virtualenv venv
>>> . venv/bin/activate

Once the virtual environment is active, you can install the requirements. They are installed only inside this virtual environment and do not affect the rest of your system:


>>> pip install -r requirements.txt


* fromsrt.py *

The subtitle files are parsed in fromsrt.py and added to the database in the following formats:

sentence = {   
}

sentence['words'].append({
})

Each sentence is then added to the database at the very bottom of the script.
To make different collections of srt files, edit the 'collectionname' in that line.
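
As an illustration of the parsing step described above, here is a minimal sketch of how an .srt file could be turned into sentence dictionaries, each with a list of words. It is not the code from fromsrt.py: the function name parse_srt and every key ('source', 'start', 'end', 'text', 'words', 'word', 'position') are assumptions made for this example, and the real script may store different fields. It uses only the standard library and runs under Python 2.7 and Python 3.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r'(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})')


def parse_srt(text, source):
    """Turn the contents of an .srt file into a list of sentence dicts.

    All keys used here are hypothetical; fromsrt.py may use others.
    """
    sentences = []
    text = text.replace('\r\n', '\n')  # tolerate Windows line endings
    # SRT cues are separated by blank lines.
    for block in re.split(r'\n\s*\n', text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        # Look for the timing line; the counter line before it is ignored.
        for i, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                break
        else:
            continue  # no timing line found, so this is not a valid cue
        sentence = {
            'source': source,
            'start': match.group(1),
            'end': match.group(2),
            'text': ' '.join(lines[i + 1:]),
            'words': [],
        }
        # Naive whitespace tokenisation, with basic punctuation stripped.
        for position, word in enumerate(sentence['text'].split()):
            sentence['words'].append({
                'word': word.lower().strip('.,!?;:"'),
                'position': position,
            })
        sentences.append(sentence)
    return sentences


SAMPLE = """1
00:00:01,000 --> 00:00:04,000
Hello world.

2
00:00:05,500 --> 00:00:07,250
Writing with
film!
"""

sentences = parse_srt(SAMPLE, 'example.srt')
```

Each dictionary in the resulting list could then be inserted into a collection as one document. Keeping the words as a list inside the sentence makes it possible to query for single words later without losing the sentence they came from.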



* database * 

The parsed .srt files are stored in a MongoDB database called 'algolit'. MongoDB gives us an interface to the data and lets us write specific queries later.
To show all the databases in your Mongo installation, start the mongo shell and run:
>>> show dbs

To enter the algolit database, run:
>>> mongo algolit

To show all the collections, run (inside the mongo shell):
>>> show collections

To print all the items in a collection, replace 'collectionname' with its name and run:
>>> db.collectionname.find()
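
The same kind of query can also be written from Python. The following is only a sketch under a few assumptions: that the pymongo package is installed (this README does not confirm it is in requirements.txt), that MongoDB runs locally on its default port, and that each stored sentence has a 'words' list whose items carry a 'word' key. The helper names open_collection and sentences_with_word and the field path 'words.word' are hypothetical.

```python
def open_collection(name, database='algolit'):
    """Return a collection from the local MongoDB server.

    Assumes pymongo is installed and mongod listens on the default
    localhost:27017. The import is done here so that the query helper
    below can be used and tested without a running database.
    """
    from pymongo import MongoClient
    return MongoClient()[database][name]


def sentences_with_word(collection, word):
    """Return all sentence documents that contain `word`.

    'words.word' is a hypothetical field path: MongoDB's dot notation
    matches a document when any item of its 'words' list has that value
    under the 'word' key.
    """
    return list(collection.find({'words.word': word}))
```

With a running database, sentences_with_word(open_collection('collectionname'), 'film') would return every stored sentence that contains the word 'film', given the assumed document layout.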


* sources * 

The video sources we used to build the vocabulary are listed here:
http://pad.constantvzw.org/p/video-sources-links