*****************************
********** README ***********
*****************************

writing-with-film
=================

NOTE: all scripts are written in Python 2.7 (because the Pattern library for Python requires it)

First install the dependencies:

>>> sudo apt-get install mongodb libsphinxbase1 swig



* virtual environment *


>>> virtualenv venv
>>> . venv/bin/activate

You can then install the requirements, which stay inside this virtual environment only.
The requirements listed in requirements.txt are installed with:


>>> pip install -r requirements.txt


* fromsrt.py *

The subtitle files are parsed in fromsrt.py and added to the database in the following formats:

sentence = {   
}

sentence['words'].append({
})

and then (at the very bottom of the script) added to the database.
Edit the 'collectionname' on that insert line to store different sets of .srt files in separate collections.
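
To illustrate the parse-and-store flow, here is a minimal sketch. It is not the code of fromsrt.py. Only `sentence` and `sentence['words']` come from this README; every other field name ('source', 'start', 'end', 'text', 'word', 'position') and the helper names `parse_srt` and `store` are assumptions made for the example.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r'(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})')


def parse_srt(text, source):
    """Turn the text of an .srt file into a list of sentence dicts."""
    sentences = []
    normalized = text.replace('\r\n', '\n').strip()
    # Subtitle blocks are separated by one or more blank lines.
    for block in re.split(r'\n\s*\n', normalized):
        lines = block.split('\n')
        if len(lines) < 3:
            continue  # need an index line, a timing line and some text
        match = TIMING.match(lines[1].strip())
        if not match:
            continue  # skip blocks without a valid timing line
        sentence = {
            'source': source,         # hypothetical field
            'start': match.group(1),  # hypothetical field
            'end': match.group(2),    # hypothetical field
            'text': ' '.join(line.strip() for line in lines[2:]),
            'words': [],
        }
        for position, word in enumerate(sentence['text'].split()):
            sentence['words'].append({
                'word': word,          # hypothetical field
                'position': position,  # hypothetical field
            })
        sentences.append(sentence)
    return sentences


def store(sentences, collectionname):
    """Insert the parsed sentences into the 'algolit' database."""
    # Imported here so parse_srt() can be tried without pymongo installed.
    from pymongo import MongoClient
    client = MongoClient()  # assumes MongoDB on localhost:27017
    if sentences:  # insert_many() rejects an empty list
        client['algolit'][collectionname].insert_many(sentences)
```

Calling `store(parse_srt(open('film.srt').read(), 'film.srt'), 'collectionname')` would then write one document per subtitle block; 'film.srt' is a placeholder file name.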



* database * 

The parsed .srt files are stored in a MongoDB database called 'algolit'. MongoDB gives us an interface to the data and lets us write specific queries later.
To show all the databases in your Mongo installation, run: 

To open the algolit database in the mongo shell, run:
>>> mongo algolit

To show all the collections, run (inside the mongo shell):
>>> show collections

To print all the items in a collection (replace 'collectionname' with its name):
>>> db.collectionname.find()
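
The same inspection can be done from Python. The sketch below assumes pymongo 3.6 or newer and a MongoDB server on the default localhost port. `preview_collection` and `main` are hypothetical helper names, not part of the project's scripts.

```python
from itertools import islice


def preview_collection(collection, limit=5):
    """Return up to `limit` documents from a collection as a list."""
    # find() without arguments matches every document, like
    # db.collectionname.find() in the mongo shell.
    return list(islice(collection.find(), limit))


def main():
    # Imported here so preview_collection() works without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient()  # assumes MongoDB on localhost:27017
    print(client.list_database_names())  # all databases
    db = client['algolit']
    print(db.list_collection_names())    # like "show collections"
    for doc in preview_collection(db['collectionname']):
        print(doc)
```

Run `main()` with the virtual environment active and replace 'collectionname' with the collection you filled from fromsrt.py.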


* sources * 

The video sources we used to build the vocabulary are listed here:
http://pad.constantvzw.org/p/video-sources-links