Tokenizing
index

end time minus start time = duration of sentence
string length: spread the duration over the letters (time per letter)
calculate at what character position a token starts, + a bit of time before/after
-> digits are confusing for the length count: rewrite numbers as words first
-> extra weight on punctuation
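
Rough sketch of the steps above, to check the arithmetic. Everything named here is an assumption for illustration: the function names, the whitespace tokenizer, the punctuation weight of 3, and the digit-by-digit word lookup (a real version would need a proper number-to-words step). A digit is not actually rewritten in the text; it is only counted as if it were spelled out.

```python
import re
import string

# Assumed weight: one punctuation mark counts as 3 letters (stands for a pause).
PUNCTUATION_WEIGHT = 3.0

# Tiny illustrative lookup, digit by digit ("42" is counted as "four" + "two").
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def char_weight(ch):
    """How many 'letter units' of time one character takes."""
    if ch in DIGIT_WORDS:
        # Count the digit as long as its written-out word.
        return float(len(DIGIT_WORDS[ch]))
    if ch in string.punctuation:
        return PUNCTUATION_WEIGHT
    return 1.0  # letters and spaces


def token_times(sentence, start, end, padding=0.0):
    """Estimate (token, start_time, end_time) for each whitespace-separated token."""
    weights = [char_weight(ch) for ch in sentence]
    total = sum(weights)
    if total == 0:
        return []

    # Duration of the sentence, spread over the weighted characters.
    time_per_unit = (end - start) / total

    # prefix[i] = summed weight of all characters before position i.
    prefix = [0.0]
    for w in weights:
        prefix.append(prefix[-1] + w)

    result = []
    for match in re.finditer(r"\S+", sentence):
        # Position of the token in the string -> position in time,
        # plus a bit of time before/after, clamped to the sentence bounds.
        t0 = start + prefix[match.start()] * time_per_unit - padding
        t1 = start + prefix[match.end()] * time_per_unit + padding
        result.append((match.group(), max(start, t0), min(end, t1)))
    return result
```

Example: `token_times("hi you", 10.0, 16.0)` has 6 characters over 6 seconds, so 1 second per letter; "hi" covers 10.0 to 12.0 and "you" covers 13.0 to 16.0.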