from jogi@mur.at to [Observatory] When dirty.db get's dirty Dear all, as promised yesterday, here my little report regarding the broken etherpad. ### When dirty.db get's dirty When I got to WTC on Monday morning the etherpad on etherbox.local was disfunct. Later someone said that in fact etherpad had stopped working the evening before, but it was unclear why. So I started looking around the pi's filesystem to find out what was wrong. Took me a while to find the relevant lines in /var/log/syslog but it became clear that there was a problem with the database. Which database? Where does etherpad 'live'? I found it in /opt/etherpad and in a subdirectory named var/ there it was: dirty.db, and dirty it was. A first look at the file revealed no apparent problem. The last lines looked like this: ``` {"key":"sessionstorage:Ddy0gw7okwbkv5BzkR1DuSLCV_IA5_jQ","val":{"cookie ":{"path":"/","_expires":null,"originalMaxAge":null,"httpOnly":true,"secure":false}}} {"key":"sessionstorage:AU1cffgcTf_q6BV9aIdAvES2YyXM7Gm1","val":{"cookie ":{"path":"/","_expires":null,"originalMaxAge":null,"httpOnly":true,"secure":false}}} {"key":"sessionstorage:_H5SdUlDvQ3XCuPaZEXQ5lx0K6aAEJ9m","val":{"cookie ":{"path":"/","_expires":null,"originalMaxAge":null,"httpOnly":true,"se cure":false}}} ``` What I did not see at the time was that there were some (AFAIR something around 150) binary zeroes at the end of the file. I used tail for the first look and that tool silently ignored the zeroes at the end of the file. It was Martino who suggested using different tools (xxd in that case) and that showed the cause of the problem. The file looked something like this: ``` 00013730: 6f6b 6965 223a 7b22 7061 7468 223a 222f okie":{"path":"/ 00013740: 222c 225f 6578 7069 7265 7322 3a6e 756c ","_expires":nul 00013750: 6c2c 226f 7269 6769 6e61 6c4d 6178 4167 l,"originalMaxAg 00013760: 6522 3a6e 756c 6c2c 2268 7474 704f 6e6c e":null,"httpOnl 00013770: 7922 3a74 7275 652c 2273 6563 7572 6522 y":true,"secure" 00013780: 3a66 616c 7365 7d7d 7d0a 0000 0000 0000 :false}}}....... 00013790: 0000 0000 0000 0000 0000 0000 0000 0000 ................ ``` So Anita, Martino and I stuck our heads together to come up with a solution. Our first attempt to fix the problem went something like this: ``` dd if=dirty.db of=dirty.db.clean bs=1 count=793080162 ``` which means: write the first 793080162 blocks of size 1 byte to a new file. After half an hour or so I checked on the size of the new file and saw that some 10% of the copying had been done. No way this would get done in time for the walk-in-clinic. Back to the drawing board. Using a text editor was no real option btw since even vim has a hard time with binary zeroes and the file was really big. But there was hexedit! Martino installed it and copied dirty.db onto his computer. After some getting used to the various commands to navigate in hexedit the unwanted zeroes were gone in an instant. The end of the file looked like this now: ``` 00013730: 6f6b 6965 223a 7b22 7061 7468 223a 222f okie":{"path":"/ 00013740: 222c 225f 6578 7069 7265 7322 3a6e 756c ","_expires":nul 00013750: 6c2c 226f 7269 6769 6e61 6c4d 6178 4167 l,"originalMaxAg 00013760: 6522 3a6e 756c 6c2c 2268 7474 704f 6e6c e":null,"httpOnl 00013770: 7922 3a74 7275 652c 2273 6563 7572 6522 y":true,"secure" 00013780: 3a66 616c 7365 7d7d 7d0a :false}}}. ``` Martino asked about the trailing '.' character and I checked a different copy of the file. No '.' there, so that had to go too. My biggest mistake in a long time! The '.' we were seeing in Martino's copy of the file was in fact a '\n' (0a)! We did not realize that, copied the file back to etherbox.local and waited for etherpad to resume it's work. But no luck there, for obvious reasons. We ended up making backups of dirty.db in various stages of deformation and Martino started a brandnew pad so we could use pads for the walk- in-clinic. The processing tool chain has been disabled btw. We did not want to mess up any of the already generated .pdf, .html and .md files. We still don't know why exactly etherpad stopped working sometime Sunday evening or how the zeroes got into the file dirty.db. Anita thought that she caused the error when she adjusted time on etherbox.local, but the logfile does not reflect that. The last clean entry in /var/log/syslog regarding nodejs/etherpad is recorded with a timestamp of something along the line of 'Jun 10 10:17'. Some minutes later, around 'Jun 10 10:27' the first error appears. These timestamps reflect the etherbox's understanding of time btw, not 'real time'. It might be that the file just got too big for etherpad to handle it. The size of the repaired dirty.db file was already 757MB. That could btw explain why etherpad was working somewhat slugishly after some days. There is still a chance that the time adjustment had an unwanted side effect, but so far there is no obvious reason for what had happened. -- J.Hofmüller http://thesix.mur.at/