from jogi@mur.at to [Observatory] When dirty.db get's dirty
Dear all,
as promised yesterday, here my little report regarding the broken
etherpad.
<md>
### When dirty.db get's dirty
When I got to WTC on Monday morning the etherpad on etherbox.local was disfunct. Later someone said that in fact etherpad had stopped working
the evening before, but it was unclear why. So I started looking around the pi's filesystem to find out what was wrong. Took me a while to
find the relevant lines in /var/log/syslog but it became clear that there was a problem with the database. Which database? Where does
etherpad 'live'? I found it in /opt/etherpad and in a subdirectory named var/ there it was: dirty.db, and dirty it was.
A first look at the file revealed no apparent problem. The last lines looked like this:
```
{"key":"sessionstorage:Ddy0gw7okwbkv5BzkR1DuSLCV_IA5_jQ","val":{"cookie
":{"path":"/","_expires":null,"originalMaxAge":null,"httpOnly":true,"secure":false}}}
{"key":"sessionstorage:AU1cffgcTf_q6BV9aIdAvES2YyXM7Gm1","val":{"cookie
":{"path":"/","_expires":null,"originalMaxAge":null,"httpOnly":true,"secure":false}}}
{"key":"sessionstorage:_H5SdUlDvQ3XCuPaZEXQ5lx0K6aAEJ9m","val":{"cookie
":{"path":"/","_expires":null,"originalMaxAge":null,"httpOnly":true,"se
cure":false}}}
```
What I did not see at the time was that there were some (AFAIR
something around 150) binary zeroes at the end of the file. I used
tail for the first look and that tool silently ignored the zeroes at
the end of the file. It was Martino who suggested using different
tools (xxd in that case) and that showed the cause of the problem. The
file looked something like this:
```
00013730: 6f6b 6965 223a 7b22 7061 7468 223a 222f okie":{"path":"/
00013740: 222c 225f 6578 7069 7265 7322 3a6e 756c ","_expires":nul
00013750: 6c2c 226f 7269 6769 6e61 6c4d 6178 4167 l,"originalMaxAg
00013760: 6522 3a6e 756c 6c2c 2268 7474 704f 6e6c e":null,"httpOnl
00013770: 7922 3a74 7275 652c 2273 6563 7572 6522 y":true,"secure"
00013780: 3a66 616c 7365 7d7d 7d0a 0000 0000 0000 :false}}}.......
00013790: 0000 0000 0000 0000 0000 0000 0000 0000 ................
```
So Anita, Martino and I stuck our heads together to come up with a
solution. Our first attempt to fix the problem went something like
this:
```
dd if=dirty.db of=dirty.db.clean bs=1 count=793080162
```
which means: write the first 793080162 blocks of size 1 byte to a new
file. After half an hour or so I checked on the size of the new file
and saw that some 10% of the copying had been done. No way this would
get done in time for the walk-in-clinic. Back to the drawing board.
Using a text editor was no real option btw since even vim has a hard
time with binary zeroes and the file was really big. But there was
hexedit! Martino installed it and copied dirty.db onto his
computer. After some getting used to the various commands to navigate
in hexedit the unwanted zeroes were gone in an instant. The end of the
file looked like this now:
```
00013730: 6f6b 6965 223a 7b22 7061 7468 223a 222f okie":{"path":"/
00013740: 222c 225f 6578 7069 7265 7322 3a6e 756c ","_expires":nul
00013750: 6c2c 226f 7269 6769 6e61 6c4d 6178 4167 l,"originalMaxAg
00013760: 6522 3a6e 756c 6c2c 2268 7474 704f 6e6c e":null,"httpOnl
00013770: 7922 3a74 7275 652c 2273 6563 7572 6522 y":true,"secure"
00013780: 3a66 616c 7365 7d7d 7d0a :false}}}.
```
Martino asked about the trailing '.' character and I checked a
different copy of the file. No '.' there, so that had to go too. My
biggest mistake in a long time! The '.' we were seeing in Martino's
copy of the file was in fact a '\n' (0a)! We did not realize that,
copied the file back to etherbox.local and waited for etherpad to
resume it's work. But no luck there, for obvious reasons.
We ended up making backups of dirty.db in various stages of deformation
and Martino started a brandnew pad so we could use pads for the walk-
in-clinic. The processing tool chain has been disabled btw. We did
not want to mess up any of the already generated .pdf, .html and .md
files.
We still don't know why exactly etherpad stopped working sometime
Sunday evening or how the zeroes got into the file dirty.db. Anita
thought that she caused the error when she adjusted time on
etherbox.local, but the logfile does not reflect that. The last clean
entry in /var/log/syslog regarding nodejs/etherpad is recorded with a
timestamp of something along the line of 'Jun 10 10:17'. Some minutes
later, around 'Jun 10 10:27' the first error appears. These timestamps
reflect the etherbox's understanding of time btw, not 'real time'.
It might be that the file just got too big for etherpad to handle
it. The size of the repaired dirty.db file was already 757MB. That
could btw explain why etherpad was working somewhat slugishly after
some days. There is still a chance that the time adjustment had an
unwanted side effect, but so far there is no obvious reason for what
had happened.
</md>
--
J.Hofmüller
http://thesix.mur.at/