[philiptellis] /bb|[^b]{2}/
Never stop Grokking

Friday, August 27, 2004

Undelete in FreeBSD

A colleague of mine deleted a source file he'd been working on for over a week.

How do you undelete a file on a UFS partition? I'd done it before on ext2, I'd also recovered lost files from a damaged FAT32 partition (one that no OS would recognise), heck, I'd even recovered an ext3 file system that had been overwritten by NTFS. Why should undelete be tough?

Well, the tough part was that in all earlier instances, I had a spare partition to play on, _and_ I had root (login, not sudo) on the box, so could effectively boot in single user mode and mount the affected partition read-only. Couldn't do that here. I'd have to work on a live partition. sweet.

The first thing to do was power down the machine (using the power switch) to prevent any writes to the disk via cron or anything else. We then set about trying to figure out our strategy. A couple of websites had tools that could undelete files, but they'd have to be installed on the affected partition, so that was out of the question.

Now the machine has two partitions, one for / and one for /home. / is significantly smaller than /home, but has enough space to play around 100MB at a time. Decided to give it a try, copying /home to /tmp 10MB at a time.

dd if=/dev/ad0s1e of=deleted-file bs=1024k count=10

searched through the 10MB file for a unique string that should have been in the file. No match. Next 10 MB:
dd if=/dev/ad0s1e of=deleted-file bs=1024k count=10 skip=10

This was obviously going to take all night, so we decided to script it. (the code is broken into multiple lines for readability, we actually had it all on one line).
for i in 10 20 30 40 50 60 70 80 90; do
    dd if=/dev/ad0s1e of=deleted-file bs=1024k count=10 skip=$i;
    grep unique-string deleted-file && echo $i

We'd note down the numbers that hit a positive and then go back and get those sections again. Painful.

Ran through without luck. Had to then go from 100 to 190, so scripted that too with an outer loop:

for j in 1 2 3 4 5 6 7 8 9; do
    for i in 00 10 20 30 40 50 60 70 80 90; do
        dd ..... of=deleted-file ... skip=$j$i; ...

The observant reader will ask why we didn't just put in an increment like i=$[ $i+10 ]
Well, that runs too fast, and we wouldn't be able to break out easily. We'd have to do a Ctrl+C for every iteration to break out. This way the sheer pain of having to type in every number we wanted was enough to keep the limits low. That wasn't the reason. We did it because it would also be useful when we had to test only specific blocks that didn't have successive numbers.

IAC, the number of loops soon increased to 3, and the script further evolved to this:

for s in 1 2 3 4; do
    for j in 0 1 2 3 4 5 6 7 8 9; do
        for i in 00 10 20 30 40 50 60 70 80 90; do
            dd if=/dev/ad0s1e of=deleted-file bs=1024k count=10 skip=$s$j$i &>/dev/null;
            grep unique-string deleted-file && echo $s$j$i

Pretty soon hit a problem when grep turned up an escape sequence that messed up the screen. Also decided that we may as well save all positive hit files instead of rebuilding them later, so... broke out of the loops, and changed the grep line to this:

grep -q unique-string deleted-file-$s$j$i || rm deleted-file-$s$j$i

Were reasonably happy with the script to leave it to itself now. Might have even changed the iteration to an auto-increment, except there was no point changing it now since what we had would work for the conceivable future (going into the 10's place would be as easy as changing s to 10 11 12... and we didn't expect to have to go much further than 12 because the partition didn't have that much used space).

We finally hit some major positives between 8700 and 8900. Then started the process of extracting the data. 10MB files are too big for editors, and contains mostly binary data that we could get rid off. There was also going to be a lot of false positives because the unique (to the project) string also showed up in some config files that hadn't been deleted.

First ran this loop:

for i in deleted-file-*; do strings $i | less; done

and tried to manually search for the data. Gave up very soon and changed it to this:

for i in deleted-file-*; do echo $i; strings $i | grep unique-string; done

This showed us all lines where unique-string showed up so we could eliminate files that had no interesting content.

We were finally left with 3 files of 10MB each and the task of extracting the deleted file from here.

The first thing was to find out where in the file the code was. We first tried this:

less deleted-file-8650

search for the unique string and scroll up to the start of the text we wanted. Ctrl+G told us the position into the file that we were at (as a percent of the total). Then scroll to the end and again find the percent.

Now, we were reading 10 blocks of 1MB so using the percentage range, could narrow that down to 1 block.

Again got a percentage value within this 1MB file, and now swapped the block size and count a bit. So we went from 1 block of 1024k to 256 blocks of 4k each. Also had to change the offset from 8650 to 256 times that much. bc came in handy here.

I should probably mention that at this point we'd taken a break and headed out to Guzzler's Inn for a couple of pitchers and to watch the olympics. 32/8 was a slightly hard math problem on our return. Yathin has a third party account of that.

We finally narrowed down the search to 2 2K sections and one 8K section, with about 100 bytes of binary content (all ASCII NULLs) at the end of one of the 2K sections. This section was the end of the file. Used gvim to merge the files into one 12K C++ file complete with copyright notice and all.

If you plan on doing this yourself, then change the three for loops to this:
while [ $i -lt 9000 ]; do
    dd ...
    i=$[ $i+10 ];

Secondly, you could save a lot of time by using grep -ab right up front so you'd get an actual byte count of where to start looking, and just skip the rest. Some people have suggested doing the grep -ab right on the filesystem, but that could generate more data than we could store (40GB partition, and only 200MB of space to store it on).


December 11, 2005 2:39 AM

Your friend should have been using a source code control system. There are a whole host of reasons for doing so, and avoiding this kind of hoopla is the least of them.

May 08, 2013 5:41 AM

Thanks alot man. Very interesting and useful.

Post a Comment