August 12, 2014

Hacklog #4: Dump format and YAFFS

Leitmotif

Well over twenty years ago, the movie Jurassic Park came out. I remember there was a lot of exposition in the first act. In particular, there's a scene where a small group of people is given access -for the first time!- to an area of the island where very large dinosaurs roam free.

Among them is a man who is some sort of scientist and definitely not a people person. They happen upon a dino-sized dump of dino poo and yer man deadpans:

That is one big pile of shit.

"That is one big pile of shit"

He's not alone, of course. Accompanying him on the tour is Dr Sattler, a paleobotanist. Dr Sattler takes a keen interest in how plant life has been recreated in the park and seizes the opportunity to explore the dump:

"Reverse-engineering"

That scene, and this picture in particular, summarises this whole post.

Reverse engineering is a dirty job but someone's gotta do it.

The Madcap YAFFS

So I finally have a raw Flash ROM dump of the partition on which my phone's /system filesystem lives. I'm no longer constrained by file permissions - everything on that filesystem I now have access to.

I only need to figure out how.

Context

As I saw in the output of mount(8), /system is a yaffs2 filesystem, mounted read-only:

$ mount

/dev/block/mtdblock11 /system yaffs2 ro,noatime 0 0

YAFFS2 is the second revision of YAFFS, a Flash-optimised filesystem that's been around a while.

My first idea was to open the /system dump with a YAFFS access tool. None of them worked with the dumps I got out of my phone, so this little project lost momentum and stalled there for almost three months.

I eventually picked it up again and decided to do this the hard way. In order to access the filesystem's contents out-of-band, I had to spend time reading and digesting both filesystems' specification documents. The YAFFS2 spec is not a standalone document so familiarity with YAFFS stuff is pretty much required.

A YAFFS primer

The basic unit of data storage in YAFFS is a chunk. A chunk may contain either filesystem stuff of the kind displayed by stat(1) (viz. inodes, file names, permissions, etc.) or file data.

Within the scope of the YAFFS documentation, filesystem entities such as files and directories are called objects. Every object has its metadata stored in a dedicated chunk, in a data structure the YAFFS documentation calls an object header.

The format for object headers is defined in yaffs_guts.h in the form of this plain old C struct:

struct yaffs_obj_hdr {
        enum yaffs_obj_type type;

        /* Apply to everything  */
        int parent_obj_id;
        u16 sum_no_longer_used; /* checksum of name. No longer used */
        YCHAR name[YAFFS_MAX_NAME_LENGTH + 1];

        /* The following apply to all object types except for hard links */
        u32 yst_mode;           /* protection */

        u32 yst_uid;
        u32 yst_gid;
        u32 yst_atime;
        u32 yst_mtime;
        u32 yst_ctime;

        /* File size  applies to files only */
        u32 file_size_low;

        /* Equivalent object id applies to hard links only. */
        int equiv_id;

        /* Alias is for symlinks only. */
        YCHAR alias[YAFFS_MAX_ALIAS_LENGTH + 1];

        u32 yst_rdev;   /* stuff for block and char devices (major/min) */

        u32 win_ctime[2];
        u32 win_atime[2];
        u32 win_mtime[2];

        u32 inband_shadowed_obj_id;
        u32 inband_is_shrink;

        u32 file_size_high;
        u32 reserved[1];
        int shadows_obj;        /* This object header shadows the
                                specified object if > 0 */

        /* is_shrink applies to object headers written when wemake a hole. */
        u32 is_shrink;

};

As the YAFFS documentation points out, this is not just an in-memory structure, but also the format in which header information is stored on the NAND (ie. on the raw MTD). This was my starting point.

A given file may be split across several data chunks, independently of its header chunk. Since the object header format doesn't have any references to other chunks, it follows that some sort of metadata is needed around data chunks for the filesystem implementation to know what file a given chunk belongs to, and where that chunk falls in the ordered list of data chunks that comprise the whole file.

This data is stored in what the YAFFS docs call spare data. The docs mention that spare data is interleaved with "actual" chunk data but don't say much more beyond that.

The YAFFS2 document on the author's website is very vague when it comes to the format of spare data. In the original YAFFS docs, spare data is said to have a fixed size of 16 bytes per chunk, comprising 8 bytes of a packed data structure referred to as tags, 6 bytes of ECC redundancy bits for the chunk, a 1-byte damaged-block flag and an unused 16th byte.

The tags data structure is as follows:

struct yaffs_tags {
        u32 chunk_id:20;
        u32 serial_number:2;
        u32 n_bytes_lsb:10;
        u32 obj_id:18;
        u32 ecc:12;
        u32 n_bytes_msb:2;
};

Let's see what we have here:

  • obj_id - That's the object identifier - this lets us know which filesystem "entity" a given chunk belongs to. I think of that numeric identifier as an inode number.

  • chunk_id - This is effectively the chunk's position in the ordered list of chunks that, together, hold the file's contents.

  • n_bytes_lsb and n_bytes_msb - Together, these tell us how many bytes of actual file data are in the chunk. Unless a file's size happens to be a whole multiple of the chunk size, we'll need to know where to cut off the runt chunk.

  • ecc - Just some extra redundancy bits derived from the tags themselves. This isn't very interesting to me.

  • serial_number - Who knows?

Interestingly, the number of bytes in a chunk is stored on 10 + 2 = 12 bits, meaning that a chunk can hold up to 4095 bytes of data.
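To make the bit-field layout concrete, here is a minimal Python sketch that unpacks those 64 bits. It assumes the two 32-bit words are stored little-endian and that the compiler allocated bit-fields starting from the least significant bit, which is what GCC does on little-endian targets. C leaves bit-field ordering up to the implementation, so treat the shifts as an assumption. The function name unpack_yaffs1_tags is mine and does not come from the YAFFS sources.

```python
import struct

def unpack_yaffs1_tags(raw: bytes) -> dict:
    """Decode an 8-byte YAFFS1 tags blob (assumed LSB-first bit-fields)."""
    word0, word1 = struct.unpack("<II", raw[:8])
    tags = {
        "chunk_id": word0 & 0xFFFFF,            # 20 bits
        "serial_number": (word0 >> 20) & 0x3,   # 2 bits
        "n_bytes_lsb": (word0 >> 22) & 0x3FF,   # 10 bits
        "obj_id": word1 & 0x3FFFF,              # 18 bits
        "ecc": (word1 >> 18) & 0xFFF,           # 12 bits
        "n_bytes_msb": (word1 >> 30) & 0x3,     # 2 bits
    }
    # The byte count is split in two: 10 low bits plus 2 high bits.
    tags["n_bytes"] = tags["n_bytes_lsb"] | (tags["n_bytes_msb"] << 10)
    return tags
```

Splitting the byte count into a low and a high part looks like a way of growing the field from 10 to 12 bits without moving the other members, but that is my reading rather than something the docs state.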

Exploring the dump

I now have some idea of what I can expect to find in the dump. As I mentioned in the first post, I am particularly interested in a file named /system/xbin/tcpdump. I can expect to find a YAFFS object header for this file as well as for its parent dir xbin.

Object Headers

Let's get greppin', yo!

647-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ for offset in $(grep -abo tcpdump ../images/system.img  | cut -d':' -f1); do xxd -g4 -s ${offset} -l 16 ../images/system.img; done
510782b: 74637064 756d7000 0c0c7463 7064756d  tcpdump...tcpdum
5107835: 74637064 756d7020 73746f70 000a0a70  tcpdump stop...p
5107a9b: 74637064 756d7000 0b0b7463 7064756d  tcpdump...tcpdum
5107aa5: 74637064 756d705f 656e6400 0a0a7068  tcpdump_end...ph
6384cf8: 74637064 756d7000 636f6d6d 616e646c  tcpdump.commandl
6384d9b: 74637064 756d7020 70696420 3d202564  tcpdump pid = %d
6384dd1: 74637064 756d7020 73657276 69636520  tcpdump service
6384df0: 74637064 756d7020 70617261 6d657465  tcpdump paramete
6385854: 74637064 756d705f 72657375 6c745f6c  tcpdump_result_l
6385869: 74637064 756d702d 72657375 6c742d25  tcpdump-result-%
6385884: 74637064 756d7000 2d767600 2d733000  tcpdump.-vv.-s0.
63858c1: 74637064 756d705f 72657375 6c745f6c  tcpdump_result_l
64d4b4a: 74637064 756d7000 00000000 00000000  tcpdump.........
655482c: 74637064 756d703a 20436f75 6c646e27  tcpdump: Couldn'
6554868: 74637064 756d703a 20436f75 6c646e27  tcpdump: Couldn'
6ba4412: 74637064 756d7000 73797374 656d2f62  tcpdump.system/b

I've found 16 matches for the string tcpdump. Looking at the bytes that follow, most of these occurrences seem to be part of longer strings (eg. the match at 0x6554868) or to be part of a cluster of NULL-terminated strings (0x5107a9b).

One match stands out, however. The occurrence of tcpdump at 0x64d4b4a is followed by a bunch of NULLs. I know from the struct yaffs_obj_hdr definition that a YAFFS object's name in the filesystem is stored in a fixed-length character array, therefore seeing tcpdump padded with several consecutive NULLs is consistent with what I would expect to find in a header.
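The same filtering can be done programmatically instead of by eye. The sketch below is a hypothetical helper (find_padded_names is my name for it) that keeps only the matches followed by a run of NULs. The 32-byte threshold is an arbitrary assumption, chosen to be longer than the gaps you would see in an ordinary string table.

```python
def find_padded_names(image: bytes, name: bytes, min_padding: int = 32):
    """Yield offsets where `name` is followed by at least `min_padding` NUL bytes."""
    start = 0
    while True:
        offset = image.find(name, start)
        if offset == -1:
            return
        end = offset + len(name)
        # A fixed-length name array leaves a long run of NULs after the name.
        if image[end:end + min_padding] == b"\x00" * min_padding:
            yield offset
        start = offset + 1
```

Reading the whole image into memory with open(path, "rb").read() is fine for a dump of this size. A much larger image would call for mmap instead.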

Let's have a closer look at the area of the dump around 0x64d4b4a:

670-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ xxd -g4 -s $(printf '%d' 0x64d4b00) -l 3000 ../images/system.img
64d4b00: ffffffff ffffffff ffffffff ffffffff  ................
64d4b10: ffffffff ffffffff ffffffff ffffffff  ................
64d4b20: ffffffff ffffffff ffffffff ffffffff  ................
64d4b30: ff000000 f2aaaaeb b93e2707 919cfcff  .........>'.....
64d4b40: 01000000 26020000 ffff7463 7064756d  ....&.....tcpdum
64d4b50: 70000000 00000000 00000000 00000000  p...............
64d4b60: 00000000 00000000 00000000 00000000  ................
64d4b70: 00000000 00000000 00000000 00000000  ................
64d4b80: 00000000 00000000 00000000 00000000  ................
64d4b90: 00000000 00000000 00000000 00000000  ................
64d4ba0: 00000000 00000000 00000000 00000000  ................
64d4bb0: 00000000 00000000 00000000 00000000  ................
64d4bc0: 00000000 00000000 00000000 00000000  ................
64d4bd0: 00000000 00000000 00000000 00000000  ................
64d4be0: 00000000 00000000 00000000 00000000  ................
64d4bf0: 00000000 00000000 00000000 00000000  ................
64d4c00: 00000000 00000000 00000000 00000000  ................
64d4c10: 00000000 00000000 00000000 00000000  ................
64d4c20: 00000000 00000000 00000000 00000000  ................
64d4c30: 00000000 00000000 00000000 00000000  ................
64d4c40: 00000000 00000000 00ffffff ed810000  ................
64d4c50: 00000000 00000000 65cc5b51 65cc5b51  ........e.[Qe.[Q
64d4c60: 66cc5b51 846b0900 ffffffff ffffffff  f.[Q.k..........
64d4c70: ffffffff ffffffff ffffffff ffffffff  ................
64d4c80: ffffffff ffffffff ffffffff ffffffff  ................
64d4c90: ffffffff ffffffff ffffffff ffffffff  ................
64d4ca0: ffffffff ffffffff ffffffff ffffffff  ................
64d4cb0: ffffffff ffffffff ffffffff ffffffff  ................
64d4cc0: ffffffff ffffffff ffffffff ffffffff  ................
64d4cd0: ffffffff ffffffff ffffffff ffffffff  ................
64d4ce0: ffffffff ffffffff ffffffff ffffffff  ................
64d4cf0: ffffffff ffffffff ffffffff ffffffff  ................
64d4d00: ffffffff ffffffff ffffffff 00000000  ................
64d4d10: ffffffff ffffffff ffffffff ffffffff  ................
64d4d20: ffffffff ffffffff ffffffff ffffffff  ................
64d4d30: ffffffff ffffffff ffffffff ffffffff  ................
64d4d40: ff001000 00280200 8bd2822f 770cfcff  .....(...../w...
64d4d50: ffffffff ffffffff ffffffff ffffffff  ................
64d4d60: ffffffff ffffffff ffffffff ffffffff  ................
64d4d70: ffffffff ffffffff ffffffff ffffffff  ................
64d4d80: ffffffff ffffffff ffffffff ffffffff  ................
64d4d90: ffffffff ffffffff ffffffff ffffffff  ................
64d4da0: ffffffff ffffffff ffffffff ffffffff  ................
64d4db0: ffffffff ffffffff ffffffff ffffffff  ................
64d4dc0: ffffffff ffffffff ffffffff ffffffff  ................
64d4dd0: ffffffff ffffffff ffffffff ffffffff  ................
64d4de0: ffffffff ffffffff ffffffff ffffffff  ................
64d4df0: ffffffff ffffffff ffffffff ffffffff  ................
64d4e00: ffffffff ffffffff ffffffff ffffffff  ................
64d4e10: ffffffff ffffffff ffffffff ffffffff  ................
64d4e20: ffffffff ffffffff ffffffff ffffffff  ................
64d4e30: ffffffff ffffffff ffffffff ffffffff  ................
64d4e40: ffffffff ffffffff ffffffff ffffffff  ................
64d4e50: ffffffff ffffffff ffffffff ffffffff  ................
64d4e60: ffffffff ffffffff ffffffff ffffffff  ................
64d4e70: ffffffff ffffffff ffffffff ffffffff  ................
64d4e80: ffffffff ffffffff ffffffff ffffffff  ................
64d4e90: ffffffff ffffffff ffffffff ffffffff  ................
64d4ea0: ffffffff ffffffff ffffffff ffffffff  ................
64d4eb0: ffffffff ffffffff ffffffff ffffffff  ................
64d4ec0: ffffffff ffffffff ffffffff ffffffff  ................
64d4ed0: ffffffff ffffffff ffffffff ffffffff  ................
64d4ee0: ffffffff ffffffff ffffffff ffffffff  ................
64d4ef0: ffffffff ffffffff ffffffff ffffffff  ................
64d4f00: ffffffff ffffffff ffffffff ffffffff  ................
64d4f10: ffffffff ffffffff ffffffff ffffffff  ................
64d4f20: ffffffff ffffffff ffffffff ffffffff  ................
64d4f30: ffffffff ffffffff ffffffff ffffffff  ................
64d4f40: ffffffff ffffffff ffffffff ffffffff  ................
64d4f50: ff000000 0000ffff a75b4381 75a3f5ff  .........[C.u...
64d4f60: ffffffff ffffffff ffffffff ffffffff  ................
64d4f70: ffffffff ffffffff ffffffff ffffffff  ................
64d4f80: ffffffff ffffffff ffffffff ffffffff  ................
64d4f90: ffffffff ffffffff ffffffff ffffffff  ................
64d4fa0: ffffffff ffffffff ffffffff ffffffff  ................
64d4fb0: ffffffff ffffffff ffffffff ffffffff  ................
64d4fc0: ffffffff ffffffff ffffffff ffffffff  ................
64d4fd0: ffffffff ffffffff ffffffff ffffffff  ................
64d4fe0: ffffffff ffffffff ffffffff ffffffff  ................
64d4ff0: ffffffff ffffffff ffffffff ffffffff  ................
64d5000: ffffffff ffffffff ffffffff ffffffff  ................
64d5010: ffffffff ffffffff ffffffff ffffffff  ................
64d5020: ffffffff ffffffff ffffffff ffffffff  ................
64d5030: ffffffff ffffffff ffffffff ffffffff  ................
64d5040: ffffffff ffffffff ffffffff ffffffff  ................
64d5050: ffffffff ffffffff ffffffff ffffffff  ................
64d5060: ffffffff ffffffff ffffffff ffffffff  ................
64d5070: ffffffff ffffffff ffffffff ffffffff  ................
64d5080: ffffffff ffffffff ffffffff ffffffff  ................
64d5090: ffffffff ffffffff ffffffff ffffffff  ................
64d50a0: ffffffff ffffffff ffffffff ffffffff  ................
64d50b0: ffffffff ffffffff ffffffff ffffffff  ................
64d50c0: ffffffff ffffffff ffffffff ffffffff  ................
64d50d0: ffffffff ffffffff ffffffff ffffffff  ................
64d50e0: ffffffff ffffffff ffffffff ffffffff  ................
64d50f0: ffffffff ffffffff ffffffff ffffffff  ................
64d5100: ffffffff ffffffff ffffffff ffffffff  ................
64d5110: ffffffff ffffffff ffffffff ffffffff  ................
64d5120: ffffffff ffffffff ffffffff ffffffff  ................
64d5130: ffffffff ffffffff ffffffff ffffffff  ................
64d5140: ffffffff ffffffff ffffffff ffffffff  ................
64d5150: ffffffff ffffffff ffffffff ffffffff  ................
64d5160: ff00000f ffffff04 cd2b0a32 9a3df9ff  .........+.2.=..
64d5170: ffffffff ffffffff ffffffff ffffffff  ................
64d5180: ffffffff ffffffff ffffffff ffffffff  ................
64d5190: ffffffff ffffffff ffffffff ffffffff  ................
64d51a0: ffffffff ffffffff ffffffff ffffffff  ................
64d51b0: ffffffff ffffffff ffffffff ffffffff  ................
64d51c0: ffffffff ffffffff ffffffff ffffffff  ................
64d51d0: ffffffff ffffffff ffffffff ffffffff  ................
64d51e0: ffffffff ffffffff ffffffff ffffffff  ................
64d51f0: ffffffff ffffffff ffffffff ffffffff  ................
64d5200: ffffffff ffffffff ffffffff ffffffff  ................
64d5210: ffffffff ffffffff ffffffff ffffffff  ................
64d5220: ffffffff ffffffff ffffffff ffffffff  ................
64d5230: ffffffff ffffffff ffffffff ffffffff  ................
64d5240: ffffffff ffffffff ffffffff ffffffff  ................
64d5250: ffffffff ffffffff ffffffff ffffffff  ................
64d5260: ffffffff ffffffff ffffffff ffffffff  ................
64d5270: ffffffff ffffffff ffffffff ffffffff  ................
64d5280: ffffffff ffffffff ffffffff ffffffff  ................
64d5290: ffffffff ffffffff ffffffff ffffffff  ................
64d52a0: ffffffff ffffffff ffffffff ffffffff  ................
64d52b0: ffffffff ffffffff ffffffff ffffffff  ................
64d52c0: ffffffff ffffffff ffffffff ffffffff  ................
64d52d0: ffffffff ffffffff ffffffff ffffffff  ................
64d52e0: ffffffff ffffffff ffffffff ffffffff  ................
64d52f0: ffffffff ffffffff ffffffff ffffffff  ................
64d5300: ffffffff ffffffff ffffffff ffffffff  ................
64d5310: ffffffff ffffffff ffffffff ffffffff  ................
64d5320: ffffffff ffffffff ffffffff ffffffff  ................
64d5330: ffffffff ffffffff ffffffff ffffffff  ................
64d5340: ffffffff ffffffff ffffffff ffffffff  ................
64d5350: ffffffff ffffffff ffffffff ffffffff  ................
64d5360: ffffffff ffffffff ffffffff ffffffff  ................
64d5370: ff000000 04aaaaca 53a10415 55d2f0ff  ........S...U...
64d5380: 7f454c46 01010100 00000000 00000000  .ELF............
64d5390: 02002800 01000000 509c0000 34000000  ..(.....P...4...
64d53a0: c4670900 00000005 34002000 07002800  .g......4. ...(.
64d53b0: 18001700 06000000 34000000 34800000  ........4...4...
64d53c0: 34800000 e0000000 e0000000 04000000  4...............
64d53d0: 04000000 03000000 14010000 14810000  ................
64d53e0: 14810000 13000000 13000000 04000000  ................
64d53f0: 01000000 01000000 00000000 00800000  ................
64d5400: 00800000 50990800 50990800 05000000  ....P...P.......
64d5410: 00100000 01000000 00a00800 00200900  ............. ..
64d5420: 00200900 9cc60000 08370d00 06000000  . .......7......
64d5430: 00100000 02000000 d0f00800 d0700900  .............p..
64d5440: d0700900 d0000000 d0000000 06000000  .p..............
64d5450: 04000000 51e57464 00000000 00000000  ....Q.td........
64d5460: 00000000 00000000 00000000 06000000  ................
64d5470: 00000000 01000070 c0830800 c0030900  .......p........
64d5480: c0030900 90150000 90150000 04000000  ................
64d5490: 04000000 2f737973 74656d2f 62696e2f  ..../system/bin/
64d54a0: 6c696e6b 65720000 83000000 8b000000  linker..........
64d54b0: 66000000 6e000000 00000000 32000000  f...n.......2...
64d54c0: 0f000000 22000000 00000000 89000000  ...."...........
64d54d0: 3f000000 38000000 00000000 00000000  ?...8...........
64d54e0: 00000000 67000000 80000000 68000000  ....g.......h...
64d54f0: 00000000 00000000 3d000000 29000000  ........=...)...
64d5500: 51000000 00000000 4d000000 5a000000  Q.......M...Z...
64d5510: 5d000000 75000000 4c000000 00000000  ]...u...L.......
64d5520: 0e000000 65000000 76000000 13000000  ....e...v.......
64d5530: 00000000 00000000 00000000 5c000000  ............\...
64d5540: 86000000 46000000 00000000 6c000000  ....F.......l...
64d5550: 2a000000 00000000 00000000 36000000  *...........6...
64d5560: 7a000000 00000000 17000000 00000000  z...............
64d5570: 00000000 00000000 00000000 05000000  ................
64d5580: ff001000 00280200 ce9ca2e9 ab9df1ff  .....(..........
64d5590: 00000000 52000000 47000000 00000000  ....R...G.......
64d55a0: 00000000 64000000 43000000 00000000  ....d...C.......
64d55b0: 00000000 00000000 85000000 00000000  ................
64d55c0: 72000000 61000000 30000000 00000000  r...a...0.......
64d55d0: 57000000 00000000 69000000 01000000  W.......i.......
64d55e0: 78000000 7d000000 28000000 56000000  x...}...(...V...
64d55f0: 77000000 00000000 84000000 00000000  w...............
64d5600: 74000000 35000000 6d000000 00000000  t...5...m.......
64d5610: 42000000 1f000000 00000000 5e000000  B...........^...
64d5620: 00000000 70000000 8a000000 00000000  ....p...........
64d5630: 88000000 5f000000 7e000000 00000000  ...._...~.......
64d5640: 48000000 00000000 71000000 15000000  H.......q.......
64d5650: 00000000 26000000 33000000 1d000000  ....&...3.......
64d5660: 41000000 00000000 3a000000 6f000000  A.......:...o...
64d5670: 6b000000 00000000 82000000 00000000  k...............
64d5680: 49000000 2e000000 00000000 44000000  I...........D...
64d5690: 10000000 00000000 00000000 63000000  ............c...
64d56a0: 83000000 3e000000 87000000 81000000  ....>...........
64d56b0: 06000000 25000000                    ....%...

My tcpdump is preceded by 0xffff. That's consistent with the u16 sum_no_longer_used in the struct declaration - if YAFFS isn't using these 16 bits, it makes sense that it would set them to 1.

I expect the previous member of the struct to be int parent_obj_id and the value 0x00000226 sounds like a reasonable inode number for a file system of this size. To put it differently, it's less far-fetched than if it were 0x9c36281b.

The previous member of the struct is the first one, enum yaffs_obj_type type. Its value here is 0x00000001 and I expect tcpdump to be a regular file. According to yaffs_guts.h, the type enumeration for yaffs_obj_type looks like:

enum yaffs_obj_type {
    YAFFS_OBJECT_TYPE_UNKNOWN,
    YAFFS_OBJECT_TYPE_FILE,
    YAFFS_OBJECT_TYPE_SYMLINK,
    YAFFS_OBJECT_TYPE_DIRECTORY,
    YAFFS_OBJECT_TYPE_HARDLINK,
    YAFFS_OBJECT_TYPE_SPECIAL
};

The value we see is indeed consistent with YAFFS_OBJECT_TYPE_FILE.

Based on these findings, I'm fairly confident I have a bona fide YAFFS object header starting at 0x64d4b40 in my dump. That's pretty cool.
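Here is a minimal sketch of how the leading members of that header could be decoded in Python. It rests on several assumptions that are not spelled out in the YAFFS docs: little-endian storage, a 4-byte enum, 1-byte YCHARs, YAFFS_MAX_NAME_LENGTH equal to 255, and two bytes of alignment padding after the name. The padding is at least consistent with the dump, where the bytes ed810000 at header offset 0x10c read as a plausible yst_mode for an executable regular file. The names HEADER_FORMAT and parse_object_header are mine.

```python
import struct

# Leading members of struct yaffs_obj_hdr, up to and including file_size_low.
# Assumed: little-endian, 4-byte enum/int, 256-byte name, 2 padding bytes.
HEADER_FORMAT = "<IiH256s2xIIIIIII"
HEADER_FIELDS = (
    "type", "parent_obj_id", "sum_no_longer_used", "name",
    "yst_mode", "yst_uid", "yst_gid",
    "yst_atime", "yst_mtime", "yst_ctime", "file_size_low",
)

def parse_object_header(chunk: bytes) -> dict:
    """Decode the start of an object header chunk into a dict."""
    values = struct.unpack_from(HEADER_FORMAT, chunk)
    header = dict(zip(HEADER_FIELDS, values))
    # The name is NUL-terminated inside its fixed-length array.
    header["name"] = header["name"].split(b"\x00", 1)[0].decode("ascii", "replace")
    return header
```

Fed the bytes found at 0x64d4b40, this should give back the type, parent and name discussed above. The members after file_size_low are left out because I have not checked them against the dump.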

Chunks

Object headers get a chunk of their own, so 0x64d4b40 should sit on a chunk boundary, but I still don't know how big my dump's chunks are. In order to find out, I had to scroll down from that offset and carefully watch for patterns:

  • 0x64d4b40 - 0x64d4c5f - That's the start of the object header. Lots of NULLs used to pad tcpdump, followed by what kinda looks like timestamps (0x515bcc66 works out to April 3rd 2013)

  • 0x64d4c60 - 0x64d537f - Mostly just 0xffs. What's very interesting is that these are interrupted by 16-byte fragments of non-0xff bytes at 0x64d4d40, 0x64d4f50, 0x64d5160 and 0x64d5370

  • 0x64d5380 - ... - I've seen a lot of ELF headers in my time and this sure looks like one!

I don't know whether the ELF header at 0x64d5380 belongs to tcpdump or any other file but it looks like there's a data chunk starting at that offset. This would put the chunk size at 0x64d5380 - 0x64d4b40 = 2112 bytes.
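Both bits of arithmetic are easy to double-check in Python. This is only a sanity check on the numbers above. It assumes the timestamp is a count of seconds since the Unix epoch and prints it in UTC.

```python
from datetime import datetime, timezone

header_offset = 0x64D4B40      # start of the tcpdump object header
next_chunk_offset = 0x64D5380  # start of the ELF header that follows it

# Distance between two consecutive chunk starts in the dump.
stride = next_chunk_offset - header_offset

# yst_ctime from the header, read as a little-endian u32.
ctime = datetime.fromtimestamp(0x515BCC66, tz=timezone.utc)

print(stride, ctime.isoformat())
```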

Spares

2112-byte chunks, eh? That's... not an integer power of 2 and therefore not a very auspicious number. 2048 would be so much better!

It just so happens that 2112 is equal to 2048 + 64 and I've found four 16-byte fragments of data in the object header that stand out from the bytes that surround them. After a week of mulling over what these mystery bytes might be, it occurred to me that they may just be the chunk's spare data. The fact that they occur like clockwork after every 512 bytes of chunk data (that is, every 528 bytes in the dump) suggests they were written programmatically, as opposed to being a feature of either object headers or file data.

I tried to test that hypothesis by sweeping other areas of the dump. Since I want to prove that this pattern of 16-byte fragments is not part of the data stored on the filesystem, I decided to look for it in a large area of contiguous, human-readable data.

I don't know for sure what's in the phone's /system directory but I can make an educated guess that there might be a copy of the GPL somewhere in there. Admittedly, that's stretching the definition of human-readable a wee bit.

689-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ grep -iabo 'general public license' ../images/system.img | head -n 1
102164317:General Public License

702-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ echo $(( 102164317 - 102164317 % 528 ))
102164304

704-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ xxd -g4 -s 102164304 -l 2112 ../images/system.img
616e750: 73206f66 20746865 20474e55 2047656e  s of the GNU Gen
616e760: 6572616c 20507562 6c696320 4c696365  eral Public Lice
616e770: 6e736520 76657273 696f6e20 322e0a0a  nse version 2...
616e780: 416c7465 726e6174 6976656c 792c2074  Alternatively, t
616e790: 68697320 736f6674 77617265 206d6179  his software may
616e7a0: 20626520 64697374 72696275 74656420   be distributed
616e7b0: 756e6465 72207468 65207465 726d7320  under the terms
616e7c0: 6f662074 68650a42 5344206c 6963656e  of the.BSD licen
616e7d0: 73652e20 53656520 52454144 4d452061  se. See README a
616e7e0: 6e642043 4f505949 4e472066 6f72206d  nd COPYING for m
616e7f0: 6f726520 64657461 696c732e 0a005468  ore details...Th
616e800: 69732070 726f6772 616d2069 73206672  is program is fr
616e810: 65652073 6f667477 6172653b 20796f75  ee software; you
616e820: 2063616e 20726564 69737472 69627574   can redistribut
616e830: 65206974 20616e64 2f6f7220 6d6f6469  e it and/or modi
616e840: 66790a69 7420756e 64657220 74686520  fy.it under the
616e850: 7465726d 73206f66 20746865 20474e55  terms of the GNU
616e860: 2047656e 6572616c 20507562 6c696320   General Public
616e870: 4c696365 6e736520 76657273 696f6e20  License version
616e880: 32206173 0a707562 6c697368 65642062  2 as.published b
616e890: 79207468 65204672 65652053 6f667477  y the Free Softw
616e8a0: 61726520 466f756e 64617469 6f6e2e0a  are Foundation..
616e8b0: 0a546869 73207072 6f677261 6d206973  .This program is
616e8c0: 20646973 74726962 75746564 20696e20   distributed in
616e8d0: 74686520 686f7065 20746861 74206974  the hope that it
616e8e0: 2077696c 6c206265 20757365 66756c2c   will be useful,
616e8f0: 0a627574 20574954 484f5554 20414e59  .but WITHOUT ANY
616e900: 20574152 52414e54 593b2077 6974686f   WARRANTY; witho
616e910: 75742065 76656e20 74686520 696d706c  ut even the impl
616e920: 69656420 77617272 616e7479 206f660a  ied warranty of.
616e930: 4d455243 48414e54 4142494c 49545920  MERCHANTABILITY
616e940: 6f722046 49544e45 53532046 4f522041  or FITNESS FOR A
616e950: ff008000 00000008 31c0f8d8 3ce1ffff  ........1...<...
616e960: 20504152 54494355 4c415220 50555250   PARTICULAR PURP
616e970: 4f53452e 20205365 65207468 650a474e  OSE.  See the.GN
616e980: 55204765 6e657261 6c205075 626c6963  U General Public
616e990: 204c6963 656e7365 20666f72 206d6f72   License for mor
616e9a0: 65206465 7461696c 732e0a0a 00596f75  e details....You

We can see the familiar text in the dump and just when Richard Stallman has built up a good head of steam and starts shouting about the MERCHANTABILITY and the FITNESS FOR A PARTICULAR PURPOSE, a 16-byte fragment of data occurs at offset 0x616e950 that is definitely not part of the GPL.

At this point I'm fairly confident that these fragments are the out-of-band spare data written and read by YAFFS.

Something is off

In order to make things a bit clearer, I've decided to call the combined 2112-byte blob of chunk data interleaved with spare fragments a block.

It looks like I've got 151040 of them in my dump:

705-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ stat ../images/system.img 
  File: ‘../images/system.img’
  Size: 318996480       Blocks: 623040     IO Block: 4096   regular file
Device: fe01h/65025d    Inode: 1066496     Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/  mboyer)   Gid: ( 1000/  mboyer)
Access: 2014-08-11 22:08:27.409957044 +0100
Modify: 2014-06-01 03:59:00.494426437 +0100
Change: 2014-07-06 12:47:15.303356073 +0100
 Birth: -

706-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ echo $(( 318996480 % 2112 ))
0

707-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ echo $(( 318996480 / 2112 ))
151040

The original YAFFS spec mentions a single 16-byte spare for every chunk. Here however, I have a grand total of 64 bytes of out-of-band data for every 2048-byte chunk. Every single 16-byte fragment begins and ends with 0xff, so I only really have 56 bytes of meaningful data in there.

Still, that's a lot more than I expected and it's obvious that the 64-bit struct yaffs_tags declaration I got from the header isn't going to map directly to the dump's spare bytes.
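Whatever the fields inside the spare turn out to be, separating chunk data from spare data is mechanical. The sketch below assumes the layout observed so far: each 2112-byte block is four runs of 512 data bytes, each followed by a 16-byte spare fragment. The constant and function names are mine.

```python
SUBCHUNK = 512   # data bytes preceding each spare fragment (observed)
FRAGMENT = 16    # bytes in each spare fragment (observed)
BLOCK = 4 * (SUBCHUNK + FRAGMENT)  # 2112 bytes

def split_block(block: bytes):
    """Return (data, spare) for one 2112-byte block of the dump."""
    if len(block) != BLOCK:
        raise ValueError("expected a %d-byte block" % BLOCK)
    data, spare = bytearray(), bytearray()
    for start in range(0, BLOCK, SUBCHUNK + FRAGMENT):
        data += block[start:start + SUBCHUNK]
        spare += block[start + SUBCHUNK:start + SUBCHUNK + FRAGMENT]
    return bytes(data), bytes(spare)
```

This gives a contiguous 2048-byte chunk and a contiguous 64-byte spare, so spare offsets can be counted across the four fragments as if they were one buffer.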

Reversing the spares

Unlike with the object headers, I don't know what the dump's spare data should look like, so I had to find the equivalents of the struct yaffs_tags members in the spare data the hard way.

Finding the obj_id

I know that tcpdump's object header has a parent_obj_id member with a value of 0x00000226 and I know that its parent object is a directory named xbin. Knowing what I know now about block sizes, I can look for that directory's object header and search the block's spare data for that number. I expect the xbin string to start 10 bytes from a block boundary:

714-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ grep -abo 'xbin' ../images/system.img  | awk -F':' '{ if(10==($1 % 2112)){ print $1 - 10} }'
105709824

716-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ xxd -g4 -s 105709824 -l 80 ../images/system.img  
64d0100: 03000000 01000000 ffff7862 696e0000  ..........xbin..
64d0110: 00000000 00000000 00000000 00000000  ................
64d0120: 00000000 00000000 00000000 00000000  ................
64d0130: 00000000 00000000 00000000 00000000  ................
64d0140: 00000000 00000000 00000000 00000000  ................

0x00000003 is consistent with the enumerated type for a directory, so my header is looking pretty groovy.

Let's put together the 4 16-byte fragments we've got around that chunk and see what we can see.

725-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I=R=]$ for frag_offset in 512 1040 1568 2096; do xxd -g4 -s $(( 105709824 + frag_offset)) -l 16 ../images/system.img  ; done
64d0300: ff001000 00260200 d98ba090 4fd6feff  .....&......O...
64d0510: ff000000 0000ffff a75b4381 75a3f5ff  .........[C.u...
64d0720: ff00001a ffffff00 2059caf5 ed0bfbff  ........ Y......
64d0930: ff000000 ffaaaa2e 03a502ea 410bffff  ............A...

The only 0x26 byte in there is found at spare offset 0x05 and it is followed by a 0x02. That sounds promising. I repeated this process using other files I knew the path of and was able to determine that the obj_id-equivalent is stored as a little-endian unsigned integer starting on the 40th bit of the spare. The YAFFS1 struct yaffs_tags declaration points to a length of 18 bits although there could be more here.
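Searching a spare for a known value is easy to automate. The sketch below is a hypothetical helper (find_value_in_spare is my name) that looks for the little-endian encoding of a number in the 64 spare bytes, treating the four fragments as one buffer. The default width of 2 bytes is an assumption; it is the shortest width that holds 0x226.

```python
def find_value_in_spare(spare: bytes, value: int, width: int = 2):
    """Return every offset in `spare` where `value` appears little-endian."""
    needle = value.to_bytes(width, "little")
    hits = []
    pos = spare.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = spare.find(needle, pos + 1)
    return hits

# The four spare fragments of the xbin header block, concatenated.
xbin_spare = bytes.fromhex(
    "ff00100000260200d98ba0904fd6feff"
    "ff0000000000ffffa75b438175a3f5ff"
    "ff00001affffff002059caf5ed0bfbff"
    "ff000000ffaaaa2e03a502ea410bffff"
)
print(find_value_in_spare(xbin_spare, 0x226))
```

A single hit is encouraging but could still be a coincidence, which is why repeating the search on other objects with known ids matters.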

Finding the chunk_id

According to the YAFFS spec, object header chunks have a chunk_id value of zero whereas file data chunks have a positive integer chunk_id that indicates the chunk's position in the file. There are several contiguous runs of 0x00 bytes in the tcpdump object header spare above. Which one's the chunk_id?

To find out, I had to find at least two consecutive chunks of file data and compare their spares. I headed back to the GPL. I knew there was a 16-byte spare fragment at 0x616e950.

747-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ echo $(( 102164304 % 2112 ))
528

748-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ echo $(( 102164304 - 528 ))
102163776

750-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ for frag_offset in 512 1040 1568 2096; do xxd -g4 -s $(( 102163776 + frag_offset)) -l 16 ../images/system.img  ; done
616e740: ff001000 00c80100 b1d3e86c 3218f9ff  ...........l2...
616e950: ff008000 00000008 31c0f8d8 3ce1ffff  ........1...<...
616eb60: ff000019 ffffff05 68afaa7c 4a4cfcff  ........h..|JL..
616ed70: ff000000 faaaaa48 d1334c8c 70d3f0ff  .......H.3L.p...

751-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ echo $(( 102163776 + 2112 ))
102165888
752-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ for frag_offset in 512 1040 1568 2096; do xxd -g4 -s $(( 102165888 + frag_offset)) -l 16 ../images/system.img  ; done
616ef80: ff001000 00c80100 c5a1ae9d 713af0ff  ............q:..
616f190: ff008100 00000008 39565abc 5b40f3ff  ........9VZ.[@..
616f3a0: ff00000c ffffff0d a8b75e90 ba0bf8ff  ..........^.....
616f5b0: ff000000 0daaaaa3 25a947e4 f8affcff  ........%.G.....

The only byte that seems to have incremented from the first chunk's spare to the next's is at spare offset 0x12. This is consistent with what we got from the tcpdump object header's spare, where we have 0x00000000 at that offset. I tested that hypothesis on other files' chunks and was able to confirm that the only location in the spare where a chunk_id could be found is at 0x12. The length for that field is set to 20 bits in the YAFFS1 tags structure and could be up to 32 bits based on what I have seen.
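The comparison can be expressed as a small function that lists the offsets where the second spare holds exactly the first spare's byte plus one. This is a simplification of what I did by eye: it ignores carries into the next byte, so it would miss a chunk_id going from 0xff to 0x100. The helper name is mine.

```python
def incremented_offsets(spare_a: bytes, spare_b: bytes):
    """Offsets where spare_b's byte equals spare_a's byte plus one."""
    return [i for i, (a, b) in enumerate(zip(spare_a, spare_b)) if b == a + 1]

# Spares of the two consecutive GPL chunks, fragments concatenated.
first_spare = bytes.fromhex(
    "ff00100000c80100b1d3e86c3218f9ff"
    "ff0080000000000831c0f8d83ce1ffff"
    "ff000019ffffff0568afaa7c4a4cfcff"
    "ff000000faaaaa48d1334c8c70d3f0ff"
)
second_spare = bytes.fromhex(
    "ff00100000c80100c5a1ae9d713af0ff"
    "ff0081000000000839565abc5b40f3ff"
    "ff00000cffffff0da8b75e90ba0bf8ff"
    "ff0000000daaaaa325a947e4f8affcff"
)
print([hex(i) for i in incremented_offsets(first_spare, second_spare)])
```

Plenty of other bytes differ between the two spares, presumably ECC, but none of them by exactly one.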

Finding the chunk's byte count

The last piece of information I need to successfully extract file data from my dump is the number of file data bytes in a given chunk. I know that my chunks are 2048 bytes in length, so I expect to find that value in mid-file chunks' spares. The last chunk in a given file, that is to say with a certain obj_id, should have a length field with a value equal to the file's length taken from the object's header modulo 2048.

767-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ printf '%x\n' 2048
800

Both of my GPL spares have the bytes 00 08 starting at spare offset 0x16, which reads as 0x0800 in little-endian. The object header's spare has 0xffff there, which makes sense since the headers don't include any file data. I set out to find tcpdump's last data chunk to test that hypothesis.

I can tell from the object header above that the file's size is 0x00096b84 which is reasonable for a binary. 0x00096b84 % 2048 == 900 so I'll expect the runt chunk to have a byte count of 900. The last file chunk for tcpdump is in block #50363.

790-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ echo $(( 50363 * 2112 ))
106366656

791-mboyer@marylou:~/Hacks/Nam-Phone_G40C/PYaffs [HL4:I±R=]$ for frag_offset in 512 1040 1568 2096; do xxd -g4 -s $(( 106366656 + frag_offset)) -l 16 ../images/system.img  ; done
65708c0: ff001000 00280200 f5453b63 c5b5ffff  .....(...E;c....
6570ad0: ff002e01 00008403 324fc629 baa2f8ff  ........2O.)....
6570ce0: ff000019 ffffff0d 7f1bae87 9e88faff  ................
6570ef0: ff000000 f2aaaa8b 70b3f52b 746dfbff  ........p..+tm..

0x0384 is indeed 900. I'm now quite satisfied that the chunk's byte count is stored at spare offset 0x16.
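Putting the three findings together, a spare parser could look like the sketch below. The offsets are the ones worked out in this post. The field widths are partly guesses: I read 3 bytes for the obj_id, 4 for the chunk_id and 2 for the byte count, all little-endian. The names are mine.

```python
OBJ_ID_OFFSET = 0x05    # 3 bytes read here; the true width is a guess
CHUNK_ID_OFFSET = 0x12  # 4 bytes, little-endian
N_BYTES_OFFSET = 0x16   # 2 bytes, little-endian

def parse_spare(spare: bytes) -> dict:
    """Pull the three reverse-engineered fields out of a 64-byte spare."""
    def field(offset, width):
        return int.from_bytes(spare[offset:offset + width], "little")
    return {
        "obj_id": field(OBJ_ID_OFFSET, 3),
        "chunk_id": field(CHUNK_ID_OFFSET, 4),
        "n_bytes": field(N_BYTES_OFFSET, 2),
    }

# Spare of tcpdump's last data chunk (block #50363), fragments concatenated.
last_spare = bytes.fromhex(
    "ff00100000280200f5453b63c5b5ffff"
    "ff002e0100008403324fc629baa2f8ff"
    "ff000019ffffff0d7f1bae879e88faff"
    "ff000000f2aaaa8b70b3f52b746dfbff"
)
print(parse_spare(last_spare))
```

For that spare the parser reports chunk 302 of object 0x228 holding 900 bytes, which lines up with a 0x96b84-byte file: 301 full chunks followed by a 900-byte runt.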

Conclusion

I've now reverse-engineered enough information about the layout of the /system dump to write a tool that will programmatically extract the contents and metadata of the filesystem. This tool is called PYaffs and I first uploaded it to GitHub about 6 weeks ago.
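To give an idea of what such an extraction involves, here is a compact sketch of reassembling one file from the dump. It is not PYaffs' actual code, just an illustration built on the layout described in this post: 2112-byte blocks, spares interleaved every 512 bytes, obj_id at spare offset 0x05, chunk_id at 0x12 and byte count at 0x16. It also assumes that, if a chunk appears more than once, the copy found later in the dump wins, which is a simplification of how YAFFS2 really picks the current copy.

```python
BLOCK, SUBCHUNK, FRAGMENT = 2112, 512, 16

def split_block(block: bytes):
    """Separate one block into its 2048 data bytes and 64 spare bytes."""
    step = SUBCHUNK + FRAGMENT
    data = b"".join(block[i:i + SUBCHUNK] for i in range(0, BLOCK, step))
    spare = b"".join(block[i + SUBCHUNK:i + step] for i in range(0, BLOCK, step))
    return data, spare

def field(spare: bytes, offset: int, width: int) -> int:
    """Read a little-endian unsigned integer from the spare."""
    return int.from_bytes(spare[offset:offset + width], "little")

def extract_object(image: bytes, wanted_obj_id: int) -> bytes:
    """Concatenate the data chunks of one object, ordered by chunk_id."""
    chunks = {}
    for start in range(0, len(image) - BLOCK + 1, BLOCK):
        data, spare = split_block(image[start:start + BLOCK])
        obj_id = field(spare, 0x05, 3)
        chunk_id = field(spare, 0x12, 4)
        n_bytes = field(spare, 0x16, 2)
        if obj_id != wanted_obj_id or chunk_id == 0 or n_bytes > len(data):
            continue  # another object, an object header, or an implausible length
        chunks[chunk_id] = data[:n_bytes]  # a later block replaces an earlier copy
    return b"".join(chunks[i] for i in sorted(chunks))
```

Calling extract_object on the image contents with 0x228 should yield the bytes of tcpdump under those assumptions. A real tool would also need to walk the object headers to rebuild names and directories.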

This is very exciting news to me because it means this blog, which is really just a side-project, has now caught up with the main event and future hacklogs will detail new developments instead of rehashing weeks-old stuff. It's nice when you break even.

Since I've now partially achieved one of the goals I set for myself when I began, and in doing so enabled the other two, I think I should conclude with this other quote from Jurassic Park.

It's a UNIX system, I know this!

"It's a UNIX system"