question: can kaitai for archive/android_sparse.ksy be used to iterate over files? #699

Open
sandreenko opened this issue Sep 5, 2024 · 4 comments

@sandreenko

sandreenko commented Sep 5, 2024

Hello, I am working with Android sparse images and want to iterate over the byte blocks of each file in the image.
Currently I am transforming them to ext2 and then using debugfs to get the byte blocks of each file.

Can I do this with kaitai? I could open my files using android_sparse.ksy but I don't see how to go from class Chunk(KaitaiStruct): to actual files. Is it possible?

Thanks!

Off-topic: https://ide.kaitai.io/ for archive/android_sparse.ksy looks broken:

Parse error (ValidationNotEqualError): not equal, expected [58,255,38,237], but got [80,75,3,4]
Call stack: Error
    at new KaitaiStream.ValidationNotEqualError (https://ide.kaitai.io/lib/_npm/kaitai-struct/KaitaiStream.js:796:17)
    at AndroidSparse.FileHeaderPrefix.FileHeaderPrefix._read (eval at initCode (https://ide.kaitai.io/js/v1/kaitaiWorker.js:89:9), <anonymous>:83:15)
    at AndroidSparse._read (eval at initCode (https://ide.kaitai.io/js/v1/kaitaiWorker.js:89:9), <anonymous>:49:23)
    at reparse (https://ide.kaitai.io/js/v1/kaitaiWorker.js:96:17)
    at myself.onmessage (https://ide.kaitai.io/js/v1/kaitaiWorker.js:117:47)
@armijnhemel
Collaborator

It is possible to create Android sparse image files for files that are not ext2/3/4 file systems (I've done that). I would recommend taking a two-step approach.

@generalmimon
Member

generalmimon commented Sep 6, 2024

@sandreenko:

Off-topic: https://ide.kaitai.io/ for archive/android_sparse.ksy looks broken:

Parse error (ValidationNotEqualError): not equal, expected [58,255,38,237], but got [80,75,3,4]
Call stack: Error
    at new KaitaiStream.ValidationNotEqualError (https://ide.kaitai.io/lib/_npm/kaitai-struct/KaitaiStream.js:796:17)
    at AndroidSparse.FileHeaderPrefix.FileHeaderPrefix._read (eval at initCode (https://ide.kaitai.io/js/v1/kaitaiWorker.js:89:9), <anonymous>:83:15)
    at AndroidSparse._read (eval at initCode (https://ide.kaitai.io/js/v1/kaitaiWorker.js:89:9), <anonymous>:49:23)
    at reparse (https://ide.kaitai.io/js/v1/kaitaiWorker.js:96:17)
    at myself.onmessage (https://ide.kaitai.io/js/v1/kaitaiWorker.js:117:47)

This doesn't look like it's broken. It just tells you that the validation of the magic number at android_sparse.ksy:38-39 failed:

types:
  file_header_prefix:
    seq:
      - id: magic
        contents: [0x3a, 0xff, 0x26, 0xed]

Documentation of the contents key can be found at https://doc.kaitai.io/user_guide.html#magic.

So the first 4 bytes of the input binary file were expected to be 3a ff 26 ed in hex, but they were actually 50 4b 03 04 in hex ([80,75,3,4] in decimal). (You can see this perhaps more clearly at https://ide.kaitai.io/devel/, which has some nice improvements over the "stable" https://ide.kaitai.io/ version.) 50 4b 03 04 is the magic signature for ZIP files, so you tried to parse a .zip file as if it followed the android_sparse format, which won't work.
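
A quick way to check this before loading a file into the IDE is to compare its first 4 bytes against the two signatures mentioned above. This is only a minimal sketch; the helper names `identify_magic` and `identify_file` are made up for illustration and are not part of Kaitai Struct.

```python
# Magic signatures taken from the discussion above (first 4 bytes of the file).
ANDROID_SPARSE_MAGIC = bytes([0x3A, 0xFF, 0x26, 0xED])  # as in android_sparse.ksy
ZIP_MAGIC = bytes([0x50, 0x4B, 0x03, 0x04])             # "PK\x03\x04"


def identify_magic(first_bytes: bytes) -> str:
    """Classify data by its first 4 bytes (hypothetical helper)."""
    prefix = first_bytes[:4]
    if prefix == ANDROID_SPARSE_MAGIC:
        return "android_sparse"
    if prefix == ZIP_MAGIC:
        return "zip"
    return "unknown"


def identify_file(path: str) -> str:
    """Read the first 4 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return identify_magic(f.read(4))
```

With the bytes from the error message, `identify_magic(bytes([80, 75, 3, 4]))` returns `"zip"`, which matches the diagnosis above.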

If you want to try some valid android_sparse files, see https://github.com/kaitai-io/kaitai_struct_samples/tree/master/archive/android_sparse (these are just sample files for testing and don't really contain any useful data, though).

Hello, I am working with Android sparse images and want to iterate over the byte blocks of each file in the image.

Just to clarify, the Android sparse image format AFAIK doesn't directly support storing multiple files. Check out https://2net.co.uk/tutorial/android-sparse-image-format:

Android generates system.img, userdata.img and cache.img in sparse format.

This suggests that only individual "files" are packed as Android sparse images. But it appears to be common that the Android sparse image format is used to "compress" a file system image (like ext4), which of course can contain many files.

Can I do this with kaitai? I could open my files using android_sparse.ksy but I don't see how to go from class Chunk(KaitaiStruct): to actual files.

You can use Kaitai Struct to unpack an Android sparse image if you want. The standard way to do it would probably be to use the simg2img tool instead, which is included in many distributions in a package called android-sdk-libsparse-utils (Debian, Ubuntu) or android-tools (Fedora, openSUSE, Arch Linux). But if you want to do this yourself using the generated Python parser, it should be relatively straightforward. I strongly recommend inspecting the object tree at https://ide.kaitai.io/devel/ first (and make sure you select an actual .img file with an Android sparse image; a .zip file won't work) - it will guide you through the structure of the data that you'll get from the parser. You'll notice that there are 4 types of chunks, and I think their meaning should be pretty self-explanatory - let me know if you need more help.
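
To illustrate what the 4 chunk types mean when unpacking, here is a rough sketch that expands a sparse image in memory. It deliberately uses Python's `struct` module rather than the generated Kaitai parser so that it is self-contained. The header layout and the chunk type values (0xCAC1 to 0xCAC4) are assumptions based on my understanding of the format as defined in AOSP's libsparse, and the function name `unsparse` is made up. It does not verify checksums, so treat it as a reading aid rather than a replacement for simg2img.

```python
import struct

# Assumed layout (little-endian): magic, major, minor, file header size,
# chunk header size, block size, total blocks, total chunks, image checksum.
FILE_HEADER = struct.Struct("<4sHHHHIIII")  # 28 bytes
# Assumed layout: chunk type, reserved, size in output blocks, total size in input bytes.
CHUNK_HEADER = struct.Struct("<HHII")       # 12 bytes
MAGIC = bytes([0x3A, 0xFF, 0x26, 0xED])

CHUNK_RAW, CHUNK_FILL, CHUNK_DONT_CARE, CHUNK_CRC32 = 0xCAC1, 0xCAC2, 0xCAC3, 0xCAC4


def unsparse(data: bytes) -> bytes:
    """Expand an Android sparse image held in memory (sketch, no checksum checks)."""
    (magic, _major, _minor, file_hdr_size, chunk_hdr_size,
     block_size, _total_blocks, total_chunks, _checksum) = FILE_HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not an Android sparse image")

    out = bytearray()
    pos = file_hdr_size
    for _ in range(total_chunks):
        chunk_type, _reserved, num_blocks, total_size = CHUNK_HEADER.unpack_from(data, pos)
        body = data[pos + chunk_hdr_size:pos + total_size]
        out_len = num_blocks * block_size
        if chunk_type == CHUNK_RAW:
            # Raw chunk: the body is the output data itself.
            if len(body) != out_len:
                raise ValueError("raw chunk size mismatch")
            out += body
        elif chunk_type == CHUNK_FILL:
            # Fill chunk: a 4-byte pattern repeated over the whole output range.
            out += body[:4] * (out_len // 4)
        elif chunk_type == CHUNK_DONT_CARE:
            # Don't-care chunk: no data stored; emit zeros as a placeholder.
            out += bytes(out_len)
        elif chunk_type == CHUNK_CRC32:
            # CRC32 chunk: produces no output (not verified in this sketch).
            pass
        else:
            raise ValueError(f"unknown chunk type 0x{chunk_type:04x}")
        pos += total_size
    return bytes(out)
```

For real images, which can be large, you would want to stream chunks to an output file instead of building everything in memory.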

After you unpack the filesystem image from the Android sparse image, I wouldn't recommend using Kaitai Struct to access the files in that filesystem. It will be easier and more robust to just mount it if you're on Linux, or I think 7-Zip (which is primarily for Windows, and also has a command-line interface) can unpack some file system images. There are probably other ways, these are just ones I can think of.

@sandreenko
Author

sandreenko commented Sep 6, 2024

Thanks @armijnhemel, do you mean converting to a non-sparse image and then iterating over that?

Thanks @generalmimon for the detailed answer!
I will try the visualizer with an uploaded img. I am actually trying to repack the img into a smaller one: I want to extract the bytes that belong to files and compress them, but later restore the original img. That means I want to keep all sparse image metadata intact, so I can't use simg2img.

I wouldn't recommend using Kaitai Struct to access the files in that filesystem.

Why?

@generalmimon
Member

generalmimon commented Sep 6, 2024

@sandreenko:

I wouldn't recommend using Kaitai Struct to access the files in that filesystem.

Why?

If you want to go down this path, feel free to try it. If you're determined, it could work out well in the end. I just wanted to make sure you know about the simpler and more established solutions first.

I don't know much about how complex the filesystems are on the inside, but it will certainly be harder than unpacking the Android sparse image with KS (because that should be very easy; in contrast, I assume filesystems must inherently have at least some degree of complexity). And it will largely depend on what kind of file system it is. If it's ext2 as you mentioned, you might be somewhat in luck because we seem to have a .ksy spec for it (https://formats.kaitai.io/ext2/), but I've personally never tested it, so I don't know how good or complete it is. If it was some other file system, you might have to write a .ksy spec for it yourself, if no one has already done so. That is perhaps not that hard (of course, it really depends on the complexity of the format), but it definitely takes some time, so hopefully that's not your case.

There is an open issue about ext2 at the moment - #662. Maybe you'll run into it, maybe not. @armijnhemel also notes the following in #662 (comment):

The ext2 parser is, in my opinion, only useful to parse the super block. There are many features (such as the sparse superblock, https://www.nongnu.org/ext2-doc/ext2.html#def-superblock) that are not properly supported.

This suggests that our ext2 specification is far from complete, so you may need to extend it if you really want to use it (please inform us about the problems you run into or send pull requests so we can improve it in the future). But maybe it will work fine for you as is, who knows. To find out, I would just advise using https://ide.kaitai.io/devel/ extensively first on various ext2 images that your application might encounter to evaluate if it's even viable to go this route (and also to check if you can actually find the files there).

I am actually trying to repack the img into a smaller one: I want to extract the bytes that belong to files and compress them, but later restore the original img. That means I want to keep all sparse image metadata intact, so I can't use simg2img.

Wait, if you want to "repack" everything, does it mean that you also need serialization in addition to parsing? If so, you might be in luck again because serialization has been implemented for Python (and Java) not long ago (see https://doc.kaitai.io/serialization.html), but hasn't been released yet, so you would have to build the compiler from source as explained there.
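
If the sparse headers only need to be preserved verbatim rather than regenerated, a possible alternative to full serialization is to record where each chunk's payload sits in the sparse file and copy everything else byte for byte. The following is a sketch under that assumption, again using plain `struct` instead of the generated parser. The header offsets and chunk type values are assumptions based on my understanding of the libsparse format, and `ChunkSpan`, `index_chunks` and `reassemble` are hypothetical names.

```python
import struct
from typing import List, NamedTuple

MAGIC = bytes([0x3A, 0xFF, 0x26, 0xED])


class ChunkSpan(NamedTuple):
    chunk_type: int      # assumed: 0xCAC1 raw, 0xCAC2 fill, 0xCAC3 don't care, 0xCAC4 CRC32
    header_offset: int   # where the chunk header starts in the sparse file
    payload_offset: int  # where the chunk payload starts in the sparse file
    payload_size: int    # payload length in bytes (0 for don't-care chunks)


def index_chunks(data: bytes) -> List[ChunkSpan]:
    """List the byte ranges of all chunk payloads without modifying anything."""
    magic, _major, _minor, file_hdr_size, chunk_hdr_size = struct.unpack_from("<4sHHHH", data, 0)
    if magic != MAGIC:
        raise ValueError("not an Android sparse image")
    # Assumed position of the total chunk count within the file header.
    (total_chunks,) = struct.unpack_from("<I", data, 20)

    spans = []
    pos = file_hdr_size
    for _ in range(total_chunks):
        chunk_type, _reserved, _num_blocks, total_size = struct.unpack_from("<HHII", data, pos)
        spans.append(ChunkSpan(chunk_type, pos, pos + chunk_hdr_size,
                               total_size - chunk_hdr_size))
        pos += total_size
    return spans


def reassemble(data: bytes, spans: List[ChunkSpan], payloads: List[bytes]) -> bytes:
    """Rebuild the image from the original headers plus the given payloads."""
    out = bytearray()
    pos = 0
    for span, payload in zip(spans, payloads):
        out += data[pos:span.payload_offset]  # headers copied verbatim
        out += payload
        pos = span.payload_offset + span.payload_size
    out += data[pos:]  # any trailing bytes after the last chunk
    return bytes(out)
```

Note that `reassemble` does not update any size fields, so it only yields a valid image when each payload is restored to its original length. Mapping payload ranges to individual files would still require understanding the file system inside the image.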
