Thursday, 19 December 2013

On the beauty of libarchive

In your applications, you might have to deal with archive files: ISO images of installers, e-book or e-comic formats based on ZIP files, video DVD images.

libarchive makes things easier: you can extract the few files you care about without having to call external commands.

The API feels a bit antiquated compared to using GLib/GIO for file handling, but it's generally easier than dealing with the potential security issues of launching external tools, or even dealing with shell argv quoting.
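To make the quoting problem concrete, here is a minimal sketch in Python (not libarchive, and not code from any of the projects mentioned here). It only builds command strings and splits them the way a POSIX shell would, without running anything. The `unzip -p` invocation and the helper names are illustrative assumptions.

```python
import shlex

def naive_command(archive, member):
    # Plain string concatenation: what a careless system()-style call would do.
    return "unzip -p " + archive + " " + member

def quoted_command(archive, member):
    # shlex.quote() wraps each argument so a POSIX shell sees it as one word.
    return " ".join(["unzip", "-p", shlex.quote(archive), shlex.quote(member)])

def shell_words(command):
    # Split the string the way a POSIX shell would, without executing it.
    return shlex.split(command)

if __name__ == "__main__":
    # A filename with a space is enough to break the naive version.
    print(shell_words(naive_command("my book.epub", "cover.jpg")))
    print(shell_words(quoted_command("my book.epub", "cover.jpg")))
```

With the naive version, "my book.epub" reaches the tool as two separate arguments. Reading the archive in-process sidesteps the question entirely, since no shell is involved.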


totem-pl-parser uses libarchive to determine what type of video disc image is hidden inside an ISO image.

gnome-epub-thumbnailer (as well as its siblings, the Krita and OpenRaster thumbnailers I talked about more recently) uses the ZIP handling to extract particular files, and figure out which file is the cover image.
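As a rough illustration of the idea (pull one file out of a ZIP-based container in-process, with no external tool), here is a sketch using Python's standard `zipfile` module as a stand-in for libarchive's C API. The cover-picking heuristic, the helper names and the sample file layout are all made up for this example; this is not how gnome-epub-thumbnailer actually decides which file is the cover.

```python
import io
import zipfile

IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif")

def find_cover_name(names):
    """Hypothetical heuristic: first image whose name mentions 'cover'."""
    for name in names:
        lower = name.lower()
        if "cover" in lower and lower.endswith(IMAGE_SUFFIXES):
            return name
    return None

def read_cover(source):
    """Return (member name, bytes) for the guessed cover, or None."""
    with zipfile.ZipFile(source) as zf:
        name = find_cover_name(zf.namelist())
        if name is None:
            return None
        # Only this one member is decompressed; nothing is written to disk.
        return name, zf.read(name)

def make_sample_epub():
    """Build a tiny in-memory ZIP with an invented EPUB-like layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("OEBPS/chapter1.xhtml", "<html/>")
        zf.writestr("OEBPS/images/cover.png", b"\x89PNG fake")
    buf.seek(0)
    return buf

if __name__ == "__main__":
    print(read_cover(make_sample_epub()))
```

The shape is the same as with libarchive: open the container, walk the entry names, and read only the entry you want.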

Other uses and limitations

Boxes could use libarchive to extract files from ISO images for its auto-installer, and evince could use it to handle CBZ e-comics.

There are a couple of limitations, though. ISO support doesn't handle UDF images (which just means weird filenames, not inaccessible files), and RAR support is still quite young.

I hope that this post can spur on bug fixes for the RAR support, new UDF support, or even a GIO-style wrapper around the library.

The upstream authors have been particularly good at fixing bugs that only showed themselves with broken files, and I'd like to thank them for their very useful work.


Juanjo Marín said...

I am working on the evince backend to use libarchive

Bastien Nocera said...

Juanjo: Success! :)

Unknown said...

I hope this isn't inappropriate (feel free to delete this comment if you think it is), but... If the compressed files are just single files (like bz2, gz, xz, etc.) or streams, not archives, I've been working on something which may be interesting: Squash.

Squash doesn't deal with archives because that's libarchive's role, but for people who don't need that capability it might be a good fit, and the API tends to be much simpler than libarchive. I'm also planning on adding an integration library which will allow people to use GIO, just like GZlibCompressor/GZlibDecompressor.

Stefan Sauer said...

Having gio support for this would be nice. I use libgsf in buzztrax to read/write zip files right now.

Benjamin said...

gvfs supports libarchive. The problem with libarchive is that it sucks (used to suck? I wrote the gvfs archive backend years ago) a lot for writing archives, as the only action it supported with a decent API was creating archives. Which is why the archive support in gvfs is read-only.

Also, you get the pain of yet another API (with lots of enums) for permissions, etags and whatnot. And marshalling all of that for gio sounds very painful to me.

Bastien Nocera said...

Benjamin: The gvfs libarchive support is pretty useless for applications. There's no way to easily mount an archive (even if you manage to figure out the URL you should be using), and "foo.epub" mounts popping up in the file manager is less than useful (especially if I open my e-Books folder, huh :).

gvfs archive support is for end-users navigating their files. What I was requesting was a GLib-ish wrapper around the library.