[Aide] File type attribute

Bob Proulx bob at proulx.com
Fri Oct 21 06:27:58 EEST 2005


gentuxx wrote:
> jacob martinson wrote:
> >What do you mean by file type? Unix doesn't have a file typing
> >framework the way Windows does.
>
> At the risk of going off-topic, Unix does recognize file types.  It
> isn't the same type of system you allude to with Windows, i.e. file
> extensions, but it knows the difference:

I know this is drifting pretty far afield, but I can't let people go on
thinking that the 'file' command is doing something that it is not
doing.

> gentuxx at gentoo ~ $ file install-sparc64-universal-2005.1.iso
> install-sparc64-universal-2005.1.iso: Sun disk label 'SPARC bootable
> CD-ROM: Gentoo Linux SPARC64 2005.1' 0 blocks, boot block present
> gentuxx at gentoo ~ $ file lastRSS.zip
> lastRSS.zip: Zip archive data, at least v2.0 to extract
> gentuxx at gentoo ~ $ file /usr/bin/ls
> /usr/bin/ls: symbolic link to `/bin/ls'
> gentuxx at gentoo ~ $ file /bin/ls
> /bin/ls: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for
> GNU/Linux 2.4.1, dynamically linked (uses shared libs), stripped

The 'file' command samples data from the file at certain offsets and
then looks it up in its table of signatures to decide what to say about
it.  Traditionally that table is /etc/magic, so called because it lists
the "magic numbers" found in files.  On my current system it is
/usr/share/misc/file/magic instead, so that local customizations in
/etc will be preserved.
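To make the table lookup concrete, here is a minimal Python sketch.  The
three-entry MAGIC_TABLE and the names guess_type and guess_file_type are
hypothetical.  The real magic(5) format is much richer (typed values,
masks, continuation lines), so treat this only as an illustration of
"read bytes at an offset, compare against a signature, print the
matching description".  When nothing matches it falls back to "data",
the generic answer 'file' gives for content it does not recognize.

```python
# Hypothetical, hand-written stand-in for /etc/magic.  The real magic(5)
# format supports many more kinds of tests than a plain byte comparison.
MAGIC_TABLE = [
    # (offset, signature bytes, description to print)
    (0, b"\x7fELF", "ELF executable"),
    (0, b"PK\x03\x04", "Zip archive data"),
    (0, b"#!", "script text executable"),
]


def guess_type(data: bytes) -> str:
    """Return the description of the first table entry whose signature
    appears at its offset in data, or "data" if nothing matches."""
    for offset, signature, description in MAGIC_TABLE:
        if data[offset:offset + len(signature)] == signature:
            return description
    # No signature recognized: fall back to a generic answer.
    return "data"


def guess_file_type(path: str) -> str:
    # Only a prefix of the file is sampled, never the whole thing.
    with open(path, "rb") as f:
        return guess_type(f.read(4096))
```

Note that the file name plays no part: renaming lastRSS.zip to
anything else would not change the answer, and neither would it make a
Zip archive out of something that is not one.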

For example, traditionally a file starting with the four bytes "#! /"
would be reported as a script, and the interpreter path that follows
indicates what type of script it is.  (This is one reason why some
people prefer to start scripts with "#! /" instead of "#!/": some old
file commands looked at all four of the first bytes for the signature.
Personally, though, I prefer the form without the space.)
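A small sketch of the difference, assuming a hypothetical old 'file'
that required all four bytes "#! /" versus a check on just "#!".  The
function names are made up for illustration, and interpreter() simply
pulls out the first word after "#!", which is what tells you the type
of script.

```python
def old_style_is_script(data: bytes) -> bool:
    # Hypothetical old 'file': requires all four bytes "#! /",
    # including the space, so "#!/bin/sh" would be missed.
    return data[:4] == b"#! /"


def modern_is_script(data: bytes) -> bool:
    # Only the two bytes "#!" are needed to mark a script.
    return data[:2] == b"#!"


def interpreter(data: bytes) -> str:
    """Return the interpreter path on the #! line, e.g. /usr/bin/perl,
    or an empty string if the line names none."""
    first_line = data.split(b"\n", 1)[0]
    # Drop the "#!", tolerate an optional space, keep the first word.
    parts = first_line[2:].strip().split()
    return parts[0].decode() if parts else ""
```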

Meanwhile, the output of 'file' has no bearing on anything else in the
system.  It is just a convenient program that takes a guess at the file
contents.  But it is only a guess, and its table is often out of date
when a new data format comes into being, such as a new compression
format.

> At the most fundamental level, it's binary or text. 

I must disagree.  On a unix system even text files are binary.

On MS-Windows text files are treated differently, and by convention a
byte translation is applied when they are read and written.  Logical
line endings are encoded as CR-NL sequences, and a file is logically
ended by a ^Z character even if more data exists and could be read
after it.  Neither of those is true for unix files.
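Here is a rough Python model of the MS-Windows text-mode read
convention described above, next to the unix behaviour.
windows_text_read and unix_read are hypothetical names.  The point is
only that the Windows convention changes the bytes you get back, while
unix hands them back untouched.

```python
CTRL_Z = b"\x1a"  # ^Z, the MS-DOS/Windows text-mode end-of-file marker


def windows_text_read(raw: bytes) -> bytes:
    """Model the text-mode read convention: everything from the first
    ^Z on is ignored, and each CR-NL pair becomes a single NL."""
    end = raw.find(CTRL_Z)
    if end != -1:
        raw = raw[:end]
    return raw.replace(b"\r\n", b"\n")


def unix_read(raw: bytes) -> bytes:
    """On unix there is no text mode: the bytes come back unchanged."""
    return raw
```

For example, b"one\r\ntwo\r\n\x1ahidden" comes back from
windows_text_read as b"one\ntwo\n", while unix_read returns all of it,
including the bytes after the ^Z.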

> It determines file type based on the file header.

With 'file' it is not always a "header".  Files don't really have
headers unless the author designed one into the data of the file.
Whenever possible 'file' uses the data at offset zero because that is
the most convenient, but it is not required.  The author of the
/etc/magic file looks for some unique signature in the file's data and
maps it to a string to print that identifies the file.
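As an example of a signature that is not at offset zero, POSIX ustar
archives carry the string "ustar" at byte offset 257 of each 512-byte
header block.  The bytes before it hold the member's name, mode, size
and so on, which vary from archive to archive.  The function below is a
hypothetical sketch of that one check, not how 'file' itself is
written.

```python
TAR_MAGIC_OFFSET = 257  # where POSIX ustar archives put their magic
TAR_MAGIC = b"ustar"


def looks_like_ustar(data: bytes) -> bool:
    # The first 257 bytes of a tar header block hold per-member fields
    # (name, mode, uid, size, ...), so the signature has to live further in.
    end = TAR_MAGIC_OFFSET + len(TAR_MAGIC)
    return data[TAR_MAGIC_OFFSET:end] == TAR_MAGIC
```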

> To the point of this discussion, it might be handy to know that my
> install-sparc64-universal-2005.1.iso is really:
> 
> gentuxx at gentoo ~ $ file scripts/install-sparc64-universal-2005.1.iso
> scripts/install-sparc64-universal-2005.1.iso: perl script text executable

That could happen, for example, by copying a perl script on top of the
iso image and thereby replacing it, as you must have done for your
example.

> but if that were to actually happen, the ctime, mtime, (possibly)
> inode, md5 hash, sha1 hash, etc. are all going to be different, which
> would be flagged by 'aide -c'.

Agreed.
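For the curious, a minimal Python sketch of the kind of comparison an
integrity checker makes.  The record layout and the names snapshot and
changed_attributes are hypothetical and much simpler than AIDE's actual
database and attribute selection.  It just records the inode, times,
size and MD5/SHA-1 digests and reports which ones differ.  Copying over
an existing file in place typically keeps its inode, which is why the
inode is only "possibly" different.

```python
import hashlib
import os


def snapshot(path: str) -> dict:
    """Record a few attributes of a file (hypothetical, simplified)."""
    st = os.stat(path)
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as f:
        # Hash in chunks so large files such as iso images are not
        # read into memory all at once.
        for chunk in iter(lambda: f.read(65536), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {
        "inode": st.st_ino,
        "mtime": st.st_mtime_ns,
        "ctime": st.st_ctime_ns,
        "size": st.st_size,
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
    }


def changed_attributes(baseline: dict, current: dict) -> list:
    """Names of attributes whose values differ between two snapshots."""
    return sorted(k for k in baseline if baseline[k] != current.get(k))
```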

Bob

