[Aide] manual.html, Understanding Aide rule matching

Sun Dec 18 19:53:21 EET 2005

Hi,

On Mon, Dec 12, 2005 at 09:26:57AM +0200, Virolainen Pablo wrote:
> On Sun, 11 Dec 2005, Marc Haber wrote:
> > On Thu, Dec 08, 2005 at 11:44:05AM +0200, Virolainen Pablo wrote:
> >> If we have rules "/usr/" and "/usr/local/", "/usr/local/" is deeper
> >> match.
> >> Rules are placed to selection tree. If we have rule "/usr/local/[bB]*" it
> >> will create tree
> >> ("/",("/usr/",("/usr/local/",(),"/usr/local/[bB]*",(),()),(),(),()),(),(),())
> >
> > I do not quite understand that notation. If I see correctly, this is a
> > tree of _four_ tuples, not a tree of triples that I would expect. Can
> > you please explain again?
> 
> I might have missed some () somewhere.. (or there might be missing ()..)
> (node-name,(children),(sel rules),(equ rules),(neg rules))

So that description doesn't help much at the moment. I'm going to
see whether the new debugging patch can help me understand. At
least I now know that my understanding of the tree structure is
more or less correct.

> Call check_node_for_match, with as deep match as possible for the 
> filename. (this is easy, because when file are read from the disk by 
> following the tree, starting with "/" node. Or one can use 
> get_seltree_node).

Is it possible to get pseudocode for this operation as well? If so,
one can remove the first_time flag from the recursive function,
resulting in:

	node = (find deepest match possible)
	check( equals list for this node )
	check_node_for_match(node, filename)

That way, the first two lines can be removed from
check_node_for_match, resulting in:

check_node_for_match(node,filename)
        check(regular list for this node)

        if (node is not the root node)
                check_node_for_match(nodes parent,filename)

        if (this file is about to be added)
                check(negative list for this node)

        return (info about whether this file should be added or not and how)
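
To check my own reading of the simplified recursion, here is a minimal
Python sketch of it. This is not code from src/gen_list.c: the Node class,
its field names and the should_add helper are made up for illustration, the
deepest node is simply passed in by the caller, and the result is reduced to
a plain yes/no instead of "whether and how". I am also assuming that a match
in an equals or regular list selects the file and that a match in a negative
list vetoes it.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One selection-tree node (illustrative names, not aide's structs)."""
    name: str
    parent: Optional["Node"] = None
    regular: List[str] = field(default_factory=list)
    equals: List[str] = field(default_factory=list)
    negative: List[str] = field(default_factory=list)


def check(rules, filename):
    # re.match anchors at the start of the string, which mimics the
    # implicit "^" that aide puts in front of each rule.
    return any(re.match(rule, filename) for rule in rules)


def check_node_for_match(node, filename, selected=False):
    # Regular rules of this node may select the file.
    selected = selected or check(node.regular, filename)
    # Walk up towards the root; the ancestors' rules apply as well.
    if node.parent is not None:
        selected = check_node_for_match(node.parent, filename, selected)
    # A file that is about to be added can still be vetoed here.
    if selected and check(node.negative, filename):
        selected = False
    return selected


def should_add(deepest, filename):
    # Equals rules are consulted once, at the deepest node only.
    from_equals = check(deepest.equals, filename)
    return check_node_for_match(deepest, filename, from_equals)
```

Note that the recursion runs from the deepest node up to the root, and the
negative lists are evaluated on the way back down, root first.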

To fully understand this, I need the "find deepest match possible"
step fleshed out more explicitly.

> > I have the impression that this is simplified too much.
> >
> >  (a) I don't see the equals list and the regular list being processed
> >      differently. Both seem to be handled exactly the same, but
> >      probably the equals list takes precedence.
> >  (b) I do not understand the recursion here. In my understanding, the
> >      "no deeper match" pseudocode is the actual recursion, moving from the
> >      root down towards the leaves. I'd like to have this pseudocode
> >      explained in more detail.
> >  (c) the explicitly recursive check_node_for_match call does seem to go
> >      in the wrong direction.
> >  (d) What's the initial call to enter into the recursion scheme?
> >      check_node_for_match("$ROOT_NODE", $FILENAME) for each file in the
> >      file system?
> (a) = rules are compared only first time.

Ok, that shows nicely in the new pseudocode.

> (b) I assume that "no deeper match" is derived from idea we (I and Rami) 
> had  (1998?) while hacking with aide.

It is no longer present in the new pseudocode, so I no longer need to
understand it.

> (c) It's just fine.

Yes, I understand now.

> (d) Well, one might think that our tree is like directory tree, and first 
> we chdir to correct subdir. (correct  dir == the deepest match).

This is now clear, thanks for the explanation given above.
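
As a sanity check on the chdir analogy, the descent could be sketched like
this in Python. Everything here is an assumption on my part: Node, add_node
and find_deepest_node are hypothetical names, not aide functions, and the
real code (get_seltree_node, for instance) may well work differently. Node
names carry a trailing slash to match the notation used earlier in the
thread.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """Tree node keyed by directory; children are indexed by path component."""
    name: str
    children: Dict[str, "Node"] = field(default_factory=dict)


def components(path):
    # "/usr/local/" -> ["usr", "local"]; empty parts are dropped.
    return [part for part in path.split("/") if part]


def add_node(root, path):
    """Create (if missing) one node per component of path; return the last."""
    node = root
    for part in components(path):
        if part not in node.children:
            node.children[part] = Node(node.name + part + "/")
        node = node.children[part]
    return node


def find_deepest_node(root, filename):
    """Descend from the root, like chdir, as long as a child node exists."""
    node = root
    for part in components(filename):
        child = node.children.get(part)
        if child is None:
            break  # no deeper node: the current one is the deepest match
        node = child
    return node
```

With a tree that only has nodes for "/", "/usr/" and "/usr/local/", a file
such as /usr/local/bin/foo would end up at the "/usr/local/" node, and
/etc/passwd would stay at the root node.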

> > This is not reflected by the pseudocode, and it is neither mentioned
> > nor explained in manual and/or man page.
> 
> Yep. This is a documentation problem.

I have tried to document this.

See the attached patch against current aide CVS for the documentation
improvements. Once the "find deepest match possible" step has been
fleshed out more, I can continue the work on the docs.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 621 72739835
-------------- next part --------------
Index: aide.conf.5.in
===================================================================
RCS file: /cvsroot/aide/aide/doc/aide.conf.5.in,v
retrieving revision 1.4
diff -r1.4 aide.conf.5.in
20,23c20,23
< define/undefine variables. Second, there are lines that used to select
< which files are added to the database. Third there are the macrolines.
< Only the second type of lines are required for aide to do anything.
< Lines beginning with # are ignored as comments.
---
> define/undefine variables. Second, there are selection lines that are used
> to indicate which files are added to the database. Third, macro lines 
> define or undefine variables within the config file. Lines beginning
> with # are ignored as comments.
81c81
< groups listed in it are NOT displayed in the final report.
---
> -groups listed in it are NOT displayed in the final report.
85,94c85,96
< There are three types of selection lines (regular, negative, equals)
< Lines beginning with "/" are regular selective lines. Lines beginning
< with "!" are negative selection lines. And lines beginning with "="
< are equals selection lines. The string following the first character
< is taken as a regular expression matching to a complete filename (with
< path included). In regular selection rule the "/" is included in the
< regular expression. Following the regular expression in an expression.
< See CONFIG LINES for an explanation of exressions. See EXAMPLES and 
< doc/aide.conf for examples.
< 
---
> aide supports three types of selection lines (regular, negative, equals)
> Lines beginning with "/" are regular selection lines. Lines beginning
> with "=" are equals selection lines. And lines beginning with "!"
> are negative selection lines. The string following the first character
> is taken as a regular expression matching to a complete filename,
> including the path. In a regular selection rule the "/" is included in the
> regular expression. Following the regular expression is a group
> definition as explained above. See EXAMPLES and doc/aide.conf for examples.
> .PP
> More in-depth discussion of the selection algorithm can be found in
> the aide manual.
> .IP
Index: manual.html
===================================================================
RCS file: /cvsroot/aide/aide/doc/manual.html,v
retrieving revision 1.1.1.1
diff -r1.1.1.1 manual.html
148,149c148
< documentation for this in aide.conf(5) manual page. Here are a few
< pointers for what to look for.
---
> more documentation for this in aide.conf(5) manual page.
263,270c262,290
< In the initialisation process Aide creates a tree of the regexp
< rules. Each type of rule is placed in a separate list for each node in
< the tree. So we have an equals rule list,a select rule list and a
< negative selection rule list for all nodes. These lists may be empty.
< The node in which a rule is placed is determined by the first special
< regexp character in the rule. For example <code>!/proc</code> would be
< placed in the root node. While <code>!/proc/.*</code> would be placed
< in /proc node. Also in front of each rule Aide adds an implicit ^.
---
> As you already know, aide has three types of selection lines:
> <ul>
> <li>Regular selection lines, beginning with "/".</li>
> <li>Equals selection lines, beginning with "=".</li>
> <li>Negative selection lines, beginning with "!".</li>
> </ul>
> The string following the first character is taken as a regular
> expression matching to a complete filename, including the path. In a
> regular selection rule, the slash is included in the regular
> expression. An implicit ^ is added in front of each rule. A group
> definition follows the regular expression.
> </p>
> <p>
> When reading the configuration file, aide internally builds a tree
> that roughly resembles the directory tree to be checked. Each node
> corresponds to a directory, and each node has one rule list for the
> associated regular selection lines, one for the associated negative
> selection lines and one for the associated equals selection lines. If
> there is no associated rule, the respective list may be empty.
> </p>
> <p>
> aide tries to place a rule as far down in the tree as possible while
> still assuring that it is above all files that it matches. This is
> determined by the first "special" regexp character in the rule. For
> example, <code>!/proc</code> would be placed in the root node,
> <code>!/proc/.*</code> would be placed in the /proc node,
> <code>!/var/log/syslog*</code> is placed in the /var/log node and,
> finally, <code>!/home/[a-z0-9]+/.bashrc$</code> is placed in the /home
> node.
273,274c293,294
< When Aide does rule matching it uses the following algorithm.
< The following is a pseudocode adaptation from src/gen_list.c.
---
> The algorithm that aide uses for rule matching is described in the
> following paragraphs. The pseudocode is an adaption from src/gen_list.c.
277,288c297,308
< check_node_for_match(node,filename)
< 	if(no deeper match found)
< 		check(equals list for this node)
< 
< 	if(no deeper match found)
< 		check(select list for this node)
< 
< 	check_node_for_match(nodes parent,filename)
< 
< 	if(this file is about to be added)
< 		check(negative select list for this node) 
< 	
---
> check_node_for_match(node,filename,first_time)
> 	if (first_time)
>         	check(equals list for this node)
> 
> 	check(regular list for this node)
> 
> 	if (node is not the root node)
> 		check_node_for_match(nodes parent,filename,false)
> 
> 	if (this file is about to be added)
> 		check(negative list for this node)
> 
291a312,322
> When aide needs to determine whether a file found in the file system is
> to be checked, it first determines the deepest possible node x to
> match the current file against (that algorithm is not part of the
> pseudocode above), and then calls check_node_for_match(x, filename,
> true). So, the recursion starts at the deepest possible match.
> </p>
> <p>
> As can also be seen, equals selection lines are only checked in the
> first recursion step, which serves as a speed optimization
> by reducing the number of regular expression evaluations,
> which are quite expensive operations.