Regular Expressions in OVAL 5.6

James Hugard

Regular Expressions in OVAL 5.6


Sorry for bringing this to the table so late in the game, but as discussed at last week’s XCCDF Developers meeting there are several things to consider regarding Regular Expressions (REs) as specified in the OVAL 5.6 Release Candidate specification.  While some of this has been talked about on the OVAL mailing list, there are several points we feel bear additional discussion.

 

On the mailing list, there was some discussion of requiring “PCRE” for OVAL 5.6.  However, it is not clear from the text if this means “Perl Compatible Regular Expression” or if it means specifically the portable PCRE library at http://www.pcre.org/. If the latter, we feel this should be called out in the OVAL specification with a specific version’s syntax required, else incompatibilities in content are sure to crop up.  In any case, we would like to make an alternative proposal which takes the following into consideration.

 

 

DATA

 

1.       Requiring use of the PCRE library (http://www.pcre.org/) will preclude a pure-Java OVAL implementation, or implementation in some other non-C++ languages.

2.       By not introducing a new data type or behavior for Perl5 REs, existing OVAL content which uses POSIX-compatible REs may cease to work, depending on which library is in use.  While the open-source PCRE library and Perl itself support ERE character classes in addition to the Perl ones, even in the same RE, the BOOST library requires one to specify which syntax to use for any given RE: either Perl5, POSIX Basic REs (BRE), or POSIX Extended REs (ERE).

3.       We have observed OVAL content which includes a large number of ERE character classes (“[:alpha:], [:alnum:], [:blank:]”, etc.).

4.       It is fairly trivial to convert ERE character-classes into Perl5 character classes.

5.       Perl5 and ERE character classes do not overlap, so both can be supported simultaneously: ERE character classes are very unlikely to appear in a valid Perl5 bracket expression and Perl5 character classes are illegal in POSIX.

6.        “Perl5 regular expression” is a fairly loose term.  When a product says it uses REs that are “Perl5 compatible,” this tends to mean compatible with the original Perl 5.00 release, not the current release. Many implementations make this claim, but they only share a subset of the full functionality from that release.

7.       RegEx’s tend to fail silently on incompatibilities between implementations.

8.       The link provided in the OVAL 5.6 specification which defines “Perl5 Regular Expression” (http://www.perl.com/doc/manual/html/pod/perlre.html) is not versioned.  This implies conforming OVAL implementations must track changes to Perl regular expressions and always implement the latest one.

9.       The link documents RE features which may not be widely considered “Perl 5 compatible.”  For example, specifying case sensitivity within the regular expression itself; e.g., “(?i)”.

10.   Since the OVAL specification does not indicate which parts of “Perl5 regular expressions” are required, one could be led to assume all of them are required.  This implies a requirement to include the experimental “(?{ code })” construct, where “code” consists of arbitrary Perl code that will be evaluated as a zero-width assertion.

11.   While most Perl5 RE implementations support a number of common language elements, almost every RE library or application includes one or more unique language elements or a unique subset.  Because of this, those libraries and applications tend to document their particular flavor of RE.  (To be frank, I cannot think of any commercial or popular open-source application or library that supports regular expressions but that does not define exactly which language elements they support.)

12.   While an OVAL RE implementation should handle Unicode to, for example, match against strings in the Windows Registry, supplying these in hex is problematic because no compatible regular-expression sub-set exists: BOOST, PCRE, and Perl implementations use \x{2345} whereas most others use \u2345.
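To illustrate point 4 above, here is a minimal sketch of converting ERE character classes into explicit ranges that a Perl5-style engine accepts. It is written in Python purely for illustration; the function name `convert_posix_classes` and the mapping table are hypothetical, the ranges are ASCII-only, and the sketch assumes the `[:name:]` tokens only ever appear inside a bracket expression.

```python
import re

# Hypothetical, ASCII-only mapping from POSIX class names to text that is
# valid inside a Perl5-style bracket expression. Not part of any OVAL text.
POSIX_TO_PERL = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "blank": " \\t",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "space": "\\s",
    "xdigit": "0-9A-Fa-f",
}

# Matches [:name:] and the negated form [:^name:].
_POSIX_CLASS = re.compile(r"\[:(\^?)([a-z]+):\]")


def convert_posix_classes(pattern):
    """Replace known [:name:] tokens; leave negated or unknown ones untouched."""
    def repl(match):
        negated, name = match.group(1), match.group(2)
        if negated or name not in POSIX_TO_PERL:
            return match.group(0)
        return POSIX_TO_PERL[name]

    return _POSIX_CLASS.sub(repl, pattern)


converted = convert_posix_classes("^[[:alpha:]_][[:alnum:]_]*$")
print(converted)
```

Negated and unmapped classes are deliberately passed through unchanged so that a reviewer can find and handle them by hand.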

 

 

PROPOSAL

 

Our proposal consists of:

 

-          Indicating that POSIX ERE support is deprecated and will be removed in a future release.  Since ERE and Perl5 do not overlap and can be supported simultaneously, an implementation is allowed to support both.

-          Providing in the OVAL 5.6 specification a listing of the Regular Expression Language Elements (a Perl5 subset) that must be supported in order to be OVAL 5.6 compatible.  This would only be a listing, with full definitions supplied by the perlre link, or another RE language specification.

-          Providing a link to a specific version of the “perlre” documentation (Perl 5.00), hosted on the OVAL website, rather than pointing to an off-site unversioned resource.

 

The following documents were consulted with regards to this proposal:

 

-          http://msdn.microsoft.com/en-us/library/az24scfc(VS.80).aspx (.NET Framework 2.0)

-          http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html (BOOST)

-          http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html (Java)

-          http://www.ecma-international.org/publications/standards/Ecma-262.htm (JavaScript /ECMA-262)

-          http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx (JavaScript/JScript)

-          http://www.pcre.org/pcre.txt (PCRE)

-          http://perldoc.perl.org/perlre.html (Perl - current version)

-          http://search.cpan.org/~andyd/perl5.003_07/pod/perlre.pod (Perl 5.003_07)

-          http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html (POSIX Regular Expressions/IEEE Std 1003.1)

 

 

Here is proposed text for OVAL 5.6, including a subset of language elements drawn from the perlre web page, reduced to a more common subset:

The 'pattern match' operation allows an item to be tested against a regular expression. When used by an entity in an OVAL Object, the regular expression represents the set of matching objects on the system. OVAL supports a subset of regular expression character classes, operations, expressions and other lexical tokens defined within Perl 5's regular expression specification (See: http://search.cpan.org/~andyd/perl5.003_07/pod/perlre.pod), as noted below.

Modifiers are not supported for the ‘pattern match’ operation: case insensitive is always OFF and multiline is also always OFF.

POSIX Extended Regular Expressions (ERE) (http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html) are deprecated in this release, and will be removed in a future release.

Since POSIX ERE collating symbols (“[.symbol.]”) and equivalence classes (“[=symbol=]”) depend on POSIX-specific locale features, these are not supported by OVAL Regular Expressions.  Instead, use all applicable Unicode code-points within a bracket expression; e.g., rather than specify "[[=a=]b]", "[[=à=]b]", or "[[=â=]b]", instead provide "[aàâb]" for situations that warrant this class of matching.

Escaping characters which are not defined as metacharacters in this specification will result in indeterminate behavior.  Many implementations will support a superset of these metacharacters, with potentially different meaning than other implementations.

Character matching assumes a Unicode character set.  Note that no syntax is supplied for specifying code points in hex; actual Unicode characters must be used instead.
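Since the proposed text supplies no hex syntax, content written with `\x{2345}` or `\u2345` would need those escapes replaced by the actual characters. The following is only a sketch of how a content author might do that ahead of time, in Python; `expand_hex_escapes` is a hypothetical helper, and it naively assumes the backslash that starts the escape is not itself escaped.

```python
import re

# Matches the two hex notations mentioned in this thread:
#   \x{2345}  (BOOST, PCRE, Perl)   and   \u2345  (most others)
_HEX_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]{1,6})\}|\\u([0-9A-Fa-f]{4})")


def expand_hex_escapes(pattern):
    """Replace hex code-point escapes with the literal character.

    chr() raises ValueError for values above U+10FFFF.
    """
    def repl(match):
        code_point = int(match.group(1) or match.group(2), 16)
        # re.escape keeps the literal safe if it happens to be a metacharacter.
        return re.escape(chr(code_point))

    return _HEX_ESCAPE.sub(repl, pattern)


print(expand_hex_escapes(r"caf\x{e9}"))
```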

The language elements defined below include a common subset of Perl5-compatible Regular Expression syntaxes.  Specifically, this subset was drawn from .NET, BOOST, Java, JavaScript, PCRE, and Perl 5.0.  In the following, [NOT SUPPORTED] marks features found in Perl 5.003_07 which are disallowed because they are not in the common subset.

METACHARACTERS

    \   Quote the next metacharacter

    ^   Match the beginning of the line

    .   Match any character (except newline)

    $   Match the end of the line (or before newline at the end)

    |   Alternation

    ()  Grouping

    []  Character class

GREEDY QUANTIFIERS

    *      Match 0 or more times
    +      Match 1 or more times
    ?      Match 1 or 0 times
    {n}    Match exactly n times
    {n,}   Match at least n times
    {n,m}  Match at least n but not more than m times

RELUCTANT QUANTIFIERS

    *?     Match 0 or more times
    +?     Match 1 or more times
    ??     Match 0 or 1 time
    {n}?   Match exactly n times
    {n,}?  Match at least n times
    {n,m}? Match at least n but not more than m times

ESCAPE SEQUENCES

    \t          tab                   (HT, TAB)
    \n          newline               (LF, NL)
    \r          return                (CR)
    \f          form feed             (FF)
    \a          alarm (bell)          (BEL) [NOT SUPPORTED]
    \e          escape (think troff)  (ESC) [NOT SUPPORTED]
    \033        octal char (think of a PDP-11)
    \x1B        hex char
    \c[         control char
    \l          lowercase next char (think vi) [NOT SUPPORTED]
    \u          uppercase next char (think vi) [NOT SUPPORTED]
    \L          lowercase till \E (think vi) [NOT SUPPORTED]
    \U          uppercase till \E (think vi) [NOT SUPPORTED]
    \E          end case modification (think vi) [NOT SUPPORTED]
    \Q          quote regexp metacharacters till \E [NOT SUPPORTED]

CHARACTER CLASSES

    \w  Match a "word" character (alphanumeric plus "_")
    \W  Match a non-word character
    \s  Match a whitespace character
    \S  Match a non-whitespace character
    \d  Match a digit character
    \D  Match a non-digit character

ZERO WIDTH ASSERTIONS

    \b  Match a word boundary
    \B  Match a non-(word boundary)
    \A  Match only at beginning of string [NOT SUPPORTED]
    \Z  Match only at end of string (or before newline at the end) [NOT SUPPORTED]
    \G  Match only where previous m//g left off [NOT SUPPORTED]

EXTENSIONS

        (?#text)       - Comment [NOT SUPPORTED]
        (?:regexp)     - Group without capture
        (?=regexp)     - Zero-width positive lookahead assertion
        (?!regexp)     - Zero-width negative lookahead assertion
        (?imsx)        - Pattern match modifiers [NOT SUPPORTED]

VERSION 8 REGULAR EXPRESSIONS

        [chars]        - Match any of the specified characters
        [^chars]       - Match anything that is not one of the specified characters
        [a-b]          - Match any character in the range between “a” and “b”, inclusive
        a|b            - Alternation; match either the left side of the “|” or the right side
        \n             - When ‘n’ is a single digit: the nth capturing group matched.
        

POSIX CHARACTER CLASSES (DEPRECATED)

       [[:xxx:]]       positive POSIX named set
       [[:^xxx:]]      negative POSIX named set
 
       alnum           alphanumeric
       alpha           alphabetic
       blank           space or tab
       cntrl           control character
       digit           decimal digit
       graph           printing, excluding space
       lower           lower case letter
       print           printing, including space
       punct           printing, excluding alphanumeric
       space           whitespace
       upper           upper case letter
       xdigit          hexadecimal digit
 
       [:name:]        charclass definition in the POSIX LC_CTYPE category [NOT SUPPORTED]
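As a rough illustration of how content could be checked against the listing above, here is a sketch in Python that flags the constructs marked [NOT SUPPORTED]. The name `find_unsupported` is hypothetical and this is not part of the proposed specification text; it does not track bracket expressions, so a literal "(?#" inside "[...]" would be reported as a false positive.

```python
import re

# Escapes and assertions marked [NOT SUPPORTED] in the listing:
# \a \e \l \u \L \U \E \Q and \A \Z \G
UNSUPPORTED_ESCAPES = set("aelLuUEQAZG")

# Scanning escape pairs first means an escaped backslash ("\\") is consumed
# as one token, so the character after it is not misread as an escape.
_TOKEN = re.compile(r"\\(.)|\(\?(#|[imsx]+\))", re.DOTALL)


def find_unsupported(pattern):
    """Return the unsupported tokens found in pattern, in order of appearance."""
    found = []
    for match in _TOKEN.finditer(pattern):
        escaped_char = match.group(1)
        if escaped_char is None:
            # "(?#" comment opener or "(?imsx)" inline modifier
            found.append(match.group(0))
        elif escaped_char in UNSUPPORTED_ESCAPES:
            found.append(match.group(0))
    return found


print(find_unsupported(r"^\Afoo(?i)bar\\a$"))
```

A tool could run such a check when loading content and warn, rather than fail silently in the way described in point 7 of the DATA section.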

 

 

 

 

Here is a discussion of probe-specific behaviors that can be supplied to enable case-insensitivity and multi-line matching.  These are OPTIONAL in our proposal, in that the preceding can be implemented independently of the following.

 

 

== textfilecontent56_* ==

 

<Deprecate the existing textfilecontent54_* elements, and replace with textfilecontent56_*.  The only difference is to use Textfilecontent56Behaviors rather than TextfilecontentBehaviors>

 

== Textfilecontent56Behaviors ==

 

<text from TextfilecontentBehaviors goes here>

 

Also note that the ‘ignore_case’ and ‘multiline’ attributes of the ‘behaviors’ element apply only to the ‘pattern’ element.  These behaviors modify how specific strings produced by this element are matched against the file content; they do NOT modify the filtering behavior of the ‘pattern match’ operation on the output of this element itself, if such were to be used to filter some other list of pattern-match strings.

 

Attributes:

-          ignore_case    n/a    (optional -- default='false')

-          multiline      n/a    (optional -- default='true')
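To make the intent of the two attributes concrete, here is a small sketch of how a probe might translate them into engine options, using Python's `re` module as a stand-in. The function `compile_pattern` is hypothetical, and the sketch assumes that ‘multiline’ corresponds to Perl's /m modifier (“^” and “$” match at line boundaries); the defaults follow the attribute list above.

```python
import re


def compile_pattern(pattern, ignore_case=False, multiline=True):
    """Compile pattern with flags derived from the two proposed behaviors."""
    flags = 0
    if ignore_case:
        flags |= re.IGNORECASE
    if multiline:
        # Assumption: 'multiline' means Perl's /m, i.e. ^ and $ match per line.
        flags |= re.MULTILINE
    return re.compile(pattern, flags)


file_content = "Port 22\nPermitRootLogin no\n"

# With the default multiline behavior, ^ and $ anchor to each line.
print(compile_pattern(r"^PermitRootLogin no$").search(file_content) is not None)

# With multiline off, ^ only matches at the very start of the content.
print(compile_pattern(r"^PermitRootLogin no$", multiline=False).search(file_content) is not None)
```

Keeping these switches outside the pattern means the expression itself stays within the common subset, since inline modifiers such as “(?i)” are not supported.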

 

 

 

 

 “Perfection is reached not when there is nothing left to add, but when nothing more can be taken away.”  -- Antoine de Saint-Exupery


James Hugard
Foundstone Architect
[hidden email]
(949) 212-9894 (cell)


McAfee, Inc.
27201 Puerta Real, Suite 400
Mission Viejo, CA 92691
(949) 297-5603 (direct)

 

bakerj

Re: Regular Expressions in OVAL 5.6


Let me start by thanking you for putting the time and effort into considering regular expressions in OVAL.  You have raised a number of valid concerns with the changes planned for version 5.6 with regard to regular expression support in OVAL. As you are probably well aware, arriving at a solution for regular expression support that will appease everyone is very difficult.  The good news is that at least everyone agreed that what was used in OVAL 5.5 and earlier needed to be changed.

 

Historically we have tried to avoid specifying our own regular expression syntax, either by referring to someone else’s and then disallowing certain elements or by creating our own from scratch. The fear has been that if we define our own, no tools will actually support it.  However, the approach you have suggested, referring to a baseline and then excluding the unwanted elements, is appealing.

 

In Version 5.6 we discussed the possibility of deprecating support for POSIX vs. just dropping it. The concern was that deprecating POSIX in favor of something else suggested that two different flavors would be supported, which seems highly undesirable. It has been our intent for deprecated language constructs to be supported as long as they are in the OVAL language; deprecated simply means that the item may be removed in a future release. Before making the decision to simply drop POSIX, we polled the oval-developer-list and the oval-board to ensure that this change would not invalidate any content. We had responses verifying that content would not be impacted, and we surveyed all of our own content.  My feeling is that we should consider the second two points of your proposal, and I would like to continue with the decision to drop POSIX. However, the important question here is: what do others think?

 

I like the suggestion of explicitly adding multiline and ignore-case behaviors to the textfilecontent56_test. This will result in simpler, more readable expressions and will state very explicitly when these two matching behaviors should be used, in a format that is independent of the underlying regular expression syntax. However, even assuming no other issues, I am concerned about adding this to the release or delaying the release for just this change. My thought is to see how the rest of the discussion plays out and then decide how to proceed with this test.

 

In reviewing the suggested supported and excluded characters I don’t have any specific concerns. The subset you proposed seems reasonable.

 

At this point we need to hear from others about:

1.       In version 5.6, was the decision to simply drop POSIX support instead of deprecating it valid?

 

2.       In version 5.6, should we refer to a more specific version of the Perl RE documentation?

 

3.       In version 5.6, should we limit the supported elements defined in the Perl RE documentation?

 

4.       In version 5.6, should we add the proposed textfilecontent56_test?

 

 

Thanks,

 

Jon

 

============================================

Jonathan O. Baker

G022 - IA Industry Collaboration

The MITRE Corporation

Email: [hidden email]

 

From: [hidden email] [mailto:[hidden email]]
Sent: Friday, August 28, 2009 12:26 PM
To: oval-developer-list OVAL Developer List/Closed Public Discussion
Subject: [OVAL-DEVELOPER-LIST] Regular Expressions in OVAL 5.6
Importance: High

 

bakerj

Re: Regular Expressions in OVAL 5.6


After further consideration and deliberation of your recommendations for regular expressions in version 5.6, we have decided that we should improve the documentation around the supported regular expression syntax in OVAL and add a new set of behaviors to the existing textfilecontent54_test to support the changes to the allowed regular expression syntax.

 

Here is a quick summary of the changes we would like to make:

-          Change Perl Compatible Regular Expression reference

The reference in the oval schema documentation that refers to Perl5 Regular Expressions will be updated to refer to a resource that is versioned.

 

-          Define a white list of supported Regular Expression elements

As recommended, a white list of the supported regular expression elements will be listed on the OVAL web site. This list will be referenced from the OVAL language schemas. This list will consist of the most commonly supported elements in order to increase the likelihood that a given regular expression will be supported by an OVAL compliant tool. The list will match what was suggested, with perhaps additional clarification around the supported modifiers. No assertion will be made about regular expression elements that are not included in the list.

 

-          Add behaviors to the textfilecontent54_test to control multiline and case sensitivity matching

The behaviors that were suggested for a new textfilecontent56_test will be added to the existing test. The default values for the behavior will align with the current intent of the test to allow multiline matches. This will ensure that the addition of the behavior does not change the meaning of existing content that uses the test. It is fairly common for a new behavior to be added to a test during a release as long as the new behavior is backwards compatible with content that already uses that test.

 

We feel that another week should be added to the release timeline to allow for these changes. The revised release timeline will be:

-          9/01/2009 – Version 5.6 RC 3 will be published

As soon as the above changes are made and the white list of supported regular expression elements is available, a new release candidate will be posted.

 

-          9/11/2009 – New Version 5.6 release date

The release will be delayed by one week to allow for the changes that need to be made.

 

 

Thank you all for your continued review of this release.

 

Jon

 

============================================

Jonathan O. Baker

G022 - IA Industry Collaboration

The MITRE Corporation

Email: [hidden email]

 

From: Baker, Jon [mailto:[hidden email]]
Sent: Friday, August 28, 2009 4:46 PM
To: oval-developer-list OVAL Developer List/Closed Public Discussion
Subject: Re: [OVAL-DEVELOPER-LIST] Regular Expressions in OVAL 5.6

 

Let me start by thanking you for putting the time and effort into considering regular expressions in OVAL.  You have raised a number of valid concerns with the changes planned for version 5.6 with regard to regular expression support in OVAL. As you are probably well aware, arriving at a solution for regular expression support that will appease everyone is very difficult.  The good news is that at least everyone agreed that what was used in OVAL 5.5 and earlier needed to be changed.

 

Historically we have tried to avoid specifying our own regular expression syntax either by referring to someone else’s and then disallowing certain elements or by creating our own from scratch. The fear has been that if we define our own no tools will actually support it.  However, the approach you have suggested, referring to a baseline and then excluding the unwanted elements is appealing.

 

In Version 5.6 we discussed the possibility of deprecating support for posix vs. just dropping it. The concern was that deprecating posix in favor of something else suggested that two different flavors will be supported. This seems highly undesirable. It has been our intent for deprecated language constructs to be supported as long as they are in the oval language. Deprecated simply means that the item may be removed in a future release. Before making the decision to simply drop posix we polled the oval-developer-list and the oval-board to ensure that this change would not invalidate any content. We had responses verifying that content would not be impacted and we surveyed all of our own content.  My feeling is that we should consider the second two points of your proposal and I would like to continue with the decision to drop posix. However, the important question here is what do others think???

 

I like the suggestion of explicitly adding a multiline and ignore case behavior to the textfilecontent56_test. This will result in simpler more readable expressions and very explicitly stating when these two matching behaviors should be used in a format that is independent of the underlying regular expression syntax. However, assuming no other issues I am concerned with the notion of adding this into the release or delaying the release for just this change. My thought is to see how the rest of the discussion plays out and then decide how to proceed with this test.

 

In reviewing the suggested supported and excluded characters I don’t have any specific concerns. The subset you proposed seems reasonable.

 

At this point we need to hear from others about:

1.       In version 5.6 was the decision to simply drop posix support instead of deprecating it valid?

 

2.       In version 5.6 should we refer to a more specific version of the perl re documentation?

 

3.       In version 5.6 should we limit the supported elements defined in the perl re documentation?

 

4.       In version 5.6 should we add the proposed textfilecontent56_test?

 

 

Thanks,

 

Jon

 

============================================

Jonathan O. Baker

G022 - IA Industry Collaboration

The MITRE Corporation

Email: [hidden email]

 

From: [hidden email] [mailto:[hidden email]]
Sent: Friday, August 28, 2009 12:26 PM
To: oval-developer-list OVAL Developer List/Closed Public Discussion
Subject: [OVAL-DEVELOPER-LIST] Regular Expressions in OVAL 5.6
Importance: High

 

Sorry for bringing this to the table so late in the game, but as discussed at last week’s XCCDF Developers meeting there are several things to consider regarding Regular Expressions (REs) as specified in the OVAL 5.6 Release Candidate specification.  While some of this has been talked about on the OVAL mailing list, there are several points we feel bear additional discussion.

 

On the mailing list, there was some discussion of requiring “PCRE” for OVAL 5.6.  However, it is not clear from the text if this means “Perl Compatible Regular Expression” or if it means specifically the portable PCRE library at http://www.pcre.org/. If the latter, we feel this should be called out in the OVAL specification with a specific version’s syntax required, else incompatibilities in content are sure to crop up.  In any case, we would like to make an alternative proposal which takes the following into consideration.

 

 

DATA

 

1.       Requiring use of the PCRE library (http://www.pcre.org/) will preclude a pure-Java OVAL implementation, or implementation in some other non-C++ languages.

2.       By not introducing a new data type or behavior for Perl5 REs, existing OVAL content which uses POSIX-compatible REs may cease to work, depending on which library is in use.  While the open-source PCRE library and Perl itself support ERE character classes in addition to the Perl ones, even in the same RE, the BOOST library requires one to specify which syntax to use for any given RE: either Perl5, POSIX Basic REs (BRE), or POSIX Extended REs (ERE).

3.       We have observed OVAL content which includes a large number of ERE character classes (“[:alpha:], [:alnum:], [:blank:]”, etc.).

4.       It is fairly trivial to convert ERE character-classes into Perl5 character classes.

5.       Perl5 and ERE character classes do not overlap, so both can be supported simultaneously: ERE character classes are very unlikely to appear in a valid Perl5 bracket expression and Perl5 character classes are illegal in POSIX.

6.       “Perl5 regular expression” is a fairly loose term.  When a product says it uses REs that are “Perl5 compatible,” this tends to mean compatible with the original Perl 5.00 release, not the current release. Many implementations make this claim, but they only share a subset of the full functionality from that release.

7.       RegEx’s tend to fail silently on incompatibilities between implementations.

8.       The link provided in the OVAL 5.6 specification which defines “Perl5 Regular Expression” (http://www.perl.com/doc/manual/html/pod/perlre.html) is not versioned.  This implies conforming OVAL implementations must track changes to Perl regular expressions and always implement the latest one.

9.       The link documents RE features which may not be widely considered “Perl 5 compatible.”  For example, specifying case sensitivity within the regular expression itself; e.g., “(?i)”.

10.   Since the OVAL specification does not indicate which parts of “Perl5 regular expressions” are required, one could be led to assume all of them are required.  This implies a requirement to include the experimental “(?{ code })” construct, where “code” consists of arbitrary Perl code that will be evaluated as a zero-width assertion.

11.   While most Perl5 RE implementations support a number of common language elements, almost every RE library or application includes one or more unique language elements or a unique subset.  Because of this, those libraries and applications tend to document their particular flavor of RE.  (To be frank, I cannot think of any commercial or popular open-source application or library that supports regular expressions but that does not define exactly which language elements they support.)

12.   While an OVAL RE implementation should handle Unicode to, for example, match against strings in the Windows Registry, supplying these in hex is problematic because no compatible regular-expression sub-set exists: BOOST, PCRE, and Perl implementations use \x{2345} whereas most others use \u2345.
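As an illustration of points 3 to 5, here is a minimal sketch of how an implementation might rewrite POSIX named classes into Perl5-style bracket contents before handing a pattern to a Perl5-only engine. It is written in Python only because its `re` module does not understand `[:alpha:]` and friends. The mapping table and the function name `posix_to_perl5` are hypothetical, the table is ASCII-only and incomplete, and the rewrite is deliberately naive: it replaces `[:name:]` tokens wherever they occur and leaves unknown or negated names untouched.

```python
import re

# Hypothetical, ASCII-only mapping from POSIX class names to the text that
# can be placed inside a Perl5 bracket expression. Not a complete table.
POSIX_TO_PERL5 = {
    "alpha": "a-zA-Z",
    "alnum": "a-zA-Z0-9",
    "blank": r" \t",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "space": r"\s",
    "xdigit": "0-9A-Fa-f",
}


def posix_to_perl5(pattern: str) -> str:
    """Replace [:name:] tokens with Perl5-style bracket contents."""

    def replace(match: "re.Match") -> str:
        name = match.group(1)
        # Unknown names are left as-is rather than guessed at.
        return POSIX_TO_PERL5.get(name, match.group(0))

    return re.sub(r"\[:([a-z]+):\]", replace, pattern)


# "[[:alpha:]_]" becomes "[a-zA-Z_]", which a Perl5-style engine accepts.
converted = posix_to_perl5("^[[:alpha:]_][[:alnum:]_]*$")
is_identifier = re.match(converted, "foo_1") is not None
```

A real converter would also need to decide how to treat non-ASCII letters, since the POSIX classes are locale-dependent while the ranges above are not.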

 

 

PROPOSAL

 

Our proposal consists of:

 

-          Indicating that POSIX ERE support is deprecated and will be removed in a future release.  Since ERE and Perl5 do not overlap and can be supported simultaneously, an implementation is allowed to support both.

-          Providing in the OVAL 5.6 specification a listing of the Regular Expression Language Elements (a Perl5 subset) that must be supported in order to be OVAL 5.6 compatible.  This would only be a listing, with full definitions supplied by the perlre link, or another RE language specification.

-          Providing a link to a specific version of the “perlre” documentation (Perl 5.00), hosted on the OVAL website, rather than pointing to an off-site unversioned resource.

 

The following documents were consulted with regards to this proposal:

 

-          http://msdn.microsoft.com/en-us/library/az24scfc(VS.80).aspx (.NET Framework 2.0)

-          http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html (BOOST)

-          http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html (Java)

-          http://www.ecma-international.org/publications/standards/Ecma-262.htm (JavaScript /ECMA-262)

-          http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx (JavaScript/JScript)

-          http://www.pcre.org/pcre.txt (PCRE)

-          http://perldoc.perl.org/perlre.html (Perl - current version)

-          http://search.cpan.org/~andyd/perl5.003_07/pod/perlre.pod (Perl 5.003_07)

-          http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html (POSIX Regular Expressions/IEEE Std 1003.1)

 

 

Here is proposed text for OVAL 5.6, including a subset of language elements drawn from the perlre web page, reduced to a more common subset:

The 'pattern match' operation allows an item to be tested against a regular expression. When used by an entity in an OVAL Object, the regular expression represents the set of matching objects on the system. OVAL supports a subset of regular expression character classes, operations, expressions and other lexical tokens defined within Perl 5's regular expression specification (See: http://search.cpan.org/~andyd/perl5.003_07/pod/perlre.pod), as noted below.

Modifiers are not supported for the ‘pattern match’ operation: case insensitive is always OFF and multiline is also always OFF.

POSIX Extended Regular Expressions (ERE) (http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html) are deprecated in this release, and will be removed in a future release.

Since POSIX ERE collating symbols (“[.symbol.]”) and equivalence classes (“[=symbol=]”) depend on POSIX-specific locale features, these are not supported by OVAL Regular Expressions.  Instead, use all applicable Unicode code-points within a bracket expression; e.g., rather than specify "[[=a=]b]", "[[=à=]b]", or "[[=â=]b]", instead provide "[aàâb]" for situations that warrant this class of matching.

Escaping characters which are not defined as metacharacters in this specification will result in indeterminate behavior.  Many implementations will support a superset of these metacharacters, with potentially different meaning than other implementations.

Character matching assumes a Unicode character set.  Note that no syntax is supplied for specifying code points in hex; actual Unicode characters must be used instead.

The language elements defined below include a common subset of Perl5-compatible Regular Expression syntaxes.  Specifically, this subset was drawn from .NET, BOOST, Java, JavaScript, PCRE, and Perl 5.0.  In the following, strikethrough text and [NOT SUPPORTED] indicate features found in Perl 5.003_07 which are disallowed, because they are not in the common sub-set.

METACHARACTERS

    \   Quote the next metacharacter

    ^   Match the beginning of the line

    .   Match any character (except newline)

    $   Match the end of the line (or before newline at the end)

    |   Alternation

    ()  Grouping

    []  Character class

GREEDY QUANTIFIERS

    *      Match 0 or more times
    +      Match 1 or more times
    ?      Match 1 or 0 times
    {n}    Match exactly n times
    {n,}   Match at least n times
    {n,m}  Match at least n but not more than m times

RELUCTANT QUANTIFIERS

    *?     Match 0 or more times
    +?     Match 1 or more times
    ??     Match 0 or 1 time
    {n}?   Match exactly n times
    {n,}?  Match at least n times
    {n,m}? Match at least n but not more than m times
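The difference between the greedy and reluctant forms is easiest to see side by side. The following is a small sketch using Python's `re` module, which is not one of the engines surveyed above but implements these quantifiers with the same Perl5 semantics; the sample strings are made up for illustration.

```python
import re

text = "<a><b>"

# Greedy: ".+" takes as much as possible, so the match runs to the last ">".
greedy = re.search(r"<.+>", text).group(0)

# Reluctant: ".+?" takes as little as possible, so the match stops at the first ">".
reluctant = re.search(r"<.+?>", text).group(0)

# The same distinction applies to bounded repetition.
bounded_greedy = re.search(r"a{2,4}", "aaaaa").group(0)
bounded_reluctant = re.search(r"a{2,4}?", "aaaaa").group(0)

print(greedy, reluctant, bounded_greedy, bounded_reluctant)
```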

ESCAPE SEQUENCES

    \t          tab                   (HT, TAB)
    \n          newline               (LF, NL)
    \r          return                (CR)
    \f          form feed             (FF)
    \a          alarm (bell)          (BEL) [NOT SUPPORTED]
    \e          escape (think troff)  (ESC) [NOT SUPPORTED]
    \033        octal char (think of a PDP-11)
    \x1B        hex char
    \c[         control char
    \l          lowercase next char (think vi) [NOT SUPPORTED]
    \u          uppercase next char (think vi) [NOT SUPPORTED]
    \L          lowercase till \E (think vi) [NOT SUPPORTED]
    \U          uppercase till \E (think vi) [NOT SUPPORTED]
    \E          end case modification (think vi) [NOT SUPPORTED]
    \Q          quote regexp metacharacters till \E [NOT SUPPORTED]

CHARACTER CLASSES

    \w  Match a "word" character (alphanumeric plus "_")
    \W  Match a non-word character
    \s  Match a whitespace character
    \S  Match a non-whitespace character
    \d  Match a digit character
    \D  Match a non-digit character

ZERO WIDTH ASSERTIONS

    \b  Match a word boundary
    \B  Match a non-(word boundary)
    \A  Match only at beginning of string [NOT SUPPORTED]
    \Z  Match only at end of string (or before newline at the end) [NOT SUPPORTED]
    \G  Match only where previous m//g left off [NOT SUPPORTED]

EXTENSIONS

        (?#text)       - Comment [NOT SUPPORTED]
        (?:regexp)     - Group without capture
        (?=regexp)     - Zero-width positive lookahead assertion
        (?!regexp)     - Zero-width negative lookahead assertion
        (?imsx)        - Pattern match modifiers [NOT SUPPORTED]
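For readers less familiar with the three supported extensions, here is a brief sketch of their behavior, again using Python's `re` module as a stand-in for any Perl5-compatible engine. The patterns and input strings are invented for the example.

```python
import re

# (?=regexp): "foo" matches only when followed by "bar"; "bar" is not consumed.
lookahead = re.search(r"foo(?=bar)", "foobar")

# (?!regexp): "foo" matches only when NOT followed by "bar".
negative_hit = re.search(r"foo(?!bar)", "foobaz")
negative_miss = re.search(r"foo(?!bar)", "foobar")

# (?:regexp): groups for repetition without creating a capturing group,
# so "(c)" below is still group number 1.
grouped = re.compile(r"(?:ab)+(c)")
captured = grouped.match("ababc").group(1)
```

Because lookahead assertions are zero-width, the match in the first case ends after "foo" even though "bar" had to be present.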

VERSION 8 REGULAR EXPRESSIONS

        [chars]        - Match any of the specified characters
        [^chars]       - Match anything that is not one of the specified characters
        [a-b]          - Match any character in the range between “a” and “b”, inclusive
        a|b            - Alternation; match either the left side of the “|” or the right side
        \n             - When ‘n’ is a single digit: the nth capturing group matched.
        

POSIX CHARACTER CLASSES (DEPRECATED)

       [[:xxx:]]       positive POSIX named set
       [[:^xxx:]]      negative POSIX named set
 
       alnum           alphanumeric
       alpha           alphabetic
       blank           space or tab
       cntrl           control character
       digit           decimal digit
       graph           printing, excluding space
       lower           lower case letter
       print           printing, including space
       punct           printing, excluding alphanumeric
       space           whitespace
       upper           upper case letter
       xdigit          hexadecimal digit
 
       [:name:]        charclass definition in the POSIX LC_CTYPE category [NOT SUPPORTED]

 

 

 

 

Here is a discussion of probe-specific behaviors that can be supplied to enable case-insensitivity and multi-line matching.  These are OPTIONAL in our proposal, in that the preceding can be implemented independently of the following.

 

 

== textfilecontent56_* ==

 

<Deprecate the existing textfilecontent54_* elements, and replace with textfilecontent56_*.  The only difference is to use Textfilecontent56Behaviors rather than TextfilecontentBehaviors>

 

== Textfilecontent56Behaviors ==

 

<text from TextfilecontentBehaviors goes here>

 

Also note that the ‘ignore_case’ and ‘multiline’ attributes of the ‘behaviors’ element apply only to the ‘pattern’ element.  These behaviors modify how specific strings produced by this element are matched against the file content; they do NOT modify the filtering behavior of the ‘pattern match’ operation on the output of this element itself, if such were to be used to filter some other list of pattern-match strings.

 

Attributes:

-          ignore_case     n/a     (optional -- default='false')

-          multiline       n/a     (optional -- default='true')
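To make the intent of the two attributes concrete, here is a rough sketch of how a probe might translate them into engine flags. It assumes that 'multiline' corresponds to the Perl /m modifier (so that ^ and $ match at line boundaries) and 'ignore_case' to /i; that mapping, the function name `compile_pattern`, and the sample file content are illustrative assumptions, not part of the proposal text.

```python
import re


def compile_pattern(pattern: str, ignore_case: bool = False,
                    multiline: bool = True) -> "re.Pattern":
    """Compile a pattern using the proposed behavior defaults.

    Assumption: 'multiline' maps to Perl's /m and 'ignore_case' to /i.
    """
    flags = 0
    if ignore_case:
        flags |= re.IGNORECASE
    if multiline:
        flags |= re.MULTILINE
    return re.compile(pattern, flags)


# Hypothetical file content, for illustration only.
content = "Port 22\nPermitRootLogin no\n"

# The defaults allow ^ and $ to anchor on the second line of the file.
hit = compile_pattern(r"^PermitRootLogin\s+no$").search(content)

# With multiline off, ^ anchors only at the start of the whole content.
miss = compile_pattern(r"^PermitRootLogin\s+no$", multiline=False).search(content)
```

Keeping the flags outside the expression means the pattern text itself stays portable across engines that disagree on inline modifier syntax such as "(?i)".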

 

 

 

 

 “Perfection is reached not when there is nothing left to add, but when nothing more can be taken away.”  -- Antoine de Saint-Exupery


James Hugard
Foundstone Architect
[hidden email]
(949) 212-9894 (cell)


McAfee, Inc.
27201 Puerta Real, Suite 400
Mission Viejo, CA 92691
(949) 297-5603 (direct)

 
