James Hugard
Sorry for bringing this to the table so late in the game,
but as discussed at last week’s XCCDF Developers meeting there are
several things to consider regarding Regular Expressions (REs) as specified in the
OVAL 5.6 Release Candidate specification. While some of this has been
talked about on the OVAL mailing list, there are several points we feel bear
additional discussion. On the mailing list, there was some discussion of requiring
“PCRE” for OVAL 5.6. However, it is not clear from the text
if this means “Perl Compatible Regular Expression” or if it means
specifically the portable PCRE library at http://www.pcre.org/. If
the latter, we feel this should be called out in the OVAL specification with a
specific version’s syntax required, else incompatibilities in content are
sure to crop up.

In any case, we would like to make an alternative proposal which takes the following into consideration.

DATA

1. Requiring use of the PCRE library (http://www.pcre.org/) will preclude a pure-Java OVAL implementation, or implementation in some other non-C++ languages.

2. By not introducing a new data type or behavior for Perl5 REs, existing OVAL content which uses POSIX-compatible REs may cease to work, depending on which library is in use. While the open-source PCRE library and Perl itself support ERE character classes in addition to the Perl ones, even in the same RE, the BOOST library requires one to specify which syntax to use for any given RE: either Perl5, POSIX Basic REs (BRE), or POSIX Extended REs (ERE).

3. We have observed OVAL content which includes a large number of ERE character classes (“[:alpha:], [:alnum:], [:blank:]”, etc.).

4. It is fairly trivial to convert ERE character classes into Perl5 character classes.

5. Perl5 and ERE character classes do not overlap, so both can be supported simultaneously: ERE character classes are very unlikely to appear in a valid Perl5 bracket expression, and Perl5 character classes are illegal in POSIX.

6. “Perl5 regular expression” is a fairly loose term. When a product says it uses REs that are “Perl5 compatible,” this tends to mean compatible with the original Perl 5.00 release, not the current release. Many implementations make this claim, but they only share a subset of the full functionality from that release.

7. Regular expressions tend to fail silently on incompatibilities between implementations.

8. The link provided in the OVAL 5.6 specification which defines “Perl5 Regular Expression” (http://www.perl.com/doc/manual/html/pod/perlre.html) is not versioned. This implies conforming OVAL implementations must track changes to Perl regular expressions and always implement the latest one.

9. The link documents RE features which may not be widely considered “Perl 5 compatible”; for example, specifying case sensitivity within the regular expression itself, e.g., “(?i)”.

10. Since the OVAL specification does not indicate which parts of “Perl5 regular expressions” are required, one could be led to assume all of them are required. This implies a requirement to include the experimental “(?{ code })” construct, where “code” consists of arbitrary Perl code that will be evaluated as a zero-width assertion.

11. While most Perl5 RE implementations support a number of common language elements, almost every RE library or application includes one or more unique language elements or a unique subset. Because of this, those libraries and applications tend to document their particular flavor of RE. (To be frank, I cannot think of any commercial or popular open-source application or library that supports regular expressions but that does not define exactly which language elements it supports.)

12. While an OVAL RE implementation should handle Unicode to, for example, match against strings in the Windows Registry, supplying these in hex is problematic because no compatible regular-expression subset exists: BOOST, PCRE, and Perl implementations use \x{2345} whereas most others use \u2345.

PROPOSAL

Our proposal consists of:

- Indicating that POSIX ERE support is deprecated and will be removed in a future release. Since ERE and Perl5 do not overlap and can be supported simultaneously, an implementation is allowed to support both.

- Providing in the OVAL 5.6 specification a listing of the Regular Expression Language Elements (a Perl5 subset) that must be supported in order to be OVAL 5.6 compatible. This would only be a listing, with full definitions supplied by the perlre link, or another RE language specification.

- Providing a link to a specific version of the “perlre” documentation (Perl 5.00), hosted on the OVAL website, rather than pointing to an off-site unversioned resource.

The following documents were consulted with regard to this
proposal:

- http://msdn.microsoft.com/en-us/library/az24scfc(VS.80).aspx (.NET Framework 2.0)
- http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html (BOOST)
- http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html (Java)
- http://www.ecma-international.org/publications/standards/Ecma-262.htm (JavaScript/ECMA-262)
- http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx (JavaScript/JScript)
- http://www.pcre.org/pcre.txt (PCRE)
- http://perldoc.perl.org/perlre.html (Perl - current version)
- http://search.cpan.org/~andyd/perl5.003_07/pod/perlre.pod (Perl 5.003_07)
- http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html (POSIX Regular Expressions/IEEE Std 1003.1)

Here is proposed text for OVAL 5.6, including a subset of language elements drawn from the perlre web page, reduced to a more common subset:

The
'pattern match' operation allows an item to be tested against a regular expression. When used by an entity in an OVAL Object, the regular expression represents the set of matching objects on the system. OVAL supports a subset of regular expression character classes, operations, expressions and other lexical tokens defined within Perl 5's regular expression specification (See: http://search.cpan.org/~andyd/perl5.003_07/pod/perlre.pod), as noted below.

Modifiers are not supported for the ‘pattern match’ operation: case insensitive is always OFF and multiline is also always OFF.

POSIX Extended Regular Expressions (ERE) (http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html) are deprecated in this release, and will be removed in a future release.

Since POSIX ERE collating symbols (“[.symbol.]”) and equivalence classes (“[=symbol=]”) depend on POSIX-specific locale features, these are not supported by OVAL Regular Expressions. Instead, use all applicable Unicode code-points within a bracket expression; e.g., rather than specify "[[=a=]b]", "[[=à=]b]", or "[[=â=]b]", instead provide "[aàâb]" for situations that warrant this class of matching.

Escaping characters which are not defined as metacharacters in this specification will result in indeterminate behavior. Many implementations will support a superset of these metacharacters, with potentially different meaning than other implementations.

Character matching assumes a Unicode character set. Note that no syntax is supplied for specifying code points in hex; actual Unicode characters must be used instead.

The language elements defined below include a common subset of Perl5-compatible Regular Expression syntaxes. Specifically, this subset was drawn from
.NET, BOOST, Java, JavaScript, PCRE, and Perl 5.0.

In the following,

METACHARACTERS

    \          Quote the next metacharacter
    ^          Match the beginning of the line
    .          Match any character (except newline)
    $          Match the end of the line (or before newline at the end)
    |          Alternation
    ()         Grouping
    []         Character class

GREEDY QUANTIFIERS

    *          Match 0 or more times
    +          Match 1 or more times
    ?          Match 1 or 0 times
    {n}        Match exactly n times
    {n,}       Match at least n times
    {n,m}      Match at least n but not more than m times

RELUCTANT QUANTIFIERS

    *?         Match 0 or more times
    +?         Match 1 or more times
    ??         Match 0 or 1 time
    {n}?       Match exactly n times
    {n,}?      Match at least n times
    {n,m}?     Match at least n but not more than m times

ESCAPE SEQUENCES

    \t         tab (HT, TAB)
    \n         newline (LF, NL)
    \r         return (CR)
    \f         form feed (FF)
    \033       octal char (think of a PDP-11)
    \x1B       hex char
    \c[        control char

CHARACTER CLASSES

    \w         Match a "word" character (alphanumeric plus "_")
    \W         Match a non-word character
    \s         Match a whitespace character
    \S         Match a non-whitespace character
    \d         Match a digit character
    \D         Match a non-digit character

ZERO WIDTH ASSERTIONS

    \b         Match a word boundary
    \B         Match a non-(word boundary)

EXTENSIONS

    (?:regexp) Group without capture
    (?=regexp) Zero-width positive lookahead assertion
    (?!regexp) Zero-width negative lookahead assertion

VERSION 8 REGULAR EXPRESSIONS

    [chars]    Match any of the specified characters
    [^chars]   Match anything that is not one of the specified characters
    [a-b]      Match any character in the range between “a” and “b”, inclusive
    a|b        Alternation; match either the left side of the “|” or the right side
    \n         When ‘n’ is a single digit: the nth capturing group matched.

POSIX CHARACTER CLASSES (DEPRECATED)

    [[:xxx:]]  positive POSIX named set
    [[:^xxx:]] negative POSIX named set

    alnum      alphanumeric
    alpha      alphabetic
    blank      space or tab
    cntrl      control character
    digit      decimal digit
    graph      printing, excluding space
    lower      lower case letter
    print      printing, including space
    punct      printing, excluding alphanumeric
    space      whitespace
    upper      upper case letter
    xdigit     hexadecimal digit
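
As an illustration of the claim that converting ERE character classes into Perl5 bracket expressions is fairly trivial, here is a minimal Python sketch. It is not part of the proposed specification text. The helper name expand_posix_classes and the ASCII-only expansions are assumptions made for the example (POSIX classes are locale-dependent, so a real converter might need wider ranges), and negated sets such as [:^alpha:] are rejected rather than translated.

```python
import re

# ASCII-only expansions for each POSIX named set (an assumption: the real
# POSIX definitions depend on the locale). Each value is written so that it
# can be dropped directly inside an enclosing [...] bracket expression.
POSIX_CLASSES = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "blank": " \\t",
    "cntrl": "\\x00-\\x1f\\x7f",
    "digit": "0-9",
    "graph": "\\x21-\\x7e",
    "lower": "a-z",
    "print": "\\x20-\\x7e",
    "punct": "!-/:-@\\[-`{-~",
    "space": " \\t\\n\\r\\f\\x0b",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}

# Matches [:name:] or [:^name:] wherever it occurs in the pattern text.
_NAMED_SET = re.compile(r"\[:(\^?)([a-z]+):\]")


def expand_posix_classes(pattern: str) -> str:
    """Rewrite POSIX named sets as explicit ranges (hypothetical helper).

    Assumes every [:name:] occurrence sits inside a bracket expression,
    as POSIX requires.
    """
    def _expand(match):
        negated, name = match.group(1), match.group(2)
        if negated:
            raise ValueError("negated named sets are not handled: " + match.group(0))
        if name not in POSIX_CLASSES:
            raise ValueError("unknown POSIX class: " + name)
        return POSIX_CLASSES[name]

    return _NAMED_SET.sub(_expand, pattern)
```

For example, expand_posix_classes("^[[:alpha:]][[:alnum:]_]*$") returns "^[a-zA-Z][a-zA-Z0-9_]*$", which uses only elements from the subset listed above.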
Here is a discussion of probe-specific behaviors that can be supplied to enable case-insensitivity and multi-line matching. These are OPTIONAL in our proposal, in that the preceding can be implemented independently of the following.

== textfilecontent56_* ==

<Deprecate the existing textfilecontent54_* elements, and replace with textfilecontent56_*. The only difference is to use Textfilecontent56Behaviors rather than TextfilecontentBehaviors>

== Textfilecontent56Behaviors ==

<text from TextfilecontentBehaviors goes here>

Also note that the ‘ignore_case’ and ‘multiline’ attributes of the ‘behaviors’ element apply only to the ‘pattern’ element. These behaviors modify how specific strings produced by this element are matched against the file content; they do NOT modify the filtering behavior of the ‘pattern match’ operation on the output of this element itself, if such were to be used to filter some other list of pattern-match strings.
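
To make the intent of the two behaviors concrete, here is a small Python sketch of how a probe might apply them when matching a pattern against file content. The function name match_file_content and the False defaults are assumptions for illustration only; Python's re module stands in for whatever engine an OVAL implementation uses, and its syntax is not identical to the subset proposed above.

```python
import re


def match_file_content(content, pattern, ignore_case=False, multiline=False):
    """Return every substring of `content` matched by `pattern`.

    Hypothetical helper: the two keyword arguments mirror the proposed
    'ignore_case' and 'multiline' behaviors and are translated into
    engine flags, so the pattern itself needs no inline modifiers.
    """
    flags = 0
    if ignore_case:
        flags |= re.IGNORECASE  # letters match regardless of case
    if multiline:
        flags |= re.MULTILINE   # ^ and $ also match at line boundaries
    return [m.group(0) for m in re.finditer(pattern, content, flags)]


if __name__ == "__main__":
    sample = "Port 22\nport 2222\n"
    # With both behaviors enabled, each line is tested on its own,
    # ignoring letter case.
    print(match_file_content(sample, r"^port \d+$", ignore_case=True, multiline=True))
```

Keeping the flags outside the pattern is the point of the proposal: the same expression stays portable across engines that disagree on inline modifiers such as “(?i)”.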
Note: This email may contain confidential and privileged
information for the sole use of the intended recipient. Any review or
distribution by others is strictly prohibited. If you are not the intended
recipient please contact the sender and delete all copies of this message.

bakerj
Let me start by thanking you for putting the time and effort into considering regular expressions in OVAL. You have raised a number of valid concerns with the changes planned for version 5.6 with regard to regular expression support in OVAL. As you are probably well aware, arriving at a solution for regular expression support that will appease everyone is very difficult. The good news is that at least everyone agreed that what was used in OVAL 5.5 and earlier needed to be changed.

Historically we have tried to avoid specifying our own regular expression syntax, either by referring to someone else’s and then disallowing certain elements or by creating our own from scratch. The fear has been that if we define our own, no tools will actually support it. However, the approach you have suggested, referring to a baseline and then excluding the unwanted elements, is appealing.

In Version 5.6 we discussed the possibility of deprecating support for POSIX vs. just dropping it. The concern was that deprecating POSIX in favor of something else suggested that two different flavors will be supported. This seems highly undesirable. It has been our intent for deprecated language constructs to be supported as long as they are in the OVAL language. Deprecated simply means that the item may be removed in a future release. Before making the decision to simply drop POSIX we polled the oval-developer-list and the oval-board to ensure that this change would not invalidate any content. We had responses verifying that content would not be impacted, and we surveyed all of our own content. My feeling is that we should consider the second two points of your proposal, and I would like to continue with the decision to drop POSIX. However, the important question here is: what do others think?

I like the suggestion of explicitly adding a multiline and ignore case behavior to the textfilecontent56_test. This will result in simpler, more readable expressions and will state very explicitly when these two matching behaviors should be used, in a format that is independent of the underlying regular expression syntax. However, assuming no other issues, I am concerned with the notion of adding this into the release or delaying the release for just this change. My thought is to see how the rest of the discussion plays out and then decide how to proceed with this test.

In reviewing the suggested supported and excluded characters I don’t have any specific concerns. The subset you proposed seems reasonable.

At this point we need to hear
from others about:

1. In version 5.6, was the decision to simply drop POSIX support instead of deprecating it valid?
2. In version 5.6, should we refer to a more specific version of the Perl RE documentation?
3. In version 5.6, should we limit the supported elements defined in the Perl RE documentation?
4. In version 5.6, should
we add the proposed textfilecontent56_test?

Thanks,

Jon

============================================
Jonathan O. Baker
G022 - IA Industry Collaboration
The MITRE Corporation
Email: [hidden email]

bakerj
After further consideration and deliberation of your recommendations for regular expressions in version 5.6, we have decided that we should improve the documentation around the supported regular expression syntax in OVAL and add a new set of behaviors to the existing textfilecontent54_test to support the changes to the allowed regular expression syntax.

Here is a quick summary of the changes we would like to make:

- Change Perl Compatible Regular Expression reference

  The reference in the OVAL schema documentation that refers to Perl5 Regular Expressions will be updated to refer to a resource that is versioned.

- Define a white list of supported Regular Expression elements

  As recommended, a white list of the supported regular expression elements will be listed on the OVAL web site. This list will be referenced from the OVAL language schemas. This list will consist of the most commonly supported elements in order to increase the likelihood that a given regular expression will be supported by an OVAL compliant tool. The list will match what was suggested, with perhaps additional clarification around the supported modifiers. No assertion will be made about regular expression elements that are not included in the list.

- Add behaviors to the textfilecontent54_test to control multiline and case sensitivity matching

  The behaviors that were suggested for a new textfilecontent56_test will be added to the existing test. The default values for the behavior will align with the current intent of the test to allow multiline matches. This will ensure that the addition of the behavior does not change the meaning of existing content that uses the test. It is fairly common for a new behavior to be added to a test during a release as long as the new behavior is backwards compatible with content that already uses that test.

We feel that another week should be added to the release timeline to allow for these changes. The revised release timeline will be:

- 9/01/2009 – Version 5.6 RC 3 will be published

  As soon as the above changes are made and the white list of supported regular expression elements is available, a new release candidate will be posted.

- 9/11/2009 – New Version 5.6 release date

  The release will be delayed by one week to allow for the changes that need to be made.

Thank you all for your continued
review of this release. Jon ============================================ Jonathan O. Baker G022 - IA Industry Collaboration The MITRE Corporation Email: [hidden email] From: Baker, Jon
[mailto:[hidden email]] Let me start by thanking you for
putting the time and effort in to considering regular expressions in OVAL.
You have raised a number of valid concerns with the changes planned for
version 5.6 with regard to regular expression support in OVAL. As you are
probably well aware arriving at a solution for regular expression support that
will appease everyone is very difficult. The good news is that at least
everyone agreed that what was used in OVAL 5.5 and earlier needed to be
changed. Historically we have tried to
avoid specifying our own regular expression syntax either by referring to
someone else’s and then disallowing certain elements or by creating our
own from scratch. The fear has been that if we define our own no tools will
actually support it. However, the approach you have suggested, referring
to a baseline and then excluding the unwanted elements is appealing. In Version 5.6 we discussed the
possibility of deprecating support for posix vs. just dropping it. The concern
was that deprecating posix in favor of something else suggested that two
different flavors will be supported. This seems highly undesirable. It has been
our intent for deprecated language constructs to be supported as long as they
are in the oval language. Deprecated simply means that the item may be removed
in a future release. Before making the decision to simply drop posix we polled
the oval-developer-list and the oval-board to ensure that this change would not
invalidate any content. We had responses verifying that content would not be
impacted and we surveyed all of our own content. My feeling is that we
should consider the second two points of your proposal and I would like to
continue with the decision to drop posix. However, the important question here
is what do others think??? I like the suggestion of
explicitly adding a multiline and ignore case behavior to the
textfilecontent56_test. This will result in simpler more readable expressions
and very explicitly stating when these two matching behaviors should be used in
a format that is independent of the underlying regular expression syntax.
However, assuming no other issues I am concerned with the notion of adding this
into the release or delaying the release for just this change. My thought is to
see how the rest of the discussion plays out and then decide how to proceed
with this test. In reviewing the suggested
supported and excluded characters I don’t have any specific concerns. The
subset you proposed seems reasonable. At this point we need to hear
from others about: 1.
In verison 5.6 was
the decision to simply drop posix support instead of deprecate it valid? 2.
In verison 5.6
should we refer to a more specific version of the perl re documentation? 3.
In verison 5.6
should we limit the supported elements defined in the perl re documentation? 4.
In verison 5.6
should we add the proposed textfilecontent56_test? Thanks, Jon ============================================ Jonathan O. Baker G022 - IA Industry Collaboration The MITRE Corporation Email: [hidden email] From:
[hidden email] [mailto:[hidden email]] Sorry for bringing this to the table so late in the game, but
as discussed at last week’s XCCDF Developers meeting there are several
things to consider regarding Regular Expressions (REs) as specified in the OVAL
5.6 Release Candidate specification. While some of this has been talked
about on the OVAL mailing list, there are several points we feel bear
additional discussion. On the mailing list, there was some discussion of requiring
“PCRE” for OVAL 5.6. However, it is not clear from the text
if this means “Perl Compatible Regular Expression” or if it means
specifically the portable PCRE library at http://www.pcre.org/. If
the latter, we feel this should be called out in the OVAL specification with a
specific version’s syntax required, else incompatibilities in content are
sure to crop up. In any case, we would like to make an alternative
proposal which takes the following into consideration.

DATA

1. Requiring use of the PCRE library (http://www.pcre.org/) will preclude a pure-Java OVAL implementation, or an implementation in some other non-C++ languages.

2. By not introducing a new data type or behavior for Perl5 REs, existing OVAL content which uses POSIX-compatible REs may cease to work, depending on which library is in use. While the open-source PCRE library and Perl itself support ERE character classes in addition to the Perl ones, even in the same RE, the BOOST library requires one to specify which syntax to use for any given RE: either Perl5, POSIX Basic REs (BRE), or POSIX Extended REs (ERE).

3. We have observed OVAL content which includes a large number of ERE character classes (“[:alpha:]”, “[:alnum:]”, “[:blank:]”, etc.).

4. It is fairly trivial to convert ERE character classes into Perl5 character classes.

5. Perl5 and ERE character classes do not overlap, so both can be supported simultaneously: ERE character classes are very unlikely to appear in a valid Perl5 bracket expression and Perl5 character classes are illegal in POSIX.

6. “Perl5 regular expression” is a fairly loose term. When a product says it uses REs that are “Perl5 compatible,” this tends to mean compatible with the original Perl 5.00 release, not the current release. Many implementations make this claim, but they only share a subset of the full functionality from that release.

7. Regular expressions tend to fail silently on incompatibilities between implementations.

8. The link provided in the OVAL 5.6 specification which defines “Perl5 Regular Expression” (http://www.perl.com/doc/manual/html/pod/perlre.html) is not versioned. This implies conforming OVAL implementations must track changes to Perl regular expressions and always implement the latest one.

9. The link documents RE features which may not be widely considered “Perl 5 compatible.” For example, specifying case sensitivity within the regular expression itself; e.g., “(?i)”.

10. Since the OVAL specification does not indicate which parts of “Perl5 regular expressions” are required, one could be led to assume all of them are required. This implies a requirement to include the experimental “(?{ code })” construct, where “code” consists of arbitrary Perl code that will be evaluated as a zero-width assertion.

11. While most Perl5 RE implementations support a number of common language elements, almost every RE library or application includes one or more unique language elements or a unique subset. Because of this, those libraries and applications tend to document their particular flavor of RE. (To be frank, I cannot think of any commercial or popular open-source application or library that supports regular expressions but that does not define exactly which language elements it supports.)

12. While an OVAL RE implementation should handle Unicode to, for example, match against strings in the Windows Registry, supplying these in hex is problematic because no compatible regular-expression subset exists: BOOST, PCRE, and Perl implementations use \x{2345} whereas most others use \u2345.

PROPOSAL

Our proposal consists of:

- Indicating that POSIX ERE support is deprecated and will be removed in a future release. Since ERE and Perl5 do not overlap and can be supported simultaneously, an implementation is allowed to support both.

- Providing in the OVAL 5.6 specification a listing of the Regular Expression Language Elements (a Perl5 subset) that must be supported in order to be OVAL 5.6 compatible. This would only be a listing, with full definitions supplied by the perlre link, or another RE language specification.

- Providing a link to a specific version of the “perlre” documentation (Perl 5.00), hosted on the OVAL website, rather than pointing to an off-site unversioned resource.

The following documents were consulted with regard to this
proposal:

- http://msdn.microsoft.com/en-us/library/az24scfc(VS.80).aspx (.NET Framework 2.0)
- http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html (BOOST)
- http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html (Java)
- http://www.ecma-international.org/publications/standards/Ecma-262.htm (JavaScript/ECMA-262)
- http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx (JavaScript/JScript)
- http://www.pcre.org/pcre.txt (PCRE)
- http://perldoc.perl.org/perlre.html (Perl - current version)
- http://search.cpan.org/~andyd/perl5.003_07/pod/perlre.pod (Perl 5.003_07)
- http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html (POSIX Regular Expressions/IEEE Std 1003.1)

Here is proposed text for OVAL 5.6, including a subset of
language elements drawn from the perlre web page, reduced to a more common
subset:

The 'pattern match' operation allows an item to be tested against a regular expression. When used by an entity in an OVAL Object, the regular expression represents the set of matching objects on the system. OVAL supports a subset of regular expression character classes, operations, expressions and other lexical tokens defined within Perl 5's regular expression specification (see: http://search.cpan.org/~andyd/perl5.003_07/pod/perlre.pod), as noted below.

Modifiers are not supported for the ‘pattern match’ operation: case insensitive is always OFF and multiline is also always OFF.

POSIX Extended Regular Expressions (ERE) (http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html) are deprecated in this release, and will be removed in a future release.

Since POSIX ERE collating symbols (“[.symbol.]”) and equivalence classes (“[=symbol=]”) depend on POSIX-specific locale features, these are not supported by OVAL Regular Expressions. Instead, use all applicable Unicode code-points within a bracket expression; e.g., rather than specify "[[=a=]b]", "[[=à=]b]", or "[[=â=]b]", instead provide "[aàâb]" for situations that warrant this class of matching.

Escaping characters which are not defined as metacharacters in this specification will result in indeterminate behavior. Many implementations will support a superset of these metacharacters, with potentially different meaning than other implementations.

Character matching assumes a Unicode character set. Note that no syntax is supplied for specifying code points in hex; actual Unicode characters must be used instead.

The language elements defined below include a common subset of Perl5-compatible Regular Expression syntaxes. Specifically, this subset was drawn from
.NET, BOOST, Java, JavaScript, PCRE, and Perl 5.0. In the following,

METACHARACTERS

    \          Quote the next metacharacter
    ^          Match the beginning of the line
    .          Match any character (except newline)
    $          Match the end of the line (or before newline at the end)
    |          Alternation
    ()         Grouping
    []         Character class

GREEDY QUANTIFIERS

    *          Match 0 or more times
    +          Match 1 or more times
    ?          Match 1 or 0 times
    {n}        Match exactly n times
    {n,}       Match at least n times
    {n,m}      Match at least n but not more than m times

RELUCTANT QUANTIFIERS

    *?         Match 0 or more times
    +?         Match 1 or more times
    ??         Match 0 or 1 time
    {n}?       Match exactly n times
    {n,}?      Match at least n times
    {n,m}?     Match at least n but not more than m times

ESCAPE SEQUENCES

    \t         tab (HT, TAB)
    \n         newline (LF, NL)
    \r         return (CR)
    \f         form feed (FF)
    \033       octal char (think of a PDP-11)
    \x1B       hex char
    \c[        control char

CHARACTER CLASSES

    \w         Match a "word" character (alphanumeric plus "_")
    \W         Match a non-word character
    \s         Match a whitespace character
    \S         Match a non-whitespace character
    \d         Match a digit character
    \D         Match a non-digit character

ZERO WIDTH ASSERTIONS

    \b         Match a word boundary
    \B         Match a non-(word boundary)

EXTENSIONS

    (?:regexp) Group without capture
    (?=regexp) Zero-width positive lookahead assertion
    (?!regexp) Zero-width negative lookahead assertion

VERSION 8 REGULAR EXPRESSIONS

    [chars]    Match any of the specified characters
    [^chars]   Match anything that is not one of the specified characters
    [a-b]      Match any character in the range between “a” and “b”, inclusive
    a|b        Alternation; match either the left side of the “|” or the right side
    \n         When ‘n’ is a single digit: the nth capturing group matched

POSIX CHARACTER CLASSES (DEPRECATED)

    [[:xxx:]]  positive POSIX named set
    [[:^xxx:]] negative POSIX named set

    alnum      alphanumeric
    alpha      alphabetic
    blank      space or tab
    cntrl      control character
    digit      decimal digit
    graph      printing, excluding space
    lower      lower case letter
    print      printing, including space
    punct      printing, excluding alphanumeric
    space      whitespace
    upper      upper case letter
    xdigit     hexadecimal digit

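
As a rough illustration of the claim in the DATA list that converting ERE character classes is fairly trivial, here is a minimal Python sketch that rewrites the deprecated POSIX named sets into explicit ranges using only escapes from the subset above. Several things here are assumptions and not part of the proposal: the function name posix_classes_to_ranges is hypothetical, the mapping is ASCII-only (real POSIX classes are locale-dependent), the negated [[:^xxx:]] form is rejected rather than converted, and named sets are assumed to appear only inside bracket expressions. Python's re module is used only to exercise the result; it is not one of the engines surveyed.

```python
import re

# ASCII-only equivalents of the POSIX named sets (an assumption: real POSIX
# classes depend on the locale). Only \t, \n, \r, \f and two-digit \xHH
# escapes are used, all of which appear in the proposed subset.
POSIX_CLASSES = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "blank": r" \t",
    "cntrl": r"\x00-\x1f\x7f",
    "digit": "0-9",
    "graph": r"\x21-\x7e",
    "lower": "a-z",
    "print": r"\x20-\x7e",
    "punct": r"\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e",
    "space": r" \t\n\r\f\x0b",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}

# Matches [:name:] or [:^name:] as written inside a bracket expression.
_NAMED_SET = re.compile(r"\[:(\^?)([a-z]+):\]")


def posix_classes_to_ranges(pattern: str) -> str:
    """Replace each [:name:] in `pattern` with explicit character ranges."""

    def expand(match: re.Match) -> str:
        negated, name = match.group(1), match.group(2)
        if negated:
            # A negated set cannot be spliced into an enclosing bracket
            # expression, so this sketch refuses instead of guessing.
            raise ValueError(f"negated named set not handled: {match.group(0)}")
        if name not in POSIX_CLASSES:
            raise ValueError(f"unknown POSIX class: {name}")
        return POSIX_CLASSES[name]

    return _NAMED_SET.sub(expand, pattern)


converted = posix_classes_to_ranges("^[[:alpha:]_][[:alnum:]_]*$")
print(converted)
```

Because the substitution leaves the enclosing brackets in place, a mixed expression such as "[[:digit:][:blank:]]+" collapses into a single ordinary bracket expression. A production converter would also need to track whether it is actually inside a bracket expression and decide how to treat non-ASCII input.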
Here is a discussion of probe-specific behaviors that can be supplied to enable case-insensitivity and multi-line matching. These are OPTIONAL in our proposal, in that the preceding can be implemented independently of the following.

== textfilecontent56_* ==

<Deprecate the existing textfilecontent54_* elements, and replace with textfilecontent56_*. The only difference is to use Textfilecontent56Behaviors rather than TextfilecontentBehaviors>

== Textfilecontent56Behaviors ==

<text from TextfilecontentBehaviors goes here>

Also note that the ‘ignore_case’ and ‘multiline’ attributes of the ‘behaviors’ element apply only to the ‘pattern’ element. These behaviors modify how specific strings produced by this element are matched against the file content; they do NOT modify the filtering behavior of the ‘pattern match’ operation on the output of this element itself, if such were to be used to filter some other list of pattern-match strings.
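
To make the distinction between the two matching contexts concrete, here is a small Python sketch. It is only one reading of the behaviors described above, not text from the proposal: the function names find_pattern_instances and pattern_match_filter are hypothetical, and Python's re module stands in for whatever Perl5-compatible engine an implementation actually uses.

```python
import re


def find_pattern_instances(content, pattern, ignore_case=False, multiline=False):
    """Match the 'pattern' entity against file content.

    The two keyword arguments mirror the 'ignore_case' and 'multiline'
    behaviors; both default to OFF.
    """
    flags = 0
    if ignore_case:
        flags |= re.IGNORECASE
    if multiline:
        flags |= re.MULTILINE  # ^ and $ also match at line boundaries
    return [m.group(0) for m in re.finditer(pattern, content, flags)]


def pattern_match_filter(values, pattern):
    """Apply the generic 'pattern match' operation to a list of strings.

    No modifiers are applied here: case-insensitive and multiline matching
    are always OFF for this operation, whatever behaviors the probe used.
    """
    compiled = re.compile(pattern)
    return [v for v in values if compiled.search(v)]


content = "Port 22\nport 2222\n"

# Behaviors affect how the pattern is matched against the file content...
found = find_pattern_instances(content, r"^port \d+$", ignore_case=True, multiline=True)
print(found)

# ...but not a later 'pattern match' filter over the strings produced.
print(pattern_match_filter(found, r"^port"))
```

With both behaviors enabled, the first call finds both lines; the second call keeps only the lowercase entry, because the filter does not inherit ignore_case.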