Version 2.1 Release Candidate

Andrew Buttner

Version 2.1 Release Candidate

Attached is the release candidate for version 2.1.  The only changes
from the last posted draft are the addition of wording to help clarify
how CPE Names relate to external entities, and the modification of the
schemaLocation tag in the two schemas.

I understand that the holiday season is upon us and everyone may not be
able to give a final review of this until after the new year.  Because
of this the RC period will be extended a little bit.  The goal moving
forward will be to stamp this as final on Wednesday Jan 16, 2008.

Hope everyone is enjoying the holidays!

Thanks
Drew


---------

Andrew Buttner
The MITRE Corporation
[hidden email]
781-271-3515


cpe-specification-2.1.doc (483K) Download Attachment
Andrew Buttner

Re: Version 2.1 Release Candidate

A suggestion has been made to add an <xsd:any> tag to the end of the
<cpe-item> in the CPE Dictionary schema.  (note that <xsd:any> is
already present in the <cpe-list> element)  This will allow the
dictionary schema to be extended to allow additional information
without sacrificing validation of the required CPE information.

This change should not affect anything with the existing 2.1 release,
hence I have added it and produced a release candidate 2.  If there are
any concerns, please voice them now.

I also took the opportunity to fix a few spelling mistakes that were
found.

Still planning on going final with CPE 2.1 on Jan 16th.

Thanks
Drew

>-----Original Message-----
>From: Buttner, Drew [mailto:[hidden email]]
>Sent: Friday, December 21, 2007 11:31 AM
>To: cpe-discussion-list CPE Community Forum
>Subject: [CPE-DISCUSSION-LIST] Version 2.1 Release Candidate
>
>Attached is the release candidate for version 2.1.  The only changes
>from the last posted draft are the addition of wording to help clarify
>how CPE Names relate to external entities, and the modification of the
>schemaLocation tag in the two schemas.
>
>I understand that the holiday season is upon us and everyone may not be
>able to give a final review of this until after the new year.  Because
>of this the RC period will be extended a little bit.  The goal moving
>forward will be to stamp this as final on Wednesday Jan 16, 2008.
>
>Hope everyone is enjoying the holidays!
>
>Thanks
>Drew
>
>
>---------
>
>Andrew Buttner
>The MITRE Corporation
>[hidden email]
>781-271-3515
>




cpe-specification-2.1.doc (488K) Download Attachment
cpe-dictionary_2.1.xsd (13K) Download Attachment
cpe-language_2.1.xsd (11K) Download Attachment
Andrew Buttner

Re: Version 2.1 Release Candidate

The suggestion to change the processContents attribute of the xsd:any
tags seems to be a good one.  I have updated the schemas to reflect
this.  I have also corrected the spelling/grammatical errors that have
been pointed out.  Thank you to everyone that has helped review so far.

Because of the change, I think it is best to give another week for
people to review.  We were originally hoping to release the official
version 2.1 spec today, but this is now targeted for January 28th.

If anyone has additional suggestions, please forward them on to the
list.

Thanks
Drew

---------

Andrew Buttner
The MITRE Corporation
[hidden email]
781-271-3515




cpe-specification-2.1.doc (494K) Download Attachment
cpe-dictionary_2.1.xsd (13K) Download Attachment
cpe-language_2.1.xsd (11K) Download Attachment
Andrew Buttner

Re: Version 2.1 Release Candidate

Some small modifications to the structure of the dictionary schema have
been proposed.  I'd like to pass these proposals on to everyone.  They
do not change the format of the resulting XML so these changes will
most likely not affect anyone.

* pulled out the contents of <cpe-list> and <cpe-item> into types
* added an optional <generator> element to hold timestamp info
* changed the namePattern to be shorter and easier to understand

If any of these changes strike you as wrong or possibly negatively
affect your work, please let us know.

We are still hoping for a final release of 2.1 on Jan 28th.

Thanks
Drew


cpe-dictionary_2.1.xsd (18K) Download Attachment
Andrew Buttner

Re: Version 2.1 Release Candidate

With the discussion today, I have made an update to the proposed
version 2.1 CPE Spec.  Please take a quick look at Section 5.4 which
clarifies the prohibited characters and how percent encoding should be
used.  Does this explanation answer the outstanding questions?

Also note the slight change in the NamePattern to allow the percent
encoding.
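
As a rough illustration of what a NamePattern that admits percent encoding might look like, here is a minimal Java sketch.  It assumes the same regular expression that appears in the Java example posted later in this thread; the actual pattern in the schema may differ.  The class and method names are made up for this example.

```java
import java.util.regex.Pattern;

final class CpeNamePatternSketch {
    // Assumption: this expression is copied from the Java example later in
    // the thread, not from the released cpe-dictionary_2.1.xsd.
    static final Pattern NAME_PATTERN = Pattern.compile(
        "\\A[cC][pP][eE]:/([AHOaho]?(:(?:[A-Za-z0-9\\._\\-~]"
            + "|%[A-Fa-f0-9]{2})*){0,6})(?<!:)\\Z");

    /** Returns true if the whole string fits the assumed name pattern. */
    static boolean isWellFormed(String name) {
        return NAME_PATTERN.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("cpe:/o:redhat"));    // plain name
        System.out.println(isWellFormed("cpe:/o:redh%41t"));  // percent-encoded octet
        System.out.println(isWellFormed("cpe:/o:red hat"));   // space is not allowed
        System.out.println(isWellFormed("cpe:/o:redhat:"));   // trailing colon is rejected
    }
}
```

Under this assumed pattern the first two names are accepted and the last two are rejected.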

Because of this change, I would like to give some time for people to
review the spec.  The official release will be pushed back until later
this week.  New target is Thursday January 31.

Please don't hesitate to bring up more issues as it is better to solve
them now, rather than have to wait for the next release.

Thanks
Drew




cpe-dictionary_2.1.xsd (18K) Download Attachment
cpe-language_2.1.xsd (11K) Download Attachment
cpe-specification-2.1.doc (531K) Download Attachment
Gary Newman-2

Re: Version 2.1 Release Candidate

I suggest a change to section 5.1 replacing

        A CPE Name is a URI [2] with...

with

        A CPE Name is a percent-encoded URI [2] with...

Then to add a couple of paragraphs to section 5.4 which I suggest could be a
copy of section 2.4 of RFC 3986.  That should clarify how encoding/decoding is
used.

The new 5.4 wording should also point out that aside from the "must be encoded"
list, URI encoding allows percent-encoding of any or all characters.  So the
uniqueness property of a CPE name is only true after URI decoding.

Overall, everything might be clearer if all description of CPE names referred
only to the decoded form (with only examples being encoded), with an
opening paragraph pointing out this distinction and stating that URI encoding
is a requirement.



        -Gary-

> With the discussion today, I have made an update to the proposed
> version 2.1 CPE Spec.  Please take a quick look at Section 5.4 which
> clarifies the prohibited characters and how percent encoding should be
> used.  Does this explanation answer the outstanding questions?
>
> Also note the slight change in the NamePattern to allow the percent
> encoding.
>
> Because of this change, I would like to give some time for people to
> review the spec.  The official release will be pushed back until later
> this week.  New target is Thursday January 31.
>
> Please don't hesitate to bring up more issues as it is better to solve
> them now, rather than have to wait for the next release.
>
> Thanks
> Drew
>
Waltermire, Dave [USA]

Re: Version 2.1 Release Candidate

I believe using "percent-encoded" vs "encoded" is redundant.  It might
be better to use the term "escape encoded" per the RFC2396 (Section
2.4.1) or just "encoded" as escape encoding is the default approach for
URIs.

Also, according to section 2.4.2 in the same RFC:

"A URI is always in an "escaped" form, since escaping or unescaping a
completed URI might change its semantics."

In the spirit of this a CPE Name should always be in the encoded form,
however matching should be done on a component-by-component basis in the
decoded form.  Everyone agree?

http://www.ietf.org/rfc/rfc2396.txt 

Dave

> -----Original Message-----
> From: Gary Newman [mailto:[hidden email]]
> Sent: Monday, January 28, 2008 3:49 PM
> To: [hidden email]
> Subject: Re: [CPE-DISCUSSION-LIST] Version 2.1 Release Candidate
>
> I suggest a change to section 5.1 replacing
>
>         A CPE Name is a URI [2] with...
>
> with
>
>         A CPE Name is a percent-encoded URI [2] with...
>
> Then to add a couple of paragraphs to section 5.4 which I
> suggest could be a copy of section 2.4 of RFC 3986.  That
> should clarify how encoding/decoding is used.
>
> The new 5.4 wording should also point out that aside from the
> "must be encoded"
> list, URI encoding allows percent-encoding of any or all
> characters.  So the uniqueness property of a CPE name is only
> true after URI decoding.
>
> Overall, everything might be clearer if all description of
> CPE names referred only to the decoded form (with only
> examples being encoded).  Then to have an opening paragraph
> pointing out this distinction, and that URI encoded is a requirement.
>
>
>
>         -Gary-
Andrew Buttner

Re: Version 2.1 Release Candidate

>In the spirit of this a CPE Name should always be in the encoded form,

I think this will serve us best since it will allow us to have unique
names.  As long as we state what characters should be encoded then we
should be fine.  We would also want to state that for CPE, URIs that
are encoded differently but decode the same (think of one that encodes
every character) would be different CPE Names.  This satisfies the
uniqueness.  (maybe they "point" to the same platform type though)



>however matching should be done on a component-by-component
>basis in the decoded form.

Agree.  This basically removes the technicality mentioned above where
two URIs that decode the same would be different CPE Names.  At least
these names would match.
Gary Newman-2

Re: Version 2.1 Release Candidate

Note this differs from RFCs, as they specify encoding yet describe the
components in unencoded form.

URIs that are encoded differently but decode the same MUST be the same CPE
name.  Otherwise the optional 2^N character encodings will make 2^N names that
decode to the same CPE name.
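
To make the 2^N point concrete, here is a small Java sketch that lists every spelling of a component in which each character is either written literally or percent-encoded.  It is only an illustration: the class and method names are hypothetical, and it assumes a short ASCII component.

```java
import java.util.ArrayList;
import java.util.List;

final class EncodingVariantsSketch {
    /**
     * Lists all 2^N spellings of an N-character ASCII component, where bit i
     * of the mask decides whether character i is percent-encoded.
     * Intended for short inputs only (N well below 31).
     */
    static List<String> variants(String component) {
        List<String> out = new ArrayList<>();
        int n = component.length();
        for (int mask = 0; mask < (1 << n); mask++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) {
                char c = component.charAt(i);
                if ((mask & (1 << i)) != 0) {
                    // "%%" emits a literal '%', "%02X" the two-digit hex octet
                    sb.append(String.format("%%%02X", (int) c));
                } else {
                    sb.append(c);
                }
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(variants("ab"));             // 4 spellings
        System.out.println(variants("redhat").size());  // 64 spellings
    }
}
```

Every string in the list decodes to the same component, which is why treating differently encoded URIs as different names multiplies the names.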

> >In the spirit of this a CPE Name should always be in the encoded form,
>
> I think this will serve us best since it will allow us to have unique
> names.  As long as we state what characters should be encoded then we
> should be fine.  We would also want to state that for CPE, URIs that
> are encoded differently but decode the same (think of one that encodes
> every character) would be different CPE Names.  This satisfies the
> uniqueness.  (maybe they "point" to the same platform type though)
>
>
>
> >however matching should be done on a component-by-component
> >basis in the decoded form.
>
> agree.  This basically removes the technicality mentioned above where
> two URIs that decode the same would be different CPE Name.  At least
> these name would match.
Andrew Buttner

Re: Version 2.1 Release Candidate

My goal is that when different people use a CPE Name for a given
platform, the exact same string of characters is used.  In other
words, the given identifier is exactly the same.  I don't think we
want to allow people to use any number of possible encodings.

To accomplish this, I'd like the spec to say that a CPE Name is always
in percent-encoded form with only certain characters being
percent-encoded.  Can we make this wording work?

Thanks
Drew



>-----Original Message-----
>From: Gary Newman [mailto:[hidden email]]
>Sent: Tuesday, January 29, 2008 10:06 AM
>To: cpe-discussion-list CPE Community Forum
>Subject: Re: [CPE-DISCUSSION-LIST] Version 2.1 Release Candidate
>
>Note this differs from RFCs, as they specify encoding yet describe the
>components in unencoded form.
>
>URIs that are encoded differently but decode the same MUST be the same CPE
>name.  Otherwise the optional 2^N character encodings will make 2^N names that
>decode to the same CPE name.
Gary Newman-2

Re: Version 2.1 Release Candidate

Your proposal would likely make a CPE Name no longer follow the URI rules.  See
section 6 on normalization for more details, particularly

        http://gbiv.com/protocols/uri/rfc/rfc3986.html#normalize-encoding

"...some URI producers percent-encode octets that do not require percent-
encoding, resulting in URIs that are equivalent to their non-encoded
counterparts." along with other indications that percent encoding is presumed
allowable.

> My goal is that when different people use a CPE Name for a given
> platform, that the exact same string of characters is used.  In other
> words, that the given identifier is exactly the same.  I don't think we
> want to allow people to use any number of possible encodings.
>
> To accomplish this, I'd like the spec to say that a CPE Name is always
> in percent-encoded form with only certain characters being
> percent-encoded.  Can we make this wording work?
>
> Thanks
> Drew
Andrew Buttner

Re: Version 2.1 Release Candidate

Would you feel more comfortable if we stated that ...

- CPE Name is a URI
- the URI should be in percent-encoded form
- all CPE Names in the official dictionary will be normalized by
decoding any percent-encoded octet that corresponds to an unreserved
character
- matching is based on decoded components

This means that the following two CPE Names would be the same:

cpe:/o:redhat
cpe:/o:redh%41t

My problem with this is that producers could use either when trying to
communicate with other applications, meaning that consumers will have
to percent-encoding normalize any CPE Names that they receive.

I would like to state that CPE requires percent-encoded normalized URIs
to be used.  Isn't this something that the hypothetical "CPE URI
Scheme" could declare?  Or could we cover things by saying as part of
the CPE Spec that percent-encoded normalization should occur before
producing or consuming a CPE Name?  (that way only normalized Names are
used)
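
The normalization step discussed above (decoding any percent-encoded octet that corresponds to an unreserved character) could be sketched in Java roughly as follows.  This is a minimal sketch, not spec wording: the class and method names are hypothetical, the unreserved set is taken from RFC 3986, and upper-casing the hex digits of the remaining escapes is an extra assumption borrowed from that RFC's normalization advice.

```java
import java.util.Locale;

final class PercentNormalizerSketch {
    /** RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~". */
    static boolean isUnreserved(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9')
            || c == '-' || c == '.' || c == '_' || c == '~';
    }

    static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }

    /** Decodes escapes of unreserved characters; leaves all other escapes encoded. */
    static String normalize(String name) {
        StringBuilder out = new StringBuilder(name.length());
        int i = 0;
        while (i < name.length()) {
            char c = name.charAt(i);
            if (c == '%' && i + 2 < name.length()
                    && isHex(name.charAt(i + 1)) && isHex(name.charAt(i + 2))) {
                String hex = name.substring(i + 1, i + 3);
                char decoded = (char) Integer.parseInt(hex, 16);
                if (isUnreserved(decoded)) {
                    out.append(decoded);
                } else {
                    // assumption: keep the escape, with upper-case hex digits
                    out.append('%').append(hex.toUpperCase(Locale.ROOT));
                }
                i += 3;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("cpe:/o:redh%41t"));
        System.out.println(normalize("cpe:/o:redh%41t").equalsIgnoreCase("cpe:/o:redhat"));
    }
}
```

Note that %41 is an upper-case "A", so the second name above normalizes to cpe:/o:redhAt; the two names only come out equal if the comparison is also case-insensitive.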


Thanks
Drew


>-----Original Message-----
>From: Gary Newman [mailto:[hidden email]]
>Sent: Wednesday, January 30, 2008 3:39 PM
>To: cpe-discussion-list CPE Community Forum
>Subject: Re: [CPE-DISCUSSION-LIST] Version 2.1 Release Candidate
>
>Your proposal would likely make a CPE Name no longer follow
>the URI rules.  See
>section 6 on normalization for more details, particularly
>
>        
>http://gbiv.com/protocols/uri/rfc/rfc3986.html#normalize-encoding
>
>"...some URI producers percent-encode octets that do not
>require percent-
>encoding, resulting in URIs that are equivalent to their non-encoded
>counterparts." along with other indications that percent
>encoding is presumed
>allowable.
Waltermire, Dave [USA]

Re: Version 2.1 Release Candidate

In reply to this post by Andrew Buttner
Drew,

I would like to bring this thread down to a more technical level with
concrete examples so we can avoid talking past each other.  Please find
some example Java code at the end of my email to illustrate my
perspective.

I don't understand your stated goal below.  It seems reasonable to me
that two CPE names are equivalent if their decoded components are equal.
This is the approach I have taken in the equals method in my example.
Matching would work using a similar approach using decoded components.
I would argue that what I am suggesting is fine from an application
processing perspective.  It may be harder for a human to interpret,
however we are not developing CPE for humans as it is a machine
identifier.

What do you see as the major arguments for or against this approach?

Dave

DefaultCPEName.java:

import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * A default implementation of a CPEName.
 */
// TODO: add support for escaping
public class DefaultCPEName {
    /** the pattern for a name */
    private static final Pattern pattern = Pattern.compile(
        "\\A[cC][pP][eE]:/([AHOaho]?(:(?:[A-Za-z0-9\\._\\-~]"
            + "|%[A-Fa-f0-9]{2})*){0,6})(?<!:)\\Z");

    /** The CPEName URI scheme name */
    public static final String SCHEME = "cpe";

    /** Value for the first component of a hardware CPEName Name */
    public static final String PART_HARDWARE = "h";

    /** Value for the first component of an operating system CPEName Name */
    public static final String PART_OS = "o";

    /** Value for the first component of an application CPEName Name */
    public static final String PART_APPLICATION = "a";

    /** object state - all of these are serializable */
    private final URI uri;

    /** holds a cache of the components */
    private transient String[] components;

    /** cached hash code, computed lazily from the decoded components */
    private transient int hashCache = 0;

    /**
     * Create a CPEName from a string.
     *
     * @param s a valid CPE name string
     * @throws URISyntaxException if s is not a valid cpe value
     */
    public DefaultCPEName(String s) throws URISyntaxException {
        this(new URI(s));
    }

    /**
     * Creates a CPE name from a URI
     * @param uri a CPE URI value
     * @throws NullPointerException if URI is null
     */
    public DefaultCPEName(URI uri) throws URISyntaxException {
        this.uri = uri;
        if (uri == null) {
            throw new NullPointerException("uri");
        }
        Matcher m = pattern.matcher(uri.toString());
        if (!m.matches()) {
            throw new URISyntaxException(
                uri.toString(),
                "Invalid CPEName");
        }
    }

    /** {@inheritDoc} */
    public URI getName() {
        return uri;
    }

    /** {@inheritDoc} */
    @Override
    public String toString() {
        return uri.toString();
    }

    /** {@inheritDoc} */
    @Override
    public int hashCode() {
        if (hashCache == 0) {
            int hash = 17;
            for (String component : getComponents()) {
                hash = 37 * hash + component.hashCode();
            }
            hashCache = hash;
        }
        return hashCache;
    }

    /**
     * Return true if this CPEName equals another.  Note that
     * this is different from matching!
     *
     * {@inheritDoc}
     *
     */
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof DefaultCPEName)) {
            return false;
        }
        DefaultCPEName that = (DefaultCPEName) o;
        String[] thisComponents = getComponents();
        String[] thatComponents = that.getComponents();
        if (thisComponents.length != thatComponents.length) {
            return false;
        }

        for (int i = 0; i < thisComponents.length; i++) {
            if (!thisComponents[i].equals(thatComponents[i])) {
                return false;
            }
        }
        return true;
    }

    /** {@inheritDoc} */
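    public String[] getComponents() {
        if (components == null) {
            // Use the raw (still percent-encoded) form here: URI's
            // getSchemeSpecificPart() already decodes, so an encoded ':'
            // would split a component and escapes would be decoded twice.
            components =
                uri.getRawSchemeSpecificPart().substring(1).split(":");

            for (int i = 0; i < components.length; i++) {
                try {
                    components[i] =
                        URLDecoder.decode(components[i], "UTF-8").toLowerCase();
                } catch (UnsupportedEncodingException e) {
                    throw new RuntimeException(e);
                }
            }
        }
        return components;
    }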

    /** {@inheritDoc} */
    public boolean contains(final DefaultCPEName cpe) {
        String[] myComponents = getComponents();
        String[] inComponents = cpe.getComponents();
        // cpe can only match if it has at least as many components as this
        if (inComponents.length < myComponents.length) {
            return false;
        }
        boolean retval = false;
        // check each component of cpe against this
        for (int i = 0; i < myComponents.length; i++) {
            // components equal, or this component is empty
            if (myComponents[i].equalsIgnoreCase(inComponents[i])
                    || myComponents[i].length() == 0
                    || inComponents[i].length() == 0) {
                retval = true;
            } else {
                retval = false;
                break;
            }
        }
        return retval;
    }
}

> -----Original Message-----
> From: Buttner, Drew [mailto:[hidden email]]
> Sent: Wednesday, January 30, 2008 1:37 PM
> To: [hidden email]
> Subject: Re: [CPE-DISCUSSION-LIST] Version 2.1 Release Candidate
>
> My goal is that when different people use a CPE Name for a
> given platform, that the exact same string of characters is
> used.  In other words, that the given identifier is exactly
> the same.  I don't think we want to allow people to use any
> number of possible encodings.
>
> To accomplish this, I'd like the spec to say that a CPE Name
> is always in percent-encoded form with only certain
> characters being percent-encoded.  Can we make this wording work?
>
> Thanks
> Drew
Gary Newman-2

Re: Version 2.1 Release Candidate

Hi Dave,

Without reading the code example, I can agree with your statement of CPE
equivalence.  The URI specs say that, within some limits, a scheme can declare
its own rules for normalization.  Thus the CPE declaration of case-independence
fits fine for a URI.  

I believe Drew is trying to make the database lookup problem simpler by
requiring there to be only one ASCII character sequence that represents a CPE
name.  

        -Gary-

> Drew,
>
> I would like to bring this thread down to a more technical level with
> concrete examples so we can avoid talking past each other.  Please find
> some example Java code at the end of my email to illustrate my
> perspective.
>
> I don't understand your stated goal below.  It seems reasonable to me
> that two CPE names are equivalent if their decoded components are equal.
> This is the approach I have taken in the equals method in my example.
> Matching would work using a similar approach using decoded components.
> I would argue that what I am suggesting is fine from an application
> processing perspective.  It may be harder for a human to interpret,
> however we are not developing CPE for humans as it is a machine
> identifier.
>
> What do you see as the major arguments for or against this approach?
>
> Dave
Gary Newman-2

Re: Version 2.1 Release Candidate

In reply to this post by Andrew Buttner
Hi Drew,

> Would you feel more comfortable if we stated that ...
>
> - CPE Name is a URI

Fine... that seems to be the current premise.

> - the URI should be in percent-encoded form

Fine... it's a URI requirement for at least the reserved characters.

> - all CPE Names in the official dictionary will be normalized by
> decoding any percent-encoded octet that corresponds to an unreserved
> character

This generally seems OK, assuming you're using "unreserved character" to
correspond to the URI RFC's use.

        unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

The dictionary section could also productively state that all names in the
dictionary will use only lower-case letters to simplify things for case-
dependent databases and the like.  It's gotta be better for everyone if use of
the dictionary requires as little extra processing as possible.
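
As a rough sketch of why lower-case-only dictionary names help, the Java below keeps names in a plain exact-match map, standing in for a case-dependent database.  The class, its methods and the sample title are all hypothetical, and it assumes names have already been percent-encoding normalized.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class DictionaryLookupSketch {
    // Exact-match store; keys are always lower case.
    private final Map<String, String> titles = new HashMap<>();

    /** Stores an entry under the lower-cased name. */
    void add(String name, String title) {
        titles.put(name.toLowerCase(Locale.ROOT), title);
    }

    /** Looks up a name regardless of the letter case the caller used. */
    String lookup(String name) {
        return titles.get(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        DictionaryLookupSketch dictionary = new DictionaryLookupSketch();
        dictionary.add("cpe:/o:redhat", "Red Hat (illustrative title)");
        System.out.println(dictionary.lookup("CPE:/O:RedHat"));
    }
}
```

If the dictionary itself only ever contains lower-case names, the lower-casing in add becomes a no-op and consumers need just one cheap step before an exact lookup.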

> - matching is based on decoded components

Fine.

> This means that the following two CPE Names would be the same:
>
> cpe:/o:redhat
> cpe:/o:redh%41t
>
> My problem with this is that producers could use either when trying to
> communicate with other applications.  Meaning that consumers will have
> to percent-encoding normalize CPE Names that it receives.
>
> I would like to state that CPE requires percent-encoded normalized URIs
> to be used.  Isn't this something that the hypothetical "CPE URI
> Scheme" could declare?  Or could we cover things by saying as part of
> the CPE Spec that percent-encoded normalization should occur before
> producing or consuming a CPE Name?  (that way only normalized Names are
> used)

When you say "CPE requires" I'm guessing that you mean to say that a CPE Name
can only be a limited kind of URI.  If by "percent-encoded normalized" you mean
the decoding of all unreserved characters, then I assume the CPE specification
can specify whatever it wants.  The consequences of this limited URI would
likely cause confusion and headaches though.  It's simple enough to limit what
goes into the dictionary, and how matching is done.  Limiting the kind of URI
that a CPE Name is may come back to haunt us, IMHO.

        -Gary-

> Thanks
> Drew
Andrew Buttner

Re: Version 2.1 Release Candidate

In reply to this post by Gary Newman-2
>I believe Drew is trying to make the database lookup problem
>simpler by requiring there to be only one ascii character
>sequence that represents a CPE name.  

The very first use case that was presented to us was two tools trying to
share information related to a specific platform.  Tool 1 would say
"here is info for WinXP".  Tool 2 would say "I don't know about WinXP
but do you mean Windows XP".  What was needed was a common identifier
so both tools talked the same language.

This is similar to the need that CVE addresses for vulnerabilities.

What I am trying to avoid is forcing tools to have to undergo a lot of
processing to do this match.  But as I think about it, I am guessing
that tools will have to perform a matching algorithm no matter what
since the first tool may respond with "here is info for WinXP" and the
second tool will say "I am working on a WinXP/SP2 system".  So a
straight character map isn't going to work.
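
A rough sketch of the kind of matching a tool would need for the WinXP versus WinXP/SP2 case, assuming a CPE Name is split on ":" after the "cpe:/" prefix and that an empty or missing component in the more general name matches anything. The function names, the case folding, and the sample names are assumptions for illustration; this is not the matching algorithm from the specification.

```python
def split_components(name):
    """Split a CPE Name into its colon-separated components (case-folded)."""
    prefix = "cpe:/"
    if not name.lower().startswith(prefix):
        raise ValueError("not a CPE Name: %r" % name)
    return name[len(prefix):].lower().split(":")


def covers(general, specific):
    """Return True if every component given in general agrees with specific.

    Hypothetical helper: empty components in general act as wildcards.
    """
    g = split_components(general)
    s = split_components(specific)
    # Trailing empty components carry no information, so drop them.
    while g and g[-1] == "":
        g.pop()
    if len(g) > len(s):
        # general is more detailed than specific, so it cannot cover it.
        return False
    return all(gc == "" or gc == sc for gc, sc in zip(g, s))


if __name__ == "__main__":
    info_for = "cpe:/o:microsoft:windows_xp"         # tool 1: "info for WinXP"
    running = "cpe:/o:microsoft:windows_xp::sp2"     # tool 2: "WinXP/SP2 system"
    print(covers(info_for, running))
    print(covers(running, info_for))
```

The comparison is directional: information published for the general name applies to the more specific system, but not the other way around, which is why a straight string comparison cannot express it.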

What do other tool vendors think about this issue?  Would having to
normalize a CPE Name before processing cause issues?  Would it be ok
for producers to send any number of different percent-encoded forms of
the same CPE Name?

Thanks
Drew
Andrew Buttner

Re: Version 2.1 Release Candidate

In reply to this post by Gary Newman-2
>> I would like to state that CPE requires percent-encoded normalized
>> URIs to be used.  Isn't this something that the hypothetical "CPE URI
>> Scheme" could declare?  Or could we cover things by saying as part of
>> the CPE Spec that percent-encoded normalization should occur before
>> producing or consuming a CPE Name?  (that way only normalized Names
>> are used)
>
>When you say "CPE requires" I'm guessing that you mean to say
>that a CPE Name can only be a limited kind of URI.  If by
>"percent-encoded normalized" you mean the decoding of all
>unreserved characters, then I assume the CPE specification
>can specify whatever it wants.  The consequences of this
>limited URI would likely cause confusion and headaches though.
>It's simple enough to limit what goes into the dictionary, and
>how matching is done.  Limiting the kind of URI
>that a CPE Name is may come back to haunt us, IMHO.


Yes, you correctly captured my intentions.  Of course I am not hearing
much support for this type of limited URI.  I very well might be
focused too much on our traditional idea of an identifier.  If we think
of the "identifier" as being the unique collection of the different
components and the CPE Name as a way of talking about this collection,
then things start to make sense again.

Thanks
Drew
Vladimir Giszpenc

Re: Version 2.1 Release Candidate

In reply to this post by Andrew Buttner
Hi Drew,

I will restate what I have said in the past.  This is an idealistic view
(from an insignificant vendor) that probably could not happen in the current
version, but I am young and naive.

> What do other tool vendors think about this issue?  Would having to
> normalize a CPE Name before processing cause issues?  Would it be ok
> for producers to send any number of different percent-encoded forms of
> the same CPE Name?

When normalizing, identifiers should not be meaningful.  Integers and
globally unique (GUID) or universally unique identifiers (UUID) are
fantastic for this purpose.  For actual files such as executables, using a
hash is another good way to identify things.  Until we get there, we will
have problems.  Could we seed a DatabaseId tag or something like that?  You
could even call it MitreId if you want.  Anything less will leave room for
interpretation which is the problem.  We really don't want interpretation.
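
To make the idea concrete, here is a small sketch of opaque identifiers, assuming a random UUID is assigned to each platform entry and a SHA-256 digest is used to identify a file by its contents. The function names and the registry layout are hypothetical; nothing like a DatabaseId exists in the current schema.

```python
import hashlib
import uuid


def new_platform_id():
    """Return a random UUID string that carries no meaning of its own."""
    return str(uuid.uuid4())


def file_fingerprint(data):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    # Hypothetical registry: the opaque id is the key, and the
    # human-readable description is ordinary data attached to it.
    registry = {}
    platform_id = new_platform_id()
    registry[platform_id] = {"title": "Microsoft Windows XP"}
    print(platform_id, registry[platform_id]["title"])

    # The same bytes always produce the same fingerprint.
    print(file_fingerprint(b"example executable contents"))
```

Because the identifier is never parsed, two tools either hold the same id or they do not, and the descriptive text can be corrected later without changing the id.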

As I like to say, numbers are cheap.  GUIDs and UUIDs are more useful but
slightly more expensive.  As far as the URI and case question goes, I think
Dave Waltermire explained clearly that the tools can do a lot of the heavy
lifting.

Please don't flame me too badly for bringing up a sore subject.

Thanks,

Vladimir Giszpenc
DSCI Contractor Supporting
US Army CERDEC S&TCD IAD Tactical Network Protection Branch
(732) 532-8959



Andrew Buttner

Re: Version 2.1 Release Candidate

No flame needed.  I think your points are good ones.  I think serious
thought needs to be given to the use of numerical ids in the next major
version.  Right now though we have to work with the current version.  I
think this discussion related to URIs will help us better understand why
certain things might not work.  As Neal said in the past, I think it is
important for us to give this a fair shot and then learn from our mistakes.

>As far the URI and case question, I think Dave
>Waltermire explained clearly that the tools can do
>a lot of the heavy lifting.  

ok

Thanks
Drew

