Drew,
I would like to bring this thread down to a more technical level with
concrete examples so we can avoid talking past each other. Please find
some example Java code at the end of my email to illustrate my
perspective.
I don't understand your stated goal below. It seems reasonable to me
that two CPE names are equivalent if their decoded components are equal.
This is the approach I have taken in the equals method in my example.
Matching would work the same way, comparing decoded components.
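To make the equivalence concrete, here is a minimal standalone sketch, separate from the full class at the end of this email. The class name DecodedEquivalenceSketch, its two helper methods, and the example vendor and product names are made up for illustration. It assumes well-formed input and does no validation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Arrays;
import java.util.Locale;

public class DecodedEquivalenceSketch {

    // Hypothetical helper: strip the "cpe:/" prefix, split on ':', then
    // percent-decode and lower-case each component.
    static String[] decodedComponents(String cpe) {
        String[] parts = cpe.substring("cpe:/".length()).split(":");
        for (int i = 0; i < parts.length; i++) {
            try {
                parts[i] = URLDecoder.decode(parts[i], "UTF-8")
                        .toLowerCase(Locale.ROOT);
            } catch (UnsupportedEncodingException e) {
                // UTF-8 is required to be present on every Java platform
                throw new IllegalStateException(e);
            }
        }
        return parts;
    }

    // Two names are treated as equivalent when their decoded components are equal.
    static boolean equivalent(String a, String b) {
        return Arrays.equals(decodedComponents(a), decodedComponents(b));
    }

    public static void main(String[] args) {
        // "%2D" decodes to "-", so both strings name the same components
        System.out.println(equivalent(
                "cpe:/a:example:foo%2Dbar:1.0",
                "cpe:/a:example:foo-bar:1.0")); // prints true
        // a different version component is a different name
        System.out.println(equivalent(
                "cpe:/a:example:foo-bar:1.0",
                "cpe:/a:example:foo-bar:2.0")); // prints false
    }
}
```

The point is only that the comparison never looks at the encoded string, so the number of possible encodings does not multiply the number of distinct names.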
I would argue that what I am suggesting is fine from an application
processing perspective. It may be harder for a human to interpret,
but CPE is a machine identifier; we are not developing it for
humans.
What do you see as the major arguments for or against this approach?
Dave
DefaultCPEName.java:
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* A default implementation of a CPEName.
*/
// TODO: add support for escaping
public class DefaultCPEName {
/** the pattern for a name */
private static final Pattern pattern =
Pattern.compile("\\A[cC][pP][eE]:/([AHOaho]?(:(?:[A-Za-z0-9\\._\\-~]|%[A-Fa-f0-9]{2})*){0,6})(?<!:)\\Z");
/** The CPEName URI scheme name */
public static final String SCHEME = "cpe";
/** Value for the first component of a hardware CPEName Name */
public static final String PART_HARDWARE = "h";
/** Value for the first component of an operating system CPEName
Name */
public static final String PART_OS = "o";
/** Value for the first component of an application CPEName Name */
public static final String PART_APPLICATION = "a";
/** object state - all of these are serializable */
private final URI uri;
/** holds a cache of the components */
private transient String[] components;
/** cached hash code used by hashCode; 0 means not yet computed */
private transient int hashCache = 0;
/**
* Create a CPEName from a string.
*
* @param s a valid CPE name string
* @throws URISyntaxException if s is not a valid cpe value
*/
public DefaultCPEName(String s) throws URISyntaxException {
this(new URI(s));
}
/**
* Creates a CPE name from a URI.
*
* @param uri a CPE URI value
* @throws NullPointerException if uri is null
* @throws URISyntaxException if uri is not a valid CPE name
*/
public DefaultCPEName(URI uri) throws URISyntaxException {
this.uri = uri;
if (uri == null) {
throw new NullPointerException("uri");
}
Matcher m = pattern.matcher(uri.toString());
if (!m.matches()) {
throw new URISyntaxException(
uri.toString(),
"Invalid CPEName");
}
}
/** {@inheritDoc} */
public URI getName() {
return uri;
}
/** {@inheritDoc} */
@Override
public String toString() {
return uri.toString();
}
/** {@inheritDoc} */
@Override
public int hashCode() {
if (hashCache == 0) {
int hash = 17;
for (String component : getComponents()) {
hash = 37 * hash + component.hashCode();
}
hashCache = hash;
}
return hashCache;
}
/**
* Return true if this CPEName equals another. Note that
* this is different from matching!
*
* {@inheritDoc}
*
*/
@Override
public boolean equals(Object o) {
if (!(o instanceof DefaultCPEName)) {
return false;
}
DefaultCPEName that = (DefaultCPEName)o;
String[] thisComponents = getComponents();
String[] thatComponents = that.getComponents();
if (thisComponents.length != thatComponents.length) {
return false;
}
for (int i = 0;i < thisComponents.length;i++) {
if (!thisComponents[i].equals(thatComponents[i])) {
return false;
}
}
return true;
}
/** {@inheritDoc} */
public String[] getComponents() {
if (components == null) {
components =
uri.getSchemeSpecificPart().substring(1).split(":");
for (int i = 0;i < components.length;i++) {
try {
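// lower-case independently of the default locale so that
// equals and hashCode behave the same on every system
components[i] = URLDecoder.decode(components[i],
"UTF-8").toLowerCase(java.util.Locale.ROOT);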
} catch (UnsupportedEncodingException e) {
throw new RuntimeException(e);
}
}
}
return components;
}
/** {@inheritDoc} */
public boolean contains(final CPEName cpe) {
String[] myComponents = getComponents();
String[] inComponents = cpe.getComponents();
// if length(cpe) >= length(this) then we may have a match
if (inComponents.length < myComponents.length) {
return false;
}
boolean retval = false;
// check each component of cpe against this
for (int i = 0; i < myComponents.length; i++) {
// components equal, or either component is empty
if (myComponents[i].equalsIgnoreCase(inComponents[i])
|| myComponents[i].length() == 0
|| inComponents[i].length() == 0) {
retval = true;
} else {
retval = false;
break;
}
}
return retval;
}
}
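For what it is worth, a single canonical string can still be derived from the decoded components, which may go some way toward the goal of everyone writing the same characters. The following is a rough sketch only. The class name CanonicalFormSketch and its methods are hypothetical, and the choice of which characters stay unencoded is my assumption, taken from the characters the pattern above accepts literally, not from the spec.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Locale;

public class CanonicalFormSketch {

    // Assumption: these characters are written literally; every other
    // byte is percent-encoded with upper-case hex digits.
    private static final String UNRESERVED =
            "abcdefghijklmnopqrstuvwxyz0123456789._-~";

    // Decode one component, lower-case it, and re-encode it one way only.
    static String canonicalComponent(String component) {
        try {
            String decoded = URLDecoder.decode(component, "UTF-8")
                    .toLowerCase(Locale.ROOT);
            StringBuilder out = new StringBuilder();
            for (byte b : decoded.getBytes("UTF-8")) {
                char c = (char) (b & 0xFF);
                if (UNRESERVED.indexOf(c) >= 0) {
                    out.append(c);
                } else {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            }
            return out.toString();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be present on every Java platform
            throw new IllegalStateException(e);
        }
    }

    // Rebuild the whole name from canonical components. Input is assumed
    // to be well-formed; no validation is done here.
    static String canonical(String cpe) {
        String[] parts = cpe.substring("cpe:/".length()).split(":", -1);
        StringBuilder out = new StringBuilder("cpe:/");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                out.append(':');
            }
            out.append(canonicalComponent(parts[i]));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // needlessly encoded characters are written literally
        System.out.println(canonical("cpe:/a:example:foo%2Dbar:1%2e0"));
        // characters outside the unreserved set stay encoded
        System.out.println(canonical("cpe:/a:example:foo%20bar"));
    }
}
```

With something like this, equality and matching can stay on decoded components while tools that emit names all produce the same string.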
> -----Original Message-----
> From: Buttner, Drew [mailto:[hidden email]]
> Sent: Wednesday, January 30, 2008 1:37 PM
> To: [hidden email]
> Subject: Re: [CPE-DISCUSSION-LIST] Version 2.1 Release Candidate
>
> My goal is that when different people use a CPE Name for a
> given platform, that the exact same string of characters is
> used. In other words, that the given identifier is exactly
> the same. I don't think we want to allow people to use any
> number of possible encodings.
>
> To accomplish this, I'd like the spec to say that a CPE Name
> is always in percent-encoded form with only certain
> characters being percent-encoded. Can we make this wording work?
>
> Thanks
> Drew
>
>
>
> >-----Original Message-----
> >From: Gary Newman [mailto:[hidden email]]
> >Sent: Tuesday, January 29, 2008 10:06 AM
> >To: cpe-discussion-list CPE Community Forum
> >Subject: Re: [CPE-DISCUSSION-LIST] Version 2.1 Release Candidate
> >
> >Note this differs from RFCs, as they specify encoding yet describe the
> >components in unencoded form.
> >
> >URIs that are encoded differently but decode the same MUST be the same
> >CPE name. Otherwise the optional 2^N character encodings will make 2^N
> >names that decode to the same CPE name.
> >
> >> >In the spirit of this a CPE Name should always be in the encoded form,
> >>
> >> I think this will serve us best since it will allow us to have unique
> >> names. As long as we state what characters should be encoded then we
> >> should be fine. We would also want to state that for CPE, URIs that
> >> are encoded differently but decode the same (think of one that encodes
> >> every character) would be different CPE Names. This satisfies the
> >> uniqueness. (maybe they "point" to the same platform type though)
> >>
> >>
> >>
> >> >however matching should be done on a component-by-component basis in
> >> >the decoded form.
> >>
> >> agree. This basically removes the technicality mentioned above where
> >> two URIs that decode the same would be different CPE Name. At least
> >> these name would match.
> >
>