Andrew Buttner
All,
Version 2.0 of the CPE Specification was released on September 14th, 2008. At that time the community wanted to see CPE go through a stabilization period and have the community attempt to use the specification in order to get a better feel for future direction. The past year has seen a lot of conversation within the community about possible direction with some different ideas about the best future path to take. I wanted to start discussion on the future vision for CPE. In the very near term we have a new minor release (2.2) scheduled to be official on March 11. There is also a huge push currently ongoing to clean-up the Official CPE Dictionary. But where do we go after that? There is a lot in this email, for that I apologize. Hopefully some of these points can sparks some discussion as your views will help us better understand where CPE needs to go. Questions below: - What are we enumerating? - Is software inventory THE target technical use case? - Should the CPE Language be removed? - What should we do with CPE Matching? - Should we keep the URI? ------------------------- By name CPE is an enumeration. Probably the biggest question to be answered is what are we enumerating? CPE currently is about enumerating platform types, but this has proved to be a very broad term, and CPE has struggled to address what a platform type is. More on this in a second. Based on the research accomplished this past year regarding technical use cases, one option would be for CPE to focus on the software inventory technical use case. This seems to be the single use case that is shared across all members of the CPE Community. By narrowing our focus, we can hopefully deliver a solution that works for those users and not get bogged down trying to support fringe cases. Agree? The software inventory technical use case calls for enumerating platform types based on the underlying software products (either operating systems or applications). What is a software product? 
This could be defined using the following characteristics: * A user can download or buy it. * There is a vendor/organization that produces it. * An enterprise IT administrator can push it out over the enterprise network and install it into their environment. * It is (or can be) recorded by an asset management tool. In other words, every CPE Name should have at its root a software product. CPE would not try to name web pages, code libraries, functional types, etc. These areas are still important and we as a community need to address them. The suggestion however is to address them with their own enumerations and enable CPE to focus on its core mission. A movement toward multiple enumerations brings to light the need for a good expression language to tie everything together and make more complex statements. This in a way relates to the goal of the CPE Language. The CPE Language is currently under-used (if at all) and really goes against the idea of simplifying CPE. Should this be removed from CPE and stood up on its own or merged into an existing initiative? Thoughts? As we address the questions above, CPE might need to evolve to meet the technical challenges encountered and to try and solve the issues that have been experienced in version 2. Some of the ideas that have been brought up in the past: - don't make any major changes, even with its issues CPE is working well enough, focus on some minor tweaking - the URI is a major problem as the terms used are not permanent (e.g. product name changes) and are not consistent (e.g. 'windows_2003_server' and 'window_server_2003'), thus we should move away from the URI and switch to a numerical id - matching is the root of CPE's issues (e.g. the version component) and need to either be removed or completely rewritten (can we leverage an ontology?) 
- CPE should not try to be an enumeration but rather should be an expression enabling a user to talk about some combination of vendor, product, and version related to a target What is your reaction to the ideas above? Are there other ideas that need to be considered? Over the next few weeks I will be putting together a proposal for where to go with CPE and what changes should be considered, but I need your input to make sure that Version 3 is a long-term success. I thank you in advance for help you can provide on steering this ship. Thanks Drew --------- Andrew Buttner The MITRE Corporation [hidden email] 781-271-3515 |
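To make the URI consistency concern above concrete, here is a minimal sketch of splitting a CPE 2.x URI into its colon-separated components and comparing two spellings of one product. The part value "o" and the vendor "microsoft" are assumptions added for illustration; only the two product strings come from the message. Percent-encoding and the rest of the specification's rules are deliberately ignored.

```python
from typing import NamedTuple


class CpeName(NamedTuple):
    # Component order of a CPE 2.x URI:
    # cpe:/part:vendor:product:version:update:edition:language
    part: str = ""
    vendor: str = ""
    product: str = ""
    version: str = ""
    update: str = ""
    edition: str = ""
    language: str = ""


def parse_cpe_uri(uri: str) -> CpeName:
    """Split a CPE URI into named components; missing trailing ones stay empty."""
    prefix = "cpe:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a CPE URI: {uri!r}")
    components = uri[len(prefix):].split(":")
    if len(components) > len(CpeName._fields):
        raise ValueError(f"too many components: {uri!r}")
    return CpeName(*components)


# Illustrative names only: part and vendor are assumed, product strings are
# the two spellings quoted in the message.
a = parse_cpe_uri("cpe:/o:microsoft:windows_2003_server")
b = parse_cpe_uri("cpe:/o:microsoft:window_server_2003")

# A plain string comparison treats the two spellings as different products.
same_product = a.product == b.product
print(same_product)
```

Because the comparison is purely textual, any tool that relies on the URI alone has no way to tell that both names were meant to describe the same product.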
Wolfkiel, Joseph
I'm somewhat bummed that I haven't seen any discussion on this issue.

I'll go ahead and try to link it to earlier discussions about ontologies. Basically, I think the URI structure imposes unacceptable limits on our ability to express the names we need. I'd like to go to a tagged structure based on a more informed understanding of what a CPE is. At the end of the day, I think CPE should be about names for installable software, legal relationships between those names, and managing a body of community content that can be used to derive the consensus name for any given product--so we can build interoperable tools.

Based on my experience in the DoD over the past 2.5 years, I'm thinking CPE should break away from the URI structure and go to a tagged structure that allows users to populate just the elements they want to communicate. I'm also thinking CPE should only address installable software inventories and not try to differentiate between OS and applications. I don't think vendor is a good base for products, particularly given the open source community and the potential to have a single product distributed by multiple vendors. I think product is a better base. I'm also wondering if any type of transport (URI/XML/CSV/JSON.. whatever) really has any business prescribing how a given vendor must use the text strings, titles, and other information tracked in CPE, so I'm thinking it may not have any business being part of the standard.

I'm attaching a UML diagram of how I'm thinking an ontology for a single CPE might look. I'm still trying to determine if there's a dependency between version and update, but I'm almost completely sure there isn't a dependency between edition and version, or between edition and update. I'm also thinking we may need to have subordinate elements of version that break out major, minor, and sub-minor version information.

I also think deprecation should take place on a per-component basis.
Also, there's no guarantee of uniqueness for the text names of CPE components, so they should be assigned unique identifiers, which should be the basis for managing deprecation.

Let me know what you think. If we agree on this, we can put an "any" tag at the end of the standard CPE Component element list and other standards can expand on CPE core data by bringing in data elements that address function, family, hash, etc.

That said, the CPE forum is meant to be consensus driven, so I'll bow to the collective wisdom of contributors to the list.

Also attached is an XML schema that implements the ontology as an XML language.

Lt Col Joseph L. Wolfkiel
Director, Computer Network Defense Research & Technology (CND R&T) Program Management Office
9800 Savage Rd Ste 6767
Ft Meade, MD 20755-6767
Commercial 410-854-5401
DSN 244-5401
Fax 410-854-6700
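The tagged structure with per-component identifiers and per-component deprecation described in this message could look roughly like the sketch below. It is not the attached UML model or XML schema, which are not reproduced in this thread; every class name, field name, identifier, and sample value here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Component:
    uid: int                             # hypothetical unique identifier
    kind: str                            # e.g. "vendor", "product", "version"
    name: str                            # text name; not guaranteed to be unique
    deprecated_by: Optional[int] = None  # uid of the replacement component, if any


@dataclass
class TaggedName:
    # Only the tags the sender wants to communicate are populated.
    tags: Dict[str, int] = field(default_factory=dict)  # kind -> component uid


def resolve(registry: Dict[int, Component], uid: int) -> Component:
    """Follow the deprecation chain of a single component to its current entry."""
    seen = set()
    while registry[uid].deprecated_by is not None:
        if uid in seen:
            raise ValueError("deprecation cycle")
        seen.add(uid)
        uid = registry[uid].deprecated_by
    return registry[uid]


# Made-up registry content for illustration.
registry = {
    1: Component(1, "product", "windows_2003_server", deprecated_by=2),
    2: Component(2, "product", "windows_server_2003"),
    3: Component(3, "version", "5.2"),
}

# A name that carries only product and version tags; vendor is simply absent.
name = TaggedName(tags={"product": 1, "version": 3})
current = {kind: resolve(registry, uid).name for kind, uid in name.tags.items()}
print(current)
```

Deprecating component 1 changes how the product tag resolves without touching the version component, which is the point of managing deprecation per component rather than per full name.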
Ernest Park-2
On Thu, Mar 5, 2009 at 3:50 PM, Wolfkiel, Joseph <[hidden email]> wrote:
I'm somewhat bummed that I haven't seen any discussion on this issue.

The URI structure does provide a simple API - a known way of communication. Agreed - it is not a database, and its one-dimensional structure makes it hard to define the layered interdependencies.
It is a reasonable distinction - and any distinction allows further filtering of data. Whether everybody needs the "application" information or not, it can be output as a CPE 1.x or 2.x "string" as the result to a basic query.
Once the string is satisfied, the additional metadata tags - open, proprietary, community developed - can also be interrogated, assuming a general output XML schema or database table schema is agreed.
In this way, it is the overall schema with its complex relationships that defines CPE, not a limited, but valuable, string that gets us to narrow down the choices that resolve a more complex query.
I don't think vendor
For OSS - vendor in reality needs to be the lowest common denominator, or can be a few. In my database, a single product can be joined to multiple vendors due to changes in publishing, development, etc., over its life. I can therefore "fix" dynamic and historically evolving data to be a flat string to serve the needs of CPE-constrained reporting, while my database is aware of the complex relations that make up a name definition. This problem is not distinct to OSS, as acquisitions, bankruptcies, and lawsuits all change the ownership and the provenance of software over its life.
If a CPE string is a reporting mechanism, then it works. The database has to understand the complexity of the data, but the string output can be simplified and human friendly.
In this way, my CPE normalized output can be read by another tool that can read CPE, and it can layer additional metadata into a three dimensional output. I still think that the basic string elements are valid; as long as we agree to synchronize the highest level of the database, we can all communicate using these strings.
Additionally, these strings may not be unique, or for any given combination, there may be multiple different results. Understanding this, the query that builds the result set needs to contain additional test logic to further qualify the multiple result possibilities for the most likely match.
Why deprecate at all? I maintain old names - all names, just in case older data is floating around. I can correct the query and I still have all the permutations. I don't care if I have ten different things that resolve to apache:server (made up example), I can drill through a smaller result set of ten records easier than 50,000 and more.
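A rough sketch of the approach described above: keep every name ever seen, let many of them resolve to one flat reporting string, and track multiple vendors per product behind the scenes. The `NameStore` class, its methods, the alias names, and the second vendor are all hypothetical; only "apache:server" is taken from the message, where it is already flagged as a made-up example.

```python
from typing import Dict, List, Optional, Set


class NameStore:
    """Keeps every known name; nothing is ever deleted or deprecated."""

    def __init__(self) -> None:
        self._canonical: Dict[str, str] = {}     # any known name -> flat reporting string
        self._vendors: Dict[str, Set[str]] = {}  # reporting string -> vendors over its life

    def add(self, canonical: str, vendors: List[str], aliases: List[str]) -> None:
        self._vendors.setdefault(canonical, set()).update(vendors)
        for name in [canonical, *aliases]:
            self._canonical[name.lower()] = canonical

    def report(self, name: str) -> Optional[str]:
        """Return the flat string used for reporting, or None if the name is unknown."""
        return self._canonical.get(name.lower())

    def vendors(self, name: str) -> Set[str]:
        canonical = self.report(name)
        return self._vendors.get(canonical, set()) if canonical else set()


store = NameStore()
store.add(
    "apache:server",
    vendors=["apache", "example_distributor"],        # hypothetical vendors
    aliases=["apache:httpd", "Apache:Web_Server"],    # hypothetical older names
)

print(store.report("APACHE:HTTPD"))
print(sorted(store.vendors("apache:web_server")))
```

Older data that still carries a retired spelling resolves to the same reporting string, while the richer vendor history stays in the store rather than in the string.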
Dawn Adams
In reply to this post
by Wolfkiel, Joseph
Hi Joe,

So far this seems like a pretty good idea. Do you agree with this statement?

CPE should not try to be an enumeration but rather should be an expression enabling a user to talk about some combination of vendor, product, and version related to a target.

By this, would the target also be part of the CPE ID of a product, however the naming is resolved?

How would hardware-based products fit into the CPE standard – if at all?

I agree with your discussion of deprecation and URIs.

Dawn
Wolfkiel, Joseph
The enumeration concept has been a confusing
discussion. When I talk about "enumeration" with respect to CPE, I'm
thinking that CPE should "enumerate" all legal combinations of CPE component
names (vendor/product/version/update/edition/targetHW/targetSW/language) along
with the names. When we can fully populate all combinations, we have
"enumerated" all software names.
With respect to the above explanation, I think having the
enumeration allows us to share a common lexicon when sharing information about
names for, and linkages between, vendor, product, and version related to a
target. My thought is that providing a combination of vendor, product, and
version (or other components) would be how you would express a valid CPE
name.
Alternatively, you could express unique IDs
(i.e. alpha-numeric IDs like CPE(vend=666 prod=123 vers=658)) for
each vendor, product, and version. I'm a little ambivalent about the
numeric identifiers because they assume you actually know the names of all the
software you want to describe beforehand. When we want to use automated
discovery tools, any time a new product shows up, it leaves the tool without a
way to communicate the previously unseen product or product
component.
Within the DoD, we've seen very limited use for describing
hardware. Generally, when something is described as hardware, we're trying
to relate it back to firmware apps or firmware OSs. I'm not aware of any
existing vulnerabilities or settings contained in security guidance that are
actually targeted at physical hardware (e.g. setting switches, disconnecting
cables, installed power supplies) that can be scanned for with automated
tools. I also think going into that domain will cause any number of
problems with naming and ontological relationships.
Short answer, I would advocate for dispensing with hardware
in CPE for the time being. However, I would allow hardware names to be
used to represent the firmware installed on hardware. That's one of the
reasons I would advocate for doing away with the part type attribute, since I
can't really think of a place where it adds value, but many where it sows
confusion.
Lt Col Joseph L.
Wolfkiel From: Dawn Adams [mailto:[hidden email]] Sent: Thursday, March 05, 2009 4:21 PM To: [hidden email] Subject: Re: [CPE-DISCUSSION-LIST] *** renamed attachment *** Re: [CPE-DISCUSSION-LIST] CPE Future Vision Hi Joe, So far this seems like a pretty good
idea. Do you agree with this statement?
CPE should not try to be an enumeration but
rather should be an expression enabling a user to talk about some combination of
vendor, product, and version related to a target.
By this would the target would also be part of the CPE
ID of a product however the naming is
resolved? How would hardware based products fit into the CPE
standard – if at all? I agree with your discussion of deprecation and
URIs. Dawn -----Original Message----- * PGP Bad Signature, Signed by an unverified
key I'm somewhat bummed that I haven't seen any discussion
on this issue. I'll go ahead and try to link it to earlier discussions
about ontologies. Basically, I think the URI structure imposes
unacceptable limits on our ability to express the names we need. I'd like to
go to a tagged structure based on a more informed understanding of what a CPE
is. At the end of the day, I think CPE should be about names for installable
software, legal relationships between those names, and managing a body
of community content that can be used to derive the consensus name for any
given product--so we can build interoperable
tools. Based on my experience in the DoD over the past 2.5
years, I'm thinking CPE should break away from the URI structure and go to a
tagged structure that allows users to populate just the elements they want to
communicate. I'm also thinking CPE should only address installable
software inventories and not try to differentiate between OS and
applications. I don't think vendor is a good base for products, particularly given the open
source community and the potential to have a single product distributed
by multiple vendors. I think product is a better base. I'm also
wondering if any type of transport (URI/XML/CSV/JSON.. whatever) really has any
business prescribing how a given vendor must use the text strings, titles,
and other information tracked in CPE, so I'm thinking it may not have any
business being part of the standard. I'm attaching a UML diagram of how I'm thinking an
ontology for single CPE might look. I'm still trying to determine if
there's a dependency between version and update, but I'm almost completely sure there
isn't a dependency between edition and version, or between edition and
update. I'm also thinking we may need to have subordinate elements of
version that break out major, minor, and sub-minor version
information. I also think deprecation should take place on a
per-component basis. Also that there's no guarantee of uniqueness for the text
names of CPE components, so they should be assigned unique
identifiers, which should be the basis for managing
deprecation. Let me know what you think. If we agree on this,
we can put an "any" tag at the end of the standard CPE Component element list and
other standards can expand on cpe core data by bringing in data elements
that address function, family, hash, etc. That said, the CPE forum is meant to be consensus
driven, so I'll bow to the collective wisdom of contributors to the
list. Also attached an xml schema that implements the ontology
as an XML language. Director, Computer Network Defense Research &
Technology (CND R&T) Program Management Office Ft Commercial 410-854-5401 DSN
244-5401 Fax 410-854-6700 -----Original
Message----- From: Buttner, Drew [mailto:[hidden email]]
Sent: Wednesday, March 04, 2009 2:07
PM To:
[hidden email] Subject: [CPE-DISCUSSION-LIST] CPE Future
Vision All, Version 2.0 of the CPE Specification was released on
September 14th, 2008. At that time the community wanted to see CPE go through
a stabilization period and have the community attempt to use the
specification in order to get a better feel for future direction. The past
year has seen a lot of conversation within the community about possible
direction with some different ideas about the best future path to
take. I wanted to start discussion on the future vision for
CPE. In the very near term we have a new minor release (2.2) scheduled to be
official on March 11. There is also a huge push currently ongoing to clean-up
the Official CPE Dictionary. But where do we go after that?
There is a lot in this email, for that I apologize. Hopefully some of these
points can sparks some discussion as your views will help us better understand
where CPE needs to go. Questions below: - What are we
enumerating? - Is software inventory THE target technical use
case? - Should the CPE Language be
removed? - What should we do with CPE
Matching? - Should we keep the
URI? ------------------------- By name CPE is an enumeration. Probably the
biggest question to be answered is what are we enumerating? CPE currently is about
enumerating platform types, but this has proved to be a very broad term, and
CPE has struggled to address what a platform type is. More on this in a
second. Based on the research accomplished this past year
regarding technical use cases, one option would be for CPE to focus on the
software inventory technical use case. This seems to be the single
use case that is shared across all members of the CPE Community. By
narrowing our focus, we can hopefully deliver a solution that works for those users
and not get bogged down trying to support fringe cases.
Agree? The software inventory technical use case calls for
enumerating platform types based on the underlying software products (either
operating systems or applications). What is a software product?
This could be defined using the following
characteristics: * A user can download or buy
it. * There is a vendor/organization that produces
it. * An enterprise IT administrator can push it out over
the enterprise network and install it into their
environment. * It is (or can be) recorded by an asset management
tool. In other words, every CPE Name should have at its root a
software product. CPE would not try to name web pages, code libraries,
functional types, etc. These areas are still important and we as a community
need to address them. The suggestion however is to address them with their own
enumerations and enable CPE to focus on its core mission. A
movement toward multiple enumerations brings to light the need for a good
expression language to tie everything together and make more complex
statements. This in a way relates to the goal of the CPE Language. The CPE Language
is currently under-used (if at all) and really goes against the idea of
simplifying CPE. Should this be removed from CPE and stood up on its own or
merged into an existing initiative?
Thoughts? As we address the questions above, CPE might need to
evolve to meet the technical challenges encountered and to try and solve
the issues that have been experienced in version 2. Some of the ideas that
have been brought up in the past:
- don't make any major changes; even with its issues CPE
  is working well enough, so focus on some minor tweaking
- the URI is a major problem as the terms used are not
  permanent (e.g. product name changes) and are not consistent (e.g.
  'windows_2003_server' and 'window_server_2003'), thus we should move away from the
  URI and switch to a numerical id
- matching is the root of CPE's issues (e.g. the version
  component) and needs to either be removed or completely rewritten (can we
  leverage an ontology?)
- CPE should not try to be an enumeration but rather
  should be an expression enabling a user to talk about some combination of
  vendor, product, and version related to a target
What is your reaction to the ideas above? Are
there other ideas that need to be considered? Over the next few weeks I will
be putting together a proposal for where to go with CPE and what changes
should be considered, but I need your input to make sure that Version 3 is a
long-term success. I thank you in advance for any help you can provide on
steering this ship. Thanks Drew --------- Andrew Buttner The MITRE
Corporation 781-271-3515 |
||||||||||||||||
|
Wolfkiel, Joseph
|
In reply to this post
by Ernest Park-2
Responses in-line. ****
Lt Col Joseph L.
Wolfkiel From: Ernest Park [mailto:[hidden email]] Sent: Thursday, March 05, 2009 4:07 PM To: [hidden email] Subject: Re: [CPE-DISCUSSION-LIST] CPE Future Vision On Thu, Mar 5, 2009 at 3:50 PM, Wolfkiel, Joseph <[hidden email]>
wrote: I'm somewhat bummed that I haven't seen any discussion on this issue. The URI structure does provide a simple API - a known way of
communication.
Agreed - it is not a database, and the one-dimensional structure makes it hard to
define the layered interdependencies.
****
Good point. I suppose a URI is just a transport format, and ---if we can
do away with concepts like the "prefix property" and interpretation of
unpopulated spaces to mean something other than "unpopulated"--- it's probably
just as good as any other transport. However, I find it a little
distasteful since it violates the XML concept of self-documenting code, and it
requires a tool to parse twice: once for the XML, and again to get data out of the
URI. ****
It is a reasonable distinction - and any distinction allows further
filtering of data. Whether everybody needs the "application" information or not,
it can be output as a CPE 1.x or 2.x "string" as the result to a basic
query.
Once the string is satisfied, the additional metadata tags - open,
proprietary, community developed - can also be interrogated, assuming a general
output XML schema or database table schema is agreed.
In this way, it is the overall schema with its complex relationships that
defines CPE, not a limited, but valuable, string that gets us to narrow down the
choices that resolve a more complex query.
****
Okay, I'm not really hard over on that one. It is a good filter for
humans. It just starts getting sticky when you consider that JRE
serves as an OS, but runs on an OS and many similar relationships exist between
installable plug-ins and their applications. I just don't think there's
much machine-reasoning that can be built into the distinction between OS and
app. I'm much more comfortable with adding a tag to describe target
software architecture -- whether that be JRE, windows, OSX,
etc. ****
I don't think vendor
For OSS - vendor in reality needs to be the lowest common denominator, or
can be a few. In my database, a single product can be joined to multiple vendors
due to changes in publishing, development, etc., over its life. I can therefore
"fix" dynamic and historically evolving data to be a flat string to serve the
needs of CPE-constrained reporting, while my database is aware of the complex
relations that make up a name definition. This problem is not unique to OSS,
as acquisitions, bankruptcies, and lawsuits all change the ownership and the
provenance of software over its life.
If a CPE string is a reporting mechanism, then it works. The database has
to understand the complexity of the data, but the string output can be
simplified and human friendly.
In this way, my CPE-normalized output can be read by another tool that can
read CPE, and it can layer additional metadata into a three-dimensional
output.
I still think that the basic string elements are valid, as long as we
agree to synchronize the highest level of the database, then we all
communicate using these strings.
Additionally, these strings may not be unique, or for any given
combination, there may be multiple different results. Understanding this, the
query that builds the result set needs to contain additional test logic to
further qualify the multiple result possibilities for the most likely
match.
**** Again, I don't disagree. If we can
agree to drop the "prefix property" and just note that the URI structure should
hold vendor name in the first position, product name in the 2nd, ... then it's
just a transport mechanism. As it is now, with matching and all the added
complexity the cpe URI is difficult to deal with. I also agree that,
in a user interface, giving the ability to sort products by the different
vendors that have distributed them is a great capability. But I don't
think that modeling "vendor" as a "distributed-by"
relationship to product would prevent you from doing
that.****
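To make the "just a transport mechanism" reading of the URI concrete, here is a minimal Python sketch that splits a CPE 2.2-style URI purely by position, with no prefix-property or matching semantics attached. The field order (part, vendor, product, version, update, edition, language) follows the CPE 2.2 naming layout as I understand it, which places the part component ahead of vendor; the function name `parse_cpe_uri` and the sample name are assumptions made for illustration only.

```python
from urllib.parse import unquote

# Positional component order of a CPE 2.2-style URI (assumed from the 2.2 naming layout).
CPE_FIELDS = ("part", "vendor", "product", "version", "update", "edition", "language")


def parse_cpe_uri(uri: str) -> dict:
    """Split a CPE 2.2-style URI into named components, by position only.

    An empty component is reported as None, meaning "unpopulated" and nothing more.
    """
    prefix = "cpe:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a CPE URI: {uri!r}")
    values = uri[len(prefix):].split(":")
    if len(values) > len(CPE_FIELDS):
        raise ValueError(f"too many components in {uri!r}")
    # Pad trailing components that were simply left off the name.
    values += [""] * (len(CPE_FIELDS) - len(values))
    return {name: (unquote(value) or None) for name, value in zip(CPE_FIELDS, values)}


if __name__ == "__main__":
    # Hypothetical name, used only to show the shape of the result.
    print(parse_cpe_uri("cpe:/a:example_vendor:example_product:1.0::pro"))
```

Treating every empty slot as plain "unpopulated" is the design choice being argued for above; a tagged XML or JSON structure would carry the same seven values without the second parsing step.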
Why deprecate at all? I maintain old names - all names, just in case older
data is floating around. I can correct the query and I still have all the
permutations. I don't care if I have ten different things that resolve to
apache:server (made up example), I can drill through a smaller result set of ten
records easier than 50,000 and more.
**** I
don't equate "deprecate" with "delete." I would expect any database to
maintain old, outdated names in a deprecated status so you can retain historical
relationships. I'm just not comfortable with "same as" as a way to deal
with product names that have been changed as part of an acquisition or
other process. This assumes you may have multiple ways for users to
select the same product name.****
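The two positions above (keep every old name, but mark outdated ones as deprecated rather than "same as") can be sketched together. The following is a minimal Python illustration, not a proposed CPE data model: the `NameRecord` type, the numeric identifiers, and the `deprecated_by` field are all hypothetical. Text names are allowed to repeat, so a lookup returns a list of current records rather than a single answer.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NameRecord:
    component_id: int                     # hypothetical stable identifier
    text: str                             # human-readable name, not guaranteed unique
    deprecated_by: Optional[int] = None   # id of the record that replaces this one


def current_record(records: Dict[int, NameRecord], record_id: int) -> NameRecord:
    """Follow the deprecation chain from a record to its current replacement."""
    seen = set()
    rec = records[record_id]
    while rec.deprecated_by is not None:
        if rec.component_id in seen:
            raise ValueError("deprecation cycle detected")
        seen.add(rec.component_id)
        rec = records[rec.deprecated_by]
    return rec


def lookup(records: Dict[int, NameRecord], text: str) -> List[NameRecord]:
    """Return the current records reachable from any record whose text matches."""
    hits = {
        current_record(records, rec.component_id).component_id
        for rec in records.values()
        if rec.text == text
    }
    return [records[i] for i in sorted(hits)]


if __name__ == "__main__":
    records = {
        1: NameRecord(1, "window_server_2003", deprecated_by=2),
        2: NameRecord(2, "windows_2003_server"),
    }
    # The old spelling is still stored, and still resolves to the current record.
    print(lookup(records, "window_server_2003"))
```

Nothing is deleted here: the deprecated record stays in the table, so historical data that still uses the old spelling keeps resolving.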
|
||||||||||||||||
|
Tim Keanini
|
In reply to this post
by Wolfkiel, Joseph
If we are really talking about using an ontological approach, then I strongly recommend that we look at representing this domain in RDF/RDFS and maybe OWL, although for our purposes RDFS might suffice.
If we are just talking about tagging and adding facets to the data, then XML Schema is all we need and I'm all for using the right tool for the job. Let me make a bet that if we don't make this move now to RDF/RDFS/OWL, we will be kicking ourselves in a year or less.
Attached is your .xsd as represented in OWL-full. Again, we don't need to use OWL-full, and the beauty is that we can use only as much OWL as is needed to model the domain. If you have a tool like Protégé 4 or TopBraidComposer, you can open the .owl file I have attached.
So what? What is so special about the owl versus the xsd representation? Once in RDFS or OWL, we would not only be able to assert RDF triples but also infer them. Inference is the force multiplier, because anyone who thinks humans are able to perform all the assertions needed to continuously model this complex and changing domain is fooling themselves. Don't get me started here; let me just end with: we should be inferring vulnerabilities and higher order concepts, NOT asserting them.
What is the unique value in an ontological representation such as RDF/RDFS/OWL?
1) RDF will finally allow us to model using a graph.
2) RDFS affords us ontological modeling: features that allow us to manage type constraints, instance and class attributes, subclassof type propagation, binary and n-ary relationships, relation hierarchies, etc.
3) Beyond RDFS, we may need these OWL features: disjoint-decomposition, cardinality constraints, binary functions, and we could use all or some of OWL on an as-needed basis.
4) When it comes time to tie it all together (CPE, CWE, CCE, etc.) or just some of them, this type of federation is simple and easy to manage if we are modeled at the RDFS/OWL level.
I can go on and on about the benefits of using these higher level W3C standards for our purposes but I'll just leave it at that. Forgive me for being so passionate about this topic but I probably have more scar tissue than most on this topic.
--tk Timothy D.
Keanini Sr., CTO nCircle Network Security Office: +1 (415) 625-5939 www.ncircle.com blog.ncircle.com -----Original Message----- From: Wolfkiel, Joseph [mailto:[hidden email]] Sent: Thursday, March 05, 2009 2:50 PM To: [hidden email] Subject: Re: [CPE-DISCUSSION-LIST] CPE Future Vision I'm somewhat bummed that I haven't seen any discussion on this issue. I'll go ahead and try to link it to earlier discussions about ontologies. Basically, I think the URI structure imposes unacceptable limits on our ability to express the names we need. I'd like to go to a tagged structure based on a more informed understanding of what a CPE is. At the end of the day, I think CPE should be about names for installable software, legal relationships between those names, and managing a body of community content that can be used to derive the consensus name for any given product--so we can build interoperable tools. Based on my experience in the DoD over the past 2.5 years, I'm thinking CPE should break away from the URI structure and go to a tagged structure that allows users to populate just the elements they want to communicate. I'm also thinking CPE should only address installable software inventories and not try to differentiate between OS and applications. I don't think vendor is a good base for products, particularly given the open source community and the potential to have a single product distributed by multiple vendors. I think product is a better base. I'm also wondering if any type of transport (URI/XML/CSV/JSON.. whatever) really has any business prescribing how a given vendor must use the text strings, titles, and other information tracked in CPE, so I'm thinking it may not have any business being part of the standard. I'm attaching a UML diagram of how I'm thinking an ontology for single CPE might look. 
I'm still trying to determine if there's a dependency between version and update, but I'm almost completely sure there isn't a dependency between edition and version, or between edition and update. I'm also thinking we may need to have subordinate elements of version that break out major, minor, and sub-minor version information. I also think deprecation should take place on a per-component basis. Also that there's no guarantee of uniqueness for the text names of CPE components, so they should be assigned unique identifiers, which should be the basis for managing deprecation. Let me know what you think. If we agree on this, we can put an "any" tag at the end of the standard CPE Component element list and other standards can expand on cpe core data by bringing in data elements that address function, family, hash, etc. That said, the CPE forum is meant to be consensus driven, so I'll bow to the collective wisdom of contributors to the list. Also attached an xml schema that implements the ontology as an XML language. Lt Col Joseph L. Wolfkiel Director, Computer Network Defense Research & Technology (CND R&T) Program Management Office 9800 Savage Rd Ste 6767 Ft Meade, MD 20755-6767 Commercial 410-854-5401 DSN 244-5401 Fax 410-854-6700
-----Original Message----- From: Buttner, Drew [mailto:[hidden email]] Sent: Wednesday, March 04, 2009 2:07 PM To: [hidden email] Subject: [CPE-DISCUSSION-LIST] CPE Future Vision |
||||||||||||||||
|
Wolfkiel, Joseph
|
I'd like some feedback on this.
In general, I consider the ontology discussion a human planning stage to determine exactly what we want to represent in XML, JSON, CSV, URI, or whatever transport we want to use. UML Class diagrams or E-R diagrams are great ways of representing ontological relationships that humans can understand, debate, and agree on, and that are then easily implemented in XML schema or relational databases.
OWL and RDF are intended (in my understanding, and consistent with your explanation) to support machine understanding of data relationships in an attempt to allow machines to "reason" and make "inferences" with the data.
I'm under the impression that most of the vendors and consumers in the CPE market space aren't planning to do anything with the ontological discussions we're having other than to ensure the data structures they/we build are able to represent the data properly (i.e. by developing appropriate XML schemas or database table structures). That's the intent of my internal developers.
However, if there is a large demand in the CPE community to express CPE ontological "knowledge" in RDF and OWL so it is machine-consumable, then by all means let's go there. Of course, not being a machine, I'll still want to see it in UML class diagrams or E-R diagrams.
The request I'd like to make, then, is: "If you're a vendor or consumer of CPE data and you plan to, or would like to, use machine-consumable ontological data in the form of RDF/RDFS/OWL, please share that information with the list." Of course, if I don't understand the use/value of the RDF/RDFS/OWL or the way different vendors/consumers would use it, I would like to have that information too.
Lt Col Joseph L.
Wolfkiel Director, Computer Network Defense Research & Technology (CND R&T) Program Management Office 9800 Savage Rd Ste 6767 Ft Meade, MD 20755-6767 Commercial 410-854-5401 DSN 244-5401 Fax 410-854-6700
-----Original Message----- From: Tim Keanini [mailto:[hidden email]] Sent: Thursday, March 05, 2009 6:21 PM To: [hidden email] Subject: Re: [CPE-DISCUSSION-LIST] CPE Future Vision |
||||||||||||||||
|
Harold Booth-2
|
I apologize for the length of this response but please bear with me.
I feel that a discussion of what CPE 3.0 would look like, without a full discussion of the use cases we wish to support as a community, is perhaps premature. Drew's initial message on this thread touched on what should be the primary technical use cases and I would like to add my thoughts on this topic.
The first use case is the ability to unambiguously identify a product. In the most common context this would be the ability to identify a software product which is installed on a system, or to use this unambiguous identification as a means to communicate about this product either between a human and a machine or between two machines. To be clear, by unambiguous identification I mean that the name I provide cannot be used to identify another product. I don't think it matters whether multiple names could be used to unambiguously identify a product; I believe it only matters that each name identifies that product and that product alone. If we wish, we can allow co-existence of multiple names, mark one name as preferred, or mark all but one name as deprecated. I think it does not matter: as long as everyone ends up with the same semantic product, the rest is syntax. I believe this use case is equivalent to the software inventory technical use case identified in Drew's message.
The second use case is the ability to associate product information with other entities (CVE, CCE, Checklists/XCCDF, other CPEs, etc.). The type of product information we wish to associate with other entities is not necessarily limited in scope. By this I mean that the specification should not attempt to bound what product information (domains) can be used, or even what the range of values for a domain can be. A specification could (and probably should) provide a means for software to discover what the current domains and ranges are.
Examples of the product information would include the data currently encoded in the components of the current CPE specification (part, vendor, product, version, update, edition, and language), but would be expanded to include other areas of interest. Additional examples of product information would be category type (product "is a") information like "is a" webserver, "is a" database, or "is an" operating system. For products which bundle or include other software (like operating systems), another example is which products are distributed with the primary product. Finally, with respect to versions and updates, there is the ability to make a statement about version/update ranges of software. The types of statements that could be made would be:
- This checklist applies to all webservers
- This vulnerability applies to FooProd versions 3.4 to 5.2
- The product Bar 3.3 distributes Foo 2.2.1
- This vulnerability applies to all versions of BarProd prior to 5.4.5
- This vulnerability applies to any software which distributes Foo 2.24
- This vulnerability applies to Foo 1.4 running on Bar 3.5 and Bar 4.2 update 4
A third use case joins the first and second use cases. Assume a set of entities has been tagged with product information; then, given an unambiguous product name, determine what entities from the tagged set are applicable to the given product.
A fourth use case would be something like product discovery. Some pieces of information about a piece of software have been determined but not enough to make a conclusive identification. This information is processed against the current set of product information and a set of matching products is identified, perhaps with the goal of conducting further tests to make a positive identification.
I can also see the third and fourth use cases combined to return all of the possible entities which apply given some set of product information.
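The version-range statements above ("versions 3.4 to 5.2", "all versions prior to 5.4.5") can be sketched as a small Python check. This is only an illustration under a strong assumption: versions are purely numeric and dot-separated. Real product versions often are not, which is part of why the version component is hard to match. The names `version_key` and `in_range` are hypothetical.

```python
from typing import Optional, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '3.4.0' into (3, 4) so versions compare numerically, not as text."""
    parts = [int(p) for p in version.split(".")]
    # Drop trailing zeros so that '3.4' and '3.4.0' compare as equal.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def in_range(version: str,
             low: Optional[str] = None,
             high: Optional[str] = None,
             high_inclusive: bool = True) -> bool:
    """Check whether a version falls inside an optional lower and upper bound."""
    v = version_key(version)
    if low is not None and v < version_key(low):
        return False
    if high is not None:
        h = version_key(high)
        if v > h or (v == h and not high_inclusive):
            return False
    return True


if __name__ == "__main__":
    # "applies to FooProd versions 3.4 to 5.2"
    print(in_range("4.0", low="3.4", high="5.2"))
    # "applies to all versions of BarProd prior to 5.4.5"
    print(in_range("5.4.5", high="5.4.5", high_inclusive=False))
```

Comparing integer tuples rather than strings matters: as text, "3.10" sorts before "3.4", which would wrongly place it outside the first range.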
These are just the use cases I am currently dealing with and there are others I have seen mentioned on this list that I have not touched on. To respond to Drew's comment about the CPE Language, the NVD is using it along with the matching algorithm to associate CPEs to CVEs as a means to determine applicability in order to satisfy the second and third use cases I identified above. An issue we have experienced, though, is that the names of the products must be normalized correctly in order for the matching algorithm to work. I believe decoupling the matching algorithm from the name of the product would alleviate the pain of coming up with a normalized name and allow for a wider range of matching possibilities to better satisfy the second use case. As for a solution to all of my use cases I tend to believe at the moment that a CPE with a numerical ID with matching handled through an ontological solution (RDFS/OWL) would be the best way forward, but I say that with only my use cases in mind. At NIST we have been exploring creating an onotology for CPE data that would allow us to better support the use cases described above. Our current modeling efforts have been focused on RDFS because we have not yet found the need for the additional power (and overhead) of OWL. Based on our current efforts we feel that the first use case can be trivially satisifed without the use of any technological solution by the use of a plain identifier (i.e. CPE-0001). An ontological model could just use this identifier as part of a URI uniquely identifying the product. The URI in the ontological model would not be the CPE name. The compositional nature of RDF/RDFS/OWL provides the capabilities to satisfy the second use case of associating product information with other entities. These entities can be modeled outside the scope of CPE, but a CPE onotoloy would describe how to make these associations. 
This would allow us to focus all of our attention on our problem domain of capturing all information related to a unique product while allowing others to focus on their domains, such as vulnerabilities and checklists. As TK has mentioned in a previous message, the open world assumption of RDF/RDFS allows for the expansion of the model in a way that does not change the underlying schema of the data, since all data is represented in RDF triples.

The three main benefits we see with using an ontological model are:

1. The inferencing capabilities, which would allow us to correctly return that Bar 3.4 is affected by Vulnerability CVE-2010-0003 given:
   - Bar 3.4 distributes Foo 2.24
   - Vulnerability CVE-2010-0003 applies to any software which distributes Foo 2.24
2. The schemaless design, allowing for the addition of new relationships over time without necessitating database schema changes.
3. The ability to query multiple data sets across a federated network, since all data would be represented in the RDF triple format.

These are our current thoughts regarding CPE; we would appreciate any comments or suggestions.

Thanks,
-Harold
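The inference in benefit 1 can be sketched in a few lines of Python. This is only an illustration using plain tuples in place of RDF triples; the predicate name `distributes`, the `rules` mapping, and the function `affected_products` are assumptions made for the example, not part of any CPE or NVD data model.

```python
# Facts stored as (subject, predicate, object) triples; names are illustrative.
facts = {
    ("Bar 3.4", "distributes", "Foo 2.24"),
    ("Bar 3.3", "distributes", "Foo 2.2.1"),
}

# "CVE-2010-0003 applies to any software which distributes Foo 2.24",
# modelled here as a mapping from vulnerability to the distributed component.
rules = {"CVE-2010-0003": "Foo 2.24"}


def affected_products(cve, facts, rules):
    """Return the products that distribute the component named by the rule."""
    component = rules[cve]
    return sorted(
        subject
        for (subject, predicate, obj) in facts
        if predicate == "distributes" and obj == component
    )


print(affected_products("CVE-2010-0003", facts, rules))
```

A real RDFS/OWL reasoner would derive this from the model rather than from a hand-written function; the sketch only shows the shape of the conclusion.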
||||||||||||||||
|
Tim Keanini
|
In reply to this post
by Andrew Buttner
>>In general, I consider the ontology discussion a human planning stage to determine exactly what we want to represent in XML, JSON, CSV, URI, or whatever transport we want to use. UML Class diagrams or E-R diagrams are great ways of representing ontological relationships that humans can understand, debate, and agree on, then easily implementable in XML schema or Relational Databases.
>>OWL and RDF are intended (in my understanding, and consistent with your explanation) to support machine understanding of data relationships in an attempt to allow machines to "reason" and make "inferences" with the data.
>>I'm under the impression that most of the vendors and consumers in the CPE market space aren't planning to do anything with the ontological discussions we're having other than to ensure the data structures they/we build are able to represent the data properly (i.e. by developing appropriate XML schemas or database table structures). That's the intent of my internal developers.

Unfortunately the term ontology is “semantically” overloaded and this clarification is helpful. I understand now your ‘human planning stage’ perspective, so thank you.

Our design principle at this standards level is to ensure as late a binding as possible. The attraction of a core representation in RDF/RDFS/OWL is that the source material allows for “as late” a transformation (late binding) as possible to a form that is appropriate for consumption/production by humans and/or machines.

To your impressions of vendors and consumers in this CPE market space: the unfortunate situation is that customers care about multi-vendor interoperability much more than vendors do, yet they have no means to articulate this in a technical manner; vendors are biased toward lock-in and technical differentiation and for the most part have not seen multi-vendor interoperability as a priority for which to commit resources. Sure, you can say that customers can make their demands with their wallets, but remember, vendors are driven by market dynamics and not individual opportunities; at the very least, it is a slow and painful process. This is a long way of saying that waiting for vendors to lead this process is going to be painful, and translating consumers' needs to RDF/RDFS/OWL is nowhere near obvious.

>>However, if there is a large demand in the CPE community to express CPE ontological "knowledge" in RDF and OWL so it is machine-consumable, then by all means let's go there.
>>Of course, not being a machine, I'll still want to see it in UML class diagrams or E-R diagrams.

The awkwardness of this discussion is in the fact that at the surface, CPE is all about producing a name or a unique identifier for the thing named. Just give me the damn name and let me be on my way. :-)

Through all the fits and starts of CPE, I think we can all agree that this is necessary but not sufficient.

Not only do we need globally unique identifiers for the things being named, but we also need the same formalism for the relationships between the things being named. The ‘R’ in RDF is for resource; before we go about inventing another way to describe resources, we must consider RDF (and thus RDFS and OWL).

Please note that this discussion is about the content capabilities of the formalism we choose (XML, RDF, RDFS and OWL). Even after we choose this, there is still hard work to be done (i.e., we can still screw up).

I’m asking that we move up the semantic stack slowly and carefully so that we can have more firepower to deal with problems we face today and ones that we will face in the future.

The requirement is that it be in a form that serves both humans and computers; we cannot compromise one for the other. My personal feeling is that if we make this investment now, we will be creating many more capability cases for the future. But to do this, we need to lead.

>>The request I'd like to make then, is: "If you're a vendor or consumer of CPE data and you plan to, or would like to use machine-consumable ontological data in the form of RDF/RDFS/OWL please share that information with the list."
>>Of course, if I don't understand the use/value of the RDF/RDFS/OWL or the way different vendors/consumers would use it, I would like to have that information too.

The answer to the value-prop question for a consumer of RDF/RDFS/OWL is going to be hard to understand unless the explanation includes the entire value chain. I'll take a shot at this from a few perspectives.

To the market (many consumers) in general, value can materialize as:

- Multi-vendor interoperability beyond that of syntax.
- Richer Service Oriented Descriptors

To the content author, value can materialize as:

- More precision in modeling the objects AND their relationships
- Less dependency on a _single_ authoritative structure (federation is built in)

To the vendor community, it can materialize as:

- Multi-vendor interoperability means less friction for consolidation (and lord knows we need to consolidate!)
- More innovation with less of a barrier to entry and new markets

I’ve said too much already so I’ll stop. Thanks for the discussion.

--tk

Timothy D. Keanini Sr., CTO nCircle Network Security
||||||||||||||||
|
Andrew Buttner
|
In reply to this post
by Harold Booth-2
>-----Original Message-----
>From: Harold Booth [mailto:[hidden email]]
>Sent: Friday, March 06, 2009 10:18 AM

>The first use case is the ability to unambiguously identify a product. In the most common context this would be the ability to identify a software product which is installed on a system. Or to use this unambiguous identification as a means to communicate about this product either between a human and a machine or between two machines. To be clear, by unambiguous identification I mean that the name I provide cannot be used to identify another product.

Right on. I think this aligns with what we have heard and have termed the software inventory technical use case. In other words ... a product vendor uses CPE Names to tag data elements within their product's data model.

>The second use case is the ability to associate product information with other entities (CVE, CCE, Checklists/XCCDF, Other CPEs, etc...).

I agree that this is a valid and needed use case within the community, but is this outside the scope of CPE? Should this be left up to some other initiative that enables a user to leverage enumerated entities and build a complex description of something? Should CPE focus on the task of enumerating the different software products and avoid the task of associating those entities with other enumerations? For example, the statement "Windows XP has vulnerability CVE-1234" is not something that CPE should maintain, right?

>A third use case joins the first and second use cases. Assume a set of entities has been tagged with product information; then, given an unambiguous product name, determine which entities from the tagged set are applicable to the given product.

This is the matching use case, right? In other words, given two CPE Names, we want to see if one represents a software product that is a subset of the other. Do others agree that this is something that CPE wants to continue to support? Maybe the better question is how should CPE support this? Our discussion on ontologies is a possible direction for CPE in this area.

>A fourth use case would be something like product discovery. Some pieces of information about a piece of software have been determined, but not enough to make a conclusive identification. This information is processed against the current set of product information and a set of matching products is identified, perhaps with the goal of conducting further tests to make a positive identification.

Again, our ontology discussion could help here. I think from the rest of your email that you agree with this.

>As for a solution to all of my use cases I tend to believe at the moment that a CPE with a numerical ID with matching handled through an ontological solution (RDFS/OWL) would be the best way forward, but I say that with only my use cases in mind.

I'm interested to hear counters to this idea, especially the move away from the URI and toward a numerical ID. Does pushing the information needed for matching into RDF make things easier or harder for users of CPE? I will try to explore this concept some more and see if it can help solve some of the current unresolved issues with CPE.

Thanks
Drew
||||||||||||||||
|
Ernest Park-2
|
On Mon, Mar 9, 2009 at 8:20 AM, Buttner, Drew <[hidden email]> wrote:
I believe that this cannot happen in a name. A single product is an amalgam of other products. With the prevalence of open source, this is increasing rapidly. There are solutions that will identify and inventory what is discovered within an otherwise unknown set of software.
The trick is to have output formatted in a way that is useful to other applications. CPE output would allow additional reporting and policy creation with access only to the data output in CPE format.
What??? CPE is one of the main data sources for SCAP and it is the one unifying element around which the other data elements tie together. Without reliable names associated to additional information - OVAL, CCE, CVE, XCCDF, etc, then SCAP is just a wish list.
|
||||||||||||||||
|
Andrew Buttner
|
In reply to this post
by Tim Keanini
Response to the following below ...
>The awkwardness of this discussion is in the fact that at the surface, CPE is all about producing a name or a unique identifier for the thing named. Just give me the damn name and let me be on my way. :-)
>
>Through all the fits and starts of CPE, I think we can all agree that this is necessary but not sufficient.
>
>Not only do we need globally unique identifiers for the things being named, but we also need the same formalism for the relationships between the things being named. The 'R' in RDF is for resource; before we go about inventing another way to describe resources, we must consider RDF (and thus RDFS and OWL).
>
>Please note that this discussion is about the content capabilities of the formalism we choose (XML, RDF, RDFS and OWL). Even after we choose this, there is still hard work to be done (i.e., we can still screw up).
>
>I'm asking that we move up the semantic stack slowly and carefully so that we can have more firepower to deal with problems we face today and ones that we will face in the future.
>
>The requirement is that it be in a form that serves both humans and computers; we cannot compromise one for the other. My personal feeling is that if we make this investment now, we will be creating many more capability cases for the future. But to do this, we need to lead.

I think the above very well sums up the current state of CPE. Basically we have realized that, even considering one shared use case, we need more than just a common name, and we need more than what the current matching algorithm can provide.

We have on the table the following idea to consider:

- create vocabularies (numerical based) to enumerate the 'things'
- use RDF Schema to describe the types of info associated with 'things'
- use RDF instance files to represent actual info about the 'things'
- create different views (UML, etc) into the 'things' to support users

We need to hear from others in the community if they think this move away from a URI might have a detrimental effect on their use of CPE, and why. Or might this move only have a positive impact on all of our work?

In the meantime, I will try to put together a more thorough example of the RDF idea so that everyone can take a closer look at how it might be used.

Thanks
Drew
||||||||||||||||
|
Andrew Buttner
|
In reply to this post
by Ernest Park-2
>>>The second use case is the ability to associate product information with other entities (CVE, CCE, Checklists/XCCDF, Other CPEs, etc...).

>>I agree that this is a valid and needed use case within the community, but is this outside the scope of CPE? Should this be left up to some other initiative that enables a user to leverage enumerated entities and build a complex description of something? Should CPE focus on the task of enumerating the different software products and avoid the task of associating those entities with other enumerations? For example, the statement "Windows XP has vulnerability CVE-1234" is not something that CPE should maintain, right?

>What??? CPE is one of the main data sources for SCAP and it is the one unifying element around which the other data elements tie together. Without reliable names associated to additional information - OVAL, CCE, CVE, XCCDF, etc, then SCAP is just a wish list.

Sorry, what I meant was that CPE shouldn't try to create all the different associations. CPE SHOULD make the associations possible by correctly enumerating the platform (or software product) space. CPE needs to focus on the enumeration and let another initiative stand up the cross associations between all the different standards. Agree?

Thanks
Drew
||||||||||||||||
|
Tim Keanini
|
In reply to this post
by Andrew Buttner
[Harold Booth]
>>The second use case is the ability to associate product information with other entities (CVE, CCE, Checklists/XCCDF, Other CPEs, etc...).

[Drew wrote this in response to Harold Booth's post:]
>I agree that this is a valid and needed use case within the community, but is this outside the scope of CPE? Should this be left up to some other initiative that enables a user to leverage enumerated entities and build a complex description of something? Should CPE focus on the task of enumerating the different software products and avoid the task of associating those entities with other enumerations? For example, the statement "Windows XP has vulnerability CVE-1234" is not something that CPE should maintain, right?

This content interoperability goal must be valid not only for technology external to SCAP but also within SCAP. We cannot afford to be building more and more silos without the means to put Humpty Dumpty back together again.

This is why choosing the presentation layer is so important. For instance, the choice of XML Schema allowed us to not have to be concerned with serialization and data-types; CPE did not solve this, XML Schema did. We are now at the point where our needs have gone beyond that which XML Schema can provide, and instead of inventing this on our own, RDF is our strongest candidate for the job (and one of the ways to encapsulate RDF is XML).

While the statement "Windows XP has vulnerability CVE-1234" is not appropriate within the domain ontology of CPE, these RDF statements are valid:

    cpe:Windows_XP rdf:type rdfs:Class .
    cpe:Operating_System rdf:type rdfs:Class .
    cpe:Windows_XP rdfs:subClassOf cpe:Operating_System .

This might not be how we would choose to model it, but this is an example of 'type propagation', because now if we asserted the RDF triple:

    cpe:WinXP-SP2 rdf:type cpe:Windows_XP .

we could then infer the RDF triple:

    cpe:WinXP-SP2 rdf:type cpe:Operating_System .

Now in the CVE domain ontology, you can make statements that tie vulnerabilities to applications or shared libraries or the direct object of that weakness. CWE's domain ontology can then establish relationships with CVEs, and so on.

I'm not trying to shove this W3C technology down anyone's throat. Given that the SCAP community had no problem choosing W3C's XML Schema, it would only make sense that we co-evolve with them. To this degree, the subject line should really read: SCAP Future Vision. :-)

--tk
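The 'type propagation' TK describes in this message can be sketched without an RDF library. The function below is a hand-rolled illustration of that one RDFS entailment pattern (an instance of a class is also an instance of its superclasses), using plain string tuples for the triples from the message; `infer_types` is a hypothetical name, and a real deployment would presumably rely on an RDFS reasoner instead.

```python
def infer_types(triples):
    """Apply type propagation until no new triples appear:
    if (x rdf:type C) and (C rdfs:subClassOf D), then (x rdf:type D)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_of = {(s, o) for (s, p, o) in inferred if p == "rdfs:subClassOf"}
        type_of = {(s, o) for (s, p, o) in inferred if p == "rdf:type"}
        for instance, cls in type_of:
            for sub, sup in subclass_of:
                new_triple = (instance, "rdf:type", sup)
                if cls == sub and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


# The statements from the message, written as (subject, predicate, object).
asserted = {
    ("cpe:Windows_XP", "rdf:type", "rdfs:Class"),
    ("cpe:Operating_System", "rdf:type", "rdfs:Class"),
    ("cpe:Windows_XP", "rdfs:subClassOf", "cpe:Operating_System"),
    ("cpe:WinXP-SP2", "rdf:type", "cpe:Windows_XP"),
}

closure = infer_types(asserted)
print(("cpe:WinXP-SP2", "rdf:type", "cpe:Operating_System") in closure)
```

The loop repeats so that longer subclass chains are followed as well, although the example above only needs a single step.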
||||||||||||||||
|
Harold Booth-2
|
In reply to this post
by Andrew Buttner
Responses are inline.
> -----Original Message-----
> From: Buttner, Drew [mailto:[hidden email]]
> Sent: Monday, March 09, 2009 8:21 AM
> To: [hidden email]
> Subject: Re: [CPE-DISCUSSION-LIST] CPE Future Vision
>
> >-----Original Message-----
> >From: Harold Booth [mailto:[hidden email]]
> >Sent: Friday, March 06, 2009 10:18 AM

HB) The first use case is the ability to unambiguously identify a product. In the most common context this would be the ability to identify a software product which is installed on a system. Or to use this unambiguous identification as a means to communicate about this product either between a human and a machine or between two machines. To be clear, by unambiguous identification I mean that the name I provide cannot be used to identify another product.

DREW) Right on. I think this aligns with what we have heard and have termed the software inventory technical use case. In other words ... a product vendor uses CPE Names to tag data elements within their product's data model.

What do you mean by "tag data elements within their product's data model"? I am not sure what you are saying here; could you please explain further?

HB) The second use case is the ability to associate product information with other entities (CVE, CCE, Checklists/XCCDF, Other CPEs, etc...).

DREW) I agree that this is a valid and needed use case within the community, but is this outside the scope of CPE? Should this be left up to some other initiative that enables a user to leverage enumerated entities and build a complex description of something? Should CPE focus on the task of enumerating the different software products and avoid the task of associating those entities with other enumerations? For example, the statement "Windows XP has vulnerability CVE-1234" is not something that CPE should maintain, right?
I would agree that CPE should not maintain information about what entities relate to what CPEs. What I do expect from CPE, or a related specification, is an agreed upon way to describe a relationship between a CPE and another entity. It would be advantageous for the description of this relationship to be expressed using a common model that can be shared across all entity types. This is one half of the matching use case. Information is associated with an entity and that information is later used to retrieve the entity based upon the matching scheme. A standardized means of representing this data would facilitate sharing of these relationships amongst the community without resorting to one-off or proprietary solutions. HB)A third use case joins the first and second use cases. Assume a set of HB)entities have been tagged with product information and then given an HB)unambiguous product name to determine what entities from the tagged set HB)are applicable to the given product. DREW)This is the matching use case, right? In other words, given DREW)two CPE Names, we want to see if one represents a software DREW)product that is a subset of the other. Do others agree that DREW)this is something that CPE want to continue to support? DREW)Maybe the better question is how should CPE support this? DREW)Our discussion on ontologies is a possible direction for CPE DREW)in this area. Your scope for the matching use case is too narrow here. I am not just interested in whether a product is a subset of another. I am interested in querying the associations between CPE data and other entity types. The entities could also include other CPEs. So what would this look like? Assume there is a repository of CPE data that contains meta-data about CPE names. Assume also that I have a collection of other entity types, in this case, CVE, CCE, and XCCDF which have defined relationships to CPE data. 
Here are the questions I need to answer:

- Given a CPE name, what Vulnerabilities, Configurations or Checklists apply to this CPE name?
- Given some product meta-data, what CPEs match?
- Given some product meta-data, what Vulnerabilities, Configurations or Checklists apply?
- Given a CPE name, what other CPE names distribute this product?
- Given a CPE name, what other CPE names does this product distribute?
- And on and on...

A query is bounded only by what data is available. To reiterate from my previous post, product meta-data could include the basic information of vendor, product, version, update, edition, and language, but could also include product md5/sha hashes, categorizations, SKUs, target platform, etc... Basically anything that describes a product in some way.
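To make the questions in this message concrete, here is a small in-memory Python sketch. Everything in it is hypothetical: the identifiers follow the plain "CPE-0001" form suggested earlier in the thread, the meta-data fields and the `checklist-webservers` entity are invented for the example, and a real repository would more likely be an RDF store or database than a dictionary.

```python
# Hypothetical repository of product meta-data keyed by plain CPE identifiers.
products = {
    "CPE-0001": {"vendor": "foo", "product": "Foo", "version": "2.24",
                 "category": "webserver"},
    "CPE-0002": {"vendor": "bar", "product": "Bar", "version": "3.4",
                 "category": "operating system"},
}

# (distributor, distributed) pairs: here Bar 3.4 distributes Foo 2.24.
distributes = {("CPE-0002", "CPE-0001")}

# (entity, CPE) associations, maintained outside of CPE itself.
applies_to = {
    ("CVE-2010-0003", "CPE-0001"),
    ("checklist-webservers", "CPE-0001"),
}


def match_products(**metadata):
    """Given some product meta-data, what CPEs match?"""
    return sorted(
        cpe for cpe, data in products.items()
        if all(data.get(key) == value for key, value in metadata.items())
    )


def entities_for(cpe):
    """Given a CPE name, what vulnerabilities or checklists apply?"""
    return sorted(entity for entity, target in applies_to if target == cpe)


def entities_for_metadata(**metadata):
    """Given some product meta-data, what vulnerabilities or checklists apply?"""
    return sorted({e for cpe in match_products(**metadata) for e in entities_for(cpe)})


def distributors_of(cpe):
    """Given a CPE name, what other CPE names distribute this product?"""
    return sorted(d for d, component in distributes if component == cpe)


def distributed_by(cpe):
    """Given a CPE name, what other CPE names does this product distribute?"""
    return sorted(component for d, component in distributes if d == cpe)


print(match_products(category="webserver"))
print(entities_for("CPE-0001"))
print(distributors_of("CPE-0001"))
```

Each function corresponds to one of the listed questions; adding a new kind of meta-data only means adding a key to the product records, which mirrors the "bounded only by what data is available" point.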
||||||||||||||||
|
Harold Booth-2
|
In reply to this post
by Tim Keanini
To summarize my comments in this thread, I would like the following from CPE or one or more other related specifications:

1. An unambiguous name to communicate about a product.
2. A standardized way to associate product meta-data information to the unambiguous name from 1.
3. A standardized way to associate meta-data information or unambiguous names with entities (e.g. CVE, CCE, Checklists/XCCDF, CPEs, etc...).
4. A standardized way to query against the product meta-data specified in 2 and 3.

The current CPE standard attempts (either explicitly or implicitly) to address all of these concerns in one way or another. The points above are addressed in the current CPE specification, respectively, as follows:

1. The existence of a CPE name; unfortunately, it is not always unambiguous.
2. The CPE name encodes some product information into it, but it is not easily extended. Associating additional information to the product is possible, but there is no standardized way to share it.
3. Done through the use of a CPE Name or CPE Language. There is no standardized way to use any additional information not already included in a CPE name.
4. This use case is handled on an ad-hoc basis through CPE matching and the CPE language, but no standardized way to perform this exists. Currently queries are limited in scope to only information contained within the name.
||||||||||||||||
|
Tim Keanini
|
I would also add that the current URI scheme is problematic when used in RDF. I can go into more detail, but the summary is that the %-encoding introduces a dangerous level of ambiguity for RDF libraries, and what was meant to be a human-readable string becomes very distorted.

Example:

cpe:/a:apache:http_server:1.3.30

becomes

<cpe:/a%3Aapache%3Ahttp_server%3A1.3.30>

--tk

-----Original Message-----
From: Harold Booth [mailto:[hidden email]]
Sent: Monday, March 09, 2009 1:25 PM
To: [hidden email]
Subject: Re: [CPE-DISCUSSION-LIST] CPE Future Vision

To summarize my comments in this thread, I would like the following from CPE or one or more other related specifications:

1. An unambiguous name to communicate about a product.
2. A standardized way to associate product meta-data information to the unambiguous name from 1.
3. A standardized way to associate meta-data information or unambiguous names with entities (e.g. CVE, CCE, Checklists/XCCDF, CPEs, etc...).
4. A standardized way to query against the product meta-data specified in 2 and 3.

The current CPE standard attempts (either explicitly or implicitly) to address all of these concerns in one way or another. The points above are addressed in the current CPE specification, respectively, as follows:

1. The existence of a CPE name; unfortunately, it is not always unambiguous.
2. The CPE name encodes some product information into it, but it is not easily extended. Associating additional information to the product is possible, but there is no standardized way to share it.
3. Done through the use of a CPE Name or CPE Language. There is no standardized way to use any additional information not already included in a CPE name.
4. This use case is handled on an ad-hoc basis through CPE matching and the CPE language, but no standardized way to perform this exists. Currently queries are limited in scope to only information contained within the name.
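The percent-encoding TK shows in this message can be reproduced with Python's standard `urllib.parse` module. This is only a sketch of one plausible reading of the ambiguity he mentions (he does not elaborate): the encoded and unencoded strings differ character by character, yet decode to the same name, so a library that compares identifiers as plain strings would treat them as two different resources.

```python
from urllib.parse import quote, unquote

name = "cpe:/a:apache:http_server:1.3.30"

# Keep the scheme separator, then percent-encode the colons in the remainder.
scheme, rest = name.split(":", 1)
encoded = scheme + ":" + quote(rest, safe="/")

print(encoded)                   # the distorted form from the message
print(encoded == name)           # the two spellings are different strings
print(unquote(encoded) == name)  # yet they decode to the same CPE name
```

Whether a given RDF library encodes the colons at all depends on that library; the call to `quote` here simply forces the behaviour described in the message.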