|
|
|
Valeri, David [USA]
|
While I know that the use of Microsoft Windows as an example may drag some baggage into this discussion, please focus on the questions/issues I am raising instead of the use of Microsoft Windows as an example. I chose it purely because it is a product suite that I know well and that exhibits the behavior in question within the dictionary.

In the current dictionary, a cpe-item is defined for the following Windows XP variants as well as for SP2 variants.

cpe:/o:microsoft:windows_xp
cpe:/o:microsoft:windows_xp::gold
cpe:/o:microsoft:windows_xp::gold:embedded
cpe:/o:microsoft:windows_xp::gold:media_center
cpe:/o:microsoft:windows_xp::gold:professional
cpe:/o:microsoft:windows_xp::gold:tablet_pc
cpe:/o:microsoft:windows_xp::sp1:embedded
cpe:/o:microsoft:windows_xp::sp1:media_center
cpe:/o:microsoft:windows_xp::sp1:professional
cpe:/o:microsoft:windows_xp::sp1:tablet_pc

It seems to me that cpe:/o:microsoft:windows_xp and cpe:/o:microsoft:windows_xp::gold are not tangible products that I can install onto an asset. The other variants can be installed onto an asset. For this discussion I will refer to the first two CPE Names in the above list as abstract and the latter CPE Names as concrete. I am under the impression that the abstract CPE Names are used to represent hierarchical metadata such as OVAL definitions that apply to the other CPE Names below it (I am envisioning a tree with the root node of cpe:/o:microsoft:windows_xp, an internal node of cpe:/o:microsoft:windows_xp::gold and leaf nodes of the concrete CPE Names in the list above).

Now to the questions/issues:

1) Is my hierarchical interpretation correct? The specification mentions hierarchical on page 19, but doesn't discuss hierarchy in the dictionary. ("Matching helps define the relationship between different CPE Names (or language statements) and follows the hierarchical relationship built into the naming format.")

1.a) If 1 is correct, is there a requirement that a check associated to cpe:/o:microsoft:windows_xp be applicable to the concrete CPE Names that build upon it?
1.b) Are these checks required to be future-proof? That is, if a new edition, version, language, etc. of cpe:/o:microsoft:windows_xp is released, are the checks associated to cpe:/o:microsoft:windows_xp required to detect this new variant? If not, are the checks updated or is the metadata in the dictionary updated to remove this out-of-date reference to the check? 1.c) Are there other use cases that require abstract CPE names to be in the dictionary? 2) BAH supports clients that are leveraging the CPE specification in order to represent configuration information about computing assets. In this use case, only concrete CPE Names are of value as I cannot have an asset with cpe:/o:microsoft:windows_xp::gold installed. I can only have an asset with a tangible product installed and these are represented by concrete CPE Names only. How can I parse the CPE dictionary to only extract concrete names? I think I can do it by parsing the list of cpe-items (with great difficulty). I think I can do it by parsing NIST's metadata (with less difficulty), but, I think that I only want the leaf nodes in the hierarchy. 2.a) If I do it by parsing for leaf nodes, is there ever the case where an internal node represents a concrete CPE name? For instance, assume that Microsoft released an operating system with a single edition only, we will call it cpe:/o:microsoft:windows_example::gold (again, ignore the use of a Microsoft OS here and focus on the potential issue). Now say shortly thereafter a large group of countries decides that Microsoft needs to decouple some parts of the OS so Microsoft releases cpe:/o:microsoft:windows_example::gold:eu to comply with the ruling (don't focus on cpe:/o:microsoft:windows_example::gold:eu being the right CPE Name to represent this condition, focus on the fact that a new and unanticipated variant has been released). 
At this point, I see a potential problem, cpe:/o:microsoft:windows_example::gold was originally the only variant of the OS, but now cpe:/o:microsoft:windows_example::gold:eu could exist as well. At this point, cpe:/o:microsoft:windows_example::gold would need to be deprecated, as a CPE logical expression matching cpe:/o:microsoft:windows_example::gold would also match cpe:/o:microsoft:windows_example::gold:eu. cpe:/o:microsoft:windows_example::gold can no longer be uniquely identified because of this new variant. Has a situation such as this one occurred in the dictionary and is a process in place to avoid such a situation? While I don't see this type of issue arising often with well-known or mature pieces of software, I do see it occurring with the smaller and more rapidly changing items in the dictionary. 3) Is Windows XP Home edition missing from the dictionary? I just want to make sure that cpe:/o:microsoft:windows_xp::gold is representative of an abstract CPE Name and not of Windows XP Home. If it does represent Windows XP Home, then issue 2.a has already occurred. David Valeri Booz Allen Hamilton 8281 Greensboro Dr. McLean, VA 22102 Tel: 703.377.5607 Fax: 703.902.3330 |
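The tree interpretation in question 2 can be made concrete with a short sketch. It assumes the CPE 2.2 URI layout (part, vendor, product, version, update, edition, language, separated by colons) and treats one name as the ancestor of another when its components are a strict prefix of the other's. The function names `components`, `is_ancestor`, and `leaf_names` are hypothetical, and this is only one possible reading of "leaf node", not something the dictionary itself defines.

```python
from typing import List, Tuple


def components(name: str) -> Tuple[str, ...]:
    """Split a CPE 2.2 URI such as 'cpe:/o:microsoft:windows_xp::gold'."""
    prefix = "cpe:/"
    if not name.startswith(prefix):
        raise ValueError(f"not a CPE 2.2 URI: {name!r}")
    parts = name[len(prefix):].split(":")
    # Trailing empty components add no information, so drop them.
    while parts and parts[-1] == "":
        parts.pop()
    return tuple(parts)


def is_ancestor(a: Tuple[str, ...], b: Tuple[str, ...]) -> bool:
    """True when a is a strict component-wise prefix of b."""
    return len(a) < len(b) and b[:len(a)] == a


def leaf_names(names: List[str]) -> List[str]:
    """Return the names that are not an ancestor of any other name in the list."""
    parsed = [(n, components(n)) for n in names]
    return [
        n for n, c in parsed
        if not any(is_ancestor(c, other) for _, other in parsed)
    ]
```

Under this reading, the scenario in 2.a shows up directly: a list holding only cpe:/o:microsoft:windows_example::gold reports that name as a leaf, but once cpe:/o:microsoft:windows_example::gold:eu is added, the original name becomes an internal node and drops out of the result.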
||||||||||||||||
|
Andrew Buttner
|
I will try to clear up this confusion as best I can. Please let me know if I fail miserably :)

>cpe:/o:microsoft:windows_xp
>cpe:/o:microsoft:windows_xp::gold
>cpe:/o:microsoft:windows_xp::gold:embedded
>cpe:/o:microsoft:windows_xp::gold:media_center
>cpe:/o:microsoft:windows_xp::gold:professional
>cpe:/o:microsoft:windows_xp::gold:tablet_pc
>cpe:/o:microsoft:windows_xp::sp1:embedded
>cpe:/o:microsoft:windows_xp::sp1:media_center
>cpe:/o:microsoft:windows_xp::sp1:professional
>cpe:/o:microsoft:windows_xp::sp1:tablet_pc
>
>It seems to me that cpe:/o:microsoft:windows_xp and
>cpe:/o:microsoft:windows_xp::gold are not tangible products that I can
>install onto an asset. The other variants can be installed onto an
>asset.

None of the names above represent a tangible product. In fact a CPE is not meant to accomplish this. Rather, a CPE Name represents a "platform type". For the first name in your example, this platform type would be - any system that has Windows XP installed. For the last example this would be - any system that has Windows XP SP1 tablet edition installed. Note that even this last one doesn't identify a tangible product as there are different language releases, etc.

In summary, a CPE Name identifies a Platform Type, not a specific platform.

>1) Is my hierarchical interpretation correct? The specification
>mentions hierarchical on page 19, but doesn't discuss hierarchy in the
>dictionary.
>("Matching helps define the relationship between different CPE Names (or
>language statements) and follows the hierarchical relationship built
>into the naming format.")
>
>1.a) If 1 is correct, is there a requirement that a check associated to
>cpe:/o:microsoft:windows_xp be applicable to the concrete CPE Names that
>build upon it?

determine if a given system can be grouped under a specific CPE Name. In this case, a system that returns true for the cpe:/o:microsoft:windows_xp check does not necessarily return true for the cpe:/o:microsoft:windows_xp::sp1 check.
But a system that returns true for cpe:/o:microsoft:windows_xp::sp1 would return true for cpe:/o:microsoft:windows_xp. >1.b) Are these checks required to be future-proof? That is, if a new >edition, version, language, etc. of cpe:/o:microsoft:windows_xp is >released, are the checks associated to cpe:/o:microsoft:windows_xp >required to detect this new variant? Ideally yes, although in reality a check may have to be updated. Basically, we are looking for a check that answers the question: "is this system part of the windows xp platform type?" This is not always able to be done in a future-proof way. >If not, are the checks updated or is the metadata in the >dictionary updated to remove this out-of-date reference to the check? The check should be updated >2) BAH supports clients that are leveraging the CPE specification in >order to represent configuration information about computing assets. >In this use case, only concrete CPE Names are of value as I cannot >have an asset with cpe:/o:microsoft:windows_xp::gold installed. I >can only have an asset with a tangible product installed and these >are represented by concrete CPE Names only. How can I parse the CPE >dictionary to only extract concrete names? I think I can do it by >parsing the list of cpe-items (with great difficulty). I think I can >do it by parsing NIST's metadata (with less difficulty), but, I >think that I only want the leaf nodes in the hierarchy. Names that use certain components. A simple regular expression or Xpath statement might be able to do the trick here. >2.a) If I do it by parsing for leaf nodes, is there ever the case where >an internal node represents a concrete CPE name? For instance, assume >that Microsoft released an operating system with a single edition only, >we will call it cpe:/o:microsoft:windows_example::gold (again, ignore >the use of a Microsoft OS here and focus on the potential issue). 
Now >say shortly thereafter a large group of countries decides that >Microsoft needs to decouple some parts of the OS so Microsoft releases >cpe:/o:microsoft:windows_example::gold:eu to comply with the ruling >(don't focus on cpe:/o:microsoft:windows_example::gold:eu being the >right CPE Name to represent this condition, focus on the fact that a >new and unanticipated variant has been released). At this point, I see >a potential problem, cpe:/o:microsoft:windows_example::gold was >originally the only variant of the OS, but now >cpe:/o:microsoft:windows_example::gold:eu could exist as well. At this >point, cpe:/o:microsoft:windows_example::gold would need to >be deprecated, as a CPE logical expression matching >cpe:/o:microsoft:windows_example::gold would also match >cpe:/o:microsoft:windows_example::gold:eu. >cpe:/o:microsoft:windows_example::gold can no longer be uniquely >identified because of this new variant. Has a situation such as this >one occurred in the dictionary and is a process in place to avoid such >a situation? While I don't see this type of issue arising often with >well-known or mature pieces of software, I do see it occurring with the >smaller and more rapidly changing items in the dictionary. concrete name, but rather we need to look at them as representing platform types. In your example, after the decoupling, cpe:/o:microsoft:windows_example::gold would represent the platform type "any platform with Windows Example Gold installed". This would include any specific edition including 'eu'. The name cpe:/o:microsoft:windows_example::gold:eu would refer to the platform type "any platform with Windows Example Gold EU edition installed". No change would be needed for the existing CPE Names to work with the new product structure. >3) Is Windows XP Home edition missing from the dictionary? I just want >to make sure that cpe:/o:microsoft:windows_xp::gold is representative of an >abstract CPE Name and not of Windows XP Home. 
If it does represent >Windows XP Home, then issue 2.a has already occurred. My guess is it is missing. cpe:/o:microsoft:windows_xp::gold represents any Windows XP Gold platform, including Home, Professional, Media Center, etc. Again, I hope this helped. Please let us know if there are further questions. Thanks Drew |
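The "platform type" reading above can be illustrated with a small sketch. It assumes a simplified form of CPE 2.2 name matching in which an empty or omitted component in the more general name accepts any value; `split_cpe` and `covers` are hypothetical helper names, and the sketch ignores details such as case handling and URI escaping.

```python
from itertools import zip_longest
from typing import List


def split_cpe(name: str) -> List[str]:
    """Split a CPE 2.2 URI into its colon-separated components."""
    prefix = "cpe:/"
    if not name.startswith(prefix):
        raise ValueError(f"not a CPE 2.2 URI: {name!r}")
    return name[len(prefix):].split(":")


def covers(platform_type: str, system_name: str) -> bool:
    """True when every non-empty component of platform_type equals the
    corresponding component of system_name (missing components count as empty)."""
    wanted = split_cpe(platform_type)
    actual = split_cpe(system_name)
    return all(
        w == "" or w == a
        for w, a in zip_longest(wanted, actual, fillvalue="")
    )
```

With these assumptions, covers("cpe:/o:microsoft:windows_xp", "cpe:/o:microsoft:windows_xp::sp1") holds while the reverse does not, which mirrors the answer to 1.a. Likewise cpe:/o:microsoft:windows_example::gold covers cpe:/o:microsoft:windows_example::gold:eu, so the existing name keeps working after the hypothetical 'eu' edition appears.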
||||||||||||||||
|
Harold Booth-2
|
Drew,
I am hoping to get a clarification on the following statements: Quoting "Buttner, Drew" <[hidden email]>: > None of the names above represent a tangible product. In fact a CPE is not > meant to accomplish this. Rather, a CPE Name represents a "platform type". > For the first name in your example, this platform type would be - any system > that has Windows XP installed. For the last example this would be - any > system that has Windows XP SP1 tablet edition installed. Note that even > this last one doesn't identify a tangible product as there are different > language release, etc. > > In summary, a CPE Name identifies a Platform Type, not a specific platform. > In CPE how would you distinguish between a product reference and a platform type? A product reference would be a leaf node in the CPE tree hierarchy. How would you expect an asset database to use CPE where it needs to refer to specific products? Or how would two asset databases communicate with each other using CPE when they need to refer to specific products? (Which I believe to be a paraphrase of a stated use case.) |
||||||||||||||||
|
Andrew Buttner
|
>In CPE how would you distinguish between a product reference and a
>platform type? CPE doesn't try to be a specific product reference. When creating the list of components, we did not try to create a full list of stuff needed to uniquely id specific products. Rather we tried to establish a list of components that are relatively common across different platforms and would help us create unique identifiers for the level of specificity desired. In this case we decided to go down to the language level. >A product reference would be a leaf node in the CPE >tree hierarchy. How would you expect an asset database to use CPE >where it needs to refer to specific products? My guess is that when an asset database needs to refer to a product, that it means product type the way CPE thinks about it. In other words, it needs to know how many systems have some version of Windows XP, or how many systems have Windows XP SP1, or how many systems have Windows XP SP1 Professional English. All of these are platform types. >Or how would two >asset databases communicate with each other using CPE when they >need to refer to specific products? (Which I believe to be a >paraphrase of a stated use case.) I think we are using "product" and "product type" the same here. The thing to make sure we aren't confusing is the term "system identification" and "product type". Thanks Drew |
||||||||||||||||
|
Valeri, David [USA]
|
Drew,
In your last response you said: "My guess is that when an asset database needs to refer to a product, that it means product type the way CPE thinks about it. In other words, it needs to know how many systems have some version of Windows XP, or how many systems have Windows XP SP1, or how many systems have Windows XP SP1 Professional English. All of these are platform types." These are all valid use cases for an analyst looking into the repository. Our analysts and systems very much wish to look into the repository and find assets that have specific traits such as software, OS, and hardware configurations. The CPE Logical Expression gives these analysts and systems the language that they need to construct these queries. However, the maintainers of the asset repositories are looking at CPE from the other side of the system. They want to assign software, operating systems, and hardware to an asset. That is, assemble a collection of CPE names that represent, exactly, what comprises each asset in their repository. These maintainers include government employees and contractors hand-jamming information into the repository and commercial vendors supporting automated scanning tools that report system configurations to the asset repository. These stakeholders, in this scenario, are looking for the authoritative list of software, hardware, and operating systems that could actually be installed on their assets, not for a broad family of products. For example, knowing that Windows XP is installed on an asset does not tell an analyst if the asset is susceptible to a vulnerability that affect Windows XP SP2 Professional only. Similarly, the presence of the issue that I described in 2.a of my original email also precludes the realization of these stakeholders' use cases. 
The use cases in sections 2.1 and 2.3 (in conjunction with the Logical Expression) of the specification describe part of the use case our clients are trying to realize; however, it seems unlikely that their use case can be realized to the desired level of accuracy without unique identifiers for the software, hardware, and operating systems that could actually be installed on their assets. In the end, that brings me back to my original need: a definitive list of unique IDs for software, hardware, and operating systems that can actually be installed on or comprise the contents of an asset. David Valeri Booz Allen Hamilton 8281 Greensboro Dr. McLean, VA 22102 Tel: 703.377.5607 Fax: 703.902.3330 -----Original Message----- From: Buttner, Drew [mailto:[hidden email]] Sent: Monday, June 09, 2008 1:03 PM To: [hidden email] Subject: Re: [CPE-DISCUSSION-LIST] Abstract and Concrete CPE Names in the Dictionary |
||||||||||||||||
|
Andrew Buttner
|
>However, the maintainers of the asset repositories are looking at
>CPE from the other side of the system. They want to assign software, >operating systems, and hardware to an asset. That is, assemble a >collection of CPE names that represent, exactly, what comprises each >asset in their repository. CPE should work great for this. Each CPE Name that is assigned to an asset identifies a platform type that the asset belongs to. For example, if they want to tag an asset as having an OS related to Windows XP then they can use the CPE Name cpe:/o:microsoft:windows_xp. If they want to express that an asset has an OS related to Windows XP SP1 Embedded Edition then they can use the CPE Name cpe:/o:microsoft:windows_xp::sp1:embedded. If they are looking for a way to express exact system details, then they should look at a language like OVAL System Characteristics. CPE is not designed to express these details. >These stakeholders, in this scenario, are looking for the authoritative >list of software, hardware, and operating systems that could actually >be installed on their assets, not for a broad family of products. CPE fits into this by providing an identifier for the platform types once the list has been created. It does not try to encode the information necessary to answer the question about whether an application or OS can be installed on a system. For that you should look at a language like OVAL Definitions. >In the end, that brings me back to my original need: >a definitive list of unique IDs for software, hardware, and operating >systems that can actually be installed on or comprise the contents of >an asset. CPE is focused on making sure those unique IDs exist. Unfortunately, CPE does not try to determine what the list of IDs should be. To solve your need, someone must create a mapping that relates different CPE Names based on applicability and ability to install. 
This mapping might look like:

Host OS                             Application
---------------------------------------------------
cpe:/o:microsoft:windows_xp         cpe:/a:vendor:app1
cpe:/o:microsoft:windows_xp         cpe:/a:vendor:app3
cpe:/o:microsoft:windows_xp::sp1    cpe:/a:vendor:app2
cpe:/o:microsoft:windows_2003       cpe:/a:vendor:app3

Application                         Host OS
---------------------------------------------------
cpe:/a:vendor:app1                  cpe:/o:microsoft:windows_xp
cpe:/a:vendor:app1                  cpe:/o:microsoft:windows_2003

Does this help answer your question? Thanks Drew |
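A mapping of this kind could be held as plain pairs of CPE Names and indexed in either direction. The sketch below uses the rows of the first table above as sample data; the names `ROWS`, `index_by_host`, and `index_by_application` are hypothetical, and nothing in CPE itself defines such a structure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (host OS, application) pairs taken from the first illustrative table.
ROWS: List[Tuple[str, str]] = [
    ("cpe:/o:microsoft:windows_xp", "cpe:/a:vendor:app1"),
    ("cpe:/o:microsoft:windows_xp", "cpe:/a:vendor:app3"),
    ("cpe:/o:microsoft:windows_xp::sp1", "cpe:/a:vendor:app2"),
    ("cpe:/o:microsoft:windows_2003", "cpe:/a:vendor:app3"),
]


def index_by_host(rows: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group applications under the host OS name they are listed against."""
    index: Dict[str, List[str]] = defaultdict(list)
    for host, app in rows:
        index[host].append(app)
    return dict(index)


def index_by_application(rows: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group host OS names under each application (the inverted view)."""
    index: Dict[str, List[str]] = defaultdict(list)
    for host, app in rows:
        index[app].append(host)
    return dict(index)
```

Keeping a single list of pairs and deriving both views from it avoids the two tables drifting apart, which is one reason an external mapping might be stored this way.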
||||||||||||||||
|
Valeri, David [USA]
|
Drew,
Thanks for the reply. I think the language I chose may have misrepresented my point. I'll try to clarify below. The stakeholders in the scenarios I gave do not want a list of which software can be installed on which operating systems and which operating systems can be installed on which types of hardware, nor are they immediately interested in the level of detail described in the OVAL System Characteristics schema. Currently, they are interested in the exact details of installed software, operating systems, and to a lesser degree the hardware type that the software and operating systems are installed on. In your reply, you stated: "For example, if they want to tag an asset as having an OS related to Windows XP then they can use the CPE Name cpe:/o:microsoft:windows_xp. If they want to express that an asset has an OS related to Windows XP SP1 Embedded Edition then they can use the CPE Name cpe:/o:microsoft:windows_xp::sp1:embedded" The example you provide is exactly what the stakeholders wish to do; however, they do not want to express a "related to" relationship. They wish to express a definitive "has this" or "is this" relationship to the finest degree possible in CPE. To the language level is preferred; however, I believe to the edition level is sufficient for most use cases. I am doubtful that there are a large number of bugs related to language packs, but I may be wrong. To represent a "has this" or "is this" relationship to the desired level of specificity, a CPE assigned to an asset must be unique, of the finest granularity possible, and must not be susceptible to the scenario I laid out in question 2.a of my original email. Currently, my stakeholders have a need to extract the entries from the dictionary that represent products or hardware that an asset can have or be. There is no need from my stakeholders to extract CPEs that represent "relates to" or "in the family of" information. 
Furthermore, my stakeholders have a need for a logical language that can be used to construct queries to identify assets that have certain configurations. These queries may be very general and leverage the wildcard features of the matching algorithm. These queries may also be very specific and target a single language or edition of a product. In the later case, the matching algorithm and library need to be able to return only assets that have the exact product installed (which is where my concern for issue 2.a comes in). CPE seems to come very close to offering these capabilities; however, I'm concerned that these use cases are not supported in the current dictionary data or governance process. I hope this email clears up any confusion and can set the stage for a more productive discussion. David Valeri Booz Allen Hamilton 8281 Greensboro Dr. McLean, VA 22102 Tel: 703.377.5607 Fax: 703.902.3330 -----Original Message----- From: Buttner, Drew [mailto:[hidden email]] Sent: Tuesday, June 10, 2008 11:01 AM To: [hidden email] Subject: Re: [CPE-DISCUSSION-LIST] Abstract and Concrete CPE Names in the Dictionary |
||||||||||||||||
|
Andrew Buttner
|
>They wish to express a definitive "has this" or "is this" relationship
>to the finest degree possible in CPE. To the language level is >preferred; however, I believe to the edition level is sufficient for >most use cases. This relationship is something that they need to define anyway (CPE doesn't try to define it) so it is perfectly acceptable for them to say that the CPE Name that has been assigned to the asset means "the asset is of this platform type". >To represent a "has this" or "is this" relationship to the desired level >of specificity, a CPE assigned to an asset must be unique, All CPE Names must be unique by definition. >of the finest granularity possible I would reword this to say "must be at the granularity they desire". You even stated that going down to language is not desired. >and must not be susceptible to the scenario I laid >out in question 2.a of my original email. Agreed and I think this is covered. >Currently, my stakeholders have a need to extract the entries from the >dictionary that represent products or hardware that an asset can have or >be. >There is no need from my stakeholders to extract CPEs that represent >"relates to" or "in the family of" information. I think this might be the root of the confusion. CPE does nothing to try and support this. All that CPE is trying to do is provide a list of all the known identifiers. The amount of metadata provided is very sparse. Just enough to know what it is that has been identified. I think what you need is additional metadata associated with a CPE that would enable you to search through the list and find the CPE Names that are desired. I think this is a very important use case but is one that is outside the scope of CPE. This problem is very similar to one that faces the CVE community. CVE is a list of vulnerability identifiers. Users that need more metadata rely on an external databases to get it. These external databases (NVD is an example) use the CVE identifier to tag each entry and associate metadata to it. 
I think what you need is for the "National Product Database" to be created. Agree? This is something that had come up at CPE Developer Days as well. >Furthermore, my >stakeholders have a need for a logical language that can be used to >construct queries to identify assets that have certain configurations. >These queries may be very general and leverage the wildcard features of >the matching algorithm. These queries may also be very specific and target >a single language or edition of a product. In the later case, the >matching algorithm and library need to be able to return only assets that have >the exact product installed (which is where my concern for issue 2.a comes >in). This seems to align with the goals of OVAL. Thanks Drew |
||||||||||||||||
|
Waltermire, David
|
Drew,
From what I have been able to determine from Dave's emails, his use case is based on the need to create database records that represent authoritative, discrete product references in the CPE name format. I would define a discrete product reference as a CPE name that refers to a SKU or electronically distributed content. CPE names used in this fashion are not generated by reference or by report, thus he is forced to look to the official CPE dictionary. Since this is a database application he is unable to use the system inventory approach using OVAL definitions to qualify what CPE names to use. Instead he is looking for another authoritative hint. Furthermore, it is not enough that all CPE names are unique; he is looking for the set of CPE names that correspond directly on a one-to-one basis with actual installed software. Said a different way he needs the ability for a tool to be able to associate a discrete product with a corresponding CPE name. This mapping must be able to occur at differing levels of abstraction relative to the CPE components. I believe for most products that we have the granularity in the CPE name to accomplish this so why not do this? These capabilities are key for asset, license and procurement management applications. These use cases are definitely within the intended scope of CPE, as references to standard product names are needed. If all he needs to support these use-cases is a flag that indicates if a CPE name refers to a discrete product or not, I think we need to find a way to support this in an official CPE capacity. If the CPE standard does not support these use-cases, we risk alienating users/vendors. The worst case scenario for this situation is the creation of a competing standard. This is not a win for CPE. We need to find a way forward to make this work. 
Dave -----Original Message----- From: Buttner, Drew [mailto:[hidden email]] Sent: Tuesday, June 10, 2008 1:45 PM To: [hidden email] Subject: Re: [CPE-DISCUSSION-LIST] Abstract and Concrete CPE Names in the Dictionary |

Ernest Park-2
Hi -
I have 4 databases, all of which talk to each other indirectly through common CPE language.
I can do the following . . .
-- application exists in every joined table, so it is qualified here
select vendor, open_source_db.application, part_v2, release,
       inventory_db.application_installed_date, master_db.platform,
       external_nvd.nvd_name, current
from open_source_db
inner join inventory_db on (inventory_db.application = open_source_db.application)
inner join external_nvd on (external_nvd.application = open_source_db.application)
inner join master_db on (master_db.application = open_source_db.application)
-- keep only installed applications that have a known current release name
where inventory_db.application_installed_status = 'YES'
and inventory_db.application_currentreleasename is not null
into outfile 'CPE_v2-inventory list.csv'
Excuse any typos above - I wrote a simple query to illustrate the point.
Using common CPE syntax in all my databases, I can reliably exchange data between databases, and output them with an identifier that will conform to having CPE pieces.
In my output, the URI can be parsed from the output, even in Excel, such as . . .
=CONCATENATE("cpe",A4,B4,":",C4,":",D4," =",E4,"=",F4,"=",G4)
Fields in < > refer to Excel cells.
Note that platform and install date are data fields that I supply and in this case are not constrained to anything. Platform could be "External approved Windows - v12", and so on.
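As a rough illustration of assembling a CPE name from separately stored pieces, here is a minimal Python sketch. It assumes the colon-separated cpe:/ layout used elsewhere in this thread (part, vendor, product, version, update, edition, language). The function name build_cpe_uri and its normalization rules (lowercasing, spaces to underscores) are assumptions made for the example, not something defined by the CPE specification, and real names may need additional escaping.

```python
def build_cpe_uri(part, vendor, product, version="", update="", edition="", language=""):
    """Join CPE components into a cpe:/ URI (hypothetical helper, simplified)."""
    # Normalize each stored field: trim, lowercase, and replace spaces.
    components = [
        value.strip().lower().replace(" ", "_")
        for value in (vendor, product, version, update, edition, language)
    ]
    # Trailing empty components carry no information, so drop them.
    while components and components[-1] == "":
        components.pop()
    # Empty components in the middle are kept so later fields stay in position.
    return "cpe:/" + part + ":" + ":".join(components)


if __name__ == "__main__":
    print(build_cpe_uri("o", "microsoft", "windows_xp", "", "sp1", "professional"))
    print(build_cpe_uri("a", "Apache Software Foundation", "tomcat", "5.5"))
```

Keeping the components in separate columns and joining them only on output means a wrong release name can be corrected without touching the vendor or product fields.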
Therefore,
I can output an inventory report down to the installed release, identify when it was installed, what platform, and the latest relevant CVE name. I do a report for subscribers that includes CVEs associated to currently installed releases and sorts results by CVSS, Work load index and my risk score. I scan 5 databases and the NVD XML to output the inventory risk management report. The CPE gives me a uniform handle by which to describe things across datasources.
So - NVD gives me CVE names, CVSS, WLI, I generate an inventory report using scanning tools, I apply policy using policy management software, and so on.
If a user of my data wants to pull OSS license information, or all known releases, or patch status, etc, they can do so with a query built around CPE constructs. One of these days, a web service may front end the query above.
Ernie
On Tue, Jun 10, 2008 at 4:13 PM, David Waltermire <[hidden email]> wrote:
> Drew,

Andrew Buttner
In reply to this post by Waltermire, David
>These capabilities are key for asset, license and procurement management
>applications.

I completely agree, and hopefully CPE enables these types of capabilities, but is it best for CPE to try and provide full support? Or is it better for CPE to work with other efforts to provide full support?

>These use cases are definitely within the intended scope of
>CPE, as references to standard product names are needed. If all he
>needs to support these use-cases is a flag that indicates if a CPE
>name refers to a discrete product or not, I think we need to find
>a way to support this in an official CPE capacity.

I very much want to hear from others in the community about the above statement. Is this within the scope of CPE? Should CPE work to define additional metadata related to the identifier? Or should CPE leave this metadata work to others and focus solely on building the list of identifiers? A couple of points that I would like to bring to the table for this ...

* What is discrete to one user might not be discrete to another. How would CPE make this determination? Is WinXP Pro discrete (I think that is what you buy) or is WinXP Pro SP1 discrete (after you download the SP)? Actually, in reality you buy WinXP Pro English so is that discrete? This is similar to the issue of providing weights to vulnerabilities. Everyone has a different answer for what the weight should be. If the enumeration tries to set one as official, those that don't agree will be alienated.

* Does expanding the scope of CPE beyond the task of providing identifiers reduce our ability to succeed? This is a lesson that has been learned in the past. CVE was one of the first to just focus on the identifier and leave the metadata problem to others. This resulted in an enumeration that has by all accounts succeeded. Other efforts have tried to do too much and have failed (recently look at AVDL). CPE has started down the CVE path and focused on the identifier; my personal suggestion is to stay on course.

* This type of metadata is perfect for an external data repository built on CPE. This allows CPE to focus on the identifiers, and the repository to focus on supplying the use-case specific metadata. These repositories would not be competing with CPE, but rather leveraging CPE. CPE would allow these repositories to share information, and users to pull information from each of the repositories.

I think this discussion is extremely useful for this community and I urge everyone to weigh in with their thoughts. I personally think that the issue of where metadata lives is at the root of many of the concerns that get brought up. Should CPE focus on just the identifier? Or should it also focus on providing a repository of metadata related to the identifier?

Thanks
Drew

Kevin Sitto
Hi,

Dave's summary did an excellent job of summarizing what appears to be a core risk with the current implementation of CPE - it's very difficult to relate actual items present on an asset to distinct CPE identifiers. We are currently working with some of the same requirements David Valeri is; we have a set of customers who need to be able to express an inventory of the software/hardware present on their assets. What's more, they would also like multiple systems to be able to compare notes regarding the inventory they identify while also having some sort of agreed upon mechanism for sharing that inventory with other organizations.

CPE would appear to be a natural fit for fulfilling such requirements. However, sharing that concrete inventory information using CPE is entirely dependent on standardizing the references to those concrete inventory items. If CPE merely identifies the abstract notion of "Microsoft, Windows XP, Gold" and relies on metadata existing elsewhere to fill the "gap" between that identifier and the actual application ("Microsoft, Windows XP, Professional, SP2, x64, en"), we risk losing the ability to guarantee that we are all using the same syntax to identify the same software - something I had always perceived to be the core use case of CPE.

Attempting to build actual CPE identifiers for concrete inventory items brings along its own set of complexities. The one I've had the most difficulty working around is that software does not fit naturally within the notion of a hierarchy. Rather, an inventory item is most naturally represented as the intersection between multiple pieces of metadata. For instance, following the Windows XP example from above, how would one build the CPE entry for "Microsoft Windows XP Professional SP2 x64 en"? Would it be "cpe:/o:microsoft:windows_xp:professional:sp2:x64:en"? Or "cpe:/o:microsoft:windows_xp:professional:sp2:en:x64"? In this case, the operating system installed is the intersection between Product ("Microsoft Windows XP Professional"), Version ("SP2"), Architecture ("x64") and Language ("en"). Any one of these could reasonably fit at any place within the hierarchy.

Fortunately, as has been identified previously in the thread, getting from here to there is immediately possible without any sort of major shift in the way CPE is currently defined. Following the same hierarchical model and leveraging much of the existing content, it's just a matter of making a specific effort to add content which refers to those concrete items (i.e. we as a community all agree to use "cpe:/o:microsoft:windows_xp:professional:sp2:x64:en" and put that in the dictionary). It will require some magic on the back end to support different types of aggregation (e.g. "list all assets with x64 operating systems"), but that's what app developers do well.

Thanks,
Kevin

Harold Booth-2
In reply to this post by Andrew Buttner
My response is in-line below.
> >These capabilities are key for asset, license and procurement management
> >applications.
>
> I completely agree, and hopefully CPE enables these types of
> capabilities, but is it best for CPE to try and provide full support?
> Or is it better for CPE work with other efforts to provide full
> support?
>
> >These use cases are definitely within the intended scope of
> >CPE, as references to standard product names are needed. If all he
> >needs to support these use-cases is a flag that indicates if a CPE
> >name refers to a discrete product or not, I think we need to find
> >a way to support this in an official CPE capacity.
>
> I very much want to hear from others in the community about the above
> statement. Is this within the scope of CPE? Should CPE work to define
> additional metadata related to the identifier? Or should CPE leave
> this metadata work to others and focus solely on building the list of
> identifiers?

Why shouldn't CPE try to provide support for these types of capabilities? Why add another standard to the mix when CPE can handle this use case with some minor changes? Admittedly, CPE cannot be all things to all people but it should handle the basic problem of communicating product/platform information across various domains. The domains need to include not only vulnerability and checklist data providers but other aspects of an enterprise such as asset and licensing management.

My understanding is that the CPE community has decided that part, vendor, product, version, update, edition, and language are sufficient to uniquely identify a product. Is this understanding correct? If not, what could be added to allow for unique identification?

Another way to solve this without even requiring adding a bit is if the "official" dictionary contains CPEs down to the language level, even if that requires specifying default values for some of the components. According to the matching algorithm, CPEs with fewer components are implied. Data providers could provide the meta-data for CPEs of fewer components as needed or desired.

Taking the previous solution a bit farther, I would argue that an official "CPE" identifier is always all seven components. Shorter CPEs are merely useful in the context of matching or where less specificity is needed.

Since you are concerned with meta-data creep, why not have the "official" dictionary provide just the identifiers with maybe a brief description and no other additional meta-data? Titles, references, checks, and any other meta-data would be value-add to the CPE provided by data providers (like the NVD).

Addressing your points:

> * What is discrete to one user might not be discrete to another. How
> would CPE make this determination? Is WinXP Pro discrete (I think that
> is what you buy) or is WinXP Pro SP1 discrete (after you download the
> SP)? Actually, in reality you buy WinXP Pro English so is that
> discrete? This is similar to the issue of providing weights to
> vulnerabilities. Everyone has a different answer for what the weight
> should be. If the enumeration tries to set one as official, those that
> don't agree will be alienated.

I strongly disagree with the analogy to CVEs and CVSS scoring. What is being talked about here is what constitutes an official identifier, not what is the particular value for a piece of meta-data. As mentioned earlier, an "official" CPE should be as fine-grained as the standard currently defines. Less granular CPEs are always implied by the more specific ones. In this way no one is "alienated"; users of the less granular CPEs still have an agreed upon identifier.

Taking your examples above, the initial release of the English language version of WinXP Pro could be:

cpe:/o:microsoft:windows_xp::gold:pro:en

Once Service Pack 1 is released (either for download or eventually sold retail) a new entry would be:

cpe:/o:microsoft:windows_xp::sp1:pro:en

If a reference to only Microsoft Windows XP is desired then the cpe:

cpe:/o:microsoft:windows_xp

would refer to both of these CPEs in this example.

> * Does expanding the scope of CPE beyond the task of providing
> identifiers reduce our ability to succeed? This is a lesson that has
> been learned in the past. CVE was one of the first to just focus on
> the identifier and leave the metadata problem to others.

I have two points to this. First, the problem that CPE is trying to solve is more difficult than for CVEs. Not only do we wish to communicate about specific products but we wish to also talk about groups of them as well. Second, CPE already extends beyond just the task of providing identifiers. The CPE language structure codifies how to combine various products together to create a platform, and the CPE dictionary specification describes how to communicate lists of CPEs along with the meta-data.

I agree that CPE should not attempt to describe every possible association of meta-data to a particular identifier. But CPE should specify a minimum set of "globally useful" meta-data as well as a mechanism to allow for arbitrary meta-data associations, facilitating communication of this meta-data between products which use CPEs. In the current version of the CPE specification the required encoded meta-data could be the seven component pieces of a CPE identifier.

> * This type of metadata is perfect for an external data repository
> built on CPE. This allows CPE to focus on the identifiers, and the
> repository to focus on supplying the use-case specific metadata. These
> repositories would not be competing with CPE, but rather leveraging
> CPE. CPE would allow these repositories to share information, and
> users to pull information from each of the repositories.

See above.

> Should CPE focus on just the identifier?
> Or should it also focus on providing a repository of metadata related
> to the identifier?

Ultimately, I would like to see CPE as a means to communicate about products. I think the CPE specification should not only provide a means to associate an identifier with a product but also a way to communicate about that product or groups of products. The CPE specification should also provide a standardized means to associate arbitrary meta-data with a CPE as well as a way to describe this meta-data.

-Harold
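The idea that a shorter CPE name is implied by the more specific names beneath it can be sketched in a few lines of Python. This is only an illustration, assuming the seven-field cpe:/ layout described in this message and treating an empty or missing component as "any value". The helper names split_cpe and is_implied_by are hypothetical and are not taken from the CPE specification or any official library.

```python
FIELD_COUNT = 7  # part, vendor, product, version, update, edition, language


def split_cpe(name):
    """Split a cpe:/ name into exactly seven fields, padding with empty strings."""
    prefix = "cpe:/"
    if not name.startswith(prefix):
        raise ValueError("not a cpe:/ name: " + name)
    fields = name[len(prefix):].split(":")
    if len(fields) > FIELD_COUNT:
        raise ValueError("too many components: " + name)
    return fields + [""] * (FIELD_COUNT - len(fields))


def is_implied_by(general, specific):
    """True if every non-empty field of `general` equals the same field of `specific`."""
    return all(
        g == "" or g == s
        for g, s in zip(split_cpe(general), split_cpe(specific))
    )


if __name__ == "__main__":
    gold = "cpe:/o:microsoft:windows_xp::gold:pro:en"
    sp1 = "cpe:/o:microsoft:windows_xp::sp1:pro:en"
    family = "cpe:/o:microsoft:windows_xp"
    # The family name covers both concrete entries from the example.
    print(is_implied_by(family, gold), is_implied_by(family, sp1))
    # A gold-only name does not cover the SP1 entry.
    print(is_implied_by("cpe:/o:microsoft:windows_xp::gold", sp1))
```

Under this reading, a dictionary that stores only fully specified names can still answer queries phrased with shorter names, because the shorter names are derived by matching rather than stored.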

Ernest Park-2
I think an element that we are missing is a definitive identifier that associates a human friendly name with an absolute ID.
When a name is created, we need MD5s (as an example) for a definitive file that identifies that app, release, patch.
Any name thereafter is a valuable identifier, but can be aliased. In this way, multiple names can identify a single release. Additionally, multiple machine identifiers can be identified with a "name".
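To make the alias idea concrete, here is a minimal Python sketch of a two-way index between file digests and names. It assumes MD5 purely because that is the example given above; any digest would work the same way. The class AliasRegistry and its method names are hypothetical and exist only for this illustration.

```python
import hashlib
from collections import defaultdict


class AliasRegistry:
    """Maps file digests to names and names to digests (illustrative only)."""

    def __init__(self):
        self._names_by_digest = defaultdict(set)
        self._digests_by_name = defaultdict(set)

    def register(self, content, name):
        """Record that `name` identifies the file with these bytes; return its digest."""
        digest = hashlib.md5(content).hexdigest()
        self._names_by_digest[digest].add(name)
        self._digests_by_name[name].add(digest)
        return digest

    def names_for(self, content):
        """All names (aliases) registered for this exact file content."""
        return set(self._names_by_digest.get(hashlib.md5(content).hexdigest(), set()))

    def digests_for(self, name):
        """All machine identifiers registered under this name."""
        return set(self._digests_by_name.get(name, set()))


if __name__ == "__main__":
    registry = AliasRegistry()
    registry.register(b"release archive bytes", "cpe:/a:vendor:app:1.0")
    registry.register(b"release archive bytes", "Vendor App 1.0")
    print(sorted(registry.names_for(b"release archive bytes")))
```

The digest acts as the absolute ID, while any number of human friendly names, including a CPE name, can point at it.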
Regarding the comments below, I thought of a few things.
The problem above is that we go from distinct naming, where we are identifying "an electronic thing", to now identifying "an electronic thing" and its condition of being.
Sort of like a license plate that has to identify the owner and the highway on which the car operates, all in the 7 characters on the plate.
So, maybe what is missing is the further development of the concept of a "part" - cpe:/a:.
Does there need to be greater granularity with parts, in order to identify not only operating systems, those that are 64bit, 32bit, and so on, specific to a chipset, etc?
Then, if we have a "part" identifier, each name will have "<vendor>:<app>:<release>" and will be joined with "part".
Part may need to be decoupled from name in order to allow highest order names to be built before all the granular "part" details are finished.
In sum, part becomes a distinction of an application, but is different than a release.
On Wed, Jun 11, 2008 at 12:35 PM, Harold Booth <[hidden email]> wrote:
> My response is in-line below.

Eirik Iverson
In reply to this post by Ernest Park-2
Ernest – Nice database example. The questions below illustrate my
confusion about CPE… How was your inventory database actually
populated? Was it done so explicitly via data entry or import (e.g., host
XYZ has Microsoft Excel 2003)? Or, was it derived from another table with
retrieved host data such as “C:\Program Files\Microsoft Office\OFFICE11\excel.exe”
and “11.0.8211.0”? How does one systematically determine asset
inventory for an entire population of endpoints based on data that can be collected
so that one can take advantage of the rest of the framework (e.g., CVE, CVSS,
etc.)? Is the scope of CPE limited such that tools
(e.g., patch management, configuration management, vulnerability assessment,
etc.) must first identify assets and report their findings in a CPE compliant
manner so that one can then leverage all of this great work? Cheers,

Ernest Park-2
Hi Eirik -
On Wed, Jun 11, 2008 at 3:39 PM, Eirik Iverson <[hidden email]> wrote:
Neither. I use Palamida IP Amplifier product. I added custom signatures and additional vendor, app, release metadata.
IP Amp scans and IDs the unknown files. Once I have the products IDed, I start layering metadata in - so all Apache prods get a "vendor" attribute like apache_software_foundation, and so on.
I maintain a number of parallel databases, so I use common application name searching across all DBs. When we automatically add a new product to the inventory database, if no metadata exists, the report returns the best match to LIKE searches. We can then manually go in and either permanently add a new product, a new alias for existing, or correct identification for a product that was added.
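The "best match" fallback for a product with no metadata can be approximated outside the database as well. The sketch below is an assumption-laden illustration: it swaps the SQL LIKE search described above for Python's standard difflib fuzzy matching, and the function name suggest_products, the sample names, and the 0.6 cutoff are all hypothetical choices for the example.

```python
import difflib


def suggest_products(unknown_name, known_names, limit=3):
    """Return up to `limit` known product names that resemble `unknown_name`."""
    # Compare case-insensitively but return the names as originally stored.
    by_lowercase = {name.lower(): name for name in known_names}
    hits = difflib.get_close_matches(
        unknown_name.lower(), list(by_lowercase), n=limit, cutoff=0.6
    )
    return [by_lowercase[hit] for hit in hits]


if __name__ == "__main__":
    catalog = ["Apache Tomcat", "JBoss Application Server"]
    # A person would still confirm the suggestion before it becomes an alias.
    print(suggest_products("apache tomcatt", catalog))
```

As in the workflow above, the suggestions are only candidates; a reviewer decides whether to add a new product, record an alias, or correct an identification.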
http://gpl3.palamida.com has a downloadable database that has all the pieces needed to parse into a CPE name.
The inventory report is used as part of a wget/google api script. Using what I know, I query the internet and attempt to auto-match metadata, like URL, project home page, vendor name, product description, and so on.
Code scanning tools do exactly this. I use IP Amp to scan, find releases, associate CPE names, and so on.
BTW - due to data issues, leveraging CVE data is not entirely enabled via CPE names. I end up doing indirect vendor, app, release lookups against the CVE data to find the CVEs that match a release.
Yes. CPE is a name string. It is only a way to assure that cpe:/a:vendor:app:release means the same to you that it does to me.
Once you have a way to identify an asset via a distinct CPE name, or a higher level part of the asset, like cpe:/a:vendor, you can now map metadata to the identifier.
Much of my data is automatically collected, but some of it, like GPL3, is hand collected. I use forms to constrain the information going into the database so that it includes all the required elements, and interns resolve any "errata", unresolved names, bad information, failure to map within databases - like unknown or new vendors, products, etc.
CPE is not a data management solution. It is just a way to share distinct information with common keys between users and electronic sources.

dwhite-5
In reply to this post by Andrew Buttner
The NIST NSRL has such a mapping, though not currently in CPE format.
On Jun 10, 2008, at 11:00 AM, Buttner, Drew wrote:

> To solve your need, someone must create a mapping that relates
> different CPE Names based on applicability and ability to install.
> This mapping might look like:
>
> Host OS                             Application
> ---------------------------------------------------
> cpe:/o:microsoft:windows_xp         cpe:/a:vendor:app1
> cpe:/o:microsoft:windows_xp         cpe:/a:vendor:app3
> cpe:/o:microsoft:windows_xp::sp1    cpe:/a:vendor:app2
> cpe:/o:microsoft:windows_2003       cpe:/a:vendor:app3
>
>
> Application                         Host OS
> ---------------------------------------------------
> cpe:/a:vendor:app1                  cpe:/o:microsoft:windows_xp
> cpe:/a:vendor:app1                  cpe:/o:microsoft:windows_2003

Douglas White
National Institute of Standards and Technology
NIST, 100 Bureau Drive Stop 8970, Gaithersburg, MD 20899-8970
National Software Reference Library - http://www.nsrl.nist.gov
Voice: 301-975-4761 Fax: 301-975-6097 Email: [hidden email]
My opinions aren't necessarily my employer's nor any other organization's.
"Even if you're on the right track, you'll get run over if you just sit there" - Will Rogers
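A mapping like the one quoted above can be held as a plain list of pairs and indexed in both directions. The Python sketch below is an illustration only: the rows are copied from the first quoted table, and the names INSTALLABLE_ON and index_pairs are hypothetical. It says nothing about how the NSRL actually stores its data.

```python
from collections import defaultdict

# (host OS, application) pairs taken from the first table in the quoted message.
INSTALLABLE_ON = [
    ("cpe:/o:microsoft:windows_xp", "cpe:/a:vendor:app1"),
    ("cpe:/o:microsoft:windows_xp", "cpe:/a:vendor:app3"),
    ("cpe:/o:microsoft:windows_xp::sp1", "cpe:/a:vendor:app2"),
    ("cpe:/o:microsoft:windows_2003", "cpe:/a:vendor:app3"),
]


def index_pairs(pairs):
    """Build host -> applications and application -> hosts lookups from one list."""
    apps_by_host = defaultdict(list)
    hosts_by_app = defaultdict(list)
    for host, app in pairs:
        apps_by_host[host].append(app)
        hosts_by_app[app].append(host)
    return dict(apps_by_host), dict(hosts_by_app)


if __name__ == "__main__":
    apps_by_host, hosts_by_app = index_pairs(INSTALLABLE_ON)
    print(apps_by_host["cpe:/o:microsoft:windows_xp"])
    print(hosts_by_app["cpe:/a:vendor:app3"])
```

Storing the relation once and deriving both views avoids the two tables drifting apart.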

dwhite-5
In reply to this post by Andrew Buttner
On Jun 11, 2008, at 8:34 AM, Buttner, Drew wrote:
>> These use cases are definitely within the intended scope of
>> CPE, as references to standard product names are needed. If all he
>> needs to support these use-cases is a flag that indicates if a CPE
>> name refers to a discrete product or not, I think we need to find
>> a way to support this in an official CPE capacity.
>
> I very much want to hear from others in the community about the above
> statement. Is this within the scope of CPE? Should CPE work to define
> additional metadata related to the identifier? Or should CPE leave
> this metadata work to others and focus solely on building the list of
> identifiers?

> * This type of metadata is perfect for an external data repository
> built on CPE. This allows CPE to focus on the identifiers, and the
> repository to focus on supplying the use-case specific metadata. These
> repositories would not be competing with CPE, but rather leveraging
> CPE. CPE would allow these repositories to share information, and
> users to pull information from each of the repositories.

The NSRL is looking to the CPE for the exact reason above: to leverage the CPE in order to share information. NSRL collects metadata, e.g. SHA-1/MD5/CRC hashes, filename, directory path, bytesize, MAC timestamps, etc. and we are very receptive to collecting other metadata or running algorithms against our collection, if we don't already have what the community needs.

Douglas White

Gary Newman-2
In reply to this post by Ernest Park-2
Hi Ernest,
Do your tools use vendor provided data to name the vendor and application, e.g. those strings from an RPM distribution? Or, are the vendor and application names hand added?

Cheers,
-Gary-

On 11 Jun 2008 at 21:45, Ernest Park wrote:

> Neither. I use Palamida IP Amplifier product. I added custom signatures and
> additional vendor, app, release metadata.
>
> So, step 1. Use a tool that can scan unknown code and create an inventory.
> Ideally, choose a tool that outputs CPE constructs or a CPE URI. I push out
> the pieces, since I need to massage and correct misassociations later. By
> having the pieces, I can be certain of vendor, app, and just fix a release
> name.
>
> IP Amp scans and IDs the unknown files. Once I have the products IDed, I start
> layering metadata in - so all Apache prods get a "vendor" attribute like
> apache_software_foundation, and so on.

Ernest Park-2
One example I have worked on goes as follows -
all automatic -
for a given archive file, I extract it onto the file system, crawl the tree and look for strings - copyright, published, copying - and I look at the attribution in the header of source files -
We assemble an array of search terms.
(some human intervention here usually) We feed the most common strings into the google API, and then get back the project URL and the vendor URL.
Keep in mind - when doing this by hand, we can find the vendor in a minute. Automatically, we look for directory names, key file names, attribution, and then we intersect this.
Google API will give me the vendor "likely name", the URL, the project home page, the project title, the short title (like jboss versus JBoss Application Server).
Once I have this, a human reviews the results, and then we crawl the web for releases, and feed the releases to WGET. I string search the release names to get the unique elements from the archive file (like 3.1.a out of jboss-3.1.a.tar.gz).
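The string search on release names can be sketched with a small regular expression. This is a minimal Python example under stated assumptions: archive names follow a "name-release.suffix" pattern where the release starts with a digit, and the suffix list and the function name split_release are hypothetical choices for the illustration, not the tooling actually used here.

```python
import re

# Archive suffixes to strip; longer suffixes are listed before shorter ones.
ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".tgz", ".zip", ".tar")

# Project name, a hyphen, then a release that starts with a digit.
RELEASE_PATTERN = re.compile(r"^(?P<name>.+?)-(?P<release>\d[\w.]*)$")


def split_release(filename):
    """Return (name, release) parsed from an archive file name, or None."""
    stem = filename
    for suffix in ARCHIVE_SUFFIXES:
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
            break
    match = RELEASE_PATTERN.match(stem)
    if match is None:
        return None
    return match.group("name"), match.group("release")


if __name__ == "__main__":
    print(split_release("jboss-3.1.a.tar.gz"))
    print(split_release("apache-tomcat-5.5.26.zip"))
```

File names that do not fit the pattern return None, which is the point where a human reviewer would step in.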
This can be done without a third party code scanner.
For the GPLv3 project at http://gpl3.palamida.com, all entries are hand added, but I use the Google API to populate all fields.
The project starts with crawlers that look for releases that include the GPLv3 license. Researchers take the release info, and download the open source project. They expand the project and look for the short name, the title, and the vendor. The web UI uses this data to narrow the possible results down, and the researcher confirms the selection of the Vendor, the App, and the release. These names are compared to the existing vendor/apps hosted by NVD.NIST.GOV, and CPE friendly names are suggested if they already exist for vendor and app.
A human makes the association, and then the record is stored in the database.
If you have the ability to use a string search, you can certainly associate a distro with a name. If you need to get more automatic, you can build string tools, or use commercial or OSS scanners to get you the first part.
It seems that our forms in the collection process combine some human smarts with web crawlers and Google API to attempt to resolve the most likely CPE name information quickly.
Ernie
On Thu, Jun 12, 2008 at 6:54 PM, Gary Newman <[hidden email]> wrote:
> Hi Ernest,