OSS CPEs

Ernest Park-2

OSS CPEs


I have a dictionary of a few hundred thousand OSS project names with
metadata and releases.

If someone writes the web service front end, I will publish all of this
to a database available to the service via the web. Basically, I have
most of open source software in CPE format.

Any volunteers?


Ernie

 

On Wed, Jun 4, 2008 at 1:15 PM, Buttner, Drew <[hidden email]>
wrote:


I like your approach here and this is a perfect use of CPE.  You have
created a schema for your database that uses the CPE Name to id
platform information.  This will theoretically allow others to interact
with your database using a CPE Name, or will allow you to interact with
other data sources via CPE Name.

The "alias" feature is right along the lines of what we discussed at
Developer Days.  Nice!

Thanks
Drew

Vladimir Giszpenc

Re: OSS CPEs


 

1. NVD is the maintainer of the official CPE dictionary, and it would make sense to add these as beta/unvetted/unofficial content to that dictionary.

2. Verifying that none are duplicates or wrong in some other way is a large undertaking.

3. If I remember correctly, the CPE IDs are in CPE 1.0 format, so they would need to be transformed to 2.1.
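
A first pass at the duplicate check in point 2 might look like the sketch below. It assumes the names are already colon-separated CPE-style strings; the normalization rule (lowercasing, turning spaces and hyphens into underscores) and the function names are illustrative assumptions, not part of any CPE specification.

```python
from collections import defaultdict

def normalize(name: str) -> str:
    """Lowercase each component and map spaces/hyphens to underscores.

    This rule is an assumption made for illustration only.
    """
    parts = name.strip().lower().split(":")
    return ":".join(p.strip().replace(" ", "_").replace("-", "_") for p in parts)

def find_duplicates(names):
    """Group submitted names that collapse to the same normalized form."""
    groups = defaultdict(list)
    for n in names:
        groups[normalize(n)].append(n)
    # Keep only the groups that contain more than one spelling.
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Anything this reports would still need human review; it only narrows down where to look.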

 

If Dave Waltermire and company need help setting up the web services, database and other plumbing, I may be able to contribute developer time to such a project.

 

Have a nice weekend!

 

Best regards,

Vladimir Giszpenc
DSCI Contractor Supporting
US Army CERDEC S&TCD IAD Tactical Network Protection Branch
(732) 532-8959


From: Ernest Park [mailto:[hidden email]]
Sent: Thursday, June 05, 2008 4:05 PM
To: [hidden email]
Subject: [CPE-DISCUSSION-LIST] OSS CPEs

 

I have a dictionary of a few hundred thousand OSS project names with
metadata and releases.

If someone writes the web service front end, I will publish all of this
to a database available to the service via the web. Basically, I have
most of open source software in CPE format.

Any volunteers?


Ernie

 

On Wed, Jun 4, 2008 at 1:15 PM, Buttner, Drew <[hidden email]>
wrote:


I like your approach here and this is a perfect use of CPE.  You have
created a schema for your database that uses the CPE Name to id
platform information.  This will theoretically allow others to interact
with your database using a CPE Name, or will allow you to interact with
other data sources via CPE Name.

The "alias" feature is right along the lines of what we discussed at
Developer Days.  Nice!

Thanks
Drew



Ernest Park-2

Re: OSS CPEs

The problem lies in adding yet more unofficial content. My research team has recognized expertise in open source software. Our contribution and experience warrant treating my work as "authoritative" for open source software. Doing so allows the database to grow quickly, despite possible disagreements. Not doing so leaves things exactly as they are.
 
If we accepted 100,000 products and the 1,000,000 releases as "beta", who plans to review this, using what rules and metrics, and over what time?
 
Why would we accept contributions from Symantec as official but not mine? Product and release names for open source software are more critical to my business than to any of the authoritative sources you currently have. Open source software provides no centralized source for data. My team and I look at each release, each license file, and validate all related information, and then keep it on file. We make every effort to be thorough and complete, since our work represents the contributions of thousands who are not doing this for themselves.
 
 
By having a flexible "alias" concept, the community can accept the names, and modify the names through aliasing to support new standards without undermining the volume contributions.
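
One way to picture the alias idea is a lookup table that maps any accepted name to its current canonical name. The sketch below is only an illustration; the class name, method names, and example strings are hypothetical and not taken from the CPE specification or any existing dictionary.

```python
class AliasTable:
    """Maps alias names to a canonical name (illustrative only)."""

    def __init__(self):
        self._canonical = {}  # alias -> name it currently points at

    def add_alias(self, alias: str, canonical: str) -> None:
        # Resolve the target first so that chains of aliases stay short.
        self._canonical[alias] = self.resolve(canonical)

    def resolve(self, name: str) -> str:
        """Follow aliases until a name with no further alias is reached."""
        seen = set()
        while name in self._canonical and name not in seen:
            seen.add(name)  # guards against accidental alias cycles
            name = self._canonical[name]
        return name
```

With this shape, a contributed name can be accepted as is and later pointed at a renamed entry by adding one alias, without rewriting the contributed records.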
 
 
We must invite volume contribution from trusted sources. What are the criteria that we apply to trust contributions as official from certain authoritative sources?
 
An example of my research is . . .
 
 
 
 
Ernie

On Fri, Jun 6, 2008 at 11:31 AM, Vladimir Giszpenc <[hidden email]> wrote:

 

1. NVD is the maintainer of the official CPE dictionary, and it would make sense to add these as beta/unvetted/unofficial content to that dictionary.

2. Verifying that none are duplicates or wrong in some other way is a large undertaking.

3. If I remember correctly, the CPE IDs are in CPE 1.0 format, so they would need to be transformed to 2.1.

 

If Dave Waltermire and company need help setting up the web services, database and other plumbing, I may be able to contribute developer time to such a project.

 

Have a nice weekend!

 

Best regards,

Vladimir Giszpenc
DSCI Contractor Supporting
US Army CERDEC S&TCD IAD Tactical Network Protection Branch
(732) 532-8959




Thomas R. Jones

Re: OSS CPEs

In reply to this post by Ernest Park-2
On Fri, 2008-06-06 at 16:26 -0400, Ernest Park wrote:

> Please keep in mind that I am deeply involved with managing and
> maintaining distinct records for millions of releases and billions of
> files and related components. I believe that what CPE represents is
> incredibly important.
>  
>  
> Comments below -
>
>
> On Fri, Jun 6, 2008 at 3:56 PM, Thomas R. Jones
> <[hidden email]> wrote:
>
>         Responses inline.  
>        
>         Sent from my iPhone
>         On Jun 6, 2008, at 2:21 PM, Ernest Park
>         <[hidden email]> wrote:
>        
>        
>        
>         > Hi Tom, notes inline.
>         >
>         >
>         > On Fri, Jun 6, 2008 at 2:51 PM, Thomas R. Jones
>         > <[hidden email]> wrote:
>         >
>         >         Hello ernest,
>         >        
>         >         I have a few reservations. First of all, I am one of
>         >         a small minority of open source researchers and
>         >         contributors to cpe. So I would like to extend a
>         >         welcome to you and your colleagues. Second, the vast
>         >         amount of contributions is almost disconcerting. I
>         >         am sure yourself and your colleagues have worked
>         >         diligently to provide a much needed service to this
>         >         community. And I for one thank you!
>         >        
>         >  
>         >        
>         >         However, what you propose is very difficult to
>         >         envision on such a scale. No one in the community,
>         >         that I know of, has had an opportunity to evaluate
>         >         the contributions proposed. This should be a
>         >         pre-requisite before anyone jumps on board. A view
>         >         of the database structure is vital.
>         >  
>         > Why do you need to view the data??
>        
>         The data is what is relevant. If I, and others that may
>         possibly contribute, are not allowed to have access to said
>         data then it is difficult to provide our support.  
>        
>         As an analogy, would you buy a car if you not only could not
>         see it but also not drive it?
>        
>  
>         There are many many reasons that any one of us may want to
>         obtain a subset of data.
>        
>  
>  
> The analogy is incorrect. The CPE, despite the discussions here, is
> intended by its own definition to be an identifier, a URI - like
> string. In your analogy, this merely means that if I were buying a
> car, I would want a license plate that distinctly identified my car.
> Any additional data would be stored in my car, separate from that
> record with the unique identifier.

No. The analogy is correct. How may I know that my paint job is in fact
a particular color if I may not see it? How do I know that my automobile
is in fact made by a particular automaker if I cannot see the emblem?
How may I be assured of particular safety features I rely on if I
cannot definitively say they are there for my use?
 
>  
> The problem when we make CPE into a complex database is that we blur
> so much the lines of what it is and is not that we dissuade
> contributions and usage by the community.

This is a political view and/or opinion that does not need to be brought
to light within the conversation. Lest you forget, I too am an open
source contributor. I know all too well the complexities of contributing
to a vendor-majority-sponsored standard. In fact I have done so through
many standards within the W3C and IEEE communities. But we try as much
as possible to reduce this kind of seclusion and segregation.
And I'll be honest in my opinion that MITRE and the individuals charged
with this project have done an outstanding job doing so! ;)

>  
> The CPE is a name that points to something, and with an inferred
> relational hierarchy in the name.
>  
> If I want to deploy a database that supports CPE 1.x query, you do NOT
> need to qualify the database. If I offer to provide, or keep secret,
> anything beyond those elements which distinctly confirm a valid name
> and its association with a distinct technology component, that should
> be sufficient.

But you are asking the community to put its faith in an infrastructure
that we have not seen. How can we do that? Is there an IP issue at
hand? I'm sure that anyone here would sign an NDA if need be. Or do we
just blindly go forth?

>  
>  
> When we try to make CPE something it is not, it will never be what it
> can be. If it is merely a naming identifier, it becomes a unification
> point for data from multiple providers. I could allow software
> companies to query my data. They may invite me to query theirs. The
> common unification is the name.
>  
> Nothing should matter to CPE beyond a valid name and association to a
> distinct element, any more than the DMV cares about what fuel you run in
> the car.
>  
>  
>        
>         > CPE is not a database or a schema. It is a string identifier
>         > format for distinct technology elements - nothing more. The
>         > idea at the end of the day is to provide a dictionary of
>         > names. The data underlying that is irrelevant, may be
>         > proprietary, and may have nothing to do with defining a
>         > name. I continually see the problem of CPE that we all fall
>         > into the mistake of making it something more than it is. CPE
>         > is a phone book - a set of distinct and human friendly
>         > identifiers for technology assets, nothing more.
>         >  
>         > If I can provide you with Vendor, Application, Title,
>         > Release, URL, maybe an MD5, as part of a query, then it is
>         > the result set you should be looking at.
>        
>         This statement relates to the first question. The subset IS
>         what is important. But how the data is obtained is also in
>         question. I simply would like to see the SQL structure. What
>         type of tables are utilized? Can they be easily restructured?
>         Are we inhibited by the structure to not provide future
>         advancements within the standard? May this data be replicated?
>         Does the SQL structure take into consideration
>         internationalization?
>        
>  
>  
> From the CPE homepage (http://cpe.mitre.org)
>  
>         CPE™ is a structured naming scheme for information technology
>         systems, platforms, and packages. Based upon the generic
>         syntax for Uniform Resource Identifiers (URI), CPE includes a
>         formal name format, a language for describing complex
>         platforms, a method for checking names against a system, and a
>         description format for binding text and tests to a name.
>          
>          
> There is no reference to SQL structure in the definition of CPE, nor
> a reference implementation. CPE is NOT a database or a data storage
> system of any kind. CPE does not denote a schema, but such information
> can be stored in a number of formats while still containing CPE
> compliant information.

True. But your data is housed in a database. If it were a simple text
file, as the current dictionary is, then we as a community would want to
see it and verify it. We would ensure that the character encoding is
sufficient for the community to process and that there is no structural
issue within the XML nodes that inhibits its use. It is no
different.

We, as a community (and I may be speaking only for myself here; I do not
presume to speak for all of the CPE community), would ensure that the data
quality and availability are intact. As an information security
organization, you surely understand the need for compliance with this
aspect of the CIA triad.

>  
> I am sure Symantec and McAfee store proprietary information along with
> having those components that support CPE in their data repositories,
> but they would no more open these databases to review than I will. If
> CPE is a name identifier constrained by elements, if I can provide the
> elements, perhaps:
>  
> vendor, URL, application, app home page, release, release file name
> and URL, MD5 for release file,
>  
>  
> any string containing components from above is an identifier.
>  
>  
>        
>         I could easily pose a few questions to you regarding the
>         database and informational manipulation if you would prefer.
>        
>         >
>         >
> What are the fields required in order to accept a third party
> contribution of a CPE name?
>  
>  
>         >  
>         > Also, if I can provide something that nobody else has
>         > provided, why not use it until it is contested? If not, the
>         > database is perpetually bottlenecked by a subjective
>         > approval process that due to realistic limitations will
>         > never grow as fast as the growth in new open source projects
>         > over any measure of time.
>        
>         I applaud you and your colleagues contribution. However a
>         standard MUST undergo an official review and proposal process.
>         Otherwise it is just another run-of-the-mill project to "put
>         out the fires" of today's problems.
>  
> If this process does not accept open participation from the community,
> with submission volume in step with the growth and expansion of the
> market that we are describing, the process is inherently flawed, and
> the open source community and commercial vendors will be compelled to
> solve this issue.

Open participation is more than welcome. I would be the first to welcome
such contribution, for it seems I have been a single open source voice
in a predominantly vendor-sponsored standard. However, this community
has made great strides in the CPE standard, and we will continue to do
so. There are a great many wonderful and committed people and
organizations here within this standard. We welcome your organization's
effort and contribution. I am very excited to see such progress.

However, you are missing the point that I am trying to convey: WE as a
community must develop and propagate the standard. This is a community
oriented standard. I am simply asking that we slow down a bit and review
the changes that you propose. Your proposal is of such vast magnitude
that we cannot simply go forth head first and accept it on a whim.

Both open source and vendor entities must be presented the possibilities
and subsequently review the proposal before any action can be taken.


>  
>        
>        
>         >
>         >  
>         > Naming open source is a problem that will require an open
>         > community approval process to function. The database needs
>         > to be able to grow as fast as possible, allowing voluminous
>         > contributions from certain trusted partners.
>        
>         The speed at which the cpe database "grows" is irrelevant. The
>         quality of the data that it possesses is of paramount
>         importance.
>  
> It is a self limiting repository that will become less relevant over
> time if it cannot effectively describe the "market" of objects that it
> represents. If it only describes a quality subset, then it becomes a
> flawed and subjective list, and will force the commercial market to
> come up with something faster, better, and able to adapt to the growth
> in certain parts of the technology market and our need to universally
> describe these pieces.
>  
>        
>        
>         As well, who determines what entails a "trusted partner"? How
>         is this status obtained? Who authorizes or denies such
>         claims?
>  
> Why do we accept information from a commercial vendor as being
> authoritative, yet professional open source and commercial software
> researchers do not get offered this trust?

This is a flawed presumption. Please take the time to review the
mailing list archives, as I have previously noted. There has been great
discussion over this exact topic.

>  
>        
>        
>         > It is my business to research and catalog open source
>         > software. My work is cited by every major analyst every
>         > week. Not discounting the work of your team, but I implore
>         > you to "qualify" certain contributors as "authoritative" in
>         > order to allow growth.
>        
>         I would love to engage in further discussions of an
>         "authoritative" entity. There have in fact been previous
>         discussions regarding the authoritative subject for open
>         source products. They should be available within the mailing list
>         archives. However, maybe a review and/or re-discussion is due.
>         I would happily contribute to such.
>        
>        
>         >
>  
> Please feel free to reach out to me privately for further discussion.
>  
> I can be reached at [hidden email] .

Thank you, Ernest. ;) I will most definitely place you within my address
book. I am sure we will converse much more in the near future.

However, I feel it is in the best interest of the community at this time
to ensure that all discussions related to the topic at hand are
presented in a candid and open manner for all to review, reflect and
hopefully comment on in the near future.

>         >  
Ken Lassesen-3

Re: OSS CPEs

In reply to this post by Ernest Park-2

I have the skills to do so, and can host the web service/website on a non-vendor-related domain (Lassesen.com or reddwarfdogs.com).

 

Some basic questions:

- What database are you using? If you can dump all of your data as XML, then it's a meaningless question.

- For updates to the database, what is your plan?

  - Update it manually via an interface on the website?

  - Upload a delta as XML?

 

Ken Lassesen,

Home/Office: 360-724-3190 Fax: 952-516-5077
Cell: 360-509-2402  Skype: Ken.Lassesen

IM: [hidden email]  http://www.linkedin.com/in/lassesen


 


Ernest Park-2

Re: OSS CPEs

To Ken - thanks! I will contact you for help to sort this out. I think that the CPE dictionary needs to be a real-time dynamic framework that conforms to a URL resolution of a name query. Such resolution would allow information providers to "append" metadata to any record in a uniform format, and would make clear that the primary reason for CPE to exist is to provide a distinct identifier for technology that can be further described; knowing the distinct name, such information can be shared and collaborated on.
 
----------------------------------------------------------------------
 
I think this is the right idea. I will discuss hosting with Drew. I certainly have the gear and domains to put this on a vendor-neutral site, but unless this is hosted on the "sanctioned" site, it is just Ken and Ernie posting a list.
 
 
---------------------------------------------------------------------------------------
 
If Drew says that my site, or Ken's site, or a new, unnamed site, will be the source for EVERYTHING, then it will work. We cannot decouple the open source content as being distinct from that which has a vendor. In practice, most if not all of the commercial software has some element of open source anyway. If we get smart at naming stuff, do we want to actually name as follows -
 
commercial product ->contains->open source product
 
In practice, some commercial products are actually aliases for an amalgam of open source components.
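
The "contains" relationship can be sketched as a simple mapping from a product name to the names of the components it bundles. The function and the data shape below are hypothetical, intended only to show how such a relationship could be walked once every piece has a distinct name.

```python
def all_components(contains: dict, product: str) -> set:
    """Return every name reachable from `product` via 'contains' links.

    `contains` maps a product name to an iterable of component names.
    """
    found, stack = set(), [product]
    while stack:
        for child in contains.get(stack.pop(), ()):
            if child not in found:  # skip names already visited
                found.add(child)
                stack.append(child)
    return found
```

For example, a commercial appliance that bundles a web server, which in turn bundles a crypto library, would list both under the appliance's name.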
 
 
 
The above is conceptual, but stresses the realistic importance of maintaining a singular, trusted and sanctioned source. Otherwise, my data is readily available and has been under a CC license for a year.
 
 
---------------------------------------------------------------------------------------
 
Regarding data, I store it across 4 MySQL databases in a few dozen tables. The CPE-friendly output is the result of a ten-way inner join. I could generate a join table that represents ONLY those fields that we need to construct a CPE name and validate it with an artifact, like a hash, a URL, a license file, etc. An XML schema works as well if we all agree on a simple schema for name sync, not data storage.
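
As a rough sketch of what such a join table would feed, the code below builds a name from vendor, product, and version fields and checks a release file against a stored MD5. The `cpe:/a:vendor:product:version` layout follows the CPE 2.x URI form as I understand it; the field names and the normalization rule are assumptions for illustration, not the actual schema.

```python
import hashlib

def component(value: str) -> str:
    """Lowercase a field and replace spaces with underscores (assumed rule)."""
    return value.strip().lower().replace(" ", "_")

def cpe_name(vendor: str, product: str, version: str, part: str = "a") -> str:
    """Build a CPE 2.x style URI name; part 'a' denotes an application."""
    return "cpe:/{}:{}:{}:{}".format(
        part, component(vendor), component(product), component(version)
    )

def artifact_matches(data: bytes, expected_md5: str) -> bool:
    """Validate a release file's bytes against the MD5 kept on record."""
    return hashlib.md5(data).hexdigest() == expected_md5.lower()
```

Only these few columns would need to leave the database; everything else behind the name can stay where it is.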
 
Granted, once you have the name, you can query my database across about 4 billion records to investigate trending, software usage, patterns, etc. By having a standard name, I can expose my web service to certain queries without just syncing my DB.
 
 
From what I have seen, I may currently have the single largest CPE compliant implementation. It needs endorsement from the community of users, automatic integration into the big database, and a facility with which we can query the data.
 
The data is currently maintained as updates to the database. I could either push XML updates, or sync tables, or push SQL changes.
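
The XML update option could be as small as a list of names added and removed since the last snapshot. The sketch below uses Python's standard `xml.etree.ElementTree`; the element names `cpe-delta`, `add`, and `remove` are made up for illustration and are not an agreed schema.

```python
import xml.etree.ElementTree as ET

def build_delta(old: set, new: set) -> str:
    """Serialize the difference between two sets of names as XML."""
    root = ET.Element("cpe-delta")  # hypothetical root element
    for name in sorted(new - old):
        ET.SubElement(root, "add", name=name)  # present only in the new set
    for name in sorted(old - new):
        ET.SubElement(root, "remove", name=name)  # dropped since last snapshot
    return ET.tostring(root, encoding="unicode")
```

A receiver would apply the `add` and `remove` entries to its own copy, so only the changes travel, not the full table.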
 
I am certain that the volume of records that I have may be fraught with inconsistencies and errors. However, the data has been copiously reviewed by a staff of 50, and is at least of quality equivalent to what we have. If we agree on a way to accept this data, perhaps we can agree on a way of accepting a "non-static" dictionary. If the dictionary were a dynamic point-in-time representation of our accumulated data, stored in a database or series of databases, queried by approved members through a secured web service, we can all collaboratively grow this data with fewer bottlenecks.
 
 
---------------------------------------------------------------------------------------
 
Trend Analysis -
 
Ten years ago, vulnerabilities reported against open source represented less than 30% of all issues.
Currently, they represent over 55%.
 
The linear trend will have 80% of all vulnerabilities reported against open source within 6 years.
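
To make the arithmetic behind a straight-line projection explicit, here is a small sketch. It takes the boundary figures at face value (30% ten years ago, 55% today), which is an assumption, since the figures above are stated as "less than" and "over".

```python
def years_to_reach(target, current, past, years_elapsed):
    """Years until `target` on a straight line through (past, current)."""
    slope = (current - past) / years_elapsed  # percentage points per year
    return (target - current) / slope

def past_share_implied(target, current, years_elapsed, years_ahead):
    """Past share a straight line needs in order to hit `target` on time."""
    slope = (target - current) / years_ahead
    return current - slope * years_elapsed

print(years_to_reach(80, 55, 30, 10))               # 10.0
print(round(past_share_implied(80, 55, 10, 6), 1))  # 13.3
```

At exactly 30% and 55% the line reaches 80% in ten years. Reaching it in six years needs a slope of about 4.2 points per year, which corresponds to a share of roughly 13% ten years ago, or a current share well above 55%.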
 
 
 
 
There are over 500,000 open source software projects worldwide. There are an average of 8 recognized releases per project, so with potentially 4,000,000 releases to be named, this is a large task.
 
A number of analysts quoting large corporate buyers have cited a trend that will be reflected within 5 years. What was confirmed is the reality that 80% of software in use by government and enterprise will be open source based, and 50 - 80% of that will be delivered as a web service - software kept on a remote server, and only the service experienced as the result of an interaction with a web browser.
 
In summary, this tells us that the importance that we currently put on vendor-supported names will have much less relevance in the real use of technology assets over the next half decade. If we don't embrace an understanding of the real impact of open source within our computing world, then CPE will continue to be primarily a naming system for commercial apps and those things that NVD finds.
 
 
On Fri, Jun 6, 2008 at 1:48 PM, Ken Lassesen <[hidden email]> wrote:

I have the skills to do so, and can host the web service/website on a non-vendor-related domain (Lassesen.com or reddwarfdogs.com).

 

Some basic questions:

- What database are you using? If you can dump all of your data as XML, then it's a meaningless question.

- For updates to the database, what is your plan?

  - Update it manually via an interface on the website?

  - Upload a delta as XML?

 

Ken Lassesen,

Home/Office: 360-724-3190 Fax: 952-516-5077
Cell: 360-509-2402  Skype: Ken.Lassesen

IM: [hidden email]  http://www.linkedin.com/in/lassesen


 



Thomas R. Jones

Re: OSS CPEs

Hello Ernest,

I have a few reservations. First of all, I am one of a small minority of open source researchers and contributors to CPE, so I would like to extend a welcome to you and your colleagues. Second, the vast amount of contributions is almost disconcerting. I am sure you and your colleagues have worked diligently to provide a much-needed service to this community. And I for one thank you!

However, what you propose is very difficult to envision on such a scale. No one in the community, that I know of, has had an opportunity to evaluate the contributions proposed. This should be a prerequisite before anyone jumps on board. A view of the database structure is vital. We, as you surely understand, are all putting valuable time into the standard. To facilitate further development of your proposal, we must be able to ensure that the "project" is not flawed in its design or structural constraints.

Furthermore, there should be a community discussion of the stewardship of such a project. The notion that "EVERYTHING" be authoritative through this project is ambitious but wholly flawed. There should be an overwhelming discussion of such aspects and subsequent requests before such proposals may be presented.

I look forward to seeing and hearing of the magnitude of contributions that you and your colleagues may provide. Thank you once again. 

Sent from my iPhone

On Jun 6, 2008, at 1:19 PM, Ernest Park <[hidden email]> wrote:

To Ken - thanks! I will contact you for help to sort this out. I think that the CPE dictionary needs to be a real-time dynamic framework that conforms to a URL resolution of a name query. Such resolution would allow information providers to "append" metadata to any record in a uniform format, and would make clear that the primary reason for CPE to exist is to provide a distinct identifier for technology that can be further described; knowing the distinct name, such information can be shared and collaborated on.
 
----------------------------------------------------------------------
 
I think this is the right idea. I will discuss hosting with Drew. I certainly have the gear and domains to put this on a vendor-neutral site, but unless this is hosted on the "sanctioned" site, it is just Ken and Ernie posting a list.
 
 
---------------------------------------------------------------------------------------
 
If Drew says that my site, or Ken's site, or a new, unnamed site, will be the source for EVERYTHING, then it will work. We cannot decouple the open source content as being distinct from that which has a vendor. In practice, most if not all of the commercial software has some element of open source anyway. If we get smart at naming stuff, do we want to actually name as follows -
 
commercial product ->contains->open source product
 
In practice, some commercial products are actually aliases for an amalgam of open source components.
 
 
 
The above is conceptual, but stresses the realistic importance of maintaining a singular, trusted and sanctioned source. Otherwise, my data is readily available and has been under a CC license for a year.
 
 
---------------------------------------------------------------------------------------
 
Regarding data, I store it across 4 MySQL databases in a few dozen tables. The CPE-friendly output is the result of a ten-way inner join. I could generate a join table that represents ONLY those fields that we need to construct a CPE name and validate it with an artifact, like a hash, a URL, a license file, etc. An XML schema works as well if we all agree on a simple schema for name sync, not data storage.
 
Granted, once you have the name, you can query my database across about 4 billion records to investigate trending, software usage, patterns, etc. By having a standard name, I can expose my web service to certain queries without just synching my DB.
 
 
From what I have seen, I may currently have the single largest CPE compliant implementation. It needs endorsement from the community of users, automatic integration into the big database, and a facility with which we can query the data.
 
The data is currently maintained as updates to the database. I could either push XML updates, or synch tables, or push SQL changes.
 
I am certain that the volume of records that I have may be fraught with inconsistencies and errors. However, the data has been copiously reviewed by a staff of 50, and is at least of quality equivalent to what we have. If we agree on a way to accept this data, perhaps we can agree on a way of accepting a "non-static" dictionary. If the dictionary were a dynamic point-in-time representation of our accumulated data, stored in a database or series of databases, queried by approved members through a secured web service, we could all collaboratively grow this data with fewer bottlenecks.
 
 
---------------------------------------------------------------------------------------
 
Trend Analysis -
 
10 years ago, open source reported vulnerabilities represented less than 30% of all issues
Currently, over 55%.
 
The linear trend will have 80% of all vulnerabilities reported against open source within 6 years.

http://gpl3.blogspot.com/2008/03/gpl-project-watch-list-for-week-of-0328.html

There are over 500,000 open source software projects worldwide. There are an average of 8 recognized releases per project, so with potentially 4,000,000 releases to be named, this is a large task.
 
A number of analysts quoting large corporate buyers have cited a trend that will be reflected within 5 years. What was confirmed is the reality that 80% of software in use by government and enterprise will be open source based, and 50 - 80% of that will be delivered as a web service - software kept on a remote server, and only the service experienced as the result of an interaction with a web browser.
 
In summary, this tells us that the importance that we currently put on vendor-supported names will have much less relevance in the real use of technology assets over the next half decade. If we don't embrace an understanding of the real impact of open source within our computing world, then CPE will continue to be primarily a naming system for commercial apps and those things that NVD finds.
 
 
On Fri, Jun 6, 2008 at 1:48 PM, Ken Lassesen <[hidden email]> wrote:

I have the skills to do so, and can host the web service/website on a non-vendor-related domain (Lassesen.com or reddwarfdogs.com).

Some basic questions:

·   What database are you using?  If you can dump all of your data as XML, then it's a meaningless question.
·   For updates to the database, what is your plan?
    o   Update it manually via an interface on the website?
    o   Upload a delta as XML?


Ken Lassesen,

Home/Office: 360-724-3190 Fax: 952-516-5077
Cell: 360-509-2402  Skype: Ken.Lassesen

IM: [hidden email]  http://www.linkedin.com/in/lassesen

CONFIDENTIALITY NOTICE

The information contained in this electronic message may contain confidential and privileged information and is intended only for use by the individual(s) or entity(ies) to whom it was addressed. Any unauthorized review, use, disclosure, or distribution of this communication is strictly prohibited. If you are not the intended recipient, please contact the sender by reply email and permanently delete and destroy the original message.

 


Andrew Buttner

Re: OSS CPEs

In reply to this post by Ernest Park-2
All,

I think this work is a huge help for CPE and will get us much further
down the road than where we are today.  But I'd like to scale back a
little of what I think I am reading.  CPE as a project is focused on
the naming specification and hosting the Official CPE Dictionary.  This
dictionary should be focused on providing a list of all known CPE
names, similar in scope to the CVE list.  Having these names available
to the community will enable external applications to stand up and
support added metadata.

What I think I am reading fits into two very different jobs going
forward.  First is the submission to the Official CPE Dictionary of the
CPE Names for the open source platforms you have knowledge about.
Second is work on an application outside of CPE that provides a
database (keyed off of CPE Name) of appended metadata.

Is this understanding correct?  If so, I would really like CPE as a
project to focus on the first step.  Agree?

Thanks!
Drew


>-----Original Message-----
>From: Ernest Park [mailto:[hidden email]]
>Sent: Friday, June 06, 2008 2:20 PM
>To: cpe-discussion-list CPE Community Forum
>Subject: Re: [CPE-DISCUSSION-LIST] OSS CPEs
>
>To Ken - thanks! I will contact you for help to sort this out. I think
>that the CPE dictionary needs to be a real time dynamic framework that
>conforms to a URL resolution of a name query. Such resolution would
>allow information providers to "append" metadata to any record in a
>uniform format, and clarify that the primary reason of CPE to exist is
>to provide a distinct identifier to technology that can be further
>described, and knowing the distinct name, such information can be
>shared and collaborated with.
>
>----------------------------------------------------------------------
>
>I think this is the right idea. I will discuss hosting with Drew. I
>certainly have the gear and domains to put this on a vendor neutral
>site, but unless this is hosted on the "sanctioned" site, it is just
>Ken and Ernie posting a list.
>
>----------------------------------------------------------------------
>
>If Drew says that my site, or Ken's site, or a new, unnamed site, will
>be the source for EVERYTHING, then it will work. We cannot decouple
>the open source content as being distinct from that which has a
>vendor. In practice, most if not all of the commercial software has
>some element of open source anyway. If we get smart at naming stuff,
>do we want to actually name as follows -
>
>commercial product ->contains->open source product
>
>In practice, some commercial products are actually aliases for an
>amalgam of open source components.
>
>The above is conceptual, but stresses the realistic importance of
>maintaining a singular, trusted and sanctioned source. Otherwise, my
>data is readily available and has been under a CC license for a year.
>
>----------------------------------------------------------------------
>
>Regarding data, I store it across 4 MySQL databases in a few dozen
>tables. The CPE friendly output is the result of a ten way inner
>join. I could generate a join table that represents ONLY those fields
>that we need to construct a CPE name and validate it with an artifact,
>like a hash, a URL, a license file, etc. An XML schema works as well
>if we all agree on a simple schema for name synch, not data storage.
>
>Granted, once you have the name, you can query my database across
>about 4 billion records to investigate trending, software usage,
>patterns, etc. By having a standard name, I can expose my web service
>to certain queries without just synching my DB.
>
>From what I have seen, I may currently have the single largest CPE
>compliant implementation. It needs endorsement from the community of
>users, automatic integration into the big database, and a facility
>with which we can query the data.
>
>The data is currently maintained as updates to the database. I could
>either push XML updates, or synch tables, or push SQL changes.
>
>I am certain that the volume of records that I have may be fraught
>with inconsistencies and errors. However, the data has been copiously
>reviewed by a staff of 50, and is at least of quality equivalent to
>what we have. If we agree on a way to accept this data, perhaps we can
>agree on a way of accepting a "non-static" dictionary. If the
>dictionary were a dynamic point in time representation of our
>accumulated data, stored in a database or series of databases, queried
>by approved members through a secured web service, we can all
>collaboratively grow this data with less bottlenecks.
>
>----------------------------------------------------------------------
>
>Trend Analysis -
>
>10 years ago, open source reported vulnerabilities represented less
>than 30% of all issues
>Currently, over 55%.
>
>The linear trend will have 80% of all vulnerabilities reported against
>open source within 6 years.
>
>http://gpl3.blogspot.com/2008/03/gpl-project-watch-list-for-week-of-0328.html
>
>There are over 500,000 open source software projects worldwide. There
>are an average of 8 recognized releases per project, so with
>potentially 4,000,000 releases to be named, this is a large task.
>
>A number of analysts quoting large corporate buyers have cited a trend
>that will be reflected within 5 years. What was confirmed is the
>reality that 80% of software in use by government and enterprise will
>be open source based, and 50 - 80% of that will be delivered as a web
>service - software kept on a remote server, and only the service
>experienced as the result of an interaction with a web browser.
>
>In summary, this tells us that the importance that we currently put on
>vendor supported names will have much less relevance in the real use
>of technology assets over the next half decade. If we don't embrace an
>understanding of the real impact of open source within our computing
>world, then CPE will continue to be primarily a naming system for
>commercial apps and those things that NVD finds.
>
Ernest Park-2

Re: OSS CPEs

In reply to this post by Thomas R. Jones
Hi Tom, notes inline.

On Fri, Jun 6, 2008 at 2:51 PM, Thomas R. Jones <[hidden email]> wrote:
Hello ernest,

I have a few reservations. First of all, I am one of a small minority of open source researchers and contributors to CPE, so I would like to extend a welcome to you and your colleagues. Second, the vast amount of contributions is almost disconcerting. I am sure you and your colleagues have worked diligently to provide a much-needed service to this community. And I for one thank you!
 

However, what you propose is very difficult to envision on such a scale. No one in the community, that I know of, has had an opportunity to evaluate the contributions proposed. This should be a prerequisite before anyone jumps on board. A view of the database structure is vital.
 
Why do you need to view the data?? CPE is not a database or a schema. It is a string identifier format for distinct technology elements - nothing more. The idea at the end of the day is to provide a dictionary of names. The data underlying that is irrelevant, may be proprietary, and may have nothing to do with defining a name. I continually see the problem of CPE that we all fall into the mistake of making it something more than it is. CPE is a phone book - a set of distinct and human friendly identifiers for technology assets, nothing more.
 
If I can provide you with Vendor, Application, Title, Release, URL, maybe an MD5, as part of a query, then it is the result set you should be looking at.
 
Also, if I can provide something that nobody else has provided, why not use it until it is contested? If not, the database is perpetually bottlenecked by a subjective approval process that due to realistic limitations will never grow as fast as the growth in new open source projects over any measure of time.
 
Naming open source is a problem that will require an open community approval process to function. The database needs to be able to grow as fast as possible, allowing voluminous contributions from certain trusted partners. It is my business to research and catalog open source software. My work is cited by every major analyst every week. Not discounting the work of your team, but I implore you to "qualify" certain contributors as "authoritative" in order to allow growth. Further, you should specify the minimum acceptable data required to satisfy a valid entry - like a naming API.
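
One way to picture a "minimum acceptable data" check for a submitted entry is sketched below. The field list follows the Vendor, Application, Title, Release, URL and MD5 fields mentioned above; which of them are mandatory, and the `NameSubmission` and `validation_errors` names, are assumptions for illustration rather than an agreed CPE requirement.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NameSubmission:
    """One candidate dictionary entry (hypothetical shape)."""
    vendor: str
    application: str
    release: str
    url: str
    title: str = ""
    md5: Optional[str] = None


def validation_errors(entry: NameSubmission) -> List[str]:
    """Return a list of problems; an empty list means the entry is acceptable."""
    errors = []
    # Assumed mandatory fields; title and md5 are treated as optional here.
    for field_name in ("vendor", "application", "release", "url"):
        if not getattr(entry, field_name).strip():
            errors.append(f"{field_name} is required")
    url = entry.url.strip()
    if url and not re.match(r"^https?://\S+$", url):
        errors.append("url must be an http(s) URL")
    if entry.md5 is not None and not re.fullmatch(r"[0-9a-fA-F]{32}", entry.md5):
        errors.append("md5 must be 32 hexadecimal characters")
    return errors


# Hypothetical submission used only to exercise the check.
candidate = NameSubmission(
    vendor="Example Project",
    application="Example Tool",
    release="1.0",
    url="http://example.org/tool/1.0",
)
print(validation_errors(candidate))
```

A check like this could run before any human review, so that reviewers only see entries that already meet the agreed minimum.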
 
We, as you surely understand, are all putting valuable time into the standard. To facilitate further development within your proposal, we must be able to ensure that the "project" is not flawed in its design or structural constraints.
 
 
 
My experience is that the community as a whole does a good job of cleaning and managing a system. By publishing it all, but in a community maintainable "wiki" format for name associations, the community can resolve the dictionary dynamically, without impeding its growth and immediate value.
 

Furthermore, there should be a community discussion of the stewardship of such a project. The notion that "EVERYTHING" be authoritative through this project is ambitious but wholly flawed. There should be an overwhelming discussion of such aspects and subsequent requests before such proposals may be presented.

I look forward to seeing and hearing of the magnitude of contributions that you and your colleagues may provide. Thank you once again. 

Sent from my iPhone


Ernest Park-2

Re: OSS CPEs

In reply to this post by Andrew Buttner

Yes -

1. The contribution from acknowledged "authoritative" sources for
volumes of CPE compliant identifiers is important. There needs to be a
format to identify certain contributors as authoritative. Second, there
needs to be a minimum requirement for what form the data needs to be
submitted in.

2. Value-add extended metadata is nice, but clearly beyond the scope of
what CPE is. Adding metadata to a CPE identifier is the benefit of a
broad CPE dictionary; it is not the responsibility of the dictionary to
evolve into a database. We should all be using CPE names so we can make
our apps talk to each other - like an inventory list from one product
driving rule and policy creation in another, and so on. I track
large amounts of extended metadata associated with CPEs. That data is
my value add, but the common element is the CPE name. It is my hope
that someone could use a reporting tool to query my data with their
inventory list and get rich and relevant information. Again, without
the common identifier, each integration would take manual mapping,
making reporting custom and difficult each time.
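
The integration described in point 2 can be sketched as a lookup keyed on the CPE name. The inventory, the metadata records and the `enrich_inventory` helper below are hypothetical examples, not real data or an existing API; the only thing the two sides share is the identifier.

```python
def enrich_inventory(inventory, metadata_by_cpe):
    """Pair each inventoried CPE name with its metadata record.

    Names with no record are paired with None, so gaps stay visible
    instead of being silently dropped.
    """
    return [(cpe_name, metadata_by_cpe.get(cpe_name)) for cpe_name in inventory]


# Hypothetical inventory produced by one tool.
inventory = [
    "cpe:/a:apache:http_server:2.2.8",
    "cpe:/a:example_vendor:example_tool:1.0",
]

# Hypothetical extended metadata held by another party, keyed by CPE name.
metadata_by_cpe = {
    "cpe:/a:apache:http_server:2.2.8": {"license": "Apache-2.0"},
}

for cpe_name, record in enrich_inventory(inventory, metadata_by_cpe):
    print(cpe_name, record)
```

Without the shared name, the dictionary lookup above would have to be replaced by a hand-built mapping for every pair of tools.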

Even if some contributed content is not perfect, it is better than
nothing, and by attracting usage and more eyes to a broader source of
names, it will quickly get closer to the goal.

On Fri, Jun 6, 2008 at 3:18 PM, Buttner, Drew <[hidden email]> wrote:
All,

I think this work is a huge help for CPE and will get us much further
down the road than where we are today.  But I'd like to scale back a
little of what I think I am reading.  CPE as a project is focused on
the naming specification and hosting the Official CPE Dictionary.  This
dictionary should be focused on providing a list of all known CPE
names, similar in scope to the CVE list.  Having these names available
to the community will enable external applications to stand up and
support added metadata.

What I think I am reading fits into two very different jobs going
forward.  First is the submission to the Official CPE Dictionary of the
CPE Names for the open source platforms you have knowledge about.
Second is work on an application outside of CPE that provides a
database (keyed off of CPE Name) of appended metadata.

Is this understanding correct?  If so, I would really like CPE as a
project to focus on the first step.  Agree?

Thanks!
Drew

 
Thomas R. Jones

Re: OSS CPEs

In reply to this post by Ernest Park-2
Responses inline. 

Sent from my iPhone

On Jun 6, 2008, at 2:21 PM, Ernest Park <[hidden email]> wrote:

Hi Tom, notes inline.

On Fri, Jun 6, 2008 at 2:51 PM, Thomas R. Jones <[hidden email]> wrote:
Hello ernest,

I have a few reservations. First of all, I am one of a small minority of open source researchers and contributors to CPE, so I would like to extend a welcome to you and your colleagues. Second, the vast amount of contributions is almost disconcerting. I am sure you and your colleagues have worked diligently to provide a much-needed service to this community. And I for one thank you!
 

However, what you propose is very difficult to envision on such a scale. No one in the community, that I know of, has had an opportunity to evaluate the contributions proposed. This should be a prerequisite before anyone jumps on board. A view of the database structure is vital.
 
Why do you need to view the data??

The data is what is relevant. If I, and others that may possibly contribute, are not allowed to have access to said data then it is difficult to provide our support. 

As an analogy, would you buy a car if you not only could not see it but also not drive it? 

There are many many reasons that any one of us may want to obtain a subset of data. 

CPE is not a database or a schema. It is a string identifier format for distinct technology elements - nothing more. The idea at the end of the day is to provide a dictionary of names. The data underlying that is irrelevant, may be proprietary, and may have nothing to do with defining a name. I continually see the problem of CPE that we all fall into the mistake of making it something more than it is. CPE is a phone book - a set of distinct and human friendly identifiers for technology assets, nothing more.
 
If I can provide you with Vendor, Application, Title, Release, URL, maybe an MD5, as part of a query, then it is the result set you should be looking at.

This statement relates to the first question. The subset IS what is important. But how the data is obtained is also in question. I simply would like to see the SQL structure. What type of tables are utilized? Can they be easily restructured? Are we inhibited by the structure to not provide future advancements within the standard? May this data be replicated? Does the SQL structure take into consideration internationalization?

I could easily pose a few questions to you regarding the database and informational manipulation if you would prefer.

 
Also, if I can provide something that nobody else has provided, why not use it until it is contested? If not, the database is perpetually bottlenecked by a subjective approval process that due to realistic limitations will never grow as fast as the growth in new open source projects over any measure of time.

I applaud your and your colleagues' contribution. However, a standard MUST undergo an official review and proposal process. Otherwise it is just another run-of-the-mill project to "put out the fires" of today's problems.

 
Naming open source is a problem that will require an open community approval process to function. The database needs to be able to grow as fast as possible, allowing voluminous contributions from certain trusted partners.

The speed at which the CPE database "grows" is irrelevant. The quality of the data that it possesses is of paramount importance.

As well, who determines what entails a "trusted partner"? How is this status obtained? Who authorizes or denies such claims? 

It is my business to research and catalog open source software. My work is cited by every major analyst every week. Not discounting the work of your team, but I implore you to "qualify" certain contributors as "authoritative" in order to allow growth.

I would love to engage in further discussions of an "authoritative" entity. There have in fact been previous discussions regarding the authoritative subject for open source products. They should be available within the mailing list archives. However, maybe a review and/or re-discussion is due. I would happily contribute to such.

Further, you should specify the minimum acceptable data required to satisfy a valid entry - like a naming API.
 
We, as you surely understand, are all putting valuable time into the standard. To facilitate further development within your proposal, we must be able to ensure that the "project" is not flawed in its design or structural constraints.
 
 
 
My experience is that the community as a whole does a good job of cleaning and managing a system. By publishing it all, but in a community maintainable "wiki" format for name associations, the community can resolve the dictionary dynamically, without impeding its growth and immediate value.
 

Furthermore, there should be a community discussion of the stewardship of such a project. The notion that "EVERYTHING" be authoritative through this project is ambitious but wholly flawed. There should be an overwhelming discussion of such aspects and subsequent requests before such proposals may be presented.

I look forward to seeing and hearing of the magnitude of contributions that you and your colleagues may provide. Thank you once again. 

Sent from my iPhone

On Jun 6, 2008, at 1:19 PM, Ernest Park <[hidden email]> wrote:

To Ken - thanks! I will contact you for help to sort this out. I think that the CPE dictionary needs to be a real time dynamic framework that conforms to a URL resolution of a name query. Such resolution would allow information providers to "append" metadata to any record in a uniform format, and clarify that the primary reason of CPE to exist is to provide a distinct identifier to technology that can be further described, and knowing the distinct name, such information can be shared and collaborated with.
 
----------------------------------------------------------------------
 
I think this is the right idea. I will discuss hosting with Drew. I certainly have the gear and domains to put this on a vendor neutral site, but unless this is hosted on the "sanctioned" site, it is just Ken and Ernie posting a list.
 
 
---------------------------------------------------------------------------------------
 
If Drew says that my site, or Ken's site, or a new, unnamed site, will be the source for EVERYTHING, then it will work. We cannot decouple the open source content as being distinct from that which has a vendor. In practice, most if not all of the commercial software has some element of open source anyway. If we get smart at naming stuff, do we want to actually name as follows -
 
commercial product ->contains->open source product
 
In practice, some commercial products are actually aliases for an amalgam of open source components.
 
 
 
The above is conceptual, but stresses the realistic importance of maintaining a singular, trusted and sanctioned source. Otherwise, my data is readily available and has been under a CC license for a year.
 
 
---------------------------------------------------------------------------------------
 
Regarding data, I store it across 4 MySQL databases in a few dozen tables. The CPE-friendly output is the result of a ten-way inner join. I could generate a join table that represents ONLY those fields that we need to construct a CPE name and validate it with an artifact, like a hash, a URL, a license file, etc. An XML schema works as well if we all agree on a simple schema for name synch, not data storage.
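
As a rough sketch of what one row of such a join table could feed into, the Python below assembles a URI-style CPE name from a few fields and checks a release file against an MD5. It assumes the CPE 2.x URI layout (cpe:/part:vendor:product:version); the function names, the field names and the normalization rule are illustrative assumptions, not the columns of the database described above and not the full rules of the CPE specification.

```python
import hashlib
import re
from urllib.parse import quote


def cpe_component(value: str) -> str:
    """Lowercase a field, turn whitespace into underscores, percent-encode the rest.

    This is a simplified normalization chosen for the example only.
    """
    cleaned = re.sub(r"\s+", "_", value.strip().lower())
    return quote(cleaned, safe="._-~")


def build_cpe_name(vendor: str, product: str, version: str = "", part: str = "a") -> str:
    """Build a URI-style name of the form cpe:/part:vendor:product:version."""
    components = [part, cpe_component(vendor), cpe_component(product), cpe_component(version)]
    # Trailing empty components are dropped so the name does not end in ':'.
    while components and components[-1] == "":
        components.pop()
    return "cpe:/" + ":".join(components)


def artifact_matches(data: bytes, expected_md5: str) -> bool:
    """Return True if the MD5 of the artifact bytes equals the recorded hash."""
    return hashlib.md5(data).hexdigest() == expected_md5.lower()


# Hypothetical row, as it might come out of the flattened join table.
row = {"vendor": "Apache", "product": "HTTP Server", "version": "2.2.8"}
print(build_cpe_name(row["vendor"], row["product"], row["version"]))
# prints cpe:/a:apache:http_server:2.2.8
```

The point of the sketch is that the name needs only a handful of columns, while the hash check ties that name to a concrete artifact without exposing anything else in the underlying tables.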
 
Granted, once you have the name, you can query my database across about 4 billion records to investigate trending, software usage, patterns, etc. By having a standard name, I can expose my web service to certain queries without just synching my DB.
 
 
From what I have seen, I may currently have the single largest CPE compliant implementation. It needs endorsement from the community of users, automatic integration into the big database, and a facility with which we can query the data.
 
The data is currently maintained as updates to the database. I could either push XML updates, or synch tables, or push SQL changes.
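
For the XML option, a pushed update could be as small as a list of names added and removed since the last synch. The sketch below is only one possible shape: the element names (cpe-delta, add, remove) and both helper functions are hypothetical and are not part of any agreed CPE schema.

```python
import xml.etree.ElementTree as ET


def diff_names(previous, current):
    """Return (added, removed) sets of names between two snapshots."""
    previous, current = set(previous), set(current)
    return current - previous, previous - current


def build_delta(added, removed):
    """Serialize the added and removed names as a small XML document."""
    root = ET.Element("cpe-delta")  # hypothetical element names throughout
    for name in sorted(added):
        ET.SubElement(root, "add", {"name": name})
    for name in sorted(removed):
        ET.SubElement(root, "remove", {"name": name})
    return ET.tostring(root, encoding="unicode")


# Hypothetical snapshots of the dictionary before and after an update.
old_names = ["cpe:/a:example:tool:1.0"]
new_names = ["cpe:/a:example:tool:1.0", "cpe:/a:example:tool:1.1"]
added, removed = diff_names(old_names, new_names)
print(build_delta(added, removed))
```

A receiver would only need to agree on this one document shape, which keeps the synch question separate from how either side stores its data.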
 
I am certain that the volume of records that I have may be fraught with inconsistencies and errors. However, the data has been copiously reviewed by a staff of 50, and is at least of quality equivalent to what we have. If we agree on a way to accept this data, perhaps we can agree on a way of accepting a "non-static" dictionary. If the dictionary were a dynamic point-in-time representation of our accumulated data, stored in a database or series of databases, queried by approved members through a secured web service, we can all collaboratively grow this data with fewer bottlenecks.
 
 
---------------------------------------------------------------------------------------
 
Trend Analysis -
 
10 years ago, open source reported vulnerabilities represented less than 30% of all issues
Currently, over 55%.
 
The linear trend will have 80% of all vulnerabilities reported against open source within 6 years.
 
 
 
 
There are over 500,000 open source software projects worldwide. There are an average of 8 recognized releases per project, so with potentially 4,000,000 releases to be named, this is a large task.
 
A number of analysts quoting large corporate buyers have cited a trend that will be reflected within 5 years. What was confirmed is the reality that 80% of software in use by government and enterprise will be open source based, and 50 - 80% of that will be delivered as a web service - software kept on a remote server, and only the service experienced as the result of an interaction with a web browser.
 
In summary, this tells us that the importance that we currently put on vendor supported names will have much less relevance in the real use of technology assets over the next half decade. If we don't embrace an understanding of the real impact of open source within our computing world, then CPE will continue to be primarily a naming system for commercial apps and those things that NVD finds.
 
 
On Fri, Jun 6, 2008 at 1:48 PM, Ken Lassesen <[hidden email]> wrote:

I have the skills to do so --- and can host the webservice/website  on a non-vendor related domain (Lassesen.com  OR reddwarfdogs.com )

 

Some basic questions:

·         What database are you using?  If you can dump all of your data as XML, then it's a meaningless question

·         For updates to the database what is your plan?

o   Update it manually via an interface on the website?

o   Upload a delta as Xml?


 

Ken Lassesen,

Home/Office: 360-724-3190 Fax: 952-516-5077
Cell: 360-509-2402  Skype: Ken.Lassesen

IM: [hidden email]  http://www.linkedin.com/in/lassesen

CONFIDENTIALITY NOTICE

The information contained in this electronic message may contain confidential and privileged information and is intended only for use by the individual(s) or entity(ies) to whom it was addressed. Any unauthorized review, use, disclosure, or distribution of this communication is strictly prohibited. If you are not the intended recipient, please contact the sender by reply email and permanently delete and destroy the original message.

 

From: Ernest Park [mailto:[hidden email]]
Sent: Thursday, June 05, 2008 1:05 PM
To: [hidden email]
Subject: [CPE-DISCUSSION-LIST] OSS CPEs

 

I have a dictionary of a few hundred thousand OSS project names with
metadata and releases.

If someone writes the web service front end, I will publish all of this
to a database available to the service via the web. Basically, I have
most of open source software in CPE format.

Any volunteers?


Ernie

 

On Wed, Jun 4, 2008 at 1:15 PM, Buttner, Drew <[hidden email]>
wrote:


I like your approach here and this is a perfect use of CPE.  You have
created a schema for your database that uses the CPE Name to id
platform information.  This will theoretically allow others to interact
with your database using a CPE Name, or will allow you to interact with
other data sources via CPE Name.

The "alias" feature is right along the lines of what we discussed at
Developer Days.  Nice!

Thanks
Drew






Ernest Park-2

Re: OSS CPEs

Please keep in mind that I am deeply involved with managing and maintaining distinct records for millions of releases and billions of files and related components. I believe that what CPE represents is incredibly important.
 
 
Comments below -

On Fri, Jun 6, 2008 at 3:56 PM, Thomas R. Jones <[hidden email]> wrote:
Responses inline. 
Sent from my iPhone
On Jun 6, 2008, at 2:21 PM, Ernest Park <[hidden email]> wrote:
Hi Tom, notes inline.
On Fri, Jun 6, 2008 at 2:51 PM, Thomas R. Jones <[hidden email]> wrote:
Hello ernest,
I have a few reservations. First of all, I am one of a small minority of open source researchers and contributors to CPE. So I would like to extend a welcome to you and your colleagues. Second, the vast amount of contributions is almost disconcerting. I am sure you and your colleagues have worked diligently to provide a much needed service to this community. And I for one thank you!
 
However, what you propose is very difficult to envision on such a scale. No one in the community, that I know of, has had an opportunity to evaluate the contributions proposed. This should be a pre-requisite before anyone jumps on board. A view of the database structure is vital.
 
Why do you need to view the data??
The data is what is relevant. If I, and others that may possibly contribute, are not allowed to have access to said data then it is difficult to provide our support. 
As an analogy, would you buy a car if you not only could not see it but also not drive it? 
 
There are many many reasons that any one of us may want to obtain a subset of data. 
 
 
The analogy is incorrect. The CPE, despite the discussions here, is intended by its own definition to be an identifier, a URI-like string. In your analogy, this merely means that if I were buying a car, I would want a license plate that distinctly identified my car. Any additional data would be stored in my car, separate from that record with the unique identifier.
 
The problem when we make CPE into a complex database is that we blur the lines of what it is and is not so much that we dissuade contributions and usage by the community.
 
The CPE is a name that points to something, with an inferred relational hierarchy in the name.
 
If I want to deploy a database that supports CPE 1.x query, you do NOT need to qualify the database. If I offer to provide, or keep secret, anything beyond those elements which distinctly confirm a valid name and its association with a distinct technology component, that should be sufficient.
 
 
When we try to make CPE something it is not, it will never be what it can be. If it is merely a naming identifier, it becomes a unification point for data from multiple providers. I could allow software companies to query my data. They may invite me to query theirs. The common unification is the name.
 
Nothing should matter to CPE beyond a valid name and its association with a distinct element, any more than the DMV cares about what fuel you run in the car.
 
 
CPE is not a database or a schema. It is a string identifier format for distinct technology elements - nothing more. The idea at the end of the day is to provide a dictionary of names. The data underlying that is irrelevant, may be proprietary, and may have nothing to do with defining a name. I continually see the problem of CPE that we all fall into the mistake of making it something more than it is. CPE is a phone book - a set of distinct and human friendly identifiers for technology assets, nothing more.
 
If I can provide you with Vendor, Application, Title, Release, URL, maybe an MD5, as part of a query, then it is the result set you should be looking at.
This statement relates to the first question. The subset IS what is important. But how the data is obtained is also in question. I simply would like to see the SQL structure. What type of tables are utilized? Can they be easily restructured? Are we inhibited by the structure to not provide future advancements within the standard? May this data be replicated? Does the SQL structure take into consideration internationalization?
 
 
From the CPE homepage (http://cpe.mitre.org)
 
CPE™ is a structured naming scheme for information technology systems, platforms, and packages. Based upon the generic syntax for Uniform Resource Identifiers (URI), CPE includes a formal name format, a language for describing complex platforms, a method for checking names against a system, and a description format for binding text and tests to a name.
 
 
There is no reference to SQL structure in the definition of CPE, nor a reference implementation. CPE is NOT a database or a data storage system of any kind. CPE does not denote a schema, but such information can be stored in a number of formats while still containing CPE compliant information.
 
I am sure Symantec and McAfee store proprietary information along with having those components that support CPE in their data repositories, but they would no more open these databases to review than I will. If CPE is a name identifier constrained by elements, if I can provide the elements, perhaps:
 
vendor, URL, application, app home page, release, release file name and URL, MD5 for release file,
 
 
any string containing components from above is an identifier.
 
 
I could easily pose a few questions to you regarding the database and informational manipulation if you would prefer.
What are the fields required in order to accept a third party contribution of a CPE name?
 
 
 
Also, if I can provide something that nobody else has provided, why not use it until it is contested? If not, the database is perpetually bottlenecked by a subjective approval process that due to realistic limitations will never grow as fast as the growth in new open source projects over any measure of time.
I applaud you and your colleagues' contribution. However a standard MUST undergo an official review and proposal process. Otherwise it is just another run-of-the-mill project to "put out the fires" of today's problems.
 
If this process does not accept open participation from the community at a submission volume commensurate with the growth and expansion of the market that we are describing, the process is inherently flawed, and the open source community and commercial vendors will be compelled to solve this issue.
 
 
Naming open source is a problem that will require an open community approval process to function. The database needs to be able to grow as fast as possible, allowing voluminous contributions from certain trusted partners.
The speed at which the cpe database "grows" is irrelevant. The quality of the data that it possesses is of paramount importance. 
 
It is a self-limiting repository that will become less relevant over time if it cannot effectively describe the "market" of objects that it represents. If it only describes a quality subset, then it becomes a flawed and subjective list, and will force the commercial market to come up with something faster, better, and able to adapt to the growth in certain parts of the technology market and our need to universally describe these pieces.
 
As well, who determines what entails a "trusted partner"? How is this status obtained? Who authorizes or denies such claims? 
 
Why do we accept information from a commercial vendor as being authoritative, yet professional open source and commercial software researchers do not get offered this trust?
 
It is my business to research and catalog open source software. My work is cited by every major analyst every week. Not discounting the work of your team, but I implore you to "qualify" certain contributors as "authoritative" in order to allow growth.
I would love to engage in further discussions of an "authoritative" entity. There have in fact been previous discussions regarding the authoritative subject for open source products. They should be available within the mailing list archives. However, maybe a review and/or re-discussion is due. I would happily contribute to such.
 
Please feel free to reach out to me privately for further discussion.
 
I can be reached at [hidden email] .
 
Ernest Park-2

Re: OSS CPEs

In reply to this post by Thomas R. Jones
Hi Tom,
 
 
Exactly what data elements are required in order to satisfactorily deliver a contribution for a single CPE name?
 
 
  1. Vendor
  2. Vendor URL
  3. Application Name
  4. Title
  5. Release
What else?
 
I will send you fully qualified name strings in a data format specified, or I will populate a SQL database if you can provide a standardized table format.
 
 
Keep in mind, part of the data is proprietary. If I extract the data and decouple it from proprietary information in a way that satisfies CPE contribution requirements, I can do that.
 
Tom, what is missing is that we want to look under the hood without defining what is being looked for. Instead, give me a clear and definite format - comma separated, SQL, etc, and I will send a sample of compliant data for review.
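
To make the request concrete, a comma separated contribution could look something like the output of the sketch below. The column set is simply the five elements listed above; the class name, the helper functions and the sample values are hypothetical, and the actual required fields are exactly what this message asks the list to define.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass
class CpeContribution:
    """One candidate record; the columns mirror the five elements listed above."""
    vendor: str
    vendor_url: str
    application_name: str
    title: str
    release: str


def missing_fields(record):
    """Return the names of fields that are empty or whitespace only."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


def to_csv(records):
    """Write a header row plus one line per record and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([f.name for f in fields(CpeContribution)])
    for record in records:
        writer.writerow(astuple(record))
    return buffer.getvalue()


# Hypothetical sample row, not taken from the real data set.
sample = CpeContribution("Example Project", "http://example.org",
                         "exampled", "Example Daemon", "1.0")
if not missing_fields(sample):
    print(to_csv([sample]))
```

Agreeing on the header row first would let a sample of compliant data be reviewed without anyone needing to see the tables it was extracted from.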
 
 
 
Ernie
 
 
 
On Fri, Jun 6, 2008 at 12:42 PM, Thomas R. Jones <[hidden email]> wrote:
On Fri, 2008-06-06 at 16:26 -0400, Ernest Park wrote:
> Please keep in mind that I am deeply involved with managing and
> maintaining distinct records for millions of releases and billions of
> files and related components. I believe that what CPE represents is
> incredibly important.
>
>
> Comments below -
>
>
> On Fri, Jun 6, 2008 at 3:56 PM, Thomas R. Jones
> <[hidden email]> wrote:
>
>         Responses inline.
>
>         Sent from my iPhone
>         On Jun 6, 2008, at 2:21 PM, Ernest Park
>         <[hidden email]> wrote:
>
>
>
>         > Hi Tom, notes inline.
>         >
>         >
>         > On Fri, Jun 6, 2008 at 2:51 PM, Thomas R. Jones
>         > <[hidden email]> wrote:
>         >
>         >         Hello ernest,
>         >
>         >         I have a few reservations. First of all, I am one of
>         >         a small minority of open source researchers and
>         >         contributors to cpe. So I would like to extend a
>         >         welcome to you and your colleagues. Second, the vast
>         >         amount of contributions is almost disconcerting. I
>         >         am sure you and your colleagues have worked
>         >         diligently to provide a much needed service to this
>         >         community. And I for one thank you!
>         >
>         >
>         >
>         >         However, what you propose is very difficult to
>         >         envision on such a scale. No one in the community,
>         >         that I know of, has had an opportunity to evaluate
>         >         the contributions proposed. This should be a
>         >         pre-requisite before anyone jumps on board. A view
>         >         of the database structure is vital.
>         >
>         > Why do you need to view the data??
>
>         The data is what is relevant. If I, and others that may
>         possibly contribute, are not allowed to have access to said
>         data then it is difficult to provide our support.
>
>         As an analogy, would you buy a car if you not only could not
>         see it but also not drive it?
>
>
>         There are many many reasons that any one of us may want to
>         obtain a subset of data.
>
>
>
> The analogy is incorrect. The CPE, despite the discussions here, is
> intended by its own definition to be an identifier, a URI-like
> string. In your analogy, this merely means that if I were buying a
> car, I would want a license plate that distinctly identified my car.
> Any additional data would be stored in my car, separate from that
> record with the unique identifier.

No. The analogy is correct. How may I know that my paint job is in fact
a particular color if I may not see it? How do I know that my automobile
is in fact made by a particular automaker if I cannot see the emblem?
How may I be assured of the particular safety features I rely on if I
cannot definitively say they are there for my use?

>
> The problem when we make CPE into a complex database is that we blur
> so much the lines of what it is and is not that we dissuade
> contributions and usage by the community.

This is a political view and/or opinion that does not need to be brought
to light within the conversation. Lest you forget, I too am an open
source contributor. I know all too well the complexities of contributing
to a vendor majority sponsored standard. In fact I have done so through
many standards within the W3C and IEEE communities. But we try as much
as possible to reduce the amount of seclusion and segregation such as this.
And I'll be honest in my opinion that Mitre and the individuals charged
with this project have done an outstanding job doing so! ;)

>
> The CPE is a name that points to something, and with an inferred
> relational hierarchy in the name.
>
> If I want to deploy a database that supports CPE 1.x query, you do NOT
> need to qualify the database. If I offer to provide, or keep secret,
> anything beyond those elements which distinctly confirm a valid name
> and its association with a distinct technology component, that should
> be sufficient.

But you are asking the community to put forth faith in an infrastructure
that we have not seen. How can we do that? Is it an IP issue that may be
at hand? I'm sure that anyone here would put forth signatory recognition
of an NDA if need be. Or do we just blindly go forth?

>
>
> When we try to make CPE something it is not, it will never be what it
> can be. If it is merely a naming identifier, it becomes a unification
> point for data from multiple providers. I could allow software
> companies to query my data. They may invite me to query theirs. The
> common unification is the name.
>
> Nothing should matter to CPE beyond a valid name and its association
> with a distinct element, any more than the DMV cares about what fuel
> you run in the car.
>
>
>
>         > CPE is not a database or a schema. It is a string identifier
>         > format for distinct technology elements - nothing more. The
>         > idea at the end of the day is to provide a dictionary of
>         > names. The data underlying that is irrelevant, may be
>         > proprietary, and may have nothing to do with defining a
>         > name. I continually see the problem of CPE that we all fall
>         > into the mistake of making it something more than it is. CPE
>         > is a phone book - a set of distinct and human friendly
>         > identifiers for technology assets, nothing more.
>         >
>         > If I can provide you with Vendor, Application, Title,
>         > Release, URL, maybe an MD5, as part of a query, then it is
>         > the result set you should be looking at.
>
>         This statement relates to the first question. The subset IS
>         what is important. But how the data is obtained is also in
>         question. I simply would like to see the SQL structure. What
>         type of tables are utilized? Can they be easily restructured?
>         Are we inhibited by the structure to not provide future
>         advancements within the standard? May this data be replicated?
>         Does the SQL structure take into consideration
>         internationalization?
>
>
>