Steven M. Christey-2
All,
We've significantly modified the schema for Draft 9. The primary driver was to improve support for multiple views, and to better distinguish between the different types of elements that we are covering in CWE. Thanks to Sean Barnum for figuring out the bulk of this. The MITRE team took his inputs and made some small tweaks here and there.

As CWE is at a crossroads with respect to the schema, we welcome any feedback or alternatives to our current approaches. Specifically, while we have chosen XML so far, we are open to other techniques for storing and working with the data, if those techniques are more effective. For example, if it makes sense to store CWE in a database and use an application server to help present and link everything together, we are open to pursuing that. We also plan to investigate RDF, XGMML, and other languages that might be more directly supportive of graph-based relationships.

Please note that even if we stay with XML and related technologies, we expect that the schema will still need to change a little bit. However, we believe that one requirement for "CWE 1.0" is to have a stable schema. In Draft 9, we are definitely a lot closer than we were. Bob and I will post our requirements for "CWE 1.0" once they've been finalized.

For Draft 9, some of the highest level schema changes are covered here:

http://cwe.mitre.org/data/reports/diff_xsd_10_3.0.html

The rest of this document assumes that you have read that report.

Any and all feedback would be appreciated, especially if there are still outstanding issues in the schema that prevent you from using CWE as extensively as you would want to.

Schema Evaluation Criteria
--------------------------

Here are some of the criteria that I think we should be applying while finalizing the schema:

- Expressiveness: we should be able to express everything that we want to.
  In Draft 9, some examples of this are the creation of explicit views, and the requirement for relationships to specify the views they are part of. But, we still don't have a way of saying things like "this issue theoretically affects any language that performs direct memory management, but it's especially common in C." That's important, because if C is not explicitly mentioned in an element, then that element won't be part of the C language view.

- Extraction: it should be as easy as possible for CWE users to extract the data that they want, using commonly available XML parsers and related tools. In Draft 9, the relevant data for named chains are not necessarily easy to extract.

- Maintenance

  - minimize maintenance costs: the MITRE team, and outside contributors, should be able to quickly represent the necessary information.

  - minimize preventable errors in data entry: we want to minimize the errors that an XML validator cannot catch, but that still leave the CWE representation inconsistent.

  - minimize XML "bloat": this is hopefully self-explanatory. The relationships in Draft 9 might exhibit some bloat, although at the same time, there's a major benefit to their increased expressiveness.

- Flexibility: ideally, the schema would remain stable, while allowing us to build in additional capabilities. For Draft 9, we believe that we've added flexibility for defining new kinds of relationships and views. The introduction of compound elements will hopefully allow us to support other kinds of concepts besides chains and composites that might arise in the future; for example, some CWE nodes are really talking about multiple distinct issues and could be called "loose composites."

In light of these criteria, I wanted to explain some of the rationale for the schema changes, and what we have left ahead of us for CWE 1.0.

Views
-----

We added a number of views to CWE Draft 9.
For the most part, this involved converting weakness/"groupings" from Draft 8 into the new Views type for Draft 9. See http://cwe.mitre.org/data/index.html for a list of views.

Slices are basically lists of elements, without any relationships between them. Membership in a slice can be explicit or implicit.

In explicit slices, all the relevant entries have some ChildOf relationship where the View node is the parent; see CWE-630 (Weaknesses Examined by SAMATE) and CWE-635 (Weaknesses Used by NVD) for examples.

In implicit slices, the slice has some filtering criteria that define membership, and there aren't any relationships within the XML that are explicitly defined. For example, CWE-658 is a slice that covers weaknesses found in the C language. This implicit slice has a Filter that specifies that member entries have "C" under the Applicable_Platforms field. The Comprehensive CWE Dictionary view, CWE-2000, is actually an implicit slice that selects everything from CWE by using a filter that always returns true.

Views can also be graphs, such as CWE-1000 (Natural Hierarchy). Currently, graphs are expected to have explicit ChildOf relationships within the member elements. Before Draft 9, everything was effectively under the Natural Hierarchy. In Draft 9, however, some of those elements have been removed from the Natural Hierarchy altogether, like deprecated nodes and the resource-based view.

We suspect that some individual views might be best described as a combination of slices *and* graphs, with a combination of implicit or explicit membership. A view might be best expressed via some set of explicit relationships (maybe between some implicit slices), then defaulting to the relationships of a different view at some point.

The most concrete example of this is CWE-631 (Resource-specific Weaknesses), at:

http://cwe.mitre.org/data/graphs/631.html

The higher-level nodes have explicit relationships defined within View 631.
Its children - such as the Category node CWE-632 (Weaknesses that Affect Files or Directories) - have explicitly specified children such as CWE-22 (Path Traversal). That is, Path Traversal has an explicit "ChildOf CWE-632" relationship. However, instead of the explicit relationships, CWE-632 could potentially be defined as an implicit slice of "all elements that have an Affected_Resource field of File/Directory." That would reduce maintenance costs and improve accuracy, but it is not possible in Draft 9, because CWE-632 is a Category type - it's *in* a view, but not a view itself.

In addition, the resource-based view, CWE-631, could be more comprehensive by "view hopping." In Draft 9, CWE-631 stops at CWE-22 (Path Traversal), but there are several children under CWE-22 that would also match - except those children are only listed under the natural hierarchy (view CWE-1000). It would probably be quite tedious and error-prone just to copy all the natural hierarchy relationships over to this new view. This might be best handled by allowing views to link to each other, but this is not possible in Draft 9. In addition, the "hops" might wind up including elements that were not intended.

Finally, we have encountered some difficulties in generating a "Comprehensive Graph" that merges all views together - the natural hierarchy, the resource-based graph, the language-specific slices, etc. So, there isn't a single graph on the CWE web site that covers the entire CWE. We do have a PDF file that contains most nodes; it focuses on the natural hierarchy (CWE-1000), and all other nodes are effectively "orphans." We don't necessarily have to solve this problem for a comprehensive view - after all, it's not clear who would have a need for such a thing - but I thought it was worthwhile to mention.

Relationships
-------------

The expression of relationships has changed significantly for Draft 9.
Much of this is covered by the schema diff report listed at the top of this document, but there are some fields that I wanted to highlight.

Relationship_Type: The Draft 8 version of "Relationship_Type" has been renamed to "Relationship_Nature". The Draft 9 version of this field is intended to identify the type of the entry that is being linked to. Since we now have multiple types of entries in CWE, this field might be useful in simplifying some extraction and presentation logic for XSLTs. We have not needed this field in generating the web site for Draft 9, although it might be convenient for others. However, this field is currently maintained by hand, and its value was often incorrect, because we changed the types of a number of elements in Draft 9, which immediately invalidated the field in dozens of relationships. We are able to perform a consistency check to ensure that these values are correct before release, but it's still a little bit of labor. As a result, we will be looking at this field more closely, trying to balance utility to the community with maintenance costs to the CWE team.

Relationship_View_IDs: We anticipate that, in the future, we will have multiple views that share a lot of the same structure. As one example - CWE's Natural Hierarchy (CWE-1000) is beginning to diverge more from the Seven Pernicious Kingdoms (SPK) way of organizing the world, so it might be reasonable to create a view into CWE that's useful for people who are knowledgeable about SPK. The Natural Hierarchy and an SPK view would probably have a lot of different elements near the top of the tree, but they would share a lot at a lower level. With closely overlapping views, recording each relationship once per view would produce a large number of duplicate relationships that might contribute significantly to XML bloat. The MITRE team decided that allowing multiple Relationship_View_IDs would be a useful shorthand that might be easier to maintain.
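
To make the shorthand concrete, here is a rough Python sketch of how a consumer might expand one relationship that lists several view IDs into one edge per view. The XML fragment is a simplified stand-in that I made up for illustration: Relationship_Nature and Relationship_View_IDs are the field names discussed above, but the nesting, the Relationship_Target_ID and Relationship_View_ID element names, and every ID other than view 1000 are assumptions, not the actual Draft 9 schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the nesting, Relationship_Target_ID, Relationship_View_ID,
# and all IDs except view 1000 are illustrative assumptions, not the Draft 9 schema.
SAMPLE = """
<Weakness ID="9001">
  <Relationship>
    <Relationship_Nature>ChildOf</Relationship_Nature>
    <Relationship_Target_ID>9002</Relationship_Target_ID>
    <Relationship_View_IDs>
      <Relationship_View_ID>1000</Relationship_View_ID>
      <Relationship_View_ID>9700</Relationship_View_ID>
    </Relationship_View_IDs>
  </Relationship>
</Weakness>
"""


def expand_relationships(entry):
    """Yield one (view, source, nature, target) tuple per view a relationship lists."""
    source = entry.get("ID")
    for rel in entry.iter("Relationship"):
        nature = rel.findtext("Relationship_Nature")
        target = rel.findtext("Relationship_Target_ID")
        # A single stored relationship fans out to every view it names.
        for view in rel.iter("Relationship_View_ID"):
            yield (view.text, source, nature, target)


edges = list(expand_relationships(ET.fromstring(SAMPLE)))
for edge in edges:
    print(edge)
```

The stored XML holds the relationship once, and the per-view duplication only exists in memory after expansion, which is the trade between bloat and extraction effort described above.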
Current Challenges
------------------

Here are some of the current challenges that we still face, and plan to resolve by CWE 1.0.

1) The Draft 9 schema does not have the expressiveness to define the more complex views, and there are some associated maintenance costs, as outlined in the previous sections.

2) Chains and composites, views, and categories all have some overlapping uses that we'd like to clarify and, to the degree possible, unify.

For example, both chains and composites involve a small selection of entries from CWE, and dictate relationships between them. In this sense, they can be regarded as views - perhaps micro-views. Yet, we expect that they will have a distinct and important role throughout CWE.

As another example, the resource-based view (CWE-631) has children that are categories. These categories might be best described by defining what their membership should be, but in Draft 9, this type of automatic population is only possible through filters in View elements. So, we had to manually create ChildOf relationships.

3) Relationship Directionality

Some views, like CWE-635 (Weaknesses Used by NVD), are defined more by external criteria than anything that is implicit within individual nodes, so these are explicit slices. In terms of maintenance costs and ease of extraction, it might be best for CWE-635 to explicitly state what its "members" are. Instead, each member carries its own ChildOf relationship to CWE-635, with View_ID=635. Thus, maintenance of the NVD slice is done not by operating on the slice itself, but by operating on its individual members. This proved to be moderately expensive for us to do when we changed the membership of the SAMATE view in Draft 8 - it took an hour or so to edit some nodes to remove the SAMATE relationship, and then edit other nodes to add the SAMATE relationship; if we could just edit the SAMATE list directly, it would have been a 5-minute task.
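
As a rough illustration of what member-held relationships mean for extraction, the Python sketch below recovers a slice's membership by scanning every entry and building a reverse index. It is only a sketch under assumptions: the Catalog wrapper, the Relationship_Target_ID and Relationship_View_ID element names, and all of the IDs are invented for the example and are not taken from the Draft 9 schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical data: the Catalog wrapper, element names other than
# Relationship_Nature, and all IDs are assumptions made for this example.
SAMPLE = """
<Catalog>
  <Weakness ID="9001">
    <Relationship>
      <Relationship_Nature>ChildOf</Relationship_Nature>
      <Relationship_Target_ID>9500</Relationship_Target_ID>
      <Relationship_View_ID>9500</Relationship_View_ID>
    </Relationship>
  </Weakness>
  <Weakness ID="9002">
    <Relationship>
      <Relationship_Nature>ChildOf</Relationship_Nature>
      <Relationship_Target_ID>9001</Relationship_Target_ID>
      <Relationship_View_ID>1000</Relationship_View_ID>
    </Relationship>
  </Weakness>
  <Weakness ID="9003">
    <Relationship>
      <Relationship_Nature>ChildOf</Relationship_Nature>
      <Relationship_Target_ID>9500</Relationship_Target_ID>
      <Relationship_View_ID>9500</Relationship_View_ID>
    </Relationship>
  </Weakness>
</Catalog>
"""


def children_index(root):
    """Map (view_id, parent_id) to the IDs of entries that declare ChildOf that parent."""
    index = defaultdict(list)
    for entry in root.iter("Weakness"):
        for rel in entry.iter("Relationship"):
            if rel.findtext("Relationship_Nature") != "ChildOf":
                continue
            parent = rel.findtext("Relationship_Target_ID")
            for view in rel.iter("Relationship_View_ID"):
                index[(view.text, parent)].append(entry.get("ID"))
    return index


index = children_index(ET.fromstring(SAMPLE))
slice_members = index[("9500", "9500")]  # members of the hypothetical slice 9500
print(slice_members)
```

Every lookup depends on that full pass over the entries; a slice that listed its own members would make the same answer a direct read.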
However, as I understand it, one of the mantras of knowledge management is that data is kept as close to individual nodes as possible; but relationships "belong" to multiple nodes, even though in Draft 9 they are only explicit in one node. It would be possible for us to automate some of those maintenance tasks, but that would involve additional development.

Also, we have multiple relationships that are mutual, but only expressed in one direction. For example, "X ChildOf Y" might be specified in the XML, which implies "Y ParentOf X" - but we have no ParentOf relationships that are explicitly stated. The same thing applies for relationships that support chains and composites. As a result, extraction logic can be complicated, because an entry doesn't explicitly know what its children are. That complexity makes the extraction logic hard to maintain, and sometimes computationally expensive. We have encountered this problem in various ways while generating web site pages.

One possibility would be to create separate XML files and representations for the relationships (and maybe for views), possibly with a separate schema. This might preserve expressiveness and simplify maintenance, but it might make it more difficult for some people to extract.

4) Named Chains

There are a couple of issues with named chains. See http://cwe.mitre.org/data/reports/chains_and_composites.html for background.

All the data that's required to determine the links of a named chain are within the XML, so there is sufficient expressiveness. However, extraction is a little more difficult. For a named chain X, the code has to search throughout all of CWE for every entry that has a CanPrecede relationship with a Chain_ID of X, then order those entries appropriately. If you want to know what elements are in a named chain, you HAVE to do this navigation throughout CWE - a named chain does not explicitly state what its links are.
Just as composites explicitly state which items they require, it might be reasonable to have named chains explicitly know what their starting links are.

In addition, named chains can be difficult to classify, especially under the natural hierarchy. Because named chains are new, we decided not to create a separate view to handle them. We do have a view that lists chain elements (CWE-679), but that view is actually an implicit slice for extracting components of all the CanPrecede relationships, whether they're related to a Named Chain or not. The extraction and presentation logic for chains in general was too complicated for us to handle cleanly by the release of Draft 9, so the chain pages are generated by external programs, instead of through XSLT.
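
For anyone who wants to try the named-chain extraction described above, here is a minimal Python sketch. It is not the external program we use; it assumes a strictly linear chain, and the Catalog wrapper, the Relationship_Target_ID and Relationship_Chain_ID element names, and all IDs are hypothetical stand-ins for whatever the real XML contains.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: wrapper element, element names other than
# Relationship_Nature, and all IDs are assumptions made for this example.
SAMPLE = """
<Catalog>
  <Weakness ID="9003">
    <Relationship>
      <Relationship_Nature>CanPrecede</Relationship_Nature>
      <Relationship_Target_ID>9004</Relationship_Target_ID>
      <Relationship_Chain_ID>9100</Relationship_Chain_ID>
    </Relationship>
  </Weakness>
  <Weakness ID="9001">
    <Relationship>
      <Relationship_Nature>CanPrecede</Relationship_Nature>
      <Relationship_Target_ID>9003</Relationship_Target_ID>
      <Relationship_Chain_ID>9100</Relationship_Chain_ID>
    </Relationship>
    <Relationship>
      <Relationship_Nature>CanPrecede</Relationship_Nature>
      <Relationship_Target_ID>9005</Relationship_Target_ID>
    </Relationship>
  </Weakness>
  <Weakness ID="9004"/>
</Catalog>
"""


def chain_links(root, chain_id):
    """Return the entry IDs of a linear named chain, ordered from first link to last."""
    successor = {}
    # Full scan: the chain itself does not list its links.
    for entry in root.iter("Weakness"):
        for rel in entry.iter("Relationship"):
            if rel.findtext("Relationship_Nature") != "CanPrecede":
                continue
            if rel.findtext("Relationship_Chain_ID") != chain_id:
                continue  # unnamed chains and other named chains are skipped
            successor[entry.get("ID")] = rel.findtext("Relationship_Target_ID")
    # The starting link precedes something but is preceded by nothing.
    starts = set(successor) - set(successor.values())
    if len(starts) != 1:
        raise ValueError("expected exactly one starting link, found %d" % len(starts))
    ordered = [starts.pop()]
    while ordered[-1] in successor:
        nxt = successor[ordered[-1]]
        if nxt in ordered:
            raise ValueError("cycle detected in chain " + chain_id)
        ordered.append(nxt)
    return ordered


links = chain_links(ET.fromstring(SAMPLE), "9100")
print(links)
```

If a named chain stored its starting link, the set difference that locates the start would be unnecessary, although following the CanPrecede relationships would still require visiting each member entry.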