Data Principles


The data architecture is developed in Phase C. However, the principles that govern it are defined in the Preliminary Phase and refined in the Architecture Vision phase. TOGAF recommends some fundamental data principles, which can be tailored and applied to the enterprise in question.

Data is an Asset


Data is costly, and loss of data is costlier. This understanding has to flow through each element of the data architecture.

Statement


Data is an asset that has value to the enterprise and is managed accordingly.

Rationale


Data is a valuable corporate resource; it has real, measurable value. In simple terms, the purpose of data is to aid decision-making. Accurate, timely data is critical to accurate, timely decisions. Most corporate assets are carefully managed, and data is no exception. Data is the foundation of our decision making, so we must also carefully manage data to ensure that we know where it is, can rely upon its accuracy, and can obtain it when and where we need it.

Implications


  • This is one of three closely-related principles regarding data: data is an asset; data is shared; and data is easily accessible. The implication is that there is an education task to ensure that all organizations within the enterprise understand the relationship between value of data, sharing of data, and accessibility to data.
  • Stewards must have the authority and means to manage the data for which they are accountable
  • We must make the cultural transition from "data ownership" thinking to "data stewardship" thinking
  • The role of data steward is critical because obsolete, incorrect, or inconsistent data could be passed to enterprise personnel and adversely affect decisions across the enterprise
  • Part of the role of data steward, who manages the data, is to ensure data quality. Procedures must be developed and used to prevent and correct errors in the information and to improve those processes that produce flawed information. Data quality will need to be measured and steps taken to improve data quality — it is probable that policy and procedures will need to be developed for this as well.
  • A forum with comprehensive enterprise-wide representation should decide on process changes suggested by the steward
  • Since data is an asset of value to the entire enterprise, data stewards accountable for properly managing the data must be assigned at the enterprise level
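
The implication that data quality must be measured can be made concrete with a small sketch. The record shape, the field names, and the two metrics below (completeness and currency) are illustrative assumptions, not something TOGAF prescribes; a steward would choose the measures that matter for their own data.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class CustomerRecord:
    """Hypothetical record shape, used only for illustration."""
    customer_id: str
    email: Optional[str]
    last_verified: date


def completeness(records: list[CustomerRecord]) -> float:
    """Fraction of records whose email field is populated."""
    if not records:
        return 1.0
    return sum(1 for r in records if r.email) / len(records)


def currency(records: list[CustomerRecord], today: date,
             max_age_days: int = 365) -> float:
    """Fraction of records verified no more than max_age_days before today."""
    if not records:
        return 1.0
    limit = timedelta(days=max_age_days)
    fresh = sum(1 for r in records if today - r.last_verified <= limit)
    return fresh / len(records)


# Made-up sample data for the demonstration.
records = [
    CustomerRecord("c1", "a@example.com", date(2024, 1, 10)),
    CustomerRecord("c2", None, date(2024, 3, 1)),
    CustomerRecord("c3", "c@example.com", date(2021, 6, 30)),
    CustomerRecord("c4", "d@example.com", date(2023, 12, 5)),
]
today = date(2024, 6, 1)
print(f"completeness: {completeness(records):.2f}")
print(f"currency:     {currency(records, today):.2f}")
```

Tracking numbers like these over time gives the steward evidence for which processes produce flawed data and whether corrective procedures are working.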

Data is Shared


Data is meaningless unless it reaches the person who is meant to use it. It has to be shared between the source and target.

Statement


Users have access to the data necessary to perform their duties; therefore, data is shared across enterprise functions and organizations.

Rationale


Timely access to accurate data is essential to improving the quality and efficiency of enterprise decision-making. It is less costly to maintain timely, accurate data in a single application, and then share it, than it is to maintain duplicative data in multiple applications. The enterprise holds a wealth of data, but it is stored in hundreds of incompatible stovepipe databases. The speed of data collection, creation, transfer, and assimilation is driven by the ability of the organization to efficiently share these islands of data across the organization.

Shared data will result in improved decisions since we will rely on fewer (ultimately one virtual) sources of more accurate and timely managed data for all of our decision-making. Electronically shared data will result in increased efficiency when existing data entities can be used, without re-keying, to create new entities.

Implications


  • This is one of three closely-related principles regarding data: data is an asset; data is shared; and data is easily accessible. The implication is that there is an education task to ensure that all organizations within the enterprise understand the relationship between value of data, sharing of data, and accessibility to data.
  • To enable data sharing we must develop and abide by a common set of policies, procedures, and standards governing data management and access for both the short and the long term
  • For the short term, to preserve our significant investment in legacy systems, we must invest in software capable of migrating legacy system data into a shared data environment
  • We will also need to develop standard data models, data elements, and other metadata that defines this shared environment and develop a repository system for storing this metadata to make it accessible
  • For the long term, as legacy systems are replaced, we must adopt and enforce common data access policies and guidelines for new application developers to ensure that data in new applications remains available to the shared environment and that data in the shared environment can continue to be used by the new applications
  • For both the short term and the long term we must adopt common methods and tools for creating, maintaining, and accessing the data shared across the enterprise
  • Data sharing will require a significant cultural change
  • This principle of data sharing will continually "bump up against" the principle of data security — under no circumstances will the data sharing principle cause confidential data to be compromised
  • Data made available for sharing will have to be relied upon by all users to execute their respective tasks. This will ensure that only the most accurate and timely data is relied upon for decision-making. Shared data will become the enterprise-wide "virtual single source" of data.

Data is Accessible


Sharing is meaningless if the shared data is not accessible to the right user at the right time.

Statement


Data is accessible for users to perform their functions.

Rationale


Wide access to data leads to efficiency and effectiveness in decision-making, and affords a timely response to information requests and service delivery. Using information must be considered from an enterprise perspective to allow access by a wide variety of users. Staff time is saved and consistency of data is improved.

Implications


  • This is one of three closely-related principles regarding data: data is an asset; data is shared; and data is easily accessible. The implication is that there is an education task to ensure that all organizations within the enterprise understand the relationship between value of data, sharing of data, and accessibility to data.
  • Accessibility involves the ease with which users obtain information
  • The way information is accessed and displayed must be sufficiently adaptable to meet a wide range of enterprise users and their corresponding methods of access
  • Access to data does not constitute understanding of the data — personnel should take caution not to misinterpret information
  • Access to data does not necessarily grant the user access rights to modify or disclose the data. This will require an education process and a change in the organizational culture, which currently supports a belief in "ownership" of data by functional units.

Data Trustee


When everyone is responsible, nobody is responsible. It is important to identify one accountable trustee for any data that we work with.

Statement


Each data element has a trustee accountable for data quality.

Rationale


One of the benefits of an architected environment is the ability to share data (e.g., text, video, sound) across the enterprise. As the degree of data sharing grows and business units rely upon common information, it becomes essential that only the data trustee makes decisions about the content of data. Since data can lose its integrity when it is entered multiple times, the data trustee will have sole responsibility for data entry, which eliminates redundant human effort and data storage resources.

Note: A trustee is different from a steward. A trustee is responsible for the accuracy and currency of the data, while the responsibilities of a steward may be broader and include data standardization and definition tasks.

Implications


  • Real trusteeship dissolves the data "ownership" issues and allows the data to be available to meet all users’ needs. This implies that a cultural change from data "ownership" to data "trusteeship" may be required.
  • The data trustee will be responsible for meeting quality requirements levied upon the data for which the trustee is accountable
  • It is essential that the trustee has the ability to provide user confidence in the data based upon attributes such as "data source"
  • It is essential to identify the true source of the data in order that the data authority can be assigned this trustee responsibility. This does not mean that classified sources will be revealed, nor does it mean the source will be the trustee.
  • Information should be captured electronically once and immediately validated as close to the source as possible. Quality control measures must be implemented to ensure the integrity of the data.
  • As a result of sharing data across the enterprise, the trustee is accountable and responsible for the accuracy and currency of their designated data element(s) and must therefore recognize the importance of this trusteeship responsibility
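
The ideas of one accountable trustee per data element and of validating data once, close to the source, can be sketched in a few lines. Everything named here (`DataElement`, `TrusteeRegistry`, the team names, the validation rule) is a hypothetical illustration under simplifying assumptions, not a TOGAF-defined structure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DataElement:
    name: str
    trustee: str                     # the single accountable party
    source: str                      # where the value is first captured
    validate: Callable[[str], bool]  # check applied at capture time


class TrusteeRegistry:
    """Keeps exactly one trustee per data element (illustrative only)."""

    def __init__(self) -> None:
        self._elements: dict[str, DataElement] = {}
        self._values: dict[str, str] = {}

    def register(self, element: DataElement) -> None:
        # A second registration would mean two accountable parties.
        if element.name in self._elements:
            raise ValueError(f"{element.name} already has a trustee")
        self._elements[element.name] = element

    def capture(self, name: str, value: str, entered_by: str) -> None:
        element = self._elements[name]
        # Only the trustee enters data, so it is not re-keyed elsewhere.
        if entered_by != element.trustee:
            raise PermissionError(f"{entered_by} is not the trustee of {name}")
        # Validate immediately, as close to the source as possible.
        if not element.validate(value):
            raise ValueError(f"invalid value for {name}: {value!r}")
        self._values[name] = value

    def read(self, name: str) -> tuple[str, str]:
        """Return the value together with its source, to support user confidence."""
        return self._values[name], self._elements[name].source


registry = TrusteeRegistry()
registry.register(
    DataElement("customer_email", "crm_team", "CRM signup form",
                validate=lambda v: "@" in v)
)
registry.capture("customer_email", "a@example.com", entered_by="crm_team")
print(registry.read("customer_email"))
```

Returning the source alongside the value reflects the implication that the trustee should be able to support user confidence through attributes such as "data source".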

Common Vocabulary and Data Definitions


Data is meaningless without the language used to code it. For boundaryless information flow, it is important that the language is consistent: terms must mean the same thing across the enterprise.

Statement


Data is defined consistently throughout the enterprise, and the definitions are understandable and available to all users.

Rationale


The data that will be used in the development of applications must have a common definition throughout the enterprise to enable sharing of data. A common vocabulary will facilitate communications and enable dialog to be effective. In addition, it is required to interface systems and exchange data.

Implications


  • We are lulled into thinking that this issue is adequately addressed because there are people with "data administration" job titles and forums with charters implying responsibility. Significant additional energy and resources must be committed to this task. It is key to the success of efforts to improve the information environment. This is separate from but related to the issue of data element definition, which is addressed by a broad community; this is more like a common vocabulary and definition.
  • The enterprise must establish the initial common vocabulary for the business; the definitions will be used uniformly throughout the enterprise
  • Whenever a new data definition is required, the definition effort will be co-ordinated and reconciled with the corporate "glossary" of data descriptions. The enterprise data administrator will provide this co-ordination.
  • Ambiguities resulting from multiple parochial definitions of data must give way to accepted enterprise-wide definitions and understanding
  • Multiple data standardization initiatives need to be co-ordinated
  • Functional data administration responsibilities must be assigned
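
Reconciling new definitions against a corporate glossary can be illustrated with a minimal sketch. The `Glossary` class and its normalization rule are assumptions made for this example; a real enterprise glossary would typically live in a metadata repository with a review workflow behind it.

```python
class Glossary:
    """A toy corporate glossary: one accepted definition per term."""

    def __init__(self) -> None:
        self._terms: dict[str, str] = {}

    @staticmethod
    def _key(term: str) -> str:
        # Treat "Customer", " customer " and "CUSTOMER" as the same term.
        return " ".join(term.lower().split())

    def define(self, term: str, definition: str) -> None:
        key = self._key(term)
        existing = self._terms.get(key)
        if existing is not None and existing != definition:
            # A parochial definition conflicts with the accepted one;
            # in this sketch the conflict must be reconciled by a person.
            raise ValueError(
                f"conflicting definition for {term!r}; "
                "reconcile with the enterprise data administrator"
            )
        self._terms[key] = definition

    def lookup(self, term: str) -> str:
        return self._terms[self._key(term)]


glossary = Glossary()
glossary.define("Customer", "A party that has purchased at least one product.")
print(glossary.lookup("  customer "))
```

Rejecting a conflicting definition instead of silently overwriting it mirrors the implication that parochial definitions must give way to an agreed enterprise-wide one.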

Data Security


Data sharing and accessibility sound great, but they can be disastrous if they are not backed by solid security.

Statement


Data is protected from unauthorized use and disclosure. In addition to the traditional aspects of national security classification, this includes, but is not limited to, protection of pre-decisional, sensitive, source selection-sensitive, and proprietary information.

Rationale


Open sharing of information and the release of information via relevant legislation must be balanced against the need to restrict the availability of classified, proprietary, and sensitive information. Existing laws and regulations require the safeguarding of national security and the privacy of data, while permitting free and open access. Pre-decisional (work-in-progress, not yet authorized for release) information must be protected to avoid unwarranted speculation, misinterpretation, and inappropriate use.

Implications


  • Aggregation of data, both classified and not, will create a large target requiring review and de-classification procedures to maintain appropriate control. Data owners and/or functional users must determine whether the aggregation results in an increased classification level. Appropriate policy and procedures will be needed to handle this review and declassification. Access to information based on a need-to-know policy will force regular reviews of the body of information.
  • The current practice of having separate systems to contain different classifications needs to be rethought. Is there a software solution to separating classified and unclassified data? The current hardware solution is unwieldy, inefficient, and costly. It is more expensive to manage unclassified data on a classified system. Currently, the only way to combine the two is to place the unclassified data on the classified system, where it must remain.
  • In order to adequately provide access to open information while maintaining secure information, security needs must be identified and developed at the data level, not the application level
  • Data security safeguards can be put in place to restrict access to "view only" or "never see". Sensitivity labeling for access to pre-decisional, decisional, classified, sensitive, or proprietary information must be determined.
  • Security must be designed into data elements from the beginning; it cannot be added later. Systems, data, and technologies must be protected from unauthorized access and manipulation. Enterprise information must be safeguarded against inadvertent or unauthorized alteration, sabotage, disaster, or disclosure.
  • New policies are needed on managing duration of protection for pre-decisional information and other works-in-progress, in consideration of content freshness
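
Security defined at the data level, with sensitivity labels and "view only" or "never see" restrictions, might look like the following sketch. The labels, roles, and policy table are hypothetical and chosen only to illustrate the idea; the default for anything not listed is "never see".

```python
from enum import Enum


class Access(Enum):
    NEVER_SEE = 0
    VIEW_ONLY = 1
    MODIFY = 2


# Hypothetical policy: sensitivity label -> role -> access level.
LABEL_POLICY: dict[str, dict[str, Access]] = {
    "public": {"staff": Access.VIEW_ONLY, "analyst": Access.VIEW_ONLY,
               "trustee": Access.MODIFY},
    "sensitive": {"analyst": Access.VIEW_ONLY, "trustee": Access.MODIFY},
    "pre-decisional": {"trustee": Access.MODIFY},
}


def access_for(label: str, role: str) -> Access:
    """Look up access for a label and role, denying by default."""
    return LABEL_POLICY.get(label, {}).get(role, Access.NEVER_SEE)


def visible_fields(record: dict[str, tuple[str, str]], role: str) -> dict[str, str]:
    """record maps field name -> (value, label); keep only what the role may see."""
    return {
        name: value
        for name, (value, label) in record.items()
        if access_for(label, role) is not Access.NEVER_SEE
    }


def can_modify(label: str, role: str) -> bool:
    """Seeing a field does not imply the right to change it."""
    return access_for(label, role) is Access.MODIFY


# Each field carries its own label, so security travels with the data.
record = {
    "office": ("Building 4", "public"),
    "salary": ("85000", "sensitive"),
    "budget_draft": ("1.2M", "pre-decisional"),
}
print(visible_fields(record, "staff"))
print(visible_fields(record, "analyst"))
print(can_modify("sensitive", "analyst"))
```

Because the label is attached to each field and not to the application, the same record can be served to different users with different views, which is the point of identifying security needs at the data level.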