5 Data Management
5.1 Data Governance
5.1.1
For the avoidance of doubt, the scope under consideration in this section includes the data employed for modelling and validation purposes, not the data employed for regular risk analysis and reporting. This section focuses on the construction of historical data sets for the purpose of modelling.
5.1.2
Accurate and representative historical data is the backbone of financial models. Institutions must implement a rigorous and comprehensive formal data management framework (“DMF”) to ensure the development of accurate models. Institutions must treat the DMF as a structured process within the institution, with dedicated policies and procedures and with adequate resources and funding. The DMF core principles are as follows:
(i) It must be approved by Senior Management and the Board,
(ii) It must be thoroughly documented with indication of limitations and assumptions,
(iii) Its coverage must include the whole institution and all material risk types, and
(iv) It must be independently validated.
5.1.3
The DMF must include, at a minimum, the following steps:
(i) Identification of sources,
(ii) Regular and frequent collection,
(iii) Rigorous data quality review and control,
(iv) Secure storage and controlled access, and
(v) Robust system infrastructure.
5.1.4
The roles and responsibilities of the parties involved in or contributing to the DMF must be defined and documented. Each data set or data type must have an identified owner. The owner is accountable for the timely and effective execution of the DMF steps for her/his data set or data type. The owner need not perform each of the DMF steps personally, but she/he must remain accountable for ensuring that they are performed by other parties to high quality standards.
5.2 Identification of Data Sources
5.2.1
The DMF must include a process to identify and select relevant data sources within the institution for each type of data and model. If an institution has recently merged with or acquired another entity, it must take the necessary steps to retrieve historical data from that entity.
5.2.2
If internal sources are lacking in data quality or quantity, institutions may rely on external sources. However, if an institution decides to rely on external data for modelling, it must demonstrate that the data is relevant and suitably representative of its risk profile and its business model. External data sources must be subject to an identification and selection process. The DMF governance and quality control also apply to external data employed for modelling.
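By way of illustration, one common way to demonstrate that external data is representative is to compare the distribution of a key risk driver between the internal portfolio and the external data set, for instance with a population stability index (PSI). The sketch below assumes NumPy; the bucketing approach and the 0.10/0.25 rule-of-thumb thresholds are illustrative assumptions, not requirements of this standard.

```python
import numpy as np

def population_stability_index(internal, external, bins=10):
    """Compare the distribution of a risk driver (e.g. obligor size)
    between the internal portfolio and an external data set.
    A low PSI suggests the external data is broadly representative."""
    edges = np.histogram_bin_edges(internal, bins=bins)
    counts_int, _ = np.histogram(internal, bins=edges)
    # Clip external values so observations outside the internal range
    # fall into the boundary buckets rather than being dropped.
    counts_ext, _ = np.histogram(np.clip(external, edges[0], edges[-1]), bins=edges)
    p_int = np.maximum(counts_int / counts_int.sum(), 1e-6)  # avoid log(0)
    p_ext = np.maximum(counts_ext / counts_ext.sum(), 1e-6)
    return float(np.sum((p_int - p_ext) * np.log(p_int / p_ext)))

# Common rule of thumb (illustrative, not prescribed by this standard):
# PSI < 0.10 stable; 0.10-0.25 monitor; > 0.25 materially different.
```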
5.2.3
Once a source has been selected, institutions are expected to retain this source long enough to build consistent time series. Any change of data source for the construction of a given data set must be rigorously documented.
5.3 Data Collection
5.3.1
Each institution must collect data for the estimation of all risks arising from instruments and portfolios where it has material exposures. The data collection must be sufficiently granular to support adequate modelling. This means that data collection must be (i) sufficiently specific to be attributed to risk types and instrument types, and (ii) sufficiently frequent to allow the construction of historical time series.
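As an illustrative sketch of this granularity requirement, the record structure below tags every observation with a risk type, an instrument type and an as-of date, so that observations can be attributed to risk and instrument types and assembled into historical time series. All field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ExposureObservation:
    """One granular observation, attributable to a risk type and an
    instrument type, and dated so that time series can be built."""
    as_of_date: date        # enables construction of historical time series
    risk_type: str          # e.g. "credit", "market", "liquidity"
    instrument_type: str    # e.g. "term_loan", "fx_forward"
    obligor_id: str
    exposure_amount: float
    currency: str

# A periodic collection run would append one ExposureObservation per
# exposure, keyed by (as_of_date, obligor_id, instrument_type).
```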
5.3.2
The data collection process must cover, amongst others, credit risk, market risk (in both the trading and banking books), concentration risk, liquidity risk, operational risk, fraud risk and financial data for capital modelling. A justifiable and appropriate collection frequency must be defined for each risk type.
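A minimal sketch of how such frequencies might be recorded as configuration is shown below; the risk types and frequencies are purely illustrative, since each institution must define and justify its own choices.

```python
# Illustrative collection frequencies per risk type (hypothetical values;
# each institution must justify and document its own frequencies).
COLLECTION_FREQUENCY = {
    "market_risk_trading_book": "daily",
    "market_risk_banking_book": "monthly",
    "credit_risk": "monthly",
    "concentration_risk": "monthly",
    "liquidity_risk": "daily",
    "operational_risk": "quarterly",   # loss events also logged as they occur
    "fraud_risk": "monthly",
}
```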
5.3.3
The data must be organised such that the drivers and dimensions of these risks can be fully analysed. Typical dimensions include obligor size, industries, geographies, ratings, product types, tenor and currency of exposure. For credit risk in particular, the data set must include default events and recovery events by obligor segments on a monthly basis.
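For instance, a monthly view of default and recovery events by obligor segment could be derived from event-level data along the following lines. This is a pandas sketch with hypothetical column names, not a prescribed layout.

```python
import pandas as pd

# Hypothetical event-level data: one row per obligor-month.
loan_events = pd.DataFrame({
    "month": pd.to_datetime(["2023-01-31", "2023-01-31", "2023-02-28"]),
    "segment": ["SME", "Corporate", "SME"],
    "defaulted": [1, 0, 0],
    "recovered_amount": [40_000.0, 0.0, 0.0],
    "exposure_at_default": [100_000.0, 0.0, 0.0],
})

# Monthly default counts and recovery rates by obligor segment.
monthly = (loan_events
           .groupby(["month", "segment"], as_index=False)
           .agg(defaults=("defaulted", "sum"),
                recoveries=("recovered_amount", "sum"),
                ead=("exposure_at_default", "sum")))
monthly["recovery_rate"] = (monthly["recoveries"]
                            / monthly["ead"].where(monthly["ead"] > 0))
```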
5.3.4
The data collection must be documented. The data collection procedure must include clear roles and responsibilities with a maker-checker review process, when appropriate.
5.3.5
Institutions must seek to maximise automated collection and minimise manual intervention. Where manual interventions cannot be avoided, they must be rigorously documented to limit operational errors.
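Where a manual intervention is unavoidable, it should leave an auditable trace. The sketch below records who changed what, when and why; the schema and file format are illustrative assumptions only.

```python
import csv
import getpass
from datetime import datetime, timezone

def log_manual_override(log_path, dataset, field, old, new, reason):
    """Append an auditable record of a manual data intervention."""
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),  # when
            getpass.getuser(),                       # who
            dataset, field, old, new,                # what
            reason,                                  # why
        ])

# Hypothetical usage:
log_manual_override("overrides.csv", "retail_exposures_2023", "balance",
                    old=1_250, new=1_250_000,
                    reason="Unit error: source reported in thousands")
```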
5.3.6
The data collection process must ensure the accuracy of metadata such as units, currencies, and date/time-stamping.
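A minimal sketch of automated metadata checks, assuming a pandas data set with hypothetical "currency", "as_of" (datetime) and "amount" columns; the currency whitelist and the unit-basis heuristic are illustrative assumptions.

```python
import pandas as pd

EXPECTED_CURRENCIES = {"AED", "USD", "EUR"}   # illustrative whitelist

def check_metadata(df: pd.DataFrame) -> list:
    """Flag unknown currency codes, naive timestamps and amounts
    whose unit basis looks inconsistent across reporting dates."""
    issues = []
    unknown = set(df["currency"]) - EXPECTED_CURRENCIES
    if unknown:
        issues.append(f"unknown currency codes: {sorted(unknown)}")
    if df["as_of"].dt.tz is None:
        issues.append("timestamps are not timezone-aware")
    # Crude heuristic: a ~1000x jump in the median amount between
    # reporting dates often signals thousands vs. units confusion.
    median_by_date = df.groupby("as_of")["amount"].median()
    if (median_by_date.pct_change().abs() > 100).any():
        issues.append("possible unit-basis break in 'amount'")
    return issues
```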
5.4 Data Quality Review
5.4.1
Prior to being used for modelling purposes, the extracted data must go through a cleaning process to ensure that data meets a required quality standard. This process must consider, at a minimum, the following data characteristics:
(i) Completeness: values are available where needed,
(ii) Accuracy: values are correct and error-free,
(iii) Consistency: several sources across the institution lead to matching data,
(iv) Timeliness: values are accurate as of the reporting date,
(v) Uniqueness: values are not incorrectly duplicated in the same data set, and
(vi) Traceability: the origin of the data can be traced.
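Of these characteristics, completeness and uniqueness lend themselves to direct automated checks, whereas accuracy, consistency, timeliness and traceability generally require reference data or a second source. A minimal pandas sketch with hypothetical arguments:

```python
import pandas as pd

def data_quality_report(df: pd.DataFrame, key_cols, required_cols):
    """Score two of the characteristics above directly on the data set."""
    return {
        # Completeness: share of rows with all required values present.
        "completeness": float(1.0 - df[required_cols].isna().any(axis=1).mean()),
        # Uniqueness: share of rows not duplicating the business key.
        "uniqueness": float(1.0 - df.duplicated(subset=key_cols).mean()),
    }

# Hypothetical usage:
df = pd.DataFrame({"id": [1, 1, 2], "balance": [0.5, 0.5, None]})
print(data_quality_report(df, key_cols=["id"], required_cols=["balance"]))
# {'completeness': 0.666..., 'uniqueness': 0.666...}
```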
5.4.2
Institutions must put in place processes to accomplish a comprehensive data quality review. In particular, the quality of data can be improved by, amongst others, replacing missing data points, removing errors, correcting the unit basis (thousands vs. millions, wrong currency, etc.) and reconciling against several sources.
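An illustrative cleaning pipeline covering these steps is sketched below; the imputation rule, the assumed unit-basis error in one source and the 0.5% reconciliation tolerance are all hypothetical.

```python
import pandas as pd

def clean_exposures(raw: pd.DataFrame, reference: pd.DataFrame) -> pd.DataFrame:
    """Fill gaps, remove errors, fix a known unit-basis issue and
    reconcile totals against an independent source."""
    df = raw.sort_values(["obligor_id", "as_of"]).copy()
    # Replace missing values with the previous observation for the
    # same obligor (a simple, documented imputation rule).
    df["exposure"] = df.groupby("obligor_id")["exposure"].ffill()
    # Remove clear errors: negative exposures are impossible here.
    df = df[df["exposure"] >= 0]
    # Correct the unit basis for a source assumed to report in thousands.
    in_thousands = df["source"] == "system_B"
    df.loc[in_thousands, "exposure"] *= 1_000
    # Reconcile against a second source; 0.5% tolerance is illustrative.
    gap = df["exposure"].sum() - reference["exposure"].sum()
    if abs(gap) > 0.005 * reference["exposure"].sum():
        raise ValueError(f"Reconciliation break of {gap:,.0f} vs. reference")
    return df
```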
5.4.3
Institutions must put in place tolerance levels and indicators of data quality. These indicators must be mentioned in all model documentation. Data quality reports must be prepared regularly and presented to Senior Management and the Board as part of the DMF governance, with the objective of monitoring and continuously improving the quality of data over time. Considering the essential role of data quality in supporting risk management and business decisions, institutions must also consider including data quality measures in their risk appetite framework.
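A minimal sketch of indicators evaluated against tolerance levels is given below; the thresholds are hypothetical and would in practice be set by the institution and reported under the DMF governance.

```python
# Hypothetical tolerance levels, set and governed by the institution.
TOLERANCES = {"completeness": 0.98, "uniqueness": 0.999}

def evaluate_quality(indicators):
    """Mark each data quality indicator PASS or BREACH against its tolerance."""
    return {name: ("PASS" if value >= TOLERANCES[name] else "BREACH")
            for name, value in indicators.items()}

print(evaluate_quality({"completeness": 0.97, "uniqueness": 1.0}))
# {'completeness': 'BREACH', 'uniqueness': 'PASS'}
```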
5.5 Data Storage and Access
5.5.1
Once a data set has been reviewed and is deemed fit for usage, it must be stored in a defined and shared location. Final data sets must not be solely stored on the computers of individual employees.
5.5.2
The access to a final data set must be controlled and restricted to avoid unwarranted modifications.
5.5.3
Appropriate measures must be taken to ensure that data is stored securely to mitigate operational risks such as cyber-attacks and physical damage.
5.6 System Infrastructure
5.6.1
Institutions must ensure that an appropriate IT system infrastructure is in place to support all the steps required by the DMF.
5.6.2
The system infrastructure must be sufficiently scalable to support the DMF requirements.
5.6.3
The system infrastructure must take the form of strategic long-term solutions, not tactical ones. Spreadsheet solutions must not be considered acceptable long-term solutions for data storage.
5.6.4
Institutions are encouraged to employ staff with data science knowledge and expertise in order to undertake appropriate data management oversight.
5.6.5
Institutions must minimise key person risk related to the management of modelling data. They must ensure that several members of staff have the suitable technical expertise to fully manage data for modelling purposes.