6.3 Data Exploration

6.3.1	The data exploration phase must be used to confirm whether the data set is suitable for modelling purposes. The objective is to understand the nature and composition of the data set at hand and to identify expected or unusual patterns in the data. In this process, critical thinking and judgement are expected from the modelling team.
6.3.2	Descriptive statistics should be produced across both the dependent and independent variables. For instance, for credit risk modelling, such exploration is relevant to identify whether obligors have homogeneous features per segment. For market risk modelling, such exploration is relevant to assess whether the market liquidity of the underlying product is sufficient to ensure a minimum reliability of the market factor time series.
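As an illustration only, descriptive statistics per segment could be produced along the following lines. This is a minimal sketch using the Python standard library. The function name `describe_by_segment`, the field names (`segment`, `leverage`, `default_flag`) and the sample records are hypothetical and are not prescribed by this section; institutions would substitute their own variables and tooling.

```python
from statistics import mean, median, stdev


def describe_by_segment(records, segment_key, variables):
    """Summarise each variable within each segment.

    Missing values (None) are counted separately rather than silently dropped,
    so that data gaps remain visible during exploration.
    """
    # Group the rows by segment.
    grouped = {}
    for row in records:
        grouped.setdefault(row[segment_key], []).append(row)

    summary = {}
    for segment, rows in grouped.items():
        summary[segment] = {}
        for var in variables:
            values = [r[var] for r in rows if r.get(var) is not None]
            summary[segment][var] = {
                "count": len(values),
                "missing": len(rows) - len(values),
                "mean": mean(values) if values else None,
                "median": median(values) if values else None,
                # The sample standard deviation needs at least two observations.
                "stdev": stdev(values) if len(values) > 1 else None,
                "min": min(values) if values else None,
                "max": max(values) if values else None,
            }
    return summary


# Hypothetical obligor records: one independent variable (leverage)
# and one dependent variable (default_flag).
obligors = [
    {"segment": "SME", "leverage": 0.5, "default_flag": 0},
    {"segment": "SME", "leverage": 0.7, "default_flag": 1},
    {"segment": "SME", "leverage": None, "default_flag": 0},
    {"segment": "Corporate", "leverage": 0.25, "default_flag": 0},
    {"segment": "Corporate", "leverage": 0.75, "default_flag": 0},
]

summary = describe_by_segment(obligors, "segment", ["leverage", "default_flag"])
for segment, stats in summary.items():
    print(segment, stats["leverage"])
```

Comparing the per-segment summaries side by side gives a first indication of whether obligors within a segment are homogeneous, and the `missing` count highlights variables whose coverage may be too thin to support modelling.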
6.3.3	Institutions must clearly state the outcome of the data exploration step, that is, whether the data is fit for modelling or not. In the latter case, the development process must stop and additional suitable data must be sourced. Consequently, data unavailability must not excuse unreliable and inaccurate model output.
6.3.4	The exploration of data can lead to unusual, counterintuitive or even illogical patterns. Such features should not be immediately accepted as a mere consequence of the data. Instead, the modelling team is expected to analyse these patterns further at a lower level of granularity to understand their origin. Subsequently, either (i) the pattern should be accepted as a matter of fact, or (ii) the data should be adjusted, or (iii) the data set should be replaced. This investigation must be fully documented because it has material consequences for model calibration.
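The investigation described above could be sketched as follows: an event rate is first computed at a coarse level, then recomputed at a finer level of granularity, and the outcome is recorded against one of the three permitted resolutions. This is an illustrative sketch only. The names `rate_by`, `Resolution` and `PatternInvestigation`, the `branch` field and the sample data are assumptions made for the example and do not appear in this section.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Resolution(Enum):
    """The three outcomes listed in 6.3.4."""
    ACCEPT_PATTERN = "pattern accepted as a matter of fact"
    ADJUST_DATA = "data adjusted"
    REPLACE_DATA_SET = "data set replaced"


@dataclass(frozen=True)
class PatternInvestigation:
    """Hypothetical record documenting one investigated pattern."""
    description: str
    findings: str
    resolution: Resolution

    def __post_init__(self):
        # An investigation without documented findings is not acceptable.
        if not self.findings.strip():
            raise ValueError("findings must be documented")


def rate_by(records, keys, flag):
    """Average of a 0/1 flag for each combination of the grouping keys."""
    totals = defaultdict(lambda: [0, 0])  # cell -> [events, observations]
    for row in records:
        cell = tuple(row[k] for k in keys)
        totals[cell][0] += row[flag]
        totals[cell][1] += 1
    return {cell: events / n for cell, (events, n) in totals.items()}


# Hypothetical loans: the segment-level default rate looks unusually high.
loans = [
    {"segment": "Retail", "branch": "A", "default_flag": 0},
    {"segment": "Retail", "branch": "A", "default_flag": 0},
    {"segment": "Retail", "branch": "B", "default_flag": 1},
    {"segment": "Retail", "branch": "B", "default_flag": 1},
]

coarse = rate_by(loans, ["segment"], "default_flag")
fine = rate_by(loans, ["segment", "branch"], "default_flag")

# Drilling down shows that the pattern originates entirely in one branch.
record = PatternInvestigation(
    description="Retail default rate of 50% at segment level",
    findings="All defaults come from branch B; source records to be checked.",
    resolution=Resolution.ADJUST_DATA,
)
print(coarse, fine, record.resolution.value)
```

Restricting the resolution to an enumeration and rejecting empty findings is one possible way to ensure that every investigated pattern ends in an explicit, documented decision.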