Assumptions & Limitations
IHP+Results’ reporting framework is based on a number of assumptions – listed below. For more detailed explanations please click the respective link.
Analysing and interpreting the data
Assumptions & Potential Limitations
- Different target statements – some ratings are of point-in-time attainment, whilst others require demonstration of progress over time from a baseline.
- Self-selection of participants and limited sample size – only a partial picture of Development Partner progress in each country can be presented.
- Consistency in interpretations of:
- Terminologies, whether because of their complexity or because of translation discrepancies, e.g. ‘capacity building’ in francophone countries
- Assumptions of specific Standard Performance Measures (SPMs)
- Use of country consultants to support country governments to provide data
- Data availability – Limited data on a number of indicators from a number of development partners
- Triangulation – The exercise has been largely self-reported.
- Risk of double-counting
- Use of different baselines – one cannot conclude that X was achieved in Y years, only that X had been achieved by 2009
- Inclusion of General Budget Support
- Limitations in the reporting framework to reflect certain agencies’ business models appropriately
Analysing and interpreting the data
The reporting framework includes both qualitative and quantitative Standard Performance Measures (SPMs). We have completed our analysis by aggregating data in various ways:
- An overall indication of whether progress has been made, and whether targets have been met. We have aggregated all Development Partner data, to show one value for baseline and one value for 2009, for each quantitative SPM (measures 2DPa, 2DPb, 2DPc, 3DP, 4DP, 5DPa, 5DPb and 5DPc).
- A per agency/country perspective on progress for each SPM, using the Partner and Country Scorecards. These provide a rating for each SPM (both qualitative and quantitative), aggregated across the countries in which the agency is active. For the Country Scorecards, there is no aggregation – only Government data have been used as the basis for rating progress. Ratings are derived from published criteria. Data from Development Partners were aggregated using the formula below. Where data were not provided, a question mark is used. In some cases agencies indicated that one or more SPMs are not applicable to their business model – in these cases a grey bar has been used. For Standard Performance Measure 5Ga and 5Gb, data were used from World Bank and OECD sources.
- Disaggregated data by development partner and by country. These tables present the results for each of the SPMs by development partner and by country. Some concerns have been raised about drawing comparisons from these tables, as development partner performance will vary from country to country according to the country context. It is also possible that the Development Partners within each of these different settings might have interpreted some of the technical terms and definitions differently. These variations will affect the comparability of results by country.
The formula for aggregating data presented in partner scorecard ratings is the same as the approach used to calculate the indicator value (a weighted average) in the Paris Survey 2008:

Aggregate Numerator ÷ Aggregate Denominator = Result

Numerator (country 1) + Numerator (country 2) + Numerator (country 3)
————————————————————————————————————————————————————————————————————————— = Result
Denominator (country 1) + Denominator (country 2) + Denominator (country 3)
The Paris Survey 2008 also used an alternative approach to aggregation (an unweighted average): the result (numerator ÷ denominator) is calculated for each country, these results are added together, and the sum is divided by the number of countries. IHP+Results has not had time to conduct an alternative analysis using unweighted aggregation, but hopes to do so during 2011.
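To make the difference concrete, here is a minimal sketch of the two aggregation approaches in Python, using made-up numerator/denominator pairs rather than real survey data:

```python
# Hypothetical per-country (numerator, denominator) pairs, not real survey data.
countries = {
    "country 1": (40, 100),
    "country 2": (10, 20),
    "country 3": (30, 40),
}

# Weighted average (used for the partner scorecard ratings):
# sum all numerators, then divide by the sum of all denominators.
weighted = (sum(n for n, d in countries.values())
            / sum(d for n, d in countries.values()))

# Unweighted average (the alternative in the Paris Survey 2008):
# compute each country's ratio first, then average the ratios.
unweighted = sum(n / d for n, d in countries.values()) / len(countries)

print(weighted)             # 80 / 160 = 0.5
print(round(unweighted, 2)) # (0.40 + 0.50 + 0.75) / 3 = 0.55
```

The two approaches give different results whenever countries differ in size: the weighted average lets large-denominator countries dominate, while the unweighted average counts every country equally.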
Qualitative data are analysed and presented in two ways:
- In partner scorecards.
- In online graphs and charts.
It is important to note that there is limited qualitative information with which to fully analyse these indicators. Questions in the survey tool for these measures focused on, for example, the existence of a plan or the use of a national performance assessment framework, and were interpreted as quantitative data. Supplementary questions were asked about quality, obstacles to use, etc. (see the sample survey tool), but answers to these were not mandatory, and consequently the quality and comprehensiveness of the data limit our ability to provide a qualitative assessment of the quantitative and qualitative measures. This is particularly relevant for the consistent understanding and quality of compacts, national plans, performance frameworks and mutual accountability processes.
A number of stakeholders have stressed the importance of qualitative data in making sense of the SPMs, and in particular of the presentation of results through partner and country scorecards. The scorecards have been designed to provide a clear, easily accessible presentation of complex information, mainly for a high-level political audience. They are therefore likely to be of limited use to those interested in country-level performance, and should be read with care, taking into consideration the limitations presented below. However, we have taken steps to address these concerns by providing space on the reverse side of the partner scorecard, where DPs can qualify the results reported in the scorecard ratings, and by providing disaggregated data for each development partner and country.
Civil society engagement is measured through Standard Performance Measures 8G (government) and 8DP (development partners). Both are largely quantitative measures, but with some qualitative information too. We also conducted a brief qualitative survey of local civil society organisations in each of the 10 countries. We aimed to gather completed questionnaires from 10 civil society organisations in each country, chosen in consultation with the Ministry of Health and using contacts provided by IHP+ civil society representatives, because they were involved in the process. We received responses from Burkina Faso (9 returns), Burundi (5), DRC (5), Ethiopia (10), Nepal (11), Mozambique (7), Mali (6), and Niger (4). We did not attempt to survey a representative sample, as we felt it unreasonable to ask questions of CSOs not involved in health sector policy and coordination processes. This approach enabled us to get a sense of the quality of the engagement that Government and DP action has enabled.
The survey questions asked for a response to a statement on a four point scale:
- 1 = strongly disagree,
- 2 = disagree,
- 3 = agree, and
- 4 = strongly agree.
The responses were aggregated for all CSO respondents in a country. An aggregate of over 3.5 was given a green tick, between 2.2 and 3.5 a yellow arrow, and below 2.2 an orange exclamation mark. The results presented were largely extremely positive which may reflect the selection bias and brevity of the CSO survey. Further work to cross check with other ongoing CSO work would be valuable.
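A minimal sketch of the rating thresholds described above, assuming the aggregate is the mean of responses and that a score of exactly 2.2 falls in the yellow band (the report does not state how the boundary cases were handled):

```python
def cso_rating(responses):
    """Map the aggregate of CSO responses (1-4 scale) to the symbols used in
    the report: over 3.5 a green tick, 2.2-3.5 a yellow arrow, below 2.2 an
    orange exclamation mark. Treating the aggregate as the mean, and placing
    exactly 2.2 in the yellow band, are our assumptions."""
    mean = sum(responses) / len(responses)
    if mean > 3.5:
        return "green tick"
    if mean >= 2.2:
        return "yellow arrow"
    return "orange exclamation mark"

cso_rating([4, 4, 4, 3])  # mean 3.75 -> "green tick"
cso_rating([3, 2, 2, 2])  # mean 2.25 -> "yellow arrow"
```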
Assumptions & Potential Limitations
The reporting framework rests on a number of assumptions, which apply to the SPMs to varying degrees, as set out in the table below. These assumptions, and how they may limit interpretation of the data, are explained in more detail below.
Some of the targets are stated as point-in-time attainment (e.g. achieve 66% of something), while others are relative, with a point-in-time element written in (e.g. achieve a reduction of 66% in something, to at least 85%). The target statements need to be read carefully to ascertain whether the rating is of point-in-time attainment or of a relative change.
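The distinction can be sketched as follows; the function names, thresholds and figures here are illustrative, not taken from the framework itself:

```python
def meets_point_in_time(value, target=0.66):
    """Point-in-time attainment, e.g. 'achieve 66% of something'."""
    return value >= target

def meets_relative(baseline_gap, latest_gap, reduction=0.66):
    """Relative target, e.g. 'reduce the shortfall by 66% from its baseline'.
    Met when the remaining gap is at most (1 - reduction) of the baseline gap."""
    return latest_gap <= baseline_gap * (1 - reduction)

meets_point_in_time(0.70)   # True: 70% attained against a 66% target
meets_relative(0.60, 0.15)  # True: the gap fell by 75%, beating the 66% target
```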
The sample of IHP+ signatories that volunteered to participate in 2010 monitoring is limited, and there are key development partners who are not signatories to the IHP+. This means that only a partial picture of development partner progress in each country can be presented.
Consistency in interpretation of terminologies
In some cases key terms were interpreted differently by different participants in the survey, notably because of language – for example on mutual accountability processes, or on performance assessment frameworks. IHP+Results did produce detailed guidance, but it is not clear that this was always closely followed, nor that the guidance was sufficient. This was a particular concern raised by development partners in Burundi, who decided to sit down together to establish a common understanding of the key terms and to resubmit data. More information from Burundi is provided here.
The concepts most open to broad interpretation are:
- Reporting aid on budget (whether on the health budget or the general budget; and whether information was simply given to government, or was given and also confirmed by government as reported on budget);
- Programme Based Approach – the official definition is not straightforward to apply, and many development partners do not collect data disaggregated in this way;
- Capacity building – we provided a definition in our guidance, but many people have their own views and often do not collect data disaggregated in this way;
- Mutual Accountability process – this is particularly subjective as it could be a single meeting, an integrated annual review, or a range of other events;
- Compact or equivalent – definition of the equivalent is open to interpretation;
- Performance Framework – definition open to interpretation depending on what partners feel constitutes a complete and comprehensive framework.
- Government budget allocation to health also varies – some governments reported it including federal and state allocations, others just federal allocations; some reported it with, and some without, external assistance. We considered using WHS data, but these are only available up to 2007 and so do not provide two points for comparison since the IHP+ was launched in 2007. We therefore decided to present the reported data, but to state clearly that the data should not be used to make comparisons between countries (see p20 of the IHP+Results 2010 Performance Report).
Francophone respondents may, in some instances, have understood capacity building to mean all development assistance, perhaps unintentionally inflating their responses. Our initial analysis suggests that this has not proved a significant distortion, but steps should be taken for 2011 monitoring to limit this possibility.
There are two ways in which measure 4DP (on the percentage of health sector aid disbursements released according to agreed schedules in annual or multi-year frameworks) can be interpreted. The first is the proportion of planned funding that was actually disbursed in a given year by the development partner. The second is the proportion of actual disbursements in a year that were planned for that year. Both tell us something valid, albeit slightly different, about the predictability of health aid that a government receives. The IHP+Results survey was unintentionally ambiguous on this point. The responses provided fit the second interpretation, while on reflection we believe that the working group had the first interpretation in mind. We have calculated and reported the second interpretation and provided a clear provisional indicator statement to reflect this. We suggest that the intended meaning of this indicator is clarified for future surveys.
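The divergence between the two interpretations can be illustrated with hypothetical figures:

```python
# Hypothetical figures for one development partner in one year, illustrating
# how the two readings of 4DP can diverge.
planned = 100             # disbursement scheduled in the agreed framework
disbursed_on_plan = 70    # scheduled funding actually disbursed
disbursed_off_plan = 40   # additional, unscheduled disbursements
actual_total = disbursed_on_plan + disbursed_off_plan  # 110

# Interpretation 1: proportion of planned funding actually disbursed.
interpretation_1 = disbursed_on_plan / planned       # 0.70

# Interpretation 2 (the one reported): proportion of actual disbursements
# that had been planned for that year.
interpretation_2 = disbursed_on_plan / actual_total  # 70/110, about 0.64
```

With the same underlying disbursements, the two readings give 70% and roughly 64%, which is why an unambiguous indicator statement matters.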
The Paris target on PFM (our Standard Performance Measure 5DPb) includes a two-tiered target: a 66% reduction in aid not using PFM systems in countries with a CPIA/PFM score of 5 or more, and a 33% reduction in countries with a score of 3.5 or more. None of the 10 survey countries has a CPIA/PFM score of 5 or more; five countries (Burkina Faso, Ethiopia, Mali, Mozambique and Niger) were scored 3.5 or above. IHP+Results therefore used 33% as the target for ratings. IHP+Results guidance also stated that agencies should only provide data for those countries with a score of 3.5 or more. Whilst DPs provided a significant amount of data for countries with a CPIA/PFM score of less than 3.5, we decided not to count these data, as it is not possible to know to what extent data were missing as a result of our guidance. Instead, the disaggregated ratings for all agencies operating in these countries were marked as N/A (using the none symbol), and IHP+Results reporting on 5DPb is based on the five countries with stronger PFM systems. This means that overall findings, ratings and graphs are biased towards good performance. The effect of excluding data for the 5 countries with weaker PFM systems was to increase the overall proportion of DP funds using PFM systems (by 12% at baseline, and by 23% in 2009); consequently the increase in performance between baseline and 2009 also increased (from 7% to 18%). We have included some analysis of DP use of PFM systems in countries with weaker systems in the Annual Performance Report (see p25), but a more systematic analysis would be useful; we will make changes to our guidance and approach for 2011 to enable this.
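The two-tier target logic, as applied here, can be sketched as follows (the scores shown are placeholders, not the actual CPIA/PFM ratings):

```python
def pfm_target(cpia_pfm_score):
    """Two-tier Paris target on use of PFM systems: a 66% reduction where the
    CPIA/PFM score is 5 or above, 33% where it is 3.5 or above. Countries
    below 3.5 were excluded from the 5DPb ratings (None here, i.e. N/A)."""
    if cpia_pfm_score >= 5:
        return 0.66
    if cpia_pfm_score >= 3.5:
        return 0.33
    return None

# Placeholder scores only: the five countries at 3.5 or above get the 33%
# target; the others are marked N/A and excluded from the 5DPb reporting.
scores = {"Burkina Faso": 3.5, "Ethiopia": 3.5, "Mali": 3.5,
          "Mozambique": 3.5, "Niger": 3.5, "Burundi": 3.0}
rated = {c: pfm_target(s) for c, s in scores.items()
         if pfm_target(s) is not None}
```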
The use of country consultants to support country governments in completing the survey tool could have resulted in additional interpretation, or misinterpretation, of the terminologies and concepts involved.
Limited data on a number of indicators from a number of development partners could skew the overall ratings of certain DPs and of specific indicators. In many cases the question resulted in a “not applicable” response.
In some cases agencies indicated that one or more Measures are not applicable to their business model – in these cases N/A, shown as a grey bar (the none symbol), has been used. This symbol was also used in a number of cases where action was not possible by Development Partners (DPs). For example, DP use of a national performance assessment framework is only possible where one exists. In those countries where Governments reported that they did not have a national performance assessment framework in place, all DPs that provided data for any such country were rated with the none symbol. This applies to SPMs 1DP, 6DP and 7DP; and to 5DPb (see above).
We have not been able to verify the use of N/As in all cases, but we will ask DPs for feedback on how they would improve the completeness of data in a subsequent exercise. We are aware that some instances relate to a poor fit between the reporting framework and agencies’ business models, and that in other cases data are not easily available. We cannot exclude the possibility of a reluctance to report data that show poor performance, although many DPs have knowingly reported such data anyway. This is a limitation, as it also reduces the generalisability of, and confidence in, findings for some DPs drawn from smaller datasets.
The exercise has been largely self-reported, and it has been difficult to find opportunities to triangulate data without imposing significant transaction costs on Ministry of Health officials. This means that we have not been able to verify, for example, whether a development partner participates in a mutual accountability process in a country, and whether it is the same mutual accountability process that the government has reported exists. Nonetheless, there is great value in the data that have been reported in the 2010 IHP+ survey: they provide a statement of what governments and development partners consider they have done, and the survey results are an invaluable tool and starting point for discussions of mutual accountability.
IHP+Results guidance documentation clearly set out that funds should be reported by the agency that completes the final disbursement. We are confident that this has been broadly followed, but the possibility of double-counting cannot be discounted. For example development partner X provides $200,000 to development partner Y to do capacity building in country Z. It is possible that in some instances both development partners X and Y have reported this funding.
In keeping with the IHP+ light touch principle, DPs and countries could select their own baseline, with 2005 or 2007 suggested. This table provides a summary of the chosen baselines for the key indicators requiring a measurement of change over time (2DPa, 2DPb, 2DPc, 5DPa and 5DPb). We chose not to nominate a particular baseline year because development partners were split approximately equally between 2005 and 2007 in their reports. The major implication of this is that readers cannot conclude that X was achieved in Y years, only that X had been achieved by 2009. We are not able to calculate how far on or off track the indicators are because we do not have a target date for attainment, so the lack of a precise baseline does not cause an issue in this regard.
Capturing information on General Budget Support and incorporating it in the survey presents a number of challenges if the survey is to remain light touch. We have used the following assumptions.
- We have calculated the proportion of GBS for health based on the % of country budget allocated to health. We have assumed that this is reported on budget, that it is 100% through country procurement and financial management systems, and that it is 100% programme based approach. We have not included it for capacity building calculations or country procurement. It should be noted that this can lead to a decrease in health sector aid, which is due to factors outside the control of DP decision makers (ie drops in imputed GBS to health are not due to DP decisions). It should also be noted that the data provided by governments on their allocation of country budget to health differed (see above section on ‘Consistency of interpretation’) which affects the comparability of the findings.
- We have counted the use of national procurement systems through Sector Budget Support via a proxy measure. It is not possible to report the volume of procurement funding in SBS scenarios, but it is possible to report the proportion of funds that use procurement systems. We therefore asked DPs to confirm what mechanisms they used to deliver their health aid, what proportion of their health aid used those mechanisms, and whether each mechanism used national procurement systems. This enabled us to enter figures that represented the proportion, but not the volume, of funds using country procurement systems. We have produced parallel scorecards for the countries that provide GBS – one scorecard that includes only health aid, and another that includes health aid plus GBS for the relevant targets. Sector budget support has been included with regular health aid (it was reported as such on the survey tool), with the assumption that it is 100% through country procurement and financial management systems and 100% programme based approach.
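The imputed-GBS assumption in the first bullet can be sketched with hypothetical figures (the amounts and the 12% health share below are placeholders, not reported data):

```python
# Sketch of the imputed-GBS assumption: the health share of General Budget
# Support is estimated from the percentage of the country budget allocated
# to health. All figures here are hypothetical.
gbs_total = 50_000_000  # total GBS from one development partner
health_share = 0.12     # government budget allocation to health

imputed_gbs_to_health = gbs_total * health_share  # about 6,000,000

record = {
    "volume": imputed_gbs_to_health,
    "on_budget": True,        # assumed, per the approach above
    "uses_country_pfm": 1.0,  # assumed 100%
    "programme_based": 1.0,   # assumed 100%
    # excluded from the capacity building calculations
}
```

Because the imputation depends on the reported budget share, inconsistencies in how governments report that share (see ‘Consistency of interpretation’ above) flow directly into the imputed figure.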
We have referred to “baseline” throughout the document, although, alongside the 2005 and 2007 baselines, a few 2006 and 2008 baseline data points were also reported. For the aggregate calculations of overall DP performance, we have grouped 2005–2007 data as baseline and 2008–2009 data as latest. Where 2008 was given as the baseline, we used it for the scorecard ratings but excluded it, together with the corresponding latest data point, from the aggregates.
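The grouping rule for the aggregate calculations can be sketched as:

```python
def baseline_bucket(year):
    """Grouping used for the aggregate calculations: 2005-2007 data points
    are pooled as 'baseline' and 2008-2009 as 'latest'. (Where 2008 was a
    DP's baseline, that pair was excluded from the aggregates; that special
    case is not modelled here.)"""
    if 2005 <= year <= 2007:
        return "baseline"
    if 2008 <= year <= 2009:
        return "latest"
    return None

baseline_bucket(2005)  # "baseline"
baseline_bucket(2009)  # "latest"
```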