Assumptions & Limitations

Summary

IHP+Results’ reporting framework is based on a number of assumptions – listed below. For more detailed explanations please click the respective link.

Analysing and interpreting the data

Assumptions & Potential Limitations

Detailed Explanations

Analysing and interpreting the data

The reporting framework includes both qualitative and quantitative Standard Performance Measures (SPMs). We have completed our analysis by aggregating data in various ways:

Quantitative data

To provide:

The formula for aggregating the data presented in partner scorecard ratings is the same as the approach used to calculate the indicator value (weighted average) in the Paris Survey 2008:

Aggregate Numerator ÷ Aggregate Denominator = Result

Example:

Numerator (country 1) + Numerator (country 2) + Numerator (country 3)

——————————————————————————————————— = Result

Denominator (country 1) + Denominator (country 2) + Denominator (country 3)

The Paris Survey in 2008 also used an alternative approach to aggregation (unweighted average). This calculates the result (numerator / denominator) for each country, adds these results together and divides the sum by the number of countries. IHP+Results has not had time to conduct an alternative analysis using unweighted aggregation, but would hope to do so during 2011.
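
The difference between the two aggregation approaches can be illustrated with a minimal Python sketch. The function names and the three numerator/denominator pairs are hypothetical and chosen only for illustration; they are not survey data.

```python
from typing import Sequence, Tuple

# Each pair is (numerator, denominator) for one country.
CountryData = Sequence[Tuple[float, float]]


def weighted_average(pairs: CountryData) -> float:
    """Aggregate numerator divided by aggregate denominator."""
    total_numerator = sum(n for n, _ in pairs)
    total_denominator = sum(d for _, d in pairs)
    return total_numerator / total_denominator


def unweighted_average(pairs: CountryData) -> float:
    """Mean of the per-country results (numerator / denominator)."""
    results = [n / d for n, d in pairs]
    return sum(results) / len(results)


# Hypothetical figures for three countries (not taken from the survey).
example = [(80.0, 100.0), (10.0, 50.0), (30.0, 50.0)]

weighted = weighted_average(example)      # (80 + 10 + 30) / (100 + 50 + 50)
unweighted = unweighted_average(example)  # (0.8 + 0.2 + 0.6) / 3
```

In this example the weighted result is higher because the country with the largest denominator also has the highest individual result; the unweighted approach gives each country equal influence regardless of the size of its denominator.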

Qualitative data

Qualitative data are analysed and presented in two ways:

  1. In partner scorecards.
  2. In online graphs and charts.

It is important to note that there is limited qualitative information with which to fully analyse these indicators. Questions in the survey tool for these measures focused on, for example, the existence of a plan or the use of a national performance assessment framework, and were interpreted as quantitative data. Supplementary questions were asked about quality, obstacles to use, etc. (see sample survey tool), but answers to these were not mandatory. Consequently, the quality and comprehensiveness of the data limit our ability to provide a qualitative assessment of the quantitative and qualitative measures. This is particularly relevant for the consistent understanding and quality of compacts, national plans, performance frameworks and mutual accountability processes.

A number of stakeholders have stressed the importance of qualitative data in order to make sense of the SPMs, and in particular of the presentation of results through partner and country scorecards. The scorecards have been designed to provide a clear, easily accessible presentation of complex information, mainly for a high-level political audience. In this regard, they are likely to be of limited use to those interested in country-level performance and consequently should be read with care, taking into consideration the limitations presented below. However, we have taken steps to address these concerns by providing space on the reverse side of the partner scorecard, where DPs can qualify the results reported in the scorecard ratings, and by providing disaggregated data for each development partner and country.

Civil Society data and analysis

Civil society engagement is measured through Standard Performance Measures 8G (government) and 8DP (development partner). Both are largely quantitative measures, but include some qualitative information too. We also conducted a brief qualitative survey of local civil society organisations (CSOs) in each of the 10 countries. We aimed to gather completed questionnaires from 10 civil society organisations, chosen in consultation with the Ministry of Health and using contacts provided by IHP+ civil society representatives, because they were involved in the process. We received responses from Burkina Faso (9 returns), Burundi (5), DRC (5), Ethiopia (10), Nepal (11), Mozambique (7), Mali (6), and Niger (4). We did not attempt to survey a representative sample, as we felt it unreasonable to ask questions of CSOs not involved in health sector policy and coordination processes. This enabled us to get a sense of the quality of the engagement that Government and DP action has enabled.

The survey questions asked for a response to a statement on a four point scale:

The responses were aggregated for all CSO respondents in a country. An aggregate of over 3.5 was given a green tick, between 2.2 and 3.5 a yellow arrow, and below 2.2 an orange exclamation mark. The results were, for the most part, extremely positive, which may reflect the selection bias and brevity of the CSO survey. Further work to cross-check with other ongoing CSO work would be valuable.
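
The rating thresholds can be expressed as a short Python sketch. Several points here are assumptions rather than statements of the method used: that responses on the four-point scale are coded 1 to 4, that the country aggregate is a simple mean of those responses, and that aggregates of exactly 2.2 or 3.5 fall in the middle band. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def aggregate_responses(responses: Sequence[int]) -> float:
    """Assumed aggregation: simple mean of responses coded 1 to 4."""
    return mean(responses)


def rate_aggregate(score: float) -> str:
    """Map a country aggregate to the symbol described in the text.

    Boundary handling (exactly 2.2 or 3.5 -> yellow arrow) is an assumption.
    """
    if score > 3.5:
        return "green tick"
    if score >= 2.2:
        return "yellow arrow"
    return "orange exclamation mark"


# Hypothetical responses from four CSOs in one country.
high = rate_aggregate(aggregate_responses([4, 4, 4, 3]))  # mean 3.75
mid = rate_aggregate(aggregate_responses([3, 3, 2, 3]))   # mean 2.75
low = rate_aggregate(aggregate_responses([2, 2, 2, 2]))   # mean 2.0
```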

Assumptions & Potential Limitations

The reporting framework rests on a number of assumptions, which apply to the SPMs to varying degrees, as set out in the table below. The sections that follow explain these assumptions in more detail and describe how they may limit interpretation of the data.

Different target statements

Some of the targets are stated as point-in-time targets (e.g. achieve 66% of something), while others are relative targets with a point-in-time target written in (e.g. achieve a reduction of 66% of something to at least 85%). The target statements need to be read carefully to ascertain whether the rating is of a point-in-time attainment or of a relative change.

Self-selection and sample size

The sample of IHP+ signatories that volunteered to participate in 2010 monitoring is limited, and there are key development partners who are not signatories to the IHP+. This means that only a partial picture of development partner progress in each country can be presented.

Consistency of interpretation

… of terminologies

In some cases key terms are interpreted differently by different participants in the survey, notably because of language, for example on mutual accountability processes or on performance assessment frameworks. IHP+Results did produce detailed guidance, but it is not always clear that this has been closely followed, or that the guidance is sufficient. This was a particular concern raised by development partners in Burundi, who decided to sit and establish a common understanding of the key terms and to resubmit data. More information from Burundi is provided here.

The concepts most open to broad interpretation are:

… of terminologies due to language discrepancies

Francophone respondents may, in some instances, have understood capacity building to mean all development assistance, perhaps unintentionally inflating their responses. Our initial analysis suggests that this has not proved a significant distortion, but steps should be taken for 2011 monitoring to limit this possibility.

Assumptions on specific Standard Performance Measures

There are two ways in which measure 4DP (on the percent of health sector aid disbursements released according to agreed schedules in annual or multi-year frameworks) can be interpreted. The first is as the proportion of planned funding that was actually disbursed in a given year by the development partner. The second is the proportion of actual disbursement in a year which was planned for that year. Both tell us something valid, albeit slightly different, on the predictability of health aid that a government receives. The IHP+Results survey was unintentionally ambiguous on this. Responses provided fit the second interpretation, while on reflection we believe that the working group had in mind the first interpretation. We have calculated and reported on the second interpretation and provided a clear provisional indicator statement to reflect this. We suggest that the desired meaning of this indicator is revised for future surveys.
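
The two interpretations of 4DP share a numerator (funds that were both planned for the year and disbursed in it) but use different denominators. The Python sketch below illustrates this under that reading; the function names and the figures are hypothetical and are not drawn from survey responses.

```python
def share_of_planned_that_was_disbursed(
    planned_total: float, planned_and_disbursed: float
) -> float:
    """First interpretation: proportion of planned funding actually disbursed."""
    return planned_and_disbursed / planned_total


def share_of_disbursed_that_was_planned(
    disbursed_total: float, planned_and_disbursed: float
) -> float:
    """Second interpretation: proportion of actual disbursement that was planned."""
    return planned_and_disbursed / disbursed_total


# Hypothetical figures for one development partner in one year.
planned_total = 100.0          # scheduled for the year
disbursed_total = 80.0         # actually disbursed in the year
planned_and_disbursed = 60.0   # disbursed in the year AND scheduled for it

first = share_of_planned_that_was_disbursed(planned_total, planned_and_disbursed)
second = share_of_disbursed_that_was_planned(disbursed_total, planned_and_disbursed)
```

With these figures the first interpretation gives 60% and the second 75%, which shows how the same underlying data can produce different ratings depending on the reading of the indicator.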

The Paris target on PFM (our Standard Performance Measure 5DPb) is a two-tiered target: a 66% reduction in aid not using PFM systems in countries with a CPIA/PFM score of 5+, and 33% in countries with a score of 3.5 or more. None of the 10 survey countries has a CPIA/PFM score of 5+, and 5 countries (Burkina Faso, Ethiopia, Mali, Mozambique and Niger) were scored at 3.5 or above. IHP+Results therefore used 33% as the target for ratings. IHP+Results guidance also stated that agencies should only provide data for those countries with a score of 3.5 or more. Whilst DPs have provided a significant amount of data for those countries with a CPIA/PFM score of less than 3.5, we decided not to count these data, as it is not possible to know to what extent data were missing as a result of our guidance. Instead, the disaggregated ratings for all agencies operating in these countries were marked as N/A (using the none symbol), and IHP+Results reporting on 5DPb is based on the five countries with strong PFM systems. This means that overall findings, ratings and graphs are biased towards good performance. The effect of excluding data for the 5 countries with weaker PFM systems was to increase the overall proportion of DP funds using PFM systems (by 12% in baseline, and by 23% in 2009); consequently the increase in performance between baseline and 2009 also increased (from 7% to 18%). We have included some analysis of DP use of PFM systems in countries with weaker systems in the Annual Performance Report (see p25), but a more systematic analysis would be useful; we will make changes to our guidance and approach for 2011 to enable this.
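
The two-tiered target and the N/A treatment described above can be sketched in Python as follows. The function name is hypothetical, and returning `None` to represent an N/A rating is an illustrative choice rather than a description of how the ratings were actually produced.

```python
from typing import Optional


def pfm_reduction_target(cpia_pfm_score: float) -> Optional[float]:
    """Return the target reduction in aid not using PFM systems.

    0.66 for a CPIA/PFM score of 5 or more, 0.33 for a score of 3.5 or more,
    and None (treated here as N/A) for a score below 3.5.
    """
    if cpia_pfm_score >= 5.0:
        return 0.66
    if cpia_pfm_score >= 3.5:
        return 0.33
    return None


# Illustrative scores only; actual country scores are not given in the text.
target_high = pfm_reduction_target(5.0)
target_mid = pfm_reduction_target(3.5)
target_none = pfm_reduction_target(3.0)
```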

Use of country consultants

The use of country consultants to support country governments in completing the survey tool could have resulted in additional interpretation, or misinterpretation, of the terminologies and concepts involved.

Data Availability

Limited data on a number of indicators from a number of development partners could skew the overall rating of certain DPs and of specific indicators. In many cases the question has resulted in a “not applicable” response.

Measure  N/A   ?  Response  Total
1DP       35   1        62     98
2DPa       7  31        60     98
2DPb      35  19        44     98
2DPc       5  16        77     98
3DP        4  12        82     98
4DP        4  21        73     98
5DPa      21  32        45     98
5DPb       6  27        65     98
5DPc      11  15        72     98
6DP       25   4        69     98
7DP       35   2        61     98
8DP        0  21        77     98

In some cases agencies indicated that one or more Measures are not applicable to their business model; in these cases N/A (shown as a grey bar, the none symbol) has been used. This symbol was also used in a number of cases where action was not possible by Development Partners (DPs). For example, DP use of a national performance assessment framework is only possible where one exists. In those countries where Governments reported that they did not have a national performance assessment framework in place, all DPs that provided data for any country where this applied were rated with a none symbol. This applies to SPMs 1DP, 6DP and 7DP; and to 5DPb (see above).

We have not been able to verify the use of N/As in all cases, but we will ask DPs for feedback on how they would improve completeness of data in a subsequent exercise. We are aware that some instances relate to a poor fit between the reporting framework and agencies’ business models, and that in other cases data are not easily available. We cannot exclude the possibility that there is a reluctance to report data that show poor performance, but many DPs have already knowingly reported such data anyway. This is a limitation, as it reduces the generalisability of, and confidence in, findings for some DPs drawn from smaller datasets.

Triangulation

The exercise has been largely self-reported, and it has been difficult to find opportunities to triangulate data without imposing significant transaction costs on Ministry of Health officials. This means that we have not been able to verify, for example, whether a development partner participates in a mutual accountability process in a country, and whether it is the same mutual accountability process that the government has reported exists. Nonetheless, there is great value in the data reported in the 2010 IHP+ survey because they provide a statement of what governments and development partners consider they have done, and the survey results are an invaluable tool and starting point for discussions of mutual accountability.

Risk of double-counting

IHP+Results guidance documentation clearly set out that funds should be reported by the agency that completes the final disbursement. We are confident that this has been broadly followed, but the possibility of double-counting cannot be discounted. For example, development partner X provides $200,000 to development partner Y to do capacity building in country Z. It is possible that in some instances both development partners X and Y have reported this funding.

Use of different baselines

DPs and countries could select their own baseline year, in keeping with the IHP+ light-touch principle, with 2007 or 2005 suggested. The table below provides a summary of the chosen baselines for the key indicators requiring a measurement of change over time (2DPa, 2DPb, 2DPc, 5DPa and 5DPb). We chose not to nominate a particular baseline year because of the approximately equal split between development partners using either 2005 or 2007 in their reports. The major implication of this is that readers cannot draw a conclusion that X has been achieved in Y years, but one can draw a conclusion that X has been achieved by 2009 in recent years. We are not able to calculate how far on or off track the indicators are because we do not have a target date for attainment, so the lack of a precise baseline does not cause an issue in this regard.

                      2005  2007  Other
On-budget               50    45      3
Capacity Development    56    40      2
PBA                     56    40      2
Procurement             56    40      2
PFM                     50    45

Inclusion of General Budget Support

Capturing information on this and incorporating it in the survey presents a number of challenges if the survey is to remain light touch. We have used the following assumptions.

Limitations in the reporting framework

We chose not to nominate a particular baseline year because of the almost even split between 2005 and 2007 in development partner reports. “Other” includes a few 2006 and 2008 baseline data points. For the aggregate calculations of all DP performance we have grouped 2005-2007 data as baseline and 2008-2009 data as latest. Where 2008 was given as the baseline, we have used it for scorecard ratings but have excluded it, together with the corresponding latest data point, from the aggregates. We have referred to “baseline” throughout the document. The major implication of this is that readers cannot draw a conclusion that X has been achieved in Y years, but one can draw a conclusion that X has been achieved by 2009 in recent years. We are not able to calculate how far on or off track the indicators are because we do not have a target date for attainment, so the lack of a precise baseline does not cause an issue in this regard.
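
The grouping rule for baselines can be sketched in Python. The record layout, partner labels and function name below are hypothetical; the sketch only illustrates the rule that data points with a 2005-2007 baseline enter the aggregates, while a 2008 baseline is used for scorecard ratings but left out of the aggregates.

```python
from typing import Dict, List

# Years grouped as "baseline" for aggregate calculations (2005 to 2007 inclusive).
AGGREGATE_BASELINE_YEARS = range(2005, 2008)


def included_in_aggregate(record: Dict[str, object]) -> bool:
    """True if the record's baseline year falls in the 2005-2007 group."""
    return record["baseline_year"] in AGGREGATE_BASELINE_YEARS


# Hypothetical records, one per development partner data point.
records: List[Dict[str, object]] = [
    {"partner": "DP A", "baseline_year": 2005},
    {"partner": "DP B", "baseline_year": 2007},
    {"partner": "DP C", "baseline_year": 2008},
]

# All records are used for individual scorecard ratings.
scorecard_records = list(records)

# Only records with a 2005-2007 baseline feed the aggregate calculations.
aggregate_records = [r for r in records if included_in_aggregate(r)]
aggregate_partners = [r["partner"] for r in aggregate_records]
```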
