The impact of trial stage, developer involvement and international transferability on universal social and emotional learning programme outcomes: a meta-analysis



Results

Descriptive Characteristics

A final sample of 89 studies that met the aforementioned inclusion criteria was included in the analyses. Table 2 summarises the salient features of the included studies. These figures are broadly consistent with the characteristics of studies included in previous reviews (Durlak et al., 2011; Sklad et al., 2012). However, the majority of studies (69%) reported on implementation, a higher proportion than in previous reviews.


***Table 2***
In relation to the specific criteria of the study hypotheses, most studies were considered to report efficacy-based trials (69%), with the majority including some element of developer involvement, either as lead (38%) or involved (28%). Unsurprisingly, the majority of studies were of ‘home’ programmes (80%), mostly originating from the USA. These figures are shown in Table 3.

***Table 3***

Main programme effects

***Table 4***
The grand means for each of the outcome variables of interest were statistically significant (no 95% CI crossed zero). The magnitude of the effect varied by outcome type, with the largest effect for measures of social-emotional competence (0.53) and the smallest for attitudes towards self (0.17), though the latter variable represents only a small number of studies (n = 9). Heterogeneity amongst studies was high, with I² varying between 43% and 97%, confirming that although SEL programmes can be seen, on average, to deliver intended effects, there was a high degree of variation amongst individual studies.
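As an aside on method, grand means and heterogeneity statistics of this kind follow standard random-effects meta-analysis. The sketch below is illustrative only: the effect sizes and variances are hypothetical, not those of the 89 included studies. It shows how a pooled mean, its 95% CI, and the Q and I² statistics can be computed under a DerSimonian-Laird model:

```python
import numpy as np

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling with Q and I^2 heterogeneity."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w_fixed = 1.0 / variances
    fixed_mean = np.sum(w_fixed * effects) / np.sum(w_fixed)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = np.sum(w_fixed * (effects - fixed_mean) ** 2)
    df = len(effects) - 1
    # Between-study variance (tau^2), floored at zero
    c = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
    tau2 = max(0.0, (q - df) / c)
    # I^2: percentage of total variation attributable to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_random = 1.0 / (variances + tau2)
    pooled = np.sum(w_random * effects) / np.sum(w_random)
    se = np.sqrt(1.0 / np.sum(w_random))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)  # 95% CI on the grand mean
    return pooled, ci, q, i2
```

With homogeneous inputs Q (and hence I²) collapses to zero; as study estimates diverge, I² climbs towards the upper end of the 43%-97% range reported above.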
Stage of evaluation: Efficacy vs. Effectiveness

***Table 5***

We predicted that studies coded as efficacy would show greater effects compared to studies coded as effectiveness. This hypothesis was supported for 6 of the 7 outcome variables (mean difference (MD) between conditions = 0.13). However, comparisons of proportional overlap (Cumming & Finch, 2005) showed that only 4 of these outcomes reached statistical significance (PSB, CP, ED, & AA). The largest differences were seen between outcomes measuring behaviour, specifically pro-social behaviour (MD = 0.19) and conduct problems (MD = 0.19). The smallest differences were detected for the outcome variables of attitudes towards self (MD = 0.06) and emotional competence only (MD = 0.05), though these also reflect the outcome categories with the smallest number of included studies. Only outcomes classified as ‘social-emotional competence’ showed greater effects in the effectiveness condition, contrary to the stated hypothesis. Effect sizes and confidence intervals can be seen in Figure 1. Of note is the high degree of heterogeneity across all outcome variables, as evidenced by the values of both the Q and I² statistics, with most studies being categorised as either ‘medium’ or ‘high’ (Higgins, Thompson, Deeks, & Altman, 2003), indicating a diversity of effect across studies.
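The proportional-overlap comparison follows Cumming and Finch's (2005) 'rule of eye': two independent 95% CIs whose overlap is less than about half their average margin of error correspond roughly to p < .05. A minimal sketch, with hypothetical interval values:

```python
def ci_overlap_significant(ci_a, ci_b, max_overlap=0.5):
    """Cumming & Finch (2005) rule of eye: two independent 95% CIs whose
    overlap is less than ~half the average margin of error suggest p < .05."""
    lo_a, hi_a = ci_a
    lo_b, hi_b = ci_b
    # Overlap of the two intervals (negative values indicate a gap)
    overlap = min(hi_a, hi_b) - max(lo_a, lo_b)
    # Average margin of error (half-width) of the two intervals
    avg_moe = ((hi_a - lo_a) + (hi_b - lo_b)) / 4.0
    return overlap < max_overlap * avg_moe
```

This is a heuristic for independent group estimates, not a substitute for a formal subgroup test; it is how the "4 of 7 significant" judgements above can be operationalised.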


***Figure 1***

Developer involvement

***Table 6***

We hypothesised that studies in which the developer had been identified as taking a lead would show greater effects in relation to independent studies. Taking into account the small sample size for attitudes towards self (n = 4), and the very high degree of heterogeneity indicated by the Q and I² statistics, the hypothesis was supported by only 2 of the 7 outcomes (‘attitudes towards self’ and ‘emotional distress’), though these were not statistically significant. The mean differences between developer-led and independent studies for these outcomes were 0.2 and 0.02 respectively. The outcome variables of pro-social behaviour, academic achievement, and emotional competence only showed the greatest effects at ‘involved’, whereas social-emotional competence and conduct problems showed the highest mean effect at ‘independent’. Effect sizes and confidence intervals can be seen in Figure 2.

To further investigate Eisner’s (2009) high fidelity hypothesis (that implementation of a given intervention is of higher quality in studies in which the programme developer is involved, leading to better results), a cross-tabulated analysis (developer involvement vs. issues with implementation) was conducted for all studies which reported implementation (n = 61). No significant association between developer involvement and issues with implementation was found (χ² (2, n = 61) = .633, p = .718, Cramér’s V = .104). This suggests that differences in effect between categories of developer involvement are not explained by better implementation.
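A cross-tabulated test of this kind can be run with `scipy.stats.chi2_contingency`. The counts below are purely illustrative (the paper's raw 3×2 table is not reproduced here) and are chosen only to sum to n = 61:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 3x2 cross-tabulation: developer involvement (lead / involved /
# independent) vs implementation issues (reported / none reported).
table = np.array([[10, 14],
                  [ 8, 10],
                  [ 7, 12]])

chi2, p, dof, expected = chi2_contingency(table, correction=False)
n = table.sum()
# Cramer's V effect size for an r x c contingency table
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(f"chi2({dof}, n={n}) = {chi2:.3f}, p = {p:.3f}, V = {cramers_v:.3f}")
```

A non-significant χ² with a small V, as reported above, indicates no detectable association between involvement category and reported implementation issues.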

***Figure 2***
***Table 7***

Cross-cultural transferability

***Table 8***

We hypothesised that studies implemented within the same country in which they were developed would show greater effects than those transported abroad. This hypothesis was supported in 4 of the 7 outcome variables: SEC (MD = 0.5), ATS (MD = 0.11), PSB (MD = 0.19), and ED (MD = 0.09). All four were statistically significant. Only one study reporting ‘attitudes towards self’ qualified as ‘away’ and therefore fit statistics were not available. For conduct problems, academic achievement and emotional competence, more favourable effects were seen for studies coded as ‘away’. Both the Q and I² statistics show a very large degree of inconsistency between studies, as well as a very small n for some of the outcome variables. This is a likely explanation for the large confidence intervals demonstrated in Figure 3.



***Figure 3***
Discussion

The purpose of the current study was to empirically investigate previously hypothesised factors that may explain differential programme effects in universal, school-based SEL programmes, specifically: 1) stage of evaluation (efficacy or effectiveness); 2) involvement of the programme developer in the evaluation; and 3) whether the programme was implemented in its country of origin. Findings from the current study present a more complex picture than that hypothesised in previous literature. These findings necessitate new thinking about the way these (and other) factors are examined, and highlight important implications for the worldwide implementation of these interventions. Each hypothesis is discussed in turn, followed by consideration of the limitations of the current study and of future directions for research.



1) Studies coded as ‘efficacy’ will show larger effect sizes compared to those coded as ‘effectiveness’

Greater programme effects under efficacy conditions (consistent with the first hypothesis) were shown for all but one outcome variable (social-emotional competence; SEC), though only 4 of the 7 outcomes were statistically significant. Results indicate a trend towards greater effects when additional support, training, staff or resources are provided. This is consistent with previous findings from Beelmann and Lösel (2006), who found an increased effect (d = 0.2) if a programme had been delivered by study authors or research staff. This finding has implications for the scaling up of programmes, as it implies a large-scale over-representation of expected effects in ‘real world’ settings, especially as 69% of studies were coded as efficacy trials.

These findings may be interpreted as indicating that higher levels of fidelity produce better outcomes, an interpretation that has underpinned the argument that 100% fidelity is to be strived for and adaptations avoided (Elliott & Mihalic, 2004). Whilst not entirely unreasonable, this involves two assumptions that should be considered before dismissing the considerable literature supporting the utility of some types of adaptation. Firstly, it assumes that the salient characteristics of schools that later adopt an intervention mirror those of the school where the efficacy trial took place. Such an assumption ignores the inherent diversity of the school environment. Natural variation in contexts is to be expected (Forman et al., 2013), and adaptations to the programme, or to the way in which it is implemented, may be necessary to achieve the same ‘goodness-of-fit’ as seen in an efficacy trial. Research consistently demonstrates that such adaptations are to be expected when school-based interventions are adopted and implemented more broadly (Ringwalt et al., 2003). Secondly, there is the assumption that at the efficacy stage an intervention is implemented with 100% fidelity. As a primary aim of an efficacy trial is to demonstrate the internal validity of an intervention, and the context of implementation is optimised to maximise the achievement of outcomes, it is possible that either, or both, the context and the programme are adjusted, however slightly, to support the demonstration of impact.

Such considerations do not account for the contrary result for the SEC outcome, which shows larger effects under the effectiveness condition. One promising explanation for this conflicting finding is offered by our current understanding of self-efficacy in programme delivery. Self-efficacy is underpinned by knowledge, understanding and perceived competence, and has been shown to be a factor in promoting the achievement of outcomes (Durlak & DuPre, 2008; Greenberg et al., 2005). Indeed, there is evidence to suggest that, for some interventions, greater effects are achieved when the programme is delivered by external facilitators compared to teachers (Stallard, 2014). Therefore, it is possible that an effectiveness trial can outperform efficacy conditions only when there is a high degree of implementer confidence and/or skill. If this is the case, then there may be differential ease with which universal, school-based SEL interventions are seen as achievable by the implementers (i.e. school staff). Programmes promoting general socio-emotional competency (incorporating both the emotional and relational aspects of Denham’s model) may be viewed as the most acceptable, and therefore as the programmes hypothesised to benefit most from inevitable adaptation. In conjunction with the previous paragraph, this might imply that adaptation is preferable to fidelity only when there is sufficient confidence in, and understanding of, the intervention. As the literature indicates that multiple factors may explain a reduction in effects (Biggs, Vernberg, Twemlow, Fonagy, & Dill, 2008; Stauffer, Heath, Coyne, & Ferrin, 2012), this is a clear steer towards closer consideration of these differing aspects when evaluating programme implementation, requiring a broader application of research methodologies.



2) Studies in which the developer has been identified as leading or being involved will show larger effect sizes in relation to independent studies

The study hypothesis was supported by 2 of the 7 outcomes (‘attitudes towards self’ and ‘emotional distress’), though these were not statistically significant. Depending on the outcome variable measured, effects favour either involvement (pro-social behaviour, academic achievement, emotional competence) or independence (social-emotional competence, conduct problems). Therefore, consideration of developer involvement alone is not sufficient to explain variation in programme outcomes. This result is at odds with findings from allied disciplines such as psychiatry (Perlis et al., 2005) and criminology (Petrosino & Soydan, 2005), which demonstrate a clear difference in effect when considering developer involvement. For instance, Perlis et al. (2005) found that studies declaring a conflict of interest were 4.9 times more likely to report positive results. However, attributing effects directly to the involvement of programme developers is questionable (Eisner, 2009), because developer involvement is an indirect indicator of other aspects of study design (e.g. experimenter expectancy effects (Luborsky et al., 1999)) and implementation quality (Lendrum, Humphrey, & Greenberg, in press).

A possible explanation for the inconsistent findings in the current study is the failure to account for the temporal aspect of programme development and evaluation. For instance, studies led by a developer may indicate an earlier, more formative stage of programme evaluation (see Campbell et al., 2000), where critical elements are still being trialled and modified. In this instance, it would be hypothesised that the ‘involved’ or ‘independent’ categories would begin to show greater effects as the programme elements are finalised and the evaluations become more summative than formative (conceptualised as increasing effects, similar to the pattern of results for SEC and CP in Figure 2). However, this also suggests a limitation of the study methodology (specifically, a lack of independence between the two constructs of stage of evaluation and developer involvement). Also indicated is a broader limitation of the current status of the field: the majority of programmes are at relatively early stages of development and evaluation (as the current findings show, approximately 69% of studies were efficacy-based). Interpretation of any other factors affecting programme success (e.g. developer involvement) is therefore limited by the over-representation of this category, which restricts the conclusions that can be drawn beyond the preliminary exploration presented here.

This evidence does not preclude other hypotheses or future investigation of the potential effects of developer involvement. Indeed, Eisner (2009) identifies a number of possible causal mechanisms to explain the general reduction of effects in independent prevention trials, and draws together a checklist by which consumers of research (and those researching these effects directly) can consider the extent to which these factors may influence results. Examples include cognitive biases, ideological position, and financial interests. Such an approach would aid clarity, as further investigation of the phenomenon is currently limited by the difficulty of establishing the precise role of the developer in individual trials when coding studies. To examine the cynical / high fidelity views (see literature review) more thoroughly, studies need to report the extent and nature of the developer’s involvement in an evaluation more precisely. Additional to this would be the consideration of implementation difficulties, as a significant minority of trials included in the present study did not report implementation. This would allow more comprehensive testing of the ‘high fidelity’ view as a specific hypothesis beyond the cross-tabulation analysis in Table 7, which, although not supportive of implementation quality as a moderator related to developer involvement, is a relatively blunt analysis (e.g. only containing studies which reported on implementation). Results from the current study tentatively suggest that the SEL field seems immune to the potential biases suggested by Eisner (2009), but there is little evidence to indicate why this would be so. Therefore, there is sufficient cause to explore this issue further, potentially using the factors identified by Eisner as a starting point.



3) Studies implemented within the country of development will show larger effect sizes than those adopted and implemented outside country of origin

The hypothesis that programmes developed and implemented within their own national boundaries would show greater effects than those transported to another country was supported by 4 of the 7 outcomes (social-emotional competence, attitudes towards self, pro-social behaviour, emotional distress); all of these were statistically significant. Of particular note is the size of the differences between categories, with some programmes showing almost no impact at all when transferred internationally.

Extant literature provides some guidance in explaining these effects. Several authors note the challenges associated with the implementation of programmes across cultural and international boundaries (Emshoff, 2008; Ferrer-Wreder, Sundell, & Mansoory, 2012; Resnicow, Soler, Braithwaite, Ahluwalia, & Butler, 2000), and it is therefore not surprising that the ‘away’ category would show reduced effects. For instance, the lack of critical infrastructure (e.g. quality supervision, site preparation, and staff training) is an often-cited reason for implementation failure (Elliott & Mihalic, 2004; Spoth et al., 2004). In this way, programmes may still be considered internally valid but unable to be implemented within a new context: flowers do not bloom in poor soil. This is a relatively optimistic interpretation of the data, as it implies that the only limiting factor in successful implementation is a lack of established ‘ground work’ in key areas (such as those identified by Elliott and Mihalic (2004)) prior to the introduction of the programme. However, independent of infrastructure concerns, Kumpfer, Alvarado, Smith, and Bellamy (2002) note that interventions that are not aligned with the cultural values of the community in which they are implemented are likely to show a reduction in programme effects. This is consistent with earlier considerations in the literature. Wolf (1978) draws a distinction between the perceived need for programme outcomes (i.e. reduced bullying) and the procedures and processes for achieving this goal (i.e. a specific taught curriculum). In practice, this would be consistent with the transportation of programmes which may be appropriate, but not congruent, with educational approaches or pedagogical styles. Contrary to the lack-of-infrastructure argument, which requires school-based changes, it is programme adaptation that is required to address these needs.
Berman and McLaughlin (1976) suggest that the likely answer is somewhere in the middle, with ‘mutual adaptation’ of both programme and implementation setting required for best results. Although these ideas are far from new, results from the current study suggest further understanding of the processes of cultural transportability of programmes is still undoubtedly required.

Neither infrastructure nor cultural adaptability fully explains why certain outcome variables show larger effect sizes in the ‘away’ category (i.e. why adapted programmes should outperform home versions). It may be that certain programmes require less external infrastructure, or are more amenable to surface-level adaptations which do not interfere with change mechanisms and are therefore easier to transport. However, this would account for roughly equivalent, rather than enhanced, effects compared to home programmes. A partial explanation is offered when considering the temporal aspect of programme development alongside existing theories. For outcomes where larger effects are seen in the ‘away’ category, it is possible that these programmes have an established history of development and evaluation in a broader range of contexts within the original country of development, resulting in greater external validity and subsequently fewer issues of transferability when exported. However, it is difficult to assess this hypothesis fully within the current design, highlighting a need for methodological diversity in investigating these phenomena.

Following from this argument, the data may be representative of a much more cyclical (rather than linear) framework of design and evaluation, as proposed by Campbell et al. (2000), by which large-scale, ‘successful’ interventions are returned to a formative phase of development when implemented in a ‘new’ context with a new population (either within or across international boundaries). This is consistent with the idea of ‘cultural tailoring’ (Resnicow et al., 2000), used to describe adaptations of interventions that are specifically targeted towards new cultural groups. Variation in programme outcomes may be representative of the ease and/or extent to which cultural tailoring requires re-validation of a programme, in line with Campbell et al.’s (2000) framework. In this way, the findings of the current study fail to fully capture this temporal, cyclical element, sampling individual interventions at potentially any stage of development within their new cultural climate.

Limitations of the current study

The most pressing limitation of the current study is that of diversity, both in regard to ‘methodological diversity’ (variability in study design) and ‘clinical heterogeneity’ (differences between participants, interventions and outcomes), the results of which are indicated by the Q and I² statistics (Higgins & Green, 2008). This suggests that the current categorisation of studies by the selected variables (trial stage, developer involvement and cultural transportability) warrants further consideration in relation to its fit with the obtained data; that is, the variables explain some, but not all, of the variability in outcomes.



The identified heterogeneity is in no small part due to the expansive definition by which SEL programmes are identified (Payton et al., 2008). This raises questions about the utility of such broad definitions within the academic arena, as they currently preclude more precise investigations of specific issues. For instance, the inconsistency with which relevant variables are directly related to an intervention’s immediate or proximal outcomes masks potential moderating effects when using meta-analytic strategies. As noted by Sklad et al. (2012), direct effects for some programmes (e.g. improved self-control) are considered indirect by others (e.g. as an intermediary part of a logic model in which pro-social behaviour is the intended outcome). This is an issue at the level of theory, contingent upon the logic models which underpin the implementation strategies of individual programmes, but it has serious implications for outcome assessment. For instance, lesser gains would be expected from distal outcomes, which should therefore not be assessed alongside proximal outcomes.

A further consideration is that the current study examined simple effects only; that is, the potential moderating effects of trial stage, developer involvement, and cultural transportability as independent of one another. Table 3 demonstrates that the current field is not evenly balanced in relation to these factors, and small cell sizes therefore precluded the reliable examination of additive or interaction effects between these variables. This presents an intriguing avenue of enquiry beyond this preliminary investigation: to what extent could these factors inter-relate? Although theorised as independent constructs, there are hypothetical scenarios in which these factors relate to one another, several of which have already been presented in this paper. For instance, we may hypothesise an interrelationship between trial stage and developer involvement when also considering the temporal aspect of programme development. However, additional theoretical work is required to further map the nature of these constructs and their relationships, as some combinations of factors are far less likely to originate in the field and may also prove counter-intuitive to established frameworks for programme development (e.g. Campbell et al., 2000). For instance, it is very unlikely to find a developer-led effectiveness trial being delivered outside of its country of origin. Such considerations preclude ‘simple’ analyses such as cross-tabulated frequencies, as these can be easily misinterpreted without further substantive theorisation and discussion.

It is argued that such theorisation should accompany further empirical work. For instance, future studies may consider the application of regression frameworks in which factors such as those already identified are used to predict outcomes, which may help address the overlap (or ‘shared variance’) between the constructs. This paper serves (in part) as a call for this kind of further work in the area, both theoretical and empirical. However, such approaches require further maturation and development within the field. As noted above, there are still relatively small numbers of programmes progressing beyond the efficacy stage, and similarly very little cultural transportability. As more trials move into ‘later stages’ of development, more data and understanding will hopefully be forthcoming, furthering the basis of the preliminary exploration presented here.
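One hedged sketch of such a regression framework is an inverse-variance weighted least-squares meta-regression, in which coded moderators (e.g. trial stage, developer involvement, transferability) jointly predict study effect sizes, rather than being tested one at a time. All inputs here are hypothetical:

```python
import numpy as np

def wls_meta_regression(effects, variances, moderators):
    """Weighted least-squares meta-regression: regress study effect sizes on
    moderator codes, weighting each study by its inverse variance."""
    y = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    # Design matrix with an intercept column prepended to the moderator codes
    X = np.column_stack([np.ones(len(y)), np.asarray(moderators, dtype=float)])
    W = np.diag(w)
    # Solve the weighted normal equations (X'WX) beta = X'Wy
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta  # [intercept, slope per moderator]
```

The fitted slopes estimate each moderator's contribution while partialling out the others, which is precisely the 'shared variance' problem the simple subgroup comparisons above cannot address.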

The methodological limitations of the current study mirror wider difficulties in the field; specifically, the failure of some of the more commonly adopted methods to capture the complexities and nuances of the unique ecology of the implementation context, most notably due to an emphasis on randomised controlled trial (RCT) methodology (and its variants), specifically incorporating ‘intent to treat’ (ITT) assessment (Gupta, 2011). It has been argued that RCT methodology is a limited approach, as trials can fail to explore how intervention components and their mechanisms of change interact (Bonell, Fletcher, Morton, Lorenc, & Moore, 2012); for instance, how an intervention interacts with contextual adaptation, which has been argued to be both inevitable and essential for a successful outcome (Ringwalt et al., 2003). This limitation is translated into the current study design: consider, for instance, the relatively blunt (yet effectively dichotomised) measure of cultural transferability used here. Only international borders are considered, which does not take into account cultural variability within countries. In relation to the practice of cultural tailoring (Resnicow et al., 2000), there is little methodological insight to help represent the diverse ecologies of classroom culture.

In relation to the ‘ideal’ amount of adaptation for optimal outcomes, positive outcomes on certain measures are very much dependent on the context and composition of individual classes. For instance, although there is certainly evidence of universal gains for some outcomes (e.g. social-emotional competence), there are differential gains for others (e.g. pupils need to demonstrate poor behaviour before being able to improve on measures of conduct problems). Although there have been calls for RCTs comparing adapted versions of a programme to its ‘generic’ counterpart (Kumpfer, Smith, & Bellamy, 2002), ITT and meta-analytic strategies (including the current study) are not optimally equipped to detect these subtleties (Oxman & Guyatt, 1992). An alternative ‘middle ground’ is suggested by Hawe, Shiell and Riley (2004) regarding the need for flexibility in complex interventions. They suggest that for complex interventions (defined loosely as interventions in which there is difficulty in precisely defining the ‘active ingredients’ and how they inter-relate), it is the function and process that should be standardised and evaluated, not the components themselves (which are free to be adapted).



Future Directions and recommendations

The findings of the current study are evidence that the stakes continue to be high for the adoption of universal social and emotional learning (SEL) programmes. Although the field has firmly established that SEL can potentially be effective in addressing serious societal concerns around social-emotional wellbeing and behaviour, there is comparatively limited understanding of how positive effects can be consistently maintained. As there is little caution in the speed of acceptance and roll-out of SEL programmes internationally, despite these gaps in knowledge, the findings of the current study have global significance and present an opportunity to shape future directions and address several key lines of enquiry.

As SEL is a global phenomenon, additional work in understanding the significance of cultural validity becomes increasingly important, given that results from the current study suggest that SEL programmes identified as successful can be rendered ineffective when transported to other countries. Aside from revising expectations of the likely effects that can be generated by an exported programme, there is arguably a wider methodological issue to be addressed when designing studies to assess transported programmes. For instance, additional work is needed to examine the potential importance of prior infrastructure across sites (such as those identified by Elliott and Mihalic (2004)), and the types and number of adaptations made within a trial (Berman & McLaughlin, 1976; Castro, Barrera, & Martinez, 2004; Hansen et al., 2013).

Addressing the recommendations above will require new methodological thinking in order to address the complexities of SEL interventions. The current study highlights both the strengths and weaknesses of meta-analytic approaches; a parallel, but no less important, recommendation is therefore for additional consideration of the complexity and heterogeneity of interventions using a full range of methodologies. Further meta-analytic approaches (e.g. grouping studies into ‘clinically meaningful units’ (Melendez-Torres, Bonell, & Thomas, 2015) of function and process (Hawe, Shiell, & Riley, 2004), such as mode of delivery), alongside more ‘bottom-up’ approaches examining the unique ecologies of individual classroom practices in more detail, are advised.

There is an additional need to better understand the internal logic models of individual programmes, i.e. the ‘active ingredients’ and how they inter-relate. More clearly specifying the ‘how’ and ‘why’ of programmes would allow researchers to identify how various outcomes or ‘ingredients’ of SEL programmes are linked (Dirks, Treat, & Weersing, 2007). This is a daunting task, partially because difficulty in precisely defining ‘active ingredients’ is what makes an intervention complex. However, the methods employed should be guided by the substantive questions of the field. As the literature now addresses not ‘does SEL work?’ (results have answered in the affirmative (Durlak et al., 2010; Sklad et al., 2012)) but ‘how does SEL work (or why does it fail)?’, new methodological thinking is required to answer these questions. This meta-analysis represents the first of many large steps required to address them.

