Research and Evaluation blog's

Thursday, 09 June 2011

Evaluation Analysis Data

The time to think about how data will be analyzed and reported is
early in the evaluation planning. Conceptualizing what the audience for an
evaluation will desire in terms of analytical sophistication and precision can
help evaluators select among the many techniques available. Mapping out
what the end product should look like provides some of the structure needed
to guide planning of analysis procedures.
Constraints on evaluators’ choices among analytical options go beyond
what their clients will expect in reports, however. Time and resources will
affect the types of data collected, and thus the sorts of analytical techniques
that can be used. In many cases, evaluators must rely on data that others have
collected, or on the formats that others prefer for further data collection
efforts. Evaluators’ skills in effectively applying and reporting analytical techniques
may also limit the possibilities for analysis of evaluation data.
The chapters in Part Three present techniques for analyzing data collected
in evaluation efforts. The four chapters cover (1) analysis and interpretation
of data collected through qualitative data collection techniques
such as interviews and site visits; (2) selection, application, and reporting of
inferential statistics; (3) the application and interpretation of regression
analysis; and (4) the use of cost-effectiveness and cost-benefit techniques in
program evaluation.
The authors of these chapters describe analytical techniques in nontechnical
terms to clarify the relative advantages and disadvantages of the various
options. In each chapter, the authors describe the purpose of the analytical strategies and the types of evaluation questions that are most
amenable to application of each; the assumptions or requirements of the
data and the data collection methods that must be met to use each analytical
technique effectively; the sorts of information that should be provided in
reports about application of each technique; and the possible limitations that
may accompany application of the techniques.
Sharon Caudle, in Chapter Fifteen, discusses strategies for analyzing
data collected through observation, examination of documents, and interviews.
Data analysis activities discussed include content analysis, abstracting
and transforming raw data during the data collection process, developing
data displays organizing the data, and drawing and verifying conclusions during
and after data collection. She explains how to accomplish each of these
qualitative data analysis activities and lists references that provide further
guidance. Caudle suggests several approaches that evaluators can use to
strengthen the credibility, generalizability, and objectivity of qualitative evaluation
efforts—for example, triangulation, peer debriefing, informant feedback,
and the use of auditors to assess the evaluation process and product.
Kathryn Newcomer and Philip Wirtz, in Chapter Sixteen, describe a
variety of statistical techniques available to evaluators. They identify the most
important issues that evaluators should address when applying statistical
techniques to strengthen the conclusions drawn from the findings. They
describe basic distinctions among statistical techniques, outline procedures
for drawing samples and applying statistical tools, provide criteria for evaluators
to use in choosing among the data analysis techniques available, and
offer guidance on reporting statistics appropriately and clearly. Illustrations
of the application of the chi-square test and the t test are provided, along with
other guidance especially pertinent to the analysis of variables measured at
the nominal and ordinal levels of measurement.
Dale Berger, in Chapter Seventeen, demonstrates how regression
analyses can be applied to evaluate the results of a program. Berger introduces
the basic regression model and defines all of the basic concepts in clear
terms. The use of regression to analyze program data is illustrated for two
treatment groups with and without pretests. An extension of regression,
mediation analysis, which is appropriate when there is an intervening variable
that mediates the relationship between the program intervention and
outcome, is also explored. The chapter interprets regression analysis as it is
provided in SPSS computer output and then provides guidance on audience-friendly
presentation of regression analysis.
James Kee, in Chapter Eighteen, offers guidance on the application of
cost-effectiveness and benefit-cost techniques in program evaluation. He outlines
opportunities to apply the various options, along with the issues evaluators
must address should they select one of these techniques. Kee provides guidance to evaluators as he describes cost-effectiveness analysis and its capabilities,
differentiates among the various types of benefits and costs that
should be arrayed in any benefit-cost analysis, offers suggestions on the valuation
of benefits, identifies common problems surrounding the measurement
of costs, and provides guidance on presenting cost-effectiveness and
benefit-cost information to decision makers.
The chapter authors carefully delineate the issues evaluators should
address as they select analytical techniques and report the results of analyses.
They discuss factors affecting such decisions and the potential threats to the
validity of results provided in evaluation reports. Replicability with the assurance
of consistent results is the hallmark of valid and appropriate data analysis.
Evaluators need to acknowledge analytical choices and unanticipated
obstacles to help ensure that results are interpreted appropriately.

Qualitative Data Analysis
(Sharon L. Caudle)

Qualitative analysis means making sense of relevant data gathered
from sources such as interviews, on-site observations, and documents and
then responsibly presenting what the data reveal. Often the journey from raw
data to what the data reveal is challenging, when, as Patton (2002, p. 431)
notes, “Analysis finally makes clear what would have been most important to
study, if only we had known beforehand.”
This chapter’s focus is on analytical strategies and practices that are
easy to use, low cost, and flexible enough to apply to a wide range of routine
program evaluation qualitative analysis tasks. Qualitative analysis, of course,
occurs in large-scale, formal program evaluations. But it also comes into play
in ad hoc, quick-turnaround analyses. For example, analysts are often called
on to analyze documents quickly or present findings on a policy or program
issue. For these shorter-term, high-impact qualitative analyses, following a
qualitative analytical strategy and supporting practices is as important as
for a large-scale, formal program evaluation.
Preanalysis Elements
The separation between research design, data sources and data collection,
analysis, and presentation of findings is never clear-cut, or wanted, in qualitative
research. The power of qualitative research comes in large part from
the ability to move between, explore, and enhance the design, data analysis,
and findings as the study proceeds. Analysis “works on” data. The quality of analysis in a program evaluation is particularly affected by certain preanalysis
elements that occur before the main analytical effort: research design and
data targeting, data collection and documentation, a data organization system,
and analyst team skills and knowledge.
Research Design and Data Targeting
One important element is a well-crafted research design and appropriate
data targeting to respond to the evaluation’s research questions and the
research design’s plan for analysis. Miles and Huberman (1994) say that study
design questions can be seen as analytical—an anticipatory data reduction—
as they constrain later analysis by ruling out certain variables and relationships
and attending to others. The analysis decisions, says Maxwell (1996),
should influence, and be influenced by, the rest of the research design. For
example, as Mason (2002) describes, when the analyst team determines the
research’s sampling strategy, the team will need to think ahead to what analysis
the team will likely conduct, ensuring a direct link among the sampling
strategy, the data analysis, and the presentation of the findings. The program
evaluation’s research design generally identifies the purpose of the evaluation,
the major research questions, and the strategy for data analysis.
The analysis plan identifies data sources and analysis methods. The
research questions must be translated to detailed questions for data collection
instruments. In the beginning, the detailed questions may be relatively
unstructured and then become more focused and structured toward the end
of the study as the evaluation defines key areas and findings are tested. In
addition, program events may change, forcing revisions to the research questions
and data collection instruments. The quality of analysis will be hampered
if the data targeting by the data collection instruments is not well done,
contains serious gaps, or drifts from the research questions over time.
One strategy to ensure good data targeting is to keep the evaluation
purpose and research questions close at hand as data collection instruments
are designed. I normally take each research question and formulate detailed
subquestions directly linked to the research question. These subquestions
then become converted to areas of coverage or questions for interviews and
document selection and analysis.
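The linkage described above, from research question to subquestion to instrument item, can be kept as a small traceability table and checked for gaps. The following is only a minimal sketch: the first research question paraphrases the federal-state study mentioned later in this chapter, while the second question, all subquestions, the item labels, and the helper names are hypothetical and chosen for illustration.

```python
# Hypothetical research questions and instrument items, for illustration only.
research_questions = {
    "RQ1": "How and why do state performance goals differ from federal program goals?",
    "RQ2": "What practices sustain effective federal-state relationships?",
}

# Each detailed subquestion points back to exactly one research question.
subquestions = [
    {"id": "RQ1.a", "rq": "RQ1", "text": "Which federal goals are state planners aware of?"},
    {"id": "RQ1.b", "rq": "RQ1", "text": "How are state goals set during strategic planning?"},
    {"id": "RQ2.a", "rq": "RQ2", "text": "How often do federal and state staff discuss goals?"},
]

# Interview and document-review items, each tagged with the subquestion it serves.
instrument_items = [
    {"item": "Interview Q3", "subquestion": "RQ1.a"},
    {"item": "Interview Q4", "subquestion": "RQ1.b"},
    {"item": "Document review: state strategic plan", "subquestion": "RQ1.b"},
]


def find_targeting_gaps(subquestions, instrument_items):
    """Return ids of subquestions that no instrument item covers."""
    covered = {item["subquestion"] for item in instrument_items}
    return [sq["id"] for sq in subquestions if sq["id"] not in covered]


def find_drift(subquestions, instrument_items):
    """Return instrument items that do not trace back to any known subquestion."""
    known = {sq["id"] for sq in subquestions}
    return [item["item"] for item in instrument_items if item["subquestion"] not in known]


gaps = find_targeting_gaps(subquestions, instrument_items)
drift = find_drift(subquestions, instrument_items)
print("Uncovered subquestions:", gaps)
print("Items with no subquestion behind them:", drift)
```

Rerunning a check like this whenever the instruments are revised is one inexpensive way to notice serious gaps or drift from the research questions before the data are collected.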
Data Collection and Documentation
A second element is adequate collection and documentation of relevant data.
Program evaluation data most often come from field interviews; documents
such as legislation and plans; other media such as videos, presentations, and
pictures; and direct or unrestricted observation, where the analyst takes note of settings, interactions, and events. Relevant data from these sources have
to be collected and documented to the maximum possible extent to be of
any analytical use. Too often, analyst teams take short-cuts that short-circuit
the analytical process.
For example, analysts may stop writing or recording as an interview
proceeds, making immediate choices about what is relevant or not. The
interview notes or recordings should be complete even if material does not
appear directly relevant. Once the interview is complete, write-ups should be
done immediately if they are from interview written notes and should be as
complete as possible, even if some material initially appears to be irrelevant.
The write-ups or transcripts should not be categorized or interpreted as they
are done. Analysts who selectively decide what to document or process data
into categories while writing up interview notes can make subsequent analysis
difficult. Lost in this approach are the words of the interviewee and the
detail of what was said. Later, if analysis indicates that one area is emerging
as an important finding, there may be no way to capture similar evidence in
earlier interviews if what appears to be irrelevant information has not been
collected or formally documented or if categories have already reduced the
data prematurely.
Data Organization System
The data organization system should organize program evaluation research
design and decision-making information, collected field data, analytical commentaries,
and observations and drafts from the beginning of the evaluation.
Qualitative research normally translates to what seems to be mountains of
interview write-ups, hard copy documents, summaries of preliminary findings,
presentations, and the like. For the beginning analyst, data organization frequently
is the task that will be done later. However, without a data management
system, the analyst and his or her team will constantly have data
identification, access, and retrieval problems.
Data organization can be fairly simple. One approach is to organize
the data by purpose and source in binders, with tabs for individual documents
or documents that can be bundled together. For example, one binder
can hold research design and analytical documentation, such as decision
rules, definitions, sampling strategies, and contact information. A second
binder contains interview write-ups, and a third, documents with their summaries.
Another binder contains analytical work, such as data displays, coding
and categories, and data summaries. A final binder contains documents
presenting findings, including early vignettes and summaries, presentations,
and draft reports. Feedback on the documents is normally filed with the
related document under the same binder tab. The documents are filed in chronological order with notations to cross-reference the source with other
data sources.
Material can also be electronically stored in databases, which are particularly
useful to share with the entire analyst team over a network. Whether
the analyst team uses a manual or electronic system, or a combination, the
team will need to keep records protected and secure if there are confidentiality
issues.
Analyst Team Skills and Knowledge
A final element is the analyst team itself. According to Patton (2002), no abstract
analytical processes can substitute for the skill, knowledge, experience,
creativity, diligence, and work of the qualitative analyst. Each analyst, whether
working as a single researcher or one of many, brings different skills and knowledge
to the analytical task. Strauss and Corbin (1998) emphasize that analytical
insights happen to “prepared minds”—using what analysts bring to the data in
a systematic and aware way to derive meaning without forcing analyst explanations
on data. Being prepared means having skills in forming questions,
selecting and sampling the components to be studied, identifying data sources,
collecting data, selecting data segments for analysis, and seeing patterns and
themes in the data. A skilled analyst will know how to ask an interview question,
listen to the answer, interpret its meaning and relevance, frame another
question to respond to the answer, and keep track of what needs to be
explored during the remainder of the interview, all at virtually the same time.
In addition, each analyst brings unique knowledge of theories, concepts,
models, and approaches to the evaluation. At the operational level,
some analysts will have conducted prior research at program sites, bringing
an understanding of context and relationships that will move the analysis
along. For example, the analyst will have valuable knowledge of program history,
key actors, standard operating procedures, and decision-making structures.
Moreover, analysts bring their own theoretical concepts, beliefs, and
assumptions to the evaluation—a strength if properly focused but a weakness
if they bring bias to the analysis. Overall, these skills and knowledge contribute
immensely to collecting good data ready for analysis.

Analytical Subprocesses and Practices
Qualitative data analysis is a complex set of intertwined processes and practices.
Data analysis has been described as the interplay between raw data, the
procedures used to interpret and organize the data, and the emerging findings
(for a fuller description, see Huberman and Miles, 1998; Patton, 2002; Strauss and Corbin, 1998; Yin, 1989; Maxwell, 1996). Data analysis consists
of the two major subprocesses: (1) data reduction and pattern identification
and (2) producing objective analytic conclusions and communicating those
conclusions.
The first subprocess of data reduction and pattern identification examines,
categorizes, tabulates, compares, contrasts, or otherwise recombines
and reduces the data in sifting trivia from significance. Data summaries, coding,
finding themes, clustering, and writing stories help identify patterns.
Tools such as data displays organize and compress the data set. The second
subprocess produces objective and compelling analytic conclusions that
address a study’s initial propositions, rule out alternative explanations, and
then communicate the essence of what the data reveal. The analyst is looking
for relevant and significant findings in the identified patterns, including
interpreting those findings. Techniques such as triangulation, looking for
negative cases, and checking results with respondents aid in addressing objectivity
and validity concerns.
Information technology can greatly facilitate qualitative analysis, generally
known as computer-aided qualitative data analysis (CAQDAS). (A good
discussion of using computers in qualitative research is provided by Patton,
2002, and Richards and Richards, 1998, including cautions regarding their
use.) The most commonly used CAQDAS packages are QSR Nvivo, QSR versions
of the earlier product known as NUD*IST, Ethnograph, ATLAS, and
Hypersoft. The software packages can aid data storage, coding, retrieval, and
comparison, particularly with large data sets where manual manipulation is
difficult and time-consuming. For example, QSR N6 operates on two data
sets: a document system that holds the documents and research notes and a
“node” system that holds the topics and categories for analysis. The two systems
are related by coding. N6 has text search and node search tools that
look for words or phrases and code those passages and compare coding and
relationships in various ways (QSR International, 2002).
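The idea of a document system and a node system related by coding can be illustrated with a toy model. This sketch is not the interface of N6 or of any CAQDAS package; the document ids, passages, node names, and function names are all hypothetical, and the text search is a plain case-insensitive substring match.

```python
from collections import defaultdict

# "Document system": hypothetical document ids mapped to their passages.
documents = {
    "interview_01": [
        "The state had never taken federal objectives into account in planning.",
        "The strategic planning process has been driven by its directors.",
    ],
    "interview_02": [
        "Staff said strategic planning happens once a year.",
        "No dialogue with the federal agency took place.",
    ],
}

# "Node system": category name -> set of (document id, passage index) references.
nodes = defaultdict(set)


def code_by_text_search(phrase, node_name):
    """Code every passage containing the phrase (case-insensitive) at the given node."""
    needle = phrase.lower()
    for doc_id, passages in documents.items():
        for index, passage in enumerate(passages):
            if needle in passage.lower():
                nodes[node_name].add((doc_id, index))


def passages_at(node_name):
    """Retrieve the text of every passage coded at a node, in a stable order."""
    return [documents[doc_id][index] for doc_id, index in sorted(nodes[node_name])]


code_by_text_search("strategic planning", "STPLAN")
code_by_text_search("federal", "FEDERAL")

for passage in passages_at("STPLAN"):
    print(passage)

# A simple node comparison: passages coded at both nodes.
overlap = nodes["STPLAN"] & nodes["FEDERAL"]
print("Coded at both nodes:", sorted(overlap))
```

Storing references rather than copies of the text keeps the two systems separate, so a passage can sit at several nodes and a node can be renamed or merged without touching the documents.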
For smaller evaluations, or where the software is not available or the
analyst team does not have sufficient experience in comfortably using the
packages, word processing and presentational software such as Microsoft
Word and PowerPoint are good alternatives. For example, Word can be used
to set up formal interview write-up and document summary formats, tables
for reducing data, and more structured matrices as the data are reduced.
PowerPoint can be used for developing figures, such as network data displays.
Further reduced data displays can also be imported into Word documents
for the final presentation of findings.
The following sections discuss key analytical practices and examples:
coding basics, memos and remarks, and data displays.

Coding Basics
Wolcott (1990) observes that the critical task in qualitative research is not
accumulating all the data possible, but getting rid of most data that have
been accumulated. Content analysis facilitates sorting through the data by
identifying, coding or categorizing, clustering, and labeling to identify primary
themes or patterns. (For a full description and additional examples,
see Maxwell, 1996; Miles and Huberman, 1994; Strauss and Corbin, 1998;
Mason, 2002; Wolcott, 1990.)
Codes or categories are simply labels that assign meaning or themes
to the evaluation data. An analyst defines a code for a segment of data, labels
the segment, and then labels similar segments of data with the same code.
Coding therefore breaks data down into discrete elements, such as events,
relationships, or processes. Events could be specific activities, relationships
could be connections between subjects in a study, and processes could be
related steps or changes. The use of discrete element coding allows analytical
comparison for similarities and differences.
There are different options for units of analysis for coding purposes—
for example:
• Line-by-line analysis, closely examining phrase by phrase and sometimes
word by word
• Examining a whole sentence or paragraph
• Examining an entire document and determining what makes the document
the same as, or different from, other coded documents
The unit of analysis should be the same in an individual evaluation.
Although not every segment of data requires coding, the analyst should be
highly critical in identifying relevant information.
Analysts generally develop codes or categories in two ways, often in
combination. One way is creating precodes—codes or categories the analyst
team brings to the evaluation before collecting data, drawing from existing
program theory, the evaluation’s theories and hypotheses, research questions,
and program variables. The team might develop and test an initial set of
codes or phrases that sorts all available data. For example, as part of a study of
homeland security issues, I drew on current literature and reports to develop
key concepts that were used in analyzing the data.
A second way is deriving the codes or categories from the program
being evaluated. Codes are derived inductively during the data analysis, pointing
to acts, activities, meanings, participation, relationships, settings, and
methods. These codes reflect recurring concept phrases or practices overtly
stated by program people being studied or those named by the analyst if those studied do not concretely name concepts. I participated in a study to provide
information on individual and cross-cutting federal agency approaches to create
and sustain effective state relationships in managing for results. One question
was how and why state performance goals, measures, or strategies differed
from those of the federal program. Exhibit 15.1 shows an excerpt of an interview
write-up with coding derived from the data.
Normally, codes or categories are considered either descriptive or
interpretive (interpretive coding might also be called inferential). Descriptive
codes name things, such as processes, actors, or events. For example, a
descriptive code might be STPLAN, for “strategic planning.” Interpretive or
inferential codes provide additional meaning or identify emerging patterns,
themes, or explanations. For example, an interpretive code might be GOMO,
for organizations that are “going through the motions” in conducting strategic
planning. In larger evaluations, the coding is generally one word or
acronym, such as the STPLAN or GOMO illustrations. In smaller evaluations,
a phrase might be used as the code, such as “formal strategic planning principles.”
This is a practice I generally follow when working alone on a smaller
project since using phrases often will not require formal coding definitions,
and I normally use the language provided in interviews or documents.
Exhibit 15.1. Coding Concepts Developed from the Data

Interview Excerpt | Possible Coding Concepts
Mr. Smith said that as he looked at the Federal Highway Administration [FHWA] goals on the Web site, he was chagrined to realize the state had never taken DOT [federal Department of Transportation] objectives into account in the SWOT [strengths, weaknesses, opportunities, and threats] analysis that the state goes through in its planning process. | Awareness of federal goals
His thought was that maybe the federal and state goals should be congruent. | Federal-state planning process congruence
The strategic planning process in the state department has been driven in the past by its directors, and the internal process is not as strong as it should be. | Strength of state planning process
If the state had been more aware of federal goals and targets, it might have changed its own direction and strategies. That just didn’t happen in terms of alignment. Hopefully, the state will get to the same point as FHWA, or at least in the same ballpark in many cases. | State direction to match federal goals
But the state would not have the same position as FHWA or other states in many cases, because specific goals would depend on the direction of the state. | State direction driving state goals
The state might have the same general goals but different strategies. However, the FHWA-state dialogue never happened. | Federal-state goals match; strategies differ

The same segment of data might be labeled with both descriptive and
interpretive codes or phrases, with descriptive codes generally being applied
first, then the interpretive. Patton (2002) makes the point that there should
be careful separation of description from interpretation. In his view, interpretation
involves explaining findings, answering “why” questions, attaching
significance to particular results, and putting patterns into an analytical
framework. Interpreting the data should come after major descriptive questions
are answered.
In managing codes, the analyst should make sure each code has a name that closely
matches the concept or practice it describes, is clearly defined, and has
instructions for its application. This management practice helps individual analysts
apply codes consistently. The evaluation’s research
design should include testing the consistent application of codes. In addition,
data management should include the thoughtful testing of the codes’ ongoing
usefulness. The coding scheme should be adjusted as the analysis progresses to
include new codes, remove nonrelevant codes, and refine existing codes, such
as further differentiating codes that capture too broad a category of data. The
analyst then revisits already coded data to test previously assigned codes or apply
newer codes. Later, coded data can be clustered into more general categories.
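One way to test whether two analysts apply codes consistently is to have both code the same segments and compute an agreement statistic. The chapter does not name a statistic; the sketch below uses Cohen's kappa, a common choice that corrects raw agreement for agreement expected by chance, and assumes each analyst assigns exactly one code per segment. The code assignments are invented for illustration.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two analysts who each assigned one code per segment."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both analysts must code the same non-empty list of segments.")
    n = len(codes_a)
    # Proportion of segments on which the two analysts chose the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each analyst's own code frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both analysts used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned to the same eight segments by two analysts.
analyst_1 = ["STPLAN", "STPLAN", "GOMO", "STPLAN", "GOMO", "GOMO", "STPLAN", "GOMO"]
analyst_2 = ["STPLAN", "STPLAN", "GOMO", "GOMO", "GOMO", "GOMO", "STPLAN", "STPLAN"]

kappa = cohen_kappa(analyst_1, analyst_2)
disagreements = [i for i, (a, b) in enumerate(zip(analyst_1, analyst_2)) if a != b]
print(f"kappa = {kappa:.2f}")
print("Segments to discuss:", disagreements)
```

The list of disagreeing segments is often more useful than the statistic itself, because discussing those segments is what leads to sharper code definitions and application instructions.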
The first cut of qualitative data analysis starts with the first interview or
observation or document. Depending on what data are available, the analyst
reads or listens to interviews and examines documents. Any memos or notes
written during the course of collecting the information are also part of the
analytical mix. At some point, the evaluation will turn to what Patton (2002)
calls confirmatory data collection—deepening insights into and confirming
or disconfirming patterns.
As the analyst team goes through the information, each analyst looks for
recurring elements and records data concepts, topics, insights, and potential
coding categories and relationships. These recordings might be done directly
on the data documents, such as in the margins, with sticky notes that can be easily
removed later, or in a binder of observations. Each analyst works back and
forth between the emerging coding categories and the data to define a coding
system that appears to cover the existing data. After the first review, the team
compares notes and builds the initial coding system, deciding on data segments
that appear similar or related in a consistent way, can be differentiated from
other categories, and appear to account for all, or virtually all, the data.
Memos and Remarks
Memos and other recording of analytical remarks during data analysis can
help capture the analyst team’s thinking about the data, such as questions to
ask, surprises, links to other data segments, potential coding or categorizing options, or tentative interpretations. They are intended to stimulate the
analysis process as data collection proceeds. The memos or remarks, says
Maxwell (1996), can range from a brief marginal comment on an interview
transcript or a theoretical idea recorded in a field journal to a full-fledged
analytical essay. Memos can also serve as an initial draft of information that
can be used in presenting findings.
My general practice is to place observations within the text as I transcribe
interviews or summarize documents, using brackets, as shown in Exhibit
15.2. Later thoughts can be shown as marginal notes. I generally include
analytical remarks in messages to the other members of the analytic team.
Data Displays
Coding and memos and remarks are necessary, but not sufficient, for moving
analysis forward. Data displays can aid further data sorting and organization.
(For more information, see Maxwell, 1996; Huberman and Miles,
1998; Mason, 2002; Miles and Huberman, 1994; and Patton, 2002. Miles and
Huberman, 1994, have extensive examples.) Data displays present information
in visual formats such as matrices (essentially the crossing of two lists
using rows and columns) and networks (which display nodes or points with
links between them showing relationships). They are filled in as the analyst
works back and forth from the table to the data, adjusting rows and columns
and other labels.
Data displays are analytically powerful. They present a full data set’s
concepts and relationships in one location, opening the information to critical
thinking, confirming what is displayed, or considering new relationships
and explanations. Early rough data displays lead to more sophisticated displays
and related elaboration of emerging findings. The analyst team can
look for alternative ways to organize the data and illuminate new patterns or variations
in existing patterns. Matrices are especially helpful in seeing distributions in the data, that is, how often a particular code or category occurs. The choice of what data
to display is an analytical one, and the analyst team should closely anchor the
displays to the evaluation’s research questions and reflect the concepts and
relationships clarified by coding. As Miles and Huberman (1994) note, counting
goes on in the background when judgments of data qualities are made.
When the analyst team identifies a theme or pattern, the team is isolating a
data segment that is occurring a number of times and in a consistent manner.
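A matrix display of this kind, crossing codes with sites and counting occurrences, can be sketched in a few lines. The state labels and the coded segments below are hypothetical; only the two code phrases are borrowed from the coding concepts illustrated earlier in the chapter.

```python
from collections import Counter

# Hypothetical coded segments, one (state, code) pair per coded passage.
coded_segments = [
    ("State A", "Awareness of federal goals"),
    ("State A", "Strength of state planning process"),
    ("State B", "Awareness of federal goals"),
    ("State B", "Awareness of federal goals"),
    ("State C", "Strength of state planning process"),
]

counts = Counter(coded_segments)
states = sorted({state for state, _ in coded_segments})
codes = sorted({code for _, code in coded_segments})


def build_matrix(counts, codes, states):
    """Cross the list of codes (rows) with the list of states (columns)."""
    return {code: [counts[(state, code)] for state in states] for code in codes}


matrix = build_matrix(counts, codes, states)

# Print the matrix as plain text, one row per code.
width = max(len(code) for code in codes)
print(" " * width, *(f"{state:>8}" for state in states))
for code in codes:
    print(f"{code:<{width}}", *(f"{n:>8}" for n in matrix[code]))
```

A zero cell is as informative as a high count: it may mark an outlier site, a gap in data collection, or a code that needs refinement.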
For example, in the study of federal-state results management, I took
each interview question and cut and pasted from the interviews into tables
by program by state in the first step of data reduction. Exhibit 15.3 provides
the excerpt from the overall data display for just one question.
The next step of data reduction uses coding segments derived from the
data set of all programs for this one question, but applied to each program
area, and each state that matches the code pattern, shown in Exhibit 15.4.
Finally, the state observations were merged into one data set, allowing
the development of distributions and comments about outliers.
In qualitative research, Mason (2002) says the analyst must ensure that the
data are appropriate for the research questions and that recording and analysis
of the data has not been careless and slipshod.
Threats to Objectivity and Validity
Although there are many threats to objectivity and validity of the evaluation
research and its findings, several are particularly important (for a fuller
description, see Mason, 2002; Maxwell, 1996; Huberman and Miles, 1998;
and Miles and Huberman, 1994).
Inaccurate or Incomplete Data. With inaccurate or incomplete data, what
the analyst team collects covers only part of the story or is not accurate, for several
reasons. The team might miss data, selecting what are considered relevant
data, ignoring and thus not documenting other data. The team might
fall in love with, and thus overvalue, particularly good data sources. Or the
team might get rushed or fatigued and not go through data such as voluminous
documents or be able to conduct interviews objectively as the evaluation
proceeds. For example, over the course of a few months, I did over two
hundred phone interviews, each lasting approximately one hour. As I did
more and more interviews, the descriptions in large part seemed to represent
the same thing—“I have heard that before”—and it was a constant battle
to take full notes and transcribe them accurately.
Misinterpreting the Data’s Meaning or Meanings. The individual analyst—
or entire team—will interpret and present data according to the analyst’s disciplinary
background, training, and experience instead of representing the
point of view of program participants and their context. In addition to presenting
an analyst point of view, the analyst team might have a different understanding of the meaning of terms or concepts emerging in the research
and not test them within the evaluation’s context. Or the analyst might
“invent” data or misrepresent the perspective presented by the data source,
such as an interviewee. An analyst team might take documents as factual,
legitimate data, instead of viewing them as being constructed for particular
purposes and agendas. Often in my work, other members of the team might
discount interview descriptions because they run counter to what is said in
formal documents available from the Internet. I remind the team that the
formal document might be true, but alternative descriptions certainly are
worth the effort in checking out what is factual and what is constructed.
Discounting Data. People tend to overweight facts they believe in or
depend on, ignore or forget data not consistent with the direction of their
reasoning, and see confirming instances far more easily than disconfirming
instances. Furthermore, the team might come up with questionable causes
and suppress other evidence. For example, a pithy quote may not reflect a
clear pattern in the data and may be relevant to only a minor issue.
Failure to Sufficiently Document the Chain of Evidence. This refers to the
chain of evidence concerning the evaluation’s scope and methodology, key
analytical decisions, and cautions about what the analysis allows a team to
present, or not present, as findings. Instead, the analyst team may provide
only cursory information in the formal presentation of findings, or place a
jumble of information that is difficult to understand or recreate in the data
management system.

Countering the Threats
The team can counter these threats by several practices:
• Quickly and completely transcribe field notes or interviews and ask
for and collect all related program documents.
• Assess the available data and determine what data seem to be inconsistent
or missing based on other research or the analyst’s experience.
• Use triangulation to collect information from diverse sources with a
variety of data collection methods.
• Solicit feedback from others on methods, data sources, and preliminary
findings.
• Compare the emerging findings with similar research.
• Be alert to and rigorously follow up on unexpected data or data
relationships.
• Actively look for rival or competing themes or explanations that
might fit the data.
• Use negative case analysis (the active search for and examination of
cases that do not fit the pattern that appears to be emerging in the evaluation).
For example, in doing “best practice” research, it might be helpful to
contrast the context and characteristics of cases known for best practices with
those of cases that are not. These negative cases provide a
rich source of information to discern practices that can be touted and the
context in which they work.
• Use extreme cases to help verify and confirm conclusions and serve
as a way to explore key factors and variables. In best practice research, this
might entail rigorous study of the best organization and its practices. Unexpected
data, negative cases, and extreme cases should be clearly accounted
for in the analysis and the presentation of findings.
• Make sure that quotes come from a wide range of data sources, not
just from sources who state issues particularly well.
• Be rigorous in examining data relationships and connections. This
would involve questioning common understandings, the absence or presence
of other factors that could be affecting the relationships, and whether the relationships
are supported by sufficient data.
• Minutely examine words and phrases that appear to be repeating in
the data, and use the data to point to actual meaning rather than relying on the team’s
interpretation. This helps to guard against misinterpretations.
• Use team debriefings. Each analyst reviews available data, such as interview
write-ups and documents, and identifies themes. In the debriefings,
varying themes are reconciled or highlighted for further data collection and
analysis. This process also ensures that data content is shared across the team
to give individuals knowledge of the full data set and allow for different perspectives
to be brought to the data analysis.
• Use coding practices to increase validity and objectivity. Having the
same analyst recode material and another analyst independently code the same
material can increase coding accuracy. However, this is very time intensive, and
doing interim debriefings and analytical checks might serve just as well.
• Prepare documentation of the analysis and key decisions as if it will
be rigorously audited by an external party.
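
The double-coding check in the list above can be made concrete with a small calculation. What follows is a minimal sketch in Python, not a method this chapter prescribes. It assumes two analysts have each assigned exactly one code to the same interview segments, and it uses Cohen's kappa, a common agreement statistic chosen here purely for illustration. The function name, the code labels, and the sample data are all hypothetical.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders (illustrative helper)."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of segments")
    n = len(codes_a)
    # Share of segments on which the two analysts chose the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each analyst's code frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both analysts used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned to six interview segments by two analysts.
analyst_1 = ["funding", "staffing", "funding", "training", "staffing", "funding"]
analyst_2 = ["funding", "staffing", "training", "training", "staffing", "funding"]

kappa = cohen_kappa(analyst_1, analyst_2)

# Segments where the analysts disagree, to raise at the next team debriefing.
disagreements = [i for i, (a, b) in enumerate(zip(analyst_1, analyst_2)) if a != b]

print(f"kappa = {kappa:.2f}; segments to discuss: {disagreements}")
```

A score like this is only a prompt for discussion. The list of disagreeing segments is usually the more useful output, because it tells the team which material to revisit in a debriefing.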
Interpreting the Data and Presenting the Findings
Analysis begins with the first data source and continues to the very end.
There is little distinction between data collection and the analysis process.
Analysis includes the data displays and related text explanations, but it can
also be analytical memos that are written during the course of the evaluation.
These would document meetings of the analytical team as well as analytical
insights that should become more sophisticated and abstract as the evaluation
proceeds. As findings begin to emerge, the analyst team should think
about interim and final data presentations.
Interim Findings
As data are collected, analytical insights will occur, such as patterns and
themes. At least in the early part of the evaluation, no data gathering and
analytical doors are closed as the emphasis centers on complete and accurate
data and developing findings. As data collection and analysis move into
full swing, the analyst team should continually produce analytical products
that can be presented to various audiences, such as funders and decision
makers, and secure feedback to improve the evaluation. “Early and often”
analysis and presentation continually informs and develops strategies for the
research design, data collection instruments, sampling strategy to fill in data
gaps, consideration of other methods, and testing preliminary findings
against current theory or theory construction. It also serves to tease out alternative
explanations. Relationships between segments of data can be explored,
such as correlation or apparent cause and effect. I normally develop short
paragraphs with examples of emerging findings early on that present a short
analytical story. I also develop interim briefings and presentations to hone
the analysis and the emerging findings.
Miles and Huberman (1994) recommend interim products such as case
summaries and vignettes. Interim case summaries present what the analyst
team knows about the case and what remains to be found out. These summaries
present a review of findings, evaluation of the data quality supporting them, and the agenda for the next data collection. A vignette describes a series
of events believed to be representative or typical of the data, done with a narrative
or story approach. For example, as part of a research team, I examined
how certain leading organizations approached information technology performance
management, studying practices of both public and private sector
organizations. The team used case study examples to illustrate key practices.
For example, one practice was assessing performance maturity and developing
complete performance definitions. An illustration of gaining experience
with fundamental measures and then expanding, shown in the box, survived
as a vignette during the interim analysis and became a text box used in the
final report.

Example Case Vignette
Kodak is one organization that is systematically defining the maturity of
each measure it plans to use in its balanced scorecard. Kodak categorizes
measure maturity as fundamental, growing, or maturing. Established indicators
are considered as fundamental. Growing measures are evolved from
the fundamental, but are not the best they can be. Maturing measures are
defined as best-in-class for whatever they are measuring. For example, for
internal performance, a fundamental measure is to meet all service-level
agreements, a growing measure is information delivery excellence, and a
maturing measure is defect-free products and services. Kodak believes it is
important to build the right fundamental practices first in developing an
initial information technology performance management system.

Presenting the Final Findings
The final step of qualitative analysis is formally presenting the final findings
(for a fuller description, see Yin, 1989, Patton, 2002, and Wolcott, 1990). Early
and frequent analysis should make this final step less daunting because it focuses
the material for composing the presentation of findings. In fact, Wolcott (1990) says
that writing is a form of thinking that will help the analytical process.
The analyst team should have a clear strategy and responsibilities for
key presentation events and processes.
Setting and Adhering to Milestones. This is a key element in presenting
findings. Bounding the time of the evaluation from beginning to final delivery
date is important, but more important is setting key deliverable dates and
keeping to them during the course of the analysis. The analyst team should
specify an evaluation’s research plan that is realistic about access to data sources, time for data collection, analysis efforts, internal review and clearance,
and securing feedback or comments from external sources.
I am always amazed when a research design ignores vacations, holidays,
and other events that will steal time and expertise away from the analysis and
presentation of findings. These can be internal to the analyst team, such as
planned vacation time. However, they often affect data collection. For example,
the two weeks around Christmas and New Year’s are deadly for conducting
interviews or other data collection in government settings. If data
sources are involved in budget formulation, then budget season may delay
data collection.
Determining Audience Needs. The analyst team will need to keep in
mind the possible audiences and what each specific audience will want in
terms of the analysis. For example, Yin (1989) says that colleagues will be
interested in relationships between the findings and other research. Decision
makers will want to know the implications for action. The analyst
team likely will need to devise multiple products for diverse audiences that
will respond to their analytical needs or make conscious decisions about which
audiences will not be satisfied, in whole or in part.
Organizing the Analytical Products. By the end of the analysis process,
which has occurred since the beginning of the evaluation, the analyst team
will have many data displays, summaries of interview findings, document summaries
and implications, examples that can illustrate findings, and feedback
comments. These analytical information sources should be organized,
indexed, and protected so they can be easily retrieved and controlled as writing
starts. This is particularly important if several members of an analyst team
are responsible for writing and need access to the information, which they
can use to illustrate draft findings. Control also comes out of organizing. This
means making sure that a single data source does not overpopulate the findings
and that there is balance across data sources. For example, in doing studies
of homeland security, one data source was particularly robust, and each
analyst wanted to use it as an example. If one data source is the primary
source of examples, then it soon appears that the findings are of the one data
source, not the total data set.
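
The balance check described above can be kept honest with a simple tally. This is a minimal sketch in Python under stated assumptions: each example in the draft is tagged with the data source it came from, and a team-chosen threshold decides when one source is carrying too much of the report. The function name, the source labels, and the 40 percent threshold are hypothetical and are not drawn from the studies mentioned here.

```python
from collections import Counter


def overused_sources(example_sources, max_share=0.4):
    """Return sources whose share of the draft's examples exceeds max_share."""
    counts = Counter(example_sources)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        source: count / total
        for source, count in counts.items()
        if count / total > max_share
    }


# Hypothetical tags: the data source behind each example in a draft report.
draft_examples = ["agency_A", "agency_A", "state_B", "agency_A", "state_C", "agency_A"]

flagged = overused_sources(draft_examples)

for source, share in flagged.items():
    print(f"{source} supplies {share:.0%} of the examples; look for alternatives")
```

The threshold is a judgment call for the team. The tally simply makes visible, before writing is far along, when the findings are starting to read as the findings of one source.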
Outlining and Sequencing the Report Presentation. Wolcott (1990) says
that one key to writing up qualitative research is a detailed written outline
that will clearly identify major and subordinate points and assess if the structure
will accommodate the data and an appropriate presentation sequence.
In fact, he advocates writing a draft before beginning fieldwork as it will
remind the analyst team about format, sequence, space limitations, and
focus. In addition, the draft writing will surface and document analyst team
beliefs, including biases and assumptions. Wolcott says that organizing the
report has no set answers, such as whether to organize the report in the order events occurred.

While writing a draft before collecting data is not a practice I have
used, I have found preparing an outline about midway through data collection
and analysis to be helpful. Decisions about the audience for the bulk of
reporting should have been made, and there are generally considerable analytical
products available. The outline is continually revised, with points
added or deleted, as analysis comes to a conclusion.
Composing and Tightening the Findings. This element includes deciding
what goes in the report and what writing style will be followed. As Patton
(2002) says, reporting findings is the final step in data reduction, but decisions
have to be made about what finds its way into the report. For example,
the analyst team will have to decide how much description to include, such
as what direct quotations, if any, should be woven into the report and what
balance there will be between description, analytical products, and interpretation.
In many reports I have worked on, developing tight descriptive
and inferential information, collapsing data displays, and deciding on the
most important examples is often difficult, but it must be well done if the
report is to be well received.
For example, I developed several themes and suggested practices for
the study regarding federal-state results management. The box contains an
abbreviated excerpt of a paper presenting findings that integrated data from
several questions regarding state involvement in and impact on federal program
performance decisions and recommendations for improvement. It
includes a sense of the number of officials who recommended improvements,
illustrative paraphrased comments from officials, areas of disagreement, and
recommendations.
Data displays are also a tool I favor in final reports. For example, I
conducted research on how financial institution regulatory agencies could
improve their annual performance plans. The information came from federal
and state organizations that were identified as using or planning to use
a variety of useful practices. One data display that was used identified different
types of performance comparisons to set performance targets:
• Redefined performance expectations
• Future performance levels or changes in levels to be achieved at a later date
• Best practice benchmarks from other organizations
• Program implementation milestones
An abbreviated data display from the full data set from the final report
is shown in Table 15.2.

Reporting the Findings: An Example
A second federal agency practice was involving states as full partners in federal
Government Performance and Results Act (GPRA) decision making.
Interviews indicated federal agencies should educate state officials
about GPRA, including its measurement of federal and state performance.
A significant number of state officials were only vaguely aware of GPRA,
and others did not have any knowledge of GPRA. Many state officials did
not see the connection between GPRA and the state delivery of federal programs.
However, there was some disagreement among states about whether Congress intended
GPRA to measure state performance or federal performance in delivering
federal programs. Some saw GPRA as intended to measure federal responsibilities,
not state responsibilities. Others believed federal agencies could
not mandate federal performance goals, measures, targets, or strategies,
but could hold states accountable for individual state goals. State officials
often became concerned if federal agencies moved beyond high-level statements
to stipulate state or local goals, measures, targets, or strategies.
Many state officials involved in the GPRA decision-making process
believed they had little voice in the actual formulation or finalization of
national goals for programs under their responsibility. Some saw their comments
on proposed national goals as largely a pro forma federal activity
allowing federal officials to say they had involved the states in GPRA decision
making. Most state officials believed state input could inform federal
officials about goals important to states, the impact of policy decisions, and
performance implementation options and capabilities. The state officials
generally recommended ongoing federal-state consensual GPRA decision
making. Federal officials should work with state officials or national state
association officials to formulate national GPRA goals, annual performance
targets, measures, or strategies, depending on the latitude in the program
legislative performance requirements. Overall, the interviews indicated that
if states were involved in and had impact on formulating GPRA decisions,
then there was more state acceptance and ownership of the final decisions
and overall GPRA process.

Patton (2002) advises focusing by determining the essence—what is substantively
significant—and providing enough detail and evidence, including
sufficient context, to make the case, yet not including everything that could be
described. However, the initial composition of the report should include multiple
examples and illustrations—more than you know you will need—and
comprehensive analysis or interpretation. The analyst team should not rush
too quickly to delete or synthesize material. Through additional drafting, the
initial composition can be more thoroughly examined, holes filled in, and
“extra” material deleted or folded into a tighter description. As the composition
develops in the final stages, the best examples and illustrations can be
retained.
Identifying What Objectivity and Validity Tests Will Involve the Draft Findings.
This final element includes the use of the tests in composing the final
report. The analyst team has both internal and external choices for securing
feedback on the draft findings. Internally, the team can impose a formal
review of the report and its conclusions. However, the formal review should
be done not by the team itself but by others as cold readers. If the team conducts
its own review, too often the team members zero in on their own material,
testing yet again for substance and writing style but not thinking about
threats to objectivity and validity. Externally, the team asks for a review by program
officials and colleagues that can add to the objectivity and validity of
the findings. Externally, the team can also use an advisory committee to review
and comment on draft materials.
Highlighting a Few Key Points
I am often asked how I see relevant patterns in what others may see as a hopeless
morass of data. No one is born with an inner eye and instinct to conduct
qualitative data analysis. This chapter has highlighted many elements, tools, and issues in qualitative analysis efforts that help in seeing relevant patterns.
These are summarized in Table 15.3 in the form of practices.
These practices define a rigorous and comprehensive approach to
qualitative data analysis. However, they also should be applied within a philosophical
framework that values a rigorous and comprehensive approach.
Qualitative data analysis can be enhanced by the analyst team’s valuing a
philosophical framework that stresses certain key points.
Analytical Knowledge Development. This means being a serious student
of program evaluation methods, including analytical techniques and practices.
While some knowledge comes from formal graduate courses, much
more comes from subscribing to journals and purchasing books—and reading
them. Too often, analysts get locked into techniques they have used in
the past or do not refresh their knowledge. Keeping up with the literature is
a constant reminder of what is possible and what are new approaches.
Obstinate Attention to the Research Questions. This means not drifting off
the path of the evaluation’s purpose, from a large-scale evaluation to providing
comments on a document. I continually revisit the research questions
and keep assessing whether the data collection and analysis are answering the evaluation
purpose, as stated in the research questions. It is frequently very easy
to start pursuing data sources and collection that may be personally interesting
but have little, if anything, to do with the intent of the evaluation. The research questions keep the analyst tethered to the main goal. Changing
events that make the research questions moot or in serious need of revision
should necessitate a change in the evaluation’s purpose.
Theoretical and Context Preparation. This means developing knowledge
about two things. One is relevant theory, especially other research, that can
help the research design and what might be possible avenues for analytical
insights. The other is the evaluation context, most often program details,
organizational settings, and governmental relationships. This knowledge
leads to a better understanding of what is or could be happening in the program.
Even in doing a document review, I try to understand the context and
history of the document and bring that to the analytical exercise.
Insisting on Thick Data Collection and Description. An analyst is only as
good as the data he or she is analyzing. If data sources are minimal or not
forthcoming with description or if the description is not captured in evaluation
work papers, then the analysis is on a slippery slope. The outcome is generally
speculation or superficial analysis that is questionable for decision
making. The bottom line is to work very hard to collect all relevant data.
Listening to and Recycling the Data. I continually try to look at different
perspectives and possible alternatives as I listen to the data and recycle the
data through iterative analysis as more data are collected or I start understanding
what the data seem to be saying. The saying, “It’s not over ’til it’s
over,” is a good principle to follow here.
Experience. Years of doing qualitative analysis build skills in designing
evaluations, preparing for and conducting interviews, organizing the data,
crafting data displays, and developing interim findings. Experience also
brings sensitivity to looming trouble, such as slipping milestones and findings
that seem to overreach when compared to the actual data sources.
Conclusion
The final judge of qualitative analysis is the evaluation product’s reception and
use. I am always proud of products where the findings are meaningful and relevant
to the audience, are factually correct and fully supported by the data, are
well presented, and meet project time lines. Each qualitative analyst needs to
develop a personal scorecard to judge the quality of his or her analysis. The
strategies and techniques presented in this chapter will lead to fine ratings.

References
Huberman, A. M., and Miles, M. B. “Data Management and Analysis Methods.”
In N. K. Denzin and Y. S. Lincoln (eds.), Collecting and Interpreting Qualitative
Materials. Thousand Oaks, Calif.: Sage, 1998.
Mason, J. Qualitative Researching. (2nd ed.) Thousand Oaks, Calif.: Sage, 2002.
Maxwell, J. A. Qualitative Research Design: An Interactive Approach. Thousand
Oaks, Calif.: Sage, 1996.
Miles, M. B., and Huberman, A. M. Qualitative Data Analysis: An Expanded
Sourcebook. (2nd ed.) Thousand Oaks, Calif.: Sage, 1994.
Patton, M. Q. Qualitative Research and Evaluation Methods. (3rd ed.) Thousand
Oaks, Calif.: Sage, 2002.
QSR International Pty Ltd. N6 Reference Guide. Doncaster, Victoria, Australia:
QSR International Pty Ltd., Mar. 2002.
Richards, T. J., and Richards, L. “Using Computers in Qualitative Research.”
In N. K. Denzin and Y. S. Lincoln (eds.), Collecting and Interpreting Qualitative
Materials. Thousand Oaks, Calif.: Sage, 1998.
Strauss, A., and Corbin, J. Basics of Qualitative Research: Techniques and Procedures
for Developing Grounded Theory. (2nd ed.) Thousand Oaks, Calif.: Sage,
1998.
U.S. General Accounting Office. Managing for Results: Strengthening Regulatory
Agencies’ Performance Management Practices. Washington, D.C.: U.S. Government
Printing Office, Oct. 1999.
Wolcott, H. F. Writing Up Qualitative Research. Thousand Oaks, Calif.: Sage,
1990.
Yin, R. K. Case Study Research: Design and Methods. Thousand Oaks, Calif.: Sage,
1989.

Saturday, 04 June 2011

Program Evaluation in Language Education

1.1 Introduction
Evaluation has many meanings in language programs. It is part of the
novice teacher’s checklist to guide the development of initial lesson plans and
teaching practice, a process of determining learning achievements or student
satisfaction, and a dimension of the analysis of data in a formal evaluation
or research study. It refers to judgements about students by teachers and by
external assessors; the performance of teachers by their students, program
managers and institutions; and programs, departments and institutions by
internal assessors, external monitors and inspectors. Evaluation is about the
relationships between different program components, the procedures and
epistemologies developed by the people involved in programs, and the
processes and outcomes which are used to show the value of a program –
accountability – and enhance this value – development.
This chapter provides an overview of this territory. It identifies themes
and notions examined in their historical context in Part 1, in the case studies
in Part 2, and in the ways forward for language program evaluation in Part 3.
In this chapter we outline three characteristics of language program evaluation
as a field of study (section 1.2). Then we set out five challenges for
evaluation – themes which together constitute a framework for developing
the theory and practice in relation to aspects of language programs informed
by Applied Linguistics on the one hand, and by the fields of education and
management on the other.
1.2 Three features of evaluation
The study and practice of evaluation has developed in diverse ways over recent
decades. These developments are driven by issues from within evaluation
and aspects of the wider socio-political context. Three features of evaluation
theory and practice illustrate the complexity of these developments and the
difficulties inherent in the task of mapping achievements and directions.

First, there is the question of definition; evaluation is a form of enquiry,
ranging from research to systematic approaches to decision-making. Our
account of the history of evaluation over the past several decades in
chapter 2 illustrates a progression from reliance on stripped-down statistical
representations of a program to inclusive, multi-perspective approaches.
The common thread – the making of judgements in a shared context –
gives a problematically wide-ranging basket of activities. Thus, in the context
of an innovative language program, evaluation might include periodic
reviews of the budget, staff appraisal and decisions relating to professional
development, iterated classroom observation for professional development
of teachers or for quality assurance purposes, narratives of experience from
participants, as well as a one-off study to inform on the success of the
innovation.
Second, there are two perspectives on evaluation research. It is viewed, on
the one hand, as a type of study which has both research functions – rolling
back the frontiers of knowledge – and evaluation functions – providing
information for judgements or decision-making; and, on the other, as research
into the processes of evaluation. The former perspective has been significant
in language program evaluations, as evidenced by edited collections such
as those by Alderson and Beretta (1992), Rea-Dickins and Lwaitama (1995)
and Rea-Dickins and Germaine (1998). In the latter perspective evaluation
research can be seen as analogous to the research which has for decades
underpinned the validity and reliability of language testing processes. (For
a recent account of the processes and issues here, see Weir 2004 in this
series.) There is no doubt that evaluation processes require such epistemological
and methodological underpinning – Lowe (1995) and Dörnyei
(2003), for example, examine in detail the issues involved in questionnaire
design and completion; while Alderson and Beretta (1992) and Saville and
Hawkey (2004) note that validation procedures in test design have tended
to be much more extensive than in the design of other evaluation instruments.
Without an understanding of the data types which are most appropriate for
the different uses of evaluation, there may be a tendency to inefficient
scrutiny of all practices, documents and perspectives, with constant doubts
regarding the extent to which they actually evidence the success or otherwise
of the program in question.
Third, many accounts of evaluation do not reach the public domain. For
a range of reasons, some proper, others less so, evaluation processes and
findings remain either insufficiently documented or unpublished. One outcome
of this feature of evaluation is the difficulty of mapping theory and
practice when some of the terrain is obscured from view. Evaluations of social
programs are for the most part funded from the public purse. In addition,
they involve aspects of people’s lives in a way that profiles legal and ethical
issues. Thus, there are contending forces for transparency and confidentiality,
which means that the issue of publishing and not publishing evaluations
is difficult. The case studies and discussion in Part 2 illustrate in particular
evaluation contexts both potential conflicts in, and principled resolutions
to, managing accountability and anonymity in evaluation practice. In Part 3
we revisit these issues in the context of guidelines for practice in, and research
into, evaluation processes.
Together, these features present difficulties, but also opportunities. In this
book we bring together perspectives from published evaluations, unpublished,
but researched evaluations, and from the wider discourses of evaluation
in particular fields. The case studies in Part 2 address issues of evaluation
purpose and design; the role of evaluation in program decision-making and
policy development; the roles of stakeholders in evaluations, and of evaluation
in the lives of stakeholders; evaluation and learning in language programs; and
evaluation as a procedure for quality management in programs, departments
and institutions. In each case study, we pay particular attention to the construct
of evaluation – what the data represent and how they correspond to the stated
objectives on the one hand, and the wider purposes of the program on the
other. In Part 3 we explore options for future development in terms of research
into evaluation policy positions within programs, frameworks and guidelines
for practice and methodological orientations. In addition, we examine research
possibilities into the cross-cutting issues of stakeholder evaluation, ethicality
and fairness, and ‘learning to do’ evaluations. This broad-based perspective
on language program evaluation as the examination of situated language
programs can complement, on the one hand, the more theoretical orientations
to understanding language learning in instructed settings in Applied Linguistics
and, on the other, the local development of teaching skills, learning materials
and other program components in schools, universities and ministries of
education world-wide.
To develop this analysis of the potential of evaluation, we set out five
challenges. These are reflections of the features outlined above in two
ways: first, they have proved enduring issues in the development and practice
of evaluation in recent decades (we explore the issues here more fully
in the following chapters and in Part 2); and second, they represent areas
for evaluation theory and for evaluators in different program settings to
engage with.
1.3 Five challenges for evaluation
There are five challenges which we see as characterising the theoretical
orientation and practice of evaluation. The challenge in each case is to understand
and communicate the issues involved in the following dimensions of
evaluation:
1. The purpose of evaluation in its social and political context.
2. The informants who people programs and evaluations.
3. The criteria which generate evaluation frameworks, instruments and
ultimately judgements.
4. The data which validate these approaches and instruments, and complete
the construction of judgements.
5. The use of evaluation findings in managing social programmes.
Evaluation purpose: The challenge of evidence-based public policy
development
The ideas which shape public and social policy in this period of late or
postmodernity represent a shift away from ideology-driven programs derived
from philosophical positions and grand theories. New perspectives on the
social aspects of our human nature, such as evolutionary psychology, activity
theory and game theory, give a view of individual and social behaviour
which is infinitely complex. This complexity combines with reassessments
of the success or appropriateness of the social projects of high modernity to
generate a need to move beyond debates focused on nature/nurture, social/
individual and public/private in defining and developing the role of the
state, and of public programs in the lives of citizens. These debates have
been characterised in Western democracies over the last decade by new
syntheses of Right and Left in public sector programs relating to health
care, social welfare and education, such as the Third Way (Giddens 1998).
In education in particular, the task has shifted from universal provision to
effectiveness for particular groups in particular settings (in England and
Wales this educational debate has been characterised by the issue of ‘bog-standard
comprehensives’ – should there be one national approach to the
structure, resourcing and curriculum of secondary schools, or should there
be a diversity of approaches as determined by local stakeholders and factors?).
In Applied Linguistics, accounts of second language acquisition and of the
teaching strategies which best facilitate it are engaging with diverse
social and personal factors rather than focusing on universals of cognition:
Cook (2000), Block (2003), Lantolf (2000) and Kramsch (2002), for example,
explore those social and cultural dimensions of language learning which
generate new perspectives on the roles of context and identity in language
learning.
The focus on what works in such policy development implies a strong role
for evaluation. Patton (1997: 192–4) lists 58 types of evaluation, all of which
involve understanding the impact of these programs on the problems to be
resolved or the situation to be improved. The unifying theme in these different
purposes is their shared platform of evidence, and the ways in which it can
serve to inform on programs and policies. The verbs – appraise, assess, audit,
examine, monitor, review, etc. – all suggest judgements based on empirical
scrutiny of the program in operation.

WAJAR 9 Tahun (Nine-Year Compulsory Education)

Completion of nine-year compulsory basic education (Wajib Belajar Pendidikan Dasar 9 Tahun,
or Wajar Dikdas) is targeted for 2008/2009. The main indicator of completion is a national
gross enrollment ratio (Angka Partisipasi Kasar, APK) of 95% at junior secondary (SMP) level
in 2008/2009. In terms of student numbers, the government together with the community must
be able to provide education services for about 1.9 million children aged 13–15 who have
not yet had the opportunity to study at SMP/MTs or equivalent. Completing Wajar Dikdas 9
Tahun must be a joint program of government, the private sector, social institutions and
the community. Efforts to mobilize all components of the nation through a national movement
using cultural, social, religious, bureaucratic and formal-legal approaches are needed to
raise awareness among those who do not yet understand the importance of education, and to
rally community participation in making this national program succeed.
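As a rough illustration of the headline indicator, a gross enrollment ratio is conventionally computed as total enrollment at a level, regardless of age, divided by the official school-age population for that level, expressed as a percentage. The sketch below assumes that definition. The function names are illustrative, and the enrollment and population figures are hypothetical placeholders, not official statistics.

```python
def gross_enrollment_ratio(enrolled: int, school_age_population: int) -> float:
    """Return the gross enrollment ratio (APK) as a percentage."""
    if school_age_population <= 0:
        raise ValueError("school_age_population must be positive")
    return 100.0 * enrolled / school_age_population


def additional_students_needed(enrolled: int, school_age_population: int,
                               target_pct: int = 95) -> int:
    """Students still to be enrolled to reach the target APK (0 if already met)."""
    # Ceiling division with integers avoids floating-point rounding surprises.
    required = (target_pct * school_age_population + 99) // 100
    return max(0, required - enrolled)


# Hypothetical district: 1,000 children aged 13-15, 900 enrolled at SMP/MTs level.
print(gross_enrollment_ratio(900, 1000))       # APK as a percentage
print(additional_students_needed(900, 1000))   # enrollments short of the 95% target
```

Because the numerator counts students of any age, an APK can exceed 100%, which is why it is read alongside dropout and age-specific figures.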
What Are the Goals of Completing Wajar 9 Tahun?
The main goals of the national movement to complete nine-year compulsory basic education
are to:
1. Encourage children aged 13–15 to enroll in school, whether at SMP, MTs or another
equivalent form of education.
2. Raise the enrollment rate at SMP/MTs, especially in regions where the number of children
not attending SMP/MTs is still high.
3. Reduce the dropout rate at SMP/MTs or equivalent.
4. Increase community participation in completing nine-year compulsory basic education.
5. Increase the participation of community organizations in the national movement to
complete nine-year compulsory basic education.
6. Strengthen the role, function and capacity of the central, provincial, district/city and
subdistrict governments in completing compulsory education in their respective regions.
Who Is Targeted by Wajar 9 Tahun?
The national movement to complete nine-year compulsory education targets:
1. Children of SMP/MTs age or equivalent (13–15 years) who are not yet studying at SMP/MTs
or equivalent.
2. Grade VI primary (SD) pupils who, for economic reasons, are at risk of not continuing to
SMP/MTs or equivalent.
3. Children who have dropped out of SMP/MTs or equivalent.
Where Should We Study?
To study at SMP/MTs or equivalent, children of SMP age can choose a school that suits their
preferences and opportunities, such as:
1. A regular state or private SMP
2. A one-roof SD-SMP (SD-SMP Satu Atap)
3. An open SMP (SMP Terbuka)
4. A state or private MTs, or another equivalent school
5. A Pondok Pesantren Salafiyah that runs the compulsory education program
What Assistance Is Available to Children Who Attend School?
Children aged 13–15 who attend school can receive the following financial assistance toward
their education:
1. All students at SMP/MTs or equivalent can receive School Operational Assistance (Bantuan
Operasional Sekolah, BOS), with priority given to students from poor families, of
Rp 324,500 per student per year. BOS funds are managed by the school.
2. A retrieval scholarship for children who have dropped out of SMP/MTs, of Rp 1,000,000 per
student per year for the first year and Rp 500,000 per student per year.
3. A transition scholarship for Grade VI SD/MI pupils or equivalent who, for economic
reasons, are at risk of not continuing to SMP/MTs. The transition scholarship is
Rp 1,000,000 per student per year.
4. A scholarship for SMP Terbuka students of Rp 240,000 per student per year.
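The per-student amounts listed above lend themselves to a quick budget calculation. The sketch below is only an illustration: the rates are the figures quoted in the list, while the scheme keys, the function name and the recipient counts are hypothetical. The retrieval scholarship is left out because its rate differs by year.

```python
from typing import Mapping

# Rupiah per student per year, as quoted in the list above.
RATES_RP_PER_STUDENT_PER_YEAR = {
    "bos": 324_500,
    "transition_scholarship": 1_000_000,
    "open_smp_scholarship": 240_000,
}


def total_annual_aid(recipients: Mapping[str, int]) -> int:
    """Sum annual aid in rupiah for the given number of recipients per scheme."""
    total = 0
    for scheme, count in recipients.items():
        if scheme not in RATES_RP_PER_STUDENT_PER_YEAR:
            raise KeyError(f"unknown scheme: {scheme}")
        if count < 0:
            raise ValueError("recipient counts must be non-negative")
        total += RATES_RP_PER_STUDENT_PER_YEAR[scheme] * count
    return total


# Hypothetical school: 200 students receiving BOS, 10 of them on transition scholarships.
example = {"bos": 200, "transition_scholarship": 10}
print(total_annual_aid(example))  # 200 * 324,500 + 10 * 1,000,000 = 74,900,000
```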
Who Is Involved in Completing Wajar 9 Tahun?
Completing nine-year compulsory education is a national program. Making it succeed
therefore requires comprehensive cooperation among:
1. The central government (the Coordinating Minister for People's Welfare; the Ministers of
National Education, Home Affairs and Finance; the State Minister for Administrative
Reform/Head of Bappenas; the Ministers of Religious Affairs, Social Affairs, Agriculture,
Forestry, Marine Affairs and Fisheries, Industry, Manpower and Transmigration, Law and Human
Rights, and Communication and Informatics; the State Ministers for the Environment, Women's
Empowerment, Development of Disadvantaged Regions, Youth and Sports, and State-Owned
Enterprises; and the Head of the Central Statistics Agency, Badan Pusat Statistik)
2. Provincial governments (Provincial Education Offices)
3. District/city governments (District/City Education Offices)
4. Subdistrict governments (Subdistrict Branch Education Offices)
5. Urban villages (kelurahan)
In addition, the community and social organizations such as Dharma Wanita, PKK,
Bhayangkari, Dharma Pertiwi and others are expected to keep increasing their participation
in completing nine-year compulsory education.
What Does BOS Contribute to Completing Wajar 9 Tahun?
School Operational Assistance (BOS) is central government funding distributed through
regional governments to SMP/MTs or equivalent schools, via the school's bank account, to
support school operations as part of completing nine-year compulsory basic education. BOS
is calculated from the number of students, so a school that enrolls more students under the
compulsory education program receives more BOS. As part of education funding, BOS is used
to help schools with the following:
1. Financing all activities related to new student admissions.
2. Purchasing textbooks and supplementary books for the library collection.
3. Purchasing consumables such as stationery (ATK), laboratory materials, student
registers, inventory books, newspaper subscriptions and day-to-day school needs.
4. Financing student activities.
5. Financing daily tests, general tests, school examinations and student learning reports.
6. Teacher professional development: training, KKG/MGMP and KKKS/MKKS.
7. Financing school maintenance such as painting, repairing leaking roofs and other upkeep.
8. Paying for utilities and services: electricity, water, telephone.
9. Paying honoraria for non-permanent (honorer) teachers and education staff who are not
funded by the central and/or regional government.
10. Providing transport cost assistance for students from poor families.
11. For pesantren salafiyah and non-Islamic religious schools specifically, BOS funds may
be used for boarding costs and to buy worship equipment.
12. Financing BOS administration: stationery, photocopying, correspondence and report
preparation.
13. If all the components above have been covered by BOS and funds remain, the remainder
may be used to buy teaching aids, learning media and school furniture.
BOS funds may be used for transport and honoraria (uang lelah) for civil-servant (PNS)
teachers only in connection with school activities outside their required teaching hours.
The amounts and unit costs for the items above must stay within reasonable limits.
Where to Ask about Completing Wajar 9 Tahun?
More detailed questions about completing nine-year compulsory education can be directed to
the parties involved in the program, as listed above.

Impact Evaluation Programs

What is impact?
The heart of your EARS report
Identifying and fleshing out true public impact in your extension programs is
essential to developing effective impact reporting. Demand for impact
information mounts as lawmakers, the public and our partners all want to
know about the return on investments in land-grant university extension,
research and teaching. Understanding the essence of true public impact, or
benefit, is key. We often report things like attendance figures, what people like
about an event, the number of meetings held or acres served, or a new grant as
impact. While some of this information provides context, none of it is impact.
Defining impact
■ Basically, impact is the reportable, quantifiable difference, or potential difference, that
your project or program is making in real people’s lives. It reports payoffs and
benefits to society. The focus is on public – not internal or personal – benefit.
■ Impact is change or potential change in one or more key areas:
• Economic.
• Environmental.
• Social.
• Health and well-being.
Reporting impact
■ An impact statement is a brief summary, in lay terms, that:
• Highlights the difference your program is making for the public good.
• Concisely summarizes what you did to achieve this difference.
• Clearly states payoffs to society.
• Answers key questions: So what? Who cares? Why?
■ An impact statement is not:
• Just more paperwork.
• A long, detailed report.
• Numbers of people reached, meetings held, acres served. These provide
context but alone, they don’t capture the element of change essential to good
impact.
• A detailed description of the process or what’s been done.
• A list of additional grants, honors, recognition for organizers.
■ Be specific. Report economic, environmental, social or health/well-being impact in
terms of:
• Knowledge gained and how that knowledge is applied.
• Behavior or attitude changes.
• Practice or situation changes.
• Results of those behavior, attitude, practice or situation changes.
■ Effective impact statements:
• Provide quantifiable evidence of change or difference the program made.
(Money is the gold standard. Audiences want to know the return on investment.)
• Give other evidence, such as testimonials or anecdotes.
• Realistically project potential benefit for work in progress.
• Provide only enough detail to be easily understood.
• Highlight public benefits, outcomes, payoffs.
■ To consistently show real impact, you must program to produce it.
• Know what you want to measure and figure out how to measure it.
• Build around issues, not events.
• Follow up to find out if people made the changes they predicted they would.
• Report overall program outcomes, not individual events or activities.
Impact audiences
■ Write impact statements for:
• State and federal decision makers (reporting needs).
• Local decision makers, supporters, general public.
• Taxpayers, stakeholders, commodity groups.
• Current and potential funders or partners.
Impact tips and tricks
■ Write a strong “why” or issue/problem statement:
• Do a Google search to quantify the problem.
• Use reliable sources – Centers for Disease Control, EPA, USDA, etc.
• Highlight “why” details in grant proposals.
■ For difficult impacts – basic research, emerging issues, 4-H, FCS, academics – try:
• Testimonials
• Anecdotes
• If x then y statements – potential impacts

Evaluation of Teacher Certification in Indonesia

CHAPTER I

INTRODUCTION

A. Background to the Policy Evaluation

1. The government introduced the teacher certification policy and has implemented it since 2007; it will continue in the coming years with the aim of improving the quality of education.

2. The policy responds to several problems in education, particularly those relating to teachers. These include the following.

a. According to data from Balitbang Depdiknas (the research and development agency of the Ministry of National Education), fewer than 70% of teachers are judged fit to teach, and teachers score very low on tests of the subjects they teach.

b. According to Human Development Index (HDI) records, 60% of primary (SD) teachers, 40% of junior secondary (SMP) teachers, 43% of senior secondary (SMA) teachers and 34% of vocational secondary (SMK) teachers are considered not yet fit to teach at their respective levels. In addition, 17.2% of teachers, equivalent to 69,477 teachers, teach outside their field of study. On this basis, the quality of our teaching workforce ranks 109th out of 179 countries. A strong foundation is therefore needed to raise teacher quality, using an average standard rather than a minimum standard.

c. In the competency test administered to education personnel in 2004, teachers' mastery of subject matter nationally fell below 50 percent of the full body of disciplinary content that teachers are expected to command.

d. The raw scores teachers obtained across all subjects are also a cause for concern. Teachers of civics (PKn), history, Indonesian, English, mathematics, physics, biology, chemistry, economics, sociology, geography and arts education scored only around 20, with scores ranging from 13 to 23 out of 40 questions.

3. The teacher certification policy has been in place for three years (2007–2010). It is expected to bring change in the desired direction, namely improved teacher quality, which in turn should improve the quality of education.

4. Evaluating the policy is considered important in order to determine the extent of its results and its impact on the contexts it addresses.

5. For these reasons, this evaluation of the teacher certification policy is being carried out.

B. Problem Statement

The problems addressed in the evaluation of the teacher certification policy are as follows.

1. What are the results of implementing the teacher certification policy in terms of improving teacher competence?

2. What is the impact of the teacher certification policy in the context of improving the quality of education in Indonesia?

C. Objectives of the Policy Evaluation

The objectives of the evaluation of the teacher certification policy are as follows.

1. To determine the results/outputs of implementing the teacher certification policy in terms of improving teacher competence.

2. To determine the impact of the teacher certification policy in the context of improving the quality of education in Indonesia.

D. Benefits of the Policy Evaluation

1. Theoretically, to support deeper study of policy evaluation in education, particularly with regard to teacher certification.

2. Practically, to provide a basis and input for improving the policy and for formulating related policies.

CHAPTER II

THEORETICAL FOUNDATION

A. Legal Basis

1. Law No. 20 of 2003 on the National Education System.

2. Law No. 14 of 2005 on Teachers and Lecturers (Depdikbud, 2005), which requires teachers to hold academic qualifications, competencies and an educator certificate, to be physically and mentally healthy, and to be able to realize the goals of national education.

3. Regulation of the Minister of Education of the Republic of Indonesia No. 18 of 2007 on Certification of In-Service Teachers.

4. Regulation of the Minister of Education of the Republic of Indonesia No. 40 of 2007 on Certification of In-Service Teachers through the Education Track.

5. Government Regulation No. 19 of 2005 on National Education Standards.

6. Legal opinion (fatwa) of the Minister of Law and Human Rights No. I.UM.01.02-253.

B. Objectives of Certification

A. Objectives of Teacher Certification

According to DIKTI (2006), the objectives of teacher certification are: (1) to determine a person's fitness to carry out duties as a learning agent; (2) to improve the quality of educational processes and outcomes; and (3) to enhance teacher professionalism.

B. Objectives of Teacher Certification through the Education Track (PPG)

The implementation guidelines for certification through the education track set out the objectives of Teacher Professional Education (Pendidikan Profesi Guru, PPG) as follows:

With reference to Article 3 of Law No. 20 of 2003, the general objective of the PPG program is to produce prospective teachers capable of realizing the goals of national education, namely to develop learners' potential so that they become people who believe in and are devoted to God Almighty, are of noble character, healthy, knowledgeable, capable, creative and independent, and become democratic and responsible citizens.

The specific objective of the PPG program, as stated in Article 2 of Permendiknas (Minister of National Education Regulation) No. 8 of 2009, is to produce prospective teachers who are competent in planning, implementing and assessing learning; following up on assessment results; guiding and training learners; and conducting research, and who are able to develop their professionalism continuously.

C. Objectives of Teacher Professional Education and Training

According to the Guidelines and Ground Rules for Implementing PLPG for Certification of In-Service Teachers, Teacher Professional Education and Training (Pendidikan dan Latihan Profesi Guru, PLPG) has the following objectives.

1. To improve the competence and professionalism of certification candidates who did not reach the minimum passing score in the portfolio assessment.

2. To determine whether certification candidates pass, by means of a competency test at the end of PLPG.

The objectives of teacher certification according to the laws, ministerial regulations, government regulation and the legal opinion of the Minister of Law and Human Rights are as follows:

1. To determine a person's fitness to carry out duties as a learning agent.

2. To improve the quality of educational processes and outcomes.

3. To enhance teacher professionalism.

4. To raise the quality and qualifications of teachers as educated personnel.

5. To improve teacher welfare nationally.

6. To improve teacher competence.

7. To improve the performance of teachers in Indonesia.

D. Teacher Competence

According to Article 28 of Government Regulation No. 19/2005 on National Education Standards, an educator (teacher) is a learning agent who must possess four types of competence: pedagogic, personality, professional and social. In this context, teacher competence can be understood as the integrated whole of knowledge, skills and attitudes, expressed as a set of intelligent and responsible actions, that a prospective teacher must possess in order to take up teaching as a profession.

1. Personality Competence

Personality competence is the personal capacity that reflects a personality that is steady, stable, mature, wise and authoritative, that serves as a model for learners, and that is of noble character. Each of these elements can be broken down into subcompetencies and essential indicators as follows.

a. Having a steady and stable personality. Essential indicators: acting in accordance with legal norms; acting in accordance with social norms; taking pride in being an educator; and acting consistently in accordance with norms.

b. Having a mature personality. Essential indicators: showing independence in acting as an educator and having a work ethic as an educator.

c. Having a wise personality. Essential indicators: acting in ways that benefit learners, the school and the community, and showing openness in thought and action.

d. Having an authoritative personality. Essential indicators: behaving in ways that positively influence learners and that command respect.

e. Having noble character and serving as a role model. Essential indicators: acting in accordance with religious norms (faith and piety, honesty, sincerity, helpfulness) and behaving in ways that learners emulate.

2. Pedagogic Competence

Pedagogic competence is the ability to understand learners and to manage learning that is educative and dialogic. In substance, it covers understanding learners, designing and carrying out instruction, evaluating learning outcomes, and developing learners so that they can realize their various potentials. Each of these elements can be broken down into subcompetencies and essential indicators as follows.

a. Understanding learners. Essential indicators: understanding learners by applying principles of cognitive development; understanding learners by applying principles of personality; and identifying learners' prior knowledge.

b. Designing instruction, including understanding the educational foundations that inform it. Essential indicators: applying theories of learning and instruction; determining instructional strategies based on learner characteristics, target competencies and teaching materials; and preparing lesson plans based on the chosen strategy.

c. Carrying out instruction. Essential indicators: arranging the learning setting and conducting instruction in a conducive manner.

d. Designing and carrying out evaluation of learning. Essential indicators: carrying out continuous assessment of learning processes and outcomes using a variety of methods; analysing the results to determine the level of mastery (mastery learning); and using assessment results to improve the overall quality of the instructional program.

e. Developing learners so that they can realize their various potentials. Essential indicators: facilitating learners in developing their academic potential, and facilitating learners in developing their non-academic potential.

3. Professional Competence

Professional competence is the ability to master the subject matter of a field of study broadly and deeply, covering mastery of the content of the school curriculum for the subject and of the disciplinary substance that underlies it, together with broadening one's scholarly outlook as a teacher. Each element of this competence has the following subcompetencies and essential indicators.

a. Mastering the disciplinary substance of the social sciences and other sciences related to the field of study. Essential indicators: understanding the teaching materials in the school curriculum; understanding the structure, concepts and methods of the discipline that underlies or is coherent with those materials; understanding how concepts relate across related subjects; and applying disciplinary concepts in everyday life.

b. Mastering the steps of research and critical inquiry in order to broaden one's outlook and deepen knowledge of the field of study.

4. Social Competence

Social competence concerns the educator's ability, as a member of society, to communicate and interact effectively with learners, fellow educators, education personnel, learners' parents or guardians, and the surrounding community. It has the following subcompetencies and essential indicators.

a. Being able to communicate and interact effectively with learners. Essential indicator: communicating effectively with learners.

b. Being able to communicate and interact effectively with fellow teachers and education personnel.

c. Being able to communicate and interact effectively with learners' parents or guardians and the surrounding community.

These four competencies are not, in essence, explicitly separate; together they form a single teacher competence. It should also be noted that a person's competence, a teacher's included, is not fixed: at times it grows and at times it declines. Teachers must therefore continually strive to improve their competence.

C. Policy Relevance

There are currently about 2,306,015 in-service teachers, who are to be certified in stages over roughly 10 years (Depdiknas, 2008). This implies a heavy burden and very large costs for the Government of Indonesia in improving the quality of education. Ironically, the Government's effort will be wasted if the performance of certified (professional) teachers is no better than it was before certification. This can happen if, after certification, teachers' performance declines because they feel they are no longer assessed and face no sanctions. A model for evaluating the performance of certified teachers is therefore needed.

A teacher certification policy should be made on the basis of a need for it in order to achieve defined goals. Policy relevance is an important object of study for judging whether the policy really fits the context that gave rise to it, whether the results achieved match its goals, and whether its impact contributes positively to related fields.

At the pre-policy stage, relevance is assessed from the context behind the policy. Because the policy under evaluation is teacher certification, the context considered is limited to that relating to teachers, namely teacher quality before certification.

The results of the policy will be evaluated by looking at the full set of outcomes obtained from teacher certification. These include, among other things, competency scores from the several certification routes (portfolio, PLPG, PPG).

The impact of the policy is viewed in two ways: impact on related fields and impact on the elements of education. The related fields include education, society, culture and politics. For the elements of education, impact will be examined for teachers themselves, students, school principals, the surrounding community and society at large.

D. Review of Relevant Research

The study by Setya Raharjo et al. on the performance of professional teachers (2008) found that: (1) efforts by teachers who have passed certification and received the professional allowance to develop themselves by attending training (diklat) and academic forums are not yet encouraging, although some teachers persistently seek out training or forums they might attend. This is shown by the fact that most teachers are still not active in attending training and academic forums, whether funded by the school or government or at their own expense; (2) among post-certification efforts to improve academic ability, the activity most teachers undertake is coaching students for competitions or olympiads, while other activities, including writing scholarly papers and taking English courses, still need serious attention; and (3) among efforts to develop the profession, the activities pursued by some teachers are producing modules and instructional media, whereas writing articles, conducting research, creating works of art or technology, writing UNAS exam items and reviewing books have so far been done by only a small proportion of teachers.

E. Conceptual Framework

The teacher certification program, implemented since 2007, can be evaluated by collecting discrepancy information. The evaluation seeks information on outputs, namely teacher performance, and on impact, namely: (1) direct impact on the school environment, that is, the quality of education in the school; and (2) indirect impact on teacher welfare. The discrepancy information is obtained by comparing the program standard with the teachers' actual performance. The information sought focuses on how far the teacher quality standards in the certification policy have been met, by examining the following components: post-certification teacher performance, classroom teaching activity, teachers' attitudes toward and interest in improving the quality of education, and the development of the potential of educators and education personnel.

Once the information has been obtained, the evaluator compares teachers' performance in practice with the established teacher competency standards. This is shown in the diagram below:

Figure 1. Conceptual Framework for the Outcome Evaluation of the Teacher Certification Program.

F. Research Questions

1. What are the results of implementing the teacher certification policy in terms of improving teacher competence?

a. How far does implementation of the teacher certification policy deliver satisfactory results in line with the policy's objectives?

b. What has the teacher certification policy achieved in improving the quality of education?

c. Who benefits from the teacher certification policy?

2. What is the impact of the teacher certification policy in the context of improving the quality of education in Indonesia?

a. What are the short-, medium- and long-term benefits of the certification policy for improving the quality of education?

b. How is the teacher certification policy financed?

c. Are the costs commensurate with the benefits or achievements of the policy?

d. How do educators, those involved and the public perceive the results of the teacher certification policy?

BAB III

METODE PENELITIAN

A. Model Evaluasi

Model evaluasi yang akan diterapkan pada kebijakan sertifikasi guru adalah dengan menggunakan model evaluasi discrepancy atau kesenjangan oleh Malcolm. Menurut Stufflebeam (2002:98) dalam mengukur dan mengidentifikasi outcome atau dampak dari suatu program terdapat enam hal yang perlu diperhatikan yaitu:

1. Istilah yang diterapkan untuk kegiatan yang dirancang terutama untuk mengukur efek atau hasil dari program-program, daripada masukan atau proses.

2. Sejak lebih dari pengukuran diperlukan jika suatu kegiatan harus dianggap sebagai evaluatif, keputusan ke mana produk terletak sehubungan dengan standar yang dibuat.

3. Kisaran hasil yang telah digunakan dalam evaluasi hasil cukup besar.

4. Efek atau hasil yang adalah fokus dari hasil evaluasi dapat diamati pada titik-titik yang bervariasi dalam selama program berjalan sampai selesai, atau yang lebih baru pada waktunya untuk menilai efek jangka panjang berfokus pada hasil penyelesaian program.

5. Tidak biasa dalam evaluasi hasil untuk mencari untuk menggambarkan atau menentukan apa yang sebenarnya terjadi dalam sebuah program, meskipun jenis informasi yang diperoleh akan jelas, secara umum setidaknya, dipilih untuk mencerminkan kegiatan program.

The discrepancy evaluation model compares the program as planned with its implementation in the field by comparing the established standard (S) with performance (P), which yields discrepancy information (D). With this model, the impact of the teacher certification program can be determined by explicitly analyzing information on the components and performance variables observed. The model is depicted in the chart below:

Chart 2. Malcolm Provus's Discrepancy Evaluation Model
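The comparison of S and P can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes that each indicator is expressed as a share of teachers between 0 and 1 and that D is taken as the simple difference S - P, neither of which is specified in the text. The `Indicator` class, the `discrepancy_report` function, and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    name: str
    standard: float      # S: target share of teachers (0.0 to 1.0), assumed scale
    performance: float   # P: observed share of teachers (0.0 to 1.0), assumed scale

    @property
    def discrepancy(self) -> float:
        """D = S - P (assumed definition); positive means performance falls short."""
        return self.standard - self.performance


def discrepancy_report(indicators):
    """Return (name, D) pairs sorted from the largest gap to the smallest."""
    pairs = [(i.name, round(i.discrepancy, 2)) for i in indicators]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical figures for illustration only, not results of this study.
indicators = [
    Indicator("training and scientific forums", standard=1.0, performance=0.8),
    Indicator("scientific writing", standard=1.0, performance=0.1),
    Indicator("modules and learning media", standard=1.0, performance=0.3),
]

report = discrepancy_report(indicators)
for name, gap in report:
    print(f"{name}: D = {gap:.2f}")
```

Sorting by the size of the gap puts the indicators that are furthest from the standard at the top of the report, which is where the evaluation would most likely focus its attention.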

B. Indicators of Success

Evaluation with the discrepancy model compares the targets set with the results achieved, as described below:

1. Targets to be achieved:

a. All certified teachers who receive the professional allowance develop themselves by taking part in training (diklat) and scientific forums, whether funded by the school or the government or paid for by the teachers themselves.

b. The activity most certified teachers undertake to improve their academic ability is coaching students for competitions or olympiads, while other activities still need serious attention, among them writing scientific papers and taking English courses.

c. The professional development activities that many teachers pursue are producing modules and learning media, whereas writing articles, conducting research, creating works of art or technology, writing UNAS (national examination) questions, and reviewing new books are pursued less often.

2. Actual conditions:

a. Most teachers who have received the certification allowance actively take part in training and scientific forums to support their own development.

b. Very few teachers yet try to develop their potential by writing scientific papers or taking English courses.

c. Few teachers yet show the awareness needed to produce modules and learning media, and the same applies to writing articles, conducting research, creating works of art or technology, writing UNAS (national examination) questions, and reviewing new books.

C. Population and Sample

The population in this evaluation of the teacher certification policy is all teachers in Indonesia who have taken part in certification.

The sample consists of certified teachers in each province, selected at random. The sample drawn from each province is 10% of the teachers already certified there.
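A minimal sketch of this sampling step follows, assuming a sampling frame that lists certified teachers by province. The function name `sample_by_province`, the teacher IDs, and the province sizes are all hypothetical, and rounding the 10% quota up to the next whole teacher is an assumption made here, not a rule stated in the proposal.

```python
import math
import random


def sample_by_province(teachers_by_province, percent=10, seed=None):
    """Draw a simple random sample of `percent` % of the teachers in each province."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    sample = {}
    for province, teachers in teachers_by_province.items():
        # Round up so that a small province still contributes at least one teacher.
        k = math.ceil(len(teachers) * percent / 100)
        sample[province] = rng.sample(teachers, k)  # sampling without replacement
    return sample


# Hypothetical sampling frame: made-up teacher IDs and province sizes.
frame = {
    "Province A": [f"A-{i:03d}" for i in range(250)],
    "Province B": [f"B-{i:03d}" for i in range(40)],
    "Province C": [f"C-{i:03d}" for i in range(7)],
}

sample = sample_by_province(frame, percent=10, seed=42)
for province, chosen in sample.items():
    print(province, len(chosen))
```

Drawing within each province separately keeps every province represented in proportion to its number of certified teachers, which a single nationwide draw would not guarantee.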

D. Data Collection Techniques

The data in this study are obtained in several ways, as follows:

Type of data		Source of information	Method
Teacher performance		Sampled teachers	Observation of the teaching-learning process, interview, questionnaire
		School principal	Interview and questionnaire
		Fellow teachers	Interview and questionnaire
		School committee	Questionnaire
		Students	Questionnaire
Pre-certification baseline data		Portfolio of the teacher concerned	Documentation
Pre-certification baseline data		Other administrative documents	Documentation
School environment		School community	Observation, interview
General impact		Teachers, students, Head of the Education Office, school committee	Questionnaire, interview

E. Data Collection Instruments

1. For observation data, the instruments are an observation guide, observation sheets, and checklists.

2. For interview data, the instrument is an interview guide.

3. For questionnaire data, the instrument is a questionnaire blueprint (kisi-kisi).

4. For documentation, the instruments are the relevant documents.

F. Data Analysis Techniques

This study uses descriptive quantitative and descriptive qualitative analysis. Descriptive quantitative analysis is applied to the questionnaire and observation data. Descriptive qualitative analysis is applied to the interview, observation, and documentation data.
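As an illustration of the descriptive quantitative side, the sketch below summarizes the responses to one questionnaire item. It assumes a five-point Likert-type scale, which the proposal does not specify. The function `describe_item` and the response values are hypothetical.

```python
from collections import Counter
from statistics import mean, median, pstdev


def describe_item(responses, scale=(1, 2, 3, 4, 5)):
    """Summarize Likert-type responses to a single questionnaire item."""
    counts = Counter(responses)
    n = len(responses)
    return {
        "n": n,
        "mean": round(mean(responses), 2),
        "median": median(responses),
        "std_dev": round(pstdev(responses), 2),  # population standard deviation
        # Percentage of respondents choosing each point on the assumed scale.
        "percent": {point: round(100 * counts[point] / n, 1) for point in scale},
    }


# Hypothetical responses for illustration only, not data from this study.
responses = [4, 5, 3, 4, 2, 4, 5, 3, 4, 4]
summary = describe_item(responses)
for key, value in summary.items():
    print(key, value)
```

The same summary could be computed for each item and each respondent group (teachers, principals, students) before the figures are interpreted alongside the qualitative findings.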

REFERENCES

Ditjen Dikti. (2008). Teacher Certification in Indonesia: A Strategy for Teacher Quality Improvement. Jakarta: Depdiknas.

Peraturan Pemerintah RI Nomor 19 Tahun 2005 tentang Standar Nasional Pendidikan [Government Regulation No. 19 of 2005 on National Education Standards].

Peraturan Pemerintah RI Nomor 74 Tahun 2008 tentang Guru [Government Regulation No. 74 of 2008 on Teachers].

Pedoman Sertifikasi bagi Guru dalam Jabatan untuk LPTK, Dinas Pendidikan Provinsi, Dinas Pendidikan Kabupaten/Kota [Guidelines on In-Service Teacher Certification for LPTK, Provincial Education Offices, and District/City Education Offices]. Dirjen Pendidikan Tinggi.

Undang-Undang RI Nomor 20 Tahun 2003 tentang Sistem Pendidikan Nasional [Law No. 20 of 2003 on the National Education System].

Undang-Undang RI Nomor 14 Tahun 2005 tentang Guru dan Dosen [Law No. 14 of 2005 on Teachers and Lecturers].

Stufflebeam, D. L., Madaus, G. F., & Kellaghan, T. (2002). Evaluation Models. New York, Boston, Dordrecht, London, Moscow: Kluwer Academic Publishers.