http://www.diva-portal.org

Preprint

This is the submitted version of a paper presented at Privacy in Statistical Databases'2006 (PSD'2006), Rome, Italy, 13-15 December, 2006.

Citation for the original published paper: Carlson, M., Jansson, I., Lindkvist, H. (2006) Bridging the Gap Between Theory and Practice: Experiences from Statistics Sweden in Applying SDC Methodology. In:

N.B. When citing this work, cite the original published paper.

Permanent link to this version: http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-96579

Bridging the Gap Between Theory and Practice: Experiences from Statistics Sweden in Applying SDC Methodology

Michael Carlson, Ingegerd Jansson, Helen Lindkvist
Statistics Sweden, Sweden
{michael.carlson, ingegerd.jansson, helen.lindkvist}@scb.se

Abstract

Statistics Sweden has identified a need to increase the level of knowledge in the SDC field among methodologists and to develop a unified view and strategy on the application of SDC methodology. A project was therefore initiated with the goal of spreading knowledge and collating current practices and experiences within the agency. The project resulted in a course on SDC given at Statistics Sweden. This paper provides a brief first report on the work so far, reviewing the background and describing the course and possible future activities. Several practical examples were discussed during the course, two of which are briefly described here: the Industrial production index and Structure of wages and earnings in the private sector.

Keywords: Application, Education, Official statistics, Statistical disclosure control, Tables.

1 Introduction

Developing Statistical Disclosure Control (SDC) methodology and implementing it in a statistics production environment at a national statistical agency is indeed an important but challenging task. At Statistics Sweden a major project is presently in progress with the aim of unifying and standardizing most areas of the various production processes within the agency. SDC issues are naturally included in this work. However, there is a need to enhance and disseminate knowledge about SDC methodology and its application among the agency's methodologists, production statisticians and other experts. Furthermore, there is a need to collect and document the methods already in use and to describe the situations and problems that arise, both general and product specific, in order to obtain a foundation for future development in this direction.

This paper gives a brief first report on the efforts of a project aimed at bridging the gap between theory and practice when it comes to SDC in tables. However, since the work is still ongoing, a complete evaluation is not given.

In sections 2 and 3 we briefly describe the background at Statistics Sweden concerning SDC. In section 4 we describe the outline of a course that was given at Statistics Sweden, and in section 5 we present two practical examples that came up for discussion during the course. Finally, in section 6, a short account of forthcoming activities within Statistics Sweden concerning SDC is given.

2 Background

In Sweden, confidentiality issues with regard to official statistics are regulated in several legal provisions. The Secrecy Act limits the release, within the statistical field, of data concerning personal or economic conditions that can be referred to an individual (person or business). However, identifiable data may be released to agencies that are covered by the same legal regulations as Statistics Sweden, since these agencies are responsible for maintaining confidentiality according to the Secrecy Act. For example, microdata may be released to a research facility at a state university under certain conditions. In principle, Statistics Sweden releases microdata to such agencies only. Therefore SDC issues concerning tables are of most relevance to Statistics Sweden.

Statistics Sweden has a rather decentralized organization. Decisions concerning methodological issues are taken locally and are not necessarily coordinated between surveys or organizational units. Practical questions about different parts of the production process are handled by people close to production and clients. This is particularly apparent when it comes to issues concerning SDC. Thus, when results are disseminated, it may happen that tables that are similar in design and content with respect to type of variables, sensitivity of variables and cell contents are treated differently when it comes to SDC, depending mainly on the responsible department.

There is also a central methodology unit where issues of a more comprehensive character are handled. Thus it falls on the central methodology unit to make an effort to bridge the gap between theory and practice in SDC, and to make the application of SDC techniques as homogeneous as possible within Statistics Sweden.

Furthermore, Statistics Sweden is the coordinator of official statistics in Sweden. As such, Statistics Sweden is expected to support the other government authorities in their efforts to produce official statistics. SDC is an issue where several government authorities look to Statistics Sweden, since Statistics Sweden is expected to provide guidelines and good examples.

3 Identifying a need for training

In 2001 a handbook on SDC was released (SCB, 2001). It consists of an introduction to and a basic overview of SDC, in particular for tabular data. The handbook also contains rough recommendations for dealing with SDC issues, such as how to choose an appropriate safety rule and masking method. The handbook was a great contribution for those dealing with these issues. However, it has been found that the handbook has to be supplemented with more practical examples that are applicable to Statistics Sweden.

A report on the need to enhance and develop competence among methodologists at Statistics Sweden, both the young and the more experienced, was presented some years ago. The results and suggestions in this report were partly based on a survey among methodologists in which they were asked about their knowledge and experience in a number of fields within statistics. One field that was highlighted as being neglected and in need of special attention was SDC.

Many of the methodologists working at Statistics Sweden have extensive theoretical knowledge and experience in, for example, survey sampling and editing. However, knowledge about and interest in SDC methodology has typically been limited. As in many other countries, there is no course on SDC at the universities, and it is often an entirely unknown field to most students in statistics leaving university.

Thus we have seen a need for increased knowledge of SDC methods at Statistics Sweden, combined with a need for an extended handbook with practical applications and more guidelines. In order to meet these needs, a project was initiated in 2006. The main task was to give a course on SDC methods for tables.
4 The course

The purpose of the course was thus twofold: to spread and increase knowledge of SDC at Statistics Sweden, and to obtain a foundation for further documentation containing practical examples to supplement the current SDC handbook. In order to achieve both of these goals it was decided at an early stage that the course would to a large extent build on the participants' own work and experiences. Those participants already involved in a certain survey or product would be asked to bring their surveys to the course to be used as practical examples and case studies.

The course was organized and presented by staff members from the central methodology unit. In addition, two legal experts at Statistics Sweden were invited to give presentations and to participate in the discussions. The textbook by Willenborg and de Waal (2001) was used as the main course literature, focusing on the chapters pertaining to tabular data. The τ-argus manual was also made available to the participants, as this would be the main software tool.

The course was scheduled to cover four full working days, with a couple of weeks between each occasion to give the participants time to work on assignments applied to their own examples. The four occasions, having four sessions each, comprised the following topics:

1. Legal matters and risk measures,
2. SDC methods and demonstration of τ-argus,
3. Loss of information and more τ-argus, and finally
4. Review and extra focus on the participants' examples.

Except for the first occasion, a couple of sessions on each occasion focused on the participants' examples: the participants gave presentations, followed by a discussion on the topic that was covered on the previous occasion. The staff members/teachers were available to the participants throughout for guidance and to discuss their cases.

In order to get a good representation across Statistics Sweden, all departments were asked to send two participants each. The participants could be methodologists, production statisticians or other experts working with production, preferably one methodologist and one production expert from every department. The intention was to get a wide range of examples of statistical surveys from different fields, to get a good mix of people with various backgrounds, and for the departments to get at least two would-be SDC experts within their own department in return. This resulted in 15 participants from all but one department.

Examples of surveys that the participants presented and worked on are Production of commodities and industrial services, Industrial production index, Population statistics, Structure of wages and earnings in the private sector, Producer and import price index, and Educational attainment of the population. The examples covered a wide range of fields and types of surveys, since statistics concerning both businesses and individuals, and both sample surveys and register-based statistics, were represented.

The course was successful in several respects. First, gathering people from different fields of statistics resulted in very fruitful discussions. Several participants appreciated the fact that they were not alone in having encountered specific SDC issues in their daily work and that they could easily relate other participants' problems to their own situation. Second, a pool of practical examples has been collected through the surveys brought to the course. The examples cover a variety of surveys and will hopefully work very well for the next part of the project, the extended documentation on SDC. Furthermore, τ-argus was only recently designated as the recommended software tool for handling SDC problems at Statistics Sweden. Various manual procedures have often been used in the past and are still in use for assigning secondary suppressions. This course was the first occasion at Statistics Sweden where τ-argus was introduced on a larger scale to people who will continue to deal with these issues in the future. Finally, the course has brought together a group of people who will hopefully continue to work together as an informal (or perhaps soon to be formal) network for SDC issues within Statistics Sweden.

The course was, however, less successful in at least two respects. First, the assignments were too general. Initially it was anticipated that it would be difficult to formulate specific tasks due to the varying backgrounds of the participants; for example, properties of statistics concerning businesses and individuals may differ quite substantially. However, the participants reported that they were occasionally unsure about what was expected of them. Second, the participants' expectations were not entirely met. Although the technical aspects of SDC were appreciated, these are not always the major concern in their daily work. Many of the difficulties that the participants deal with tend to arise from policy issues, such as deciding acceptable levels of risk, which variables should be considered especially sensitive, and so on. These issues were, however, only briefly considered.
5 Two examples

During the course, when the participants' cases were discussed in class and the proposed risk measures and SDC methods were applied, a number of issues were brought up as being problematic to handle. In some cases the problems, to our knowledge, have no complete theoretical solutions yet, for example how to treat general tables with negative contributions, or how to judge the risk with linked or semi-linked tables. In other cases there are theoretical solutions or at least suggestions, but they are difficult to apply in practice since concrete proposals appear to be lacking, for example how to deal with situations with non-response and imputed data.

We will only briefly describe two cases where methodological problems were encountered with regard to assessing disclosure risk. These examples seem to fall outside the usual realms that are treated in the SDC literature.

5.1 Industrial production index

The Industrial production index is published monthly and gives the development of the production of Swedish industry. The figures are used in the calculation of the Swedish GNP. The index is published broken down by kind of activity according to a standard classification. For smaller businesses (fewer than 500 employees) a sample of businesses is taken. If there is non-response, values are imputed. This in itself gives a certain protection against disclosure, but disclosure can still occur.

From the figures that are published, it is not possible to recalculate the exact value of the production of a single business. However, in some cases it is possible to conclude that the development of the index must pertain to a certain business, and thus to calculate the development of the production of a single business. This problem is particularly severe for large businesses that dominate within an activity.

Standard risk measures and SDC methods for magnitude tables can be applied to tables where a sum of a magnitude, i.e. a response variable, is given. But it is far from straightforward how these measures and methods should be used in the present example. They can only be applied to tables where the sum of production value is given, i.e. the tables that form the basis of the calculation of the index. This, in combination with the index being partly based on a sample survey, makes it difficult to decide whether the published figures are sufficiently protected, over-protected, or still carry a risk of disclosure.
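To make the notion of a standard risk measure for magnitude tables concrete, the following minimal sketch applies two commonly used safety rules, the (n, k)-dominance rule and the p%-rule, to the contributions in a single table cell. The function names, parameter values and example figures are hypothetical and chosen for illustration only; they are not the rules or thresholds used at Statistics Sweden. The sketch assumes complete enumeration and non-negative contributions, which are precisely the conditions that the examples in this section do not satisfy.

```python
from typing import Sequence


def dominance_sensitive(contributions: Sequence[float], n: int = 1, k: float = 75.0) -> bool:
    """(n, k)-dominance rule: a cell is sensitive if the n largest
    contributions together exceed k percent of the cell total.
    The defaults n=1, k=75 are illustrative assumptions."""
    total = sum(contributions)
    if total <= 0:
        # The rule is not meaningful here; negative contributions
        # need separate treatment.
        return False
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return top_n > k / 100.0 * total


def p_percent_sensitive(contributions: Sequence[float], p: float = 10.0) -> bool:
    """p%-rule: a cell is sensitive if the total minus the two largest
    contributions is less than p percent of the largest contribution,
    i.e. the second largest contributor could estimate the largest
    one to within p percent. The default p=10 is an illustrative assumption."""
    ordered = sorted(contributions, reverse=True)
    if not ordered:
        return False
    largest = ordered[0]
    second = ordered[1] if len(ordered) > 1 else 0.0
    remainder = sum(ordered) - largest - second
    return remainder < p / 100.0 * largest


# Hypothetical production values for two cells (kind-of-activity groups).
dominated_cell = [900.0, 60.0, 40.0]         # one business dominates
balanced_cell = [300.0, 250.0, 250.0, 200.0]

for name, cell in [("dominated", dominated_cell), ("balanced", balanced_cell)]:
    print(name, dominance_sensitive(cell), p_percent_sensitive(cell))
```

Both rules operate on the individual contributions behind a published sum. For an index, only the underlying production-value table has this form, which is why the rules cannot be applied directly to the published index figures.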
5.2 Structural wages and salaries in the private sector

Structural wages and salaries in the private sector is a yearly sample survey that aims at describing mean salaries and number of employees in businesses that operate in the private sector. A one-stage cluster design is used, where businesses are sampled and then asked about the wages of all their employees. Explanatory variables in the presented tables are, for example, region, industry, sex and type of employment. Thus some of the explanatory variables are defined at the business level and others at the employee level. This means that one business can belong to more than one cell in the table.

To protect the data from disclosure, it is necessary to apply safety rules at the business level as well as at the employee level. If the business level is not taken into account, we may end up with cells that are disclosive regarding the business in that cell. This may happen if, for example, only a few businesses operate within a certain cell.

There are several problems that need to be solved here. First, how to take into consideration that one business can belong to more than one cell. Second, how to apply the safety rules at the business level as well as at the employee level and at the same time take into account the fact that it is a sample survey and not a complete enumeration. Proposed safety rules for magnitude tables, e.g. the dominance rule, the p%-rule and other closely related safety rules, are basically designed for the situation where there is complete enumeration of the elements that the table is supposed to give information on.

6 Forthcoming activities

The course was recently completed. However, supplementary work will follow. The statistical products that were presented and discussed during the course will be documented. The conditions and the problems, together with descriptions of how risk measures, SDC methods and loss of information are handled, will be included in the documentation. In the end this will be included in documentation that will be available to producers of official statistics in Sweden. This documentation will be a useful base for further work on SDC, and in particular for the work on attaining a more effective production process.

A large-scale project aiming at the use of effective methods and common tools for all parts of the statistical production process has recently been launched at Statistics Sweden. Since SDC is an important part of the production process, Statistics Sweden will focus on how to handle SDC issues in standardized, or at least similar, ways for all statistical products. The aim of this work is to attain the following:

1. Those who handle SDC issues in a satisfactory way will get support for their work,
2. Those who handle SDC issues in a less satisfactory way will get support in finding better ways,
3. Those who more or less do not handle SDC issues at all will get support in finding satisfactory ways,
4. Those parts of SDC methodology for which satisfactory solutions do not currently exist are highlighted, and
5. The way of handling SDC issues at Statistics Sweden will be more uniform.

Our hope is to keep the group of participants who have followed the course as a reference group or expert panel to call upon in further work on SDC. This group of people, together with others who are knowledgeable in SDC at Statistics Sweden, will in one way or another be engaged in forthcoming projects on SDC.

Other government authorities that are responsible for producing official statistics have signaled an interest in SDC issues and will perhaps be sending their own staff to such a course. Therefore, a modified version of the course might be given in the future.

References

1. SCB (2001). Statistisk röjandekontroll av tabeller, databaser och kartor. CBM. Statistics Sweden. In Swedish.
2. Willenborg, L., de Waal, T. (2001). Elements of Statistical Disclosure Control. Springer-Verlag, New York.