How to use the Split tool

The Split tool allows a respondent to a questionnaire to be split into a number of sub-respondents. Typically, a questionnaire can have two or more levels of data, such as a household level and a personal level. All the information on the household level is common to all household members, but each household member also has individual data associated with them. A typical questionnaire will then have one set of questions regarding the household (type of house, location, gross income, etc.) and then a repeated set of questions for each household member (age, gender, individual income, etc.).

If we were interested in performing an analysis at the household member level, we might desire a questionnaire that appears to have been given individually to each household member. This new structure typically copies all the data from the top level (household level) and creates one set of questions for each household member. The original interview data is then split into a series of individual interviews (sub-respondents).

The Split tool, like the Clean and Define tool, is designed to allow experts to manipulate the underlying XML structure of your data sets directly using an XML editing window and tools.

XML Syntax Used

Job

The <job> element defines certain properties of the split job. It wraps the other elements described later.

<job type='split' splitkey='N' consitencycheck='YES|NO' postfilter='FILTER'>

The attributes are:

  • type: Always "split" for this use.
  • splitkey: The number N of sub-respondents to create. It also determines how many parallel questionnaire structures the system should "concatenate" into a single structure. For example, if splitkey='4' then there should be 4 questions about gender (one for each household member) that will be concatenated into just one question in the split dataset.
  • consitencycheck: If set to "YES", a check is performed after the split has run to verify that every question that 'claims' to be answered really has an answer.
  • postfilter: An optional filter that removes records (sub-respondents) after the split has been executed. Typically an original respondent has answered only some of the lower level questions. For example, a maximum of 10 household members may be possible, but a given household had only 3 members. The seven empty sub-respondents can then be removed by specifying the postfilter accordingly.
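
As an illustration, a job for a household survey with up to four members per household might be opened as in the sketch below. The splitkey value and the question numbers are hypothetical, and the optional postfilter is left out.

```xml
<!-- Hypothetical job: each household respondent becomes 4 sub-respondents -->
<job type='split' splitkey='4' consitencycheck='YES'>
  <!-- Household-level question copied to every sub-respondent (assumed question 1) -->
  <duplicate addr='\1'/>
  <!-- Four per-member questions (assumed questions 2 to 5) merged into one -->
  <split addr='Q\2,,5'/>
</job>
```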

Split

The split element defines a multiple structure of questions, sub-questions, rows or columns that we want to split into a single structure. The number of elements in the dimension we split on must match the splitkey number. Multiple split elements are allowed.

<split addr='DIM\ADDRESS_EXPRESSION' qno='QNO'>
<text lang='LANGUAGE'>TEXT</text>
</split>

DIM specifies the dimension against which the split is done.

  • Q: the ADDRESS_EXPRESSION lists a series of questions, all with the same structure, with N questions in total (N being the splitkey number). For example: addr='Q\1,,4' or addr='Q\1,4,7,10'. Sub-questions, rows and columns may be restricted or rearranged if desired. For example, addr='Q\1,,4.A.1,,8' splits questions 1,2,3,4 but uses only sub-question A and rows 1 through 8. The expression addr='Q\1,3,2,4.C,A,B.5,1,2,3,4' re-arranges the questions, sub-questions and rows.
  • S: the ADDRESS_EXPRESSION lists a single question with the same number of sub-questions as the splitkey. All sub-questions must be of the same type. As an example, addr='S\12.A,,D' will create a new question with only one sub-question. Re-arranging or restricting sub-questions, rows or columns is allowed, as in addr='S\12.A,C,B,D.1,,4,8,5,6,7 8[5,4,3,1,2 9]'.
  • R: the ADDRESS_EXPRESSION lists one or more questions, each having the same number of rows. Sub-questions of type single or multiple are not allowed. The number of rows specified must match the splitkey number. Re-arranging and restricting questions, sub-questions, rows and columns is allowed.
  • H: the ADDRESS_EXPRESSION refers to a single sub-question of a single grid or multi-grid. The number of <h> elements (columns) must correspond to the splitkey number. The generated split question will be multiple choice.

ADDRESS_EXPRESSION lists the questions and/or sub-questions with the same structure that should be split. See the description of DIM for the restrictions that apply to the address.

QNO optionally specifies the new question number to be used.

TEXT specifies an optional override text to be used. If DIM is Q|S|H, then this text replaces the question text. If DIM is R this text replaces the row text.

Note: Multiple split elements are allowed.
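
The sketch below shows how two split elements might be combined for a splitkey of 4, reusing the Q and S addresses from the descriptions above. The qno value and the override text are hypothetical.

```xml
<job type='split' splitkey='4' consitencycheck='NO'>
  <!-- Q: questions 1 to 4 share one structure and become one new question -->
  <split addr='Q\1,,4' qno='member_q'>
    <!-- Optional override; for DIM Q this replaces the question text -->
    <text lang='en'>Answer for this household member</text>
  </split>
  <!-- S: the four sub-questions A to D of question 12 become one sub-question -->
  <split addr='S\12.A,,D'/>
</job>
```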

Duplicate

The duplicate element is used to specify one or more question structures that we want to copy. The same respondent data will be copied onto every sub-respondent.

<duplicate addr='\ADDRESS_EXPRESSION'/>

ADDRESS_EXPRESSION specifies a list of any questions that you want to duplicate. Re-arrangement of questions, sub questions, rows and columns is allowed. To select questions with different structures, use multiple duplicate elements.
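
A minimal sketch, assuming questions 1, 2 and 3 share one structure while question 20 has a different one (all question numbers are hypothetical):

```xml
<!-- Questions with the same structure can share one duplicate element -->
<duplicate addr='\1,2,3'/>
<!-- A question with a different structure gets its own duplicate element -->
<duplicate addr='\20'/>
```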

Detailsplit

The detailsplit element can be used to select question structures individually for each sub-respondent. It must contain one <template> element and enough <copy> elements that their total equals the splitkey number.

<detailsplit>
<template addr='\ADDRESS_EXPRESSION'/>
<copy addr='\ADDRESS_EXPRESSION'/>
<copy addr='\ADDRESS_EXPRESSION'/>
……
</detailsplit>

  • The <template> address specifies the question(s) that will be used as the basis for creating the sub respondent data structure. Any list of questions and re-arranging of questions, sub-questions, rows and columns is allowed.
  • All <copy> address expressions will be compared against the template, and the structure must match. In the <copy> expression you may use “!” to signify missing components. For example, use <copy addr='\10.A.1,,4,!,5,,10' /> if question 10 is missing row 5 compared to the template.
  • If the question structure varies, use multiple detailsplit elements.
  • The number of <copy> elements + the one <template> element must match the splitkey number.

Any number of <split>, <duplicate> and <detailsplit> elements in any order is allowed.
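
The following sketch illustrates a detailsplit for a splitkey of 3, where the last question lacks one row that the template has. Question numbers 10, 20 and 30 and their row counts are hypothetical.

```xml
<job type='split' splitkey='3' consitencycheck='NO'>
  <detailsplit>
    <!-- Template: question 10, sub-question A, rows 1 to 5 -->
    <template addr='\10.A.1,,5'/>
    <!-- First copy: question 20 has the same structure as the template -->
    <copy addr='\20.A.1,,5'/>
    <!-- Second copy: question 30 has no fifth row, so ! marks the gap -->
    <copy addr='\30.A.1,,4,!'/>
  </detailsplit>
</job>
```

One template plus two copies gives three elements, matching the splitkey.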

Index

You may insert a numeric question that contains two indexes for each sub-respondent.

  • Sequence Index is the 'loop' counter of the split. It shows which iteration of the split generated the sub-respondent; values run 1, 2, 3… up to the splitkey value.
  • Record Index differs from the Sequence Index only if a postfilter has been given. It indicates the position of this sub-respondent among the records that remain. For example, if a splitkey of 3 was given and, for a given original respondent, the second sub-respondent was removed by the postfilter, the Sequence Index values would be 1,3, whereas the Record Index values would be 1,2.

<index qno='QNO'>
<stext><text lang='LANG'>TEXT</text></stext>
<rectext><text lang='LANG'>TEXT</text></rectext>
<seqtext><text lang='LANG'>TEXT</text></seqtext>
</index>

QNO specifies the question number to be used for the index question.

  • <stext> specifies the text to be used as the question text.
  • <rectext> specifies the text to be used for the record index.
  • <seqtext> specifies the text to be used for the sequence index.
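
As a worked illustration, the comments in the sketch below restate the case described above (splitkey of 3, second sub-respondent removed by the postfilter) next to a filled-in index element. The qno value and the texts are hypothetical.

```xml
<!-- Kept sub-respondent | Sequence Index | Record Index -->
<!-- first               | 1              | 1            -->
<!-- third               | 3              | 2            -->
<index qno='subresp_index'>
  <stext><text lang='en'>Sub-respondent index</text></stext>
  <rectext><text lang='en'>Record index</text></rectext>
  <seqtext><text lang='en'>Sequence index</text></seqtext>
</index>
```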

Frequency

You may insert a numeric question that contains the total number of records (sub-respondents) “split” from a given input respondent. Unless a postfilter has been specified, this will equal the splitkey value. If a postfilter has been specified, the number will be reduced to indicate the actual number of sub-respondents generated.

<frequency qno='QNO'>
<stext><text lang='LANG'>TEXT</text></stext>
<rtext><text lang='LANG'>TEXT</text></rtext>
</frequency>

QNO specifies the question number to be used for the frequency question.

<stext> specifies the text to be used as the question text.

<rtext> specifies the text to be used for the row text.

Inverse Frequency

You may insert a numeric question that contains the inverse of the total number of 'split' records (sub-respondents) generated from a given input respondent. Unless a postfilter has been specified, this will equal the inverse of the splitkey value. If a postfilter has been specified, the number is adjusted to the inverse of the actual number of sub-respondents generated.

<invfrequency qno='QNO'>
<stext><text lang='LANG'>TEXT</text></stext>
<rtext><text lang='LANG'>TEXT</text></rtext>
</invfrequency>

QNO specifies the question number to be used for the inverse frequency question.

<stext> specifies the text to be used as the question text.

<rtext> specifies the text to be used for the row text.
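
To make the arithmetic concrete, the sketch below pairs a frequency and an inverse frequency question and notes, in comments, the values expected for the household case mentioned earlier (splitkey of 10, postfilter leaving 3 sub-respondents). The qno values and texts are hypothetical, and how the inverse is stored or rounded is not specified here.

```xml
<!-- Household with 3 remaining sub-respondents out of a splitkey of 10: -->
<!--   frequency         = 3                                             -->
<!--   inverse frequency = 1/3                                           -->
<!-- Without a postfilter both would be based on the splitkey: 10, 1/10  -->
<frequency qno='hh_size'>
  <stext><text lang='en'>Household size</text></stext>
  <rtext><text lang='en'>Number of sub-respondents</text></rtext>
</frequency>
<invfrequency qno='hh_size_inv'>
  <stext><text lang='en'>Inverse household size</text></stext>
  <rtext><text lang='en'>Inverse number of sub-respondents</text></rtext>
</invfrequency>
```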

Serial

You may insert a numeric question that contains the serial of the original respondents. The values will be generated sequentially starting with 1 unless the optional “start” attribute is specified. If the start attribute is specified, numbers will be generated starting with this value.

<serial qno='QNO' start='N'>
<stext><text lang='LANG'>TEXT</text></stext>
<rtext><text lang='LANG'>TEXT</text></rtext>
</serial>
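
A short sketch of the start attribute, with a hypothetical qno, start value and texts:

```xml
<!-- With start='1001' the first original respondent gets serial 1001, -->
<!-- the next 1002, and so on; without start, numbering begins at 1    -->
<serial qno='orig_serial' start='1001'>
  <stext><text lang='en'>Original respondent serial</text></stext>
  <rtext><text lang='en'>Serial</text></rtext>
</serial>
```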

Split syntax – an example

<job type='split' splitkey='3' consitencycheck='YES' postfilter='\t1='>
<duplicate addr='\gender,age,region'/>
<split addr='Q\p1,,p3'/>
<detailsplit>
<template addr='\t1,t12,t13'/>
<copy addr='\x1,x12,x13'/>
<copy addr='\y1,y12,!' />
</detailsplit>
<duplicate addr='\house_income,date'/>
<index qno='index'>
<stext><text lang='en'>Index values</text></stext>
<seqtext><text lang='en'>Sub respondent sequence number</text></seqtext>
<rectext><text lang='en'>Sub respondent record number</text></rectext>
</index>
<frequency qno='freq'>
<stext><text lang='en'>Frequency</text></stext>
<rtext><text lang='en'>Total sub-respondents</text></rtext>
</frequency>
<invfrequency qno='infreq'>
<stext><text lang='en'>Inverse frequency</text></stext>
<rtext><text lang='en'>Inverse total sub-respondents</text></rtext>
</invfrequency>
<serial qno='serial'>
<stext><text lang='en'>Master serial</text></stext>
<rtext><text lang='en'>Serial</text></rtext>
</serial>
</job>