Merge Data Spss

This workshop will walk you through the steps in SPSS to help you merge datasets, create new variables, and recode variables in your current dataset. Many of these functions are easily performed in Excel, but I’d like to show you how to take advantage of some of the data manipulation options available to you in SPSS. Pick the method that is the most comfortable for you, when working with your data.

When you have two data files, you may want to combine them by stacking them one on top of the other (referred to as concatenating files). Below we have a file called dads and a file containing moms. How to combine two datasets in SPSS. Does anyone know of syntax I can use in SPSS to merge these two variables into one variable in SPSS so that the new variable, e.g. Question 1, would contain the responses from both males and females? Merging Data Files In SPSS You have two data files. Each case in the one file corresponds to one case in the other file. In both files each case has an identifier, and the identifiers match across cases. You want to match up records by identifiers. First, you must sort the records in each file by the identifier. This can be done by clicking Data. Add Cases Open at least one of the data files that you want to merge. If you have multiple datasets open, make one of the datasets. From the menus choose: Data Merge Files Add Cases. Select the dataset or external IBM SPSS Statistics data file to merge with the active dataset.

Let’s start by downloading a dataset to use for this workshop, a dummy dataset I created in Excel. Download the file and save it on your computer. There are 4 worksheets in this file. Weight measures were taken on 25 individuals in January, February, and March. For the month of April we have an additional individuals with weight measures taken in January and April. Our goal is to create one SPSS dataset that contains all 4 Excel worksheets.

First, let’s 0pen each worksheet in SPSS and save them using the dataset names of January, February, March, and April. Remember that although the Excel file can hold several worksheets with different data, statistical packages such as SPSS and SAS, cannot bring in the entire file at the same time, you will need to bring in each worksheet separately.

You should now have 4 SPSS datasets saved on your computer called: January.sav, February.sav, March.sav, and April.sav

Merging Datasets

There are 2 ways to merge datasets in SPSS: Add Cases or Add Variables. I really like how SPSS states these, since it makes it clear as to how you are adding the data from one file to the next. Add Cases – means that you will be adding more observations to the current dataset. If you think about this, this means that you would be adding more respondents, more animals, more plots, etc… Essentially adding more observations that are unique to what is in your current dataset.

Add Variables means you have the same set of observational units, so respondents, animals, plots, etc… and you have new measurements that you want to add to your current dataset.

Adding Variables

Let’s start with our datasets – adding variables. We have 2 files January, February, and March that have the same 25 individuals and weight measures taken in 3 months. We want all of these to be in the same SPSS dataset so we can do an analysis across these 3 months.

What’s the first thing we need to do before adding February data to January? If we were to try it as it is – but could happen?

HINT: look at the variable names for the 2 files

Data

Make the appropriate changes in the Variable View of SPSS.

Before we can merge any files, we need to ensure that all the datasets are sorted. It looks like they are currently sorted, but let’s double-check by getting SPSS to run a sort anyway. Sort all the datasets so we are all set for the next steps.

  • Data
  • Sort Cases
    • Put ID into the Sort by: Box
    • Sort Order – Ascending
  • OK

Let’s start with adding February data to January:

Make sure you are in the January file

  • Data
  • Merge Files
    • Add Variables
    • Since all of our datasets are open – you should see a list of the currently open files. Select February. If you closed all the datasets, the top box will be empty, then you will select External SPSS dataset and navigate to where you saved your file and open it
    • Continue
      • Your new dialogue box – you should see all of the variables available in both datasets. * is the active dataset (January), + is the DataSet 2 or new combined dataset.
      • Review what variables are where. New Active Dataset is the new merged dataset – think about what you want to see included in here. Does it match your expectations? Notice the list of variables in the Excluded box – do these make sense? Notice that these are the ID and TRMT – variables which were duplicated in the 2 datasets.
      • We can add a key variable – the variable on which your variables is sorted by. Let’s do this to make sure weights are matched to the appropriate ID.
      • Select Match cases on key variables
        • Select that the variables are sorted in both files – Select both files provide cases – Then select ID from the Excluded box and add it to the Key Variables box – notice how it disappears from the new Active Dataset box too
  • Ok to run – you will get a warning message – click ok

Take a look at the January file. Does it look correct? Remember you can double-check by looking at the February file. If you are happy with the way it looked, save it under a new name – maybe Jan_Feb

Repeat this process to add the March dataset as well and save it as a file called Jan_Mar.sav

Close the February and March files, leaving the new Jan_Mar and April datasets open.

Merge Data Spss

Adding Cases

So now we have a dataset with weight measures taken from January to March on the same individuals. Now we need to add the new observations that have weight measures taken in January and April – this is a great example of Adding Cases to a dataset.

With the Jan_Mar dataset open:

  • Data
  • Merge Files
    • Add Cases
    • As this first case above, because our April dataset is open, you will see it listed in the dialogue box
    • Select April and Continue
      • You should now see a box that lists variables that are Paired and Unpaired. Essentially think of these as the list of variables that are unique and present in both files
      • Since we want ALL the variables in our new dataset, select all the unpaired Variables and put into the Variables in the New Active Dataset – our new dataset
      • OK

Review the dataset, is it what you were expecting?

If it is, save it as Jan_Apr.sav. Close the April dataset

Creating New Variables

Adding a new variable

Cara Merge Data Di Spss

Let’s start by creating a new variable called Wtgain that is the difference between the weight measured in January and the weight measured in March.

  • Transform
  • Compute Variable
    • This will open a new dialogue box
    • In the Target Variable enter a new variable name: wtgain
    • In the Numeric Expression: enter the calculation – select the Weight_mar and place in this box – then add the “-” following by the Weight_jan variable. So it should read weight_mar – weight_jan
    • OK

SPSS will add the new variable at the end of the current dataset. Review and decide whether is completed the action you were expecting. Since the original weight measures had no decimal places, let’s remove the decimal places added to the new variable. Add a label to this new variable.

Save the dataset.

Recoding a variable

Sometimes we have a variable that we want to recode – so in our case we are going to create a new variable called wtclass_jan that will take the weights measured in January and put them into 3 weight classes: 1 = 13-16; 2 = 17-20; 3 = 21-24

SPSS has 2 functions that allow us to recode variables. One is called, Recode into Same Variables… and the second, Recode into Different Variables… In the interest of NOT writing over any data, I recommend that you use the Recode Into Different Variables… option. As you work more and more with your data in SPSS, there may come a time when you may want to use the Recode Into Same Variables.. option, but understand that you will lose any data that you overwrite.

We want to create the new variable called wtclass_jan as described above. To do this:

  • Transform
  • Recode Into Different Variables…
    • You will now see a dialogue box. Select the Weight_jan variable from the left box and throw it over to the Input Variable ->Output Variable box
    • You will now provide a new name wtclass_jan in the Name box under the Output Variable section.
    • Add an appropriate label – maybe something like January Weight Class Group
      • Change
      • you should now see weight_jan -> wtclass_jan in the middle box
    • Now select Old and New Variables button
      • This is where you will tell SPSS that wtclass 1 contains any weight measures taken in January that include 13 – 16
      • There are different ways to do this – select the one that works best for you
      • Old value – will be the values in weight_jan – I will use Range – 13 – 16
      • New value – will be 1
      • Click Add
      • Create the next 2 groups
    • Continue
  • OK

Remember SPSS creates any new variable and adds it to the end of the current dataset. We have a small dataset, so it is easy to find, but when you start to work with your own data, you may have 100s of variables – so remember that the new variables are added to the end.

Check the new wtclass_jan variable to see if it worked.

Save your dataset

What can SPSS do?

There are many data manipulations that can be performed in SPSS. Whether you do these in Excel or SPSS it does not matter. Document any changes you made to remember what you did.

Summary

Reading multiple sheet Excel workbooks into SPSS is easily done with this Custom Dialog. This tutorial demonstrates how to use it.

Before You Start

SPSS Read and Merge Excel Files Tool
  • Make sure you have the SPSS Python Essentials installed.
  • Download and install the xlrd module.
  • If you'd like to generate some test data as done in the syntax example, you'll need the xlwt module as well.
  • Download and install Excel to SPSS Tool. Note that this is an SPSS custom dialog. You'll now find Read and Merge Excel Files under Utilities.
  • Close all datasets in SPSS.

SPSS Syntax Example for Generating Test Data

* Create some small Excel workbooks for testing.

begin program.
rdir=r'd:temp' # Specify folder for writing test files.
import xlwt,random,datetime,os
fmt = xlwt.easyxf(num_format_str='M/D/YY')
wBooks = ['book_' + str(cnt) for cnt in range(1,5)]
for noSheets,wBook in enumerate(wBooks):
wb=xlwt.Workbook()
for sheetNo in range(noSheets + 1):
ws=wb.add_sheet('sheet_%d'%(sheetNo + 1))
for col,cont in enumerate(['date','ID','JobTitle','Revenue']):
ws.write(0,col,cont)
for row in range(1,6):
ws.write(row,0,datetime.datetime(2008 + sheetNo,1,1) + datetime.timedelta(days=random.randrange(1,365)),fmt)
ws.write(row,1,random.choice([None,104,21,60,2,1030]))
ws.write(row,2,random.choice([None,'Developer','Tester','Manager']))
ws.write(row,3,random.randrange(40,80)*1000)
wb.save(os.path.join(rdir,wBook + '.xls'))
end program.

Reading All Data Into SPSS

Since we created our test data in d:temp, this folder will hold the Excel files. We can simply copy-paste this into the dialog. Other than that, we don't have to change anything. The first row holds the variable names and we'd like all sheets from all workbooks to be read.

Description

  • By default, the program will read in all .xls files in a folder specified by the user.
  • By default, all data from all sheets will be imported. The default of all sheets can be overridden by specifying one or more sheets (see below).
  • In order for this to make sense, all sheets in all workbooks are assumed to have similar formats (numbers of columns, column contents).
  • By default, it is assumed that the first row of each sheet contains column names. If these conflict, the column names of the last sheet of the last workbook that's read will be used. If no column names are present, column_1, column_2 and so on will be used as variable names in SPSS.

Merge Data Add Variables Spss

Converting Date Variables

Spss Merge Data One To Many Add Variables

Date variables in the Excel files are not automatically converted to SPSS date variables. After reading in the data, they can be converted with the syntax below.
* Convert 'date' to date format.

compute date=datesum(date.dmy(30,12,1899),date,'days').
format date(edate10).
exe.

What if I Don't Want All Sheets to be Read?

  • In this case, the desired sheets can be specified. Note that the first sheet is referenced by 1 (rather than 0).
  • If two or more sheets are to be read, separate them with commas.
  • If sheets that are specified do not exist in one or more workbooks, the command will not run. An error message will indicate the first workbook where this occurred.

What if I Don't Want All Workbooks to be Read?

This default can not be overridden. A workaround may be to move irrelevant workbooks to a different folder.