[Audio] Descriptive analysis using SPSS. Mekdes T. (Assistant Professor of Epidemiology and Biostatistics).
[Audio] Outline
❖ Importing data
❖ Data cleaning
❖ Handling variables (defining, creating, computing, recoding, adding and deleting)
❖ Descriptive analysis (proportion, mean, median, standard deviation, skewness, kurtosis)
6/ 18/ 2022 BY MEKDS T. YILMA 2.
[Audio] Working with data in SPSS. Opening: an existing SPSS data file. Importing data: from Excel, CSV, database, or text files. To open an existing SPSS file ➔ double-click the file, OR click File ➔ Open ➔ Data. The Open Data window will pop up.
[Audio] To import data from a database: File ➔ Import Data ➔ Database ➔ New Query ➔ the wizard's welcome window will be displayed ➔ select MS Access Database ➔ Next (or double-click it) ➔ browse to your .mdb file ➔ in the Select Data step, drag all variables to the Retrieve Fields list ➔ click Next ➔ in the Define Variables window (for a single table) or the Specify Relationships window (for multiple tables), manage your variables or join the tables ➔ Finish to import.
[Audio] Importing from EpiData: Open EpiData ➔ Export Data ➔ browse to your .rec file ➔ select and open it ➔ OK ➔ the data are now exported as two files, an SPSS syntax file (.sps) and a .txt data file ➔ click on the SPSS syntax file ➔ the SPSS Syntax window opens ➔ scroll down to the last SAVE line ➔ remove the asterisk (*) in front of it ➔ run the syntax (Run, in the Syntax window menu bar) ➔ All.
[Audio] Data cleaning
➢ This is the initial step of any statistical analysis ➔ it aims to check for and address any errors.
➢ Data processing errors are errors that occur after data have been collected. The common ones are:
➢ Transpositions (e.g., 19 becomes 91 during data entry)
➢ Copying errors (e.g., 0 (zero) becomes the letter O during data entry)
➢ Coding errors (e.g., a racial group gets improperly coded because of changes in the coding scheme)
➢ Routing errors (e.g., the interviewer asks the wrong question or asks questions in the wrong order)
➢ Consistency errors (contradictory responses, such as a reported hysterectomy or pregnancy for a respondent who has identified himself as male)
➢ Range errors (responses outside the range of plausible answers, such as a reported age of 110)
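Range and consistency errors of the kind listed above can be screened for with a few simple rules. The following is a minimal Python sketch, not an SPSS procedure: the records, the field names (`sex`, `age`, `pregnant`) and the plausible age range of 0 to 110 are all assumptions made for illustration.

```python
# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": 1, "sex": "male", "age": 34, "pregnant": False},
    {"id": 2, "sex": "male", "age": 29, "pregnant": True},      # consistency error
    {"id": 3, "sex": "female", "age": 130, "pregnant": False},  # range error
]

def find_errors(record, min_age=0, max_age=110):
    """Return a list of rule violations for one record (assumed rules)."""
    problems = []
    # Range check: age must fall inside the assumed plausible interval.
    if not (min_age <= record["age"] <= max_age):
        problems.append("range: age")
    # Consistency check: a male respondent cannot be recorded as pregnant.
    if record["sex"] == "male" and record["pregnant"]:
        problems.append("consistency: sex vs pregnant")
    return problems

# Keep only the records that violate at least one rule.
flagged = {}
for r in records:
    problems = find_errors(r)
    if problems:
        flagged[r["id"]] = problems

print(flagged)
```

Flagged records should then be checked against the original questionnaire rather than corrected by guesswork.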
[Audio] How to prevent data processing errors
✓ Manual checks during data collection (e.g., checks for completeness and handwriting legibility)
✓ Range and consistency checking during data entry (e.g., preventing impossible results, such as ages greater than 110)
✓ Double entry and validation following data entry
✓ Screening for outliers during data analysis
✓ Descriptive analysis
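Double entry and validation means keying the same forms twice and comparing the two versions field by field. A minimal Python sketch of that comparison follows; the two dictionaries and their field names are hypothetical, and real double-entry tools offer more than this.

```python
# Two hypothetical entries of the same forms, keyed by record ID.
first_entry = {1: {"age": 19, "weight": 70}, 2: {"age": 45, "weight": 82}}
second_entry = {1: {"age": 91, "weight": 70}, 2: {"age": 45, "weight": 82}}

def compare_entries(a, b):
    """List (record_id, field, first_value, second_value) for every mismatch."""
    mismatches = []
    # Only records present in both entries can be compared.
    for rec_id in sorted(a.keys() & b.keys()):
        for field in sorted(a[rec_id]):
            first_value = a[rec_id][field]
            second_value = b[rec_id].get(field)
            if first_value != second_value:
                mismatches.append((rec_id, field, first_value, second_value))
    return mismatches

mismatches = compare_entries(first_entry, second_entry)
print(mismatches)
```

Here the transposition of 19 into 91 is caught because the two entries disagree on the age of record 1.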
[Audio] Descriptive analysis: exploratory and summary statistics ✓ You can simply right-click on the variable and select Descriptive Statistics ➔ generate the frequency distribution.
[Audio] OR Analyze ➔ Descriptive Statistics ➔ Frequencies ➔ Statistics
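The summary statistics named in the outline (mean, median, standard deviation, skewness, kurtosis) can be illustrated with a short Python sketch. The `scores` list is made-up data. The skewness and kurtosis here are the simple moment-based versions, which is an assumption of this sketch; SPSS reports sample-adjusted versions, so its output will differ somewhat on small samples.

```python
from statistics import mean, median, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

def central_moment(values, k):
    """k-th central moment, dividing by n (population form)."""
    m = mean(values)
    return sum((x - m) ** k for x in values) / len(values)

def skewness(values):
    """Moment-based skewness: m3 / m2^(3/2)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (0 for a normal curve)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

summary = {
    "mean": mean(scores),
    "median": median(scores),
    "sd": stdev(scores),  # sample standard deviation (divides by n - 1)
    "skewness": skewness(scores),
    "kurtosis": excess_kurtosis(scores),
}
print(summary)
```

A positive skewness, as in this example, indicates a longer right tail.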
[Audio] SPSS Descriptive
[Audio] SPSS Explore
[Audio] Graphs in SPSS
This window reminds you to define value labels for the variables. If you have already defined them ➔ OK. If not, click Define Variable Properties.
✓ Select the type of graph
✓ Click on the selected graph type
✓ Then drag and drop it into the preview
✓ Select the variable for each axis
✓ Finally, format the graph title, footnote, and so on, then click OK
[Audio] Common graphs for diagnosis and cleaning
➢ Histogram / stem-and-leaf plot ➔ distribution of the variable
➢ Normal Q-Q plot ➔ normality
➢ Box-and-whisker plot ➔ outliers
➢ Scatter plot ➔ correlation
➢ Correlation matrix ➔ correlation
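A box-and-whisker plot flags outliers using the interquartile range (IQR). The usual rule marks values more than 1.5 × IQR beyond the quartiles. The sketch below applies that rule in Python to made-up `weights`; the quartile method of Python's `statistics.quantiles` may not match the one SPSS uses, so the fences can differ slightly.

```python
from statistics import quantiles

weights = [50, 52, 55, 58, 60, 61, 63, 65, 120]  # hypothetical values

# Quartiles with Python's default ("exclusive") method.
q1, q2, q3 = quantiles(weights, n=4)
iqr = q3 - q1

# Conventional box-plot fences: 1.5 * IQR beyond each quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [w for w in weights if w < lower_fence or w > upper_fence]
print(q1, q3, iqr, outliers)
```

An outlier flagged this way is a candidate for checking, not automatically an error.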
[Audio] Handling variables
❑ For your data analysis to be accurate, it is imperative that you correctly identify the type and formatting of each variable.
❑ The type of each variable is displayed in the Variable View tab.
❑ Click the Variable View tab, locate the variable, and click the cell in the "Type" column. A blue "…" button will appear. Clicking the blue "…" button opens the Variable Type window. Select the appropriate type for the variable.
[Audio] Important tip. When you are dealing with:
Numeric: check the type of numeric variable (continuous or discrete).
✓ Note: numeric values that code a nominal or ordinal variable should not be used for mathematical calculation.
String: alphanumeric or character variables, whose values are treated as text.
✓ In the Data View window, missing string values will appear as blank cells. However, these blank cells are not recognized by SPSS as system-missing values.
✓ SPSS considers blank strings to be non-missing ➔ this has important implications for your analysis ➔ it will affect your sample size.
Date: treated as a special type of numeric variable.
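The effect of blank strings on the sample size can be seen in a small Python sketch. The `responses` list is hypothetical; the point is only that counting every cell and counting non-blank cells give different values of n.

```python
# Hypothetical string variable with two blank cells.
responses = ["Jimma", "", "Adama", "   ", "Jimma"]

# Counting every cell treats blanks as valid answers.
naive_n = len(responses)

# Treating blank (or whitespace-only) cells as missing gives the true n.
valid = [r for r in responses if r.strip() != ""]
valid_n = len(valid)

print(naive_n, valid_n)
```

Any percentage computed with the naive count uses the wrong denominator.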
[Audio] Tips continued: SPSS date format
[Audio] Tips continued: SPSS duration format
[Audio] Changing the variable type from string or numeric to a date/time format
Step 1: Define the variable as date/time and select the format in which your dates/times currently appear.
Step 2: After you have specified the current format of the date/time values for that variable, you can change the format of the date by following the same steps you used to define the variable type and date format in Step 1.
❑ Note: these steps apply when your variable values already appear in a standard date/time format but the variable is currently defined as "string" or "numeric" rather than date/time.
[Audio] Transformations and calculations that involve date and time variables, using the Date and Time Wizard
[Audio] The Date and Time Wizard can assist you with:
• Creating a date/time variable from a string containing a date or time
• Creating a date/time variable from variables that contain parts of dates or times
• Calculating with dates and times (e.g., to calculate elapsed time)
• Extracting parts of dates or times
• Assigning periodicity to a dataset for time series data
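Three of these tasks (creating a date from a string, calculating elapsed time, and extracting part of a date) can be sketched in Python with the standard `datetime` module. The admission and discharge dates and the day/month/year format are assumptions for illustration; this is not what the wizard runs internally.

```python
from datetime import datetime

# Create date values from strings (assumed format: day/month/year).
admitted = datetime.strptime("18/06/2022", "%d/%m/%Y")
discharged = datetime.strptime("02/07/2022", "%d/%m/%Y")

# Calculate elapsed time between the two dates, in days.
stay_days = (discharged - admitted).days

# Extract parts of a date.
admit_month = admitted.month
admit_year = admitted.year

print(stay_days, admit_month, admit_year)
```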
[Audio] Defining variables: this involves defining the name, type, label, values, missing values, and so on.
[Audio] The Variable View tab displays:
NAME: To change a variable's name,
➢ Double-click on the name of the variable that you wish to rename.
➢ Type your new variable name.
TYPE: To change a variable's type,
➢ Click inside the cell corresponding to the "Type" column for that variable.
➢ A square "…" button will appear; click on it to open the Variable Type window.
➢ Click the option that best matches the type of variable.
➢ Click OK.
WIDTH: The number of digits displayed for numerical values, or the length of a string variable. To set a variable's width,
➢ Click inside the cell corresponding to the "Width" column for that variable.
➢ Then click the "up" or "down" arrow icons to increase or decrease the width.
[Audio] Defining variables, continued
DECIMALS: The number of digits to display after the decimal point for values of that variable.
◦ Note that this does not apply to string variables. It changes how the numbers are displayed but does not change the values in the dataset.
To specify the number of decimal places for a numeric variable,
➢ Click inside the cell corresponding to the "Decimals" column for that variable.
➢ Then click the "up" or "down" arrow icons to increase or decrease the number of decimal places.
LABEL: A brief but descriptive definition or display name for the variable. When defined, a variable's label will appear in the output in place of its name.
➢ Double-click the Label cell of the variable that you wish to label.
➢ Type the description of the variable.
[Audio] Defining variables, continued
VALUES: Value labels are useful mainly for categorical variables. To define value labels,
➢ Click the cell in the "Values" column that corresponds to the variable whose values you wish to label.
➢ If the values are currently undefined, the cell will say "None."
➢ Click the square "…" button. The Value Labels window appears.
➢ Enter the number in the Value field and the description in the Label field.
➢ Then Add ➔ OK
[Audio] Defining variables, continued
MISSING: To set user-defined missing values (numbers, dots, or codes),
➢ Click inside the cell corresponding to the "Missing" column for that variable ➔ the Missing Values dialog box opens.
➢ Click the option that best matches how you wish to define missing data and enter any associated values.
➢ Then click OK.
COLUMNS: The width of each column in the spreadsheet. To set a variable's column width,
➢ Click inside the cell corresponding to the "Columns" column for that variable.
➢ Then click the "up" or "down" arrow icons to increase or decrease the column width.
ALIGN: The alignment of content in the cells of the SPSS Data View spreadsheet. To set the alignment for a variable,
➢ Click inside the cell corresponding to the "Align" column for that variable.
➢ Then use the drop-down menu to select your preferred alignment: Left, Right, or Center.
MEASURE: The level of measurement for the variable (nominal, ordinal, or scale). To define a variable's measurement level,
➢ Click inside the cell corresponding to the "Measure" column for that variable.
➢ Then click the drop-down arrow to select the level of measurement for that variable: Scale, Ordinal, or Nominal.
[Audio] Defining variables, continued
ROLE: The role that a variable will play in your analyses (i.e., independent variable, dependent variable, or both).
➢ Input: The variable will be used as a predictor (independent variable).
➢ Target: The variable will be used as an outcome (dependent variable).
➢ Both: The variable will be used as both a predictor and an outcome.
➢ None: The variable has no role assignment.
➢ Partition: The variable will partition the data into separate samples.
➢ Split: Used with IBM® SPSS® Modeler (not IBM® SPSS® Statistics).
To define a variable's role in your analysis,
➢ Click inside the cell corresponding to the "Role" column for that variable.
➢ Then use the drop-down menu to select the role that the variable will take.
[Audio] You can also define variables from the menu bar using Define Variable Properties: Data ➔ Define Variable Properties ➔ select the variable and define it in the dialog box.
[Audio] Variable transformation and data management
➢ Variable transformation involves recoding, merging, generating, and computing variables.
➢ Data management involves sorting, splitting, weighting, and partitioning.
[Audio] Recoding
➢ Recoding a variable: used to transform an existing variable into a different form based on certain criteria.
➢ It can be done by combining some of the variable's categories or values.
➢ Used to change a continuous variable into an ordinal categorical variable.
➢ Used to merge the categories of a nominal variable.
➢ Automatic Recode is also used to quickly convert a string categorical variable into a numeric categorical variable.
[Audio] Recode into Different Variables: Transform ➔ Recode into Different Variables ➔ in the new dialog box, select and move the input variable ➔ define the name and label of the output variable ➔ click Change ➔ then click Old and New Values ➔ in the new dialog box, define the old and new values ➔ Add ➔ Continue ➔ OK
[Audio] Recode into Different Variables, continued. Note:
➢ When recoding variables, always handle the missing values first! The most common recoding errors happen when you don't tell SPSS explicitly what to do with missing values.
➢ This procedure does not include the ability to add value labels to the new categories, so immediately after recoding you should add value labels to your new numeric codes.
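The two points above (deal with missing values first, then label the new codes) can be sketched in Python. The cut-points, the user-missing code 999, and the labels are hypothetical choices for this illustration, and the sketch is not the SPSS recode procedure itself.

```python
MISSING_CODE = 999  # hypothetical user-defined missing code

def recode_age(age):
    """Recode a continuous age into an ordinal age group (assumed cut-points)."""
    # Handle missing values first, so 999 is never treated as a real age.
    if age is None or age == MISSING_CODE:
        return None
    if age < 18:
        return 1
    if age < 50:
        return 2
    return 3

# Value labels for the new numeric codes, added right after recoding.
value_labels = {1: "Under 18", 2: "18 to 49", 3: "50 and above"}

ages = [12, 35, 999, 64, None, 18]
age_groups = [recode_age(a) for a in ages]
labelled = [value_labels.get(g, "Missing") for g in age_groups]

print(age_groups)
print(labelled)
```

If the missing-value check were left out, the code 999 would silently fall into the oldest age group.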
[Audio] Recode into Same Variables: Transform ➔ Recode into Same Variables ➔ in the new dialog box, select the variable and move it to the variable box ➔ click Old and New Values ➔ in the new dialog box, set the old and new values ➔ Add ➔ Continue ➔ OK
[Audio] Automatic Recode
➢ Recodes categorical string variables into labeled numeric variables.
➢ The first step is resolving any issues with "mismatched" category strings, for example different capitalizations of the same word, or spaces before or after a category.
➢ Then Transform ➔ Automatic Recode ➔ in the new dialog box, select and move the variable ➔ enter a new name for the recoded variable in the New Name field, then click Add New Name ➔ check "Treat blank string values as user-missing" ➔ OK
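The idea behind these steps can be sketched in Python: first clean the mismatched strings, then assign consecutive numeric codes to the sorted categories. The `raw` values are made up, and representing blanks as `None` is a simplification chosen for this sketch rather than a description of how SPSS stores user-missing codes.

```python
# Hypothetical string variable with inconsistent capitalization and spacing.
raw = ["Male", "female ", " male", "Female", "", "MALE"]

# Step 1: resolve mismatched category strings.
cleaned = [value.strip().lower() for value in raw]

# Step 2: assign consecutive codes to the distinct categories, in sorted order.
categories = sorted({value for value in cleaned if value})
codes = {label: number for number, label in enumerate(categories, start=1)}

# Step 3: recode; blank strings have no code and become missing (None).
recoded = [codes.get(value) for value in cleaned]

print(codes)
print(recoded)
```

Without the cleaning step, "Male", " male" and "MALE" would each receive a different numeric code.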
[Audio] Automatic Recode, continued
[Audio] Compute Variable
➢ Used to create new variables from existing variables by applying formulas. For example, it can be used to:
• Convert the units of a variable from pounds to kilograms
• Use a subject's height and weight to compute their BMI
• Compute a subscale score from items on a survey
Transform ➔ Compute Variable ➔ in the dialog box, provide the target variable name, the numeric expression, and so on ➔ OK
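The first two examples are simple formulas, shown here as a Python sketch. The input values are hypothetical; the formulas are the standard ones (1 lb = 0.45359237 kg, and BMI = weight in kg divided by the square of height in metres).

```python
LB_TO_KG = 0.45359237  # exact definition of the pound in kilograms

def pounds_to_kg(weight_lb):
    """Convert a weight from pounds to kilograms."""
    return weight_lb * LB_TO_KG

def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

weight_kg = round(pounds_to_kg(154), 2)   # hypothetical subject weighing 154 lb
bmi_value = round(bmi(70.0, 1.75), 1)     # hypothetical subject: 70 kg, 1.75 m

print(weight_kg, bmi_value)
```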
[Audio] Generating a new variable by computing existing variables
➢ Transform ➔ Compute Variable ➔ in the dialog box, provide the target variable name ➔ in the Function group list, click All ➔ in Functions and Special Variables, scroll down and double-click Mean ➔ MEAN(?,?) will appear in the Numeric Expression box ➔ replace each ? with a variable (variables should be separated by commas inside the parentheses) ➔ OK
➢ Check the new variable in the variable list.
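What the MEAN function computes for each case can be sketched in Python: the average of the non-missing arguments. The function name `row_mean` and the item scores are hypothetical, and `None` stands in for a missing value.

```python
def row_mean(*values):
    """Mean of the non-missing values for one case; None if all are missing."""
    valid = [v for v in values if v is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)

# Hypothetical survey items for two respondents (None = missing).
score_a = row_mean(4, 5, None, 3)
score_b = row_mean(None, None, None, None)

print(score_a, score_b)
```

Because missing items are skipped, respondents with different numbers of answered items still get comparable subscale scores.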
[Audio] Generating a new variable by computing existing variables, continued
➢ Transform ➔ Compute Variable ➔ in the dialog box, provide the target variable name ➔ in the Function group list, click All ➔ in Functions and Special Variables, scroll down and double-click Any ➔ ANY(?,?) will appear in the Numeric Expression box ➔ replace the first ? with the value to test for and the remaining ? with the variables to check (separated by commas inside the parentheses) ➔ OK
➢ Check the new variable in the variable list.
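The logic of ANY can be sketched in Python: the result is 1 when the first argument matches any of the arguments that follow it, and 0 otherwise. The function name `any_match` and the symptom variables are hypothetical, and this sketch does not model how SPSS treats missing values.

```python
def any_match(test, *values):
    """Return 1 if `test` equals any of `values`, otherwise 0."""
    return 1 if test in values else 0

# Hypothetical symptom variables coded 1 = yes, 0 = no.
fever, cough, fatigue = 0, 1, 0

# 1 if at least one of the three symptoms is coded 1.
has_any_symptom = any_match(1, fever, cough, fatigue)
no_symptom_case = any_match(1, 0, 0, 0)

print(has_any_symptom, no_symptom_case)
```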
[Audio] Splitting
➢ Splitting: used to organize statistical results into groups for comparison without separating your data into different files.
➢ The splitting variable(s) should be nominal or ordinal categorical.
➢ The data should first be sorted with respect to the splitting variable.
To split, go to
➢ Data ➔ Split File ➔ in the new dialog box, select either Compare groups or Organize output by groups ➔ OK
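The effect of splitting, the same statistic reported separately for each group while the data stay in one file, can be sketched in Python. The rows (sex and weight) are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cases: (splitting variable, measured value).
rows = [("female", 62.0), ("male", 75.0), ("female", 58.0), ("male", 81.0)]

# Sort by the splitting variable, then collect the values of each group.
groups = defaultdict(list)
for sex, weight in sorted(rows):
    groups[sex].append(weight)

# The same summary statistic, computed once per group.
group_means = {sex: mean(values) for sex, values in groups.items()}

print(group_means)
```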
[Audio] Weighting
➢ Used to make each case represent the appropriate number of observations, especially when your data record counts; in this case the "weight" is the number of occurrences.
➢ Weighting is often used in large surveys to adjust for over- or under-representation of certain characteristics in your sample.
➢ To enable weighting:
• Data ➔ Weight Cases ➔ in the new dialog box, click Weight cases by, then double-click the name of the weighting variable in the left-hand column to move it to the Frequency Variable field. Click OK.
➢ To turn off an enabled weighting variable, open the Weight Cases window again and click Do not weight cases. Click OK.
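When each row of the data holds a count, weighting by that count changes the frequencies. A minimal Python sketch with made-up rows follows; the `count` column plays the part of the frequency variable.

```python
from collections import Counter

# Hypothetical aggregated data: (answer, count of occurrences).
rows = [("yes", 12), ("no", 30), ("yes", 8)]

# Unweighted: every row counts once.
unweighted = Counter(answer for answer, _ in rows)

# Weighted: every row counts as many times as its frequency variable says.
weighted = Counter()
for answer, count in rows:
    weighted[answer] += count

proportion_yes = weighted["yes"] / sum(weighted.values())

print(unweighted, weighted, proportion_yes)
```

Unweighted, "yes" looks like two cases out of three; weighted, it is 20 out of 50.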
[Audio] Crosstab: click Analyze ➔ Descriptive Statistics ➔ Crosstabs ➔ in the new dialog box, move the two variables into the Row(s) and Column(s) boxes.
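A crosstab counts the cases in every combination of two categorical variables. The idea can be sketched in Python with made-up pairs of values (sex and a yes/no answer).

```python
from collections import Counter

# Hypothetical cases: (row variable, column variable).
cases = [
    ("male", "yes"), ("female", "no"), ("male", "no"),
    ("female", "no"), ("male", "yes"),
]

# Cell counts: one entry per (row, column) combination.
table = Counter(cases)

# Row totals, needed for row percentages.
row_totals = Counter(sex for sex, _ in cases)
male_yes_pct = 100 * table[("male", "yes")] / row_totals["male"]

print(table)
print(round(male_yes_pct, 1))
```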
[Audio] Descriptive analysis
1. Now you have cleaned your data
2. Managed variables
3. Managed outliers
4. Run diagnostics ➔ normality
5. Finally, run frequencies and summary statistics for reporting purposes
[Audio] Exporting output: right-click on the result ➔ in the window that opens, choose the output you want (selected) ➔ choose the document type ➔ browse to the file directory ➔ set the filename and save ➔ OK
[Audio] Thank You!!!