Exercise
Deliverables: Two files: (1) Submit this lab report with answers to all questions, including output screenshots. (2) Submit an R script that contains all commands, with comments that briefly describe each command's purpose.
All questions must be answered in your own words with any paraphrased references properly cited using in-text citations and a reference list as needed.
Part 2 - Run an exercise on the CreditApproval data set, completing this report and providing the commands, output screenshots, and discussion/interpretation as requested. Ensure that all commands are saved in this report and in an R script.
ii. Run the read.csv() command to load the data into a variable named 'credit'. Then, run the command to preview the first 10 data rows in 'credit'.
Include the command and output screenshot. Note: Ensure that you use the utils::read.csv() command and not any other similar commands from other packages.
Command: >
Output:
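Example (a minimal sketch, not the expected answer): the file path and column names of the CreditApproval file are not given here, so the code below writes a tiny stand-in CSV to a temporary file first. The names Gender, Age, Debt, and Approved are hypothetical. stringsAsFactors = TRUE is set explicitly because read.csv() in R 4.0 and later keeps text columns as character by default, and later steps in this exercise refer to factor attributes.

```r
# Hypothetical stand-in for the CreditApproval file; the real path and columns will differ.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Gender,Age,Debt,Approved",
  "a,30.83,0.000,+",
  "b,58.67,4.460,+",
  "b,24.50,0.500,-",
  "a,,1.540,-"
), csv_path)

# Load the data with the utils version of read.csv(), as the exercise requires.
credit <- utils::read.csv(csv_path, stringsAsFactors = TRUE)

# Preview the first 10 rows (fewer are shown when the data has fewer rows).
head(credit, 10)
```

With the real data set, only the path passed to utils::read.csv() should need to change.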
iii. Run the str() command on the 'credit' dataset. Include the command, output screenshot, and a brief description of how the structure is presented.
Command: >
Output:
Description:
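Example (a minimal sketch on a hypothetical four-row stand-in for 'credit'; the real column names and types will differ): str() prints one line per variable showing its name, its type, and the first few values, after a header line with the number of observations and variables.

```r
# Hypothetical stand-in for the 'credit' data frame.
credit <- data.frame(
  Gender   = factor(c("a", "b", "b", "a")),
  Age      = c(30.83, 58.67, 24.50, NA),
  Debt     = c(0.00, 4.46, 0.50, 1.54),
  Approved = factor(c("+", "+", "-", "-"))
)

# Compact structure: dimensions, then name, type, and leading values per column.
str(credit)
```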
b. Descriptive Statistics:
i. Run the summary() command on the 'credit' dataset to display the descriptive statistics for all variables. Include the command and output screenshot.
Command: >
Output:
ii. Choose two numeric attributes from 'credit', run the summary() command on both, and provide your interpretation of each of the six descriptive statistics.
Command: >
Output:
Command: >
Output:
Interpretation:
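Example (a minimal sketch; Age and Debt are hypothetical column names and values): for a numeric vector, summary() reports Min., 1st Qu., Median, Mean, 3rd Qu., and Max. When the attribute contains missing values, a seventh entry, NA's, is appended with their count.

```r
# Hypothetical numeric attributes standing in for two columns of 'credit'.
credit <- data.frame(
  Age  = c(30.83, 58.67, 24.50, NA),
  Debt = c(0.00, 4.46, 0.50, 1.54)
)

# Six statistics for a complete column.
debt_summary <- summary(credit$Debt)
debt_summary

# Same six statistics plus an NA's count for a column with missing values.
age_summary <- summary(credit$Age)
age_summary
```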
iii. Choose two factor attributes from 'credit', run the summary() command on both, and provide your interpretation of the output (for a factor, summary() reports the number of observations in each level rather than the six numeric statistics).
Command: >
Output:
Command: >
Output:
Interpretation:
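Example (a minimal sketch; Gender and Approved are hypothetical factor columns): on a factor, summary() returns a named count of observations per level.

```r
# Hypothetical factor attributes standing in for two columns of 'credit'.
credit <- data.frame(
  Gender   = factor(c("a", "b", "b", "a")),
  Approved = factor(c("+", "+", "-", "+"))
)

# Counts per level for each factor.
gender_counts <- summary(credit$Gender)
gender_counts

approved_counts <- summary(credit$Approved)
approved_counts
```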
iv. What differences did you observe between the output of the str() and summary() commands (50 words)?
c. Variable Filters - Discretization and Removing Variables:
ii. Run the three different discretization methods discussed in the tutorial (equal interval, equal frequency, k-means clustering). For each method, include the command and output screenshot. For all commands, provide a one-paragraph discussion (100 words) of the input parameters used, the number of bins, and your interpretation of the output.
Command: >
Output:
Command: >
Output:
Command: >
Output:
Discussion:
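Example (a minimal sketch using base R only; the tutorial may use a dedicated discretization function from a package, which is not reproduced here): the three methods can be approximated with cut() for equal interval, quantile() plus cut() for equal frequency, and kmeans() for clustering. The vector age and the choice of three bins are hypothetical.

```r
# Hypothetical numeric attribute and bin count.
age <- c(22, 25, 27, 31, 35, 38, 41, 47, 52, 58, 63, 70)
k <- 3

# Equal interval: k bins of equal width across the observed range.
equal_interval <- cut(age, breaks = k)

# Equal frequency: cut points at quantiles, so each bin holds about the same number of rows.
cut_points <- quantile(age, probs = seq(0, 1, length.out = k + 1))
equal_freq <- cut(age, breaks = cut_points, include.lowest = TRUE)

# k-means clustering: each bin is a cluster of nearby values; labels are arbitrary, not ordered.
set.seed(42)  # kmeans() starts from random centers
km <- kmeans(age, centers = k, nstart = 10)
kmeans_bin <- factor(km$cluster)

# Number of observations that fall into each bin under each method.
table(equal_interval)
table(equal_freq)
table(kmeans_bin)
```

Note that equal-frequency cut points must be unique; heavily tied data can produce duplicate quantiles, in which case cut() stops with an error.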
iii. Compare and contrast the discretization methods above, providing at least one example of when you would use each one (150-200 words).
iv. Run a command to remove one of the attributes from the 'credit' dataset. Run another command to demonstrate that the attribute was successfully removed. Include both commands and output screenshots as well as a discussion of when and why variables should be removed from a dataset.
Command: >
Output:
Command: >
Output:
Discussion:
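Example (a minimal sketch; the column Debt is a hypothetical choice): assigning NULL to a column drops it from the data frame, and names() confirms the result.

```r
# Hypothetical stand-in for the 'credit' data frame.
credit <- data.frame(
  Gender   = factor(c("a", "b", "b", "a")),
  Age      = c(30.83, 58.67, 24.50, 41.00),
  Debt     = c(0.00, 4.46, 0.50, 1.54),
  Approved = factor(c("+", "+", "-", "-"))
)

# Remove the attribute.
credit$Debt <- NULL

# Verify: the column no longer appears among the names.
names(credit)
"Debt" %in% names(credit)
```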
d. Row Filters - Handling Missing Values and Sorting:
i. Run a command to check if the 'credit' dataset has any missing values. Your command and output should show all attributes along with how many observations total have missing values. Include the command and output screenshot.
Command: >
Output:
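Example (a minimal sketch on a hypothetical stand-in for 'credit'): colSums(is.na(...)) counts missing values per attribute, and complete.cases() identifies rows without any missing value, so its negation counts the affected observations.

```r
# Hypothetical stand-in with missing values in two different rows.
credit <- data.frame(
  Gender = factor(c("a", "b", "b", "a")),
  Age    = c(30.83, NA, 24.50, 41.00),
  Debt   = c(0.00, 4.46, 0.50, NA)
)

# Missing values per attribute.
na_per_column <- colSums(is.na(credit))
na_per_column

# Number of observations with at least one missing value.
rows_with_na <- sum(!complete.cases(credit))
rows_with_na
```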
ii. Choose one of the numeric attributes with missing values and run the command to replace the missing values with the attribute mean. Then run the command to verify that the variable no longer has missing values. Include both commands and output screenshots.
Command: >
Output:
Command: >
Output:
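Example (a minimal sketch; Age is a hypothetical attribute): the mean is computed with na.rm = TRUE so the missing entries are ignored, then assigned to exactly those entries.

```r
# Hypothetical numeric attribute with one missing value.
credit <- data.frame(Age = c(30.83, 58.67, 24.50, NA))

# Replace missing values with the mean of the observed values.
credit$Age[is.na(credit$Age)] <- mean(credit$Age, na.rm = TRUE)

# Verify: the count of missing values is now zero.
sum(is.na(credit$Age))
```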
v. Run the command to sort the 'credit' dataset by one of the attributes. Then run the command to validate the sorting. Include both commands and output screenshots as well as a discussion where you provide at least two reasons why data should be sorted.
Command: >
Output:
Command: >
Output:
Discussion:
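Example (a minimal sketch; sorting by the hypothetical attribute Age): order() returns the row positions in ascending order, and is.unsorted() gives a quick programmatic check alongside a visual one with head().

```r
# Hypothetical stand-in for the 'credit' data frame.
credit <- data.frame(
  Age  = c(30.83, 58.67, 24.50, 41.00),
  Debt = c(0.00, 4.46, 0.50, 1.54)
)

# Sort all rows by Age in ascending order.
credit_sorted <- credit[order(credit$Age), ]

# Validate: FALSE from is.unsorted() means the column is in order.
is.unsorted(credit_sorted$Age, na.rm = TRUE)
head(credit_sorted)
```

For descending order, order(credit$Age, decreasing = TRUE) can be used instead.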
e. Data Visualization:
i. Run the plot() function for one of the variables in the 'credit' dataset. Include the command, output screenshot, and a one-paragraph (100 words), master's-level interpretation of what the plot shows.
Command: >
Output:
Discussion:
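Example (a minimal sketch; Age and its values are hypothetical): called on a numeric vector, plot() draws each value against its row index. The sketch writes the plot to a temporary PDF so that it also runs in a non-interactive session; in an interactive session the plot() line alone draws in the plot window.

```r
# Hypothetical numeric attribute standing in for a column of 'credit'.
credit <- data.frame(Age = c(22, 25, 27, 31, 35, 38, 41, 47, 52, 58, 63, 70))

# Open a PDF device on a temporary file, draw the plot, then close the device.
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path)
plot(credit$Age, main = "Age by row index", xlab = "Row index", ylab = "Age")
dev.off()
```

Passing a factor to plot() instead produces a bar chart of level counts, which may suit categorical attributes better.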
References