Provide a summary of sample data using descriptive statistics

Reference no: EM13847848

Assignment: Advanced Analytics

Objectives: The assignment relies on a series of workshops in which students learn how to use SAS Enterprise Miner (EM) to combine analytics of structured and unstructured (text) data to make business predictions. The assignment assumes a good working knowledge of all previously studied methods.

Mini Case Study: This mini case study will be used in all workshops of the module. All amendments, extensions and assumptions should be recorded in the final submission.

Business Problem - Early Warning System

The client is the US National Highway Traffic Safety Administration (NHTSA, pronounced "NITS-uh"). NHTSA is an agency of the Executive Branch of the U.S. government, part of the Department of Transportation. It is responsible for reducing deaths, injuries and economic losses resulting from motor vehicle crashes. NHTSA requires an Early Warning System for potential safety issues in automotive vehicles caused by manufacturing problems. In particular, it requires an analytic model capable of predicting the likelihood of a vehicle crash from publicly available vehicle safety complaints. When the likelihood of crashes is high, NHTSA will initiate a recall of all vehicles likely to be affected.

The sample data, covering recreational vehicles (e.g. pick-up trucks, minivans and SUVs), is from the NHTSA. Each record was filed by an individual who had experienced problems with a specific vehicle component that may or may not have resulted in a vehicle crash.

Two data tables have been provided:

- trucks1.sas7bdat (20.5 MB of complaints data - download from CloudDeakin)

- stoplst2.sas7bdat (word stop list - download from CloudDeakin)

There are 56,601 observations in this sample of NHTSA data, where each observation is a document (record) representing a single complaint filed with the NHTSA through their survey instrument.

Target Variable: "CRASH" Approximately 30% of the complaints (documents) describe a situation in which a vehicle crash resulted from the failure/malfunction of the specified vehicle component.
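
The two figures quoted above are enough for a rough sanity check on the target before any modelling. The following is a minimal Python sketch, not part of the SAS Enterprise Miner workflow; it assumes the "approximately 30%" share is exact, so the counts are estimates only. It also shows the accuracy of a trivial model that always predicts the more frequent class, which any model built later should beat. The function names are illustrative.

```python
# Figures quoted in the case study.
TOTAL_COMPLAINTS = 56_601
CRASH_SHARE = 0.30  # "approximately 30%" in the brief, so results are estimates


def class_counts(total: int, positive_share: float) -> tuple[int, int]:
    """Return (approx. positives, approx. negatives) for a binary target."""
    positives = round(total * positive_share)
    return positives, total - positives


def majority_baseline_accuracy(positive_share: float) -> float:
    """Accuracy of always predicting the more frequent class."""
    return max(positive_share, 1.0 - positive_share)


crash, no_crash = class_counts(TOTAL_COMPLAINTS, CRASH_SHARE)
baseline = majority_baseline_accuracy(CRASH_SHARE)

print(f"approx. crash complaints:     {crash}")
print(f"approx. non-crash complaints: {no_crash}")
print(f"majority-class baseline accuracy: {baseline:.2f}")
```

Because the classes are unbalanced at roughly 30/70, raw accuracy alone can look flattering; it is worth comparing each model against this baseline.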

The NHTSA collects consumer complaints about safety-related problems with motor vehicles and motor vehicle equipment. Complaints are recorded by make, model and year, and include the Vehicle Component involved.

Consumers are directed to a web site that guides them through submitting their complaint via a survey instrument. The complaint information is entered into NHTSA's vehicle owner's complaint database and used with other complaints to determine whether a safety-related defect trend exists. (Consumers without web access may call NHTSA directly, and an operator will collect their information and enter it into the database.)

- If a safety-related defect exists in a motor vehicle or item of motor vehicle equipment, the manufacturer must fix it at no cost to the owner. The complaint is the first step in the process.

- Government engineers analyse the problem. If warranted, the manufacturer is asked to conduct a recall. If the manufacturer does not initiate a recall, the government can order the manufacturer to initiate a recall.

- The NHTSA does not have to receive a specific number of complaints before they look into a problem. They gather all available information on a problem. Each complaint is important to them.

Mini Study Predictive Analytics Workshop and Assignment

3 of 5 Questions

Q1. Describe the business problem and the potential value of the predictive model to the client. Propose an analytic solution to the problem and support your recommendation with references to the conducted data and text analytics.

Q2. Provide a summary of the sample data using descriptive statistics and frequency. Specifically identify any anomalous or inconsistent data characteristics, explaining the potential impact.

Q3. Describe any treatments or transformations undertaken to resolve the anomalous or inconsistent data characteristics from question 2.

Q4. Perform text analytics on the "CSUMMARY" data item, generating at least 5 topic clusters. Provide a description for each of the clusters generated.

Q5. Develop at least three predictive models for each of the following input characteristics combinations:

a. Using only the structured data (all columns excluding: CSUMMARY and the text topic clusters)

b. Using only the unstructured data (using only the generated text topic clusters)

c. Using both structured and unstructured data

Q6. For all models, provide a summary of the model assessment statistics over the training and validation partitions.

Q7. Select the best predictive model and provide a summary of the model and its performance.
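
As an illustration of the kind of checks Q2 asks for, the sketch below builds a frequency table and flags three common anomalies (missing values, values that differ only by letter case, and out-of-range numbers). It is a minimal Python example under stated assumptions, not the SAS Enterprise Miner procedure the workshops use. The records are invented for illustration; only CRASH and CSUMMARY are named in the brief, while MAKE and YEAR, the "Y"/"N" coding and the plausible year range are hypothetical.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical records. CRASH and CSUMMARY appear in the brief;
# MAKE, YEAR and all values are assumptions made for illustration.
records = [
    {"MAKE": "FORD", "YEAR": 2003, "CRASH": "N", "CSUMMARY": "BRAKES FAILED"},
    {"MAKE": "ford", "YEAR": 2001, "CRASH": "Y", "CSUMMARY": "TIRE BLOWOUT"},
    {"MAKE": "DODGE", "YEAR": 9999, "CRASH": "N", "CSUMMARY": "ENGINE STALLED"},
    {"MAKE": None, "YEAR": 2002, "CRASH": "Y", "CSUMMARY": ""},
    {"MAKE": "DODGE", "YEAR": 2004, "CRASH": "N", "CSUMMARY": "AIRBAG LIGHT ON"},
]


def frequency(rows, column):
    """Frequency table of raw values (missing values are counted under None)."""
    return Counter(row[column] for row in rows)


def missing_count(rows, column):
    """Number of rows where the value is None or an empty string."""
    return sum(1 for row in rows if row[column] in (None, ""))


def case_variants(rows, column):
    """Values that differ only by letter case, e.g. 'FORD' vs 'ford'."""
    groups = {}
    for row in rows:
        value = row[column]
        if isinstance(value, str) and value:
            groups.setdefault(value.upper(), set()).add(value)
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


def out_of_range(rows, column, low, high):
    """Numeric values outside the plausible range [low, high]."""
    return [row[column] for row in rows
            if row[column] is not None and not low <= row[column] <= high]


crash_freq = frequency(records, "CRASH")
missing_make = missing_count(records, "MAKE")
missing_text = missing_count(records, "CSUMMARY")
make_variants = case_variants(records, "MAKE")
bad_years = out_of_range(records, "YEAR", 1950, 2030)  # assumed plausible range

# Descriptive statistics computed only on the values that passed the range check.
valid_years = [r["YEAR"] for r in records if r["YEAR"] not in bad_years]
print("CRASH frequency:", dict(crash_freq))
print("missing MAKE:", missing_make, "| empty CSUMMARY:", missing_text)
print("case variants in MAKE:", make_variants)
print("out-of-range YEAR values:", bad_years)
print("YEAR mean:", mean(valid_years), "median:", median(valid_years))
```

Each flagged item maps to a possible treatment for Q3: imputing or excluding missing values, standardising letter case before counting categories, and setting implausible years to missing. Which treatment is appropriate depends on what the real data show.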

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd