Identifying inconsistent data encodings

Reference no: EM13998102

Data Preparation - Cleaning up any issues in the data so that it can be analyzed with software tools such as Tableau. In a project, this phase can take 80 to 90% of the overall effort.

• Decide how to handle blank values. If a blank means the value is unknown, you may want to leave it blank. If, on the other hand, a blank means "not applicable", you may want to replace the blank cell with "NA".

• If feasible, merge two or more tables that hold different information about the same objects. The tables must share a common field to be joined.

• Review the data manually (or with tools, if available) for unusual patterns or distributions that might call its validity into question. This step requires examining the data with a critical eye, for example:

Identifying inconsistent data encodings (e.g., different abbreviations or spellings used for the same state)

Identifying suspicious responses (e.g., implausible numbers entered as answers, or a survey respondent giving the same answer to every question)

Identifying outliers that don't make sense (e.g., six-figure salaries for teenagers, or traffic at a store that is typically in the thousands but occasionally shows values in the tens or in the millions)

• Perform any other data preparation that is required. This step is open-ended; the specific details will depend on the changes needed and the software tools used. Make sure to

• Compare both the data provided and the data you have prepared against the questions defined in the Business Understanding phase. Does it appear possible to answer those questions with the available data?

If needed data is missing and the sponsor neither has it nor can generate it, the project needs to be revised or cancelled. Document the data that is needed and, if feasible, determine how it can be collected or generated for future analysis.

• Keep track of the issues found during this phase. They may lead to a recommendation to the sponsor to capture the data in a different format or with a different method in the first place, reducing the effort needed to clean it. In some cases, this can be one of the more valuable contributions of your project, since data preparation can take 80 to 90% of a project's overall time and resources.

If these issues can be reduced going forward, this can save a great deal of time and money and make further analysis easier to perform.
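
The blank-value step (the first bullet above) can be sketched in Python. This is a minimal sketch, not a prescribed method: it assumes rows are read as dictionaries, as `csv.DictReader` produces, and that you already know which columns use a blank to mean "not applicable". The helper `fill_not_applicable` and the sample columns `spouse_name` and `income` are hypothetical.

```python
import csv
import io


def fill_not_applicable(rows, na_columns, marker="NA"):
    """Return copies of rows where blank cells in na_columns become marker.

    Blanks in all other columns are left untouched, on the assumption
    that they mean "unknown" rather than "not applicable".
    """
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        for col in na_columns:
            value = new_row.get(col)
            # Treat missing fields (None) and whitespace-only strings as blank.
            if value is None or value.strip() == "":
                new_row[col] = marker
        cleaned.append(new_row)
    return cleaned


# Hypothetical sample: a blank "spouse_name" means not applicable,
# while a blank "income" means the value is unknown and stays blank.
sample = io.StringIO("id,spouse_name,income\n1,,52000\n2,Pat,\n")
rows = list(csv.DictReader(sample))
print(fill_not_applicable(rows, na_columns=["spouse_name"]))
```

Keeping the list of "not applicable" columns explicit makes the decision visible and easy to document, rather than replacing every blank in the file.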
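
The table-merging step (the second bullet above) depends on a common field shared by the tables. A minimal sketch of a left join on such a field follows, assuming each table is a list of dictionaries; `merge_on_key` and the `stores`/`traffic` tables are hypothetical illustrations, not part of the assignment.

```python
def merge_on_key(left, right, key):
    """Left-join two lists of dict rows on a shared key field.

    Every row in `left` is kept. Fields from the matching `right` row are
    added; if no match exists, the left row is kept unchanged so that
    unmatched records remain visible for review.
    """
    # Index the right-hand table by key for constant-time lookups.
    index = {}
    for row in right:
        if row[key] in index:
            raise ValueError(f"duplicate key in right table: {row[key]!r}")
        index[row[key]] = row

    merged = []
    for row in left:
        match = index.get(row[key])
        combined = dict(row)
        if match is not None:
            # Right-hand values only fill fields the left row does not have.
            for field, value in match.items():
                combined.setdefault(field, value)
        merged.append(combined)
    return merged


# Hypothetical tables describing the same stores.
stores = [{"store_id": "S1", "city": "Austin"}, {"store_id": "S2", "city": "Boise"}]
traffic = [{"store_id": "S1", "avg_daily_visits": 3200}]
print(merge_on_key(stores, traffic, key="store_id"))
```

Rejecting duplicate keys is a deliberate choice here: a duplicated key usually signals a data problem worth investigating before the merge. In pandas, the comparable operation is `DataFrame.merge` with `how="left"`.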
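
Inconsistent encodings, such as several spellings of the same state, can be surfaced by normalizing values and reporting which raw variants collapse to the same code. The sketch below assumes a small, hypothetical alias table (`STATE_ALIASES`) and hypothetical helpers `normalize_state` and `encoding_report`; a real project would build the table from the variants actually found in its data.

```python
from collections import Counter

# Hypothetical, deliberately small alias table for illustration only.
STATE_ALIASES = {
    "ca": "CA", "calif": "CA", "california": "CA",
    "tx": "TX", "tex": "TX", "texas": "TX",
    "ny": "NY", "new york": "NY",
}


def normalize_state(raw):
    """Map a raw state value to a two-letter code, or None if unrecognized."""
    # Lowercase, trim, and drop periods so "Calif." and "calif" compare equal.
    key = raw.strip().lower().replace(".", "")
    return STATE_ALIASES.get(key)


def encoding_report(values):
    """Group raw spellings by the code they map to and count unknown values."""
    variants = {}
    unknown = Counter()
    for raw in values:
        code = normalize_state(raw)
        if code is None:
            unknown[raw] += 1
        else:
            variants.setdefault(code, set()).add(raw)
    return variants, unknown


values = ["CA", "Calif.", "california", "TX", "Tex.", "N.Y.", "Cali"]
variants, unknown = encoding_report(values)
print(variants)
print(unknown)
```

Any code with more than one raw variant shows an inconsistent encoding, and the unknown values (here "Cali") need a manual decision before they are mapped.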
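
One suspicious-response pattern mentioned above, a respondent giving the same answer to every survey question, is easy to flag automatically. This sketch assumes responses are stored as a mapping from respondent ID to answers; the helper `flag_straight_liners` and the `min_questions=5` threshold are hypothetical choices, and flagged respondents are candidates for review rather than automatic removal.

```python
def flag_straight_liners(responses, question_ids, min_questions=5):
    """Return respondent IDs that gave one identical answer to every
    listed question they answered.

    `responses` maps respondent ID -> {question ID: answer}. Respondents who
    answered fewer than `min_questions` of the listed questions are skipped,
    because a few identical answers are not unusual on their own.
    """
    flagged = []
    for respondent, answers in responses.items():
        given = [answers[q] for q in question_ids if q in answers]
        if len(given) >= min_questions and len(set(given)) == 1:
            flagged.append(respondent)
    return flagged


# Hypothetical 1-5 rating answers for six questions.
survey_questions = ["q1", "q2", "q3", "q4", "q5", "q6"]
responses = {
    "r1": {"q1": 3, "q2": 3, "q3": 3, "q4": 3, "q5": 3, "q6": 3},
    "r2": {"q1": 4, "q2": 2, "q3": 5, "q4": 4, "q5": 3, "q6": 4},
    "r3": {"q1": 5, "q2": 5},
}
print(flag_straight_liners(responses, survey_questions))  # ['r1']
```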
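
For the outlier check, one common screening technique (swapped in here as an illustration; the assignment does not name a method) is Tukey's interquartile-range rule: values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are flagged. The sketch uses Python's `statistics.quantiles`; the helper `iqr_outliers` and the traffic figures are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional default; a flagged value is a candidate
    for review, not proof that the value is wrong.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Preserve the original order so flagged values are easy to trace.
    return [v for v in values if v < low or v > high]


# Hypothetical daily visitor counts for a store that usually sees ~3,000.
daily_traffic = [2800, 3100, 2950, 3300, 12, 3050, 2900, 1_200_000, 3150, 3000]
print(iqr_outliers(daily_traffic))  # [12, 1200000]
```

Quartiles are used instead of the mean and standard deviation because a single extreme value such as 1,200,000 inflates both, which can hide the very outliers being sought. Domain rules, such as flagging six-figure salaries for respondents under 20, complement this kind of purely statistical check.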
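
The comparison against the Business Understanding questions can be made explicit by listing, for each question, the columns it needs and checking them against the prepared data. This is a minimal sketch under the assumption that the analyst writes that mapping by hand; `unanswerable_questions`, the questions, and the column names are all hypothetical.

```python
def unanswerable_questions(questions, available_columns):
    """Return {question: missing columns} for questions the data cannot answer.

    `questions` maps each business question to the columns it needs. The
    mapping is written by the analyst; nothing here infers it automatically.
    """
    available = set(available_columns)
    gaps = {}
    for question, needed in questions.items():
        missing = sorted(set(needed) - available)
        if missing:
            gaps[question] = missing
    return gaps


# Hypothetical questions and the columns each one requires.
business_questions = {
    "Which stores have declining traffic?": ["store_id", "date", "daily_visits"],
    "Does traffic vary by region?": ["store_id", "region", "daily_visits"],
    "Do promotions raise sales?": ["store_id", "date", "promo_flag", "sales"],
}
columns = ["store_id", "date", "daily_visits", "region", "sales"]
for question, missing in unanswerable_questions(business_questions, columns).items():
    print(f"{question} -> missing: {', '.join(missing)}")
```

The printed gaps double as a starting point for documenting the data that is needed, which the sponsor can then confirm or supply.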

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd