Analyze the class performances

Reference no: EM13536180

Question 1: As a highly application-driven discipline, data mining has been widely applied in many areas. We briefly presented two highly successful and popular applications of data mining in our textbook: business intelligence and Web search engines. Do you think that data mining can also be applied to the following areas?

If yes, please provide a brief yet concrete example; if not, please briefly state your reasons.

1) Software Engineering.

2) Transportation.

3) Sociology.

Question 2: Suppose a student collected the price and weight of 20 products in a shop, with the following results:

Products 1-10:
price    $11.78   $85.12   $10.47  $298.00   $38.45  $102.14  $123.62  $203.29   $65.00  $225.50
weight      3.2      3.4      4.5     35.4      9.1      5.7      1.5     23.8      8.6     42.3

Products 11-20:
price     $9.25  $164.32  $102.45  $120.45   $73.15  $625.00  $125.00  $242.64  $441.76  $325.45
weight      5.9     12.3      6.5     11.8     12.2     32.9     11.6     48.0     52.9     78.2

Q2.1. Calculate the mean, Q1, median, Q3, and standard deviation of price and weight.
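The Q2.1 statistics can be checked with a short Python sketch that uses only the standard `statistics` module. Quartile conventions differ between textbooks and tools; this sketch assumes the median-of-halves rule (Q1 and Q3 are the medians of the lower and upper ten values) and reports the population standard deviation (divide by N). `statistics.stdev` gives the sample version (divide by N - 1) if your course uses that instead.

```python
import statistics

# Data from Question 2 (price in dollars; the weight unit is not stated in the question).
PRICE = [11.78, 85.12, 10.47, 298.00, 38.45, 102.14, 123.62, 203.29, 65.00, 225.50,
         9.25, 164.32, 102.45, 120.45, 73.15, 625.00, 125.00, 242.64, 441.76, 325.45]
WEIGHT = [3.2, 3.4, 4.5, 35.4, 9.1, 5.7, 1.5, 23.8, 8.6, 42.3,
          5.9, 12.3, 6.5, 11.8, 12.2, 32.9, 11.6, 48.0, 52.9, 78.2]


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    For an even count, Q1 is the median of the lower half and Q3 the median of
    the upper half; for an odd count, the overall median is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return statistics.median(lower), statistics.median(data), statistics.median(upper)


def summarize(values):
    q1, med, q3 = quartiles(values)
    return {
        "mean": statistics.mean(values),
        "Q1": q1,
        "median": med,
        "Q3": q3,
        # Population standard deviation (divide by N).
        "std": statistics.pstdev(values),
    }


for name, values in (("price", PRICE), ("weight", WEIGHT)):
    stats = summarize(values)
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

Tools such as `numpy.percentile` interpolate between data points by default, so their Q1 and Q3 may not match these values exactly.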

Q2.2. Draw the boxplots for price and weight.
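A boxplot draws the five-number summary plus any points lying more than 1.5 × IQR beyond the quartiles. The sketch below computes those components so that a hand-drawn plot can be checked; it assumes the median-of-halves quartile rule and the common 1.5 × IQR whisker rule.

```python
import statistics

PRICE = [11.78, 85.12, 10.47, 298.00, 38.45, 102.14, 123.62, 203.29, 65.00, 225.50,
         9.25, 164.32, 102.45, 120.45, 73.15, 625.00, 125.00, 242.64, 441.76, 325.45]
WEIGHT = [3.2, 3.4, 4.5, 35.4, 9.1, 5.7, 1.5, 23.8, 8.6, 42.3,
          5.9, 12.3, 6.5, 11.8, 12.2, 32.9, 11.6, 48.0, 52.9, 78.2]


def box_summary(values):
    """Whisker ends, quartiles, median, and 1.5 x IQR outliers for one boxplot."""
    data = sorted(values)
    half = len(data) // 2
    # Median-of-halves quartiles; other conventions shift Q1/Q3 slightly.
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    q1, q3 = statistics.median(lower), statistics.median(upper)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme values that are still inside the fences.
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "whisker_low": min(inside),
        "Q1": q1,
        "median": statistics.median(data),
        "Q3": q3,
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }


for name, values in (("price", PRICE), ("weight", WEIGHT)):
    print(name, box_summary(values))
```

To draw the plot itself, `matplotlib.pyplot.boxplot` applies the same 1.5 × IQR whisker rule by default (`whis=1.5`), although it computes quartiles by linear interpolation, so its box edges may differ slightly.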

Q2.3. Draw a scatter plot and a Q-Q plot based on these two variables.
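The two plots use different pairings of the same data, which the sketch below makes explicit. A scatter plot pairs each product's own price and weight; a Q-Q plot of two equal-sized samples instead pairs the i-th smallest price with the i-th smallest weight. The variable names are illustrative only.

```python
PRICE = [11.78, 85.12, 10.47, 298.00, 38.45, 102.14, 123.62, 203.29, 65.00, 225.50,
         9.25, 164.32, 102.45, 120.45, 73.15, 625.00, 125.00, 242.64, 441.76, 325.45]
WEIGHT = [3.2, 3.4, 4.5, 35.4, 9.1, 5.7, 1.5, 23.8, 8.6, 42.3,
          5.9, 12.3, 6.5, 11.8, 12.2, 32.9, 11.6, 48.0, 52.9, 78.2]

# Scatter plot: one point per product, (price, weight) in the original order.
scatter_points = list(zip(PRICE, WEIGHT))

# Q-Q plot: pair order statistics. If the two distributions have the same shape
# (up to location and scale), these points fall close to a straight line.
qq_points = list(zip(sorted(PRICE), sorted(WEIGHT)))

for p, w in qq_points:
    print(f"{p:8.2f}  {w:5.1f}")
```

Either list of points can be passed to a plotting library such as matplotlib (`pyplot.scatter`) to produce the actual figure.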

Q2.4. Normalize the two variables using min-max normalization (new min = 1, new max = 10).
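Min-max normalization maps each value v linearly onto the target range: v' = (v - min) / (max - min) × (new_max - new_min) + new_min. A minimal Python sketch of that formula, with the target range [1, 10] from the question as defaults:

```python
PRICE = [11.78, 85.12, 10.47, 298.00, 38.45, 102.14, 123.62, 203.29, 65.00, 225.50,
         9.25, 164.32, 102.45, 120.45, 73.15, 625.00, 125.00, 242.64, 441.76, 325.45]
WEIGHT = [3.2, 3.4, 4.5, 35.4, 9.1, 5.7, 1.5, 23.8, 8.6, 42.3,
          5.9, 12.3, 6.5, 11.8, 12.2, 32.9, 11.6, 48.0, 52.9, 78.2]


def min_max(values, new_min=1.0, new_max=10.0):
    """Map values linearly so that min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


price_mm = min_max(PRICE)
weight_mm = min_max(WEIGHT)
print([round(v, 3) for v in price_mm])
print([round(v, 3) for v in weight_mm])
```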

Q2.5. Normalize the two variables using z-score normalization.
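Z-score normalization subtracts the mean and divides by the standard deviation: v' = (v - mean) / σ. The sketch below assumes the population standard deviation σ (divide by N); using the sample standard deviation instead scales every result by sqrt((N - 1) / N).

```python
import statistics

PRICE = [11.78, 85.12, 10.47, 298.00, 38.45, 102.14, 123.62, 203.29, 65.00, 225.50,
         9.25, 164.32, 102.45, 120.45, 73.15, 625.00, 125.00, 242.64, 441.76, 325.45]
WEIGHT = [3.2, 3.4, 4.5, 35.4, 9.1, 5.7, 1.5, 23.8, 8.6, 42.3,
          5.9, 12.3, 6.5, 11.8, 12.2, 32.9, 11.6, 48.0, 52.9, 78.2]


def z_score(values):
    """Return (v - mean) / population standard deviation for every value."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


price_z = z_score(PRICE)
weight_z = z_score(WEIGHT)
print([round(v, 3) for v in price_z])
print([round(v, 3) for v in weight_z])
```

After the transformation each variable has mean 0 and (population) standard deviation 1, which is a quick way to check hand-computed results.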

Q2.6. Calculate the Pearson correlation coefficient. Are these two variables positively or negatively correlated?
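The Pearson coefficient is r = Σ(a_i - ā)(b_i - b̄) / sqrt(Σ(a_i - ā)² · Σ(b_i - b̄)²); its sign answers the second part of the question (r > 0 means positively correlated, r < 0 negatively). A direct Python sketch of that formula:

```python
import statistics

PRICE = [11.78, 85.12, 10.47, 298.00, 38.45, 102.14, 123.62, 203.29, 65.00, 225.50,
         9.25, 164.32, 102.45, 120.45, 73.15, 625.00, 125.00, 242.64, 441.76, 325.45]
WEIGHT = [3.2, 3.4, 4.5, 35.4, 9.1, 5.7, 1.5, 23.8, 8.6, 42.3,
          5.9, 12.3, 6.5, 11.8, 12.2, 32.9, 11.6, 48.0, 52.9, 78.2]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


r = pearson(PRICE, WEIGHT)
print(f"r = {r:.4f}", "positive" if r > 0 else "negative")
```

On Python 3.10 or later, `statistics.correlation(PRICE, WEIGHT)` computes the same coefficient and can serve as a cross-check.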

Q2.7. Take the prices of the above 20 products and partition them into four bins using each of the following methods:

1) equal-width partitioning

2) equal-depth (equal-frequency) partitioning
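The two binning methods differ in what they hold constant: equal-width binning splits the value range [min, max] into four intervals of the same length, while equal-depth (equal-frequency) binning puts the same number of sorted values into each bin. A small Python sketch of both, assuming half-open intervals with the maximum value placed in the last bin:

```python
PRICE = [11.78, 85.12, 10.47, 298.00, 38.45, 102.14, 123.62, 203.29, 65.00, 225.50,
         9.25, 164.32, 102.45, 120.45, 73.15, 625.00, 125.00, 242.64, 441.76, 325.45]


def equal_width_bins(values, k=4):
    """Split [min, max] into k intervals of equal width and assign each value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in sorted(values):
        # The maximum would get index k, so clamp it into the last bin.
        idx = min(int((v - lo) / width), k - 1)
        bins[idx].append(v)
    return bins


def equal_depth_bins(values, k=4):
    """Put the same number of sorted values into each of k bins."""
    data = sorted(values)
    size = len(data) // k  # assumes len(values) is divisible by k (20 / 4 = 5 here)
    return [data[i * size:(i + 1) * size] for i in range(k)]


for i, b in enumerate(equal_width_bins(PRICE), 1):
    print("equal-width bin", i, b)
for i, b in enumerate(equal_depth_bins(PRICE), 1):
    print("equal-depth bin", i, b)
```

With prices this skewed, equal-width binning leaves most values in the first bin, while equal-depth binning keeps the bins balanced.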

Question 3: Design a data warehouse for a university's gradebook data to analyze class performance. Suppose the data warehouse consists of the following dimensions: department, semester, course, student, instructor, and gradebook, together with a set of measures that you would like to define.
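The six dimensions listed above map naturally onto a star schema with one central fact table. The sketch below creates such a schema in an in-memory SQLite database with Python's `sqlite3` module; all table names, attributes, and the three measures (final score, grade points, credits earned) are hypothetical choices, and treating "gradebook" as a small dimension of letter grade and grading basis is only one possible reading of the question.

```python
import sqlite3

# Hypothetical star schema: six dimension tables around one fact table.
SCHEMA = """
CREATE TABLE dim_department (dept_key INTEGER PRIMARY KEY, dept_name TEXT, college TEXT);
CREATE TABLE dim_semester   (semester_key INTEGER PRIMARY KEY, term TEXT, year INTEGER);
CREATE TABLE dim_course     (course_key INTEGER PRIMARY KEY, course_code TEXT, title TEXT, credits INTEGER);
CREATE TABLE dim_student    (student_key INTEGER PRIMARY KEY, name TEXT, major TEXT, entry_year INTEGER);
CREATE TABLE dim_instructor (instructor_key INTEGER PRIMARY KEY, name TEXT, academic_rank TEXT);
CREATE TABLE dim_gradebook  (gradebook_key INTEGER PRIMARY KEY, letter_grade TEXT, grading_basis TEXT);

-- One row per student per course offering, keyed by all six dimensions.
CREATE TABLE fact_grade (
    dept_key       INTEGER REFERENCES dim_department(dept_key),
    semester_key   INTEGER REFERENCES dim_semester(semester_key),
    course_key     INTEGER REFERENCES dim_course(course_key),
    student_key    INTEGER REFERENCES dim_student(student_key),
    instructor_key INTEGER REFERENCES dim_instructor(instructor_key),
    gradebook_key  INTEGER REFERENCES dim_gradebook(gradebook_key),
    final_score    REAL,     -- measure: numeric course score
    grade_points   REAL,     -- measure: e.g. 4.0 for an A
    credits_earned INTEGER   -- measure: credits counted toward the degree
);
"""

# Example OLAP query: average score and enrollment count per department and semester.
AVG_BY_DEPT_SEMESTER = """
SELECT d.dept_name, s.term, s.year, AVG(f.final_score) AS avg_score, COUNT(*) AS n
FROM fact_grade AS f
JOIN dim_department AS d ON f.dept_key = d.dept_key
JOIN dim_semester   AS s ON f.semester_key = s.semester_key
GROUP BY d.dept_name, s.term, s.year;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(conn.execute(AVG_BY_DEPT_SEMESTER).fetchall())
```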

1. Draw a star schema, based on your assessment of the analytical power and convenience that the warehouse should offer.
2. Is "top 10% in a class" a holistic or an algebraic measure? Discuss how to develop an efficient (possibly approximate) method to compute a query such as: find the Engineering students whose final score is within the top 10% of the class in at least 80% of the CS courses they have taken.
3. Is it a good idea to merge this data warehouse and the university's current gradebook database system into one large data management and analysis system? Why or why not?
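Because a student's rank depends on the full score distribution of the class, "top 10%" behaves like a holistic measure: it cannot be assembled from a bounded number of sub-aggregates the way SUM or AVG can. One possible strategy, sketched below with hypothetical record layouts and function names, is to compute each class's cutoff score once (and materialize it), flag each enrollment against that cutoff, and then answer the query with plain counts, which are distributive. For very large classes, the exact sort could be replaced by an approximate quantile estimate.

```python
from collections import defaultdict


def top_group_flags(records, percent=10):
    """Flag (student, class_id) pairs whose score is in the top `percent` of that class.

    records: list of (student, student_dept, class_id, course_dept, final_score),
    where class_id identifies one course offering (course + semester).
    """
    scores_by_class = defaultdict(list)
    for _, _, class_id, _, score in records:
        scores_by_class[class_id].append(score)

    cutoff = {}
    for class_id, scores in scores_by_class.items():
        scores.sort(reverse=True)
        k = max(1, -(-len(scores) * percent // 100))  # ceil(len * percent / 100) in integers
        cutoff[class_id] = scores[k - 1]              # lowest score still in the top group
    # Ties at the cutoff count as members of the top group.
    return {(s, c): score >= cutoff[c] for s, _, c, _, score in records}


def students_in_top_group(records, student_dept="Engineering", course_dept="CS",
                          min_share_percent=80):
    """Students from student_dept who are in the top group in at least
    min_share_percent of the course_dept classes they have taken."""
    records = list(records)
    flags = top_group_flags(records)  # ranks use every student in the class
    taken, in_top = defaultdict(int), defaultdict(int)
    for s, sdept, c, cdept, _ in records:
        if sdept == student_dept and cdept == course_dept:
            taken[s] += 1
            in_top[s] += flags[(s, c)]
    return sorted(s for s in taken if in_top[s] * 100 >= min_share_percent * taken[s])
```

Usage: pass the enrollment rows pulled from the fact table, e.g. `students_in_top_group(rows)`; only the per-class cutoffs need to be recomputed when new grades arrive.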

Question 4:

A location-based social networking website that provides check-in services has hired you to help it build a data warehouse.

Users of this service can "check in" at venues with a mobile application: they run the application and select from a list of nearby venues that it locates. Users can also "add" each other as "friends". The website also has sufficient information about venues, including address, GPS location, and category (e.g., a Japanese restaurant), and users tend to provide their personal information to the website when they register.
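One way to organize the data described above is a star schema with a check-in fact table and user, venue, and time dimensions, plus a separate table for friendship links. The sketch below builds it in an in-memory SQLite database with Python's `sqlite3` module; every table and column name and all sample rows are hypothetical, chosen only to illustrate the structure and a roll-up by month, city, and category.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_user  (user_key INTEGER PRIMARY KEY, gender TEXT, age_group TEXT, home_city TEXT);
CREATE TABLE dim_venue (venue_key INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT,
                        category TEXT, latitude REAL, longitude REAL);
CREATE TABLE dim_time  (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
-- Fact table: one row per check-in event.
CREATE TABLE fact_checkin (
    user_key      INTEGER REFERENCES dim_user(user_key),
    venue_key     INTEGER REFERENCES dim_venue(venue_key),
    time_key      INTEGER REFERENCES dim_time(time_key),
    checkin_count INTEGER DEFAULT 1   -- measure, summed when rolling up
);
-- Friendship links kept as a separate user-to-user table.
CREATE TABLE friendship (user_a INTEGER, user_b INTEGER, since_time_key INTEGER);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample rows, only to exercise the roll-up query.
conn.executemany("INSERT INTO dim_user VALUES (?, ?, ?, ?)",
                 [(1, "F", "25-34", "Chicago"), (2, "M", "18-24", "Evanston")])
conn.executemany("INSERT INTO dim_venue VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "Venue A", "1 Main St", "Chicago", "Japanese", 41.88, -87.63),
    (2, "Venue B", "2 Main St", "Chicago", "Italian", 41.89, -87.62),
])
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?, ?)",
                 [(1, "2023-05-01", "2023-05", 2023), (2, "2023-06-02", "2023-06", 2023)])
conn.executemany("INSERT INTO fact_checkin (user_key, venue_key, time_key) VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 1, 1), (1, 2, 1), (1, 1, 2)])

# Roll-up: number of check-ins by month, city, and venue category.
ROLLUP = """
SELECT t.month, v.city, v.category, SUM(f.checkin_count) AS checkins
FROM fact_checkin AS f
JOIN dim_venue AS v ON f.venue_key = v.venue_key
JOIN dim_time  AS t ON f.time_key  = t.time_key
GROUP BY t.month, v.city, v.category
ORDER BY t.month, v.city, v.category;
"""
for row in conn.execute(ROLLUP):
    print(row)
```

To make month/city/category analysis fast, this aggregate could be materialized as a summary table (a cuboid); because SUM is distributive, coarser views such as month by category across all cities can then be computed from it without rescanning the fact table.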

1. Design a data warehouse that can facilitate effective online analytical processing for this website (provide both the schema and the measures, and explain your choices).

2. Check-in data collected from the website and mobile applications are noisy. Besides network and device errors, are there other reasons that might cause noise in this data set? For the reason you identify, discuss a method that can clean up the check-in data effectively in the data warehouse.

3. One may want to perform online analytical processing on the check-in data at different venues by month, by city, and by category (Italian, Japanese, etc.). How can this be done efficiently in the data warehouse?

4. Hackers create fake profiles on this website. They use bots to manipulate the fake profiles, generate fake check-in data, and try to add everyone as a friend (yes, this is a common problem for many social networking websites, and no, I am not telling you how to write bots). Although bots try to mimic real users, they still behave differently: e.g., they check in at random places (Chicago this minute, Las Vegas the next), they check in far more often than real users, and their social networks are usually very large but also very sparse (your friends on Facebook tend to form communities, but bots' friends do not). Discuss possible solutions for identifying fake profiles (bots) in your data warehouse.
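The three behaviors listed in the question can be turned into per-user features: the fastest implied travel speed between consecutive check-ins, the number of check-ins per day, and the local clustering coefficient of the user's friend network (a large friend list with few links among those friends is sparse). The sketch below computes them in plain Python; the input layouts, function names, thresholds, and the rule of reporting every signal that fires are all hypothetical and would need tuning on real data, or could be replaced by a trained classifier over the same features.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def max_speed_kmh(checkins):
    """checkins: list of (time_in_hours, lat, lon); fastest speed between consecutive check-ins."""
    pts = sorted(checkins)
    fastest = 0.0
    for (t1, la1, lo1), (t2, la2, lo2) in zip(pts, pts[1:]):
        dist = haversine_km(la1, lo1, la2, lo2)
        if t2 > t1:
            fastest = max(fastest, dist / (t2 - t1))
        elif dist > 0:
            return math.inf  # two different places at the same instant
    return fastest


def clustering_coefficient(user, friends):
    """Share of pairs of the user's friends who are also friends with each other."""
    nbrs = friends.get(user, set())
    if len(nbrs) < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in friends.get(a, set()))
    return linked / (len(nbrs) * (len(nbrs) - 1) / 2)


def bot_signals(user, checkins, friends, max_kmh=1000.0, max_per_day=30.0,
                min_friends=300, max_clustering=0.02):
    """Names of the (hypothetical) bot signals that fire for one user."""
    signals = []
    if max_speed_kmh(checkins) > max_kmh:
        signals.append("impossible_travel")
    if checkins:
        times = [t for t, _, _ in checkins]
        days = max((max(times) - min(times)) / 24.0, 1.0)  # count at least one day
        if len(checkins) / days > max_per_day:
            signals.append("too_frequent")
    if (len(friends.get(user, ())) >= min_friends
            and clustering_coefficient(user, friends) < max_clustering):
        signals.append("large_sparse_network")
    return signals
```

For example, a check-in in Chicago followed one minute later by one in Las Vegas, `bot_signals("u", [(0.0, 41.88, -87.63), (1 / 60, 36.17, -115.14)], {})`, reports `impossible_travel`. In the warehouse, the check-in coordinates would come from joining the check-in facts with the venue dimension's GPS location.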

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd