Analyze the class performances

Reference no: EM13536180

Question 1: As a highly application-driven discipline, data mining has been widely applied in many areas. Our textbook briefly presented two highly successful and popular application examples of data mining: business intelligence and Web search engines. Do you think that data mining can also be applied to the following areas?

If yes, please provide a brief yet concrete example; if not, please briefly state your reasons.

1) Software Engineering.

2) Transportation.

3) Sociology.

Question 2: Suppose a student collected the price and weight of 20 products in a shop, with the following results:

product   price      weight
1         $11.78     3.2
2         $85.12     3.4
3         $10.47     4.5
4         $298.00    35.4
5         $38.45     9.1
6         $102.14    5.7
7         $123.62    1.5
8         $203.29    23.8
9         $65.00     8.6
10        $225.50    42.3
11        $9.25      5.9
12        $164.32    12.3
13        $102.45    6.5
14        $120.45    11.8
15        $73.15     12.2
16        $625.00    32.9
17        $125.00    11.6
18        $242.64    48.0
19        $441.76    52.9
20        $325.45    78.2

Q2.1. Calculate the mean, Q1, median, Q3, and standard deviation of price and weight.

Q2.2. Draw the boxplots for price and weight.

Q2.3. Draw a scatter plot and a Q-Q plot based on these two variables.

Q2.4. Normalize the two variables using min-max normalization (new min = 1, new max = 10).

Q2.5. Normalize the two variables using z-score normalization.

Q2.6. Calculate the Pearson correlation coefficient. Are these two variables positively or negatively correlated?

Q2.7. Take the prices of the above 20 products and partition them into four bins by each of the following methods:

1) equal-width partitioning

2) equal-frequency (equal-depth) partitioning
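
The computations named in Question 2 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a model answer: the function names are made up for this example, the quartiles follow the default "exclusive" method of Python's statistics.quantiles (textbooks use several slightly different quartile conventions), and the z-score uses the population standard deviation (swap in statistics.stdev if the sample form is wanted). The second binning method is assumed to be equal-frequency partitioning.

```python
import math
import statistics

# Data copied from the table in Question 2 (product i pairs price[i] with weight[i]).
price = [11.78, 85.12, 10.47, 298.00, 38.45, 102.14, 123.62, 203.29, 65.00, 225.50,
         9.25, 164.32, 102.45, 120.45, 73.15, 625.00, 125.00, 242.64, 441.76, 325.45]
weight = [3.2, 3.4, 4.5, 35.4, 9.1, 5.7, 1.5, 23.8, 8.6, 42.3,
          5.9, 12.3, 6.5, 11.8, 12.2, 32.9, 11.6, 48.0, 52.9, 78.2]


def summary(values):
    """Mean, quartiles, and sample standard deviation (quartile convention is an assumption)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"mean": statistics.mean(values), "q1": q1, "median": median,
            "q3": q3, "stdev": statistics.stdev(values)}


def min_max(values, new_min=1.0, new_max=10.0):
    """Rescale linearly so the smallest value maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def equal_width_bins(values, k=4):
    """Split the value range [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in sorted(values):
        index = min(int((v - lo) / width), k - 1)  # the maximum goes in the last bin
        bins[index].append(v)
    return bins


def equal_frequency_bins(values, k=4):
    """Split the sorted values into k bins holding (nearly) the same number of items."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]


if __name__ == "__main__":
    print(summary(price))
    print(summary(weight))
    print(pearson(price, weight))
    print([len(b) for b in equal_width_bins(price)])
    print([len(b) for b in equal_frequency_bins(price)])
```

Comparing the bin sizes printed by the last two lines shows how differently the two partitioning methods treat skewed data such as these prices.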

Question 3: Design a data warehouse for a university's gradebook data to analyze class performance. Suppose the data warehouse consists of the following dimensions: department, semester, course, student, instructor, and gradebook; and a set of measures that you would like to define.

1. Draw a star schema, based on your consideration of the power and convenience of analysis of the warehouse.
2. Is "top 10% in a class" a holistic or an algebraic measure? Discuss how to develop an efficient (possibly approximate) method to compute a query such as: find the Engineering students whose final score is within the top 10% of the class in at least 80% of the CS courses that they have taken.
3. Is it a good idea to merge this data warehouse and the university's current gradebook database system into one big data management/analysis system? Why?
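
To make the query in item 2 concrete, here is a minimal Python sketch of one possible two-pass evaluation: first find each class's score cutoff for the top 10%, then count, per Engineering student, the share of CS classes in which the student meets that cutoff. Everything here is hypothetical: the flat row layout (student, major, course department, class id, final score), the function names, and the sample data are assumptions made for illustration, and the cutoff is computed exactly by sorting. An approximate variant could replace the sort with pre-aggregated per-class score histograms, since bucket counts can be summed across partitions.

```python
from collections import defaultdict


def sample_rows():
    """Hypothetical fact rows: (student, major, course_dept, class_id, final_score)."""
    rows = []
    for class_id in ("CS101", "CS102"):
        for i in range(10):
            major = "Engineering" if i < 5 else "Math"
            rows.append((f"s{i}", major, "CS", class_id, 100 - i))
    return rows


def top_fraction_cutoffs(rows, fraction=0.10):
    """Per class, the lowest score that still ranks within the top `fraction`."""
    scores_by_class = defaultdict(list)
    for _student, _major, _dept, class_id, score in rows:
        scores_by_class[class_id].append(score)
    cutoffs = {}
    for class_id, scores in scores_by_class.items():
        scores.sort(reverse=True)
        keep = max(1, int(len(scores) * fraction))  # at least one student per class
        cutoffs[class_id] = scores[keep - 1]
    return cutoffs


def consistent_top_students(rows, major="Engineering", dept="CS",
                            fraction=0.10, min_share=0.80):
    """Students of `major` in the top `fraction` in at least `min_share` of their `dept` classes."""
    cutoffs = top_fraction_cutoffs(rows, fraction)
    taken = defaultdict(int)
    in_top = defaultdict(int)
    for student, student_major, course_dept, class_id, score in rows:
        if student_major != major or course_dept != dept:
            continue
        taken[student] += 1
        if score >= cutoffs[class_id]:
            in_top[student] += 1
    return sorted(s for s in taken if in_top[s] / taken[s] >= min_share)


if __name__ == "__main__":
    print(consistent_top_students(sample_rows()))
```

The cutoff depends on the rank of every score in the class, so it cannot be maintained from a fixed-size summary the way a sum or an average can; that is the property the holistic-versus-algebraic question turns on.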

Question 4:

A location-based social networking website which provides check-in services hires you to help them build a data warehouse.

Users of this service can "check-in" at venues using mobile device applications by running the applications and selecting from a list of venues that the application locates nearby. Also, users can "add" each other as "friends". The website also has sufficient information about venues, including address, GPS location, and category of the venue (e.g., a Japanese restaurant), and users tend to provide their personal information to the website when they register.

1. Design a data warehouse that may facilitate effective on-line analytical processing for this website (provide both schema and measures, also explain why).

2. Check-in data collected from the website and mobile applications are noisy. Besides network and device errors, are there any other reasons that might cause noise in this data set? For each reason you come up with, discuss a method that can clean up check-in data effectively in the data warehouse.

3. One may like to perform on-line analytical processing on the check-in data at different venues by month, by city, and by category (Italian or Japanese, etc.). How can this be done efficiently in the data warehouse?

4. Hackers create fake profiles on this website. They use bots to manipulate fake profiles, generate fake check-in data, and try to add everyone as their friends (yes, this is a common problem for many social network websites, and no, I am not telling you how to write bots). Although bots try to mimic real users, they still behave differently: they check in at random places (Chicago this minute, Las Vegas the next), they check in far more often than real users, and their social network structures are usually very large but also very sparse (your friends on Facebook tend to form communities, but bots don't do that). Discuss possible solutions for identifying fake profiles (bots) in your data warehouse.
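
One of the behavioral signals described in item 4, checking in at distant places within minutes, can be turned into a simple feature. The Python sketch below counts the consecutive check-ins of one user whose implied travel speed is physically implausible. It is a hedged illustration rather than a complete detector: the function names, the (timestamp, latitude, longitude) tuple layout, and the 1000 km/h threshold are all assumptions, and a real solution would combine this with other signals such as check-in frequency and friend-graph sparsity.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two GPS points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def impossible_hops(checkins, max_kmh=1000.0):
    """Count consecutive check-ins whose implied speed exceeds max_kmh.

    checkins: list of (timestamp_seconds, latitude, longitude) for one user.
    max_kmh is a hypothetical threshold, roughly the speed of a passenger jet.
    """
    ordered = sorted(checkins)
    count = 0
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(ordered, ordered[1:]):
        hours = max((t2 - t1) / 3600.0, 1e-9)  # guard against identical timestamps
        if haversine_km(lat1, lon1, lat2, lon2) / hours > max_kmh:
            count += 1
    return count


if __name__ == "__main__":
    chicago = (41.8781, -87.6298)
    las_vegas = (36.1699, -115.1398)
    suspicious = [(0, *chicago), (60, *las_vegas)]      # one minute apart
    plausible = [(0, *chicago), (6 * 3600, *las_vegas)]  # six hours apart
    print(impossible_hops(suspicious), impossible_hops(plausible))
```

In a warehouse setting, a count like this could be stored as a per-user measure and thresholded or combined with the other features when flagging profiles for review.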

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd