Devise a book recommendation system

Reference no: EM131305434

There are two files uploaded to Blackboard - BX-Books.csv and BX_Book-Ratings.csv. The former contains information about a variety of books, and the latter file contains several hundred thousand book ratings from the Book Crossing Website.

Use R to devise a book recommendation system for the data uploaded to Blackboard. In particular, develop a system that can recommend up to three books for an arbitrary user, whose name can be entered into R after sourcing your code. Develop such a system using both of the following:

(a) User-based collaborative filtering approach. Use the Euclidean, Manhattan, correlation, and cosine similarity measures. What problems (if any) do you run into?

(b) Item-based collaborative filtering approach. Use an adjusted cosine similarity approach as discussed in class. How does this approach compare to the user-based approach?
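The four measures named in (a) can be sketched on two users' rating vectors as follows. This is a minimal illustration, not a prescribed solution: the users and ratings are invented, the helper name co_rated is hypothetical, "correlation" is assumed to mean Pearson correlation, and 0 is assumed to mark an unrated book.

```r
# Ratings for two hypothetical users over the same five books; 0 means "not rated".
alice <- c(8, 0, 6, 10, 0)
bob   <- c(6, 7, 0,  9, 5)

# Apply a measure only to the books that both users actually rated.
co_rated <- function(x, y, f) {
  keep <- x > 0 & y > 0
  if (sum(keep) < 2) return(NA_real_)  # too little overlap to compare
  f(x[keep], y[keep])
}

euclidean <- function(x, y) sqrt(sum((x - y)^2))
manhattan <- function(x, y) sum(abs(x - y))
cosine    <- function(x, y) sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
pearson   <- function(x, y) {
  if (sd(x) == 0 || sd(y) == 0) return(NA_real_)  # undefined for constant ratings
  cor(x, y)
}

co_rated(alice, bob, euclidean)
co_rated(alice, bob, manhattan)
co_rated(alice, bob, cosine)
co_rated(alice, bob, pearson)
```

Note that Euclidean and Manhattan are distances (smaller means more alike), while cosine and correlation are similarities (larger means more alike), so neighbours have to be ranked in opposite directions depending on the measure.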

To load the data into R, use the read.csv function, i.e. read.csv(filename, header=TRUE). Type ?read.csv into the R console for further information on the function's syntax.
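A loading step might look like the sketch below. The helper name load_ratings is hypothetical, and the column order (user id, ISBN, rating) is an assumption about the ratings file rather than something stated in the assignment. The public Book-Crossing dump is semicolon-separated, so the sep argument may need changing depending on how the Blackboard copies were saved. A tiny temporary file stands in for the real data so the code runs on its own.

```r
# Hypothetical helper: read a ratings file with columns user id, ISBN, rating.
load_ratings <- function(path, sep = ",") {
  # Read everything as text so ISBNs keep their leading zeros.
  raw <- read.csv(path, header = TRUE, sep = sep, colClasses = "character")
  data.frame(user   = raw[[1]],
             isbn   = raw[[2]],
             rating = as.numeric(raw[[3]]),
             stringsAsFactors = FALSE)
}

# Small stand-in file (invented rows) in place of the Blackboard data.
tmp <- tempfile(fileext = ".csv")
writeLines(c("User-ID,ISBN,Book-Rating",
             "u1,0001,8",
             "u1,0002,0",
             "u2,0001,6"), tmp)

demo <- load_ratings(tmp)
str(demo)
```

Zero ratings are kept at this stage; they can be masked later, when distances are computed.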

Write your programs as functions, so that a user's name can be entered at the R prompt.
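For the item-based approach in (b), one common formulation of adjusted cosine similarity subtracts each user's mean rating before comparing two books over the users who rated both. The version discussed in class may differ in detail, so treat the following as a sketch under that assumption. The matrix is invented, the function name adjusted_cosine is hypothetical, and unrated books are stored as NA here.

```r
# Toy user x book matrix (hypothetical data); NA marks an unrated book.
R <- matrix(c( 8,  6, NA, 10,
               7, NA,  5,  9,
              NA,  4,  6,  8,
               9,  7,  6, NA),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("u", 1:4), paste0("book", 1:4)))

adjusted_cosine <- function(R, i, j) {
  # Subtract each user's own mean rating (computed over rated books only).
  centered <- R - rowMeans(R, na.rm = TRUE)
  both <- !is.na(centered[, i]) & !is.na(centered[, j])
  if (sum(both) < 2) return(NA_real_)   # too few common raters
  a <- centered[both, i]
  b <- centered[both, j]
  denom <- sqrt(sum(a^2)) * sqrt(sum(b^2))
  if (denom == 0) return(NA_real_)      # no variation to compare
  sum(a * b) / denom
}

# Full item-item similarity matrix for the toy data.
items <- colnames(R)
S <- sapply(items, function(i) sapply(items, function(j) adjusted_cosine(R, i, j)))
round(S, 3)
```

Centering by user compensates for the fact that some users rate everything high and others rate everything low, which plain cosine similarity ignores.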

(c) What are some general problems with both approaches? Conceptually speaking, how can these issues be ameliorated?

Hints:

- There is some flexibility with respect to how you construct the details of your recommendation system beyond your nearest neighbor algorithm. For example, you may use more than one nearest neighbor to make your algorithm better and you can weight the distances appropriately as discussed in class. Please feel free to discuss what your code is doing in a Word document or PDF and submit that along with your assignment. This will make it easier for the grader to understand the logic behind your algorithm.

- Make sure your program ignores zero values for the purposes of computing distances. Otherwise your recommendation system will be influenced by unrated books.

- Use an estimated rating above 5 as the threshold for recommending a book.

- If your model cannot provide any recommendations for a particular individual, then please have it say so. You can discuss this in (c).
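The hints above can be combined into a single user-based function, sketched below. Everything specific in it is an assumption made for illustration: the toy matrix, the function names user_sim and recommend, the use of cosine similarity, k = 2 neighbours, and the similarity-weighted average used to estimate ratings. Unrated books are stored as NA, which plays the role of the zeros in the ratings file.

```r
# Toy user x book matrix (hypothetical data); NA marks an unrated book.
R <- matrix(c( 8,  6, NA, NA, NA,
               9,  7,  8,  3, NA,
               7,  5,  9, NA,  6,
               2,  9,  4,  8, NA,
              NA, NA, NA,  7, NA),
            nrow = 5, byrow = TRUE,
            dimnames = list(c("ann", "ben", "cat", "dan", "eve"),
                            paste0("book", 1:5)))

# Cosine similarity on co-rated books only (one of several possible choices).
user_sim <- function(x, y) {
  keep <- !is.na(x) & !is.na(y)
  if (sum(keep) < 2) return(NA_real_)
  sum(x[keep] * y[keep]) / (sqrt(sum(x[keep]^2)) * sqrt(sum(y[keep]^2)))
}

recommend <- function(R, user, k = 2, threshold = 5, n = 3) {
  if (!user %in% rownames(R)) return(paste("Unknown user:", user))
  others <- setdiff(rownames(R), user)
  sims <- vapply(others, function(o) user_sim(R[user, ], R[o, ]), numeric(1))
  sims <- sort(sims[!is.na(sims) & sims > 0], decreasing = TRUE)
  nbrs <- head(sims, k)                       # the k most similar users
  unseen <- colnames(R)[is.na(R[user, ])]
  # Similarity-weighted mean of the neighbours' ratings for each unseen book.
  est <- vapply(unseen, function(b) {
    r <- R[names(nbrs), b]
    ok <- !is.na(r)
    if (!any(ok)) return(NA_real_)
    sum(nbrs[ok] * r[ok]) / sum(nbrs[ok])
  }, numeric(1))
  est <- sort(est[!is.na(est) & est > threshold], decreasing = TRUE)
  if (length(est) == 0) return(paste("No recommendations available for", user))
  head(est, n)                                # at most n books, best first
}

recommend(R, "ann")
recommend(R, "eve")
```

In this toy data eve has rated only one book, so she shares too few books with anyone to have neighbours, and the function reports that it has nothing to recommend instead of returning an empty result.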

