Create a graph by coloring each test set point orange

Reference no: EM131889357

Assignment

1. R project for building a naive Bayes model: Continuing the Twitter theme, the accompanying input file Q2in.csv gives the frequencies of certain words (listed in the first row). Each line corresponds to the data gathered for one business day. The column SP500 indicates whether the S&P 500 index was up or down on that day. This data is only a simulation.

a) Download the data to your computer and read it into a data frame called Q2dat.

b) Print the head and tail of Q2dat to make sure the data was read correctly.

c) Using the sample function, select 80% of the data and assign it to a data frame called Q2datTrain. The remainder of the data should be assigned to a data frame called Q2datTest.
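
The split in part c) can be sketched as follows. Q2in.csv is not reproduced here, so the code simulates a stand-in data frame; the column names Buy and Sell, the 200 rows, and the seed are assumptions for illustration only.

```r
# Stand-in for Q2dat (the real data would come from read.csv("Q2in.csv")).
# The columns Buy and Sell and the 200 rows are assumptions for illustration.
set.seed(1)
n <- 200
Q2dat <- data.frame(
  Buy   = rpois(n, 400),
  Sell  = rpois(n, 380),
  SP500 = sample(c("up", "down"), n, replace = TRUE)
)

# Draw 80% of the row indices, without replacement, for the training set.
trainIdx <- sample(nrow(Q2dat), size = round(0.8 * nrow(Q2dat)))

Q2datTrain <- Q2dat[trainIdx, ]   # 80% of the rows
Q2datTest  <- Q2dat[-trainIdx, ]  # the remaining 20%

cat("train rows:", nrow(Q2datTrain), " test rows:", nrow(Q2datTest), "\n")
```

Negative indexing with the same index vector guarantees that the two data frames share no rows.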

If you have not already installed the naivebayes package, install it now, then load it.

d) Run the naive_bayes function assuming each column is normally distributed. Use Q2datTrain as your training set. Call the resulting model NBmodelNormal. Next, predict whether the stock market will go up or down by applying the model to Q2datTest. Find the empirical error rate of the predicted values.
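
One possible shape for part d) is sketched below. It is not a reference solution. The data is simulated (the columns Buy and Sell and their distributions are assumptions), and the code relies on naive_bayes fitting a Gaussian to each numeric column when usekernel = FALSE.

```r
# Install the package only if it is missing, then load it.
if (!requireNamespace("naivebayes", quietly = TRUE)) install.packages("naivebayes")
library(naivebayes)

# Stand-in data (assumption): two word-frequency columns whose means shift with SP500.
set.seed(2)
n  <- 300
up <- sample(c(TRUE, FALSE), n, replace = TRUE)
Q2dat <- data.frame(
  Buy   = rnorm(n, mean = ifelse(up, 410, 390), sd = 10),
  Sell  = rnorm(n, mean = ifelse(up, 385, 400), sd = 10),
  SP500 = factor(ifelse(up, "up", "down"))
)
trainIdx   <- sample(n, size = round(0.8 * n))
Q2datTrain <- Q2dat[trainIdx, ]
Q2datTest  <- Q2dat[-trainIdx, ]

# usekernel = FALSE (the default) models each numeric column as Gaussian per class.
NBmodelNormal <- naive_bayes(SP500 ~ ., data = Q2datTrain, usekernel = FALSE)

# Predict class labels for the test rows, passing only the feature columns.
features   <- setdiff(names(Q2datTest), "SP500")
predNormal <- predict(NBmodelNormal,
                      newdata = Q2datTest[, features, drop = FALSE],
                      type = "class")

# Empirical error rate: the fraction of test rows whose prediction is wrong.
errNormal <- mean(as.character(predNormal) != as.character(Q2datTest$SP500))
cat("Gaussian naive Bayes test error rate:", errNormal, "\n")
```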

e) Repeat part 1d), but this time make no assumption about the distribution of the data. Call this model NBmodelKern. Compute the error rate and compare it with the result for the normal model in 1d).
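
For part e), one plausible reading of "no assumption about the distribution" is a kernel density estimate per column and class, which naive_bayes provides through usekernel = TRUE. The sketch below compares the two fits on simulated, deliberately skewed data; the data, the column names, and the helper errorRate are assumptions, not part of the assignment files.

```r
# Install the package only if it is missing, then load it.
if (!requireNamespace("naivebayes", quietly = TRUE)) install.packages("naivebayes")
library(naivebayes)

# Stand-in data (assumption): skewed gamma-distributed features.
set.seed(3)
n  <- 300
up <- sample(c(TRUE, FALSE), n, replace = TRUE)
Q2dat <- data.frame(
  Buy   = rgamma(n, shape = ifelse(up, 4, 2), rate = 0.01),
  Sell  = rgamma(n, shape = ifelse(up, 2, 4), rate = 0.01),
  SP500 = factor(ifelse(up, "up", "down"))
)
trainIdx   <- sample(n, size = round(0.8 * n))
Q2datTrain <- Q2dat[trainIdx, ]
Q2datTest  <- Q2dat[-trainIdx, ]

# Hypothetical helper: test error rate of a fitted naive Bayes model.
errorRate <- function(model, test) {
  features <- setdiff(names(test), "SP500")
  pred <- predict(model, newdata = test[, features, drop = FALSE], type = "class")
  mean(as.character(pred) != as.character(test$SP500))
}

# Gaussian fit (as in part d) and kernel density fit (part e).
NBmodelNormal <- naive_bayes(SP500 ~ ., data = Q2datTrain, usekernel = FALSE)
NBmodelKern   <- naive_bayes(SP500 ~ ., data = Q2datTrain, usekernel = TRUE)

errNormal <- errorRate(NBmodelNormal, Q2datTest)
errKern   <- errorRate(NBmodelKern, Q2datTest)
cat("Gaussian error:", errNormal, " kernel error:", errKern, "\n")
```

Which model does better depends on the data and on the random split, so the comparison should be read from the actual Q2in.csv results rather than from this simulation.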

f) Repeat part 1d), but this time turn each column into a non-numerical factor. Specifically, turn each column (other than the response column SP500, of course) of both the Q2datTrain and Q2datTest data frames into a binary variable as follows: if the number in a column is larger than or equal to the median of that column, replace it with 1; otherwise replace it with 0. For instance, suppose the column under Buy has only five values (394, 407, 398, 409, 373). Then the median is 398, so this sequence is replaced by (0, 1, 1, 1, 0). Use the R functions sapply and median to accomplish this. Build a naive Bayes model based on the frequency of zeros and ones in this table (don't forget to make the zeros and ones into a factor). Call this model NBmodelBinary. Test your results on the test data and report the error rate. Compare this error rate with the two previous cases.
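
The median rule in part f) can be checked on the five Buy values from the example. The sketch below uses only base R; the helper name toBinary, the Sell column, and the tiny stand-in data frame are assumptions for illustration.

```r
# Hypothetical helper: 1 if a value is at or above its column median, else 0.
toBinary <- function(x) as.integer(x >= median(x))

# The worked example from the assignment text: the median is 398.
buy <- c(394, 407, 398, 409, 373)
print(toBinary(buy))   # 0 1 1 1 0

# Stand-in training frame (assumption); Sell has median 101.
Q2datTrain <- data.frame(
  Buy   = c(394, 407, 398, 409, 373),
  Sell  = c(120, 95, 101, 99, 130),
  SP500 = c("up", "down", "up", "up", "down")
)

# Binarize every feature column with sapply, leaving SP500 untouched.
features <- setdiff(names(Q2datTrain), "SP500")
binMat   <- sapply(Q2datTrain[features], toBinary)   # 0/1 matrix, one column per feature

# Convert the 0/1 columns to factors so they are treated as categories.
Q2binTrain <- as.data.frame(lapply(as.data.frame(binMat), factor, levels = c(0, 1)))
Q2binTrain$SP500 <- factor(Q2datTrain$SP500)
str(Q2binTrain)
```

The same steps would be applied to Q2datTest. Read literally, the assignment uses each data frame's own column medians, which is what this sketch does; reusing the training medians for the test set would be a different design choice.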

2. R project for building a k-NN model: All answers should be output by your R script, including the last question. A simulated data set called blueOrangeIn.csv accompanies this question. The data set has two continuous feature variables X1, X2 and a categorical response variable Y with two values, "Blue" and "Orange". The original data is drawn from random points in a 3 × 3 chessboard colored alternately blue and orange, with some noise and inaccuracy injected into it.

a) Using the read.csv command read this data into a data frame called Q3dat. Print the head and tail of the Q3dat data frame to make sure it is read correctly.

b) Using the kknn package, build six models for k = 1, 10, 100, 1000, 2500, 3500. For the test set, create a 40×40 grid by taking 40 equally spaced values across the range of X1 and across the range of X2. The 1600 new points will form the test set, which should also be used for graphing the results in the following questions. (Suggestion: Start with only a small portion of the data, say a random subset of 500 rows. Write your program for that small set. Once you are sure it works, run it on the full set. Each run on the full set may take several minutes.)
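
A minimal sketch of the 40×40 test grid, using base R only. Since blueOrangeIn.csv is not reproduced here, Q3dat is simulated as noise-free chessboard points on [0, 3] × [0, 3]; that range and the labeling rule are assumptions.

```r
# Stand-in for Q3dat (assumption): uniform points with chessboard labels.
set.seed(4)
n <- 500
Q3dat <- data.frame(X1 = runif(n, 0, 3), X2 = runif(n, 0, 3))
Q3dat$Y <- factor(ifelse((floor(Q3dat$X1) + floor(Q3dat$X2)) %% 2 == 0,
                         "Blue", "Orange"))

# 40 equally spaced values across the observed range of each feature.
x1Grid <- seq(min(Q3dat$X1), max(Q3dat$X1), length.out = 40)
x2Grid <- seq(min(Q3dat$X2), max(Q3dat$X2), length.out = 40)

# All 40 x 40 combinations; X1 varies fastest in the resulting data frame.
testGrid <- expand.grid(X1 = x1Grid, X2 = x2Grid)
print(dim(testGrid))
```

Keeping x1Grid and x2Grid around is useful later, because contour-style boundary plots need the two axis vectors as well as the predictions.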

c) For each value of k listed in part 2b), create a graph by coloring each test set point orange or blue based on the predicted value. Also draw the boundary between the orange and blue points. You may use the posted knn2.r file as your template.
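
The plot in part c) might look like the sketch below for a single k. It does not reproduce knn2.r, which is not available here. The data is simulated, and kernel = "rectangular" is an assumption chosen to get unweighted k-NN (the default kernel in kknn is "optimal"). The boundary is drawn as the 0.5 contour of a 0/1 indicator of the predicted class.

```r
# Install the package only if it is missing, then load it.
if (!requireNamespace("kknn", quietly = TRUE)) install.packages("kknn")
library(kknn)

# Stand-in for Q3dat (assumption): uniform points with chessboard labels.
set.seed(5)
n <- 500
Q3dat <- data.frame(X1 = runif(n, 0, 3), X2 = runif(n, 0, 3))
Q3dat$Y <- factor(ifelse((floor(Q3dat$X1) + floor(Q3dat$X2)) %% 2 == 0,
                         "Blue", "Orange"))

# 40 x 40 test grid over the observed feature ranges.
x1Grid   <- seq(min(Q3dat$X1), max(Q3dat$X1), length.out = 40)
x2Grid   <- seq(min(Q3dat$X2), max(Q3dat$X2), length.out = 40)
testGrid <- expand.grid(X1 = x1Grid, X2 = x2Grid)

k   <- 10   # one of the requested values; the others would go in a loop
fit <- kknn(Y ~ X1 + X2, train = Q3dat, test = testGrid,
            k = k, kernel = "rectangular")
pred <- fitted(fit)   # predicted class for each of the 1600 grid points

# Color each grid point by its predicted class.
plot(testGrid$X1, testGrid$X2,
     col = ifelse(pred == "Orange", "orange", "blue"),
     pch = 20, cex = 0.6, xlab = "X1", ylab = "X2",
     main = paste("k-NN predictions, k =", k))

# Rows of z follow x1Grid and columns follow x2Grid, matching expand.grid order.
z <- matrix(as.integer(pred == "Orange"), nrow = 40, ncol = 40)
contour(x1Grid, x2Grid, z, levels = 0.5, add = TRUE,
        drawlabels = FALSE, lwd = 2)
```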

d) Use the validation set technique to find a near-optimal k for the k-NN method for this data. To this end, use values of k from 1 to 901 in steps of 100, that is, test k = 1, 101, 201, ..., 901. For each k, run 10 experiments in which you choose 1000 random items from the Q3dat data frame as your training set and the remaining items as your test set. For each run, build a k-NN model, test it on the test set, and find the number of misclassified orange items, the number of misclassified blue items, and the total number of misclassified items. Take the average over these experiments and divide, respectively, by the number of orange items, the number of blue items, and the total number of items in the test set. Collect these three rates, along with the values of k, in vectors. When done, find the best k, that is, the one resulting in the lowest error rate. Also, on the same graphics panel, graph the orange error rate (proportion of orange points incorrectly classified as blue among all orange points), the blue error rate (proportion of blue points incorrectly classified as orange among all blue points), and the total error rate against the values of k. You may use the file knnValidation.r as your template.
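
The validation loop in part d) can be sketched as below. It is not a copy of knnValidation.r, which is not available here. The data is simulated with 1500 rows (an assumption), kernel = "rectangular" is assumed for unweighted k-NN, and as a simplification the code averages per-run error rates, which is close to but not identical to averaging the counts first and dividing afterwards.

```r
# Install the package only if it is missing, then load it.
if (!requireNamespace("kknn", quietly = TRUE)) install.packages("kknn")
library(kknn)

# Stand-in for Q3dat (assumption): uniform points with chessboard labels.
set.seed(6)
n <- 1500
Q3dat <- data.frame(X1 = runif(n, 0, 3), X2 = runif(n, 0, 3))
Q3dat$Y <- factor(ifelse((floor(Q3dat$X1) + floor(Q3dat$X2)) %% 2 == 0,
                         "Blue", "Orange"))

kValues <- seq(1, 901, by = 100)   # 1, 101, ..., 901
nRuns   <- 10
orangeErr <- blueErr <- totalErr <- numeric(length(kValues))

for (i in seq_along(kValues)) {
  # Each run returns the three error rates for one random train/test split.
  runs <- replicate(nRuns, {
    trainIdx <- sample(nrow(Q3dat), 1000)
    train <- Q3dat[trainIdx, ]
    test  <- Q3dat[-trainIdx, ]
    pred  <- fitted(kknn(Y ~ X1 + X2, train = train, test = test,
                         k = kValues[i], kernel = "rectangular"))
    truth <- as.character(test$Y)
    wrong <- as.character(pred) != truth
    c(orange = sum(wrong & truth == "Orange") / sum(truth == "Orange"),
      blue   = sum(wrong & truth == "Blue") / sum(truth == "Blue"),
      total  = mean(wrong))
  })
  avg <- rowMeans(runs)   # average each rate over the nRuns experiments
  orangeErr[i] <- avg["orange"]
  blueErr[i]   <- avg["blue"]
  totalErr[i]  <- avg["total"]
}

# Best k: the one with the lowest average total error rate.
bestK <- kValues[which.min(totalErr)]
cat("best k:", bestK, "\n")

# All three curves on one panel.
matplot(kValues, cbind(orangeErr, blueErr, totalErr), type = "b", pch = 1,
        lty = 1, col = c("orange", "blue", "black"),
        xlab = "k", ylab = "error rate")
legend("topleft", legend = c("orange", "blue", "total"),
       col = c("orange", "blue", "black"), lty = 1)
```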

e) Of the three curves in the last problem, one should be increasing with k, one decreasing with k, and one should start roughly high for small k, then hit a minimum, stay roughly flat, and then start moving up again. For each of the first two curves, explain why you observe a monotonically increasing or decreasing curve.
