Implementation of a time-series prediction method

Reference no: EM131031562

Detailed Question: The prediction must use a random forest, implemented in R or Python.

In this project, you are asked to study the general topic of time-series data mining, and specifically time-series trend prediction. This is not a new topic: studies existed well before the advent of data mining research (e.g., in the control theory and pattern recognition literature). Within data mining, however, time-series data mining is considered an advanced topic with many important real-world applications, such as e-commerce, stock analysis, and weather forecasting.

The specific problem in this project is time-series trend prediction, and the application scenario is e-commerce. You are given a real dataset obtained from a real-world e-commerce application with 1000 products and 31490 customers (i.e., buyers) who bought these products. Of these 1000 products, 100 are key (popular) products. The 1000 products fall into 15 categories. The data are given in seven tables, described below. The dataset spans 118 days, with data recorded for each day. The time unit is therefore one day, and the timeline runs from the 0-th day to the 117-th day (17 weeks less one day in total). You are asked to predict the sale quantity of the 100 key products for each day from the 118-th day to the 146-th day (29 days).

-buyer_basic_info.txt: the basic attribute information of the buyers; in particular, the column names of this table are "buyer_id", "registration_time", "seller_level", "buyer_level", "age", and "gender". If we do not know the gender of a buyer, we set this buyer's gender attribute as -1.

-buyer_historical_category15_quantity.txt: the consumption quantities in the 15 categories for the buyers; in particular, the column names of this table are "buyer_id", "consumption quantity in the 1st category", ..., and "consumption quantity in the 15th category". The 15 categories are the ones of the products the customers bought in this dataset.

-buyer_historical_category15_money.txt: the consumption amounts in the 15 categories for the buyers; in particular, the column names of this table are "buyer_id", "consumption amount in the 1st category", ..., and "consumption amount in the 15th category".

-product_features.txt: the basic attribute information of the products; in particular, the column names of this table are "product_id", "attribute_1", "attribute_2", and "original price".

-Key_product_IDs.txt: the key product IDs

-trade_info_training.txt: the trade information between the key products and the buyers from the 0-th day to the 117-th day; in particular, the column names of this table are "product_id", "buyer_id", "trade_time", "trade_quantity", and "trade_price".

-product_distribution_training_set.txt: there are 119 columns, where the 1-st column shows the "product_id" and the 2-nd to the 119-th columns show the "quantities" of the key products from the 0-th day to the 117-th day; for example, the element at the 5-th row and the 10-th column in this table shows the quantity of the 5-th product at the 8-th day.
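The column layout of product_distribution_training_set.txt can be illustrated with a minimal Python sketch. It assumes the file is whitespace-delimited, has no header row, and stores integer quantities; none of this is stated in the description, so adjust the parsing to the real file. The sample rows are made up, and `load_product_distribution` is a hypothetical helper name.

```python
import io

def load_product_distribution(handle):
    """Parse rows of `product_id q_day0 q_day1 ...` into {product_id: [quantities]}."""
    table = {}
    for line in handle:
        fields = line.split()  # assumption: whitespace-delimited, no header row
        if not fields:
            continue  # skip blank lines
        product_id = int(fields[0])
        table[product_id] = [int(value) for value in fields[1:]]
    return table

# Tiny made-up sample with 3 products and 4 days, standing in for the real file.
sample = io.StringIO("1 0 2 1 3\n5 4 0 0 1\n9 1 1 2 0\n")
quantities = load_product_distribution(sample)

# File column c (1-based) holds day c - 2, so day d sits at list index d.
day = 3
print(quantities[5][day])  # quantity of product 5 on day 3

# Overall daily sales: the sum over all key products for each day.
overall = [sum(per_day) for per_day in zip(*quantities.values())]
print(overall)
```

With the real file, the same function could be called on an opened file handle, and each list would hold 118 daily quantities.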

Students are asked to predict the overall sale quantity of the 100 key products for each day of the time window from the 118-th day to the 146-th day, and also the sale quantity of each key product for each day of the time window.
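One possible way to approach this task with a random forest is sketched below in Python. It assumes scikit-learn and NumPy are available, uses the previous seven days as lag features, and forecasts recursively by feeding each prediction back in as an input. The lag count, the number of trees, and the synthetic series are illustrative assumptions, not part of the assignment.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def make_lag_matrix(series, n_lags):
    """Build (X, y) where each row of X holds the n_lags values preceding y."""
    X = [series[i - n_lags:i] for i in range(n_lags, len(series))]
    y = [series[i] for i in range(n_lags, len(series))]
    return np.array(X, dtype=float), np.array(y, dtype=float)

def forecast_recursive(series, horizon, n_lags=7):
    """Fit on the history, then feed each prediction back in as a lag input."""
    X, y = make_lag_matrix(series, n_lags)
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X, y)
    window = list(series[-n_lags:])  # most recent n_lags observations
    predictions = []
    for _ in range(horizon):
        next_value = float(model.predict(np.array([window], dtype=float))[0])
        predictions.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward one day
    return predictions

# Synthetic 118-day series with a weekly pattern (made-up data, not the real dataset).
history = [10 + (day % 7) for day in range(118)]
forecast = forecast_recursive(history, horizon=29)  # days 118 through 146
print(len(forecast))
```

The same function could be applied to each key product's series and to the overall series. A random forest cannot predict values outside the range seen in training, which matters if sales show a strong upward or downward trend.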

You are given 10 minutes for the presentation. In the presentation, you must give the following information:

-Explain conceptually what time-series data mining is about
-Showcase the specific problem and the specific method you have implemented or developed as a solution to the problem you are given
-Demonstrate your implementation results in the prediction

The second phase is the coding part of the project and concerns the implementation of a time-series prediction method that you either take from the literature or have developed yourself as the result of your research in the first phase. You may use any programming language to implement the method, and you may use any existing libraries.

The first two phases begin at the beginning of the semester, and the due date for turning in the coding results is 24 April. Please follow the format requirement for the text output file specified here. The file puts each prediction on one line: the first line is the overall prediction, and each subsequent line is the prediction for one key product. Each line begins with the key product id, where the id of the overall prediction is 0. The id is followed by a space and then the daily predictions, with a space between the predictions of neighboring days. After the overall line (product id 0), the lines run from the key product with the smallest product id (i.e., 1) to the last key product (i.e., id = 964). Note that for undergrad students the output file has only one line, the overall prediction beginning with product id = 0.
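The output format described above can be sketched in Python as follows. The function name `format_prediction_lines` is hypothetical, the predictions are made up, and the sketch assumes predictions are written as integers, which the format description does not specify.

```python
def format_prediction_lines(overall, per_product):
    """Return output lines: overall (id 0) first, then products by ascending id."""
    rows = [(0, overall)] + sorted(per_product.items())
    return [
        " ".join(str(item) for item in [product_id, *values])
        for product_id, values in rows
    ]

# Made-up predictions for 3 days and 2 products (the real file has 29 days).
per_product = {964: [1, 0, 2], 1: [3, 3, 4]}
overall = [4, 3, 6]
lines = format_prediction_lines(overall, per_product)
print("\n".join(lines))
```

Writing `"\n".join(lines)` to a text file would then give one prediction per line. For the undergraduate variant, passing an empty dictionary as `per_product` leaves only the overall line.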

What you need to turn in: a zipped package containing the source code of your implementation of the prediction method with appropriate comments and documentation in the code, a README file explaining how to compile and run your code and in what specific environment, and a text file containing the output matrix following exactly the format requirement stated above.

Verified Expert

Provides clear workings on exponential time series and future predictions. Various forecasting techniques were compared, and the appropriate technique for model building was selected, using R.
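The summary above does not name the exponential method used. Assuming simple exponential smoothing is one of the techniques meant, a minimal Python sketch looks like this; the smoothing factor `alpha` and the sample numbers are illustrative assumptions, and this is not the expert's R code.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after the last observation."""
    level = series[0]  # initialise the level with the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def flat_forecast(series, alpha, horizon):
    """Simple exponential smoothing forecasts the final level for every future step."""
    level = simple_exponential_smoothing(series, alpha)
    return [level] * horizon

# Made-up daily quantities, not taken from the dataset.
level = simple_exponential_smoothing([2, 4, 8], alpha=0.5)
print(level)
future = flat_forecast([2, 4, 8], alpha=0.5, horizon=29)
print(len(future))
```

Because the forecast is flat, this method serves mainly as a baseline to compare against other techniques on a held-out part of the 118-day history.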

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd