Implementation of a time-series prediction method

Reference no: EM131031562

Detailed Question: Use random forest prediction, implemented in either the R or the Python programming language.

In this project, you are asked to study the general topic of time-series data mining, and specifically time-series trend prediction. This is not a new topic: studies existed well before data mining emerged as a research field (e.g., in the control theory and pattern recognition literature). Within data mining, however, time-series data mining is considered an advanced topic with many important and active real-world applications, such as e-commerce, stock analysis, and weather forecasting.

The specific problem in this project is time-series trend prediction in an e-commerce setting. You are given a real dataset from a real-world e-commerce application with 1000 products and 31490 customers (i.e., buyers) who bought them. Of the 1000 products, 100 are key (popular) products, and the 1000 products fall into 15 categories. The data are given in seven tables, described below. The dataset covers a time window of 118 days, with data recorded for each day, so the time unit is one day and the timeline runs from the 0-th day to the 117-th day (17 weeks less one day in total). You are asked to predict the sale quantity of the 100 key products for each day from the 118-th day to the 146-th day (29 days).

-buyer_basic_info.txt: the basic attribute information of the buyers; in particular, the column names of this table are "buyer_id", "registration_time", "seller_level", "buyer_level", "age", and "gender". If we do not know the gender of a buyer, we set this buyer's gender attribute as -1.

-buyer_historical_category15_quantity.txt: the consumption quantities in the 15 categories for the buyers; in particular, the column names of this table are "buyer_id", "consumption quantity in the 1st category", ..., and "consumption quantity in the 15th category". The 15 categories are the ones of the products the customers bought in this dataset.

-buyer_historical_category15_money.txt: the consumption amounts in the 15 categories for the buyers; in particular, the column names of this table are "buyer_id", "consumption amount in the 1st category", ..., and "consumption amount in the 15th category".

-product_features.txt: the basic attribute information of the products; in particular, the column names of this table are "product_id", "attribute_1", "attribute_2", and "original price".

-Key_product_IDs.txt: the key product IDs

-trade_info_training.txt: the trade information between the key products and the buyers from the 0-th day to the 117-th day; in particular, the column names of this table are "product_id", "buyer_id", "trade_time", "trade_quantity", and "trade_price".

-product_distribution_training_set.txt: there are 119 columns, where the 1st column shows the "product_id" and the 2nd to the 119th columns show the "quantities" of the key products from the 0-th day to the 117-th day. Because the 1st column holds the ID and the 2nd column holds the 0-th day, the n-th column (n >= 2) holds the (n-2)-th day; for example, the element at the 5th row and the 10th column shows the quantity of the 5th product on the 8-th day.
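The column layout of product_distribution_training_set.txt can be sketched in Python as follows. This is a minimal sketch, not part of the assignment: it assumes the file is whitespace-delimited, has no header row, and stores whole-number quantities, none of which the description states. The function names are hypothetical. Summing the per-product series is shown as one plausible way to obtain the overall daily quantity.

```python
import io

N_TRAIN_DAYS = 118  # days 0..117, per the dataset description


def load_product_distribution(handle):
    """Return {product_id: [quantity on day 0, ..., quantity on day 117]}."""
    series = {}
    for line in handle:
        fields = line.split()  # assumption: whitespace-delimited, no header row
        if not fields:
            continue  # skip blank lines
        product_id = int(fields[0])  # column 1 is the product ID
        quantities = [int(value) for value in fields[1:]]  # assumption: integer counts
        if len(quantities) != N_TRAIN_DAYS:
            raise ValueError(
                f"product {product_id}: expected {N_TRAIN_DAYS} days, "
                f"got {len(quantities)}"
            )
        series[product_id] = quantities
    return series


def overall_series(series):
    """Sum the per-product series day by day."""
    return [sum(day) for day in zip(*series.values())]


# Synthetic two-product table standing in for the real file.
sample = io.StringIO(
    "1 " + " ".join(str(day) for day in range(N_TRAIN_DAYS)) + "\n"
    "2 " + " ".join("1" for _ in range(N_TRAIN_DAYS)) + "\n"
)
series = load_product_distribution(sample)
overall = overall_series(series)
print(series[1][8])  # list index 8 is day 8, i.e. column 10 of the file
```

With the real data, the same function could be called on an open file handle, e.g. `with open("product_distribution_training_set.txt") as f: series = load_product_distribution(f)`.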

Students are asked to predict the overall sale quantity of the 100 key products for each day of the time window from the 118-th day to the 146-th day, and also the sale quantity of each individual key product for each day of that window.
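One common way to apply a random forest to this kind of task is to turn each daily series into lagged feature rows and then forecast recursively, feeding each prediction back in as the newest lag. The sketch below uses scikit-learn's `RandomForestRegressor`; it is an illustration under stated assumptions, not the method used in any submitted solution. The lag count, the number of trees, and the function names are hypothetical choices, and the history is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

N_LAGS = 14    # hypothetical choice: two weeks of history per training row
HORIZON = 29   # days 118..146


def make_lag_matrix(values, n_lags):
    """Each row holds n_lags consecutive values; the target is the next value."""
    rows = [values[i:i + n_lags] for i in range(len(values) - n_lags)]
    targets = values[n_lags:]
    return np.array(rows, dtype=float), np.array(targets, dtype=float)


def forecast_recursive(values, horizon=HORIZON, n_lags=N_LAGS, seed=0):
    """Fit a random forest on lagged values, then roll it forward day by day."""
    values = list(values)
    X, y = make_lag_matrix(values, n_lags)
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X, y)

    window = values[-n_lags:]  # the most recent n_lags observations
    predictions = []
    for _ in range(horizon):
        next_value = float(model.predict(np.array([window], dtype=float))[0])
        predictions.append(next_value)
        window = window[1:] + [next_value]  # the prediction becomes the newest lag
    return predictions


# Synthetic 118-day history with a weekly bump, standing in for one product.
history = [10 + 5 * (day % 7 in (5, 6)) + (day % 3) for day in range(118)]
predictions = forecast_recursive(history)
print(len(predictions))
```

A random forest predicts by averaging training targets, so its forecasts stay within the range of quantities seen in training; a series with a strong upward or downward trend may need differencing or another model. The same function can be run once per key product and once on the overall series.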

You are given 10 minutes for the presentation. In the presentation, you must give the following information:

-Explain conceptually what time-series data mining is about
-Showcase the specific problem and the specific method you have implemented or developed as a solution to the problem you are given
-Demonstrate your implementation results in the prediction

The second phase is the coding part of the project and concerns the implementation of a time-series prediction method that you either take from the literature or have developed yourself as the result of your research in the first phase. You may use any programming language to implement the method, and you may also use any existing libraries.

The first two phases begin at the beginning of the semester, and the due date for turning in the coding results is 24 April. Please make sure to follow the format requirement for the text output file specified here. The file puts each prediction on one line: the first line is the overall prediction, and each subsequent line is the prediction for one key product. Each line begins with the key product id, where the id for the overall prediction is 0. There is a space between the product id and the first prediction, and a space between the predictions of any two neighboring days. After the overall line (product id 0), the lines follow in order of product id, from the key product with the smallest id (i.e., 1) to the last key product (i.e., id = 964). Also note that for undergrad students the output file has only one line, the overall prediction beginning with product id = 0.
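The output layout described above can be sketched in Python as follows. This is a minimal sketch with a hypothetical function name. Rounding each prediction to a whole number is an assumption, since the format description does not say whether fractional quantities are accepted.

```python
def format_predictions(overall, per_product):
    """Build the output text: id 0 (overall) first, then products by ascending id.

    overall     -- list of daily predictions for the overall sale quantity
    per_product -- {product_id: list of daily predictions}; may be empty
    """
    rows = [(0, overall)] + sorted(per_product.items())
    lines = []
    for product_id, predictions in rows:
        # Assumption: quantities are reported as whole numbers.
        fields = [str(product_id)] + [str(int(round(p))) for p in predictions]
        lines.append(" ".join(fields))  # single space between all fields
    return "\n".join(lines) + "\n"


# Toy predictions for two key products over the 29-day window.
per_product = {964: [1.2] * 29, 1: [3.7] * 29}
overall = [a + b for a, b in zip(per_product[1], per_product[964])]
text = format_predictions(overall, per_product)
lines = text.splitlines()
print(lines[0])
```

Passing an empty dictionary as `per_product` yields the single overall line required for undergrad submissions.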

What you need to turn in: You shall turn in a zipped package containing the source code of your implementation of the prediction method with appropriate comments and documentation in the code, a README file explaining how to compile and run your code and under what specific environment, and a text file containing the output matrix following exactly the format requirement stated above.

Verified Expert

Provides clear workings on exponential time series and future predictions. Various forecasting techniques were compared in R, and the appropriate technique was selected for model building.






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd