Write a java program called salesinfominer

Assignment Help JAVA Programming
Reference no: EM13129935

Mining information from sales data is a common task in both traditional and online stores. Past sales information can guide purchasing and enable the company to make recommendations to customers about what they may like to buy.

Part 1 performs two data mining tasks. First, it determines the ranking of products according to how many of each product has been sold. Second, it also determines the five other products most commonly bought with each product.

While these tasks are not particularly complex, performing these tasks on large data sets is computationally expensive and the use of data structures and algorithms that facilitate fast searching and sorting is desirable.

Requirements

Write a Java program called SalesInfoMiner that reads in a file of products, stored in text format (2 lines per entry). The program must then read in information about past sales transactions and output all the products to a new file, sorted by product description, along with their sales rank and the five other products with which they were most commonly purchased.

The program accepts four command line arguments and must not expect any user interaction from the keyboard. The first parameter to the program is the name of the product list to read in; the second parameter is the file containing information about past transactions. The third
parameter is the name of the file to which the ordered product list will be written, and the fourth parameter is the name of the file to which any error or warning messages will be output.

Product File

This file contains data for one or more products

-  Each product occupies two lines
-  Line 1 contains the product id, which is a string of 13 digits
-  Product IDs are unique
-  Line 2 is the description of the product, which is not an empty string
-  You can assume that the file is not empty.

Thus, the data for a product has the format shown below:

<ProductID> (A thirteen digit string)
<Product Description> (A text string)

For example:

1234567890123
Milk - full cream 600ml
9832983102938
Bread - wholemeal loaf
1299128391283
Eggs, Extra Large, Barn laid

Sales Transaction File

This file contains data for one or more sale transactions

-  Each sale transaction has 2 + n lines, where n is the number of items in this sale transaction (as specified on line 2 of the sale transaction)

-  Line 1 has the format Receipt number: <receipt number> where <receipt number> is a positive integer
-  Receipt numbers are unique
-  Line 2 has the format Number of items: <number of items> where <number of items> is a positive integer
-  Each sale item occupies a line, which contains the product id of the item
-  All product IDs in a sale transaction are distinct and must exist in the product file
-  You can assume that the file is not empty

Thus, data for a sale has the format shown below:

Receipt number: <ReceiptNumber> (<ReceiptNumber> is an integer)
Number of items: <NumberItems> (<NumberItems> is an integer)
<ProductID>
...

For example:

Receipt number: 1
Number of items: 3
1234567890123
9832983102938
1299128391283
Receipt number: 2
Number of items: 4
9832983102938
1234567890123
1299128391283
1234567890123

Product Output File

-  For each product, the program determines its sale transaction count, i.e. number of transactions that the product appears in.
-  The program lists the products in an output file, sorted by sale transaction count, in decreasing order.
-  When there is a tie in the sale transaction count, the products are listed in alphabetical order of their descriptions (see example below)
-  For each product, the program also lists the top 5 products that are most frequently bought with this product.
-  The "top 5" list may have less than 5 products. It can also have for than 5 due to ties.

Whenever there is a tie in the frequency, the products are listed in alphabetical order of their descriptions (see example below)

-  The format is as shown below

Product: <Product Description>
ID: <ProductID>
Sales Count: <SalesCount>
This product was purchased most often with:
1: (<ProductID>) - <Product Description>
2: (<ProductID>) - <Product Description>
3: (<ProductID>) - <Product Description>
4: (<ProductID>) - <Product Description>
5: (<ProductID>) - <Product Description>

For example:

Product: Milk - full cream 600ml
ID: 1234567890123
Sales Count: 82
This product was purchased most often with:
1: (1299128391283) Eggs, Extra Large, Barn laid [8]
2: (9832983102938) Bread - wholemeal loaf [5]
3: (3289742398473) Apples - Golden Delicious [4]
4: (9348029348934) Chocolate, Dark, 200g [4]
5: (3024823094839) Noodles, Udon - Fresh 250g [4]
6: (0002011106917) Nuts, almonds, dry roasted [4]
Product: Pork - cured, ham, separable fat, boneless
ID: 0000140956153
Sales Count: 82
This product was purchased most often with:
. . .

"Sales count" is the number of transactions that a product appears in. The numbers in square brackets are the number of transactions a product appears with another product

Calculating the most frequent items purchased with a product: With every product, we need to output the 5 other items with which it was most frequently purchased. These other items must be listed in order of frequency. The item listed at position 1 would have been bought more frequently with the item than the item at position 2, and so on. Though we are interested in the "top 5", the list may have more than five items due to ties. In that case, more than five items are to be included and items involved in a tie are to be ordered by product description. If a product has not been bought with 5 other items, then less than 5 items are output.

Error File
In the sales transaction file, a sales transaction "record" is considered to include all the lines from the line starting with "Receipt number"
? Up to, but not included, the next line starting with "Receipt number:"
? Or up to the end of the file.
When a particular line does not conform, as expected, to the description given above,
? the error should be detected
? an error message is written to the error file (the fourth command-line argument), which indicate
o the type of the error
o the file in which the error occurs
o and the line number of the line on which the error occurs
? no result (i.e. listing of products as described in Section B) is output
? and the program terminates

Based on the specification in Section A, we can have the following types of data error (in addition to errors such as file does not exists, etc.)

For product file:

1. Product ID is invalid - not consisting of 13 digits
2. Product description is empty/missing
3. Product ID is not unique

For sales transaction file:
1. Line 1 of a sale transaction does not start with "Receipt number:"
2. Receipt number (on line 1) is not a positive integer
3. Line 2 of a sale transaction does not start with "Number of items:"
4. Number of items (on line 2) is not a positive integer
5. Product ID does not consists of 13 digits
6. Product ID in a sale transaction has duplicates
7. Product ID does not exist in the product file
8. There are not enough product ID's listed for a particular transaction (i.e. less than the number listed on line 2)

Note: If there are more product ID's listed than expected, we could detect it with the error of type 1 above.

The error messages will have the following format:

ERROR: <Error Name>
<Error Description>
<Solution>

For example:
ERROR: Invalid Product ID
Product 8763726273678 in receipt 4 is not in the product file
Product discarded, processing of transaction continued


You can assume that in the sales transaction file
o There are no blank lines
o There are no duplicated receipt numbers

Restriction on the Use of Classes in the Java Collection Framework

Except for ArrayList and Linkedlist, you are not permitted to use any other classes in the Java Collection Framework.

Testing Your Program

You will test your program against some large data sets, which will be posted on LMS. When testing your program against large data set, please do not run your program on the latcs6 server, use the simula server instead. For some large data sets you may need to request, especially when you test your program on Windows, more memory by using the -Xmx option of the java command.

Execution of the Program

The program will be run with the following command:

java SalesInfoMiner <products> <transactions> <output-file> <error-log>

If an incorrect number of arguments is supplied, then the program must output an error message explaining the arguments required before terminating.

Important Notes:

1. You must not submit your programs in a zip file, or similarly compressed file.

2. Your classes must not belong to a package (i.e. having the package statement at the beginning of the class.

3. If your program does not compile, you must write so in the report.

Failing to comply with any of these requirements will result in 10 marks being deducted from your assignment.

Part 2

(a) In terms of the big-oh notation, what is the complexity of the gnome sort for the average case? Justify your answer. The justification can be based on approximate calculations.

Hint: Initially m (the "marker") is at position 1. After a number of comparisons (and swapping), m is at position 2 for the first time. Then after a number of comparisons, m is at position 2 for the first time. Now, suppose m is at position i, approximately how many comparisons would it take for it to advance to position i + 1 for the first time?

(b) Using the classes GnomeSort and RandomIntegerArray given below, write a program to generate some numerical data to verify your answer for the gnome sort's complexity.

public class GnomeSort
{
/* Performs the gnome sort and returns the number of comparisons
*/
public static long sort(int [] a)
{
int m = 1; // m for ‘marker', the place from which we
// compare the element with the one on the
// left to see if we should move left or right

long comparisons = 0; // to count the number of comparisons
// for the sorting
while(true)
{
if (m == a.length) // m is past the last element,
{ // the sorting is done
break;
}
else if (m == 0) // m is at the beginning of the array,
{ // move right
m = 1;
}
else if (a[m] >= a[m-1]) // the element at m is greater than or
{ // equal to the element on the left,
m++; // move right
comparisons++;
}
else // the element at m is < the element
{ // on the left, swap the two elements
int temp = a[m]; // and move to the left
a[m] = a[m-1];
a[m-1] = temp;
m--;

comparisons++;
}
}

return comparisons;
}

public class RandomIntegerArray
{
/* Return an array of size n, of random integers of value
btween 0 and n, inclusively
*/
public static int [] getRandomIntegerArray(int n)
{
int [] a = new int[n];
for(int i = 0; i < n; i++)
{
a[i] = (int) (Math.random() * (n+1));
}
return a;
}

Reference no: EM13129935

Questions Cloud

Current and quick ratios : SDJ, Inc., has net working capital of $3,710, current liabilities of $5,500, and inventory of $4,440. What is the current ratio?
Compute the expectations and variances of x and y : One of the 4 bus drivers is also randomly selected. Let Y denote the number of students on his bus. Compute the expectations and variances of X and Y?
Illustrate what are the variable expenses per unit : The company plans to sell 17,000 bookbags this year. Illustrate what are the variable expenses per unit?
Investing for future returns : Suppose you are committed to owning a $199,000 Ferrari. If you believe your mutual fund can achieve a 12 percent annual rate of return and you want to buy the car in 9 years on the day you turn 30, how much must you invest today?
Write a java program called salesinfominer : Write a Java program called SalesInfoMiner that reads in a file of products, stored in text format. The program must then read in information about past sales transactions and output all the products to a new file, sorted by product description,..
Problem regarding quarterly sales forecast : Adams Sporting Goods sells bicycles throughout the southeastern United States. The following data were taken from the most recent quarterly sales forecast:
Discuss possible factors that contribute to these difference : The process by which accounting standards has evolved over time worldwide differs significantly from country to country based on many factors.
Test whether sample standard deviation differs significantly : Eight randomly selected plants have heights 172, 156, 154, 163, 170, 169, 170 and 164 cms. Test whether the sample standard deviation differs significantly?
Future value of coin collection : Your coin collection contains 53 1952 silver dollars. If your grandparents purchased them for their face value when they were new, how much will your collection be worth when you retire in 2063, assuming they appreciate at a 4.4 percent annual rat..

Reviews

Write a Review

JAVA Programming Questions & Answers

  Java applet to find how much federal tax we need to pay

Write a java applet to determine how much federal tax we need to pay assuming the tax rate is 12%.  Ask the taxable income.

  Develop parent lock method asks for four digit password

Develop the parent lock method (turns lock on/off) which asks for 4 digit password. Navigating to any channel which is in range 99-110 must ask for password when parent lock is on.

  Sorted list adt and the binary search tree adt

Explain the differences between our specifications of the Sorted List ADT and the Binary Search Tree ADT.

  Write an application that uses string method region

Write an application that uses String method region - Matches to compare two strings input by the user. The application should input the number of characters to be compared and the starting index of the comparison.

  Java program to read line of text which ends with period

Write down the java program which will read the line of text which ends with the period, which serves as sentinel value. Show all the letters which occur in the text.

  Java application to declare and create a vector

Write a java application that perform the following task: declare and create a vector with initial size of 10, increment of 2

  Write jvm program which accepts rpn expression

Write down the IJVM program which accepts RPN expression, with each operand and operator entered line by line. As each number (operand) is entered, push it on stack.

  Technical community blog

Write a blog article for a coding/technical community blog

  Explain how the loop displaying the menu

Explain how the loop displaying the menu is exited, what value does menuSel have when the program finishes? Describe what happens next when menuSel gets this value.

  Define private instance variable to hold boolean value

Write a Java Enumeration "LetterGrade" that represents letter grades A through F including plus and minus grades. Define a private instance variable to hold a boolean value.

  Determine the java application on web

Determine the Java application on Web and explain how program structure functions. Explain the application in as much detail as possible.

  A jsp expression can contain any java expression

A JSP expression can contain any Java expression that evaluates to a

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd