Briefly describe the modeling problem facing LBS

Reference no: EM13985015

Question 1:

LBS is an investment management firm that manages about $600 million in assets, primarily in stocks and mutual funds, for both institutional and individual investors. It believes that conventional approaches to money management are having an increasingly difficult time meeting or exceeding benchmarks. Further, it believes that the new generation of data-mining techniques can capture significant non-linear causal relationships for use in forecasting when market and security price behavior is dominated by non-linearity.

LBS wants to maximize the return on the assets it invests for its clients while minimizing their risk exposure. For LBS, it is not enough just to know which securities to purchase. In order to be successful, the asset management firm must also know when to buy and sell the securities. The firm feels that it can do this through a combination of high-quality analytic tools, highly efficient computer engineering, and market-savvy analysts.

The problem of developing a system to estimate future prices is daunting because financial processes are generally characterized by high levels of non-linearity and complexity. The amount of data available to an analyst is overwhelming. Also, financial markets are constantly evolving, so models must adapt to these changes. So:

• The system needs to be able to quickly incorporate knowledge about a domain that often defies explicit definition. On a day-to-day basis, random shocks, crowd psychology, and short-lived trends influence financial markets. Also, different experts have widely varying interpretations of the data even after the fact. Even expert traders sometimes have difficulty explaining what general principle led them to make a specific trade.

• The system needs to be able to deal with and analyze complex data. As a result of the interactions among several different market forces, financial markets can exhibit highly non-linear and highly complex behavior.

• The system needs to be able to deal with the large amounts of economic and financial data that are generated daily. It is difficult or impossible even for the most skilled expert to assimilate this amount of data accurately and consistently. In the words of one experienced trader, "Even the smartest of us is not as smart as the market. In order to make sense of the data, we have little choice but to turn to the computer."

• The system needs to be able to adapt quickly over time. A trading strategy that works in a bull market may not fare well in a bear market. Markets evolve and adapt to different forces over time.

The firm has determined that a meaningful forecasting horizon is about 4 weeks. It is an "active" manager that seeks to outperform the market, as opposed to a "passive" manager that indexes its portfolio with the market and seeks only to match the market's performance.

The system needs to be able to interpret and analyze large amounts of market data and "update its view of the world" frequently and easily, accessing economic and market data from a variety of sources and, using these data, identifying those stocks that are "likely" to be winners, and those that are more "likely" to be losers, over the next 4 weeks. LBS will use simulated trading systems to test the models.

Models will be tested (or validated) by back testing over several historical years to determine how they would have performed. Models that recommend buying stocks in volumes that were not obtainable or conducting so many trades that transaction fees wiped out profits would not be considered successful.
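The two failure conditions named above (unobtainable volumes and fees that wipe out profits) can be sketched as a simple check over simulated trades. This is only an illustrative sketch: the `Trade` record, the `backtest` function, the flat fee of 10 per order, and the 5% cap on a day's market volume are all hypothetical assumptions, not details taken from the LBS case.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    """One simulated round-trip trade (hypothetical record layout)."""
    shares: int          # shares the model recommended buying
    buy_price: float     # simulated entry price per share
    sell_price: float    # simulated exit price per share
    daily_volume: int    # shares traded in the market on the entry day


def backtest(trades, fee_per_order=10.0, max_volume_share=0.05):
    """Score a list of simulated trades.

    fee_per_order and max_volume_share are assumed values for illustration.
    A run is 'feasible' only if no trade asks for more than the assumed
    share of that day's volume, and 'successful' only if it is feasible
    and still profitable after fees.
    """
    net_profit = 0.0
    feasible = True
    for t in trades:
        if t.shares > max_volume_share * t.daily_volume:
            feasible = False  # volume was not obtainable in the market
        gross = t.shares * (t.sell_price - t.buy_price)
        net_profit += gross - 2 * fee_per_order  # one buy order, one sell order
    return {
        "net_profit": net_profit,
        "feasible": feasible,
        "successful": feasible and net_profit > 0,
    }
```

A model that trades very often accumulates `2 * fee_per_order` on every round trip, so a strategy with small gross gains per trade can end up unsuccessful even when most of its predictions are right.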

LBS's data is plentiful, although not necessarily clean. The system does not need to make specific point predictions for prices on a specific date but only to provide the decision maker with estimates of a security's upside and downside potential. On the other hand, since a decision maker (typically a portfolio manager) would be interpreting the results of a prediction, it would be useful if the model could offer some insight into its analysis. It is also important that the system fits smoothly into LBS's workflow and current modeling tools. To do this, the system must interface smoothly with the financial databases where the market data are stored.

Since LBS wants a 4-week time horizon, the system need not function in real-time. On the other hand, the system must be able to perform the analysis on each individual security in a reasonable amount of time. The system also must be able to be expanded to accommodate additional securities and input factors.

In addition, LBS would like to take up as little of the firm's expert traders' time as possible. Expert time is valuable; each hour away from market analysis or trading can cost real dollars. Furthermore, and more important, LBS has found that it can be somewhat difficult for its expert traders and analysts to articulate their expertise, especially since the rules are complex and continually evolving.

(a) Briefly describe the modeling problem facing LBS, and identify what type of problem it is in terms of the types of data mining problems discussed in session 1 (prediction, estimation, classification, clustering, association, etc.). Justify your answer.

(b) What data mining model type would you propose for this problem? Justify your answer.

(c) What are two significant limitations of your proposed approach for the given problem?

Question 2: Assume that you have to build an online recommendation system for buying cars. Cars have hundreds of specifications/features. Comment on whether Naïve Bayes, K Nearest Neighbors or Decision Trees would be the best approach for this type of system. Justify your answer.

Question 3: Assume that using scanner data on customer purchases combined with demographic and behavioral data on customers stored in the corporate data warehouse, you would like to build a predictive model that would help classify customers into one of a set of distinct profitability segments (e.g., high, medium and low). Further, assume that although your company operates across the whole Southern US, you would like to focus on customers spending at least $500 per month on average for the past 12 months, at any of 5 stores in Texas. Discuss whether K-means clustering would be useful to identify the relevant customer set. Justify your answer.

Question 4: Which of the following is a symptom of a decision tree that is "over-fitted"? In each case, briefly justify your answer.

(a) The error rate (misclassification) chart for the model is as in the graphs below (for the training and validation sets):

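[Figure: misclassification error rate chart for the training and validation sets; image "2174_Chart for the model.png" not reproduced]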

(b) The tree is unbalanced (i.e., some paths from the root to leaf nodes are long while others are short)

(c) The confusion (classification) matrices for both the training set and the validation set have large values in the off-diagonal cells. (Hint: In a confusion matrix C, cij indicates the number of cases whose actual output value ri was classified as rj by the tree.)

(d) The tree has a high overall misclassification rate for the training set but not for the validation set.

(e) A number of the leaf nodes have very low support.
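As a rough aid for reasoning about options (a), (c) and (d), the misclassification rate can be read off a confusion matrix and compared between the training and validation sets. The sketch below is illustrative only: the function names and the 0.10 gap threshold are assumptions chosen for the example, not values from the course material.

```python
def error_rate(confusion):
    """Misclassification rate from a square confusion matrix.

    confusion[i][j] counts cases whose actual class is i and whose
    predicted class is j, so correct cases lie on the diagonal.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


def looks_overfitted(train_cm, valid_cm, gap=0.10):
    """Flag a model whose validation error exceeds its training error
    by more than `gap` (an assumed, illustrative threshold)."""
    return error_rate(valid_cm) - error_rate(train_cm) > gap
```

Under this check, a tree that is wrong equally often on both sets is not flagged: large off-diagonal counts in both matrices point to a poor fit overall rather than to a gap between training and validation performance.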

Question 5: Given the following data on purchase transactions expressed as itemsets:

Transaction   Items
1             Bread, Juice, Ketchup
2             Milk, Juice, Apples
3             Pepper, Apples, Juice, Wine
4             Juice, Ketchup, Wine, Salt
5             Apples, Detergent, Wine
6             Juice, Ketchup, Wine, Apples
7             Bread, Milk, Juice
8             Detergent, Wine, Apples
9             Salt, Wine
10            Juice, Ketchup, Milk, Apples
11            Bread, Apples, Wine
12            Milk, Juice, Detergent, Ketchup

Each row is an itemset (i.e., a collection of items that were bought together).

(a) Identify all the large itemsets with minsup = 0.25 (i.e., 25%). For each large itemset, compute its support as a percentage (%).

(b) Using the results in (a), state one association rule that has a confidence above 80% and acceptable lift. Compute its confidence, support and lift.

(c) If the Apriori approach described in class were used to identify association rules for this data set, identify three itemsets whose support would not have to be calculated by the rule mining process. Explain why they would not be considered.
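The quantities asked for in Question 5 can be checked with a small level-wise search. The sketch below is a minimal, unoptimized version of the usual Apriori idea (generate candidates from frequent itemsets one size smaller, prune any candidate with an infrequent subset, then count support), and it may differ in detail from the variant described in class. The function names are hypothetical; the transactions are copied from the Question 5 table.

```python
from itertools import combinations

# Transactions copied from the Question 5 table.
TRANSACTIONS = [
    {"Bread", "Juice", "Ketchup"},
    {"Milk", "Juice", "Apples"},
    {"Pepper", "Apples", "Juice", "Wine"},
    {"Juice", "Ketchup", "Wine", "Salt"},
    {"Apples", "Detergent", "Wine"},
    {"Juice", "Ketchup", "Wine", "Apples"},
    {"Bread", "Milk", "Juice"},
    {"Detergent", "Wine", "Apples"},
    {"Salt", "Wine"},
    {"Juice", "Ketchup", "Milk", "Apples"},
    {"Bread", "Apples", "Wine"},
    {"Milk", "Juice", "Detergent", "Ketchup"},
]


def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions=TRANSACTIONS, minsup=0.25):
    """Level-wise search: map each large itemset to its support."""
    items = sorted(set().union(*transactions))
    current = [frozenset([i]) for i in items
               if support([i], transactions) >= minsup]
    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = support(s, transactions)
        k += 1
        # Candidates: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune: skip counting any candidate with an infrequent (k-1)-subset.
        candidates = [c for c in candidates
                      if all(frozenset(sub) in result
                             for sub in combinations(c, k - 1))]
        current = [c for c in candidates
                   if support(c, transactions) >= minsup]
    return result


def rule_metrics(antecedent, consequent, transactions=TRANSACTIONS):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    confidence = both / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return {"support": both, "confidence": confidence, "lift": lift}
```

For example, `rule_metrics({"Ketchup"}, {"Juice"})` returns the three measures for the rule Ketchup -> Juice. The pruning step is what part (c) is about: a candidate that contains an itemset already known to be below minsup is discarded before its support is ever counted.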

Question 6:

Consider the following dataset about customers of a particular product. The column "Buyer" indicates whether each customer bought the product or not. You have been asked to use Naïve Bayes Classification to identify potential buyers.

Name      Married   Job        Hair    Gender   Buyer
Peter     No        Manager    Short   Male     Yes
Claudia   Yes       Engineer   Long    Female   No
Angela    No        Lawyer     Long    Female   No
Amy       No        Manager    Long    Female   Yes
Albert    Yes       Engineer   Short   Male     Yes
Karin     No        Manager    Long    Female   No
Nina      Yes       Engineer   Short   Female   Yes
Sergio    Yes       Manager    Long    Male     Yes

Would the following person be a buyer or not (show your calculations)?

Name      Married   Job        Hair    Gender   Buyer
John      Yes       Engineer   Short   Male     ?
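The Naïve Bayes calculation for a query such as John multiplies the class prior by the conditional frequency of each attribute value within that class, then compares the resulting scores across classes. The sketch below uses exact fractions so the hand calculation can be checked term by term. The function name is hypothetical, and the optional `alpha` (Laplace smoothing) parameter is an addition for illustration; the question as stated implies plain relative frequencies, which is `alpha=0`.

```python
from fractions import Fraction

# Columns: Married, Job, Hair, Gender, Buyer (copied from the Question 6 table).
ROWS = [
    ("No",  "Manager",  "Short", "Male",   "Yes"),  # Peter
    ("Yes", "Engineer", "Long",  "Female", "No"),   # Claudia
    ("No",  "Lawyer",   "Long",  "Female", "No"),   # Angela
    ("No",  "Manager",  "Long",  "Female", "Yes"),  # Amy
    ("Yes", "Engineer", "Short", "Male",   "Yes"),  # Albert
    ("No",  "Manager",  "Long",  "Female", "No"),   # Karin
    ("Yes", "Engineer", "Short", "Female", "Yes"),  # Nina
    ("Yes", "Manager",  "Long",  "Male",   "Yes"),  # Sergio
]


def naive_bayes_scores(query, rows=ROWS, alpha=0):
    """Unnormalized score P(class) * product of P(value | class) per class.

    query holds the attribute values in column order (no class label).
    alpha is an integer Laplace smoothing count; alpha=0 means none.
    """
    scores = {}
    for label in sorted({r[-1] for r in rows}):
        in_class = [r for r in rows if r[-1] == label]
        score = Fraction(len(in_class), len(rows))  # class prior
        for j, value in enumerate(query):
            n_values = len({r[j] for r in rows})  # distinct values in column j
            matches = sum(r[j] == value for r in in_class)
            score *= Fraction(matches + alpha,
                              len(in_class) + alpha * n_values)
        scores[label] = score
    return scores


john = ("Yes", "Engineer", "Short", "Male")
scores = naive_bayes_scores(john)
prediction = max(scores, key=scores.get)
```

Without smoothing, any attribute value that never occurs within a class drives that class's whole product to zero, which is worth noting when showing the calculation by hand on a table this small.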

Question 7: Assume that you have joined a company that sells disk drives for PCs. It has decided to enter the market for mobile phones starting next year. The CEO has heard that neural nets are powerful tools for building classification and prediction models, and has asked you to build a Neural Network model for classifying mobile phone products proposed by your R&D department into one of the following three market potential categories: Low, Medium, High. You have been given access to detailed data on the company's products and sales for ten of the last eleven years (current year sales have still to be compiled). How would you respond?

Question 8: Your boss has suggested that rather than using a single type of classification model, it might be useful to combine the strengths of different model types. So she has suggested that you initially build a set of neural network models to figure out the key determinants of buying behavior in each segment, and then use these significant variables to build a decision tree model that would provide the key thresholds of each variable that influence the important outcomes in future buying behavior. How would you respond?

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd