Design and implement neural network-based machine learning

Reference no: EM133918144, Length: 1000 words

Artificial Intelligence and Machine Learning

Simulation and problem solving using advanced machine learning models

Assessment - Modelling and Simulation

Task
Using Orange Data Mining or Python, design and implement neural network-based machine learning models to solve real-world problems. Answer the accompanying questions with thoughtful analysis.

Assessment Description
Neural networks, especially their applications in deep learning, have gained significant popularity in recent years. Business Analytics professionals now have access to cloud-based hardware, enabling them to run deep learning models effortlessly via browsers and no-code platforms.

In this assessment, you will design, implement, and evaluate neural network-based models, such as image and text classification, using Orange Data Mining or Python. You will be provided with datasets to apply these models to real-world scenarios, assessing their strengths, limitations, and practical impact.

Assessment Instructions

Use Orange Data Mining or Python to implement neural network-based models to solve real-world problems. The datasets, parameters, and instructions are provided in the assessment sheet.

Based on software outputs, answer the accompanying questions in the assessment sheet.

Write a report of at most 1000 words that summarises your work and includes the answers to the questions from the assessment sheet. The report must be written using a Google Docs template (shared by your lecturer).

1. Machine Learning Model Comparison

You are a data scientist tasked with solving a real-world healthcare challenge: building a machine learning system to predict breast cancer diagnoses (malignant or benign). Early detection is critical for improving patient outcomes, and your goal is to identify the most accurate predictive model using a provided dataset.
You will:
Explore the dataset.
Implement multiple machine learning algorithms.
Evaluate and compare model performance.
Select the best-performing model for deployment.

The dataset for this activity is available here: Breast Cancer Data. It contains diagnostic data for breast cancer cases, including various features extracted from cell nuclei in digitized images. There are 30 features.
Load the dataset, inspect it, and perform some descriptive analytics (summary statistics, correlation analysis, distributions analysis).

1.1 What is the percentage of benign and malignant diagnosis in the dataset? Provide a visual to justify your answer.
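For readers working in Python, the class balance can be computed as sketched below. This is only an illustration: it assumes scikit-learn's bundled Wisconsin diagnostic dataset as a stand-in for the provided Breast Cancer Data file, whose column names and label codes may differ.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Stand-in data (assumption): scikit-learn's bundled dataset with 30 features.
# In this copy the target is coded 0 = malignant, 1 = benign.
data = load_breast_cancer(as_frame=True)
diagnosis = data.target.map({0: "malignant", 1: "benign"}).rename("diagnosis")

# Share of each class as a percentage of all rows.
class_pct = diagnosis.value_counts(normalize=True).mul(100).round(2)
print(class_pct)
```

A bar or pie chart of `class_pct` would serve as the visual the question asks for.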

1.2 What are the two most correlated features (respectively the two least correlated features) in the dataset? Provide a visual to justify your answer.
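One possible way to rank feature pairs in Python is sketched here. It assumes Pearson correlation, reads "least correlated" as the pair whose absolute correlation is closest to zero, and again uses scikit-learn's bundled dataset as a stand-in for the provided file.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

features = load_breast_cancer(as_frame=True).data  # stand-in for the provided file

# Absolute Pearson correlations; the upper triangle keeps each unordered pair once.
corr = features.corr().abs()
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna().sort_values(ascending=False)

most_correlated = pairs.index[0]    # pair with |r| closest to 1
least_correlated = pairs.index[-1]  # pair with |r| closest to 0
print("most correlated:", most_correlated, round(pairs.iloc[0], 4))
print("least correlated:", least_correlated, round(pairs.iloc[-1], 4))
```

A heatmap of `corr` is the usual visual to accompany this ranking.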

1.3 Provide a visualization of the variables 'radius_worst' (feature) and 'diagnosis' (target variable) together. What can you say about the ability of the feature 'radius_worst' to predict the target variable 'diagnosis'? Justify your answer.
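A numeric companion to the plot can help justify the answer. The sketch below is one option, not the required method: it summarises the feature per class and computes a single-feature ROC AUC. It assumes scikit-learn's bundled dataset, where the column is named "worst radius" rather than 'radius_worst'.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score

data = load_breast_cancer(as_frame=True)  # stand-in dataset (assumption)
radius_worst = data.data["worst radius"]
diagnosis = data.target.map({0: "malignant", 1: "benign"})

# Per-class summary: clearly separated quartiles suggest a useful predictor.
summary = radius_worst.groupby(diagnosis).describe()
print(summary[["mean", "25%", "50%", "75%"]])

# Single-feature AUC: 0.5 means no separation, 1.0 means perfect separation.
auc = roc_auc_score((diagnosis == "malignant").astype(int), radius_worst)
print(f"AUC using radius_worst alone: {auc:.3f}")
```

A box plot or overlaid histogram of the feature grouped by diagnosis shows the same separation visually.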

Perform Principal Component Analysis (PCA) on all the features.

1.4 What is the percentage of explained variance provided by the first 10 principal components? Provide a visual to justify your answer.

From now on, you will use only the first 10 principal components of PCA as predictor variables to build different machine learning models. Perform stratified, replicable train-test splitting on the new dataset using a split ratio of 80% training and 20% testing. Train the following machine learning classification models:
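The PCA and splitting steps could look like the following in scikit-learn. This is a sketch under stated assumptions: features are standardized before PCA (Orange's PCA widget offers this as a normalization option), the seed value 0 is arbitrary, and the bundled dataset stands in for the provided one.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # stand-in for the provided dataset

# Assumption: standardize features so no single scale dominates the components.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=10, random_state=0)
X_pca = pca.fit_transform(X_scaled)

# Cumulative share of variance captured by components 1..10.
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(f"first 10 components explain {cumulative[-1]:.2%} of the variance")

# Stratified 80/20 split; the fixed random_state makes it replicable.
X_train, X_test, y_train, y_test = train_test_split(
    X_pca, y, test_size=0.20, stratify=y, random_state=0
)
```

Plotting `cumulative` against the component number gives the scree-style visual for question 1.4.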

Logistic Regression: train a logistic regression model with no regularization.

Random Forest: train a random forest model with 500 trees, 5 features considered at each split, individual tree depth limited to 3, no splitting of subsets smaller than 5, and replicable training.

Neural Networks: train a neural network model with 2 hidden layers, 5 neurons per hidden layer, 'tanh' activation function, 'Adam' solver, no regularization, replicable training, and a maximum of 500 iterations.
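The three model specifications map onto scikit-learn roughly as shown below. Treat this as a sketch: the seed is arbitrary, the bundled dataset is a stand-in, and "no regularization" for logistic regression is approximated with a very large `C` for portability across scikit-learn versions (recent versions also accept `penalty=None`).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

SEED = 0  # arbitrary fixed seed so every run gives the same models

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset (assumption)
X_scaled = StandardScaler().fit_transform(X)
X_pca = PCA(n_components=10, random_state=SEED).fit_transform(X_scaled)
X_train, X_test, y_train, y_test = train_test_split(
    X_pca, y, test_size=0.20, stratify=y, random_state=SEED
)

models = {
    # A very large C makes the L2 penalty negligible (approximates no regularization).
    "logistic_regression": LogisticRegression(C=1e10, max_iter=5000),
    "random_forest": RandomForestClassifier(
        n_estimators=500, max_features=5, max_depth=3,
        min_samples_split=5, random_state=SEED,
    ),
    # alpha=0 switches off the network's L2 penalty.
    "neural_network": MLPClassifier(
        hidden_layer_sizes=(5, 5), activation="tanh", solver="adam",
        alpha=0.0, max_iter=500, random_state=SEED,
    ),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", round(model.score(X_test, y_test), 3))
```

Results from Orange and scikit-learn will not match exactly, since the two tools differ in defaults and random number handling.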

1.5 Provide the predictive performance metrics table of the 3 machine learning models (logistic regression, random forest, neural networks) on the testing data for each category of the target variable and overall.

1.6 What is the best predictive model according to the F1 score and why? Provide its confusion matrix.
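The per-class metrics table and the confusion matrix can be produced as sketched below. The labels are hypothetical and only illustrate the calls; in practice they would be the test labels and a fitted model's predictions.

```python
from sklearn.metrics import classification_report, confusion_matrix, f1_score

# Hypothetical labels for illustration only (B = benign, M = malignant).
y_true = ["M", "M", "M", "B", "B", "B", "B", "B"]
y_pred = ["M", "M", "B", "B", "B", "B", "B", "M"]

# Per-class precision, recall and F1, plus macro and weighted averages.
print(classification_report(y_true, y_pred, labels=["B", "M"], digits=3))

# Rows are actual classes, columns are predicted classes, both in the order B, M.
cm = confusion_matrix(y_true, y_pred, labels=["B", "M"])
print(cm)

# One overall F1 figure that weights both classes equally.
macro_f1 = f1_score(y_true, y_pred, average="macro")
```

Comparing models then amounts to computing the same F1 figure for each model on the same test set.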

1.7 Suppose that the best predictive model predicts a breast cancer case in the testing dataset as a malignant diagnosis. What is the probability that the diagnosis is benign? Be sure to show all your working: each step you take to reach your answer should be clearly presented.
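One reading of this question is a conditional probability taken from the test-set confusion matrix: among cases predicted malignant, the fraction that are actually benign, which equals 1 minus the precision of the malignant class. The sketch below follows that reading; the counts are hypothetical and are not results from the assessment data.

```python
def prob_benign_given_predicted_malignant(tp_malignant: int, fp_malignant: int) -> float:
    """P(actual benign | predicted malignant) from confusion-matrix counts.

    tp_malignant: malignant cases predicted malignant.
    fp_malignant: benign cases predicted malignant.
    """
    predicted_malignant = tp_malignant + fp_malignant
    if predicted_malignant == 0:
        raise ValueError("the model never predicted malignant")
    return fp_malignant / predicted_malignant


# Hypothetical counts for illustration only.
p = prob_benign_given_predicted_malignant(tp_malignant=40, fp_malignant=2)
print(f"{p:.4f}")
```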

2. Image Analytics

You are working as a machine learning engineer for a company specializing in automated plant identification systems. The company is developing a mobile application that helps users identify flowers in real time using image recognition. Your task is to build a predictive model that can accurately classify whether an image contains a daisy flower or not.

Accurate classification is essential for improving user experience and ensuring the reliability of the app's recommendations. The model you develop will serve as a prototype for future deployment in the app's backend system.

You will build a complete image classification workflow. The goal is to distinguish between daisy and non-daisy images using deep learning-based feature extraction and a custom neural network classifier.

The image dataset for this activity is available here: Flowers Data. It contains labeled examples of daisy and non-daisy images. Ensure the dataset is properly organized with clear labels for each class before importing it into Orange.

Use the dataset above to build an Orange predictive workflow to classify whether an image is a daisy flower or not (daisy vs non-daisy). Use a train-test split ratio of 75:25 with replicable, stratified sampling. Use SqueezeNet for feature extraction and a neural network for classification, with 2 hidden layers of 10 neurons each, ReLU activation, Adam optimizer, and a regularization strength of 0.0005. Train for a maximum of 250 steps with replicable training.
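For a Python counterpart of the classifier stage, the sketch below assumes the SqueezeNet features have already been extracted and saved as one vector per image. Random placeholder vectors of an assumed size stand in for them here, and the seed and class counts are hypothetical, so only the split and the network settings mirror the task.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

SEED = 0  # arbitrary fixed seed for replicable sampling and training
rng = np.random.default_rng(SEED)

# Placeholder embeddings (assumption): 200 random 1000-dimensional vectors
# standing in for SqueezeNet features of real images.
n_images, n_dims = 200, 1000
labels = np.array(["daisy"] * 80 + ["non-daisy"] * 120)
embeddings = rng.normal(size=(n_images, n_dims))
embeddings[labels == "daisy"] += 0.5  # shift one class so the toy task is learnable

# Stratified 75:25 split with a fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.25, stratify=labels, random_state=SEED
)

clf = MLPClassifier(
    hidden_layer_sizes=(10, 10), activation="relu", solver="adam",
    alpha=0.0005, max_iter=250, random_state=SEED,
)
clf.fit(X_train, y_train)

# Predicted class probabilities for one test image, in the order of clf.classes_.
print(clf.classes_, clf.predict_proba(X_test[:1]))
```

The same `predict_proba` call applies to the embedding of a single new image once it has passed through the same feature extractor.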

2.1 What is the percentage of daisy and non-daisy flowers in the dataset? Provide a visual to justify your answer.

2.2 Provide the predictive performance metrics table on the testing data for each category of the target variable and overall.

2.3 In the testing set, how many daisy flowers have been wrongly classified as non-daisy? Provide a visual to justify your answer.

2.4 Provide a visual of the daisy flower images in the testing set that have been wrongly classified as non-daisy.

2.5 Classify the image available here: Image To Classify, using your Orange predictive analytics workflow. What are the predicted probabilities of daisy and non-daisy? Provide a visual to justify your answer.

2.6 Did the predictive system classify the previous image (in 2.5) correctly? Whether yes or no, what could be the reason?

Suppose we use three principal components (PC1, PC2, PC3) of the extracted features as input features (x1, x2, x3) and perform daisy flower classification using a neural network that has no hidden layers (shown above).
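A network with no hidden layers and one sigmoid output unit computes the same function as logistic regression. The sketch below assumes that output form; the weights, bias and input are hypothetical, since the values from the original figure are not reproduced here.

```python
import math


def daisy_probability(x, weights, bias):
    """Single sigmoid output unit: p = sigmoid(w1*x1 + w2*x2 + w3*x3 + b)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights, bias and input, for illustration only.
p = daisy_probability(x=(1.0, -2.0, 0.5), weights=(0.8, 0.3, -1.2), bias=0.1)
print(f"P(daisy) = {p:.4f}")
```

A probability above 0.5 would be read as a daisy prediction under the usual threshold.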

3. Text Analytics

You are a data analyst working for a social media monitoring firm that provides sentiment insights into brands, public figures, and organizations. Your current task is to analyze tweets to determine how people feel about specific entities, such as companies, politicians, or celebrities.

Understanding sentiment at the entity level is critical for reputation management, targeted marketing, and strategic decision-making. Your goal is to build a model that can accurately classify the sentiment expressed in a tweet about a given entity.

Using a labeled Twitter dataset, you will perform entity-level sentiment analysis. For each tweet and its associated entity, your task is to classify the sentiment as Positive, Negative, or Neutral. The dataset is available here: Twitter Data. The dataset has already been split into training and testing sets. Use 'twitter_training' as the training set and 'twitter_testing' as the testing set. Each record in each dataset contains: tweet id, entity, sentiment label, and tweet content.

Carry out the following steps on the English-language text data:
Preprocessing: Transformation (remove URLs, remove accents, lowercase, parse html).
Preprocessing: Tokenization using the Regexp algorithm, and normalization with Lemmagen Lemmatizer.
Preprocessing: Filtering stopwords and numbers.
Data Exploration: Visualize the target variable, Generate Word Cloud.
Embedding: Convert the cleaned text into numerical features using the fastText algorithm with mean aggregator.
Training: Train a neural network classification model using both the extracted features and the entity variable as predictor variables, with 2 hidden layers of 10 neurons each, tanh activation function, Adam optimizer, and no regularization. Train for a maximum of 500 steps with replicable training.

Evaluate the predictive performance of the trained model on the testing dataset.
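The preprocessing and embedding steps above can be approximated in plain Python as sketched below. Several simplifications are assumed: lemmatization is omitted, the stopword list is a tiny illustrative one, and a toy dictionary of 2-dimensional vectors stands in for fastText embeddings.

```python
import html
import re
import unicodedata

import numpy as np

STOPWORDS = {"the", "a", "is", "to", "and", "of", "i", "it", "this"}  # tiny illustrative list


def transform(text: str) -> str:
    text = html.unescape(text)                           # decode HTML entities
    text = re.sub(r"<[^>]+>", " ", text)                 # drop HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # remove URLs
    text = unicodedata.normalize("NFKD", text)           # split letters from accents
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return text.lower()


def tokenize(text: str) -> list[str]:
    tokens = re.findall(r"\w+", text)                    # regexp tokenization
    return [t for t in tokens if t not in STOPWORDS and not t.isdigit()]


def mean_embedding(tokens, vectors, dim):
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    return np.mean(known, axis=0) if known else np.zeros(dim)


# Toy vectors standing in for fastText embeddings (assumption).
toy_vectors = {"love": np.array([1.0, 0.0]), "cafe": np.array([0.0, 1.0])}
raw = "I LOVE this café &amp; 2 more! https://example.com <b>now</b>"
tokens = tokenize(transform(raw))
doc_vector = mean_embedding(tokens, toy_vectors, dim=2)
print(tokens, doc_vector)
```

Each tweet's `doc_vector`, together with an encoded entity variable, would then form the input row for the neural network.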

3.1 What is the percentage of each category of the target variable in the training data? Provide a visual to justify your answer.

3.2 What is the most frequent word in the dataset? Provide a visual to justify your answer.
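Outside Orange's Word Cloud widget, word frequencies can be counted directly. The token lists below are hypothetical and stand in for the preprocessed tweets.

```python
from collections import Counter

# Hypothetical token lists, one per preprocessed tweet.
tokenized_tweets = [["love", "game"], ["game", "crash"], ["game", "love", "update"]]

# Count every token across all tweets and take the most common one.
word_counts = Counter(token for tweet in tokenized_tweets for token in tweet)
top_word, top_count = word_counts.most_common(1)[0]
print(top_word, top_count)
```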

3.3 Report the predictive performance metrics on the testing dataset for each category of the target variable and overall.

3.4 In the testing dataset, how many positive tweets have been wrongly classified as Negative? Provide a visual to justify your answer.
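With three classes, the count asked for here is a single cell of the confusion matrix: the row of actual Positive tweets and the column of Negative predictions. The labels in this sketch are hypothetical and only show how to index that cell.

```python
from sklearn.metrics import confusion_matrix

labels = ["Negative", "Neutral", "Positive"]

# Hypothetical labels for illustration only.
y_true = ["Positive", "Positive", "Positive", "Negative", "Neutral", "Positive"]
y_pred = ["Negative", "Positive", "Neutral", "Negative", "Neutral", "Negative"]

# Rows are actual classes, columns are predicted classes, in the order of `labels`.
cm = confusion_matrix(y_true, y_pred, labels=labels)
positive_as_negative = cm[labels.index("Positive"), labels.index("Negative")]
print(cm)
print("Positive tweets classified as Negative:", positive_as_negative)
```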

3.5 Provide a visual of positive tweets in the testing set that have been wrongly classified as Neutral.

Design and implement neural network-based machine learning : DATA4800 Artificial Intelligence and Machine Learning, Kaplan Business School - Simulation and problem solving using advanced machine learning models