## What is a drawback to performing data cleansing (imputation, transformations, etc.) on raw data prior to partitioning the data for honest assessment as opposed to performing the data cleansing after partitioning the data?

What is a drawback to performing data cleansing (imputation, transformations, etc.) on raw data prior to partitioning the data for honest assessment as opposed to performing the data cleansing after partitioning the data?
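One commonly cited concern with cleansing before partitioning is that statistics estimated on the full data set (for example, an imputation mean) carry information from the holdout rows into training, and the holdout data can then no longer judge the cleansing step itself. The following is a minimal sketch of the alternative, partitioning first and learning the imputation value from the training partition only. The data, the 70/30 split, and the names `raw`, `impute`, and `train_mean` are all hypothetical, chosen only for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the split is reproducible

# Hypothetical raw input: 200 values, 30 of them missing (None).
raw = [random.gauss(50, 10) for _ in range(200)]
for i in random.sample(range(200), 30):
    raw[i] = None

# Partition first (70/30 is an arbitrary choice for this sketch).
indices = list(range(len(raw)))
random.shuffle(indices)
cut = int(0.7 * len(indices))
train = [raw[i] for i in indices[:cut]]
valid = [raw[i] for i in indices[cut:]]

# Estimate the imputation value from the training partition only.
train_mean = statistics.mean([x for x in train if x is not None])


def impute(values, fill):
    """Replace missing entries with a fill value learned elsewhere."""
    return [fill if x is None else x for x in values]


train_clean = impute(train, train_mean)
valid_clean = impute(valid, train_mean)  # reuse the training statistic
```

Because the validation rows never influence `train_mean`, the validation partition can still be used to compare this imputation rule against alternatives.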

## What values are not affected by this oversampling?

A confusion matrix is created for data that were oversampled due to a rare target. What values are not affected by this oversampling?

Options:

- Sensitivity and PV+
- Specificity and PV-
- PV+ and PV-
- Sensitivity and Specificity

Correct answer: Sensitivity ...
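The effect of oversampling on confusion-matrix statistics can be checked numerically. The sketch below uses hypothetical counts and assumes that oversampling simply replicates every event row by a fixed factor while the classifier's decision on each row stays the same. Rates computed within an actual class (sensitivity, specificity) use only one row of the matrix, so the replication factor cancels; predictive values mix both rows, so they shift.

```python
import math


def rates(tp, fn, fp, tn):
    """Return the four standard rates from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # within actual events
        "specificity": tn / (tn + fp),  # within actual non-events
        "pv_plus": tp / (tp + fp),      # within predicted events
        "pv_minus": tn / (tn + fn),     # within predicted non-events
    }


# Hypothetical counts at the natural event rate (100 events, 900 non-events).
original = rates(tp=80, fn=20, fp=90, tn=810)

# Replicate each event row 9 times to reach a 50/50 split.
factor = 9
oversampled = rates(tp=80 * factor, fn=20 * factor, fp=90, tn=810)

for name in original:
    same = math.isclose(original[name], oversampled[name])
    print(f"{name}: {original[name]:.3f} -> {oversampled[name]:.3f} "
          f"({'unchanged' if same else 'changed'})")
```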

## Which of the two models should be selected and why?

Refer to the exhibit: The plots represent two models, A and B, being fit to the same two data sets, training and validation. Model A is 90.5% accurate at distinguishing blue from red on the training data and 75.5% accurate at doing the same on validation data. Model B is 83% accurate at distinguishing blue from red ...

## Which statistic, calculated from a validation sample, can help decide which model to use for prediction of a binary target variable?

Which statistic, calculated from a validation sample, can help decide which model to use for prediction of a binary target variable?

Options:

- Adjusted R Square
- Mallows' Cp
- Chi Square
- Average Squared Error

Correct answer: ...

## What is an acceptable division between training, validation, and testing data?

In order to perform honest assessment on a predictive model, what is an acceptable division between training, validation, and testing data?

Options:

- Training: 50%, Validation: 0%, Testing: 50%
- Training: 100%, Validation: 0%, Testing: 0%
- Training: 0%, Validation: 100%, Testing: 0%
- Training: 50%, Validation: ...

## What is the best data to use for model assessment?

The total modeling data has been split into training, validation, and test data. What is the best data to use for model assessment?

Options:

- Training data
- Total data
- Test data
- Validation data

Correct answer: Validation data

## Including redundant input variables in a regression model can:

Including redundant input variables in a regression model can:

Options:

- Stabilize parameter estimates and increase the risk of overfitting.
- Destabilize parameter estimates and increase the risk of overfitting.
- Stabilize parameter estimates and decrease the risk of overfitting.
- Destabilize parameter estimates and decrease the risk of ...

## What problem does this present?

One common approach for predicting rare events in the LOGISTIC procedure is to build a model that disproportionately over-represents those cases with an event occurring (e.g., a 50-50 event/non-event split). What problem does this present?

Options:

- All parameter estimates are biased.
- Only the intercept estimate is biased.
- Only the ...
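For context, a standard result for logistic regression is that sampling on the outcome shifts the intercept by a known offset, $\ln\left(\frac{\rho_1 \pi_0}{\rho_0 \pi_1}\right)$, where $\pi_1$ is the population event rate and $\rho_1$ is the event rate in the oversampled data, while the slope estimates remain consistent. The sketch below applies that offset to a predicted probability in plain Python. The function names and the 5% population rate are assumptions made for illustration; this is not the LOGISTIC procedure's own syntax.

```python
import math


def oversampling_offset(pop_event_rate, sample_event_rate):
    """Offset ln((rho1 * pi0) / (rho0 * pi1)) introduced by oversampling."""
    pi1, rho1 = pop_event_rate, sample_event_rate
    pi0, rho0 = 1.0 - pi1, 1.0 - rho1
    return math.log((rho1 * pi0) / (rho0 * pi1))


def adjust_probability(p_sample, pop_event_rate, sample_event_rate):
    """Map a probability from the oversampled model to the population scale."""
    logit = math.log(p_sample / (1.0 - p_sample))
    adjusted = logit - oversampling_offset(pop_event_rate, sample_event_rate)
    return 1.0 / (1.0 + math.exp(-adjusted))


# Hypothetical rates: 5% events in the population, 50% in the modeling sample.
offset = oversampling_offset(0.05, 0.50)
p_adjusted = adjust_probability(0.50, 0.05, 0.50)
print(f"offset = {offset:.4f}, adjusted probability = {p_adjusted:.4f}")
```

A case scored at 0.50 by the oversampled model maps back to the population base rate of 0.05, which is what an uninformative score should imply.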

## What will be the result?

A non-contributing predictor variable (Pr > |t| = 0.658) is added to an existing multiple linear regression model. What will be the result?

Options:

- An increase in R-Square
- A decrease in R-Square
- A decrease in Mean Square Error
- No change in R-Square

Correct answer: ...
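The behavior of R-Square when a predictor is added can be explored with a small simulation. For ordinary least squares with an intercept, adding a column cannot increase the error sum of squares, so R-Square cannot go down, even when the new column is pure noise. The sketch below fits both models via the normal equations in plain Python. The simulated data and the helper names `solve` and `r_square` are hypothetical.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def r_square(columns, y):
    """Fit OLS with an intercept and return R-Square = 1 - SSE / SST."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


random.seed(1)
n = 100
x1 = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]  # generated independently of y
y = [2 + 3 * a + random.gauss(0, 1) for a in x1]

r2_base = r_square([x1], y)
r2_more = r_square([x1, noise], y)
print(f"R-Square without noise column: {r2_base:.6f}")
print(f"R-Square with noise column:    {r2_more:.6f}")
```

This is why statistics that penalize model size, such as adjusted R-Square, are often preferred over raw R-Square when comparing models with different numbers of inputs.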

## The selection criterion used in the forward selection method in the REG procedure is:

The selection criterion used in the forward selection method in the REG procedure is:

Options:

- Adjusted R-Square
- SLE
- Mallows' Cp
- AIC

Correct answer: SLE