Data mining

Use APA format and include references.

1. Read the dataset description at UCI Machine Learning: Credit Approval. In your own words, describe your understanding of the dataset, what the attributes (columns) mean, and what each observation (row) represents. (5 sentences)

http://archive.ics.uci.edu/ml/datasets/Credit+Approval

2. What is discretization? Provide a one-paragraph, masters-level response in your own words.

3. Compare and contrast the discretization methods (equal interval, equal frequency, and k-means clustering), providing at least one example of when you would use each one.

4. Why is it important to handle missing values in your dataset before beginning your primary data analysis? Provide a one-paragraph, masters-level response in your own words.

5. Describe at least one alternative approach for handling missing values other than replacing the values with the attribute mean. Provide a one-paragraph, masters-level response in your own words.

6. Why is data pre-processing important? Describe at least two advantages that result from pre-processing, as well as two disadvantages of not pre-processing. Provide a one-paragraph, masters-level response in your own words.

7. What differences did you observe between variable filters and row filters? Provide at least one scenario for each filter type where implementing the filter would benefit your data analysis. Provide a one-paragraph, masters-level response in your own words.
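
For orientation on question 3, the three discretization methods can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a prescribed solution: the function names, the sample values, and the choice of three bins are assumptions made for the example, and the k-means version is a simplified one-dimensional variant with evenly spread starting centroids.

```python
from statistics import mean


def equal_interval_bins(values, k):
    """Assign each value to one of k bins of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]
    # min(...) keeps the maximum value in the last bin (k - 1)
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts.

    Simplification: tied values may land in different bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def kmeans_bins(values, k, iterations=20):
    """Simplified 1-D k-means: each bin is the cluster around one centroid."""
    lo, hi = min(values), max(values)
    # Assumption: start with centroids spread evenly over the value range
    centroids = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign every value to its nearest centroid
        labels = [min(range(k), key=lambda j: abs(v - centroids[j]))
                  for v in values]
        # Move each centroid to the mean of its members (skip empty clusters)
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels


# Hypothetical sample with one outlier
sample = [1, 2, 3, 4, 5, 6, 100]
interval_result = equal_interval_bins(sample, 3)
frequency_result = equal_frequency_bins(sample, 3)
kmeans_result = kmeans_bins(sample, 3)
print(interval_result)
print(frequency_result)
print(kmeans_result)
```

With a skewed sample like this one, equal-interval binning puts almost every value in the first bin, while equal-frequency binning spreads the values across bins regardless of how far apart they are. That contrast is one starting point for the comparison the question asks for.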