EF 507 FALL 2008 Chapter 2 Describing Data: Graphical
Chapter Goals
After completing this chapter, you should be able to:
- Identify types of data and levels of measurement
- Create and interpret graphs to describe categorical variables
- Create a line chart to describe time-series data
- Create and interpret graphs to describe numerical variables:
  - Histogram
- Construct and interpret graphs to describe relationships between variables
- Describe appropriate and inappropriate ways to display data graphically
Types of Data
Measurement Levels
Graphical Presentation of Data
- Some type of organization is needed
- The type of graph to use depends on the variable being summarized
Graphical Presentation of Data Techniques reviewed in this chapter:
Tables and Graphs for Categorical Variables
The Frequency Distribution Table
Bar and Pie Charts
- Bar charts and pie charts are often used for qualitative (categorical) data
- The height of the bar or the size of the pie slice shows the frequency or percentage for each category
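The frequencies and percentages that a bar or pie chart displays can be tabulated first. The following is a minimal sketch, assuming a short list of invented survey responses; the category names and the helper frequency_table are hypothetical and do not come from the chapter.

```python
from collections import Counter

# Hypothetical survey responses (invented for illustration), one category per observation.
responses = ["Savings", "CD", "Stocks", "Savings", "Bonds", "Stocks", "Savings", "CD"]

def frequency_table(observations):
    """Return (category, frequency, percentage) rows in order of first appearance."""
    counts = Counter(observations)
    total = len(observations)
    return [(category, n, 100.0 * n / total) for category, n in counts.items()]

table = frequency_table(responses)
for category, freq, pct in table:
    # Each row supplies one bar height (freq) or one pie slice size (pct).
    print(f"{category:<10}{freq:>4}{pct:>8.1f}%")
```

The frequency column gives the bar heights for a bar chart, and the percentage column gives the slice sizes for a pie chart.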
Pie Chart Example
Pareto Diagram
- Used to portray categorical data
- A bar chart in which categories are shown in descending order of frequency
- A cumulative polygon is often shown in the same graph
- Used to separate the “vital few” from the “trivial many”
Pareto Diagram Example
400 defective items are examined for cause of defect:
Pareto Diagram Example Step 2: Determine % in each category
Pareto Diagram Example
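The table behind a Pareto diagram might be built as sketched below. Only the total of 400 defective items comes from the example. The cause names, the split among causes, and the helper pareto_table are hypothetical stand-ins.

```python
# Hypothetical defect counts: the total of 400 matches the example, but the
# cause labels and the split among them are invented for illustration.
defects = {"Cause C": 60, "Cause A": 200, "Cause D": 40, "Cause B": 100}

def pareto_table(counts):
    """Return (cause, frequency, percent, cumulative percent) rows,
    sorted in descending order of frequency."""
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for cause, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        pct = 100.0 * n / total   # percentage in each category
        cumulative += pct         # running total for the cumulative polygon
        rows.append((cause, n, pct, cumulative))
    return rows

rows = pareto_table(defects)
for cause, n, pct, cum in rows:
    print(f"{cause:<10}{n:>5}{pct:>8.1f}%{cum:>8.1f}%")
```

The bars are drawn from the percent column and the cumulative polygon from the last column. In this invented data, the first two causes account for 75% of the defects, which is the kind of "vital few" a Pareto diagram is meant to expose.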
Graphs for Time-Series Data
- A line chart (time-series plot) shows the values of a variable over time
- Time is measured on the horizontal axis
- The variable of interest is measured on the vertical axis
Line Chart Example
Graphs to Describe Numerical Variables
Histogram
- The interval endpoints are shown on the horizontal axis
- The vertical axis is either frequency, relative frequency, or percentage
- Bars of the appropriate heights represent the number of observations within each class
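The class counts that set the bar heights can be computed directly. The following is a minimal sketch, assuming equal-width classes that include their lower endpoint and exclude their upper endpoint; the data values and the helper histogram_counts are hypothetical.

```python
def histogram_counts(data, start, width, n_classes):
    """Count observations in classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for x in data:
        i = int((x - start) // width)  # index of the class containing x
        if 0 <= i < n_classes:         # values outside the covered range are ignored
            counts[i] += 1
    return counts

# Hypothetical observations, invented for illustration.
values = [3, 7, 12, 15, 18, 21, 22, 25, 29, 31, 34, 38, 41, 47]

counts = histogram_counts(values, start=0, width=10, n_classes=5)
relative = [c / len(values) for c in counts]  # relative frequency per class

# A text version of the histogram: one row of '#' marks per class.
for i, c in enumerate(counts):
    lower, upper = i * 10, (i + 1) * 10
    print(f"{lower:>2} to under {upper:<3} {'#' * c} ({c})")
```

Either counts or relative can be placed on the vertical axis; the shape of the histogram is the same in both cases.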
Histogram Example
Histograms in Excel
1. Select Tools/Data Analysis
2. Choose Histogram
Questions for Grouping Data into Intervals
1. How wide should each interval be? (How many classes should be used?)
2. How should the endpoints of the intervals be determined?
- Often answered by trial and error, subject to user judgment
- The goal is to create a distribution that is neither too "jagged" nor too "blocky"
- The goal is to appropriately show the pattern of variation in the data
How Many Class Intervals?
Many (narrow class intervals)
- May yield a very jagged distribution with gaps from empty classes
- Can give a poor indication of how frequency varies across classes
Few (wide class intervals)
- May compress variation too much and yield a blocky distribution
- Can obscure important patterns of variation
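The trade-off can be seen by grouping one data set three ways. This is a sketch under assumed values: the data, the three class widths, and the helper class_counts are all hypothetical choices made for illustration.

```python
def class_counts(data, start, width, n_classes):
    """Count observations in equal-width classes [lower, upper) beginning at start."""
    counts = [0] * n_classes
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < n_classes:
            counts[i] += 1
    return counts

# Hypothetical observations between 0 and 50, invented for illustration.
values = [3, 7, 12, 15, 18, 21, 22, 25, 29, 31, 34, 38, 41, 47]

narrow = class_counts(values, start=0, width=2, n_classes=25)    # many narrow classes
moderate = class_counts(values, start=0, width=10, n_classes=5)  # a middle choice
wide = class_counts(values, start=0, width=25, n_classes=2)      # few wide classes

empty_narrow = narrow.count(0)  # empty classes appear as gaps in the histogram

print("narrow:  ", narrow, "empty classes:", empty_narrow)
print("moderate:", moderate)
print("wide:    ", wide)
```

With 25 narrow classes, no class holds more than one observation and many are empty, so the picture is jagged. With 2 wide classes, the counts are equal and the rise and fall visible in the 5-class grouping disappears.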
Distribution Shape
- The shape of the distribution is said to be symmetric if the observations are balanced, or evenly distributed, about the center.
- The shape of the distribution is said to be skewed if the observations are not symmetrically distributed around the center.
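A histogram is the primary tool for judging shape, but a rough numerical check is to compare the mean with the median. The sketch below uses that common rule of thumb, which is not taken from this chapter and can fail for some distributions. The tolerance cutoff, the data sets, and the helper shape_hint are hypothetical.

```python
from statistics import mean, median, pstdev

def shape_hint(data, tolerance=0.1):
    """Rough shape label from comparing the mean with the median.

    tolerance is an assumed cutoff, expressed in population standard deviations.
    """
    gap = mean(data) - median(data)
    if abs(gap) <= tolerance * pstdev(data):
        return "roughly symmetric"
    # A long right tail pulls the mean above the median; a long left tail pulls it below.
    return "skewed right" if gap > 0 else "skewed left"

# Hypothetical data sets, invented for illustration.
salaries = [22, 24, 25, 26, 27, 28, 29, 35, 45, 90]  # a few large values in the right tail
scores = [10, 55, 65, 71, 72, 73, 74, 75, 76, 78]    # a few small values in the left tail
balanced = [1, 2, 3, 4, 5]

for name, data in [("salaries", salaries), ("scores", scores), ("balanced", balanced)]:
    print(name, "->", shape_hint(data))
```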
Relationships Between Variables
- Graphs illustrated so far have involved only a single variable
- When two variables exist, other techniques are used:
Scatter Diagrams
- Used for paired observations taken from two numerical variables
- One variable is measured on the vertical axis and the other variable is measured on the horizontal axis
Scatter Diagram Example
Scatter Diagrams in Excel
Graphing Multivariate Categorical Data
Side-by-Side Chart Example
Sales by quarter for three sales territories:
Data Presentation Errors
Goals for effective data presentation:
- Communicate complex ideas clearly and accurately
- Avoid distortion that might convey the wrong message
Common presentation errors:
- Unequal histogram interval widths
- Compressing or distorting the vertical axis
- Providing no zero point on the vertical axis
- Failing to provide a relative basis in comparing data between groups
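The last error in the list can be illustrated with a small calculation. This is a sketch with invented numbers: the plant names, the counts, and the helper rate_per_100 are hypothetical.

```python
def rate_per_100(count, group_size):
    """Express a raw count on a relative basis: occurrences per 100 group members."""
    return 100.0 * count / group_size

# Hypothetical data: (number of defective items, number of items produced).
groups = {"Plant A": (30, 600), "Plant B": (20, 100)}

rates = {name: rate_per_100(count, size) for name, (count, size) in groups.items()}

largest_by_count = max(groups, key=lambda name: groups[name][0])
largest_by_rate = max(rates, key=rates.get)

for name, (count, size) in groups.items():
    print(f"{name}: {count} defects out of {size} items = {rates[name]:.1f} per 100")
```

In this invented example Plant A has more defects in absolute terms, but Plant B has the higher defect rate. A chart of raw counts alone would point to the wrong plant.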
Chapter Summary
- Reviewed types of data and measurement levels
- Data in raw form are usually not easy to use for decision making; some type of organization is needed
- Techniques reviewed in this chapter:
Which of the following variables is an example of a categorical variable?
A. The amount of money you spend on eating out each month.
B. The time it takes you to write a test.
C. The geographic region of the country in which you live.
D. The weight of a cereal box.
The data in the time-series plot below represent monthly sales for two years of beanbag animals at a local retail store (Month 1 represents January and Month 12 represents December). Do you see any obvious patterns in the data? Explain.
Answer: The data are seasonal. There seems to be a small increase in months 3, 4, and 5 and a large increase at the end of the year. Sales of this item seem to peak in December and drop off significantly in January.
At a large company, the majority of the employees earn from $20,000 to $30,000 per year. Middle management employees earn between $30,000 and $50,000 per year, while top management earn between $50,000 and $100,000 per year. A histogram of all salaries would have which of the following shapes?
a. Symmetrical
b. Uniform
c. Skewed to right
d. Skewed to left