6 min read • June 18, 2024
Lusine Ghazaryan
Depending on how we use data, the study of statistics is divided into two main areas: descriptive and inferential. In descriptive statistics, we describe a situation by collecting, organizing, summarizing, and presenting data. In inferential statistics, we use data collected from a sample to draw conclusions about a population by generalizing, estimating, testing hypotheses, and making predictions. We will save inferential statistics for later and focus here on the descriptive branch of statistics.
Suppose the statistics class just had a test. The teacher graded the tests and recorded each student's score. In statistical terms, these scores are data, and the whole collection of scores is called a data set. But these numbers are meaningless unless we know what they measure and whom they were measured on. Because we know they are the test scores of the students enrolled in the statistics class, the numbers can, once placed in context, convey important information about class performance, test difficulty, students' abilities and content knowledge, and even the testing environment.
Statisticians call the students elements and each student's score an observation. These observations are part of the teacher's assessment, and she needs to use them to evaluate the content she taught. If she had more than 30 students, it would be hard for her to make sense of the raw data set. It would be much more helpful to organize the data into tables, draw graphs, or calculate the average.
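To see how organizing and summarizing helps, here is a minimal Python sketch using a small, made-up list of scores (the `scores` values, the 90/80/70/60 letter-grade cutoffs, and the helper names `letter_grade` and `summarize` are all hypothetical, chosen only for illustration). It computes a few summary measures, builds a frequency table of letter grades, and prints a crude text bar chart standing in for a graph.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical test scores (out of 100); not real class data.
scores = [88, 92, 75, 64, 95, 81, 77, 90, 68, 85, 73, 99, 82, 70, 91]


def letter_grade(score: int) -> str:
    """Map a score to a letter grade using an assumed 90/80/70/60 scale."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


def summarize(data: list[int]) -> dict:
    """Return a few descriptive summary measures for a list of scores."""
    return {
        "n": len(data),
        "mean": mean(data),
        "median": median(data),
        "min": min(data),
        "max": max(data),
    }


# Frequency table: how many students earned each letter grade.
grade_counts = Counter(letter_grade(s) for s in scores)

if __name__ == "__main__":
    print(summarize(scores))
    # A simple text "bar chart" of the frequency table.
    for grade in "ABCDF":
        count = grade_counts.get(grade, 0)
        print(f"{grade}: {'*' * count} ({count})")
```

With these made-up scores, the mean and median both come out to 82, and the frequency table shows 5 A's, 4 B's, 4 C's, 2 D's, and no F's: far easier to read than fifteen raw numbers.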
Data can be numbers or non-numeric labels, and, as mentioned earlier, they are useless without context. One easy way to provide context is to answer the Ws of the data set we're working with: who, what, when, where, why (if possible), and how.
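Knowing who is involved in generating the data at hand tells us about the cases: the individuals (people, animals, objects, or events) about whom data are collected. There are many ways to describe these individuals; for example, people who answer a survey are called respondents, and people on whom an experiment is performed are called subjects or participants.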
Variables are characteristics or attributes that are measured or observed for each individual in a study. Each variable should have a name that clearly identifies what was measured, so that the collected data can be easily understood and analyzed. 🔎
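One way to picture cases and variables is as rows and columns. The minimal Python sketch below assumes a hypothetical `StudentRecord` layout with made-up values: each record is one case (a student), and each clearly named field is one variable. The helper `values_of` (also hypothetical) collects every observation of a single variable across all cases.

```python
from dataclasses import dataclass, fields


@dataclass
class StudentRecord:
    """One case (individual): a single student in the statistics class."""
    student_id: str        # identifies who the case is
    hours_of_sleep: float  # hours slept the night before the test
    test_score: int        # score on the test, out of 100


# Hypothetical data set: each record is one case; each field is one variable.
data_set = [
    StudentRecord("S01", 7.5, 88),
    StudentRecord("S02", 6.0, 75),
    StudentRecord("S03", 8.0, 92),
]

# The variable names, read from the dataclass definition.
variable_names = [f.name for f in fields(StudentRecord)]


def values_of(variable: str) -> list:
    """Collect the observations of one variable across every case."""
    return [getattr(record, variable) for record in data_set]


if __name__ == "__main__":
    print(variable_names)           # ['student_id', 'hours_of_sleep', 'test_score']
    print(values_of("test_score"))  # [88, 75, 92]
```

Descriptive names such as `hours_of_sleep` make it obvious what each column measures, which is exactly the point about naming variables clearly.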
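There are different types of variables. The two main types are categorical (qualitative) variables, which place each individual into a group or category, and quantitative variables, which take numerical values that are measured or counted.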
The more we know about the context, the more we'll understand about the data we have! This is where the when and where of our data come in.
The when refers to the time at which the data was collected, which can have an impact on the values that are recorded. For example, values recorded at different points in time may reflect different trends or patterns. ⏰
The where of data refers to the location where the data was collected, which can also have an impact on the values that are recorded. For example, values recorded in different geographical locations may reflect different social, cultural, or economic factors. 🗺️
Both the when and the where of data can matter when interpreting the results of a study or analysis. Carefully considering the context in which the data were collected helps us better understand what the results mean and what they imply.
The questions that we ask of a variable, or the why of our analysis, shape how we think about and approach the variable. The questions we ask can influence the way we define and measure the variable, as well as the type of statistical analysis that we use to analyze the data. 🖥️
For example, if we are interested in the relationship between two variables (say, amount of sleep and test scores), we might ask: Do students who sleep more tend to score higher on the test? If so, how strong is that relationship?
The how of data collection refers to the methods or techniques that are used to collect the data, and it can have a significant impact on the quality and reliability of the data.
There are many different methods for collecting data, including surveys, experiments, observations, and secondary data sources. Each method has its own strengths and limitations, and it is important to choose the most appropriate method for the research question being addressed. 📜
For example, Internet surveys can be a convenient and cost-effective way to collect data from a large number of respondents, but they may also be unreliable due to biases, such as nonresponse bias (where certain groups are more or less likely to respond to the survey) or response bias (where the responses are not accurate or honest). 😔
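To see how nonresponse bias can distort a result, consider a minimal Python sketch with entirely hypothetical numbers: a population of 1,000 people in which 300 support a proposal, with supporters assumed to respond at a rate of 1/2 and opponents at 1/5. It uses expected counts rather than random sampling, so the result is exact; the names `population`, `response_rate`, and both functions are illustrative only.

```python
from fractions import Fraction

# Hypothetical population of 1,000 people: 300 support a proposal, 700 oppose it.
population = {"support": 300, "oppose": 700}

# Assumed response rates: supporters are more eager to answer the survey.
response_rate = {"support": Fraction(1, 2), "oppose": Fraction(1, 5)}


def true_support_share(pop: dict) -> Fraction:
    """Share of supporters in the whole population."""
    return Fraction(pop["support"], sum(pop.values()))


def respondent_support_share(pop: dict, rates: dict) -> Fraction:
    """Share of supporters among the people expected to respond."""
    responders = {group: count * rates[group] for group, count in pop.items()}
    return responders["support"] / sum(responders.values())


if __name__ == "__main__":
    print(f"Population support:  {float(true_support_share(population)):.1%}")
    print(f"Respondent support:  {float(respondent_support_share(population, response_rate)):.1%}")
```

Under these assumptions, 150 supporters and 140 opponents respond, so the survey suggests about 51.7% support even though the true share is 30%. Nothing about the questions themselves changed; only who chose to answer.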
It is important to carefully consider the how of data collection when designing a study or analysis, in order to ensure that the data is of sufficient quality and reliability to support the research question and conclusions.
Putting it all together, a large, raw data set is hard to read and hard to draw conclusions from. Constructing tables, drawing graphs, and calculating summary measures such as averages make up the descriptive portion of statistics. The next few sections will show how to construct tables, draw graphs, and calculate summary measures. The two branches of statistics are strongly connected, and the knowledge you gain in the first few units will help you when you are introduced to inference procedures.
© 2024 Fiveable Inc. All rights reserved.