UDP- Questions and Data Sources Lesson

Questions and Data Sources

The data process starts by asking primary questions, which are then broken down into smaller, secondary questions that help answer them. Primary questions are broad and often have complex answers. Secondary questions are specific and can often be answered with numbers or text taken directly from the data. For example, a school district might ask the primary question, "How can we improve graduation rates at our schools?" To work toward an answer, it might ask the secondary question, "Which schools have the highest graduation rates?"

We will look at a sample dataset and formulate a primary question and secondary questions. The dataset below comes from World Bank Open Data, which provides free and open access to data about development in countries around the globe. It contains GDP, population, life expectancy at birth, private health expenditure, public health expenditure, and total health expenditure for every country in the world from 2000 to 2010.

The dataset below has been collected in a spreadsheet program and is arranged in rows and columns. Columns A through J each hold one category of data, with the category name at the top of the column. Each row beneath the column headings holds the values for those categories.


From this data, we can pose a primary question: "How does healthcare spending influence life expectancy?"

Here are some secondary questions you could ask from the data to help arrive at the answer.

  1. How much (in USD) is spent on healthcare in total in each country?
  2. How much (in USD) is spent per capita in each country?
  3. In which country is the most spent per person? In which country is the least spent? What is the average for each continent? For the world?
  4. Which countries have the highest life expectancy?
  5. Which countries have the lowest life expectancy?
  6. What is the relationship between public and private health expenditure in each country?
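Several of these secondary questions can be answered with simple arithmetic on the rows of the table. The sketch below shows one possible way to do that in Python. The country names, the column names, and every number are hypothetical placeholders invented for illustration; they are not World Bank figures.

```python
# Hypothetical rows for illustration only; these are NOT real World Bank values.
rows = [
    {"country": "Country A", "population": 1_000_000,
     "health_total_usd": 500_000_000, "life_expectancy": 70.0},
    {"country": "Country B", "population": 2_000_000,
     "health_total_usd": 4_000_000_000, "life_expectancy": 80.0},
    {"country": "Country C", "population": 500_000,
     "health_total_usd": 100_000_000, "life_expectancy": 65.0},
]


def per_capita(row):
    """Total health spending divided by population (question 2)."""
    return row["health_total_usd"] / row["population"]


# Question 2: spending per person in each country.
spending = {row["country"]: per_capita(row) for row in rows}

# Question 3: where is the most and the least spent per person?
highest_spender = max(spending, key=spending.get)
lowest_spender = min(spending, key=spending.get)

# Questions 4 and 5: highest and lowest life expectancy.
longest_lived = max(rows, key=lambda r: r["life_expectancy"])["country"]
shortest_lived = min(rows, key=lambda r: r["life_expectancy"])["country"]

print(spending)
print(highest_spender, lowest_spender, longest_lived, shortest_lived)
```

Each secondary question maps to one short calculation, which is what makes secondary questions easier to answer directly from the data than the primary question.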

We might develop a hypothesis that states, "Countries that spend more on healthcare have a higher life expectancy."

A hypothesis is a proposed explanation, made on the basis of limited evidence, that serves as a starting point for further investigation. It is a specific, testable prediction about what you expect to happen in your study. Your research will either support or contradict your hypothesis.
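One common way to check whether data supports a hypothesis like this is to measure how strongly two columns move together. The sketch below computes a Pearson correlation coefficient by hand; this is one possible approach chosen for illustration, not a method prescribed by the lesson. The two lists are hypothetical placeholder values, not real data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only (not World Bank figures).
spend_per_capita = [200.0, 500.0, 2000.0]   # USD per person
life_expectancy = [65.0, 70.0, 80.0]        # years

r = pearson(spend_per_capita, life_expectancy)
print(round(r, 2))
```

A value of r close to 1 would be consistent with the hypothesis, a value close to 0 would suggest little linear relationship, and a value close to -1 would contradict it. A strong correlation supports a hypothesis but does not by itself show that spending causes longer lives.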

A popular format for saving a dataset is a CSV (comma-separated values) file.

CSV allows data to be saved in a table-structured format. It takes the form of a text file in which each line is a row and the values in that row are separated by commas. CSV files can be opened in most spreadsheet programs, such as Microsoft Excel, OpenOffice Calc, or Google Sheets. They differ from other spreadsheet file types in two ways: a file can hold only a single sheet, and it cannot save formulas or cell, column, or row styling. In return, a CSV file is usually smaller, easier to store and send, and independent of any particular software or operating system.
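To make the format concrete, the sketch below reads and writes a small CSV table with Python's standard `csv` module. The column names and values are hypothetical placeholders, and the text is held in memory with `io.StringIO` rather than in a real file so the example is self-contained.

```python
import csv
import io

# A tiny CSV document: one header line, then one line per row.
# Hypothetical values for illustration only.
csv_text = (
    "country,year,life_expectancy\n"
    "Country A,2000,70.0\n"
    "Country B,2000,80.0\n"
)

# Read: DictReader uses the first line as the column names.
records = list(csv.DictReader(io.StringIO(csv_text)))

# CSV stores plain text only, so numbers must be converted after reading.
for record in records:
    record["life_expectancy"] = float(record["life_expectancy"])

# Write the rows back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(
    out,
    fieldnames=["country", "year", "life_expectancy"],
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(records)

print(out.getvalue())
```

Because every value comes back as text, the conversion step is needed before doing any arithmetic. This reflects the point above that CSV keeps only the data itself, not types, formulas, or styling.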

[CC BY 4.0] UNLESS OTHERWISE NOTED | IMAGES: LICENSED AND USED ACCORDING TO TERMS OF SUBSCRIPTION