DAT - Data, Data, Data (Lesson)


Data, Data, Data

Introduction

Being able to access and process data is an important part of many industries. Whether the goal is determining the best allocation of resources or deciding what new product to create, analyzing data is a necessary part of the conversation. Luckily, there are tools available to help read and process data, and this lab will introduce one such tool. You will use existing real-world data to answer a question of your own interest, so begin brainstorming a topic and question that you would like to explore.

The ease with which data can be stored, cataloged, and searched in an ever-more-connected society is an important point to consider. Take a few minutes to think about privacy policies. Regardless of whether a service is offered for free or at a cost, the companies that people interact with on a regular basis collect data about them. Privacy policies are commonplace everywhere from social media to your local doctor’s office. By agreeing to use a service, and often just by visiting a website, you are agreeing to the privacy policy.

Record the answers to the following questions on a sheet of paper. You will need the answers to complete the Lab Check Quiz.

Activity 1

  • Make a list of two or three sites that you typically visit.
  • Find the privacy policy for a specific site.
  • Identify two or three pieces of information that are collected. This could include information about the device used to access the site, such as device type or IP address, as well as information about specific content viewed.

This lab content is from the College Board.

It is important to understand different file types, what certain file extensions mean, and how to open, view or access data within different types of files.

You will need the DataLabCode folder for this Activity. This folder is located in the table of contents for download.

Activity 2

1. From within Microsoft Excel, open Cereal.csv and Cereal.xlsx. In the folder, the file types might be listed as follows: Microsoft Excel Comma Separated Values File for the .csv file and Microsoft Excel Worksheet for the .xlsx file. In Excel the two files look the same and contain the same data, although the file types are different.

2. Open the .csv file in a text editor. You can do this by opening Notepad on your computer and then opening the .csv file from there. (You may need to choose the option to show “All Files” if you do not see the file you need to open.)

[Screenshot showing how to choose All Files]

When you open the file in a text editor, you will notice commas separating the values. Each row of the original file is on its own line.
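As a rough illustration of what the text editor shows, the short Python sketch below parses a few comma-separated lines. The sample rows and column names are made up for illustration and are not taken from Cereal.csv.

```python
import csv
import io

# Hypothetical sample text laid out like a .csv file: one record per line,
# with commas separating the values. These rows are NOT from Cereal.csv.
sample_text = "name,calories,fiber\nOat Rings,110,2.0\nCorn Puffs,120,0.5\n"

# csv.reader splits each line into a list of strings at the commas.
rows = list(csv.reader(io.StringIO(sample_text)))

header = rows[0]    # the first line holds the column names
records = rows[1:]  # the remaining lines hold the data

print(header)
for record in records:
    print(record)
```

Every value comes back as a string, so a program would still need to convert columns such as calories to numbers before doing arithmetic with them.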

3. Try to open the .xlsx file in a text editor. Notice that the contents are illegible, because an .xlsx file is not stored as plain text.

4. During this lab, you might run into JSON files (.json). JSON stands for JavaScript Object Notation. Although files of this type can look harder to read than a .csv or .xlsx file, the data is formatted specifically to be read and used by computer programs.
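To give a sense of how a program reads JSON, here is a minimal Python sketch. The record and its field names are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical JSON text describing one cereal; the field names are
# assumptions made for this example only.
sample_json = '{"name": "Oat Rings", "calories": 110, "vitamins": ["A", "C"]}'

# json.loads converts JSON text into Python dictionaries, lists,
# strings, and numbers.
cereal = json.loads(sample_json)

print(cereal["name"])           # look up a value by its key
print(cereal["calories"] + 10)  # numbers arrive as numbers, not strings
print(len(cereal["vitamins"]))  # nested lists are preserved
```

Unlike the values in a .csv file, JSON values carry their own types and can be nested, which is part of why the format is convenient for programs even though it looks busier to a person.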

Record the answers to the following questions on a sheet of paper. You will need the answers to complete the Lab Check Quiz.

1. What does the word delimited mean? (Research this new word.) Why is this necessary when talking about data files?

2. Given a data file, how can you determine the type of data that might be contained in a specific column?

Activity 3

Record the answers to the following questions on a sheet of paper. You will need the answers to complete the Lab Check Quiz. (You will also use your answers to these questions for the Data Collection and Privacy Discussion.)

1. Identify two broad areas of interest that you have. Examples include health, sports, etc.

2. For each of the two areas of interest you identified, determine one question to which you might want to know the answer. These should be questions that are not easily answered with an online search. Two examples of questions that are easy to answer with an online search are who won the 1998 Super Bowl (the Denver Broncos) and what is the height of the world’s tallest building (at the time of publication, Burj Khalifa, 2717 ft / 828 m).

Activity 4

All three of these sites allow searching of their data catalog (the collection of data sets). Spend a few minutes looking at the various topics of data available. Examples of topics include education, finance, nutrition, government, athletics, and technology.

Record the answers to the following questions on a sheet of paper. You will need the answers to complete the Lab Check Quiz.

1. Are there data sets that might apply to one of the two questions that you wrote in the previous activity? Find two different data sets that might be used to answer one of your questions, and list the site and any search criteria used to find them. If no data sets apply, consider revising one of your questions so that you can identify an applicable data set.

2. How many records are in each of the data sets you identified? Describe one benefit of using a larger data set with more records.
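If a data set is available as a .csv file, one way to count its records is sketched below in Python. The function name count_records and the sample data are hypothetical, and the sketch assumes the first line of the file is a header row.

```python
import csv
import io

def count_records(csv_text):
    """Return the number of data records, not counting the header line."""
    # DictReader treats the first line as column names and yields one
    # dictionary per remaining line.
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for _ in reader)

# Hypothetical sample data: one header line followed by three records.
sample_text = "team,wins\nHawks,10\nOwls,7\nBears,12\n"
print(count_records(sample_text))
```

For a downloaded file, the same counting loop could be run on a file opened with open(path, newline="") in place of the in-memory sample text.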

Lab Check

Record the answers to the following questions on a sheet of paper. You will need the answers to complete the Lab Check Quiz.

1. Describe one way that user data captured by a site (whether knowingly or unknowingly) has contributed to an improvement in the service provided. Have there been any positive impacts of this data outside of the service or website?


2. Do you know how the data in the data set you identified was collected? If so, describe how. If not, describe one way that the data might have been collected.


3. In your opinion, are there situations where the benefit provided from the data collected is worth the risk to personal privacy? Why or why not?

IMAGES CREATED BY GAVS