This is the third course in the Google Data Analytics Certificate. These courses will equip you with the skills needed to apply to introductory-level data analyst jobs. As you continue to build on your understanding of the topics from the first two courses, you’ll also be introduced to new topics that will help you gain practical data analytics skills. You’ll learn how to use tools like spreadsheets and SQL to extract and make use of the right data for your objectives and how to organize and protect your data. Current Google data analysts will continue to instruct and provide you with hands-on ways to accomplish common data analyst tasks with the best tools and resources.

Learners who complete this certificate program will be equipped to apply for introductory-level jobs as data analysts. No previous experience is necessary.

By the end of this course, you will:
- Find out how analysts decide which data to collect for analysis.
- Learn about structured and unstructured data, data types, and data formats.
- Discover how to identify different types of bias in data to help ensure data credibility.
- Explore how analysts use spreadsheets and SQL with databases and data sets.
- Examine open data and the relationship between and importance of data ethics and data privacy.
- Gain an understanding of how to access databases and extract, filter, and sort the data they contain.
- Learn the best practices for organizing data and keeping it secure.
Skills You'll Learn
Spreadsheet, Metadata, Data Collection, Data Ethics, SQL
Reviews
5 stars: 80.86%
4 stars: 15.94%
3 stars: 2.37%
2 stars: 0.35%
1 star: 0.46%
RL
Mar 24, 2021
Excellent course that provides a robust introduction to working with databases with Google sheets and SQL. Covers bias and data privacy as well. Gives clear definitions and is well taught and paced.
FR
Aug 9, 2021
As a beginner in SQL and the data analyst world, this course, "Prepare Data for Exploration," opened my eyes and enriched my knowledge. I really enjoyed this course and am excited to move forward. Thank you :)
From the lesson
Organizing and protecting your data
Good organization skills are a big part of most types of work, and data analytics is no different. In this part of the course, you’ll learn the best practices for organizing data and keeping it secure. You’ll also learn how analysts use file naming conventions to help them keep their work organized.
Taught By
Google Career Certificates
Terms in this set (80)
Which method of data-collection is most commonly used by scientists?
Observations
Organizations such as the U.S. Centers for Disease Control (CDC) often use data collected from hospitals. What kind of data is the CDC using if it is collected by hospitals, then sold to the CDC for its own analysis?
Second Party Data
Fill in the blank: In data analytics, a _____ refers to all possible data values in a certain dataset.
Population
The use of external data is particularly valuable in which circumstances?
-When analysis includes data from audio files
-When analysis involves data that hasn't been cleaned
-When analysis requires a lot of structured data
-When analysis depends on as many data sources as possible
When analysis depends on as many data sources as possible.
Fill in the blank: The running time of a movie is an example of _____ data.
continuous
Which of the following is an example of unstructured data?
-Email message
-Contact saved on a phone
-GPS location
-Rating of a local favorite restaurant
-Email message
Structured data enables data to be grouped together to form relations. This makes it easier for analysts to do what with the data? Select all that apply.
-Search
-Rewrite
-Analyze
-Store
Search, Analyze, & Store
Fill in the blank: Internet search engines are an everyday example of how Boolean operators are used. The Boolean operator _____ expands the number of results when used in a keyword search.
OR
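A minimal sketch of why OR broadens a keyword search while AND narrows it. The page titles and the function names `search_or` and `search_and` are illustrative assumptions, not part of any real search engine's API.

```python
# Hypothetical page titles used only for illustration.
PAGES = [
    "running shoes for women",
    "trail running tips",
    "leather shoes care guide",
    "best hiking boots",
]


def search_or(pages, keywords):
    """Return pages containing at least one keyword (Boolean OR)."""
    return [p for p in pages if any(k in p.split() for k in keywords)]


def search_and(pages, keywords):
    """Return pages containing every keyword (Boolean AND)."""
    return [p for p in pages if all(k in p.split() for k in keywords)]


or_hits = search_or(PAGES, ["running", "shoes"])
and_hits = search_and(PAGES, ["running", "shoes"])
print(len(or_hits), len(and_hits))
```

Every page matched by AND is also matched by OR, so the OR result can only be the same size or larger.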
Which of the following statements accurately describes a key difference between wide and long data?
-Wide data subjects can have multiple rows that hold the values of subject attributes. Long data subjects can have data in multiple columns.
-Every wide data subject has multiple columns. Every long data subject has data in a single column.
-Every wide data subject has a single column that holds the values of subject attributes. Every long data subject has multiple columns.
-Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes.
-Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes.
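The difference between wide and long data can be sketched with a small transformation. The table below, the field names, and the helper `wide_to_long` are assumptions made for illustration; a spreadsheet or data library would typically offer its own tool for this.

```python
# Hypothetical wide table: one row per country, one column per year.
wide_rows = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def wide_to_long(rows, id_field, variable_name="year", value_name="value"):
    """Turn each non-id column into its own row (wide -> long)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue
            long_rows.append(
                {id_field: row[id_field], variable_name: column, value_name: value}
            )
    return long_rows


long_rows = wide_to_long(wide_rows, id_field="country")
for r in long_rows:
    print(r)
```

In the wide form each subject appears once with many columns; in the long form each subject appears on several rows, one per attribute value. Changing between the two is one example of data transformation.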
What does data transformation enable data analysts to accomplish?
Change the structure of the data
A data analyst is working on an urgent traffic study. As a result of the short time frame, which type of data are they most likely to use?
Historical
Which of the following is an example of continuous data?
-Leading actors in movie
-Movie budget
-Box office returns
-Movie run time
-Movie run time
TRUE or FALSE: Nominal qualitative data has a set order or scale.
False
Which of the following is a benefit of internal data?
-Internal data is more reliable and easier to collect.
-Internal data is less vulnerable to biased collection.
-Internal data is the only data relevant to the problem.
-Internal data is less likely to need cleaning.
-Internal data is more reliable and easier to collect.
TRUE or FALSE: A social media post is an example of structured data.
False
Fill in the blank: A Boolean data type can have _____ possible values.
2
TRUE or FALSE: Data transformation can change the structure of the data. An example of this is taking data stored in one format and converting it to another.
True
Which of the following are examples of sampling bias? Select all that apply.
-A clinical study includes three times more men than women.
-An online marketing analytics firm stores data in a spreadsheet.
-A national election poll only interviews people with college degrees.
-A survey of high-school-age students does not include homeschooled students.
-A clinical study includes three times more men than women.
-A national election poll only interviews people with college degrees.
-A survey of high-school-age students does not include homeschooled students.
Fill in the blank: The tendency to search for or interpret information in a way that validates pre-existing beliefs is _____ bias.
Confirmation
Which of the following terms are also ways of describing observer bias? Select all that apply.
Researcher & Experimenter Bias
Which of the following are usually good data sources? Select all that apply.
-Academic papers
-Governmental agency data
-Vetted public datasets
-Social media sites
-Academic papers
-Governmental agency data
-Vetted public datasets
To determine if a data source is cited, you should ask which of the following questions? Select all that apply.
-Who created this dataset?
-Is the data relevant to the problem I'm trying to solve?
-Is this dataset from a credible organization?
-Has this dataset been properly cleaned?
-Who created this dataset?
-Is this dataset from a credible organization?
A data analyst is analyzing sales data for the newest version of a product. They use third-party data about an older version of the product. For what reasons is this inappropriate for their analysis? Select all that apply.
-The data is not current
-The data is biased
-The data is not original
-The data is not accurate
-The data is not current
-The data is not original
Fill in the blank: _____ states that all data-processing activities and algorithms should be completely explainable and understood by the individual who provides their data.
Transaction Transparency
A data analyst removes personally identifying information from a dataset. What task are they performing?
Data anonymization
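A rough sketch of what removing personally identifying information can look like in practice. The field names, the `PII_FIELDS` set, and the `anonymize` helper are assumptions for illustration. Replacing an ID with a salted hash is closer to masking than to full anonymization, so treat this as a starting point only.

```python
import hashlib

# Fields treated as personally identifying in this sketch (an assumption).
PII_FIELDS = {"name", "email", "phone"}


def anonymize(record, salt):
    """Drop PII fields and replace the customer ID with a salted hash."""
    cleaned = {k: v for k, v in record.items() if k not in PII_FIELDS}
    raw_id = str(cleaned.pop("customer_id"))
    digest = hashlib.sha256((salt + raw_id).encode("utf-8")).hexdigest()
    cleaned["customer_key"] = digest[:12]  # shortened key for readability
    return cleaned


record = {
    "customer_id": 1042,
    "name": "Example Person",
    "email": "person@example.com",
    "phone": "555-0100",
    "order_total": 18.50,
}
safe = anonymize(record, salt="demo-salt")
print(safe)
```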
Before completing a survey, an individual acknowledges reading information about how and why the data they provide will be used. What is this concept called?
Consent
What aspect of data ethics promotes the free access, usage, and sharing of data?
Openness
What are the main benefits of open data? Select all that apply.
-Open data makes good data more widely available.
-Open data combines data from different fields of knowledge.
-Open data increases the amount of data available for purchase.
-Open data restricts data access to certain groups of people.
-Open data makes good data more widely available.
-Open data combines data from different fields of knowledge.
Universal participation is a standard of open data. What are the key aspects of universal participation? Select all that apply.
-Certain groups of people must share their private data.
-All corporations are allowed to sell open data.
-No one can place restrictions on data to discriminate against a person or group.
-Everyone must be able to use, re-use, and redistribute open data.
-No one can place restrictions on data to discriminate against a person or group.
-Everyone must be able to use, re-use, and redistribute open data.
Fill in the blank: A preference in favor of or against a person, group of people, or thing is called _____. It is an error in data analytics that can systematically skew results in a certain direction.
Data Bias
Which of the following are types of data bias often encountered in data analytics? Select all that apply.
-Confirmation bias
-Interpretation bias
-Observer bias
-Educational bias
-Confirmation bias
-Interpretation bias
-Observer bias
TRUE or FALSE: In general, the usefulness of data decreases as time passes.
True
If a company uses your personal data as part of a financial transaction, you should be made aware of the nature and scale of the transaction. What concept of data ethics does this refer to?
Currency
Ownership is a key issue in data ethics. Who owns data?
The individual who originally generates the data
TRUE or FALSE: An employer accesses an employee's credit report without their consent. This is not a violation of the employee's privacy because they work at the company.
False
What is the process of protecting people's private or sensitive data by eliminating identifying information?
Data Anonymization
TRUE or FALSE: A key aspect of open data is free access to people's personal information.
False
Fill in the blank: A _____ is an identifier that references a database column in which each value is unique.
Primary Key
Fill in the blank: A relational database contains a series of _____ that can be connected to form relationships.
Tables
A key benefit of working with normalized databases is that they help lower data redundancy. Which of the following is an example of redundancy?
-Team members in different office locations working with the same data
-A database that forms two or more relationships
-The same piece of data being stored in two different places
-A database containing two foreign keys
-The same piece of data being stored in two different places
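Primary keys, related tables, and reduced redundancy can be sketched with Python's built-in `sqlite3` module. The `customers` and `orders` tables and their columns are invented for illustration. The customer's name is stored once and referenced from `orders` through a foreign key instead of being repeated on every order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# customer_id is the primary key: every value in the column must be unique.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# orders.customer_id is a foreign key pointing back at customers.
conn.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        total REAL NOT NULL
    )
    """
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 25.0), (11, 1, 40.0)]
)

# Connect the two tables through the key relationship.
rows = conn.execute(
    """
    SELECT c.name, o.order_id, o.total
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
    """
).fetchall()

# A second row with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(rows, duplicate_rejected)
```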
A large company has several data collections across its many departments. What kind of metadata indicates exactly how many collections a piece of data lives in?
Structural
The date and time a photo was taken is an example of which kind of metadata?
Administrative
A large metropolitan high school gives each of its students an ID number to differentiate them in its database. What kind of metadata are the ID numbers?
Descriptive
What is the process for arranging data into a meaningful order to make it easier to understand, analyze, and visualize?
sorting
A data analyst is reviewing a national database of real estate sales. They are only interested in sales of condominiums. How can the analyst narrow their scope?
Filter out non-condominiums
A data analyst works for a rental car company. They have a spreadsheet that lists car ID numbers and the dates cars were returned. How can they sort the spreadsheet to find the most recently returned cars?
By return date in descending order
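Sorting by return date in descending order can be sketched as follows. The car IDs and dates are made up, and the rows stand in for spreadsheet rows.

```python
from datetime import date

# Hypothetical spreadsheet rows: car ID and the date the car was returned.
returns = [
    {"car_id": "C-101", "returned": date(2021, 3, 2)},
    {"car_id": "C-102", "returned": date(2021, 3, 9)},
    {"car_id": "C-103", "returned": date(2021, 2, 27)},
]

# reverse=True gives descending order, so the latest return comes first.
most_recent_first = sorted(returns, key=lambda r: r["returned"], reverse=True)

for row in most_recent_first:
    print(row["car_id"], row["returned"].isoformat())
```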
Fill in the blank: To keep a header row at the top of a spreadsheet, highlight the row and select _____ from the View menu.
Freeze
In MySQL, what is acceptable syntax for the SELECT keyword? Select all that apply.
-SELECT
-"SELECT"
-select
-'select'
-select
-SELECT
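Keyword case can be checked quickly with Python's built-in `sqlite3` module. This sketch uses SQLite instead of MySQL, so it only illustrates the general behavior that SQL keywords are not case-sensitive; the `flowers` table is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flowers (name TEXT)")
conn.execute("INSERT INTO flowers VALUES ('iris')")

# Uppercase and lowercase keywords run the same query.
upper = conn.execute("SELECT name FROM flowers").fetchall()
lower = conn.execute("select name from flowers").fetchall()

# A quoted keyword is read as a string or identifier, not a keyword,
# so the statement is rejected as a syntax error.
try:
    conn.execute("'select' name FROM flowers")
    quoted_ok = True
except sqlite3.OperationalError:
    quoted_ok = False

print(upper == lower, quoted_ok)
```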
A database table is named blueFlowers. What type of case is this?
Camel Case
In BigQuery, what optional syntax can be removed from the following FROM clause without stopping the query from running?
FROM `bigquery-public-data.sunroof_solar.solar_potential_by_postal_code`
Backticks
In the following FROM clause, what is the table name in the SQL query?
FROM bigquery-public-data.sunroof_solar.solar_potential_by_postal_code
solar_potential_by_postal_code
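The table reference in the FROM clause has three dot-separated parts: project, dataset, and table. A small sketch of pulling them apart in Python follows; the helper name `split_table_reference` is hypothetical.

```python
def split_table_reference(reference):
    """Split 'project.dataset.table' into its three parts, ignoring backticks."""
    project, dataset, table = reference.strip("`").split(".")
    return project, dataset, table


parts = split_table_reference(
    "`bigquery-public-data.sunroof_solar.solar_potential_by_postal_code`"
)
print(parts)
```

The last element of the tuple is the table name, which matches the answer on the card.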
Primary and foreign keys are two connected identifiers within separate tables. These tables exist in what kind of database?
Relational
When working with data from an external source, what can metadata help data analysts do? Select all that apply.
-Ensure data is clean and reliable
-Combine data from more than one source
-Understand the contents of a database
-Choose which analyses to run
-Ensure data is clean and reliable
-Combine data from more than one source
-Understand the contents of a database
Think about data as driving a taxi cab. In this metaphor, which of the following are examples of metadata? Select all that apply.
-Company that owns the taxi
-License plate number
-Make and model of the taxi cab
-Passengers the taxi picks up
-Company that owns the taxi
-License plate number
-Make and model of the taxi cab
Fill in the blank: Data _____ is the process of ensuring the formal management of a company's data assets.
Governance
In what circumstance might a data analyst choose not to use external data in their analysis?
-The data is free for anyone to access
-The data cannot be confirmed to be reliable
-The data represents diverse perspectives
-The data is too thorough
-The data cannot be confirmed to be reliable
A data analyst reviews a national database of movie theater showings. They want to find the first movies shown in San Francisco in 2001. How can they organize the data to return the first 10 movies shown at the top of their list? Select all that apply.
-Sort by date in descending order
-Filter out showings outside of San Francisco
-Filter out showings not in 2001
-Sort by date in ascending order
-Filter out showings outside of San Francisco
-Filter out showings not in 2001
-Sort by date in ascending order
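The two filters and the ascending sort can be combined in one query. This is a sketch using Python's built-in `sqlite3` module with an invented `showings` table; dates are stored as ISO-formatted text so that they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE showings (title TEXT, city TEXT, show_date TEXT)")
conn.executemany(
    "INSERT INTO showings VALUES (?, ?, ?)",
    [
        ("Film A", "San Francisco", "2001-01-05"),
        ("Film B", "Oakland", "2001-01-02"),
        ("Film C", "San Francisco", "2000-12-31"),
        ("Film D", "San Francisco", "2001-01-03"),
    ],
)

first_showings = conn.execute(
    """
    SELECT title, show_date
    FROM showings
    WHERE city = 'San Francisco'                              -- filter by city
      AND show_date BETWEEN '2001-01-01' AND '2001-12-31'     -- filter by year
    ORDER BY show_date ASC                                    -- earliest first
    LIMIT 10
    """
).fetchall()

print(first_showings)
```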
TRUE or FALSE: When writing a query, you must remove the two backticks around the name of the dataset in order for the query to run properly.
False
Data analysts use guidelines to describe a file's version, content, and date created. What are these guidelines called?
Naming Conventions
Data analysts use foldering to achieve what goals? Select all that apply.
-To organize files into subfolders
-To assign metadata about the folders
-To transfer files from one place to another
-To keep project-related files together
-To organize files into subfolders
-To keep project-related files together
Fill in the blank: To separate current from past work and reduce clutter, data analysts create _____. This involves moving files from completed projects to a separate location.
Archives
What is the process of structuring folders broadly at the top, then breaking down those folders into more specific topics?
Creating a hierarchy
Successful file naming conventions include information that's useful when trying to locate or update a file. Which of the following is an effective file name?
-Data_519
-CampaignData_03
-AirportCampaign_2013_10_09_V01
-May30-2019_AirportAdvertisingCampaignResults_Terminals3-5_InclCustSurveyResponses_PLUS_IdeasforJune
-AirportCampaign_2013_10_09_V01
What aspects of a file do file-naming conventions typically describe? Select all that apply.
-Collaborators
-Creation date
-Version number
-Content
-Creation date
-Version number
-Content
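A naming convention can be checked automatically. The pattern below is one possible convention, modeled on the `AirportCampaign_2013_10_09_V01` file name from this set (content, then date as YYYY_MM_DD, then a two-digit version); the pattern and the helper `follows_convention` are assumptions, and a team would substitute its own rule.

```python
import re

# Content name, date as YYYY_MM_DD, then version as V plus two digits.
PATTERN = re.compile(r"[A-Za-z]+_\d{4}_\d{2}_\d{2}_V\d{2}")


def follows_convention(file_name):
    """Return True when the whole file name matches the assumed convention."""
    return PATTERN.fullmatch(file_name) is not None


for name in ["AirportCampaign_2013_10_09_V01", "Data_519", "CampaignData_03"]:
    print(name, follows_convention(name))
```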
A data analyst is working with a file from a customer satisfaction survey. The survey was sent to anyone who became a customer between April and June, 2020. Which of the following is an effective name for the file?
-Survey_Responses
-April_May_June_2020_Responses_to_New_Customer_Survey_ANALYS-SDATA_928310
-Apr-June2020_CustSurvey_V
-NewCustomerSurvey_2020-6-20_V03
-NewCustomerSurvey_2020-6-20_V03
What process do data analysts use to keep project-related files together and organize them into subfolders?
Foldering
Data analysts use archiving to separate current from past work. What does this process involve?
Moving files from completed projects to another location
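Foldering and archiving can be sketched with Python's standard library. The folder names (`active`, `archive`, `raw_data`) and the helper `archive_project` are assumptions for illustration, and the example runs inside a temporary directory so that it does not touch real files.

```python
import shutil
import tempfile
from pathlib import Path


def archive_project(root, project_name):
    """Move a finished project folder from root/active to root/archive."""
    source = root / "active" / project_name
    target_dir = root / "archive"
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / project_name
    shutil.move(str(source), str(destination))
    return destination


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hierarchy: broad folder at the top, more specific folders below it.
    raw = root / "active" / "AirportCampaign" / "raw_data"
    raw.mkdir(parents=True)
    (raw / "AirportCampaign_2013_10_09_V01.csv").write_text("id,value\n1,10\n")

    moved = archive_project(root, "AirportCampaign")
    still_active = (root / "active" / "AirportCampaign").exists()
    archived_file = (
        moved / "raw_data" / "AirportCampaign_2013_10_09_V01.csv"
    ).exists()
    print(still_active, archived_file)
```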
TRUE or FALSE: Data analysts create hierarchies to organize their folders. They do this by structuring folders by specific topics at the top, then more broadly below.
False
Using encryption to protect data is an example of what?
Data Security
TRUE or FALSE: To reduce clutter, a data analyst hides cells that contain long, complex formulas. To view the formulas again, the analyst will need to adjust the spreadsheet sharing or encryption settings.
False
Fill in the blank: File-naming conventions are _____ that describe a file's content, creation date, or version.
Consistent guidelines
Fill in the blank: A data analytics team uses _____ to indicate consistent naming conventions for a project. This is an example of using data about data.
Metadata
Reviewing the data enables you to describe how you will use it to achieve your client's goals. First, you notice that all of the data is first-party data. What does this mean?
It's data that was collected by Garden employees using the company's own resources.
The question in column E asks, "Was your order accurate? Please respond yes or no." What kind of data is this?
Boolean Data
TRUE or FALSE: The next thing you review is the file containing pictures of sandwich deliveries over a period of 30 days. This is an example of structured data.
False: Unstructured
TRUE or FALSE: Now that you're familiar with the data, you want to build trust with the team at Garden. You decide to impress them by taking the initiative to reach out to your social media followers. You explain that Garden is a new client, and you show them the pictures of Garden's sandwich deliveries from the client file. Then, you ask them if they have any photos of sandwich deliveries that you can evaluate. This is an example of going above and beyond expectations and a great way to build trust.
FALSE
Our data analytics team often surveys clients to get their feedback. If you were on the team, how would you ensure the sample is representative of the population as a whole?
-Use a randomized sample of the population that includes all genders.
-Make sure the sample is chosen at random.
-Only include participants who can answer survey questions in a timely manner.
-Include clients with disabilities in the survey sample.
-Use a randomized sample of the population that includes all genders.
-Make sure the sample is chosen at random.
-Include clients with disabilities in the survey sample.
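Choosing a sample at random can be sketched with Python's `random` module. The client list and the helper `draw_sample` are invented for illustration. Random selection gives every client the same chance of being picked, though it does not by itself guarantee that every group is represented in a small sample.

```python
import random


def draw_sample(population, size, seed=None):
    """Pick `size` distinct members at random; a seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(population, size)


# Hypothetical client list of 200 IDs.
clients = [f"client_{i:03d}" for i in range(1, 201)]
sample = draw_sample(clients, 20, seed=42)
print(len(sample), len(set(sample)))
```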
Our data analytics team often uses both internal and external data. Describe the difference between the two.
-Internal data came from a company's own systems. External data comes from outside the organization.
-Internal data is often generated from within the company. External data is generated outside the organization.
-External data is often generated from within the company. Internal data is generated outside the organization.
-External data came from a company's own systems. Internal data came from the organization.
-Internal data came from a company's own systems. External data comes from outside the organization.
-Internal data is often generated from within the company. External data is generated outside the organization.
Our analysts often work with the same spreadsheet, but for different purposes. How would you use sorting to help in this situation?
-Sort data to make it easier to understand, analyze and visualize
-Sort the data to arrange data in a meaningful order
-Sort data to show only the data that meets a specific criteria while hiding the rest
-Sort data to highlight the header row.
-Sort data to make it easier to understand, analyze and visualize
-Sort the data to arrange data in a meaningful order
For your final question, your interviewer explains that Sewati Financial Services cares about data privacy. The company needs its clients' trust, and this is an important responsibility for the data analytics team. He asks: What does data privacy involve? Select all that apply.
-Putting privacy measures in place to protect people's data
-Encryption and sharing permissions
-A person's legal right to their data
-Preserving a data subject's information and activity any time a data transaction occurs
-Putting privacy measures in place to protect people's data
-A person's legal right to their data