Big Data Integration and Processing Quiz 5 Answer

Quiz 5 - Pipeline and Tools

Q1) What is data-parallelism as defined in lecture?

Having multiple data pipelines at the same time.

Simultaneously processing input data from multiple cores.

Running the same function simultaneously for the partitions of a data set on multiple cores.

At each step of the data pipeline, process values simultaneously by using multiple cores.
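For illustration, the idea of running one function over the partitions of a data set at the same time can be sketched as below. This is a minimal sketch, not lecture material: `partition`, `square_all`, and `parallel_map` are hypothetical names, and a thread pool is used only to keep the example short. For CPU-bound work in CPython, a process pool would be needed to actually occupy several cores.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_parts):
    """Split data into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


def square_all(chunk):
    """The single function that is applied to every partition."""
    return [x * x for x in chunk]


def parallel_map(func, data, n_parts=4):
    """Apply func to each partition concurrently, then concatenate the results."""
    parts = partition(data, n_parts)
    # Assumption: threads stand in for cores here; swap in
    # ProcessPoolExecutor for real multi-core execution of CPU-bound work.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(func, parts))  # map preserves partition order
    return [y for part in results for y in part]
```

Calling `parallel_map(square_all, list(range(10)), n_parts=3)` squares every element while each of the three partitions is handled by its own worker.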

Q2) Of the following, which procedure best generalizes big data procedures such as (but not limited to) the map reduce process?

split->sort->merge

split->do->merge

split->map->shuffle and sort->reduce

split->shuffle and sort->map->reduce
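As a rough illustration of the split, do, merge pattern named among the options above, the sketch below counts words in three separate stages. The function names `split`, `do`, `merge`, and `word_count` are hypothetical and chosen only to mirror the option text. The stages run sequentially here; in a real system the "do" stage is the part that would run on many workers.

```python
from collections import Counter


def split(lines, n_parts):
    """Split: deal the input lines round-robin into n_parts chunks."""
    return [lines[i::n_parts] for i in range(n_parts)]


def do(chunk):
    """Do: count the words of one chunk, independently of all other chunks."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge(partials):
    """Merge: combine the per-chunk counts into one overall result."""
    total = Counter()
    for partial in partials:
        total += partial
    return total


def word_count(lines, n_parts=2):
    """Run the three stages in order."""
    return merge(do(chunk) for chunk in split(lines, n_parts))
```

MapReduce word count fits the same outline: the map and reduce steps together play the role of "do" and "merge", with shuffle and sort as the plumbing in between.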

Q3) What are the three layers for the Hadoop Ecosystem? (Choose 3)

Data Creation and Storage

Data Management and Storage

Data Integration and Processing

Data Manipulation and Integration

Coordination and Workflow Management

Q4) What are the 5 key points in order to categorize big data systems?

Coordination, Latency, Productivity, Speed, Fault Tolerance

Execution model, Speed, Scalability, Flexibility, Fault Tolerance

Coordination, Latency, Productivity, Flexibility, Fault Tolerance

Execution model, Latency, Scalability, Programming Language, Fault Tolerance

Q5) What is the lambda architecture as shown in lecture?

A type of swappable data processing layer.

An architecture that natively supports lambda calculus.

A type of hybrid data processing architecture.

A type of architecture that only contains part of the data processing method.

Q6) Which of the following scenarios is NOT an aggregation operation?

Removing undefined values.

Counting the total number of data.

Counting the total number of data per type.

Averaging the total number of data per type.
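To make the operations listed above concrete, the sketch below runs each of them on a small, made-up list of `(type, value)` records. The data and the function names are assumptions for illustration only. Note how the counting and averaging functions return fewer values than they receive, while the filtering function returns individual records unchanged.

```python
from collections import defaultdict

# Hypothetical sample data: (type, value) pairs, with one undefined value.
RECORDS = [("apple", 3), ("pear", 5), ("apple", 7), ("pear", None), ("fig", 4)]


def drop_undefined(rows):
    """Row-wise filter: keeps or drops each record, summarizes nothing."""
    return [(kind, value) for kind, value in rows if value is not None]


def count_total(rows):
    """Collapse all records into a single number."""
    return len(rows)


def count_per_type(rows):
    """Collapse records into one count per type."""
    counts = defaultdict(int)
    for kind, _ in rows:
        counts[kind] += 1
    return dict(counts)


def average_per_type(rows):
    """Collapse records into one mean value per type."""
    sums, counts = defaultdict(float), defaultdict(int)
    for kind, value in rows:
        sums[kind] += value
        counts[kind] += 1
    return {kind: sums[kind] / counts[kind] for kind in sums}
```

A typical use is to filter first and summarize afterwards, for example `average_per_type(drop_undefined(RECORDS))`.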

Q7) What usually happens to data when aggregated as mentioned in lecture?

Data becomes organized.

Data becomes smaller.

Data becomes personalized.

Data becomes faster to process.

Q8) What is K-means clustering?

Classify data by k actions.

Divide samples using k lines.

Classify data by k decisions.

Group samples into k clusters.
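For readers who want to see k-means in action, here is a minimal pure-Python sketch of the standard iterative procedure (often called Lloyd's algorithm). It is not taken from the lecture. The function name `kmeans`, the choice of the first k points as starting centroids, and the fixed iteration cap are all simplifying assumptions; libraries normally use a randomized initialization.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_index(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, k, n_iter=20):
    """Group points (tuples of numbers) into k clusters."""
    # Assumption: naive initialization with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    labels = [nearest_index(p, centroids) for p in points]
    return centroids, labels
```

Given two well-separated groups of 2-D points and `k=2`, the returned `labels` list places the points of each group under the same cluster index.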

Q9) Why is Hadoop not a good platform for machine learning as mentioned in lecture? (Choose 4)

Too massive.

Java support only.

Bottleneck using HDFS.

Map and Reduce Based Computation.

Unable to support machine learning.

No interactive shell and streaming.

Requires nodes and multiple machines.

Q10) What are the layers (parts) of Spark? (Choose 5)

MLlib

GraphX

Spark SQL

Spark Graph

Spark Core

Spark RDD

Spark Streaming

Worker Node

Q11) What is in-memory processing?

Having the input completely in disk.

Having the input completely in memory.

Having the pipeline completely in disk.

Having the pipeline completely in memory.

Writing data to disk between pipeline steps.

Writing data to memory between pipeline steps.
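The difference between passing intermediate results through memory and through disk can be sketched with a two-step toy pipeline. Everything here is hypothetical (the step functions `clean` and `double`, the JSON file format, the temporary directory); it is meant only to show where the intermediate data lives in each variant.

```python
import json
import os
import tempfile


def clean(values):
    """Step 1: drop undefined entries."""
    return [v for v in values if v is not None]


def double(values):
    """Step 2: transform every remaining value."""
    return [2 * v for v in values]


def pipeline_in_memory(values):
    """The output of step 1 is handed to step 2 directly, staying in memory."""
    return double(clean(values))


def pipeline_via_disk(values):
    """The output of step 1 is written to a file and read back before step 2."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "step1.json")  # hypothetical intermediate file
        with open(path, "w") as f:
            json.dump(clean(values), f)
        with open(path) as f:
            intermediate = json.load(f)
    return double(intermediate)
```

Both variants return the same result; the disk-based one pays for an extra write and read between the steps, which is the cost that in-memory engines aim to avoid.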
