Big Data Integration and Processing
Quiz 7 Answer
Quiz 7 - More on Spark
Q1) Which part of Spark is in charge of creating RDDs?
- Storage
- Local CPU
- Driver Program
- Spark Executor
- Worker Node
Q2) How does lazy evaluation work in Spark?
- Transformations are not executed until the action stage.
- Actions are queued and executed at a certain threshold.
- Actions are not executed until the transformation stage.
- Transformations are queued and executed at a certain threshold.
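As a rough illustration of lazy evaluation, here is a minimal plain-Python sketch. It is a toy model, not Spark's API: the `LazyDataset` class and `traced_double` helper are hypothetical names made up for this example. Transformations only record what should happen, and the work runs when an action (`collect` here) is called.

```python
class LazyDataset:
    """Toy stand-in for an RDD: queues transformations, runs them on an action."""

    def __init__(self, data, pending=()):
        self._data = list(data)
        self._pending = tuple(pending)  # queued transformations, not yet run

    def map(self, fn):
        # Transformation: returns a new dataset with fn queued; nothing is computed.
        return LazyDataset(self._data, self._pending + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only queued.
        return LazyDataset(self._data, self._pending + (("filter", pred),))

    def collect(self):
        # Action: only now is the queued chain applied to the data.
        result = self._data
        for kind, fn in self._pending:
            if kind == "map":
                result = [fn(x) for x in result]
            else:
                result = [x for x in result if fn(x)]
        return result


calls = []  # records every element that traced_double actually processes


def traced_double(x):
    calls.append(x)
    return 2 * x


pipeline = LazyDataset([1, 2, 3, 4]).map(traced_double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # still 0: no transformation has run yet
result = pipeline.collect()       # the action triggers the whole chain
```

Until `collect` is called, `traced_double` has not touched a single element; the chain of transformations is just a recipe.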
Q3) What are the consequences of lazy evaluation as mentioned in lecture?
- There are no consequences.
- Hiccups within the system during queue execution.
- Errors sometimes do not show up until the action stage.
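The delayed-error effect can be reproduced in plain Python, whose built-in `map` is also lazy. This is only an analogy under that assumption, not Spark code; the `records` list is made-up sample data. The faulty record goes unnoticed when the transformation is declared and surfaces only when the result is consumed.

```python
# Hypothetical input with one malformed record.
records = ["10", "20", "oops", "40"]

# Built-in map is lazy, so this line succeeds even though "oops" cannot be parsed.
parsed = map(int, records)

error_message = None
try:
    # Consuming the iterator plays the role of the action stage.
    total = sum(parsed)
except ValueError as exc:
    error_message = str(exc)
```

The practical consequence is that the failure is reported far from the line that introduced the bug, which can make debugging harder.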
Q4) What is a wide transformation?
- The name for the most used transformations.
- Transformations that take a lot of nodes to complete.
- A transformation that requires data shuffling across node partitions.
- A longer time-taking transformation compared to narrow transformations.
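The difference between narrow and wide transformations can be sketched with plain Python lists standing in for partitions. This is a simplified model, not Spark's implementation: the function names and the toy partitioner are assumptions chosen for illustration. The narrow step works on each partition in isolation, while the wide step first has to move records with the same key into the same partition.

```python
from collections import defaultdict
from functools import reduce

# Two input partitions of (key, value) pairs, as they might sit on two nodes.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("a", 5), ("c", 6)],
]


def narrow_map_values(parts, fn):
    # Narrow: each output partition depends on exactly one input partition.
    return [[(k, fn(v)) for k, v in part] for part in parts]


def wide_reduce_by_key(parts, fn, num_partitions=2):
    # Wide: records sharing a key must be brought together first (the shuffle).
    shuffled = [defaultdict(list) for _ in range(num_partitions)]
    for part in parts:
        for k, v in part:
            target = sum(k.encode()) % num_partitions  # deterministic toy partitioner
            shuffled[target][k].append(v)
    # After the shuffle, each partition can reduce its own keys locally.
    return [
        [(k, reduce(fn, values)) for k, values in sorted(bucket.items())]
        for bucket in shuffled
    ]


scaled = narrow_map_values(partitions, lambda v: v * 10)
sums = wide_reduce_by_key(partitions, lambda x, y: x + y)
totals = {k: v for part in sums for k, v in part}
```

Note that the values for key `"a"` start out in both input partitions and end up in a single output partition; that movement is the shuffle.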
Q5) Where is the data from each worker node sent after a collect function is called?
- Spark SQL
- Spark Context
- Spark Streaming
- Other Worker Nodes
- None; Stays in the Same Node
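A small plain-Python model of what `collect` does, assuming hypothetical worker names and data: each worker holds one partition, and the action gathers all of them into a single list on the driver side, where the Spark Context lives.

```python
# Hypothetical worker-side partitions; in Spark these would live on executors.
worker_partitions = {
    "worker-1": [1, 2, 3],
    "worker-2": [4, 5],
    "worker-3": [6],
}


def collect(partitions_by_worker):
    # Every worker ships its partition to the driver, which concatenates them.
    driver_side = []
    for worker in sorted(partitions_by_worker):
        driver_side.extend(partitions_by_worker[worker])
    return driver_side


on_driver = collect(worker_partitions)
```

Because the whole result lands in one place, `collect` is best reserved for results small enough to fit in the driver's memory.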
Q6) What are DataFrames?
- A type of narrow transformation.
- A column-like data format that can be read by Spark SQL.
- A special type of data node that contains a framework to manipulate SQL.
Q7) Can RDDs be converted into DataFrames directly without manipulation?
- Yes
- No: lines have to be converted into rows.
- No: RDDs need to be made relational first.
- No: RDDs cannot be converted into DataFrames.
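The line-to-row step can be sketched in plain Python. This is an analogy rather than Spark code: the `Person` named tuple stands in for Spark's `Row`, and the sample lines and field names are assumptions. In PySpark the equivalent idea is to map each line to a `Row` and pass the result to `createDataFrame`.

```python
from collections import namedtuple

# Hypothetical raw text lines, as an RDD of strings might hold them.
lines = ["alice,34", "bob,27"]

# A named row type stands in for Spark's Row; the field names are assumptions.
Person = namedtuple("Person", ["name", "age"])


def line_to_row(line):
    # Split the unstructured line and give each piece a name and a type.
    name, age = line.split(",")
    return Person(name=name.strip(), age=int(age))


rows = [line_to_row(line) for line in lines]
column_names = Person._fields          # the schema a DataFrame would expose
ages = [row.age for row in rows]       # columns can now be addressed by name
```

Once every record has named, typed fields, column-style access such as `row.age` becomes possible, which is what a raw line of text cannot offer.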
Q8) What are the functions of Spark SQL as mentioned in lecture? (Choose 3)
- Connect to a variety of databases.
- Better worker node interpolation.
- Enables relational queries on Spark.
- Better ability to manipulate big data.
- Deploy business intelligence tools over Spark.
- Efficient data manipulation using SQL-like structure.
Q9) What is a triplet in GraphX?
- A type of data to contain vertex info.
- A type of data to contain edge info.
- A type of data to contain both edge and vertex info.
- A type of data to contain the information on connections between vertices and edges.
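To make the idea of a triplet concrete, here is a minimal plain-Python sketch. The dataclasses are hypothetical stand-ins whose fields loosely mirror GraphX's `EdgeTriplet` (`srcId`, `srcAttr`, `dstId`, `dstAttr`, `attr`) in snake_case, and the small graph is made-up sample data. A triplet is an edge joined with the attributes of both of its endpoint vertices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src_id: int
    dst_id: int
    attr: str


@dataclass(frozen=True)
class EdgeTriplet:
    src_id: int
    src_attr: str
    dst_id: int
    dst_attr: str
    attr: str


# Hypothetical graph: vertex id -> vertex attribute, plus labelled edges.
vertices = {1: "alice", 2: "bob", 3: "carol"}
edges = [Edge(1, 2, "follows"), Edge(2, 3, "follows")]


def triplets(vertices, edges):
    # Join each edge with the attributes of its two endpoint vertices.
    return [
        EdgeTriplet(e.src_id, vertices[e.src_id], e.dst_id, vertices[e.dst_id], e.attr)
        for e in edges
    ]


facts = [f"{t.src_attr} {t.attr} {t.dst_attr}" for t in triplets(vertices, edges)]
```

With only the edge list, each relationship is a pair of ids; the triplet view carries the vertex attributes along, so each relationship can be read on its own.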