Before we jump into Spark with Scala examples, let's present a high-level overview of the key concepts you need to know. Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala and Python, and an optimized engine that supports general execution graphs. These fundamentals will be used throughout the rest of this Spark with Scala course. For primitive types in examples or demos, you can create Datasets within a Scala or Python notebook or in your sample Spark application.

Apache Spark: RDD, DataFrame or Dataset? A DataFrame is essentially a 2-D table with a schema, and the DataFrame data source API is consistent across data formats. A Dataset is a strongly typed collection of domain-specific objects that can be transformed in parallel using functional or relational operations; the Apache Spark Dataset API provides a type-safe, object-oriented programming interface. We'll look at how Dataset and DataFrame behave in Spark 2.0. A SparkSession can be used to create DataFrames, register DataFrames as tables, execute SQL over tables, cache tables, and read Parquet files; in older code you will still see the pre-2.0 entry point, created with scala> val sqlcontext = new org.apache.spark.sql.SQLContext(sc). RDD creation: (a) from an existing collection, using the parallelize method. In one example, we create a table and then start a Structured Streaming query to write to that table. Connectors follow the same model: you can define a catalog that maps the schema from Spark to HBase, or load the Greenplum Database table named otp_c, specifying airlineid as the partition column.

Some practical questions come up repeatedly. I'm trying to find the best solution to convert an entire Spark DataFrame to a Scala Map collection. My requirement is different: I want to add an Average column to the test DataFrame based on the id column. Sounds like you need to filter columns, but not records; for that, given val colNames = Seq("c1", "c2"), you can write df.select(colNames.head, colNames.tail: _*). The Aggregator scaladoc describes a base class for user-defined aggregations which can be used in DataFrame and Dataset; it works well with Dataset/GroupedDataset, but I am having no luck using it with DataFrame/GroupedData. Remember that you can't modify an existing DataFrame in place (in pandas you would write df.ix[x,y] = new_value); every transformation returns a new DataFrame. Spark's file readers also support several parse modes, the last of which (FAILFAST) throws an exception when it meets a corrupted record.

In this Apache Spark map example, we'll learn about all the ins and outs of the map function. Basic DataFrame operations look much like their RDD counterparts: the DataFrame map operation is essentially the same as the RDD's. map creates a new collection containing the result of applying the function to each element, and when used with unpaired data, the key for groupBy() is decided by the function literal passed to the method. Here's a quick look at how to use the Scala Map class, with a collection of Map class examples. We will also see how to use the MapType class and apply some DataFrame SQL functions on a map column, with a Scala example.
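To make the MapType discussion concrete, here is a small, hedged sketch that builds a DataFrame with a map column and applies a few of the built-in map functions. It assumes a spark-shell or notebook session (so spark and its implicits are available) and Spark 2.3 or later for map_keys and map_values; the column names and sample data are invented for illustration.

import org.apache.spark.sql.functions.{map_keys, map_values, explode}
import spark.implicits._

// Each row carries a map of property name -> value; Spark infers a map<string,string> column.
val df = Seq(
  ("alice", Map("dept" -> "sales", "state" -> "NY")),
  ("bob",   Map("dept" -> "eng",   "state" -> "TX"))
).toDF("name", "props")

df.printSchema()                                  // props: map<string,string>
df.select($"name", map_keys($"props")).show()     // just the keys of each map
df.select($"name", map_values($"props")).show()   // just the values
df.select($"name", explode($"props")).show()      // one (key, value) row per map entry

On versions older than 2.3, map_keys and map_values are not available, so the explode route (or a UDF) is the usual fallback.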
The encoder maps the domain-specific type T to Spark's internal type system, and the resulting binary structure often has a much lower memory footprint as well as better processing efficiency. A DataFrame is an immutable collection of data distributed across nodes, similar to an RDD, organized into named columns (which represent the variables); DataFrames are similar to tables in an RDBMS in that respect. As an extension to the existing RDD API, DataFrames feature seamless integration with big data tooling and infrastructure via Spark, and in general Spark DataFrames are quite efficient in terms of performance. One downside of the DataFrame API is that it is very Scala-centric; while it does support Java, the support is limited. Project Tungsten, an initiative which seeks to improve the performance and scalability of Spark, was also introduced. Technically, when we rename columns we're really creating a second DataFrame with the correct names, and reflection can be used to generate the schema of an RDD that contains specific types of objects.

The map function is applicable to both Scala's mutable and immutable collection data structures. For a Spark RDD, map takes each row as input and returns a transformed row, producing a new RDD. One way to encode a categorical column such as a US state (CA, NY, TX, etc.) could be to map each state to a number between 1 and 50.

A few pointers to related tasks: Oracle is one of the most widely used databases in the world; you can call saveToMongoDB() with a WriteConfig object to specify a different database and collection; for the Greenplum connector, construct a Map of read options for the GreenplumRelationProvider data source in a text editor; you can write a Spatial SQL/DataFrame application; and, as an exercise, write a program to count the words in a file using MapReduce.

First, create a SparkSession. Please note that I have used the Scala REPL in spark-shell to execute the following code; there, sc is an instance of SparkContext which is implicitly available. Spark SQL has interfaces that provide Spark with additional information about the structure of both the data and the computation being performed, and it supports many built-in transformation functions in the module org.apache.spark.sql.functions._, so we will start off by importing that. Temporary views in Spark SQL are session-scoped and will disappear if the session that creates them terminates. This Spark SQL tutorial also covers SQLContext. Spark provides key capabilities in the form of Spark SQL, Spark Streaming, Spark ML and GraphX, all accessible via Java, Scala, Python and R, and Spark, built on Scala, has gained a lot of recognition and is being used widely in production.
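As a sketch of the "first create a SparkSession" step and of session-scoped temporary views, the snippet below builds a session the way a standalone application would (in spark-shell or a notebook one is already provided as spark). The app name, master, table name and sample rows are illustrative assumptions.

import org.apache.spark.sql.SparkSession

// Build the session explicitly, as you would in an application.
val spark = SparkSession.builder()
  .appName("spark-session-example")
  .master("local[*]")          // run locally for the demo
  .getOrCreate()
import spark.implicits._

val people = Seq(("Ann", 31), ("Ben", 25)).toDF("name", "age")

// Session-scoped: the view disappears when the session that created it terminates.
people.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()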
I prefer the Scala version due to the strong typing and the ability to catch errors at compile time. Spark 2.x features a new Dataset API that is intended as a replacement for the older RDD API, along with the unification of the Java and Scala APIs. As mentioned in an earlier post, the new API will make it easy for data scientists and people with a SQL background to perform analyses with Spark. Resilient distributed datasets remain Spark's main and original programming abstraction for working with data distributed across multiple nodes in your cluster: when you apply a transformation such as map, RDD X is the source RDD and RDD Y is the resulting RDD. Current information is correct, but more content will probably be added in the future.

Spark Scala Tutorial: in this Spark Scala tutorial you will learn how to read data from a text file, CSV, JSON or JDBC source into a DataFrame; there are also guides covering the steps to connect to an Oracle database from Spark and how to write to MongoDB. This guide covers the Scala language features needed by Spark programmers. DataFrame from CSV vs. DataFrame from Parquet: Parquet is a column-oriented file storage format which Spark has native support for. In this article, I will also explain how to explode array and map DataFrame columns into rows using the different Spark explode functions (explode, explode_outer, posexplode, posexplode_outer), with a Scala example. You use linear or logistic regression depending on the problem; logistic regression, instead of predicting a dependent value given some independent input values, predicts a probability and a binary, yes-or-no, outcome.

A couple of Scala reminders: Scala's Predef object offers an implicit conversion that lets you write key -> value as an alternate syntax for the pair (key, value); consider, for example, a map with student ids as the keys and student names as the values. You can also create a bound Column using the Dataset the column is supposed to be part of, via the Dataset's apply or col method.

The Scala interface for Spark SQL supports automatically converting an RDD containing case classes to a DataFrame, and JavaBeans and Scala case classes representing rows of the data can also be used as a hint to generate the schema; this extra structure helps Spark optimize the execution plan for these queries. Alternatively, you can map each element to a Row and pass the result to spark.createDataFrame together with an explicit schema. You can apply normal Spark functions (map, filter, reduceByKey, etc.) to SQL query results. Let us look at an example where we apply zipWithIndex on an RDD and then convert the resultant RDD into a DataFrame to perform SQL queries.
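The reflection-based conversion and the zipWithIndex idea can be sketched as follows. It assumes a spark-shell session (spark and sc in scope); the Employee case class, its fields and the sample rows are illustrative rather than taken from the original article.

import spark.implicits._

// The case-class field names become the column names via reflection.
case class Employee(id: Long, name: String, age: Int)

val rawRdd = sc.parallelize(Seq(("Alice", 29), ("Bob", 35)))

// zipWithIndex pairs every element with its position, which we use as an id column.
val employeeRdd = rawRdd.zipWithIndex.map { case ((name, age), idx) => Employee(idx, name, age) }

val employeeDf = employeeRdd.toDF()
employeeDf.createOrReplaceTempView("employees")
spark.sql("SELECT id, name FROM employees WHERE age > 30").show()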
If you are working with Spark, you will most likely have to write transforms on DataFrames. Spark SQL is a Spark module for structured data processing, and the Spark DataFrames API is a distributed collection of data organized into named columns, created to support modern big data and data science applications. This post will help you get started using Apache Spark DataFrames with Scala on the MapR Sandbox.

On the Scala side: Scala offers lists, sequences, and arrays, and case classes implement the Product interface. By default Scala uses the immutable Map; import scala.collection.mutable.Map to use the mutable variant.

Now let us look at some commonly used transformations (map, flatMap, filter). A typical program starts by importing org.apache.spark.{SparkConf, SparkContext}. We can also perform aggregation on specific columns, which is equivalent to the GROUP BY clause in typical SQL, and we can create a Spark UDF with extra parameters via currying. A global temporary view, unlike the session-scoped views mentioned earlier, is shared across Spark sessions. Reading Oracle data uses the Apache Spark DataFrame API: the new version of Apache Spark (1.3) introduces a new API, the DataFrame.

When working with structured (CSV, Parquet, etc.) or semi-structured (JSON) files, we often get data with complex structures such as arrays and maps. Spark supports columns that contain arrays of values, but JSON can get messy and parsing it can get tricky; here are a few examples of parsing nested data structures in JSON using Spark DataFrames (the examples were done with an earlier Spark 1.x release). Converting an Apache Spark RDD to an Apache Spark DataFrame is a recurring task; for instance, you might read file1.txt and convert it into a DataFrame using map. Below is a simple Spark / Scala example describing how to convert a CSV file to an RDD and perform some simple filtering, such as keeping only the lines that contain a given substring with lines.filter(_.contains("test")). Each map key corresponds to a header name, and each data value corresponds to the value of that key on the specific line.
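Below is one way to sketch that CSV-to-RDD pattern: each data line is turned into a Map from header name to field value and then filtered. It assumes a spark-shell session; the file name people.csv and the state column are placeholders.

// people.csv is assumed to look like: name,state / Alice,CA / Bob,TX (one record per line)
val lines = sc.textFile("people.csv")

val headerLine = lines.first()
val header = headerLine.split(",")

// Every remaining line becomes a Map(header-name -> data-value).
val records = lines
  .filter(_ != headerLine)
  .map(line => header.zip(line.split(",")).toMap)

// Simple filtering: keep only the rows whose "state" field is CA.
val californians = records.filter(_.get("state").contains("CA"))
californians.collect().foreach(println)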
Two types of Apache Spark RDD operations are transformations and actions (Spark Transformations in Scala Examples). A simple transformation example would be calculating the logarithmic value of each RDD element and creating a new RDD with the returned elements. The classic walkthrough from the Spark documentation is val lines = sc.textFile("data.txt"), then val lineLengths = lines.map(s => s.length), and finally val totalLength = lineLengths.reduce((a, b) => a + b). Pretty straightforward, right? Things get interesting when you want to convert your Spark RDD to a DataFrame; we define a case class that defines the schema of the table. How to create a Spark Dataset? There are multiple ways of creating a Dataset, depending on the use case.

A few common questions: how would I go about changing a value in row x, column y of a DataFrame? In pandas this would be df.ix[x,y] = new_value, but a Spark DataFrame is immutable, so you derive a new DataFrame instead; in this post I am going to describe, with example code, how we can add a new column to an existing DataFrame using the withColumn() function. How do you convert DataFrame table 1 to table 2 using value mapping in Spark Scala (I am using the Spark Scala API)? There is also a recipe for applying a full outer join on nullable columns of a Spark DataFrame. When writing to MongoDB, you may need to include a map transformation to convert the data into a Document (or BsonDocument or a DBObject).

One of Apache Spark's selling points is the cross-language API that allows you to write Spark code in Scala, Java, Python, R or SQL (with others supported unofficially). These examples give a quick overview of the Spark API; we will also touch on Whole-Stage Code Generation and go through a simple example of Spark 2.0 Structured Streaming (streaming with DataFrames). For our clustering example, let's start with two clusters to see if they have a relationship to the label, "UP" or "DN". Thus, if you want to leverage the power of Scala and Spark to make sense of big data, this book is for you.

Here is a simple example of converting a List into a Spark RDD and then converting that Spark RDD into a DataFrame; below is the Scala code, which you can run in a Zeppelin notebook or spark-shell on your HDInsight cluster with Spark.
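A minimal sketch of the List-to-RDD-to-DataFrame flow, finishing with the withColumn call mentioned above; it assumes a spark-shell or Zeppelin session, and the column names and data are invented.

import org.apache.spark.sql.functions.col
import spark.implicits._

val data = List(("Scala", 3000), ("Python", 10000), ("Java", 20000))

val rdd = sc.parallelize(data)            // List -> RDD
val df  = rdd.toDF("language", "users")   // RDD  -> DataFrame

df.printSchema()
df.show()

// Deriving a new DataFrame with an extra column (the original is immutable).
val withThousands = df.withColumn("users_in_thousands", col("users") / 1000)
withThousands.show()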
Spark is built on the concept of distributed datasets, which contain arbitrary Java or Python objects. Concepts: "A DataFrame is a distributed collection of data organized into named columns." The DataFrame API is designed to ease developing Spark applications for processing large amounts of structured, tabular data on Spark infrastructure; for background, see the paper "Spark SQL: Relational Data Processing in Spark" by Michael Armbrust, Reynold S. Xin and others. Since then, a lot of new functionality has been added in subsequent Spark releases.

To follow along, you enter spark-shell, the interactive Scala shell; all examples will be in Scala. Basically, map is defined in the abstract class RDD in Spark, and it is a transformation, which means it is a lazy operation. We will be using the last of the parse modes (FAILFAST) in our example because we do not want to proceed in case of data errors.

As I continue practicing with Scala, it seemed appropriate to follow up with a second part, comparing how to handle dataframes in the two programming languages in order to get the data ready before the modeling process: in Python we will do all of this using the pandas library, while in Scala we will use Spark.

Writing a Spark DataFrame to ORC files is straightforward, since Spark includes the ability to write multiple different file formats to HDFS. The Mongo Spark Connector provides the com.mongodb.spark.sql.DefaultSource class that creates DataFrames and Datasets from MongoDB, and the following example creates a 10-document RDD and saves it to the MongoDB collection specified in the SparkConf:
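Roughly, that ten-document save looks like the sketch below. It assumes the MongoDB Spark connector 2.x is on the classpath and that the output URI points at a reachable database and collection; the package coordinates and configuration keys shown in the comments are assumptions that may differ between connector versions.

import com.mongodb.spark._
import org.bson.Document

// The connector reads its target from the SparkConf, e.g. when launching spark-shell:
//   --conf "spark.mongodb.output.uri=mongodb://127.0.0.1/test.myCollection"
//   --packages org.mongodb.spark:mongo-spark-connector_2.11:2.x.x

// Ten small documents: {test: 1} .. {test: 10}.
val documents = sc.parallelize((1 to 10).map(i => Document.parse(s"{test: $i}")))

MongoSpark.save(documents)   // writes to the collection configured in the SparkConf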
I first heard of Spark in late 2013 when I became interested in Scala, the language in which Spark is written. However, not all language APIs are created equal, and in this post we'll look at the differences from both a syntax and a performance point of view. Get an overview of Spark and its associated ecosystem: Spark packages are available for many different HDFS versions; Spark runs on Windows and UNIX-like systems such as Linux and macOS; the easiest setup is local, but the real power of the system comes from distributed operation; and Spark runs on Java 6+ and Python 2.6+. The Spark tutorials with Scala listed below cover the Scala Spark API within Spark Core, clustering, Spark SQL, Streaming, machine learning with MLlib, and more.

Let's begin the tutorial and discuss Spark SQL and DataFrame API operations. Spark SQL introduces a tabular functional data abstraction called DataFrame, and more than a year later, Spark's DataFrame API provides a rich set of operations for data munging, SQL queries, and analytics. We will see three such examples and various operations on these DataFrames, starting with creating a Spark DataFrame. Datasets provide compile-time type safety, which means that production applications can be checked for errors before they are run, and they allow direct operations over user-defined classes. DataFrame row to Scala case class using map(): in the previous example, we showed how to convert a DataFrame row to a Scala case class using as[]. The difference between mutable and immutable is that when an object is immutable, the object itself cannot be changed.

For example, here's a way to create a Dataset of 100 integers in a notebook.
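A hedged sketch of that notebook snippet, assuming a spark-shell or notebook session where spark is already defined:

import spark.implicits._

val ds = spark.range(100)        // Dataset[java.lang.Long] holding the values 0 to 99
println(ds.count())              // 100

// The same idea starting from an ordinary Scala collection:
val squares = (1 to 100).toDS().map(n => n * n)
squares.show(5)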
In a DataFrame, data is organized into named columns, and the underlying processing of DataFrames is done by RDDs; below are the most used ways to create a DataFrame. The names of the arguments to the case class are read using reflection and become the names of the columns. Of course, converting a plain RDD can be problematic, because if you have no type information, how do you suddenly create type information? Convert RDD to DataFrame with Spark: learn how to convert an RDD to a DataFrame using the Databricks Spark CSV library; this example transforms each line in the CSV into a Map with the form header-name -> data-value. So this is the way we operated on the DataFrame in earlier versions of Spark. DataFrames can then be transformed into various forms using the DSL operations defined in the DataFrame API and its various functions; common questions include how to sort a DataFrame by multiple columns, how to drop DataFrame columns by name, and how to calculate a rank within a DataFrame using Scala, with examples. As a reminder, flatMap(func) is similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item).

"Spark provides fast iterative/functional-like capabilities over large data sets," as the Spark website puts it. Deploying the key capabilities is crucial, whether on a standalone cluster or as part of an existing Hadoop installation configured with YARN or Mesos. (Authors of examples: Matthias Langer and Zhen He.) Use the connector's MongoSpark helper to facilitate the creation of a DataFrame from MongoDB, and the GeoSparkViz page outlines the steps to visualize spatial data. Now that we have some Scala methods to call from PySpark, we can write a simple Python job that will call our Scala methods; the job takes in as its only argument a text file containing the input data, which in our case is the iris data set.

Here we provide an example of how to do linear regression using the Spark ML (machine learning) library and Scala. We will do a multiple regression example, meaning there is more than one input variable; from there we can make predicted values given some inputs.
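As a sketch of that multiple-regression idea, the snippet below fits a LinearRegression model on a tiny made-up data set with two input variables and then produces predictions. The column names and numbers are invented, and a real job would read its data from a file rather than an inline Seq; a spark-shell session is assumed.

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import spark.implicits._

// Toy data where label is roughly 2*x1 + 3*x2.
val training = Seq(
  (1.0, 1.0, 5.1),
  (2.0, 1.0, 7.0),
  (2.0, 2.0, 9.9),
  (3.0, 2.0, 12.1),
  (4.0, 3.0, 17.0)
).toDF("x1", "x2", "label")

// Spark ML expects the input variables assembled into a single vector column.
val assembler = new VectorAssembler().setInputCols(Array("x1", "x2")).setOutputCol("features")
val assembled = assembler.transform(training)

val lr = new LinearRegression().setFeaturesCol("features").setLabelCol("label")
val model = lr.fit(assembled)

println(s"coefficients: ${model.coefficients}  intercept: ${model.intercept}")

// From here we can make predicted values given some inputs.
model.transform(assembled).select("x1", "x2", "label", "prediction").show()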
Apache Spark is a fast, scalable data processing engine for big data analytics. RDDs can be manipulated through operations like map and filter, and the main abstraction in Spark SQL's API is a DataFrame, a distributed collection of rows with the same schema. To start Spark's interactive shell, run spark-shell. A typical DataFrame API example exercises different types of functionality: setting the location of 'warehouseLocation' to the Spark warehouse directory, reading input (JSON is a very common way to store data, and there is also a Spark example for loading a CSV file in Scala), and building a DataFrame with spark.createDataFrame. I hope it was useful for you to explore the process of converting a Spark RDD to a DataFrame and a Dataset. Spark DataFrame columns support arrays and maps, which are great for data sets that have an arbitrary length. Next, let us walk through two examples to illustrate the use cases of grouped map pandas UDFs.

A few related topics: the first part of the blog covers how to port Hive queries to Spark DataFrames, while the second part discusses performance tips for DataFrames; XGBoost4J-Spark is a project aiming to seamlessly integrate XGBoost and Apache Spark by fitting XGBoost into Apache Spark's MLlib framework; and almost all companies use Oracle as a data warehouse appliance or for transaction systems.

Spark 1.6 comes with support for automatically generating encoders for a wide variety of types, including primitive types (e.g. String, Integer, Long), Scala case classes, and Java beans. For example, given a class Person with two fields, name (string) and age (int), an encoder is used to tell Spark to generate code at runtime to serialize the Person object into a binary structure.
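A short sketch of that encoder idea: with a Person case class, importing spark.implicits._ brings an automatically generated Encoder[Person] into scope, so a Dataset[Person] stores rows in Spark's compact binary format while keeping compile-time types. Again this assumes a spark-shell or notebook session; the sample names and ages are invented.

import spark.implicits._   // provides the implicit Encoder[Person]

case class Person(name: String, age: Int)

val people = Seq(Person("Ann", 31), Person("Ben", 25)).toDS()
people.printSchema()        // name: string, age: int

// Typed, compile-checked operations on the domain object:
val adultNames = people.filter(p => p.age >= 18).map(p => p.name)
adultNames.show()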