spark Archives - Java / Cloud / BigData

October 21, 2021
Hadoop, Java, Spark

Spark’s User Defined Functions in Java

This article answers the questions: how do you change a column in Spark? How do you modify a column in Spark? In other words: how do you create a user-defined function (UDF) and apply it? As an example, let’s look at a UDF that takes a String and returns a String. For Spark […]
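
As a rough illustration of the String-to-String logic such a UDF wraps, here is a minimal plain-Java sketch. `java.util.function.Function` is only a stand-in so the block runs without Spark on the classpath; in Spark's Java API a lambda with the same shape would implement `org.apache.spark.sql.api.java.UDF1<String, String>`. The class name, the `NORMALIZE` constant, and the transformation itself (trim and upper-case) are assumptions made for illustration, not taken from the article.

```java
import java.util.Locale;
import java.util.function.Function;

public class StringUdfSketch {
    // Hypothetical transformation: trim the value and upper-case it.
    // A column value can be null, so null is passed through unchanged.
    static final Function<String, String> NORMALIZE =
            value -> value == null ? null : value.trim().toUpperCase(Locale.ROOT);

    public static void main(String[] args) {
        // Apply the function to a sample value, as Spark would do once per row.
        System.out.println(NORMALIZE.apply("  spark "));
    }
}
```

Keeping the function body free of Spark types makes it easy to unit-test before it is registered as a UDF.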

October 21, 2021
Hadoop, Java, Scala, Spark

Spark failed to connect to the MetaStore Server

The problem: when running a Spark script or application, you may encounter errors about failing to connect to the MetaStore server. Solutions: if you do not need the MetaStore server, there are two ways to disable it. Please note that Spark 2.x or later is required. The first way is via spark2-submit parameters; the second way is via the SparkConf object. Java example: Scala […]

October 21, 2021
Hadoop, Java, Scala, Spark

Spark custom parquet OutputCommitter

If you need your own implementation of OutputCommitter for Spark parquet output tasks, first create a class that extends org.apache.hadoop.mapreduce.OutputCommitter. Then, regardless of how the OutputCommitter is implemented, you need to register its full class name in the Hadoop (!) configuration of Spark. Example for Scala: […]

October 21, 2021
Hadoop, Java, Scala, Spark

Spark concurrent write to same HDFS path

The problem: sometimes several Spark tasks need to write data to the same HDFS path. While the tasks are running, you may encounter errors. Suppose we have one Spark task that writes to the hdfs://data/test directory. At runtime, Spark creates a temporary directory: hdfs://data/test/_temporary/0. There is […]
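
To make the path layout concrete, here is a minimal plain-Java sketch of how that temporary directory name is put together, and why two writers targeting the same output path end up in the same place. The class and the `stagingDir` helper are hypothetical names for illustration. The sketch assumes both writers use attempt id 0, as in the path quoted above, and only builds strings; it does not touch HDFS.

```java
public class TemporaryDirSketch {
    // Name of the staging directory created under the output path.
    static final String PENDING_DIR_NAME = "_temporary";

    // Hypothetical helper: builds the staging path for an output directory
    // and an application attempt id (assumed to be 0 here).
    static String stagingDir(String outputPath, int appAttemptId) {
        // Drop a trailing slash so the result has no empty path segment.
        String base = outputPath.endsWith("/")
                ? outputPath.substring(0, outputPath.length() - 1)
                : outputPath;
        return base + "/" + PENDING_DIR_NAME + "/" + appAttemptId;
    }

    public static void main(String[] args) {
        // Two independent writers with the same output path.
        String first = stagingDir("hdfs://data/test", 0);
        String second = stagingDir("hdfs://data/test", 0);

        System.out.println(first);
        System.out.println("Same staging directory: " + first.equals(second));
    }
}
```

Because the staging path depends only on the output path and the attempt id, nothing in it distinguishes one writer from another.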

June 12, 2021 (updated October 21, 2021)
Hadoop, Scala, Spark

How to run Scala script with Object on Spark

Suppose you have some Scala code that you run as a script, and you want to put it in a Scala object so that IntelliJ IDEA doesn’t highlight the code in red. At first, you will probably just wrap the code in an object. But when you run the script, nothing happens and spark2-shell just opens. […]