Spark failed to connect to the MetaStore Server

The problem: you may encounter errors like this when running a Spark script / application. Solutions: if you do not need the MetaStore server, there are two ways to disable it (Spark 2.x or later is required). The first way is via spark2-submit parameters; the second way is via the SparkConf object. Java example: Scala […]


Spark custom parquet OutputCommitter

If you need your own OutputCommitter implementation for Spark parquet-output tasks, first create a class that extends org.apache.hadoop.mapreduce.OutputCommitter: Then, regardless of the OutputCommitter implementation, you need to register the full class name in the Hadoop (!) configuration of Spark like this: Example for Scala: […]


Spark concurrent write to same HDFS path

The problem: sometimes several Spark tasks need to write data to the same HDFS path. While the tasks run, you may encounter errors: Suppose we have one Spark task that writes to the hdfs://data/test directory. At runtime, Spark creates a temporary directory: hdfs://data/test/_temporary/0. There is […]
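The temporary directory layout mentioned above can be sketched in plain Java, without Spark or Hadoop on the classpath. The class name StagingDirSketch and the helper stagingDir are hypothetical, and the sketch assumes that a second job writing to the same output path would also use attempt id 0, which is why the two jobs would end up sharing one temporary directory.

```java
final class StagingDirSketch {
    // Builds <output path>/_temporary/<attempt id>, the layout shown in the excerpt.
    // This is a hypothetical helper for illustration, not a Spark or Hadoop API.
    static String stagingDir(String outputPath, int attemptId) {
        // Drop a trailing slash so the result has no double separator.
        String base = outputPath.endsWith("/")
                ? outputPath.substring(0, outputPath.length() - 1)
                : outputPath;
        return base + "/_temporary/" + attemptId;
    }

    public static void main(String[] args) {
        // Assumption: both jobs target the same path and both use attempt id 0.
        String jobA = stagingDir("hdfs://data/test", 0);
        String jobB = stagingDir("hdfs://data/test", 0);

        System.out.println(jobA);
        System.out.println("same temporary directory: " + jobA.equals(jobB));
    }
}
```

Under that assumption, whichever job finishes first and cleans up its temporary directory would also remove files the other job still needs.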


Spring TCP server

Suppose you need to configure a TCP server in a Spring / Spring Boot application. The server will print the client’s request to the screen, then send a response to the client. First, let’s include the spring-integration-ip dependency. In Maven it looks like this: As with the gRPC approach, in this case […]


Terracotta Ehcache notes

Java has a caching standard, JCache, described in the JSR-107 specification. A standardized API makes it easier to integrate with different tools that perform the same function. Spring Boot supports JCache integration, so whichever JCache implementation we use, the code remains the same. Some of the common JCache […]


gRPC Timeouts & Deadlines

First, let’s figure out how a timeout differs from a deadline. A timeout is a relative value (a duration), while a deadline is absolute (a fixed point in time). Suppose we take a timeout and a deadline of 5 seconds and write the following line (the code is only an illustration; it will not work if you run it): Then, if you want to […]
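The relative-versus-absolute distinction can be sketched with java.time alone, without any gRPC dependency. The class TimeoutVsDeadline and its helpers toDeadline and remaining are hypothetical names chosen for illustration; they are not part of the gRPC API.

```java
import java.time.Duration;
import java.time.Instant;

final class TimeoutVsDeadline {
    // A timeout is a duration; adding it to a start instant gives a deadline.
    static Instant toDeadline(Instant start, Duration timeout) {
        return start.plus(timeout);
    }

    // Time left until the deadline as seen at `now`; zero once the deadline has passed.
    static Duration remaining(Instant deadline, Instant now) {
        Duration left = Duration.between(now, deadline);
        return left.isNegative() ? Duration.ZERO : left;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2024-01-01T00:00:00Z"); // arbitrary fixed start
        Duration timeout = Duration.ofSeconds(5);
        Instant deadline = toDeadline(start, timeout);

        // Three seconds later the timeout value is unchanged,
        // but less time is left before the deadline.
        Instant later = start.plusSeconds(3);
        System.out.println("timeout:  " + timeout.getSeconds() + "s");
        System.out.println("time left before deadline: "
                + remaining(deadline, later).getSeconds() + "s");
    }
}
```

The design point is that a timeout passed along a chain of calls would restart at every hop, while a deadline keeps shrinking as time passes, so each hop sees only what is actually left.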


gRPC channel for target was not shutdown properly

Imagine you run into an error like this: The point is that if you create a gRPC channel, you must close it after use. Let’s say you’ve already created a channel like this: After use, it needs to be closed. It is important to wait until it has closed; awaitTermination() is used for this:


Scala + Maven + IntelliJ IDEA project setup

To create a Scala project using Maven to manage dependencies in IntelliJ IDEA, you first need to install the Scala plugin: go to File -> Preferences (Settings) -> Plugins; search for Scala in the Marketplace; install it and restart IntelliJ IDEA. Next, let’s create a regular Java + Maven project: go to File -> New -> Project; select Maven -> […]
