How to Replace Spark DataFrame Column Value? – Scala and PySpark

Similar to a relational database table, a DataFrame in Spark is a dataset organized into named columns and rows. When you work with multiple data sources, you may receive data with unwanted values, such as junk characters, in your Spark DataFrames. In this article, we will check how to replace such junk values in a Spark DataFrame column, along with the methods available for replacing values in Spark DataFrames. Replace Spark DataFrame Column Value: It is a very common requirement to cleanse the source…


How to Find String in Spark DataFrame? – Scala and PySpark

As a data engineer, you get to work on many different datasets and databases. It is a common requirement to enrich the input data by filtering out unwanted data, or to search for a specific string within a dataset or, if you are working on Apache Spark, within a Spark DataFrame. For example, you may need to identify an unwanted or junk string within a dataset. In this article, we will check how to find a string in a Spark DataFrame using various methods. We shall see the different methods to find a string in a given data…
