82,738 questions

0 votes · 0 answers · 6 views
Error: The requested KieBase "normalRulesKBase" does not exist in Drools
I'm working on creating multiple KieSessions dynamically using Drools in Scala. I have two sets of rules stored in different folders (slidingwindow and normal) within the resources/rules directory. ...
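A common cause of this error is that the KieBase name requested in code does not match a kbase declared in META-INF/kmodule.xml, or the declared packages do not cover the rule folders. A minimal sketch of a kmodule.xml declaring two KieBases over the two rule folders (the names, packages, and event-processing mode are assumptions, not from the question):

```xml
<kmodule xmlns="http://www.drools.org/xsd/kmodule">
  <!-- Rules under resources/rules/slidingwindow, processed as an event stream -->
  <kbase name="slidingWindowKBase" packages="rules.slidingwindow" eventProcessingMode="stream">
    <ksession name="slidingWindowKSession"/>
  </kbase>
  <!-- Rules under resources/rules/normal -->
  <kbase name="normalRulesKBase" packages="rules.normal">
    <ksession name="normalRulesKSession"/>
  </kbase>
</kmodule>
```

With this in place, `kieContainer.newKieSession("normalRulesKSession")` should resolve; a mismatch between the name in code and the name in the XML reproduces the "does not exist" error.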

0 votes · 0 answers · 10 views
Function to deduplicate data using PySpark partitioning, windowing, and row coalescing results in dropped rows for single-element partitions
I am trying to deduplicate a dataset based on partitions on a specific key, within that partition I want to take the most recent non-null values of all the other rows. There are a lot of columns so ...

0 votes · 0 answers · 5 views
Keytab file configuration for Spark readStream from Kafka format
Dears,
I am trying to consume from Kafka using spark.readStream.format("kafka"), but every time I get an error:
**Caused by: org.apache.kafka.common.KafkaException: Failed to load SSL ...
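A frequent cause of this failure is that the Kafka client settings must be prefixed with `kafka.` so Spark passes them through to the Kafka consumer, and the keytab is supplied via a JAAS config string rather than a plain Spark option. A hedged sketch of the option set (every host, path, password, and the principal below are placeholders):

```python
# All hosts, paths, passwords, and the principal are placeholders.
jaas = (
    'com.sun.security.auth.module.Krb5LoginModule required '
    'useKeyTab=true storeKey=true '
    'keyTab="/etc/security/keytabs/app.keytab" '
    'principal="app@EXAMPLE.COM";'
)

kafka_options = {
    "kafka.bootstrap.servers": "broker1:9093",
    "subscribe": "my_topic",
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "GSSAPI",
    "kafka.sasl.kerberos.service.name": "kafka",
    "kafka.sasl.jaas.config": jaas,
    "kafka.ssl.truststore.location": "/etc/security/kafka.client.truststore.jks",
    "kafka.ssl.truststore.password": "changeit",
}

# Usage (requires the spark-sql-kafka connector on the classpath):
# stream = spark.readStream.format("kafka").options(**kafka_options).load()
```

If the truststore path is wrong or unreadable on the executors, the exact "Failed to load SSL" exception above is what surfaces.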

0 votes · 0 answers · 13 views
Left join two sorted dataframes in pyspark
I have two dataframes that are ordered by a certain column, which is also the join key.
Is it possible to merge these two dataframes, returning a sorted one, in O(n+m)? I don't care if it's not done in ...

0 votes · 0 answers · 14 views
Load multiple parquet files in order in pyspark
I have a time series dataset split among multiple parquet files.
files = [
'0to9999.parquet',
'10000to19999.parquet',
'20000to20000.parquet',
...
]
spark.read.parquet(*files).show()
...
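Two caveats with this pattern: a lexicographic listing puts "10000to..." before "0to...", and `spark.read.parquet(*files)` does not guarantee row order even when the file list is ordered. A sketch (file names here are made up to match the pattern): natural-sort by the leading offset, then impose an explicit ordering column after reading.

```python
import re

# Hypothetical file names following the "<start>to<end>.parquet" pattern.
files = ["10000to19999.parquet", "0to9999.parquet", "20000to29999.parquet"]

def start_offset(name: str) -> int:
    """Extract the leading row offset, e.g. '10000to19999.parquet' -> 10000."""
    return int(re.match(r"(\d+)to", name).group(1))

ordered = sorted(files, key=start_offset)

# Even with an ordered list, Spark may reorder rows across tasks; tag each row
# with its source (e.g. F.input_file_name()) and orderBy a real column before
# relying on row order:
# df = spark.read.parquet(*ordered).withColumn("src", F.input_file_name())
```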

0 votes · 1 answer · 16 views
Read CSV with "§" as delimiter using Databricks autoloader
I'm very new to spark streaming and autoloader and had a query on how we might be able to get autoloader to read a text file with "§" as the delimiter. Below I tried reading the file as a ...

0 votes · 1 answer · 16 views
How to Read Multiple CSV Files with Skipping Rows and Footer in PySpark Efficiently?
I have several CSV files with an inconsistent number of data rows without a header row and I want to read these files into a single PySpark DataFrame. The structure of the CSV files is as follows:
...

0 votes · 0 answers · 10 views
A few records are missing when writing an Avro file using Spark
While writing an Avro file to HDFS, a few records are getting dropped.
This happens occasionally. The data in the dataset is flat, not nested, yet sometimes some
records are dropped while writing.
What ...

0 votes · 0 answers · 11 views
Spark Job failed in EMR with exit code 137
The Spark job runs on EMR release 7.2. It's failing with the error below. Any advice or tips on debugging?
Error
Job aborted due to stage failure: Task 1518 in stage 14.0 failed 4 times, most recent ...
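As a debugging starting point: exit code 137 is 128 + 9 (SIGKILL), which on EMR almost always means the container was killed for exceeding its memory allotment rather than a Spark-level error. A hedged sketch of the usual first knobs to turn (all values are assumptions to tune per cluster):

```python
import signal

# 137 = 128 + SIGKILL(9): the OS / container runtime killed the process,
# most often because it exceeded its memory limit.
assert 128 + signal.SIGKILL == 137

# Common starting points when executors are OOM-killed (values illustrative):
spark_conf = {
    "spark.executor.memory": "8g",
    "spark.executor.memoryOverhead": "3g",   # off-heap headroom; 137s often vanish here
    "spark.sql.shuffle.partitions": "800",   # smaller tasks -> smaller per-task memory
}
```

Checking the failed task's executor logs and the node's `dmesg` for "Killed process" confirms whether the OOM killer was involved.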

-1 votes · 0 answers · 12 views
Scala Spark JDBC format
I have a scenario with Scala Spark JDBC connecting to Snowflake using a private key.
When I pass a private key (of type java.security.PrivateKey) in the
option (screenshot attached), it throws an error ...

0 votes · 0 answers · 36 views
How to create a new column or update a column inside a dataframe?
Good morning everyone,
I have a question today that I don't know exactly how to approach.
Given a dataframe, I need to create columns dynamically, and those columns will contain a set of validations that I have ...

0 votes · 1 answer · 18 views
difference between spark.kubernetes.driver.request.cores, spark.kubernetes.driver.limit.cores and spark.driver.cores
I am new to Kubernetes but not to Apache Spark. I am currently working on EMR on EKS, which is essentially Spark on Kubernetes, and I can't get my head around the difference between spark.kubernetes....
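In short, as I read the Spark-on-Kubernetes docs: `spark.kubernetes.driver.request.cores` is the CPU the driver pod requests from the Kubernetes scheduler (what it is guaranteed), `spark.kubernetes.driver.limit.cores` is the hard cap the pod may burst up to, and `spark.driver.cores` is what Spark itself believes it has (and the default for the request when `request.cores` is unset). A sketch of how they might be combined (values are illustrative, not a recommendation):

```properties
# Ask the k8s scheduler for 1 CPU, allow bursting to 2, tell Spark it has 2.
spark.kubernetes.driver.request.cores=1
spark.kubernetes.driver.limit.cores=2
spark.driver.cores=2
```

The executor-side properties (`spark.kubernetes.executor.request.cores` etc.) follow the same request/limit split.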

0 votes · 1 answer · 43 views
How to convert a list into multiple columns and a dataframe?
I have a challenge today:
Given a list of S3 paths, split each path and get a dataframe with one column holding the path and a new column with just the name of the folder.
My list has the ...

0 votes · 1 answer · 23 views
How to handle accented letters in PySpark
I have a pyspark dataframe in which I need to add "translate" for a column.
I have the below code
df1 = df.withColumn("Description", F.split(F.trim(F.regexp_replace(F....

0 votes · 0 answers · 21 views
Parquet Partition Strategy for single small file and read Optimization
I have a single Parquet file ranging from 5 to 100 MB of data.
When I tried to partition on the Date column, multiple files were created, which reduces read performance as there are many ...