0 votes · 0 answers · 6 views

Error: The requested KieBase "normalRulesKBase" does not exist in Drools

I'm working on creating multiple KieSessions dynamically using Drools in Scala. I have two sets of rules stored in different folders (slidingwindow and normal) within the resources/rules directory. ...
Basil Saju
0 votes · 0 answers · 10 views

Deduplicating data using PySpark partitioning, windowing, and row coalescing drops rows from single-element partitions

I am trying to deduplicate a dataset based on partitions over a specific key; within each partition I want to take the most recent non-null values from all the other rows. There are a lot of columns, so ...
MarMar
  • 305
0 votes · 0 answers · 5 views

Keytab file configuration for Spark readStream from Kafka

I am trying to consume from Kafka using spark.readStream.format("kafka"), but every time I get an error: Caused by: org.apache.kafka.common.KafkaException: Failed to load SSL ...
Emad Kamal EL-hewihy
0 votes · 0 answers · 13 views

Left join two sorted dataframes in PySpark

I have two dataframes that are ordered by a certain column, which is also the join key. Is it possible to merge these two dataframes, returning a sorted one, in O(n+m)? I don't care if it's not done in ...
David Davó
0 votes · 0 answers · 14 views

Load multiple Parquet files in order in PySpark

I have a time series dataset split among multiple Parquet files. files = [ '0to9999.parquet', '10000to19999.parquet', '20000to20000.parquet', ... ] spark.read.parquet(*files).show() ...
David Davó
0 votes · 1 answer · 16 views

Read CSV with "§" as delimiter using Databricks Auto Loader

I'm very new to Spark Structured Streaming and Auto Loader, and I had a question about how to get Auto Loader to read a text file with "§" as the delimiter. Below I tried reading the file as a ...
beingmanny
0 votes · 1 answer · 16 views

How to Read Multiple CSV Files with Skipping Rows and Footer in PySpark Efficiently?

I have several CSV files with an inconsistent number of data rows and no header row, and I want to read these files into a single PySpark DataFrame. The structure of the CSV files is as follows: ...
Purushottam Nawale
0 votes · 0 answers · 10 views

A few records are missing when writing to an Avro file using Spark

While writing an Avro file to HDFS, a few records are getting dropped. This happens occasionally. The data in the dataset is flat, not nested, yet sometimes some records are dropped while writing. What ...
CodeRunner
0 votes · 0 answers · 11 views

Spark job failed in EMR with exit code 137

The Spark job runs on EMR release 7.2. It fails with the error below. Any advice or tips on how to debug? Error: Job aborted due to stage failure: Task 1518 in stage 14.0 failed 4 times, most recent ...
user3858193
  • 1,478
-1 votes · 0 answers · 12 views

Scala Spark JDBC format

I have a scenario with Scala Spark JDBC connecting to Snowflake using a private key. When I pass a private key (of type java.security.PrivateKey) in the option (screenshot attached), it throws an error ...
Ganesh Dogiparthi
0 votes · 0 answers · 36 views

How to create a new column or update a column inside a dataframe?

Good morning everyone. I have a question today that I don't know exactly how to approach. Given a dataframe, I need to create columns dynamically, and those columns will contain a set of validations that I have ...
Julio
  • 511
0 votes · 1 answer · 18 views

Difference between spark.kubernetes.driver.request.cores, spark.kubernetes.driver.limit.cores and spark.driver.cores

I am new to Kubernetes but not to Apache Spark. I am currently working on EMR on EKS, which is essentially Spark on Kubernetes, and I can't get my head around the difference between spark.kubernetes....
Vikas Saxena
  • 1,173
0 votes · 1 answer · 43 views

How to convert a list into multiple columns and a dataframe?

I have a challenge today: given a list of S3 paths, split each one and get a dataframe with one column containing the path and a new column containing just the name of the folder. My list has the ...
Julio
  • 511
0 votes · 1 answer · 23 views

How to handle accented letters in PySpark

I have a PySpark dataframe in which I need to apply "translate" to a column. I have the below code: df1 = df.withColumn("Description", F.split(F.trim(F.regexp_replace(F....
user175025
0 votes · 0 answers · 21 views

Parquet partition strategy for a single small file and read optimization

I have a single Parquet file ranging from 5 to 100 MB of data. When I tried to partition on a Date column, multiple files were created, which reduces read performance since there are many ...
Rohan Gala
