82,738 questions

0 votes · 0 answers · 6 views
Error: The requested KieBase "normalRulesKBase" does not exist in Drools
I'm working on creating multiple KieSessions dynamically using Drools in Scala. I have two sets of rules stored in different folders (slidingwindow and normal) within the resources/rules directory. ...
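A common cause of this error is that the KieBase name requested in code does not match a kbase declared in META-INF/kmodule.xml, or the declared packages do not cover the rule folders. A minimal sketch of a kmodule.xml declaring two KieBases over the two rule folders (the names, packages, and event-processing mode are assumptions, not from the question):

```xml
<kmodule xmlns="http://www.drools.org/xsd/kmodule">
  <!-- Rules under resources/rules/slidingwindow, processed as an event stream -->
  <kbase name="slidingWindowKBase" packages="rules.slidingwindow" eventProcessingMode="stream">
    <ksession name="slidingWindowKSession"/>
  </kbase>
  <!-- Rules under resources/rules/normal -->
  <kbase name="normalRulesKBase" packages="rules.normal">
    <ksession name="normalRulesKSession"/>
  </kbase>
</kmodule>
```

With this in place, `kieContainer.newKieSession("normalRulesKSession")` should resolve; a mismatch between the name in code and the name in the XML reproduces the "does not exist" error.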

0 votes · 0 answers · 10 views
Function to deduplicate data using PySpark partitioning, windowing, and row coalescing results in dropped rows for single-element partitions
I am trying to deduplicate a dataset based on partitions on a specific key, within that partition I want to take the most recent non-null values of all the other rows. There are a lot of columns so ...

0 votes · 0 answers · 5 views
Keytab file configuration for Spark readStream from Kafka format
Dears,
I am trying to consume from Kafka using spark.readStream.format("kafka"), but every time I get an error:
**Caused by: org.apache.kafka.common.KafkaException: Failed to load SSL ...
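A frequent cause of this failure is that the Kafka client settings must be prefixed with `kafka.` so Spark passes them through to the Kafka consumer, and the keytab is supplied via a JAAS config string rather than a plain Spark option. A hedged sketch of the option set (every host, path, password, and the principal below are placeholders):

```python
# All hosts, paths, passwords, and the principal are placeholders.
jaas = (
    'com.sun.security.auth.module.Krb5LoginModule required '
    'useKeyTab=true storeKey=true '
    'keyTab="/etc/security/keytabs/app.keytab" '
    'principal="app@EXAMPLE.COM";'
)

kafka_options = {
    "kafka.bootstrap.servers": "broker1:9093",
    "subscribe": "my_topic",
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "GSSAPI",
    "kafka.sasl.kerberos.service.name": "kafka",
    "kafka.sasl.jaas.config": jaas,
    "kafka.ssl.truststore.location": "/etc/security/kafka.client.truststore.jks",
    "kafka.ssl.truststore.password": "changeit",
}

# Usage (requires the spark-sql-kafka connector on the classpath):
# stream = spark.readStream.format("kafka").options(**kafka_options).load()
```

If the truststore path is wrong or unreadable on the executors, the exact "Failed to load SSL" exception above is what surfaces.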

0 votes · 0 answers · 13 views
Left join two sorted dataframes in pyspark
I have two dataframes that are ordered by a certain column, which is also the join key.
Is it possible to merge these two dataframes, returning a sorted one, in O(n+m)? I don't care if it's not done in ...

0 votes · 0 answers · 14 views
Load multiple parquet files in order in pyspark
I have a time series dataset split among multiple parquet files.
files = [
'0to9999.parquet',
'10000to19999.parquet',
'20000to20000.parquet',
...
]
spark.read.parquet(*files).show()
...
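Two caveats with this pattern: a lexicographic listing puts "10000to..." before "0to...", and `spark.read.parquet(*files)` does not guarantee row order even when the file list is ordered. A sketch (file names here are made up to match the pattern): natural-sort by the leading offset, then impose an explicit ordering column after reading.

```python
import re

# Hypothetical file names following the "<start>to<end>.parquet" pattern.
files = ["10000to19999.parquet", "0to9999.parquet", "20000to29999.parquet"]

def start_offset(name: str) -> int:
    """Extract the leading row offset, e.g. '10000to19999.parquet' -> 10000."""
    return int(re.match(r"(\d+)to", name).group(1))

ordered = sorted(files, key=start_offset)

# Even with an ordered list, Spark may reorder rows across tasks; tag each row
# with its source (e.g. F.input_file_name()) and orderBy a real column before
# relying on row order:
# df = spark.read.parquet(*ordered).withColumn("src", F.input_file_name())
```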

0 votes · 1 answer · 16 views
Read CSV with "§" as delimiter using Databricks autoloader
I'm very new to spark streaming and autoloader and had a query on how we might be able to get autoloader to read a text file with "§" as the delimiter. Below I tried reading the file as a ...

0 votes · 1 answer · 16 views
How to Read Multiple CSV Files with Skipping Rows and Footer in PySpark Efficiently?
I have several CSV files with an inconsistent number of data rows without a header row and I want to read these files into a single PySpark DataFrame. The structure of the CSV files is as follows:
...

0 votes · 0 answers · 10 views
A few records are missing when writing an Avro file using Spark
While writing an Avro file to HDFS, a few records are getting dropped.
This happens occasionally. The data in the dataset is flat, not nested, yet sometimes some
records are dropped while writing.
What ...

0 votes · 0 answers · 11 views
Spark Job failed in EMR with exit code 137
The Spark job runs on EMR release 7.2. It's failing with the error below. Any advice or tips on debugging?
Error
Job aborted due to stage failure: Task 1518 in stage 14.0 failed 4 times, most recent ...
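As a debugging starting point: exit code 137 is 128 + 9 (SIGKILL), which on EMR almost always means the container was killed for exceeding its memory allotment rather than a Spark-level error. A hedged sketch of the usual first knobs to turn (all values are assumptions to tune per cluster):

```python
import signal

# 137 = 128 + SIGKILL(9): the OS / container runtime killed the process,
# most often because it exceeded its memory limit.
assert 128 + signal.SIGKILL == 137

# Common starting points when executors are OOM-killed (values illustrative):
spark_conf = {
    "spark.executor.memory": "8g",
    "spark.executor.memoryOverhead": "3g",   # off-heap headroom; 137s often vanish here
    "spark.sql.shuffle.partitions": "800",   # smaller tasks -> smaller per-task memory
}
```

Checking the failed task's executor logs and the node's `dmesg` for "Killed process" confirms whether the OOM killer was involved.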

-1 votes · 0 answers · 12 views
Scala Spark JDBC format
I have a scenario with Scala Spark JDBC connecting to Snowflake using a private key.
When I pass a private key (of type java.security.PrivateKey) in the
option (screenshot attached), it throws an error ...

0 votes · 0 answers · 36 views
How to create a new column or update a column inside a dataframe?
Good morning everyone,
I have a question today that I don't know exactly how to approach.
Given a dataframe, I need to create columns dynamically, and those columns will contain a set of validations that I have ...

0 votes · 1 answer · 18 views
difference between spark.kubernetes.driver.request.cores, spark.kubernetes.driver.limit.cores and spark.driver.cores
I am new to Kubernetes but not to Apache Spark. I am currently working on EMR on EKS, which is essentially Spark on Kubernetes, and I can't get my head around the difference between spark.kubernetes....
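In short, as I read the Spark-on-Kubernetes docs: `spark.kubernetes.driver.request.cores` is the CPU the driver pod requests from the Kubernetes scheduler (what it is guaranteed), `spark.kubernetes.driver.limit.cores` is the hard cap the pod may burst up to, and `spark.driver.cores` is what Spark itself believes it has (and the default for the request when `request.cores` is unset). A sketch of how they might be combined (values are illustrative, not a recommendation):

```properties
# Ask the k8s scheduler for 1 CPU, allow bursting to 2, tell Spark it has 2.
spark.kubernetes.driver.request.cores=1
spark.kubernetes.driver.limit.cores=2
spark.driver.cores=2
```

The executor-side properties (`spark.kubernetes.executor.request.cores` etc.) follow the same request/limit split.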

0 votes · 1 answer · 43 views
How to convert a list into multiple columns and a dataframe?
I have a challenge today:
Given a list of S3 paths, split each path and get a dataframe with one column holding the path and a new column with just the name of the folder.
My list has the ...

0 votes · 1 answer · 23 views
How to handle accented letters in PySpark
I have a pyspark dataframe in which I need to add "translate" for a column.
I have the below code
df1 = df.withColumn("Description", F.split(F.trim(F.regexp_replace(F....

0 votes · 0 answers · 21 views
Parquet Partition Strategy for single small file and read Optimization
I have a single Parquet file ranging from 5 to 100 MB of data.
When I tried to partition on the Date column, multiple files were created, which reduces read performance as there are many ...