All Questions
Tagged with apache-spark, azure-databricks
1,002 questions
1 vote · 1 answer · 56 views
DataFrame write to Azure-SQL row-by-row performance
We are using Azure Databricks (Spark) to write data to an Azure SQL database. Last week we switched from runtime 9.1 (Spark 3.1) to the newer 14.3 (Spark 3.5) using the Spark native JDBC driver. However, when we ...
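A common cause of row-by-row JDBC write performance is leaving the writer's batching at a small or default value. A minimal sketch of the Spark JDBC writer's `batchsize` and `numPartitions` options — the values are illustrative, and the URL/table in the usage comment are hypothetical placeholders:

```python
def jdbc_write_options(batch_size=10_000, num_partitions=8):
    """Options for Spark's JDBC writer that batch many rows per round trip
    instead of issuing one INSERT per row."""
    return {
        "batchsize": str(batch_size),         # rows sent per JDBC batch
        "numPartitions": str(num_partitions), # parallel writer connections
    }

# Usage inside a notebook that already has `df` and `jdbc_url` (hypothetical):
# (df.write.format("jdbc")
#    .option("url", jdbc_url)
#    .option("dbtable", "dbo.target_table")
#    .options(**jdbc_write_options())
#    .mode("append")
#    .save())
```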
1 vote · 1 answer · 43 views
PySpark Java whitelisted class issues
I am trying to migrate the Hive metastore into Unity Catalog, so I had to enable Unity Catalog on my existing cluster, but the code below, used in one of my notebooks, is no longer supported, and ...
-1 votes · 1 answer · 73 views
How to avoid small files in Databricks when writing data
I am performing two write operations, each in a different notebook. The first operation involves writing approximately 22 million records with 90 columns, and the second involves writing about 10 ...
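Small output files usually come from writing with too many partitions relative to the data volume. One hedged approach is to size the partition count from an estimated byte count before writing (128 MB per file is a common Delta target; the helper below is a sketch, not Databricks' own compaction logic):

```python
def target_partitions(total_bytes, target_file_bytes=128 * 1024 * 1024):
    """Partition count so each output file lands near target_file_bytes.
    Uses ceiling division so any remainder still gets a partition."""
    return max(1, -(-total_bytes // target_file_bytes))

# Usage sketch: df.repartition(target_partitions(estimated_size)).write...
# On Databricks, the optimized-write cluster/table settings are an alternative
# that sizes files automatically at write time.
```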
1 vote · 1 answer · 100 views
Spark: Persist not working as expected
I was using a PySpark DataFrame on which I called a UDF. The UDF makes an API call and stores the response back into the DataFrame. My goal is to store the DataFrame and reuse it ...
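A frequent source of this surprise is that `persist()` is lazy: without a subsequent action nothing is cached, and the UDF (with its API calls) re-runs on every later action. A minimal sketch — the `materialize` helper is hypothetical, not a Spark API:

```python
def materialize(df):
    """Persist a DataFrame and force computation once, so later actions
    reuse the cached rows instead of re-running the UDF."""
    cached = df.persist()  # MEMORY_AND_DISK is the DataFrame-API default
    cached.count()         # an action; persist alone computes nothing
    return cached
```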
1 vote · 2 answers · 71 views
Scala UDF with long running initialization
I have a Scala UDF which works functionally, but is slower than it should be.
It is a function that looks up a location from an IP address. This uses a relatively large database (200+ MB), which I ...
0 votes · 1 answer · 108 views
DLT - how to get pipeline_id and update_id?
I need to insert pipeline_id and update_id in my Delta Live Table (DLT), the point being to know which pipeline created which row. How can I obtain this information?
I know you can get job_id and ...
0 votes · 1 answer · 54 views
How to ensure UDF containing api call is being run across multiple worker nodes in Databricks
Working in an Azure Databricks environment.
I have a Spark DataFrame containing 200 rows, each of which represents a container in ADLS. For each row I need to sum the sizes of the blobs in that ...
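With only 200 rows, Spark may pack everything into one or two partitions, so the UDF effectively runs on a single worker. One hedged fix is to repartition the small DataFrame before applying the UDF so each row can land in its own task; the helper and its cap are illustrative:

```python
def spread(df, num_rows, max_tasks=64):
    """Repartition a tiny DataFrame so its rows fan out across tasks;
    caps the partition count so we don't create thousands of empty tasks."""
    return df.repartition(min(num_rows, max_tasks))

# Usage sketch: spread(df, 200).withColumn("size", sum_blob_sizes_udf("container"))
# where sum_blob_sizes_udf is the (hypothetical) API-calling UDF.
```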
0 votes · 0 answers · 39 views
Can we create multiple Spark executors within a single driver node on a Databricks cluster?
I have a power user compute with a single driver node and I'm trying to parallelize forecasting across multiple series by aggregating the data and doing a groupBy and then an apply on the groupBy.
The ...
0 votes · 1 answer · 262 views
Error: No parent external location found while creating a dataframe and saving it as a table in ADLS on an Azure Databricks free trial
Working on my free-trial Azure account, I am trying to copy CSV files to ADLS Gen2 and save the dataframe as a table in the ADLS silver layer.
code:
DForderItems = spark.read.csv("abfss://bronze@...
1 vote · 2 answers · 128 views
Unable to InferSchema with Databricks SQL
When I attempt to create a table with Databricks SQL I get the error:
AnalysisException: Unable to infer schema for CSV. It must be specified manually.
%sql
CREATE TABLE IF NOT EXISTS newtabletable ...
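That AnalysisException typically means the CSV source gave Spark nothing to infer a schema from, and the message's documented fix is to state the schema manually. A sketch that builds the CREATE TABLE statement with an explicit column list — the table name, path, and columns are placeholders, not the asker's actual values:

```python
def create_table_sql(table, path, columns):
    """Build a CREATE TABLE statement with a manual schema so Spark never
    needs to infer one. `columns` is a list of (name, sql_type) pairs."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
        f"USING CSV OPTIONS (path '{path}', header 'true')"
    )

# Usage sketch (column names hypothetical):
# spark.sql(create_table_sql("newtabletable", "/mnt/raw/data.csv",
#                            [("id", "INT"), ("name", "STRING")]))
```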
1 vote · 1 answer · 90 views
Databricks Performance Tuning for Joins Across 15 Tables with Around 200 Million Rows
As part of our Databricks notebook, we are trying to run SQL joining around 15 Delta tables: 1 fact table and around 14 dimension tables. The data coming out of the joins is around 200 million records.
...
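For a fact table joined to many small dimensions, broadcasting the dimensions avoids shuffling the 200-million-row fact table for those joins. A sketch of picking broadcast candidates by size — the 10 MB default mirrors Spark's `spark.sql.autoBroadcastJoinThreshold`, and the size figures in the usage comment are hypothetical:

```python
def dims_to_broadcast(dim_sizes, threshold_bytes=10 * 1024 * 1024):
    """Return the dimension tables small enough to broadcast to every
    executor, so their joins skip the shuffle of the large fact table."""
    return [name for name, size in dim_sizes.items() if size <= threshold_bytes]

# Usage sketch (sizes in bytes, values hypothetical):
# small = dims_to_broadcast({"dim_date": 2_000_000, "dim_geo": 50_000_000})
# then, with `from pyspark.sql.functions import broadcast`:
# fact.join(broadcast(dim_date_df), "date_sk")
```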
2 votes · 2 answers · 126 views
Spark reading CSV with bad records
I am trying to read a CSV file in Spark using a pre-defined schema. For this I use:
df = (spark.read.format("csv")
      .schema(schema)
      .option("sep", ";")
      ...
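For bad records with a pre-defined schema, the CSV reader's PERMISSIVE mode can keep unparseable lines in a dedicated column instead of failing the load or silently nulling them. A sketch of the relevant reader options (remember to also add a StringType `_corrupt_record` field to the schema itself):

```python
def permissive_csv_options(sep=";", corrupt_col="_corrupt_record"):
    """Reader options that route unparseable CSV lines into a dedicated
    column rather than failing the whole read."""
    return {
        "sep": sep,
        "mode": "PERMISSIVE",
        "columnNameOfCorruptRecord": corrupt_col,
    }

# Usage sketch:
# df = (spark.read.format("csv").schema(schema)
#       .options(**permissive_csv_options()).load(path))
# bad = df.filter("_corrupt_record IS NOT NULL")  # inspect the rejects
```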
0 votes · 1 answer · 74 views
Is the performance of inserting data into an Azure SQL database from Databricks affected by the sizing of the database?
I am working on a use case that needs to ingest a large volume of data (~10M rows) from an Azure Databricks materialized view into an Azure SQL database. The database uses the elastic standard tier (eDTU 50)...
0 votes · 0 answers · 61 views
No PYTHON_UID found for session (random uuid)
I'm facing a strange error when writing a stream from a DataFrame in Azure Databricks to a Postgres table:
First I use databricks-connect==14.2.1 to create a session for our ...
0 votes · 1 answer · 102 views
Databricks: create table using delta table path giving AnalysisException: The specified schema does not match existing schema at dbfs:/mnt/datalake/
I have a Delta table path. Based on it, I am trying to create a table using
create table if not exists dbo.DimCustomer
(CustomerSK BIGINT GENERATED BY DEFAULT AS IDENTITY,
first_name varchar(128),...