
All Questions

1 vote
1 answer
56 views

DataFrame write to Azure SQL: row-by-row performance

We are using Azure Databricks Spark to write data to an Azure SQL database. Last week we switched from runtime 9.1 (Spark 3.1) to the newer 14.3 (Spark 3.5) using the Spark native JDBC driver. However, when we ...
chabin • 21
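Row-by-row inserts over JDBC usually mean the writer is not batching. A minimal sketch of batched writes with Spark's built-in JDBC writer; the connection string, table name, and batch size below are placeholders:

    # All connection details are placeholders; batchsize controls how many
    # rows Spark sends per JDBC batch instead of one INSERT per row.
    jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"
    (df.write
       .format("jdbc")
       .option("url", jdbc_url)
       .option("dbtable", "dbo.target_table")
       .option("user", "sqluser")
       .option("password", "***")
       .option("batchsize", 10000)   # larger batches mean fewer round trips
       .mode("append")
       .save())
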
1 vote
1 answer
43 views

PySpark Java whitelisted-class issues

I am trying to migrate the Hive metastore into Unity Catalog, so I had to enable Unity Catalog on my existing cluster, but the code below, used in one of our notebooks, is no longer supported and ...
Developer Rajinikanth
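On Unity Catalog shared clusters, direct JVM access (spark._jvm / sc._jvm) is blocked, which commonly surfaces as whitelisted-class errors. Assuming the blocked code was Hadoop FileSystem access through the JVM gateway (an assumption here, since the excerpt is truncated), the usual replacement is dbutils.fs:

    # Assumed scenario: the old notebook listed files via spark._jvm Hadoop
    # FileSystem calls. On UC shared clusters that gateway is blocked;
    # dbutils.fs is the supported equivalent.
    files = dbutils.fs.ls("abfss://container@account.dfs.core.windows.net/path/")
    for f in files:
        print(f.path, f.size)
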
-1 votes
1 answer
73 views

How to avoid small files in Databricks when writing data

I am performing two write operations, each in a different notebook. The first operation involves writing approximately 22 million records with 90 columns, and the second involves writing about 10 ...
BryC • 115
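Small files typically come from too many output partitions at write time. A minimal sketch, assuming a Delta target; the partition count and table name are placeholders to tune:

    # Reduce the number of output partitions before the write so each file
    # is larger; 64 is a placeholder to tune against data volume and cluster size.
    (df.repartition(64)
       .write
       .format("delta")
       .mode("append")
       .saveAsTable("silver.big_table"))

    # Alternatively, let Delta size files on write via a table property:
    spark.sql("""
      ALTER TABLE silver.big_table
      SET TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true)
    """)
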
1 vote
1 answer
100 views

Spark: persist not working as expected

I was using a PySpark DataFrame where I called a UDF. The UDF makes an API call and stores the response back in the DataFrame. My goal is to store the DataFrame and reuse it ...
Deepak Kumar
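persist() is lazy: without an action, nothing is cached and the UDF re-executes on every downstream use. A minimal sketch, where call_api_udf is a hypothetical stand-in for the API-calling UDF in the question:

    from pyspark import StorageLevel
    from pyspark.sql import functions as F

    # call_api_udf is a placeholder for the question's API-calling UDF.
    df = df.withColumn("response", call_api_udf(F.col("request")))
    df = df.persist(StorageLevel.MEMORY_AND_DISK)
    df.count()   # an action forces the UDF to run once and fills the cache
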
1 vote
2 answers
71 views

Scala UDF with long-running initialization

I have a Scala UDF which works, but is slower than it should be. It is a function that looks up a location from an IP address. This uses a relatively large database (200+ MB), which I ...
Martin Secher Skeem
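The usual fix is to load the database once per executor rather than per row. A Python sketch of that pattern using mapPartitions; load_geo_db() and db.lookup() are hypothetical stand-ins for the 200+ MB IP database:

    # Lazy one-time initialization per worker process; the module-level
    # _db survives across partitions handled by the same process.
    _db = None

    def lookup_partition(rows):
        global _db
        if _db is None:
            _db = load_geo_db("/dbfs/geo/ipdb.mmdb")  # placeholder loader and path
        for row in rows:
            yield (row.ip, _db.lookup(row.ip))

    result = df.rdd.mapPartitions(lookup_partition).toDF(["ip", "location"])
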
0 votes
1 answer
108 views

DLT - how to get pipeline_id and update_id?

I need to insert pipeline_id and update_id in my Delta Live Table (DLT), the point being to know which pipeline created which row. How can I obtain this information? I know you can get job_id and ...
Zeruno • 1,619
0 votes
1 answer
54 views

How to ensure a UDF containing an API call runs across multiple worker nodes in Databricks

Working in an Azure Databricks environment, I have a Spark DataFrame containing 200 rows, each of which represents a container in ADLS. For each row I need to sum the size of the blobs in that ...
user26590429
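A 200-row DataFrame often sits in a single partition, so the UDF runs on one executor. Repartitioning before applying the UDF spreads the API calls across tasks; blob_size_udf and the column name are placeholders:

    # Spread 200 rows across 200 partitions, roughly one row per task, so
    # the API calls are distributed instead of serialized on one executor.
    df = df.repartition(200)
    df = df.withColumn("total_size", blob_size_udf(df["container_path"]))
    df.write.mode("overwrite").saveAsTable("container_sizes")  # action triggers execution
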
0 votes
0 answers
39 views

Can we create multiple Spark executors within a single driver node on a Databricks cluster?

I have a power user compute with a single driver node and I'm trying to parallelize forecasting across multiple series by aggregating the data and doing a groupBy and then an apply on the groupBy. The ...
Manav Karthikeyan
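On a single-node cluster there is only one executor, but groupBy().applyInPandas still parallelizes across that node's cores, one task per group up to the local core count. A minimal sketch with a hypothetical forecast_group function:

    # forecast_group is a placeholder: it receives one series as a pandas
    # DataFrame and returns a pandas DataFrame of forecasts matching the schema.
    result = (df.groupBy("series_id")
                .applyInPandas(forecast_group,
                               schema="series_id string, ds date, yhat double"))
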
0 votes
1 answer
262 views

Error: "No parent external location found" while creating a DataFrame and saving it as a table in ADLS on an Azure Databricks free trial

Working on my free-trial Azure account, I am trying to copy CSV files to ADLS Gen2 and save the DataFrame as a table in the ADLS silver layer. Code: DForderItems = spark.read.csv("abfss://bronze@...
azuredataengineer89
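In Unity Catalog, saving to an ADLS path requires an external location (backed by a storage credential) that covers that path. A sketch of the setup, with every name and URL a placeholder; the storage credential must exist first:

    # All names and URLs are placeholders; my_credential must already exist.
    spark.sql("""
      CREATE EXTERNAL LOCATION IF NOT EXISTS silver_loc
      URL 'abfss://silver@mystorageacct.dfs.core.windows.net/'
      WITH (STORAGE CREDENTIAL my_credential)
    """)

    # With the external location in place, a save to that path can succeed:
    df.write.format("delta").mode("overwrite").save(
        "abfss://silver@mystorageacct.dfs.core.windows.net/orderitems")
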
1 vote
2 answers
128 views

Unable to InferSchema with Databricks SQL

When I attempt to create a table with Databricks SQL, I get the error: AnalysisException: Unable to infer schema for CSV. It must be specified manually. %sql CREATE TABLE IF NOT EXISTS newtabletable ...
Patterson • 2,635
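The usual workaround for "Unable to infer schema for CSV" is to declare the columns explicitly in the DDL. A sketch via spark.sql; the path and column list are placeholders:

    # Explicit columns sidestep schema inference; path and names are placeholders.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS newtabletable (
        id INT,
        name STRING,
        amount DOUBLE
      )
      USING CSV
      OPTIONS (path 'dbfs:/mnt/raw/data.csv', header 'true', sep ',')
    """)
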
1 vote
1 answer
90 views

Databricks performance tuning for joins across ~15 tables with ~200 million rows

As part of our Databricks notebook, we are trying to run SQL joining around 15 Delta tables: 1 fact table and around 14 dimension tables. The data coming out of the joins is around 200 million records. ...
Nanda • 61
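With one large fact table and many smaller dimensions, the standard first step is broadcasting the dimension tables so the 200M-row fact side is never shuffled for those joins. A minimal sketch with placeholder table and key names:

    from pyspark.sql.functions import broadcast

    fact = spark.table("gold.fact_sales")     # placeholder names
    dim = spark.table("gold.dim_customer")

    # Broadcast the small side: Spark ships the dimension to every executor
    # instead of shuffling the 200M-row fact table.
    joined = fact.join(broadcast(dim), "customer_sk")
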
2 votes
2 answers
126 views

Spark reading CSV with bad records

I am trying to read a CSV file in Spark using a pre-defined schema, for which I use: df = (spark.read.format("csv") .schema(schema) .option("sep", ";") ...
Tarique • 649
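With an explicit schema, malformed rows can be captured instead of silently nulled or dropped. A sketch using PERMISSIVE mode with a corrupt-record column, which must itself appear in the schema; the Databricks-specific badRecordsPath option is an alternative. Column names and path are placeholders:

    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    # The corrupt-record column must be part of the schema for capture to work.
    schema = StructType([
        StructField("id", IntegerType()),
        StructField("name", StringType()),
        StructField("_corrupt_record", StringType()),
    ])

    df = (spark.read.format("csv")
          .schema(schema)
          .option("sep", ";")
          .option("mode", "PERMISSIVE")
          .option("columnNameOfCorruptRecord", "_corrupt_record")
          .load("dbfs:/mnt/raw/input.csv"))   # placeholder path
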
0 votes
1 answer
74 views

Is the performance of inserting data into an Azure SQL database from Databricks affected by the sizing of the database?

I am working on a use case that needs to ingest a large amount of data (~10M rows) from an Azure Databricks materialized view into an Azure SQL database. The database uses the elastic standard tier (eDTU 50)...
akaya_1992
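At 50 eDTUs the database itself is often the bottleneck, so throughput depends on both sizing and how the write is issued. One option on Databricks is the Microsoft SQL Spark connector's bulk-insert path; it requires the connector library installed on the cluster, and all connection values below are placeholders:

    # Microsoft SQL Spark connector ("com.microsoft.sqlserver.jdbc.spark");
    # tableLock enables bulk-insert semantics. Connection values are placeholders.
    (df.write
       .format("com.microsoft.sqlserver.jdbc.spark")
       .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
       .option("dbtable", "dbo.target_table")
       .option("user", "sqluser")
       .option("password", "***")
       .option("tableLock", "true")
       .option("batchsize", 50000)
       .mode("append")
       .save())
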
0 votes
0 answers
61 views

No PYTHON_UID found for session (random uuid)

I'm facing a strange error when writing a stream from a DataFrame in Azure Databricks to a Postgres table. First I use databricks-connect==14.2.1 to create a session for our ...
Mohamed Aoutir
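Structured Streaming has no native Postgres sink, so such writes typically go through foreachBatch with the JDBC batch writer. Whether this relates to the PYTHON_UID session error is unclear; the sketch below only shows the sink pattern, with all connection details as placeholders:

    def write_to_postgres(batch_df, epoch_id):
        # Each micro-batch is written as an ordinary JDBC batch insert.
        (batch_df.write
            .format("jdbc")
            .option("url", "jdbc:postgresql://pghost:5432/mydb")  # placeholder
            .option("dbtable", "public.target_table")
            .option("user", "pguser")
            .option("password", "***")
            .mode("append")
            .save())

    query = (df.writeStream
               .foreachBatch(write_to_postgres)
               .option("checkpointLocation", "/tmp/checkpoints/pg_sink")
               .start())
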
0 votes
1 answer
102 views

Databricks: create table using a Delta table path gives AnalysisException: The specified schema does not match the existing schema at dbfs:/mnt/datalake/

I have a Delta table path. Based on this, I am trying to create a table using: create table if not exists dbo.DimCustomer (CustomerSK BIGINT GENERATED BY DEFAULT AS IDENTITY, first_name varchar(128),...
Nanda • 61
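When a Delta table already exists at the path, a CREATE TABLE that re-declares columns must match the stored schema exactly, including the identity column definition. The usual fix is to register the table from the path without a column list, letting Delta read the schema from its transaction log; the location below is a placeholder:

    # Register the existing Delta files as a table without re-declaring the
    # schema; Delta takes it from the transaction log at the path.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS dbo.DimCustomer
      USING DELTA
      LOCATION 'dbfs:/mnt/datalake/dimcustomer'   -- placeholder path
    """)
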
