Anonymous
Not applicable

Cannot convert numpy type object to spark type

I am new to MS fabric and trying to do a tutorial hands on - https://learn.microsoft.com/en-us/fabric/data-science/tutorial-data-science-batch-scoring

 

I am running the following code to instantiate the MLFlowTransformer object:

 

from synapse.ml.predict import MLFlowTransformer

model = MLFlowTransformer(
    inputCols=list(df_test.columns),
    outputCol='predictions',
    modelName='lgbm_sm',
    modelVersion=1
)

 

It is failing with the following error:

ValueError: Cannot convert numpy type object to spark type

 

How do I resolve this error? Is it something to do with list(df_test.columns)? Please advise.

10 REPLIES
fikakagyta
New Member

The error you're seeing —
ValueError: Cannot convert numpy type object to spark type — typically means that Spark is encountering a data type in your input that it doesn't know how to convert into its internal types, and it often relates to using numpy.object_ or dtype=object in your DataFrame columns.

Root Cause:

In your code:

model = MLFlowTransformer(
    inputCols=list(df_test.columns),
    outputCol='predictions',
    modelName='lgbm_sm',
    modelVersion=1
)

You're passing list(df_test.columns) as the inputCols. This itself is likely not the problem.

The real issue is most likely with the df_test DataFrame — it seems to be a Pandas DataFrame (or a Spark DataFrame with some columns of ambiguous/numpy object type), and SynapseML expects a Spark DataFrame with well-defined schema and no ambiguous types.
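
A quick way to confirm which case you are in (a sketch; df_test is the variable name used in the tutorial, adjust if yours differs):

import pandas as pd
from pyspark.sql import DataFrame as SparkDataFrame

# Inspect what df_test actually is and which dtypes it carries
if isinstance(df_test, pd.DataFrame):
    print(df_test.dtypes)        # look for 'object' entries here
elif isinstance(df_test, SparkDataFrame):
    df_test.printSchema()        # look for unexpected column types here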


How to Fix It:

  1. Ensure df_test is a Spark DataFrame, not a Pandas one. If it's currently a Pandas DataFrame, convert it:

spark_df_test = spark.createDataFrame(df_test)

But even when converting, Spark sometimes can't infer schema properly from object dtype. So it’s safer to:

  2. Make sure df_test has clear types before conversion:

import pandas as pd

# Ensure all columns have concrete types (avoid object dtype)
df_test = df_test.astype({
    'col1': 'float64',
    'col2': 'int64',
    'col3': 'string',  # Replace with your actual column names/types
    # ...
})

Then convert:

spark_df_test = spark.createDataFrame(df_test)

  3. Pass the Spark DataFrame to your pipeline:

model = MLFlowTransformer(
    inputCols=spark_df_test.columns,
    outputCol='predictions',
    modelName='lgbm_sm',
    modelVersion=1
)

transformed = model.transform(spark_df_test)

Summary

  • Ensure you're using a Spark DataFrame, not a Pandas one.

  • Make sure all columns in your DataFrame have explicit types, especially no object types.

  • Use .astype() in Pandas before converting to Spark DataFrame.

  • Use spark.createDataFrame() for conversion.
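
If you'd rather not spell out every column by hand, a generic helper along these lines can do the coercion before the Spark conversion (a sketch under the assumption that your object columns hold either numeric values or plain strings; the helper name is made up):

import pandas as pd

def clean_object_columns(pdf: pd.DataFrame) -> pd.DataFrame:
    # Coerce pandas 'object' columns to concrete dtypes so that
    # spark.createDataFrame() can infer a proper schema.
    pdf = pdf.copy()
    for col in pdf.select_dtypes(include="object").columns:
        converted = pd.to_numeric(pdf[col], errors="coerce")
        # Keep the numeric version only if no values were lost in coercion
        if converted.notna().sum() == pdf[col].notna().sum():
            pdf[col] = converted
        else:
            pdf[col] = pdf[col].astype("string")
    return pdf

spark_df_test = spark.createDataFrame(clean_object_columns(df_test))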

Would you like help inferring the correct schema for your DataFrame or code to automatically clean the column types?

MiSchroe
Frequent Visitor

I ran into the same problem.

BTW, I used the notebook from Explore a sample in Fabric.

 

It turned out that the "Exited" column was converted to a string in the cell that plots the categorical columns. Instead, the Exited column has to be of type integer before you feed it into training.

 

You can check in your experiment in the MLmodel file:

[Screenshot: MLmodel file from the experiment, showing the model signature]

 

 

In line 34 the dtype should be int64, not object!

Don't forget to adapt the version number to your latest version of the model in the call to MLFlowTransformer.
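
If you are unsure what your latest version is, one way to look it up is through the MLflow client (a sketch; 'lgbm_sm' is the model name used in the tutorial):

from mlflow.tracking import MlflowClient

client = MlflowClient()
versions = client.search_model_versions("name='lgbm_sm'")
latest_version = max(int(v.version) for v in versions)
print(latest_version)  # pass this as modelVersion to MLFlowTransformer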

 

Hope it helps!

Michael

@MiSchroe - Thanks for clarifying the cause of the error. Pardon my ignorance, but could you explain how you would change the dtype from object to int64 in the context of this data science tutorial? Any help would be greatly appreciated, as I'd like to better understand how this all fits together and complete the tutorial.

In the notebook 3-train-evaluate I have changed the data type of the Exited column to int:

[Screenshot: notebook cell in 3-train-evaluate that loads df_clean and casts the Exited column to int]

NOTE: In row 3 I load the table dbo.df_clean, because I have used a Lakehouse with the new schema. In the original tutorial the schema has to be omitted.
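
For reference, the change amounts to something like this in notebook 3-train-evaluate, before the training cells (a sketch; adjust the table name and schema to your lakehouse, and drop the toPandas() step if your notebook works directly on the Spark DataFrame):

# Load the cleaned table from the attached lakehouse
# (use "dbo.df_clean" with a schema-enabled lakehouse, plain "df_clean" otherwise)
df_clean = spark.read.table("df_clean").toPandas()

# Make the label an integer so the logged model signature records int64, not object
df_clean["Exited"] = df_clean["Exited"].astype("int64")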

DrewAnderson
New Member

It seems like the issue might be related to the format of the DataFrame you're using. Make sure you're working with a valid Spark DataFrame, not a pandas one, as Spark expects a different structure.

Also, double-check that your model name and version are correct and compatible with your setup. It’s a good idea to ensure that your Spark and MLFlow versions are aligned with MS Fabric as well.

Baludesineti
New Member

Yes, I am also getting the same error. Can anyone please tell us how to solve it?

Anonymous
Not applicable

Hi @Anonymous 

 

I followed the tutorials without experiencing any error. Instead of entering the code manually, I downloaded and imported the notebooks from microsoft/fabric-samples · GitHub. You can try running these notebooks and see if they work. Make sure that you attach the same lakehouse you used in the other parts of this series.

[Screenshot: imported tutorial notebooks in the Fabric workspace]

 

For the error you encountered, you can check the output of the code cell that precedes the failing one. At the same time, make sure that the attached lakehouse is correct. "df_test" is a table created in Tutorial Part 3.

[Screenshot: df_test table in the attached lakehouse]
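
As a quick sanity check that the table is present and has concrete column types, you can read it back in the scoring notebook (a sketch; adjust the table name or path to how Part 3 saved it in your lakehouse):

# Read the df_test table from the attached lakehouse and inspect its schema
df_test = spark.read.table("df_test")
df_test.printSchema()   # every column should have a well-defined Spark type
df_test.show(5)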

 

Best Regards,
Jing
If this post helps, please Accept it as Solution to help other members find it. Appreciate your Kudos!

Anonymous
Not applicable

Thank you for taking the time to reply to my question. I followed the same steps exactly as you described: downloaded the notebooks from GitHub, followed the sequence of steps in the tutorial, made no changes to the notebook code, and all steps ran absolutely fine until the particular cell mentioned in my initial question. Any more insights as to what is causing this error and how to resolve it?

Anonymous
Not applicable

Hi @Anonymous 

 

I tested it again and it still worked at my end. I got nothing but a warning. 

[Screenshot: cell output showing only a warning]

 

I guess maybe our environments are a little different? I ran the notebook in Runtime 1.2. My service version is 13.0.24766.49 and the data region is UK South (London). Can you check your environment?

[Screenshots: notebook runtime setting and Fabric service version details]

 

Best Regards,
Jing

Hi @Anonymous 
Thank you for your answers. I followed your suggestion and switched to Runtime 1.2. I can confirm that I uploaded and executed those notebooks without any modification, and I ran all three previous notebooks with the same Runtime 1.2.
Unfortunately, I'm facing this new error:
[Screenshot: new error message]


Any idea how to fix this? Thanks.
