I am new to MS Fabric and am working through a hands-on tutorial: https://learn.microsoft.com/en-us/fabric/data-science/tutorial-data-science-batch-scoring
I am running the following code to instantiate an MLFlowTransformer object:
from synapse.ml.predict import MLFlowTransformer

model = MLFlowTransformer(
    inputCols=list(df_test.columns),
    outputCol='predictions',
    modelName='lgbm_sm',
    modelVersion=1
)
It fails with the following error:
ValueError: Cannot convert numpy type object to spark type
How do I resolve this error? Is it something to do with list(df_test.columns)? Please advise.
The error you're seeing, ValueError: Cannot convert numpy type object to spark type, typically means that Spark is encountering a data type in your input that it doesn't know how to convert into its internal types. It usually points to numpy.object_ (dtype=object) columns in your DataFrame.
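A quick way to confirm this is to inspect the dtypes before handing the data over. For example (my own check, assuming df_test is still a pandas DataFrame at this point):

# Columns reported as 'object' are the likely culprits
print(df_test.dtypes)
print(df_test.select_dtypes(include='object').columns.tolist())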
In your code:
model = MLFlowTransformer(
    inputCols=list(df_test.columns),
    outputCol='predictions',
    modelName='lgbm_sm',
    modelVersion=1
)
You're passing list(df_test.columns) as the inputCols. This itself is likely not the problem.
The real issue is most likely with the df_test DataFrame — it seems to be a Pandas DataFrame (or a Spark DataFrame with some columns of ambiguous/numpy object type), and SynapseML expects a Spark DataFrame with well-defined schema and no ambiguous types.
Ensure df_test is a Spark DataFrame, not a Pandas one. If it's currently a Pandas DataFrame, convert it:
spark_df_test = spark.createDataFrame(df_test)
But even when converting, Spark sometimes can't infer the schema properly from object dtype columns. So it's safer to:
Make sure df_test has clear types before conversion:
import pandas as pd

# Ensure all columns have concrete types (avoid object dtype)
df_test = df_test.astype({
    'col1': 'float64',
    'col2': 'int64',
    'col3': 'string',
    # Replace with your actual column names/types
})
Then convert:
spark_df_test = spark.createDataFrame(df_test)
Pass the Spark DataFrame to your pipeline:
model = MLFlowTransformer(
    inputCols=spark_df_test.columns,
    outputCol='predictions',
    modelName='lgbm_sm',
    modelVersion=1
)
transformed = model.transform(spark_df_test)
In short:
- Ensure you're using a Spark DataFrame, not a Pandas one.
- Make sure all columns in your DataFrame have explicit types, with no object dtypes.
- Use .astype() in Pandas before converting to a Spark DataFrame.
- Use spark.createDataFrame() for the conversion; see the sketch below for automating the cleanup.
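For the automatic cleanup, here is a minimal sketch (my own helper, not part of the tutorial) that casts any remaining object columns to pandas' explicit string dtype so Spark can infer a schema; adjust the casts for columns that are really numeric:

import pandas as pd

def clean_object_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Cast every object-dtype column to the explicit pandas string dtype
    out = df.copy()
    for name in out.columns:
        if out[name].dtype == object:
            out[name] = out[name].astype('string')
    return out

spark_df_test = spark.createDataFrame(clean_object_columns(df_test))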
Would you like help inferring the correct schema for your DataFrame, or a more complete version of the cleanup code above?
I ran into the same problem.
BTW, I used the notebook from Explore a sample in Fabric.
It turned out that the "Exited" column had been converted to a string by the cell that plots the categorical columns. The Exited column has to be of type integer before you feed it into training.
You can check this in the MLmodel file of your experiment: in line 34, the dtype should be int64, not object!
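If you'd rather check the logged signature programmatically than open the MLmodel file, something along these lines should work (standard MLflow API; model name and version as registered in the tutorial):

import mlflow

# Inspect the registered model's input schema; Exited should appear
# as long/int64, not string/object
info = mlflow.models.get_model_info("models:/lgbm_sm/1")
print(info.signature)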
Don't forget to adapt the version number in the call to MLFlowTransformer to the latest version of your model.
Hope it helps!
Michael
@MiSchroe - Thanks for clarifying the cause of the error. Pardon my ignorance, but could you explain how you would change the dtype from object to int64 in the context of this data science tutorial? Any help would be greatly appreciated, as I'd like to better understand how this all fits together and complete the tutorial.
In the notebook 3-train-evaluate, I changed the data type of the Exited column to int.
NOTE: In row 3 I load the table dbo.df_clean, because I used a Lakehouse with the new schema support. In the original tutorial, the schema prefix has to be omitted.
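Since the code cell itself didn't come through here, a minimal sketch of what that change could look like (my reconstruction, assuming the variable names from the tutorial notebook):

# Load the cleaned table; drop the "dbo." prefix if your Lakehouse
# does not use the new schema support
df_clean = spark.read.table("dbo.df_clean").toPandas()

# Cast the label back to int64 before training; it must not be object/string
df_clean["Exited"] = df_clean["Exited"].astype("int64")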
It seems like the issue might be related to the format of the DataFrame you're using. Make sure you're working with a valid Spark DataFrame, not a pandas one, as Spark expects a different structure.
Also, double-check that your model name and version are correct and compatible with your setup. It's a good idea to ensure that your Spark and MLflow versions are aligned with MS Fabric as well.
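To see what your session is actually running, a quick check like this should work in a Fabric notebook (standard PySpark and MLflow attributes):

import mlflow

print(spark.version)       # Spark version of the attached Fabric runtime
print(mlflow.__version__)  # MLflow version available in the environment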
Yes, I am also getting the same error. Can anyone please tell us how to solve it?
Hi @Anonymous
I followed the tutorials without experiencing any error. Instead of entering the code manually, I downloaded and imported the notebooks from microsoft/fabric-samples on GitHub. You can try running these notebooks and see if they work. Make sure you attach the same lakehouse you used in the other parts of this series.
For the error you encountered, check the output of the code cell that precedes the failing one, and make sure the attached lakehouse is correct. "df_test" is a table created in Tutorial Part 3.
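One quick sanity check (assuming the delta table path the tutorial notebooks use) is to reload the table and print its schema before instantiating the transformer:

# Reload the table written in Tutorial Part 3; every column should
# come back with a concrete Spark type
df_test = spark.read.format("delta").load("Tables/df_test")
df_test.printSchema()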
Best Regards,
Jing
Thank you for taking the time to reply to my question. I followed the same steps exactly as you mentioned in your reply: downloaded the notebooks from GitHub, followed the sequence of steps in the tutorial, and made no changes to the notebook code. All steps ran absolutely fine until this particular cell, as mentioned in my initial question. Any more insights as to what is causing this error and how to resolve it?
Hi @Anonymous
I tested it again and it still worked at my end. I got nothing but a warning.
I guess maybe our environments are a little different? I ran the notebook on Runtime 1.2. My service version is 13.0.24766.49 and my data region is UK South (London). Can you check your environment?
Best Regards,
Jing
Hi @Anonymous
Thank you for your answers. I followed your suggestion and switched to Runtime 1.2. I can confirm that I uploaded and executed those notebooks without any modification, and I ran all three previous notebooks on the same Runtime 1.2.
Unfortunately, I'm now facing a new error.
Any idea how to fix this? Thanks.