[python-package] UserWarning: Found ... in params #6324

Closed
memeplex opened this issue Feb 16, 2024 · 10 comments · Fixed by #6579

@memeplex

Description

This and many similar issues have been reported over the years, yet I cannot find a reasonable solution. lightgbm constantly throws uninteresting warnings about aliases. They are not even easy to suppress locally once you start running subprocesses (n_jobs = -1 or > 1) and have to deal with PYTHONWARNINGS in the environment. In such cases there is a long stream of warnings, coming for example from a grid search. The verbose argument does little to silence lightgbm, since the warnings are raised on the Python side.

It was suggested to use a parameter's name instead of its alias, but take for example num_iterations that is documented in https://lightgbm.readthedocs.io/en/latest/Parameters.html as a name, not an alias: it nevertheless throws a warning and you have to try every other alias in order to discover that n_iterations is the right one.

I think some action has to be taken here.

Reproducible example

import numpy as np
import pandas as pd
import lightgbm as lgb

X = pd.DataFrame(dict(x1=np.linspace(0, 1, 100), x2=np.linspace(0, 1, 100)))
y = 2 * X.x1 + 3 * X.x2
lgb.LGBMRegressor(verbose=-1, num_iterations=10).fit(X, y)

Environment info

LightGBM version or commit hash: 4.3.0

Command(s) you used to install LightGBM:

pip install lightgbm

macOS Sonoma
Python 3.11.7

@memeplex
Author

Just for reference, currently I'm doing this first thing after launching a kernel:

    import os
    import warnings

    warnings.filterwarnings(
        "ignore", category=UserWarning, module="lightgbm.engine", lineno=172
    )
    # Set the same filter in the environment so subprocesses inherit it.
    os.environ["PYTHONWARNINGS"] = "ignore::UserWarning:lightgbm.engine:172"

I would prefer to match the message using a regex but PYTHONWARNINGS doesn't support that and I need it to keep subprocesses silent. So in order not to suppress potentially interesting warnings I match the line number, which is quite fragile.
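For the in-process half of that workaround, matching on the message text rather than the line number might look like the sketch below. It uses only the standard `warnings` module; `emit_alias_warning` is a hypothetical stand-in for the library call that raises the warning, and the pattern is an assumption based on the message text quoted in this thread. It does not address subprocesses, which still depend on PYTHONWARNINGS.

```python
import warnings

# Assumed pattern, derived from the message text quoted in this issue.
# filterwarnings() matches it against the start of the message, case-insensitively.
ALIAS_WARNING_PATTERN = r"Found .* in params\. Will use it instead of"


def emit_alias_warning(alias):
    """Hypothetical stand-in for the library code that raises the alias warning."""
    warnings.warn(
        f"Found `{alias}` in params. Will use it instead of argument", UserWarning
    )


def visible_warning_messages():
    """Return the messages that still get through once the alias filter is installed."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Inserted at the front of the filter list, so it wins over "always".
        warnings.filterwarnings(
            "ignore", message=ALIAS_WARNING_PATTERN, category=UserWarning
        )
        emit_alias_warning("num_iterations")
        warnings.warn("something else worth seeing", UserWarning)
    return [str(w.message) for w in caught]
```

Only the alias warning is dropped; unrelated `UserWarning`s are still recorded, which is the point of matching the message instead of the whole category.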

@jameslamb jameslamb changed the title UserWarning: Found ... in params [python-package] UserWarning: Found ... in params Feb 16, 2024
@jmoralez
Collaborator

Hey @memeplex, thanks for using LightGBM. The arguments for the scikit-learn API are a bit different, to be consistent with other scikit-learn estimators; you can see here that the expected argument for the number of iterations is n_estimators.

I believe this issue is specific to that argument and that line, since other arguments don't issue alias warnings with verbosity<0. I believe we can check if verbosity<0 there to be consistent with the behavior for the other arguments.
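A minimal sketch of that verbosity check, under stated assumptions: `should_warn_about_alias` is a hypothetical helper, not an existing LightGBM function, and it assumes the verbosity setting can arrive in the params dictionary as either `verbosity` or its alias `verbose`.

```python
# Assumed for illustration: "verbosity" is the main name and "verbose" its alias.
VERBOSITY_NAMES = ("verbosity", "verbose")


def should_warn_about_alias(params):
    """Return False when a negative verbosity is set in params, True otherwise."""
    for name in VERBOSITY_NAMES:
        value = params.get(name)
        if value is not None:
            return int(value) >= 0
    # Nothing set: keep the default behaviour and allow the warning.
    return True
```

The calling code would then wrap the existing `_log_warning(...)` call in `if should_warn_about_alias(params):`.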

@jameslamb
Collaborator

Sure, we can reconsider this.

Linking the relevant prior discussions:

Here is where the specific warnings you're talking about come from:

for alias in _ConfigAliases.get("num_iterations"):
    if alias in params:
        num_boost_round = params.pop(alias)
        _log_warning(f"Found `{alias}` in params. Will use it instead of argument")

for alias in _ConfigAliases.get("num_iterations"):
    if alias in params:
        _log_warning(f"Found '{alias}' in params. Will use it instead of 'num_boost_round' argument")
        num_boost_round = params.pop(alias)

I think some action has to be taken here.

I'll be more specific.

In my opinion, this warning causes more confusion than it prevents, and should just be completely removed.

I think it'd be a better state for lightgbm (the Python package) to have the following characteristics:

  • no warning raised when parameters from params override keyword arguments
  • well-documented rules for the order of precedence in resolving parameters
    • e.g., constructor keyword arguments in scikit-learn interface, train() / cv() keyword arguments, parameters stored in an existing Booster if loading with init_model or similar, parameters passed through via params dictionary (with main params preferred to aliases)
  • tests confirming that that order is respected (many of these already exist)
  • documentation on how to determine which parameters were actually used during training (the recent work to store all params in the model file should help with this)

LightGBM's interface (especially in the Python and R packages) allows a lot of different ways to route configuration to training, and I think it'd be valuable to devote some time to elevating the rigor applied to documenting and testing those mechanisms.

@jmoralez @borchero @shiyu1994 @guolinke what do you think?

@jameslamb
Collaborator

Sorry @jmoralez , I didn't see your response before posting mine. I still think we should consider removing this warning.

@jmoralez
Collaborator

I agree on not issuing a warning if there's a single alias, but I think we should check whether there are several definitions for the number of iterations and issue a warning if there are, as we do here:

Log::Warning("%s is set=%s, %s=%s will be ignored. Current value: %s=%s",

@jameslamb
Collaborator

I agree with that. Doing something like this:

lgb.train(
    params={
        "num_iterations": 10,
        "n_iter": 500
    },
    ...
)

I'd want to be warned that I've provided multiple aliases via the same mechanism that mean the same thing and have different values.

But that should be accomplished on the C++ side, via the code in that link you shared.

@jmoralez
Collaborator

But for train and cv we pick a single value for that

for alias in _ConfigAliases.get("num_iterations"):
    if alias in params:
        num_boost_round = params.pop(alias)
        _log_warning(f"Found `{alias}` in params. Will use it instead of argument")
params["num_iterations"] = num_boost_round

for alias in _ConfigAliases.get("num_iterations"):
    if alias in params:
        _log_warning(f"Found '{alias}' in params. Will use it instead of 'num_boost_round' argument")
        num_boost_round = params.pop(alias)
params["num_iterations"] = num_boost_round

so that would be the only place to warn (I believe that's why that warning is there)

@jameslamb
Collaborator

ohhhh I missed that this already was specific to only num_iterations!!!

When I saw this in the description:

take for example num_iterations

the phrase "for example" made me think it was happening for all or at least many more parameters. @memeplex do you see this warning for any other parameters? If so can you share a reproducible example of that?

would be the only place to warn

Even in that case, I think it'd be preferable for LightGBM to just resolve num_boost_round + things passed through params on the Python side, and to only issue a warning in the following case:

  • multiple aliases for num_iterations passed through params dictionary
  • they have different values

num_iteration_configs_provided = {
    alias: params[alias]
    for alias in _ConfigAliases.get("num_iterations")
    if alias in params
}
multiple_values_provided = len(num_iteration_configs_provided) > 1
# the values conflict only if at least two of them differ
values_conflict = len(set(num_iteration_configs_provided.values())) > 1

if multiple_values_provided and values_conflict:
    value_string = ", ".join(f"{alias}={val}" for alias, val in num_iteration_configs_provided.items())
    _log_warning(
        f"Found conflicting values for num_iterations provided via 'params': {value_string}. "
        "To be confident in the maximum number of boosting rounds LightGBM will perform and to "
        "suppress this warning, modify 'params' so that only one of those is present."
    )

# values in params take precedence over the num_boost_round keyword argument
params = _choose_param_value(
    main_param_name='num_iterations',
    params=params,
    default_value=num_boost_round
)
num_boost_round = params["num_iterations"]

And for it to never warn about the num_boost_round keyword argument having a different value than one passed through params, since "pass in params to override other configuration" is the approach we promote in as many places as possible in the library.

So:

# no warning
lgb.train(
   params={"n_iter": 5},
   num_boost_round=10,
   ...
)

# no warning
lgb.train(
   params={"n_iter": 10},
   num_boost_round=10,
   ...
)

# no warning
lgb.train(
   params={"n_iter": 10, "num_iterations": 10},
   num_boost_round=10,
   ...
)

# no warning
lgb.train(
   params={"n_iter": 5, "num_iterations": 5},
   num_boost_round=10,
   ...
)

# warning
lgb.train(
   params={"n_iter": 5, "num_iterations": 75},
   num_boost_round=10,
   ...
)
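
The five cases above can be checked against a self-contained sketch of the proposed rule. Everything here is a stand-in: `resolve_num_iterations` is hypothetical, `NUM_ITERATIONS_ALIASES` is an illustrative subset of what `_ConfigAliases.get("num_iterations")` would return, and warnings are returned as a list instead of being logged, so that the behavior is easy to inspect. It assumes the main parameter name is preferred over aliases, as suggested earlier in this thread.

```python
# Illustrative subset only; the main name comes first so that it is preferred.
NUM_ITERATIONS_ALIASES = ("num_iterations", "n_iter", "num_boost_round", "n_estimators")


def resolve_num_iterations(params, num_boost_round):
    """Return (resolved_params, warning_messages) without mutating the input."""
    provided = {a: params[a] for a in NUM_ITERATIONS_ALIASES if a in params}

    messages = []
    # Warn only when params itself carries at least two different values.
    if len(set(provided.values())) > 1:
        value_string = ", ".join(f"{a}={v}" for a, v in provided.items())
        messages.append(
            "Found conflicting values for num_iterations provided via "
            f"'params': {value_string}."
        )

    resolved = {k: v for k, v in params.items() if k not in NUM_ITERATIONS_ALIASES}
    if provided:
        # Anything in params silently overrides the keyword argument.
        resolved["num_iterations"] = next(iter(provided.values()))
    else:
        resolved["num_iterations"] = num_boost_round
    return resolved, messages
```

With this rule, `params={"n_iter": 5}` and `num_boost_round=10` resolve to 5 boosting rounds with no message, while `params={"n_iter": 5, "num_iterations": 75}` produces one message and resolves to 75 because the main name wins.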

@memeplex
Author

the phrase "for example" made me think it was happening for all or at least many more parameters.

You are right, in the past I've seen this happening with almost every parameter, but checking it again with ~20 parameters and aliases, I can only reproduce it with num_iterations variations. Much better, but still noisy since the number of trees is like the first parameter one sets.

@jameslamb
Collaborator

jameslamb commented Feb 16, 2024

I can only reproduce it with num_iterations variations

Excellent!

Narrowing it down like that helps a lot.

3 participants