addition to work with NGBoost #24

Open
wants to merge 1 commit into
base: master

RichardScottOZ
Contributor

Hi Steven,

FYI, did this last year to use your work with NGBoost, finally got around to updating.

@stevenpawley
Owner

Thanks for the contribution Richard. A prediction method that can output a distribution will be useful; I'll merge this in the next few days.

@RichardScottOZ
Contributor Author

Yes, I haven't put any sensible comments or doc bits on it, as it was literally just a quick change so I could get the output.

@RichardScottOZ
Contributor Author

I saw a prediction module? Yesterday I was working on a hack for using hdbscan... are the functions in raster.py going to migrate, or be versioned with other uses?

@stevenpawley
Owner

Hi Richard,

I'm just working on a couple of problems relating to the in-memory files feature that I added to Pyspatialml, but I'd like to return to this. NGBoost looks like it uses a pred_dist method. Do you know if this works within scikit-learn's structures, e.g. can it function inside a Pipeline?

Scikit-learn doesn't appear to support prediction intervals very uniformly or extensively. GradientBoostingRegressor enables prediction intervals via quantile predictions, but it does this without a new method: you set or modify the 'alpha' parameter of the estimator (used with the quantile loss) and then use the regular predict function for the specified quantile.
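For reference, a minimal sketch of the GradientBoostingRegressor approach described above. The synthetic data and the chosen quantiles are assumptions for illustration only. As far as I know, the quantile loss is applied at training time, so this sketch fits one model per quantile rather than changing alpha on an already fitted estimator.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic 1-D regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

# One model per quantile: alpha selects the quantile when loss="quantile".
quantiles = [0.05, 0.5, 0.95]
predictions = {}
for q in quantiles:
    model = GradientBoostingRegressor(
        loss="quantile", alpha=q, n_estimators=50, random_state=0
    )
    model.fit(X, y)
    # The regular predict method returns the estimate for quantile q.
    predictions[q] = model.predict(X)

# Width of the approximate 90% prediction interval at each sample.
interval_width = predictions[0.95] - predictions[0.05]
```

Because only predict is used, each of these models should work unchanged inside a Pipeline, at the cost of one fitted estimator per quantile.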

My favourite R random forest implementation, ranger, which also has a Python wrapper around the C++ libs, allows quantile prediction too, but in Python it uses a predict_quantiles method to perform this, so a different approach again. Because of that, I don't think quantile predictions can be made easily if the estimator is encapsulated within another structure like a Pipeline.

@RichardScottOZ
Contributor Author

I haven't tried it, but I would guess probably? Only thing I think I remember seeing is a grid search mentioned there.

@RichardScottOZ
Contributor Author

I was wondering about that a little when I saw your apply function, e.g. if I needed a StandardScaler-transformed raster stack based on the original for clustering: would it take a function and an argument dictionary along with the array, or anything else?

@stevenpawley
Owner

Yes, I was wondering the same thing: whether the apply method could be used for applying predictions with arbitrary/non-standard methods. I think it can, but I should work through it with an example, because I'd still like to use NGBoost or skranger for prediction intervals. When I tried with skranger, it wouldn't work if wrapped inside pipelines or other meta-estimators, because they don't have a predict_quantiles method to pass through.

@RichardScottOZ
Contributor Author

Yes, so that case might need some sort of custom pipeline overloading hackery, which isn't ideal.
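A rough sketch of what that kind of pipeline subclass could look like. Both QuantilePipeline and ToyQuantileRegressor are hypothetical names invented for this example; the toy estimator only stands in for something like skranger, and its predict_quantiles(X, quantiles) signature is an assumption rather than a documented API.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ToyQuantileRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical stand-in for an estimator exposing predict_quantiles."""

    def fit(self, X, y):
        self.y_train_ = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        # Ignores X: always predicts the training median.
        return np.full(len(X), np.median(self.y_train_))

    def predict_quantiles(self, X, quantiles):
        # Ignores X: returns the same empirical quantiles for every row.
        q = np.quantile(self.y_train_, quantiles)
        return np.tile(q, (len(X), 1))


class QuantilePipeline(Pipeline):
    """Pipeline that forwards predict_quantiles to its final step."""

    def predict_quantiles(self, X, quantiles):
        Xt = X
        # Apply every fitted transformer except the final estimator.
        for _, step in self.steps[:-1]:
            if step is not None and step != "passthrough":
                Xt = step.transform(Xt)
        return self.steps[-1][1].predict_quantiles(Xt, quantiles)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

pipe = QuantilePipeline(
    [("scale", StandardScaler()), ("model", ToyQuantileRegressor())]
)
pipe.fit(X, y)
intervals = pipe.predict_quantiles(X[:5], quantiles=[0.1, 0.9])
```

The same pattern would need repeating for any other wrapper (grid search, etc.), which is the part that isn't ideal.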

@RichardScottOZ
Contributor Author

And hdbscan class label estimation looks like this, basically:

result, result_strengths_t = hdbscan.approximate_predict(estimator, flat_pixels)

(so 2 to do), and there is

#result = estimator.predict_proba(flat_pixels)
result = hdbscan.prediction.membership_vector(estimator, flat_pixels)

which gives the probabilities of being in any particular cluster.
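To illustrate handling a function that returns two arrays per pixel, here is a small NumPy-only sketch. fake_approximate_predict is a hypothetical stand-in that only mimics the (labels, strengths) return shape of hdbscan.approximate_predict, and predict_two_band is an invented helper, not part of Pyspatialml.

```python
import numpy as np


def fake_approximate_predict(estimator, flat_pixels):
    """Hypothetical stand-in returning (labels, strengths), one value per pixel."""
    labels = (flat_pixels[:, 0] > 0).astype(int)
    strengths = np.abs(np.tanh(flat_pixels[:, 0]))
    return labels, strengths


def predict_two_band(stack, predict_fn, estimator=None):
    """Apply predict_fn to a (bands, rows, cols) stack, return a 2-band result."""
    bands, rows, cols = stack.shape
    # Flatten to (n_pixels, n_features), the layout estimators expect.
    flat_pixels = stack.reshape(bands, rows * cols).T
    labels, strengths = predict_fn(estimator, flat_pixels)
    # Reshape each output back to the raster grid and stack as two bands.
    return np.stack(
        [
            labels.astype(float).reshape(rows, cols),
            strengths.reshape(rows, cols),
        ]
    )


rng = np.random.default_rng(0)
stack = rng.normal(size=(3, 4, 5))
out = predict_two_band(stack, fake_approximate_predict)
```

With the real library, passing hdbscan.approximate_predict and a fitted clusterer in place of the stand-in should follow the same flatten, predict, reshape pattern.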

@stevenpawley force-pushed the master branch 4 times, most recently from d6c947e to f9508f4 (May 20, 2024 15:27)
2 participants