Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fitting Base Class and Gaussian Model from PR 143 #172

Merged
merged 19 commits into from
Aug 2, 2024
Merged

Conversation

eloiseyang
Copy link
Collaborator

Added MethodBase as an abstract class for all fit methods.
Added GaussianModel class using MethodBase.
Added ProjectionFit class.
Added tests for all the above including datasets.
Separated from pull request 143.

Added GaussianModel class using MethodBase
Added ProjectionFit class
Added tests for all the above including datasets
Separated from 8c85a56 on phys-cgarnier/lcls-tools/dev
Copy link
Collaborator Author

@eloiseyang eloiseyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fitting classes were written with ML in mind. Can we find the intersection between the ML and physics needs? Otherwise, we may have to separate out the ML fitting tools and the physics fitting tools.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should complete this before merging.

weighted_sigma = np.sqrt(np.cov(x, aweights=data))

self.init_values = {
self.param_names[0]: weighted_mean, # data.mean()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text keys are more clear than the numbered indices here. I don't imagine these param names are going to change all that often. Maybe a pydantic structure would be a better way of accomplishing what this does?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. We can discuss what that would look like in our meeting.

# Creating a normal distribution of points around initial offset.
offset_prior = norm(self.init_values["offset"], 0.5)
self.priors = {
self.param_names[0]: mean_prior,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numbered indices are not clear.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe Nicole did not want hardcoded names though, so the compromise was we made a class attribute than contained the param names as a list that wouldn't get overwritten easily. I agree the using indices on that list might be confusing to a reviewer though, I am not sure what the best way to resolve both problems is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that we want to avoid hard coded EPICS names. Hard coded keys of a predefined dictionary should be okay. I think defining a struct-like class could be a good solution here.

def _forward(self, x: np.array, params: np.array):
# Load distribution parameters
# needs to be array for scipy.minimize
mean = params[0]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numbered indices are not clear. What is the params list?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Params list is the list of parameters the fit function uses. At the method class level, they need to be reference arbitrarily Furthermore, scipy.optimize.minimize requires they be stored in an ordered list.

params_dict = self.unnormalize_model_params(fitted_params_dict, projection_data)
return params_dict

def plot_fit(self, **kwargs):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should separate out plotting functions.

self.assertAlmostEqual(fit.min(), 0.0, places=self.decimals)
return

# def super_gaussian(self, x, amp, mu, sig, P, offset):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should delete this test before we push.

and upper bound on for acceptable values of each parameter)
"""

param_names: list = None
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This design pattern is exactly the kind of thing pydantic is meant to simplify. Can we a simpler way to accomplish this?

lcls_tools/common/data_analysis/fit/method_base.py Outdated Show resolved Hide resolved
lcls_tools/common/data_analysis/fit/method_base.py Outdated Show resolved Hide resolved
@eloiseyang
Copy link
Collaborator Author

@phys-cgarnier added some changes to the way we set parameters in the model. Waiting on a single function fitting tool for physics applications before merging.

… have a default field, also set method in projection_fit.py to have a default file. This will make using the tool much easier. All that has to be down is call gaussian fit and give it an image to use
@eloiseyang
Copy link
Collaborator Author

Thanks for pushing the beamsize code, @phys-cgarnier. It looks good to me, we just have some flake8 tests failing. If you could fix those I think we're ready to push this branch. We'll want to write some regression tests for the gaussian fit and beamsize scripts too, but I think that can wait for now.

@nneveu can you take a look too?

Copy link
Member

@nneveu nneveu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks much better, thanks for the the edits @phys-cgarnier and @eloiseyang! I'd like to see a few more comments for functions, remove the commented out test, and an issue written on circling back to the naming conventions, but otherwise ok to merge. Thanks so much!

}
return result
else:
centroid = beamspot_chars['centroid']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we comment the two cases here for calc beamsize, i.e. you do the first if the intensity is low and the else if intensity is high enough? and how was the default threshold picked?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will comment both on both cases. The default was selected by the ML group, I just used theirs so I am sure how the decided on it.

"Sx": np.NaN,
"Sy": np.NaN,
"bb_penalty": np.NaN,
"total_intensity": beamspot_chars["total_intensity"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is repeated except the 10**? maybe combine with above?



print(pts)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to leave this print?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No thats a typo, lol.

n_stds = self.bounding_box_half_width

pts = np.array(
(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we be more descriptive than pts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will rename it

temp = pts - np.array((roi_radius, roi_radius))
distances = np.linalg.norm(temp, axis=1)
# subtract radius to get penalty value
bb_penalty = np.max(distances) - roi_radius
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bounding box penalty or something else? can you add a comment?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I will add a comment with more description.

return priors

def _forward(self, x: np.array, method_parameter_list: np.array):
# Load distribution parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the function doing the fit right? can we rename it to something with 'fit' in the name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a #TODO comment to revisit would be enough for now.

return normalized_data

def unnormalize_model_params(
self, method_params_dict: dict, projection_data: np.ndarray
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does 'model' need to be in this function name?

"""sets up the model and init_values/priors"""
self.model.profile_data = projection_data

def fit_model(self) -> scipy.optimize._optimize.OptimizeResult:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to say fit_model or fit_method here?

def fit_model(self) -> scipy.optimize._optimize.OptimizeResult:
"""
Fits model params to distribution data and plots the fitted params
as a function of the model.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we still plotting? maybe update this comment

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the empty file?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know I believe there was a test there originally. I will remove it then write the test in another PR

@nneveu nneveu merged commit c950303 into slaclab:main Aug 2, 2024
3 checks passed
@nneveu nneveu deleted the fit branch August 2, 2024 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants