
Creating a new fitting toolkit and examples of how to use it with Screen Profiles. #143

Draft · phys-cgarnier wants to merge 41 commits into main
Conversation

phys-cgarnier (Collaborator):
Updating fitting tools, image processing, and screen beam profile measurement classes, draft PR

phys-cgarnier marked this pull request as draft · April 12, 2024 16:39
nneveu (Member) commented Apr 16, 2024:

Thanks for all your hard work on this! Some initial comments after looking at a few files:

  • There are a lot of changes here, probably 2–3 PRs' worth. I recommend submitting PRs with fewer changes; e.g., some of the image changes could have been their own PR instead of being added with the fitting work.
  • I think plotting should be separate from more fundamental functions like fitting/image processing.
  • Please rename the PR to something a little more descriptive than Dev; this will help us if we want to look back at the PR later. I suggest relating it to an issue # you are trying to close.

Thanks again, great work!

nneveu (Member) left a comment:


Same comments as above. I'll look more carefully at the measurement and test files after we chat.


```python
@abstractmethod
def find_init_values(self, data: np.ndarray) -> list:
    pass
```
Member:

Change all 'pass'es to NotImplementedError.

Collaborator (Author):

Even for abstract methods?

Member:

I think we want to make sure no one calls this without something that overrides the behavior, so there should be a catch for users who call it when it does not behave how they expect. I haven't used these a lot though, so maybe it's fine; we can discuss at the meeting.
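
For reference, a minimal sketch of the suggested change (the class name MethodBase is taken from the discussion further down this thread):

```python
from abc import ABC, abstractmethod

import numpy as np


class MethodBase(ABC):
    @abstractmethod
    def find_init_values(self, data: np.ndarray) -> list:
        # @abstractmethod already prevents instantiating the base class;
        # raising NotImplementedError also catches super() calls and any
        # subclass that forgets to override the method body.
        raise NotImplementedError
```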


```python
def plot_init_values(self):
    """Plots init values as a function of forward and visually compares it to the initial distribution"""
    fig, axs = plt.subplots(1, 1, figsize=(10, 5))
```
Member:

I'm not a fan of hard-coding figure subplots and sizes. Can we either rework this to be more general, or move the plotting part out to a function somewhere else?
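
One possible rework, a sketch only: accept an optional Axes so the caller controls figure creation and size. The normalized [0, 1] x-axis is an assumption based on find_init_values below; _forward and init_values_list are the names used elsewhere in this thread.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_init_values(self, ax=None):
    """Plot the forward model at the init values against the data."""
    if ax is None:
        _, ax = plt.subplots()  # default only; callers may pass their own Axes
    x = np.linspace(0, 1, len(self.profile_data))  # assumed normalized axis
    ax.plot(x, self.profile_data, label="data")
    ax.plot(x, self._forward(x, self.init_values_list), label="init values")
    ax.legend()
    return ax
```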

```python
def plot_priors(self):
    """Plots prior distributions for each param in param_names"""
    num_plots = len(self.priors)
    fig, axs = plt.subplots(num_plots, 1, figsize=(10, 10))
```
Member:

Same here with the hard-coded fig size, etc. I don't know if this belongs in a base model?

```python
    pass

def log_likelihood(self, x, y, params):
    return -np.sum((y - self.forward(x, params)) ** 2)
```
Member:

This uses a function that is not implemented? Can we get this from scipy instead of calculating it ourselves? Something like: https://stackoverflow.com/questions/59869759/log-likelihood-function-generated-by-scipy-stats-rv-continuous-fit ?

Collaborator (Author):

I will look into this.
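
For reference, a sketch of what a scipy-based version might look like, assuming Gaussian residuals with unit scale (not the PR's implementation):

```python
import numpy as np
from scipy import stats


def log_likelihood(self, x, y, params):
    # Gaussian residual model; the scale could also be made a fit parameter.
    residuals = y - self.forward(x, params)
    return np.sum(stats.norm.logpdf(residuals, loc=0.0, scale=1.0))
```

With unit scale this equals the existing -np.sum((y - forward) ** 2) up to a factor of 1/2 and an additive constant, so the location of the minimum is unchanged.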

```python
loss_temp = -self.log_likelihood(x, y, params)
if use_priors:
    loss_temp = loss_temp - self.log_prior(params)
return loss_temp
```
Member:

Are these functions for ML-related calcs? Loss/log likelihood/priors, etc.?
If so, maybe they belong in an ML-specific file.

phys-cgarnier (Collaborator, Author) commented Apr 18, 2024:

Yes, these functions are essentially what we want to minimize; the minimization gives us the best-fit parameters. When the log prior is used, our minimization problem turns into a Bayesian regression problem.
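
In other words, with the prior term included the minimizer returns a maximum a posteriori (MAP) estimate rather than a plain least-squares fit:

$$\hat{\theta} = \arg\min_{\theta}\left[-\log p(y \mid x, \theta) - \log p(\theta)\right]$$

Dropping the log-prior term recovers the maximum-likelihood (here, least-squares) solution.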

Collaborator:

Technically, the use of priors/Bayesian inference isn't really ML and doesn't add much overhead; I'm not sure we should separate these into different classes. We should discuss.

Member:

The matlab2py file still exists, so this file should still exist too. It may need to be updated or renamed, but the functions being tested here are still in the repo? Unless I'm missing something?


```python
class ROI(BaseModel, ABC):
    roi_type: str
    center: List[PositiveFloat]
```
Member:

I see from the indexing below that center is two floats (which makes sense, x/y); maybe show that shape here or add a comment. Should we force this to always be a list of two floats?
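
One way to enforce exactly two entries, sketched here as a suggestion rather than the PR's code: pydantic validates fixed-length tuple annotations, so the shape can be expressed directly in the type.

```python
from abc import ABC
from typing import Tuple

from pydantic import BaseModel, PositiveFloat


class ROI(BaseModel, ABC):
    roi_type: str
    # exactly two entries: (center_x, center_y)
    center: Tuple[PositiveFloat, PositiveFloat]
```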

Collaborator (Author):

Reminder: ask Ryan.

Collaborator:

Is this the ROI for an image? If so, should these be integers so you get the centre x/y pixel? Or, if it's in millimeters, I would add the units to the variable name.

Member:

note to self: look at this file again after next commits.

Member:

note to self: look at this file again after next commits.

Member:

Will there be a test for these?

phys-cgarnier changed the title from Dev to "Creating a new fitting toolkit and examples of how to use it with Screen Profiles." · Apr 25, 2024
```python
self.param_names: list = None
self.param_bounds: np.ndarray = None
self.init_values: dict = None
self.init_values_list: list = None
```
phys-cgarnier (Collaborator, Author) commented Apr 30, 2024:

Recall that projection_fit.py needs a list in order to use fit_model. If you change the type of self.init_values to a dict and unpack it into an ordered list inside fit_model, then ProjectionFit becomes model-dependent instead of a template (knowledge of the Gaussian model params should not be in projection_fit). I recommend instead making both a dictionary and a list of param values inside MethodBase.

Collaborator:

See my other comment on why init_values_list is not needed.

```python
if not isinstance(profile_data, np.ndarray):
    raise TypeError("Input must be ndarray")
self._profile_data = profile_data
self.find_init_values(self._profile_data)
```
Collaborator (Author):

Changed the function call here from self.find_priors to self.find_init_values. Please see the following comment for the changes in self.find_init_values.

```python
mean = np.argmax(gaussian_filter(data, sigma=5)) / (len(data))
sigma = 0.1
self.init_values_list = np.array([amplitude, mean, sigma, offset])
self.init_values = {"amplitude": amplitude, "mean": mean, "sigma": sigma, "offset": offset}
```
Collaborator (Author):

It was requested that users have the option to grab init values without needing to know which parameter belongs where in the ordered np.array, so self.init_values was changed from a list to a dictionary. However, scipy.optimize.minimize still needs an np.array, so that is saved in a separate variable.

Collaborator:

We shouldn't save it in a separate variable; we should just do the mapping inside the scipy.optimize.minimize call. It should be a one-liner.

Collaborator (Author):

I apologize, I did not realize Python dictionaries are ordered.

Collaborator:

While technically dicts are ordered, we don't have to rely on that; we can use:

```python
np.array([self.init_values[name] for name in self.param_names])
```
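
A sketch of that mapping at the call site in fit_model. The extra arguments to loss are a hypothetical signature, not the PR's code; Powell is the method named in the commit log below.

```python
res = scipy.optimize.minimize(
    self.model.loss,
    # build the ordered array from the dict here; the ordering comes from
    # the model's own param_names, so ProjectionFit stays model-agnostic
    np.array([self.model.init_values[name] for name in self.model.param_names]),
    args=(x, y, use_priors),  # hypothetical signature of model.loss
    method="Powell",
)
```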

```python
self.init_values_list = np.array([amplitude, mean, sigma, offset])
self.init_values = {"amplitude": amplitude, "mean": mean, "sigma": sigma, "offset": offset}
# if use_priors = True in projection_fit then find priors? use case where projection fit
# is instantiated with use_priors = False then flag is changed but you have no priors
self.find_priors()
```
phys-cgarnier (Collaborator, Author) commented Apr 30, 2024:

I highly recommend calling self.find_priors when init_values are found.

Let's consider the workflow for projection_fit.fit_projection(some_projection_data):

1. This method passes normalized projection data to projection.setup_model.
2. projection.setup_model calls the setter method for gaussian_model.profile_data.
3. The setter method for gaussian_model.profile_data triggers self.find_init_values.
4. Those init values are used in projection_fit.fit_model().
5. Now consider the case where projection_fit.use_priors = True; everything works smoothly, right? Well, no.
6. projection_fit.use_priors = True can be set upon initialization or afterwards. Also, the advantage of having gaussian_model.profile_data be a @property is that it can be updated easily, meaning we can call projection_fit.fit_projection(with_any_1d_array) at any time.
7. So use_priors = True but the priors have not changed, which is not good. Calling gauss_model.find_priors inside projection_fit.fit_model() is also bad.
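
A minimal sketch of the recommended behavior, extending the setter shown earlier in this thread:

```python
@profile_data.setter
def profile_data(self, profile_data: np.ndarray) -> None:
    if not isinstance(profile_data, np.ndarray):
        raise TypeError("Input must be ndarray")
    self._profile_data = profile_data
    self.find_init_values(self._profile_data)
    # refresh priors whenever the data changes, so a later
    # use_priors = True never sees stale priors
    self.find_priors()
```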

```python
# making an ordered list here makes fit_model model-dependent,
# since it needs knowledge of the params list ordering

res = scipy.optimize.minimize(
    self.model.loss,
```
Collaborator (Author):

I think I see a potential problem with making forward and log_prior receive a dictionary. model.loss calls forward and log_prior. The problem is that scipy.optimize.minimize would pass an unpacked array of length len(model.init_values) to model.loss on each iteration. I can't think of how to resolve this.

Collaborator:

Change the calls in model.loss to model._forward and model._log_prior, which take np arrays instead. The public versions can take dicts as arguments.
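
A sketch of that split (the signatures are assumptions based on the snippets above):

```python
def forward(self, x: np.ndarray, params: dict) -> np.ndarray:
    # public API: accepts a dict keyed by param_names
    ordered = np.array([params[name] for name in self.param_names])
    return self._forward(x, ordered)

def _forward(self, x: np.ndarray, params: np.ndarray) -> np.ndarray:
    # private array-based version; model.loss calls this inside the
    # scipy.optimize.minimize loop, which passes plain np arrays
    ...
```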

phys-cgarnier and others added 22 commits April 30, 2024 15:27
…d maybe make norm a private variable of gaussModel.py and stop using _forward as a staticmethod
…on to fit without displaying normalized values, changed fit_model method in scipy.optimize.minimize from default (BFGS) to Powell
nneveu (Member) commented Aug 2, 2024:

@phys-cgarnier @eloiseyang should we close this PR now that it's been reworked/merged elsewhere?

eloiseyang (Collaborator):

> @phys-cgarnier @eloiseyang should we close this PR now that it's been reworked/merged elsewhere?

Let's leave it for now. I might need to use this branch to pull out the last of the changes we haven't merged yet. Once everything is merged we can close.
