can we still encode Range values in samples? #49

Open
samthiriot opened this issue Dec 12, 2017 · 5 comments

@samthiriot
Collaborator

Before the big refactoring, we could create Range attributes by giving both a list of codes (like "1", "2"...) and the corresponding textual counterparts ("less than 10m", "11 to 16"...).
Now we only construct these ranges from the textual version.
This works well for reading aggregate stats from CSV files, where we expect the columns to explicitly contain "less than 10m"; but in sample files, the values are often encoded as "1", "2"...

Is it still possible to deal with that?

Thanks!

@chapuisk
Contributor

Hey,
I am not sure I understand the problem. If the attribute is encoded as "1", "2"... you can use an int attribute or, if the values are not integers per se, an ordered value attribute. If this is just another way to encode a range attribute, then use a mapped attribute with a record mapper, where you can define a mapping like {1 : less than 10; 2 : 11 to 16 ...}. This option forces you to define two attributes: the referent range attribute and the mapped "int to range" (or "ordered to range") attribute. I hope this helps you overcome your issue.
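
The code-to-range mapping described here can be sketched as a plain lookup table. This is only an illustration of the idea, not the project's actual record mapper: the class `CodeToRangeMapping` and its methods `bind` and `labelFor` are hypothetical names, and the labels are the ones quoted in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, NOT the project's API: binds the codes found in a
// sample file to the textual values of a referent range attribute.
final class CodeToRangeMapping {
    // Insertion order is kept so the codes stay in the order they were declared.
    private final Map<String, String> labelByCode = new LinkedHashMap<>();

    // Declares that `code` in the sample stands for `label` in the referent attribute.
    CodeToRangeMapping bind(String code, String label) {
        labelByCode.put(code, label);
        return this;
    }

    // Returns the range label for a sample code, or empty if the code is unknown.
    Optional<String> labelFor(String code) {
        return Optional.ofNullable(labelByCode.get(code));
    }

    public static void main(String[] args) {
        CodeToRangeMapping distance = new CodeToRangeMapping()
                .bind("1", "less than 10")
                .bind("2", "11 to 16");
        System.out.println(distance.labelFor("1").orElse("<unknown code>"));
        System.out.println(distance.labelFor("9").orElse("<unknown code>"));
    }
}
```

Returning an `Optional` rather than `null` makes the "unknown code" case explicit for whoever reads the sample file.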

@samthiriot
Collaborator Author

samthiriot commented Dec 12, 2017

Thanks! I think you understood my question ^^
The cases are, as you say:

  • range: never stored in samples as "0 to 10" or "11 to 15", but as 0 or 1
  • boolean: stored as 0 or 1, not FALSE or TRUE

Only for integers and doubles do we have a direct correspondence between the encoded value and its textual counterpart.
It's a bit weird to always create a mapped attribute, no? I mean, isn't it part of the semantics of a "Value" to be either encoded or to have a literal version?
For instance, in the INSEE dictionary, they always provide:
<code of the variable>;<label of the variable>;<code of the modality (value)><label of the value>

@samthiriot
Collaborator Author

Thinking about it: typically, when writing a value into a generated sample, one would also like to write the encoded value, not (always) the long version.
In that case we need to be able to retrieve the short (encoded) version of a value; I'm not sure how to do that with a mapped attribute.
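
One way to picture the requirement above is a coding table that can be read in both directions, so the short code can be recovered from the long label when writing a sample. This is a minimal sketch under assumptions: `TwoWayCoding`, `decode` and `encode` are hypothetical names, and nothing here reflects how mapped attributes are actually implemented in the project.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT the project's API: a coding table that can be
// read both ways (code -> label when reading, label -> code when writing).
final class TwoWayCoding {
    private final Map<String, String> labelByCode = new HashMap<>();
    private final Map<String, String> codeByLabel = new HashMap<>();

    // Registers a one-to-one binding; duplicates would make one direction ambiguous.
    void bind(String code, String label) {
        if (labelByCode.containsKey(code) || codeByLabel.containsKey(label)) {
            throw new IllegalArgumentException("Duplicate binding: " + code + " / " + label);
        }
        labelByCode.put(code, label);
        codeByLabel.put(label, code);
    }

    // Long version of an encoded value, or null if the code is unknown.
    String decode(String code) {
        return labelByCode.get(code);
    }

    // Short (encoded) version of a textual value, or null if the label is unknown.
    String encode(String label) {
        return codeByLabel.get(label);
    }

    public static void main(String[] args) {
        TwoWayCoding distance = new TwoWayCoding();
        distance.bind("1", "less than 10");
        distance.bind("2", "11 to 16");
        // When writing a generated sample, emit the code instead of the label.
        System.out.println(distance.encode("11 to 16")); // prints 2
    }
}
```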

@samthiriot
Collaborator Author

(I'll think about it, no worries, and thanks)

@chapuisk
Contributor

Thinking about it, I see one good reason not to encode various codes for one attribute. In many cases, "simple" encodings like {"1", "2", ...} are used for several different attributes: e.g. booleans are 1 and 2; ranges are 1, 2 ... x, and so on. Hence there can be confusion in translation: which "1" code relates to which "complex" encoding? The only way to solve the problem is to bind modalities to codes. In that case, and if you use the mapped version of the attribute, you can choose between the simple code and the complex one (using DemographicAttribute#findMappedAttributeValues(IValue)).
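
The ambiguity described here can be illustrated with a small codebook in which a code only has a meaning once it is bound to a given attribute. This is a sketch under assumptions: `AttributeCodebook`, `bind` and `resolve` are hypothetical names, the attribute names and labels are invented for the example, and it does not show how `findMappedAttributeValues` behaves.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT the project's API: codes are only meaningful
// relative to the attribute they are bound to.
final class AttributeCodebook {
    // attribute name -> (code -> label)
    private final Map<String, Map<String, String>> labelsByAttribute = new HashMap<>();

    // Binds a code to a label within one attribute only.
    void bind(String attribute, String code, String label) {
        labelsByAttribute.computeIfAbsent(attribute, k -> new HashMap<>()).put(code, label);
    }

    // Resolves a code in the context of an attribute; fails loudly if unbound.
    String resolve(String attribute, String code) {
        Map<String, String> labels = labelsByAttribute.get(attribute);
        if (labels == null || !labels.containsKey(code)) {
            throw new IllegalArgumentException(
                    "No binding for code " + code + " in attribute " + attribute);
        }
        return labels.get(code);
    }

    public static void main(String[] args) {
        AttributeCodebook codebook = new AttributeCodebook();
        // The same code "1" is used by a boolean and by a range attribute.
        codebook.bind("isEmployed", "1", "TRUE");
        codebook.bind("isEmployed", "2", "FALSE");
        codebook.bind("distance", "1", "less than 10");
        codebook.bind("distance", "2", "11 to 16");
        System.out.println(codebook.resolve("isEmployed", "1")); // prints TRUE
        System.out.println(codebook.resolve("distance", "1"));   // prints less than 10
    }
}
```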

@chapuisk chapuisk added this to the Data Structure/Architecture milestone Dec 21, 2017