can we still encode Range values in samples? #49

samthiriot · 2017-12-12T14:06:34Z

When we were creating Range attributes before the huge refactoring, it was possible to give both a list of codes (like "1","2"...) and the corresponding textual counterparts ("less than 10m","11 to 16"...).
Now we are only constructing these ranges with the textual version.
This works well to read aggregate stats from CSV files where we expect all the columns to explicitly contain "less than 10m"; but for sample files, the values are often encoded as "1","2"...

is it still possible to deal with that?

Tks !

chapuisk · 2017-12-12T14:19:20Z

Hey,
I am not quite sure I understand the problem. If the attribute is encoded as "1", "2" ... you can use an int attribute or, if it is not an integer value per se, an ordered value attribute. If this is just another way to encode a range attribute, then use a mapped attribute with a record mapper, where you can define a mapping like: {1 : less than 10; 2 : 11 to 16 ...}. This option forces you to define two attributes: the referent range attribute and the mapped "int to range" (or "ordered to range") attribute. Hope it helps you overcome your issue.
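To make the mapping idea concrete, here is a minimal sketch in plain Java. It does not use the library's own mapped-attribute classes: `RangeCodeMapping`, `add` and `decode` are hypothetical names invented for illustration, and the labels are the ones from the example mapping above. It only shows how a code read from a sample file could be translated into the label of the referent range attribute.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only: this is NOT the library's mapped-attribute API.
final class RangeCodeMapping {

    // Code found in the sample file -> label of the referent range attribute.
    private final Map<String, String> codeToLabel = new LinkedHashMap<>();

    // Register one code/label pair; returns this so calls can be chained.
    RangeCodeMapping add(String code, String label) {
        codeToLabel.put(code, label);
        return this;
    }

    // Translate a raw sample value into the range label, if the code is known.
    Optional<String> decode(String rawValue) {
        return Optional.ofNullable(codeToLabel.get(rawValue.trim()));
    }

    public static void main(String[] args) {
        RangeCodeMapping age = new RangeCodeMapping()
                .add("1", "less than 10")
                .add("2", "11 to 16");
        System.out.println(age.decode("2").orElse("unknown code"));
    }
}
```

An unknown code yields an empty `Optional` rather than a guess, so the caller decides how to handle values that are missing from the mapping.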

samthiriot · 2017-12-12T14:28:45Z

thanks ! I think you understood my question ^^
The cases are, as you say:

range: never stored in samples as "0 to 10" or "11 to 15" but 0 or 1
boolean: stored as 0 or 1 and not FALSE or TRUE
only for integers and doubles do we have a direct correspondence between the encoded value and its textual counterpart.
It's a bit weird to always create a mapped attribute, no? I mean, isn't it part of the semantics of the "Value" to be either encoded or to have a literal version?
For instance in the INSEE dico, they always propose:
<code of the variable>;<label of the variable>;<code of the modality (value)><label of the value>

samthiriot · 2017-12-12T14:33:12Z

thinking about it: typically to write the content of a value in a generated sample, one would like to also write the encoded value, not (always) the long version.
in this case we need to be able to retrieve the short version (encoded) for a value; I'm not sure how to do it using a mapped attribute.

samthiriot · 2017-12-12T14:40:31Z

(I'll think about it, no worry & thanks)

chapuisk · 2017-12-13T02:47:33Z

Thinking about it, I see one good reason not to encode several codes for one attribute. In many cases, "simple" encodings like {"1", "2", ...} are used for several different attributes: e.g. booleans are 1 and 2; ranges are 1, 2 ... x and so on. Hence there can be confusion in translation: which "1" code relates to which "complex" encoding? The only way to solve the problem is to bind modalities or codes. In that case, if you use the mapped version of the attribute, you can choose between the simple code and the complex one (using DemographicAttribute#findMappedAttributeValues(IValue)).
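As a rough sketch of why the codes have to be bound per attribute, the plain-Java example below keeps one code table per attribute and allows lookup in both directions. It is not the library's implementation: `CodeBook`, `bind`, `label` and `code`, as well as the attribute names "age" and "employed", are hypothetical and chosen only for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: this is NOT the library's DemographicAttribute API.
final class CodeBook {

    // attribute name -> (code -> label)
    private final Map<String, Map<String, String>> toLabel = new HashMap<>();
    // attribute name -> (label -> code)
    private final Map<String, Map<String, String>> toCode = new HashMap<>();

    // Bind a code and its label to one specific attribute.
    void bind(String attribute, String code, String label) {
        toLabel.computeIfAbsent(attribute, k -> new HashMap<>()).put(code, label);
        toCode.computeIfAbsent(attribute, k -> new HashMap<>()).put(label, code);
    }

    // Long ("complex") version of a code; null if the attribute or code is unknown.
    String label(String attribute, String code) {
        return toLabel.getOrDefault(attribute, Collections.emptyMap()).get(code);
    }

    // Short ("simple") version of a label; null if the attribute or label is unknown.
    String code(String attribute, String label) {
        return toCode.getOrDefault(attribute, Collections.emptyMap()).get(label);
    }

    public static void main(String[] args) {
        CodeBook book = new CodeBook();
        book.bind("age", "1", "less than 10");
        book.bind("age", "2", "11 to 16");
        book.bind("employed", "1", "true");
        book.bind("employed", "2", "false");

        // The same code "1" resolves differently depending on the attribute.
        System.out.println(book.label("age", "1"));
        System.out.println(book.label("employed", "1"));
        // Reverse lookup, e.g. to write the short code into a generated sample.
        System.out.println(book.code("age", "11 to 16"));
    }
}
```

Keying every lookup by attribute is what removes the ambiguity: a bare "1" has no meaning until it is paired with the attribute it belongs to.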

samthiriot assigned chapuisk Dec 12, 2017

chapuisk added the question label Dec 21, 2017

chapuisk added this to the Data Structure/Architecture milestone Dec 21, 2017
