This repository has been archived by the owner on Apr 12, 2023. It is now read-only.

handle data lines with less than expected number of elements #17

Open
aleksandervines opened this issue May 24, 2016 · 3 comments

Comments

@aleksandervines

The CTD software SD200W does not always produce standard/compliant CSV files. Then again, I guess the main problem with CSV is that there is no real standard to follow.

The files it produces sometimes contain nothing but the pressure for some measurements, and the missing values are not marked with N/A or anything similar.

Example:

Press;Sal.;Temp;Ox %;mg/l;Density;S. vel.;
0,10
0,20
0,30;32,21;8,651;370,23;34,98;24,989;1481,47;
0,40;32,21;8,655;370,23;34,98;24,987;1481,49;
0,50;32,21;8,659;370,23;34,98;24,986;1481,50;
0,60;32,20;8,663;370,23;34,97;24,984;1481,51;
0,70;32,20;8,667;370,23;34,97;24,983;1481,53;
0,80;32,20;8,671;370,23;34,97;24,981;1481,54;

The first two data lines in this file cause an error because they contain only one element.

I'd suggest adding an option to either ignore lines that do not have the expected number of elements, or assume the leading elements are in the correct place and fill the rest with NaN. I don't know which would be preferred in this case.
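
To make the two alternatives concrete, here is a minimal Python sketch of what I mean. It is not taken from the project's code: the function name `parse_ctd`, the `on_short` parameter and its values `"pad"` and `"skip"` are all made up for illustration, and it assumes semicolon-delimited fields with decimal commas, as in the example above.

```python
import math

SAMPLE = """Press;Sal.;Temp;Ox %;mg/l;Density;S. vel.;
0,10
0,20
0,30;32,21;8,651;370,23;34,98;24,989;1481,47;
"""

def to_float(field):
    # Empty fields become NaN; decimal commas are converted to dots.
    field = field.strip()
    return float(field.replace(",", ".")) if field else math.nan

def parse_ctd(text, on_short="pad", delimiter=";"):
    """Hypothetical parser: 'pad' fills short rows with NaN, 'skip' drops them."""
    if on_short not in ("pad", "skip"):
        raise ValueError("on_short must be 'pad' or 'skip'")
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # A trailing delimiter would otherwise yield an extra empty field.
    header = lines[0].rstrip(delimiter).split(delimiter)
    rows = []
    for ln in lines[1:]:
        fields = ln.rstrip(delimiter).split(delimiter)
        if len(fields) < len(header):
            if on_short == "skip":
                continue
            # Assume the fields present are the leading columns.
            fields += [""] * (len(header) - len(fields))
        rows.append([to_float(f) for f in fields])
    return header, rows

header, padded = parse_ctd(SAMPLE, on_short="pad")
_, kept = parse_ctd(SAMPLE, on_short="skip")
print(len(padded), len(kept))
```

With `"pad"` the two pressure-only lines are kept, with NaN in the other six columns; with `"skip"` only the complete row survives. The sketch does not deal with rows that have more fields than the header.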

@aleksandervines aleksandervines changed the title handle data lines with no data handle data lines with less than expected number of elements May 24, 2016
@lesserwhirls
Collaborator

I think the best option is to keep it an option. Currently, if you mark those two lines as "header" lines in the wizard interface, they should be ignored. We should also give the option of including them, but padding them out with missing values where needed.

@aleksandervines
Author

> Currently, if you mark those two lines as "header" lines in the wizard interface, they should be ignored.

That works for a specific file, but it has to be checked for every file.
I wouldn't put it past this silly program that creates these CSV files to also produce lines like that in the middle or at the end of the file.

And yes, I think it should be an option, since different users need different solutions. Some would want this to fail, because they would then need to quality-check the input files, which should be in the correct format.
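
For the users who want a failure, something along these lines would do. Again, this is only a rough Python sketch under my own assumptions: `find_short_rows` and `check_strict` are hypothetical names, not part of the tool, and the first non-blank line is assumed to be the column header. It reports the line number of every incomplete row, wherever it sits in the file, so the input can be quality-checked.

```python
def find_short_rows(text, delimiter=";"):
    """Return (line_number, field_count) for each data line shorter than the header."""
    expected = None
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        # Drop a trailing delimiter so it does not count as an extra field.
        count = len(line.rstrip(delimiter).split(delimiter))
        if expected is None:
            expected = count  # first non-blank line is taken as the header
        elif count < expected:
            problems.append((lineno, count))
    return problems

def check_strict(text, delimiter=";"):
    """Raise ValueError listing every incomplete row, if there are any."""
    problems = find_short_rows(text, delimiter)
    if problems:
        details = ", ".join(f"line {n} has {c} field(s)" for n, c in problems)
        raise ValueError(f"incomplete data rows: {details}")

example = "Press;Sal.;Temp;\n0,10\n0,20;32,21;8,651;\n0,30\n"
print(find_short_rows(example))
```

Here the short rows are on lines 2 and 4 of the made-up `example`, so the check catches one at the start and one in the middle of the data block.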

@lesserwhirls
Collaborator

Ah yes, you would need to do that for each file...and, as you say, if those lines happen to be deep in the data block, then all bets are off. A checkbox, enabled by default, called something like "Insert missing values into incomplete data rows" should do the trick. What do you think? Does that text make sense?
