Commit
Went through it
richelbilderbeek committed Aug 25, 2022
1 parent 0ac6d99 commit 5953e2d
Showing 1 changed file with 51 additions and 41 deletions.
92 changes: 51 additions & 41 deletions article.tex
@@ -1031,42 +1031,47 @@ \section{Conclusions}
always be published together with a scholarly article:
if the paper and the code differ, it is the code that generated the results.

Similar to confidence intervals, code carries additional useful information
that allows a reader to gauge how much to trust the results.
The more best practices are followed,
the more trustworthy the results are.
The most important best practices discussed in this paper are
automatic testing, a high code coverage and a low
(cyclomatic) code complexity.
In academia, code correctness is essential to uncover the truth,
just as cell biologists must work under sterile conditions to avoid
contaminating their cell cultures.
These best practices are therefore vital for a computational biologist.

For research to be reproducible, one ideally has access to
both the data used and the code.
In some fields, such as genetic epidemiology, the data are
sensitive and hence cannot be released,
yet methods are being devised to run code on sensitive
data with assured privacy \cite{zhang2016review,azencott2018machine}.
Additionally, to ensure that

The most important way to determine the quality of code
is the number of unit tests.
When following a set of best practices, such as DevOps, TDD or Agile,
writing unit tests is an essential
part of writing code.
The number of unit tests is an honest signal
of code correctness (i.e. the code does what it is supposed to do, as opposed
to merely `doing something').
Unit tests are a vital practice for a computational biologist.
Additionally, releasing a simulated or public dataset follows
software development's best practices
(especially the practice of testing code automatically),
with the additional benefit that this dataset can be used for comparisons.

Code is harder to preserve than English text,
and code is rarely preserved \cite{barnes2010publish}.

Although code is the primary actor in computational experiments,
there is no incentive to submit code alongside a publication.
Most academic journals do not require authors to submit their code,
nor is the submitted code peer reviewed,
rOpenSci \cite{ram2018community} being the pleasant exception.

Although the code of computational experiments can be archived well,
there is no incentive to do so.

Although a runnable version of a computational experiment can be archived well,
there is no incentive to do so.

Code used in academic research complies poorly with the FAIR principles,
with unpublished code being the worst offender.
However, even an idealistic researcher lacks the
tools to follow the FAIR principles in an exemplary way:
there is no standard for how a scholarly article
refers to academic code in a machine-friendly way.
Creating a (Singularity) container is the best a researcher
can do to ensure his/her code is reusable,
yet there are no incentives to make code accessible, interoperable
or reusable, and standardized metadata attributes are absent.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Discussion}
@@ -1077,35 +1082,40 @@ \section{Discussion}
Code may or may not be paradata, depending on how the definition
is interpreted.
Here we repeat the definition of `paradata' and discuss
its constituents.
This paper defined paradata as `data about the collecting of the data'.
This implies that (1) code must be seen as data.
This paper argues that code is data in the form of text spread
over one or more files with useful measurable properties.
It also implies that (2) downloading raw data and doing calculations
must be seen as collecting.
This paper argues that downloading raw data and doing calculations,
such as a $t$-test,
describes how the bits and pieces of an end result are collected.
Finally, it implies that (3) the results of an experiment must be seen as data.
This paper argues that an experimental result is data,
as it can be measured and can be used as the raw data of a subsequent experiment.

% \paragraph{The drawbacks of publishing code}

Publishing code may be disadvantageous for an author.
For science, however, code should be published, as
this allows reproducible research
(again, see \cite{haibe2020importance} for a tragic example).
For an author, publishing code alongside an experiment opens up
the possibility of receiving questions regarding that code.
Note, however, that although not publishing code always thwarts
attempts to reproduce results from incorrect code, it may put
oneself in the focus of attention
and, after much effort by others to reproduce an incorrect result,
come at the cost of a scientific career \cite{baggerly2009deriving}.

% \paragraph{The drawbacks of publishing version-controlled code}

Is it worth publishing version-controlled code?
For an author,
there is additional training involved, and here too,
publishing code alongside an experiment opens up
the possibility of receiving questions regarding that code,
as well as other interactions, including helpful contributions.

% \paragraph{The drawbacks of publishing a running version of code}

@@ -1160,7 +1170,7 @@ \section{Discussion}
The world of science would be more open, humble, trustworthy, truthful
and helpful if the code that accompanies a scientific paper
were treated as a first-class citizen. As doing so in an exemplary way
is yet to be rewarded, it is up to idealistic scientists
to wage this battle. I feel that truth and science are worth fighting for,
and I hope this paper helps others to join.
