smalley edited this page Jul 19, 2017 · 16 revisions

Welcome to the Parallel.GAMIT wiki!

Install PostgreSQL 9.6.

To import Demian’s metadata database

From PostgreSQL folder run

cat /Users/robertsmalley/DemiansGamitPackage/gnss_data | psql -U postgres gnss_data

where the uncompressed database dump gnss_data (plain text, exported from Demián's copy?, uncompressed from gnss_data.gz) is piped into PostgreSQL (or into "pgAdmin 4", after installing it from pgadmin.org). This loads the dump into the PostgreSQL structure under the name gnss_data (a different file location and structure).

Notes from discussion with Demián on 18 Jul, 2017:

(Dangerous, but we are using the default user and password in PostgreSQL)

user – postgres, password – postgres

Installed “pgAdmin 4”

Start pgAdmin 4 and click on Servers; it shows the default local server, PostgreSQL 9.6, with a red X. The red X means you are not connected. To connect, click on it and enter the password.

Then click on Databases. You will see a list of databases, all but one with red X's because they are not connected (you will be connected to the default postgres database).

Click on "gnss_data" to connect to it, and then on the + of gnss_data to list the stuff in it (have to figure out what the names of the various things in the database are). You can see info about the database.

The important one for now is Schemas. Click on the + of Schemas and it will show the item "public". Click the + of public and it will list its "Functions", "Tables" and "Trigger Functions".

Look in Tables – it has all the tables with metadata. Some examples:

Networks – right click on Networks to get a pop-up menu to select what you want to do – e.g. edit or view data. To view the first 100 lines, select that. If you click on the +, you get the gory details.

You get a table with columns of network codes and their names, associated along rows. The SQL command is shown at the top:

SELECT * FROM public.networks ORDER BY "NetworkCode" ASC LIMIT 100

In the SQL command, the splat (*) selects all columns (think of it as a wildcard). Quoted identifiers such as the column name "NetworkCode" go in double quotes, string literals go in single quotes, and numeric data goes straight in (100). Stuff in purple is SQL keywords, stuff in black is from the user.
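The quoting rules can be checked with Python's built-in sqlite3 module, which follows the same SQL convention as PostgreSQL here (double quotes for identifiers, single quotes for string literals). A sketch, not part of Demián's package; the sample row uses the arg network from the examples below:

```python
# SQL quoting: double quotes = identifier, single quotes = string literal.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE networks ("NetworkCode" TEXT, name TEXT)')
con.execute("INSERT INTO networks VALUES ('arg', 'Argentine CGPS')")
rows = con.execute(
    'SELECT * FROM networks ORDER BY "NetworkCode" ASC LIMIT 100').fetchall()
print(rows)  # [('arg', 'Argentine CGPS')]
```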

The network codes are 3 characters and are self-explanatory, but you also have the names in the right column.

Stuff needed here:

To add SOG – Demián will show how to do it.

Don't input everything from Eric into the database (way too much stuff).

"gatekeeper" (Mike's name) – program to control getting stuff into the archive.

The PostgreSQL database does not follow the UNIX philosophy – it protects the user from themselves:

Can't delete stuff accidentally. When you make a change – to a name, for example – it propagates through the whole database.

Right click on Stations, show first 100 lines

Network_name, station_name, start_date, end_date, x, y, z, Harpos_coeff_otl (Demián used grdtab in GAMIT to get this info), lat, lon, ht

Can’t duplicate station codes within a network (can’t duplicate “primary key”)

I will need another network for GFZ (CAP and SAGA have a bunch of name conflicts in survey and continuous stations)

Example of network names – arg for Argentine CGPS and ars for Argentine Survey sites.

The station name field is generally empty (did not want to type them in).

Network codes are 3 characters, station codes are 4 characters.

The start and end times can be set automatically by the program; it ignores holes, spanning from the date of the first data to the date of the last.
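The code-length rules and the automatic date span can be sketched in Python (hypothetical helper names, not from Demián's package):

```python
from datetime import date

def valid_codes(network, station):
    """Network codes are 3 characters, station codes are 4."""
    return len(network) == 3 and len(station) == 4

def date_span(obs_dates):
    """Start/end from first to last observation, ignoring holes."""
    return min(obs_dates), max(obs_dates)

days = [date(2017, 1, 1), date(2017, 1, 5), date(2017, 3, 2)]
print(valid_codes("arg", "lpgs"))  # True
print(date_span(days))             # (date(2017, 1, 1), date(2017, 3, 2))
```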

To put new data in the archive – the gatekeeper checks the location of the candidate against existing sites; if it is <100 away it declares it the same site. If there is already data from this station, network and code in the database, it will save the rinex in the archive and update the database. If there is no data from this station so far, it will put the station into the ??? network.
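A sketch of the gatekeeper's location check. The helper name, the use of ECEF x,y,z, and the assumption that the 100 tolerance is in meters are mine, not from the package:

```python
import math

def same_station(xyz_a, xyz_b, tol=100.0):
    """Gatekeeper-style check: if the candidate is closer than the
    tolerance (assumed 100 m of ECEF distance) it is declared the
    same station."""
    return math.dist(xyz_a, xyz_b) < tol

# hypothetical ECEF coordinates (meters)
existing = (2765120.0, -4449250.0, -3626400.0)
candidate = (2765130.0, -4449245.0, -3626395.0)
print(same_station(existing, candidate))  # True: about 12 m apart
```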

Now rinex – we have date metadata in various formats – YYMMDDD, YY.yy, start, end, etc.

Info from the rinex header? (to start?)

rinexextra – where multiple files go (duplicate names, multiple sampling intervals, multiple old ashtech files due to power failures, etc.)

The database cannot handle repetitions of rinex files: 1 file/day/station. Different names do not fix the problem (stations are let in based on location, not file name). If there is a duplicate problem it puts the file in rinexextra. It will swap files between the archive and rinexextra such that the one in the archive is the longer one.
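The "keep the longer one" swap could look like this (a sketch; using the epoch count as the measure of "longer" is an assumption, the real criterion may be time span or file size):

```python
def resolve_duplicate(archive_file, extra_file):
    """Duplicate-day handling sketch: keep in the archive whichever
    file is 'longer' (here: more observation epochs), move the other
    to rinexextra."""
    if extra_file["epochs"] > archive_file["epochs"]:
        return extra_file, archive_file  # swap them
    return archive_file, extra_file

a = {"name": "lpgs0010.17d.Z", "epochs": 2880}
b = {"name": "lpgs0010.17d.Z.1", "epochs": 1440}
keep, extra = resolve_duplicate(a, b)
print(keep["name"])  # lpgs0010.17d.Z
```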

Have to clean up rinexextra by hand (does not get erased automatically)

stationinfo – metadata for the files saved in the rinex table. The actual station.info file is parsed into the database.

Can't guarantee it is good information – the only check is against the rinex file header in the database and the receiver and antenna serial numbers.

Rinex_tank_struct – metadata about the organization/structure of the tank/archive where the rinex files are actually stored. Demián uses Network/Yr/Doy.
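The Network/Yr/Doy tank layout as a path builder (hypothetical helper, for illustration only):

```python
import os

def tank_path(root, network, year, doy):
    """Build the Network/Yr/Doy layout used for the rinex tank."""
    return os.path.join(root, network, "%04d" % year, "%03d" % doy)

print(tank_path("/archive", "arg", 2017, 7))  # /archive/arg/2017/007
```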

Table for earthquakes – date, lat, lon, depth, mag, etc, M≥6.

Events – important table, useful for debugging, but complicated. A history of what was done to the db.

gamit_soln is made from the sinex output file (which is made from the h file). It has a column for project id or stn/net code, 1 entry per day. It is in inner coordinates from the LS solution, but given as positions (x,y,z or lat, lon, ht), not baseline lengths.

ppp_soln – xyz, sigmas, time, reference frame, ORBITS, CLOCKS, etc. used to make the calculation.

Intro to Demián's package.

Look in gnss_data.cfg. This is the package's config file (INI style). Sections are marked by [XXX] and mean something to the code that reads it; # is for comments.

Have to edit/change stuff (directories, etc. for local setup)

-bash:classes:508 $ cat gnss_data.cfg
[postgres]
# information to connect to the database (self explanatory)
hostname = 127.0.0.1
username = postgres
password = postgres
database = gnss_data


hostname = IP address; 127.0.0.1 is the loopback address, i.e. "localhost"
username = postgres (dangerous)
password = postgres (dangerous)
database = gnss_data

The above is the basic info to connect (note, as mentioned before, not very secure with the default user and password).
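A package like this would typically read the file with Python's standard configparser; a self-contained sketch (not necessarily how Demián's code does it):

```python
# Parse a gnss_data.cfg-style INI file with the standard library.
import configparser

cfg_text = """
[postgres]
# information to connect to the database
hostname = 127.0.0.1
username = postgres
password = postgres
database = gnss_data
"""

cfg = configparser.ConfigParser()
cfg.read_string(cfg_text)   # use cfg.read("gnss_data.cfg") for the real file
print(cfg["postgres"]["hostname"])  # 127.0.0.1
print(cfg["postgres"]["database"])  # gnss_data
```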

Need some paths: to the base of the rinex archive/tank, and the rinex archive/tank structure. The repository is a place to deposit files to be added to the database. The "gateway" program pyArchiveService.py will look there for rinex files (anything ending in d.Z), send them off to PPP, figure out if they are already in the archive, and if not put them there, store the metadata in the database, and build the station.info information.

[archive]
# absolute location of the rinex tank
path = /Users/gomez.124/mounts/qnap/ign/archive
repository = /Users/gomez.124/mounts/qnap/ign/repository

pyScanArchive.py will scan what there is to add (??).

You can add data to the archive in several ways – start with data in the repository and do everything, or start with data in a directory structure the same as the archive, placed in the archive tree, and run pyScanArchive.py. It will run PPP as needed, etc.

Set up for parallel execution

# parallel execution of certain tasks
parallel = True
cpus = 4

Set up paths for orbits, etc. NOTE: Demián's scripts don't download orbits, clocks, eop, etc.

# absolute location of the broadcast orbits
brdc = /Users/gomez.124/mounts/qnap/igs/brdc/$year

# absolute location of the sp3 orbits
sp3 = /Users/gomez.124/mounts/qnap/igs/orbits/$gpsweek

Handling of orbits (it does not download, but will look for different kinds):

Type 1 – IGS repro orbits
Type 2 – IGS regular orbits (not reprocessed) – "precise"
Type 3 – IGS rapid orbits

sp3 format for all the types. altr are alternate repro or regular orbits from JPL. Ultrarapid don't work (did not try predicted).

orbit center type precedence:

type_1 has precedence over type_2. If type_1 is found, search is over

if type_1 is not found, then algorithm searches for type_2

up to 3 types allowed

If PPP fails to process using either type, then it will try with altr_1,2,etc

sp3_type_1 = ig2
sp3_type_2 = igs
sp3_type_3 = igr
sp3_altr_1 = jp2
sp3_altr_2 = jpl
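The precedence search can be sketched as follows (the helper name and the usual cccWWWWD.sp3 filename convention are assumptions on my part):

```python
import os, tempfile

def find_orbit(sp3_dir, gpsweek, dow, types=("ig2", "igs", "igr")):
    """Try type_1, then type_2, then type_3; stop at the first hit.
    The alternates (jp2, jpl) would only be tried on a PPP retry."""
    for t in types:
        name = "%s%04d%d.sp3" % (t, gpsweek, dow)
        if os.path.exists(os.path.join(sp3_dir, name)):
            return name
    return None

with tempfile.TemporaryDirectory() as d:
    # only regular (igs) orbits present, no repro (ig2)
    open(os.path.join(d, "igs19500.sp3"), "w").close()
    found = find_orbit(d, 1950, 0)
print(found)  # igs19500.sp3 (ig2 not found, fell back to igs)
```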

Ocean loading stuff – where grdtab is and where the grid is:

[otl]
# location of grdtab to compute OTL
grdtab = /Users/gomez.124/gamit/gamit/bin/grdtab
# location of the grid to be used by grdtab
otlgrid = /Users/gomez.124/gamit/tables/otl.grid

Where to find PPP stuff and some GAMIT/GLOBK information

[ppp]
ppp_path = /Users/gomez.124/PPP_NRCAN
ppp_exe = /Users/gomez.124/PPP_NRCAN/source/ppp34613
institution = The Ohio State University
#institution = Instituto Geografico Nacional
info = —-
#info = Av. Cabildo 381 CABA C1426AAD, Buenos Aires, Argentina (dgomez@ign.gob.ar)

The atx file has to be compatible with the orbits – e.g. if using igs08 orbits you have to use the igs08.atx file. (Not fatal if you don't use the correct pair, but you don't get the best results.)

atx = /Users/gomez.124/PPP_NRCAN/igs08_1930.atx

Need to provide the atx file.

Begin testing Demian’s “parallel gamit” package

First – Demián sent me the missing.py routine/file, to be put in the classes directory. This routine is from Abel and just about everything uses it; things now work.

After discussion, Demián updated github package – downloaded update and replaced (kept old as backup).

Jump to the future – when you update from GitHub you have to bring over the config files (gnss_data.cfg) with the local information about directories, etc.

Most of Demián’s routines have help.

3 main programs in the package, run by "python pyXXX.py", where XXX is:

1st main program – pyScanArchive.py
2nd main program – pyArchiveService.py
3rd main program – pyIntegrityCheck.py

The Rinex archive is associated with the PostgreSQL metadata database "gnss_data" (the metadata database is a PostgreSQL relational database; the Rinex archive is "external" – just a regular old unix directory structure, available to mess up without the help of PostgreSQL).

If you build, update and maintain the Rinex archive through the parallel gamit package, only good Rinex data files (successful PPP and GAMIT processing, meeting size minimums, etc.) will be saved in the archive.

Two ways to put Rinex data into archive –

  1. Put the Rinex files into the appropriate places in the directory structure and then use pyScanArchive.py and pyIntegrityCheck.py to update the PostgreSQL database and flag bad data (you will have to remove it by hand; it does not do it automatically yet).
  2. Put the Rinex files you want to add in the "repository" (eventually there will be a background service that looks here and sucks stuff in automatically – not yet implemented, so for now you have to run it by hand). The repository directory (and the archive directory, etc.) is defined in the gnss_data.cfg file. Then run pyArchiveService.py. This will look in the repository, find all the *d.Z files, assume they are Hatanaka- and UNIX-compressed Rinex files, run them through PPP, check the results, update "gnss_data", and move the Rinex file to the archive (or to a tank with "problem" data – bad, duplicate, etc.).
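Finding the *d.Z candidates in the repository can be sketched with a recursive glob (hypothetical helper; the real pyArchiveService.py may do this differently):

```python
import glob, os, tempfile

def scan_repository(repo):
    """Find candidate Hatanaka + UNIX-compressed rinex files (*d.Z)
    anywhere under the repository."""
    return sorted(glob.glob(os.path.join(repo, "**", "*d.Z"),
                            recursive=True))

with tempfile.TemporaryDirectory() as repo:
    os.makedirs(os.path.join(repo, "data_in"))
    open(os.path.join(repo, "data_in", "lpgs0010.17d.Z"), "w").close()
    open(os.path.join(repo, "data_in", "notes.txt"), "w").close()
    names = [os.path.basename(f) for f in scan_repository(repo)]
print(names)  # ['lpgs0010.17d.Z']
```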

It seems like you have to be in (or put in your path?) /Users/robertsmalley/DemiansGamitPackage/Parallel.GAMIT-master/classes to run Demián's package.

(see “jump to future” above – did not run after package update until I copied correct “gnss_data.cfg” file into updated classes directory.)

pyArchiveService.py

smalleymacbookpro15:-bash:classes:515 $ python pyArchiveService.py
the provided argument is not a folder
  usage: 
  pyArchiveService : scan for rinex file in [repo directory]/data_in
    The repository should have the following folders (created if they don't exist):
    - data_in      : a folder to put incoming data (in any structure).
    - data_in_retry: that has some failure and that was moved out of the directory to allow the used to identify problems
                     will be moved back into data_in when the program is restarted.
    - data_reject  : rejected data due to not having been able to run ppp on it.

If you put files directly into the archive, then run pyScanArchive.py; you can do a step at a time, or all at once, with the switches (I think).

smalleymacbookpro15:-bash:classes:511 $ python pyScanArchive.py 
Scan the archive using configuration file gnss_data.cfg
  usage: 
         --rinex  : scan for rinex
         --rnxcft : resolve rinex conflicts (multiple files per day)
         --otl    : calculate OTL parameters for stations in the database
         --stninfo: scan for station info files in the archive
                    if no arguments, searches the archive for station info files and uses their location to determine network
                    else, use: --stninfo_path --stn --network, where
                    --stninfo_path: path to a dir with station info files, or single station info file. Leave empy to use stdin
                    --stn         : station to search for in the station info, of list of stations separated by comma, no spaces between ('all' will try to add all of them)
                    --net         : network name that has to be used to add the station information
         --ppp    : run ppp to the rinex files in the archive
         --all    : do all of the above
smalleymacbookpro15:-bash:classes:512 $ python pyIntegrityCheck.py 
Integrity check utility of the database.
  usage: 
         --stn [net.]stn                    : Station to run integrity check on, comma separated stations allowed. If 'all', integrity check is run for all stations.
         --net network                      : Network of stations in --stn (if --stn is not in net.stn format). If --stn is not set, checks all stations in the network.
         --date StartDate[,EndDate]         : Date range to work on; can be yyyy/mm/dd or yyyy.doy or 'all'. If not specified, 'all' is assumed
         --stninfo_rinex                    : Check that the receiver serial number in the rinex headers agrees with the station info receiver serial number. Output message if it doesn't.
         --stninfo_proposed [--ignore days] : Output a proposed station.info using the RINEX metadata. Optional, specify --ignore to ignore station.info records <= days.
         --stninfo                          : Check the consistency of the station information records in the database. Date range does not apply.
         --gaps [--ignore days]             : Check the RINEX files in the database and look for gaps (missing days). Optional, specify --ignore with the smallest gap to display.
         --spatial_coherence [--fix/del]    : Check that the RINEX files correspond to the stations they are linked to using their PPP coordinate.
                                              Add --fix to try to solve problems. In case the problem cannot be solved, add the RINEX file to the excluded table.
                                              Add --del to delete problems instead of moving data to the excluded table.
         --print_stninfo long|short         : Output the station info to stdout. long outputs the full line of the station info. short outputs a short version (better for screen visualization).
         --rename [dest net].[dest stn]     : Takes the data from station --stn --net and renames (merges) it to [dest net].[dest stn].
                                              It also changes the rinex filenames in the archive to match those of the new destiny station.
                                              If multiple stations are given as origins, all of them will be renamed as [dest net].[dest stn].
                                              Limit the date range using the --date option

Directory structure for the archive – it can be anything you want, but there are two in use. Demián's (actually IGN's?) format:

/volume/network/year/doy/ (holds Rinex for all sites for that day)

This makes getting all the data for one day, for processing, easy (but other things hard).

OSU format:

/volume/network/stn/year/doy/ (holds Rinex for 1 site for that day, typically one file)

Threatened, but not in the archive yet: also put a station.info (1 line for that day) and an apr file (1 line of PPP-based apr for that day) there, so there will be 3 files. High-rate data is stored somewhere else; it is unusual to have more than 1 rinex file per directory. This format is a bit better for looking at data from one station.

[My proposed solution is Demián’s structure for the actual archive on the disk, with a parallel OSU structure done with soft links. Will need a link maintenance program.]
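The proposed link maintenance could be sketched like this (entirely hypothetical; the proposal is not implemented):

```python
import os, tempfile

def link_osu_view(archive, osu_root, network, station, year, doy, fname):
    """The real file lives in the IGN-style tank (network/year/doy);
    build a parallel OSU-style tree (network/stn/year/doy) of soft
    links pointing back into the tank."""
    src = os.path.join(archive, network, year, doy, fname)
    dst_dir = os.path.join(osu_root, network, station, year, doy)
    os.makedirs(dst_dir, exist_ok=True)
    dst = os.path.join(dst_dir, fname)
    if not os.path.islink(dst):
        os.symlink(src, dst)
    return dst

with tempfile.TemporaryDirectory() as top:
    archive = os.path.join(top, "archive")
    os.makedirs(os.path.join(archive, "arg", "2017", "007"))
    open(os.path.join(archive, "arg", "2017", "007",
                      "lpgs0070.17d.Z"), "w").close()
    link = link_osu_view(archive, os.path.join(top, "osu"),
                         "arg", "lpgs", "2017", "007", "lpgs0070.17d.Z")
    made_link = os.path.islink(link)
    points_to = os.readlink(link)
print(made_link)  # True
```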

--stninfo flag – be careful. You can have a station info for each rinex file – not used in Demián's method, but it makes sense in the OSU method.

If the station info file is inside a network directory it puts it there (don't understand my notes here) [read the "man" entry].

How to make archive – have to copy to new 16TB server in computer room.

  1. Demián can send me what he has and I add to it.
  2. start my own by using what I already have, using ArchiveServer to build
  3. take Demián’s and add campaign data, South Georgia, etc.

pyIntegrityCheck.py can check for gaps, "spatial coherence" (file name, rinex header location, PPP location all check out), and station info (no overlaps of definitions, etc.).
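The gap check can be sketched as follows (hypothetical helper, mirroring the --gaps/--ignore idea from the help text above):

```python
from datetime import date, timedelta

def find_gaps(days, ignore=0):
    """Report runs of missing days between observed dates,
    skipping gaps of <= `ignore` days."""
    days = sorted(days)
    gaps = []
    for a, b in zip(days, days[1:]):
        missing = (b - a).days - 1
        if missing > ignore:
            gaps.append((a + timedelta(days=1), b - timedelta(days=1)))
    return gaps

obs = [date(2017, 1, 1), date(2017, 1, 2), date(2017, 1, 10)]
print(find_gaps(obs))  # [(date(2017, 1, 3), date(2017, 1, 9))]
```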
