Update start-here.md with clone and checksum instructions
This CL:
- Adds instructions for cloning the repo if needed.
- Adds the `md5sum` command for verifying the download.
- Adds `sahel-sh` to the reproduction log section.
sahel-sh committed Jul 21, 2023
1 parent b9524cd commit 475910c
Showing 1 changed file with 12 additions and 3 deletions.
15 changes: 12 additions & 3 deletions docs/start-here.md
@@ -114,9 +114,14 @@ Bringing together everything we've discussed so far, a test collection consists

Here, we're going to introduce the [MS MARCO passage ranking test collection](https://microsoft.github.io/msmarco/).

In these instructions we're going to use Anserini's root directory as the working directory.
Assuming you've cloned the repo already...
If you haven't cloned the [anserini](https://github.com/castorini/anserini) repository already, clone it and get its `tools` submodule:
```bash
git clone https://github.com/castorini/anserini.git
cd anserini
git submodule update --init --recursive
```

In these instructions we're going to use Anserini's root directory as the working directory.
First, we need to download and extract the data:

```bash
@@ -130,7 +135,10 @@ wget https://msmarco.blob.core.windows.net/msmarcoranking/collectionandqueries.t
tar xvfz collections/msmarco-passage/collectionandqueries.tar.gz -C collections/msmarco-passage
```

To confirm, `collectionandqueries.tar.gz` should have MD5 checksum of `31644046b18952c1386cd4564ba2ae69`.
To confirm, `collectionandqueries.tar.gz` should have MD5 checksum of `31644046b18952c1386cd4564ba2ae69`:
```bash
md5sum collections/msmarco-passage/collectionandqueries.tar.gz
```
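
Comparing the printed hash against the expected value by eye is error-prone, so the check can be scripted. The following is a minimal sketch, assuming GNU `md5sum` is available. The `verify_md5` helper is hypothetical and not part of Anserini.

```bash
# Hypothetical helper (not part of Anserini): compare a file's MD5
# checksum against an expected value and report the result.
verify_md5() {
  local file="$1" expected="$2" actual
  # md5sum prints "<hash>  <filename>"; keep only the hash field.
  actual="$(md5sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got '$actual', expected '$expected')" >&2
    return 1
  fi
}
```

For the archive above, this would be invoked as `verify_md5 collections/msmarco-passage/collectionandqueries.tar.gz 31644046b18952c1386cd4564ba2ae69`. A non-zero exit status suggests the download is incomplete or corrupted.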

If you peek inside the collection:

@@ -269,4 +277,5 @@ From here, you're now ready to proceed to try and reproduce the [BM25 Baselines
](experiments-msmarco-passage.md).
## Reproduction Log[*](reproducibility.md)
- `sahel-sh` on 2023-07-20
