Examples of Crawlee scraper and Apify actor for NSWI126 course at MFF UK

lhotanok/apify-examples

Example projects

Prerequisites:

Crawlee scraper my-crawler

This scraper was created from the Crawlee template TypeScript + CheerioCrawler using the command:

npx crawlee create my-crawler

Installation

Install dependencies:

npm ci

Run

npm start

Apify actor tripadvisor-actor

This actor was created from the Apify template TypeScript + CheerioCrawler using the command:

apify create tripadvisor-actor

Installation

Install dependencies:

npm ci

Configuration

Initialize Apify storage directory:

apify init

The apify init command generates an empty input file, which should be located at storage/key_value_stores/default/INPUT.json. Fill it with a JSON object in the following format:

{
    "startUrls": [
        "https://www.tripadvisor.com/Attractions-g274707-Activities-oa0-Prague_Bohemia.html",
        "https://www.tripadvisor.com/Attractions-g274707-Activities-oa30-Prague_Bohemia.html"
    ]
}

Provide at least one URL of a Tripadvisor attraction listing page, such as the Prague attractions pages in the example above.
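
The input requirement could be checked before any crawling starts. The following is a minimal sketch, not the actor's actual code: `ActorInput`, `isAttractionListingUrl` and `validateInput` are hypothetical names, and the `/Attractions-` path check is only inferred from the example URLs.

```typescript
// Shape of INPUT.json as shown in the example above.
interface ActorInput {
    startUrls: string[];
}

// Hypothetical helper: checks that a URL looks like a Tripadvisor attraction listing.
// The "/Attractions-" path prefix is an assumption inferred from the example URLs.
function isAttractionListingUrl(url: string): boolean {
    try {
        const { hostname, pathname } = new URL(url);
        return hostname === "www.tripadvisor.com" && pathname.startsWith("/Attractions-");
    } catch {
        return false; // not a parseable URL
    }
}

// Hypothetical helper: throws unless the input holds at least one usable URL.
function validateInput(input: unknown): ActorInput {
    const startUrls = (input as Partial<ActorInput> | null)?.startUrls;
    if (!Array.isArray(startUrls) || startUrls.length === 0) {
        throw new Error("Input must contain a non-empty startUrls array.");
    }
    const invalid = startUrls.filter(
        (url) => typeof url !== "string" || !isAttractionListingUrl(url),
    );
    if (invalid.length > 0) {
        throw new Error(`Not attraction listing URLs: ${JSON.stringify(invalid)}`);
    }
    return { startUrls };
}
```

Failing early on a malformed input keeps a misconfigured run from silently producing an empty dataset.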

Results are stored in the storage/datasets/default directory. Each dataset item is saved as its own JSON file.
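
Because every item is a separate JSON file, the results of a local run can be collected with a few lines of Node.js. This is a rough sketch: `loadDatasetItems` is a hypothetical helper, and it assumes that item file names sort in insertion order and that any file whose name starts with `__` is metadata rather than an item.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: loads every JSON item from a local dataset directory.
function loadDatasetItems(datasetDir = "storage/datasets/default"): unknown[] {
    if (!fs.existsSync(datasetDir)) return []; // no run has produced results yet
    return fs
        .readdirSync(datasetDir)
        // Assumption: items end with .json; "__"-prefixed files are metadata.
        .filter((name) => name.endsWith(".json") && !name.startsWith("__"))
        .sort() // assumption: file names sort in insertion order
        .map((name) => JSON.parse(fs.readFileSync(path.join(datasetDir, name), "utf8")));
}

const items = loadDatasetItems();
console.log(`Loaded ${items.length} dataset item(s).`);
```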

Run

If you omit the -p (--purge) flag, the storage won't be cleared before your next run starts. Any requests that were already processed in an earlier run will then be considered completed.

apify run -p
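
The effect of the purge flag can be illustrated with a toy model. The class below is a simplified in-memory stand-in written for this README, not the real request queue implementation: `RequestQueueModel` and its methods are hypothetical, and it only models the idea that handled requests are remembered until the storage is purged.

```typescript
// Simplified in-memory stand-in for persisted request queue state (hypothetical).
class RequestQueueModel {
    private handled = new Set<string>();
    private pending: string[] = [];

    // Adds a URL unless it is already handled or already waiting.
    add(url: string): boolean {
        if (this.handled.has(url) || this.pending.includes(url)) return false;
        this.pending.push(url);
        return true;
    }

    // Processes everything pending and returns the URLs that were handled.
    run(): string[] {
        const processed = [...this.pending];
        processed.forEach((url) => this.handled.add(url));
        this.pending = [];
        return processed;
    }

    // Models the purge: forget all state from earlier runs.
    purge(): void {
        this.handled.clear();
        this.pending = [];
    }
}

const queue = new RequestQueueModel();
const url = "https://example.com/listing"; // placeholder URL
queue.add(url);
queue.run();                 // first run handles the URL
console.log(queue.add(url)); // false: without a purge it counts as completed
queue.purge();
console.log(queue.add(url)); // true: after a purge it is processed again
```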

Deploy

You can deploy the actor to your Apify account with the following command:

apify push

Alternatively, you can provide a link to a GitHub / GitLab repository and build the project on the platform. The up-to-date code will be fetched from a remote repository.
