The Crawler is built with Node.js and development is primarily supported for Visual Studio Code.
After Cloning or Pulling ¶
Install node modules:
cd crawler
npm install
Setting up Editor ¶
When Visual Studio Code prompts you after opening the workspace, install all recommended extensions. These are configured in
.vscode/extensions.json and include:
- markdownlint: Provides live feedback about style rules and potential code issues while editing Markdown sources
- Node Debug: Provides live debugging of Node.js scripts
A test-organizations.json file at
crawler/lib/repositories/organizations/__fixtures__/test-organizations.json is available with a minimal set of organizations for testing.
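To check which organizations source you are about to crawl, a small script can report how many entries a file contains. This is a minimal sketch: `countOrgEntries` is a hypothetical helper, not part of the crawler, and it assumes only that the file is JSON whose top level is an array or an object. The actual schema of the fixture is not described here.

```javascript
const fs = require('node:fs');

// Path taken from the text above, relative to the repository root.
const FIXTURE_PATH =
  'crawler/lib/repositories/organizations/__fixtures__/test-organizations.json';

// Hypothetical helper (not part of the crawler): counts the top-level entries
// in a JSON orgs source, whether it is an array or an object keyed by name.
function countOrgEntries(filePath) {
  const data = JSON.parse(fs.readFileSync(filePath, 'utf8'));
  return Array.isArray(data) ? data.length : Object.keys(data).length;
}

// Only report when the fixture is reachable from the current directory.
if (fs.existsSync(FIXTURE_PATH)) {
  console.log(`${FIXTURE_PATH}: ${countOrgEntries(FIXTURE_PATH)} entries`);
}
```

Run it from the repository root so the relative path resolves; from any other directory the script prints nothing.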
Debugging Within Visual Studio Code ¶
The Crawl * launch configurations can be used to run partial or full crawls with interactive debugging available. Execution will pause at any breakpoints set within
run.js or any classes.
These are configured to use the abbreviated
test-organizations.json file described above; comment out the argument to run a full crawl.
Debugging on the Command Line ¶
You can run the crawler from the command line with interactive debugging enabled for attachment over TCP:
# export GITHUB_ACTOR and GITHUB_TOKEN if you hit rate limits
node --inspect-brk \
    crawler/run.js \
    --all \
    --commit-to='snapshot/v1' \
    --commit-orgs-to='cfapi/orgs/v1' \
    --orgs-source='crawler/lib/repositories/organizations/__fixtures__/test-organizations.json'
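Before starting a long crawl, it can help to confirm that the two environment variables named in the comment above are actually set. The following is a minimal sketch: `missingGithubCredentials` is a hypothetical helper, not something the crawler provides, and it checks only for presence, not validity.

```javascript
// Hypothetical helper (not part of the crawler): returns the names of the
// GitHub credential variables that are unset or empty in the given environment.
function missingGithubCredentials(env = process.env) {
  return ['GITHUB_ACTOR', 'GITHUB_TOKEN'].filter((name) => !env[name]);
}

const missing = missingGithubCredentials();
if (missing.length > 0) {
  // Warn only; the variables are needed only if you hit rate limits.
  console.warn(`Not set: ${missing.join(', ')}`);
}
```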
Running Tests ¶
Testing Within Visual Studio Code ¶
With the Crawler: Jest Current File debug configuration selected, you can press F5 or otherwise run the Start Debugging command to execute the
*.test.js file you have open. In most cases, you can do this with a source file open too and Jest will find the associated tests. Breakpoints in both test files and sources should work when running tests this way.
Select the Crawler: Jest All debug configuration to run all available tests.
Testing on the Command Line ¶
cd crawler
npm run test
The main entrypoint for running The Crawler’s command. Yargs is used to parse arguments and implement the command. Run
node run.js --help to see all available options.
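To illustrate how flags like those in the command-line debugging example map to parsed values, here is a minimal sketch. It deliberately uses Node's built-in `util.parseArgs` instead of Yargs so that it runs without installing anything; it is not how run.js is implemented. The option names are copied from the command shown earlier, while their types and the `parseCrawlerArgs` helper are assumptions made for this example.

```javascript
const { parseArgs } = require('node:util');

// Option names mirror the flags in the debugging command; the types are assumed.
const options = {
  all: { type: 'boolean' },
  'commit-to': { type: 'string' },
  'commit-orgs-to': { type: 'string' },
  'orgs-source': { type: 'string' },
};

// Hypothetical helper: parses an argv-style array and returns the option values.
// With strict mode on, unknown flags throw instead of being silently ignored.
function parseCrawlerArgs(argv) {
  const { values } = parseArgs({ args: argv, options, strict: true });
  return values;
}

const parsed = parseCrawlerArgs([
  '--all',
  '--commit-to=snapshot/v1',
  '--orgs-source=crawler/lib/repositories/organizations/__fixtures__/test-organizations.json',
]);
console.log(parsed);
```

Flags that are not passed, such as `--commit-orgs-to` here, are simply absent from the result. For the crawler's real option set and defaults, rely on the output of `node run.js --help`.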
Repository classes provide for interaction with specific populations of records.
Parser classes help read records from raw data.
Tracks runtime and development Node module dependencies for The Crawler.
Jest tests, (mostly) aligned with the source files they cover.
Global configuration for Jest.
Script configured to run before every test suite.
ESLint configuration that should work with Visual Studio Code’s ESLint extension.