Deployment

Architecture diagram

Technology stack

The current deployment uses the following technologies:

- Docker
  - Package models and dependencies in a single container
  - Distribute Docker images through Docker Hub
- AWS
  - AWS Lambda
    - Handle GitHub webhooks
  - AWS API Gateway
    - Expose the Lambda handler as an HTTP API
  - AWS Batch
    - Schedule webcompat-ml tasks
  - AWS S3
    - Store webcompat-ml task results
- GitHub API / webhooks
  - Extract data to build datasets
  - Consume webhook events to trigger the automation

Even though most of the services are deployed in the cloud, all the primitives can be self-hosted. The idea is that a webhook from GitHub triggers the automation, and a simple HTTP API handles the request and spawns a task.
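The webhook-to-task flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the handler that is actually deployed: the queue name, job definition name, job naming scheme, and the `ISSUE_URL` environment variable are all hypothetical, and the secret and Batch client are passed in as arguments so the logic can be exercised without AWS credentials. The `X-Hub-Signature-256` and `X-GitHub-Event` headers and the `submit_job` call shape follow GitHub's webhook conventions and the boto3 Batch client respectively.

```python
import hashlib
import hmac
import json


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), (signature_header or "").encode())


def build_job_request(payload: dict) -> dict:
    """Translate an 'issues' webhook payload into AWS Batch submit_job arguments."""
    issue = payload["issue"]
    return {
        "jobName": f"needsdiagnosis-{issue['number']}",  # hypothetical naming scheme
        "jobQueue": "webcompat-ml-queue",                # hypothetical queue name
        "jobDefinition": "webcompat-ml-needsdiagnosis",  # hypothetical job definition
        "containerOverrides": {
            # Hypothetical variable telling the task which issue to classify.
            "environment": [{"name": "ISSUE_URL", "value": issue["html_url"]}],
        },
    }


def handler(event: dict, secret: bytes, batch_client) -> dict:
    """Handle an API Gateway proxy-style event carrying a GitHub webhook.

    Assumes the body arrives as plain text (not base64-encoded).
    """
    body = event.get("body") or ""
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}

    # Reject requests that were not signed with the shared webhook secret.
    if not verify_signature(secret, body.encode(), headers.get("x-hub-signature-256", "")):
        return {"statusCode": 401, "body": "invalid signature"}

    payload = json.loads(body)
    # Only newly opened issues spawn a task in this sketch; everything else is ignored.
    if headers.get("x-github-event") != "issues" or payload.get("action") != "opened":
        return {"statusCode": 204, "body": ""}

    response = batch_client.submit_job(**build_job_request(payload))
    return {"statusCode": 202, "body": json.dumps({"jobId": response["jobId"]})}
```

In a real Lambda deployment the entry point would take `(event, context)` and could call `handler(event, secret, boto3.client("batch"))`, with the secret read from configuration. In a self-hosted setup the same function could sit behind any HTTP server, with a different object providing `submit_job`.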

Infrastructure as Code

Dependencies

About

All the infrastructure is managed as code and the codebase lives under mozilla/webcompat-ml-deploy.

To keep things simple, the Terraform state is kept in the git repository, encrypted with git-crypt.

Important

For each change, maintainers should make sure that the updated state is also checked in to the repository. The state contains credentials, so it's important to always make sure that it is encrypted before pushing.
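One way to double-check the encryption requirement before pushing is to inspect the staged blobs directly. The sketch below is an illustrative helper, not part of webcompat-ml-deploy. It relies on the fact that git-crypt prefixes encrypted blobs with the marker `\0GITCRYPT\0`, and it assumes the state files use Terraform's default names and sit in the current working directory. `git-crypt status` reports similar information from the command line.

```python
import subprocess

# Marker that git-crypt writes at the start of every encrypted blob.
GIT_CRYPT_HEADER = b"\x00GITCRYPT\x00"


def is_git_crypt_blob(data: bytes) -> bool:
    """Return True if the blob content starts with the git-crypt marker."""
    return data.startswith(GIT_CRYPT_HEADER)


def staged_blob(path: str) -> bytes:
    """Return the raw content staged in the index for a path in the current directory.

    `git cat-file blob` does not run smudge filters, so an encrypted file is
    returned as stored (encrypted) even when the working tree is unlocked.
    """
    result = subprocess.run(
        ["git", "cat-file", "blob", f":./{path}"],
        check=True,
        capture_output=True,
    )
    return result.stdout


def unencrypted_state_files(paths=("terraform.tfstate", "terraform.tfstate.backup")):
    """List staged state files that would be pushed in plain text."""
    return [p for p in paths if not is_git_crypt_blob(staged_blob(p))]
```

Running `unencrypted_state_files()` after `git add` and before `git commit` should return an empty list; any path it returns is staged without encryption.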

All ML tasks should be described as a Dockerfile under docker/ and should have the ML model prebundled.

Examples

Regular maintenance tasks

Build the needsdiagnosis model dataset

$ webcompat-ml-needsdiagnosis build-dataset --es-url "<URL>" --es-index-name="<INDEX>" --es-doc-type="<TYPE>" --output "</path/to/dataset.csv>"

Train the needsdiagnosis model

$ webcompat-ml-needsdiagnosis train --data "</path/to/dataset.csv>" --output "</path/to/model.bin>"

Releasing a new needsdiagnosis task image

$ cd webcompat-ml-deploy/docker/needsdiagnosis
$ docker build . -t ml-task:needsdiagnosis --build-arg MODEL_PATH="</path/to/model.bin>"
$ docker tag ml-task:needsdiagnosis mozillawebcompat/ml-task:needsdiagnosis
$ docker push mozillawebcompat/ml-task:needsdiagnosis

Applying a terraform change

$ git-crypt unlock
$ terraform plan
$ terraform apply
$ git add .
$ git add terraform.tfstate
$ git add terraform.tfstate.backup
$ git commit -m '<change applied>'