Architecture diagram


Technology stack

The current deployment uses the following technologies:

  • Docker
    • Package models and dependencies in a single container
    • Distribute Docker images through Docker Hub
  • AWS
    • AWS Lambda
      • Handle GitHub webhooks
    • AWS API Gateway
      • Expose the Lambda handler as an HTTP API
    • AWS Batch
      • Schedule webcompat-ml tasks
    • AWS S3
      • Store webcompat-ml task results
  • GitHub API / webhooks
    • Extract data to build datasets
    • Consume webhook events to trigger the automation

Even though most of the services are deployed in the cloud, all the primitives can be self-hosted. The idea is that a webhook from GitHub triggers the automation: a simple HTTP API handles the request and spawns a task.
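The webhook-to-task flow can be sketched in Python as follows. This is a minimal illustration and not the deployed handler: the function names, the `WEBHOOK_SECRET`, `JOB_QUEUE` and `JOB_DEFINITION` environment variables, the job naming scheme, the job parameter, and the choice to react only to newly opened issues are all assumptions. The sketch verifies GitHub's `X-Hub-Signature-256` header and then submits an AWS Batch job through an injected client such as `boto3.client("batch")`.

```python
import hashlib
import hmac
import json
import os


def signature_is_valid(secret: bytes, body: bytes, header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected.encode("utf-8"), (header or "").encode("utf-8"))


def handle_webhook(event: dict, batch_client, secret: bytes,
                   job_queue: str, job_definition: str) -> dict:
    """Handle an API Gateway proxy event that carries a GitHub webhook.

    `batch_client` is any object exposing submit_job(), e.g. boto3.client("batch").
    The sketch assumes the request body is not base64-encoded.
    """
    body = (event.get("body") or "").encode("utf-8")
    # Header names are case-insensitive, so normalise them before lookup.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}

    if not signature_is_valid(secret, body, headers.get("x-hub-signature-256", "")):
        return {"statusCode": 401, "body": "invalid signature"}

    payload = json.loads(body)
    # Assumption: only newly opened issues should trigger a task.
    if headers.get("x-github-event") != "issues" or payload.get("action") != "opened":
        return {"statusCode": 200, "body": "ignored"}

    issue_number = payload["issue"]["number"]
    batch_client.submit_job(
        jobName=f"needsdiagnosis-{issue_number}",        # hypothetical naming scheme
        jobQueue=job_queue,
        jobDefinition=job_definition,
        parameters={"issue_number": str(issue_number)},  # hypothetical parameter
    )
    return {"statusCode": 202, "body": "task scheduled"}


def lambda_handler(event, context):
    """Lambda entry point; configuration is read from hypothetical env variables."""
    import boto3  # imported lazily so the helpers above work without AWS libraries

    return handle_webhook(
        event,
        boto3.client("batch"),
        os.environ["WEBHOOK_SECRET"].encode("utf-8"),
        os.environ["JOB_QUEUE"],
        os.environ["JOB_DEFINITION"],
    )
```

Passing the Batch client in as an argument keeps the request handling independent of AWS, which fits the point that the primitives can be self-hosted: any scheduler with a comparable submit call could be swapped in.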


Infrastructure as Code


All the infrastructure is managed as code, and the codebase lives in the mozilla/webcompat-ml-deploy repository.

To avoid over-complicating things, the Terraform state is kept in the git repository, encrypted with git-crypt.


For each change, maintainers should make sure that the updated state is also checked in to the repository. The state contains credentials, so it's important to always make sure that it is encrypted before pushing.
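One way to guard against pushing plaintext state is to inspect what git has actually staged rather than the working copy, which is plaintext after `git-crypt unlock`. The Python sketch below relies on git-crypt prefixing encrypted blobs with the 10-byte marker `\0GITCRYPT\0`. The helper names are assumptions for illustration and are not part of the repository.

```python
import subprocess

# git-crypt writes this marker at the start of every encrypted blob.
GIT_CRYPT_HEADER = b"\x00GITCRYPT\x00"


def is_git_crypt_blob(data: bytes) -> bool:
    """Return True if the blob content starts with the git-crypt marker."""
    return data.startswith(GIT_CRYPT_HEADER)


def staged_blob(path: str) -> bytes:
    """Read a file as stored in the git index (path relative to the repo root)."""
    # `git cat-file blob` prints the raw object and does not apply filters,
    # so an encrypted file is returned in its encrypted form.
    return subprocess.run(
        ["git", "cat-file", "blob", f":{path}"],
        check=True,
        capture_output=True,
    ).stdout


def unencrypted_files(paths, read_blob=staged_blob):
    """Return the paths whose staged content is not git-crypt encrypted."""
    return [p for p in paths if not is_git_crypt_blob(read_blob(p))]
```

A function like `unencrypted_files` could be called from a pre-push hook with the state file paths, for example `["terraform.tfstate", "terraform.tfstate.backup"]`; a non-empty result means the push should be aborted.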

All ML tasks should be described as a Dockerfile under docker/ and should have the ML model prebundled.
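The layout convention can be checked mechanically. The Python sketch below assumes one subdirectory per task under docker/ (as with docker/needsdiagnosis) and reports task directories that have no Dockerfile; the function name is hypothetical, and the check does not verify that a model is actually bundled in the image.

```python
from pathlib import Path


def tasks_missing_dockerfile(docker_root):
    """Return the names of task directories under docker_root lacking a Dockerfile."""
    root = Path(docker_root)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and not (entry / "Dockerfile").is_file()
    )
```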


Regular maintenance tasks

Build the needsdiagnosis model dataset

$ webcompat-ml-needsdiagnosis build-dataset --es-url "<URL>" --es-index-name="<INDEX>" --es-doc-type="<TYPE>" --output "</path/to/dataset.csv>"

Train the needsdiagnosis model

$ webcompat-ml-needsdiagnosis train --data "</path/to/dataset.csv>" --output "</path/to/model.bin>"

Release a new needsdiagnosis task image

$ cd webcompat-ml-deploy/docker/needsdiagnosis
$ docker build . -t ml-task:needsdiagnosis --build-arg MODEL_PATH="</path/to/model.bin>"
$ docker tag ml-task:needsdiagnosis mozillawebcompat/ml-task:needsdiagnosis
$ docker push mozillawebcompat/ml-task:needsdiagnosis

Apply a Terraform change

$ git-crypt unlock
$ terraform plan
$ terraform apply
$ git add .
$ git add terraform.tfstate
$ git add terraform.tfstate.backup
$ git commit -m '<change applied>'