AWS Cost Analysis

To capture compute and storage cost metrics for the KOMP platform proof-of-concept,
we set up the following infrastructure in AWS to run the cloud-native containers for
the website: an EKS control plane and 2 x t3.xlarge EC2 (reserved instance pricing)
EKS worker nodes to provide a Kubernetes environment; a single t3.medium EC2
(reserved instance pricing) used as a Jenkins server and as a bastion host; and a
t4g.large EC2 (reserved instance pricing) to run a self-hosted MongoDB database
node. In addition, we created several S3 buckets to house data, most notably
s3://komp-data, which hosts ~3.7M image objects (1.6 TB) and is set up as a public
bucket with direct static content serving turned on. We also used CloudWatch to
collect infrastructure and application logs, several ELBs to expose the services, and
a Route53 hosted zone.

Cost Analysis

We analysed components of the IMPC infrastructure to determine feasibility for cloud
hosting, comprising: the relational database underlying the website, APIs, SOLR
cores, the gene, production and phenotype status tracking resource (GenTaR),
downloadable data releases including >750K 2D and 3D images, and the data
release processes currently running on premise at EMBL-EBI. Data releases are
both current (2 per annum) and historical (32 releases). Our analysis determined that
the graphical user interface (GUI) was well suited to a cloud deployment: as the most
frequently accessed infrastructure component, it is also likely to provide useful
information for metrics. GenTaR is primarily used by users within the IMPC
Consortium rather than the wider scientific community. Image data, currently 15 TB,
are stored in a local OMERO instance. A move from OMERO is desirable as the
technology is old, but cloud deployment would not be cost effective given the volume
of images relative to their use.

Deployment of IMPC’s new graphical user interface to an AWS instance. The
GUI is designed based on user needs and user experience testing (see Main
Award RPPR) and optimises performance of common user queries, ensuring
that metrics gained are representative of the current user needs and AWS
deployment choices. Our intention is that the AWS hosted GUI will be the
primary website for IMPC going forward.

  • Four types of costs were captured and analysed. We concluded that the
    overall costs associated with the long-term sustainability of the resource in
    AWS, assuming one environment, are estimated at around $1800/month
    (including the reserved instance compute model costs). We consider this
    viable for future site hosting on commercial cloud.
    • Compute cost: Over the six months from October 2023 to March 2024, EC2
      instance cost was constant at about $35/day, or $1100/month, under the
      on-demand pricing model; with the reserved instance model we saved
      17.7% relative to on-demand pricing. The EC2-EKS cost was constant at
      $2.23/day, leading to a total monthly cost of $69.
    • Storage costs: S3: $45/month. This includes only S3, using standard
      S3 buckets, without intelligent tiering. EBS:VolumeUsage.gp2: ~$50/month.
      EBS:SnapshotUsage: ~$6/month.
    • Data transfer costs: $45 in S3 transfer fees for each data release. We
      modelled the costs associated with data transfer for 10,000
      individual users over short periods (days). During the experiment,
      costs increased by about $11.5/day, leading to a projected total
      monthly cost of ~$345.
    • Other variable costs: At this point in the PoC the additional costs of
      interest were:
      • CloudWatch metric monitoring ~$100/month, VPC ~$35/month,
        NAT Gateways ~$50/month, load balancers ~$50/month.
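
As a rough cross-check, the itemised monthly figures above can be tallied in a few
lines of Python. This is only a sketch under stated assumptions, not the method used
to produce the ~$1800/month estimate: it assumes the listed items are simply additive,
that the 17.7% reserved instance saving applies to the $1100/month EC2 figure, and
that a month has 30 days. The function and key names are illustrative. The $45
per-release transfer fee is left out because it is charged per release, not per month.

```python
# Monthly figures in USD, copied from the bullets above.
# How they combine is an assumption; the report does not state it.
EC2_ON_DEMAND_PER_MONTH = 1100.0
RESERVED_SAVING = 0.177      # 17.7% saving versus on-demand pricing
DAYS_PER_MONTH = 30          # assumption used for daily-to-monthly projection


def monthly_from_daily(daily_cost, days=DAYS_PER_MONTH):
    """Project a constant daily cost to a monthly cost."""
    return daily_cost * days


def itemised_costs(reserved=True):
    """Return the itemised monthly costs, optionally with the reserved saving."""
    ec2 = EC2_ON_DEMAND_PER_MONTH
    if reserved:
        ec2 *= 1 - RESERVED_SAVING
    return {
        "ec2": ec2,
        "eks": 69.0,
        "s3": 45.0,
        "ebs_volumes": 50.0,
        "ebs_snapshots": 6.0,
        "data_transfer_projected": monthly_from_daily(11.5),
        "cloudwatch": 100.0,
        "vpc": 35.0,
        "nat_gateways": 50.0,
        "load_balancers": 50.0,
    }


def total_monthly_cost(reserved=True):
    """Sum all itemised monthly costs."""
    return sum(itemised_costs(reserved).values())


if __name__ == "__main__":
    for name, cost in itemised_costs(reserved=True).items():
        print(f"{name:26s} ${cost:9.2f}")
    print(f"{'total (reserved)':26s} ${total_monthly_cost(True):9.2f}")
    print(f"{'total (on-demand)':26s} ${total_monthly_cost(False):9.2f}")
```

Under these assumptions the tally comes to roughly $1655/month with the reserved
instance saving and $1850/month at on-demand pricing, the same order as the
~$1800/month estimate. An exact match is not expected, since the report does not
state which items the estimate includes.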