One challenge of migrating data to AWS is that you have to use AWS APIs, which an application might not support natively or which the application team knows less well than traditional protocols such as SFTP. AWS addresses this limitation with the AWS Transfer Family service. It is a fully managed SFTP, FTPS, or FTP gateway for S3 buckets or EFS file systems, integrated into the platform, especially with regard to access, permissions, and security.
SFTP stands for SSH File Transfer Protocol (it is often also expanded as Secure File Transfer Protocol). In contrast to FTP, both commands and data are transferred over a cryptographically secured channel. Another advantage, especially compared to FTPS (FTP over TLS), is that SFTP uses a single connection, so the usual problems with FTP active and passive mode do not arise when the communication has to pass through firewalls or NAT routers. In addition to the secure protocol, the service comes with AWS-native access control features such as IAM roles and policies, logical directories for S3, and security groups.
The service lets you create custom user accounts that do not depend on IAM users. Each user generates an SSH key pair and shares only the public key with the SFTP server. Access to S3, however, is governed entirely by an IAM role and policy, which is the standard way of accessing AWS resources.
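As a minimal sketch of how authorization is separated from SSH-key authentication, the Python snippet below builds the two policy documents such a role typically needs: a trust policy that lets the Transfer Family service assume the role, and a permissions policy scoped to one bucket. The bucket name `example-transfer-bucket` and both helper names are hypothetical, and the list of S3 actions is an assumption about what a typical upload/download user needs, not the configuration used in this project.

```python
import json


def build_trust_policy() -> dict:
    """Trust policy that allows AWS Transfer Family to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "transfer.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_bucket_policy(bucket_name: str) -> dict:
    """Permissions policy limited to a single bucket (assumed action set)."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions are granted on the keys inside it.
                "Sid": "ReadWriteObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name, used for illustration only.
    print(json.dumps(build_trust_policy(), indent=2))
    print(json.dumps(build_bucket_policy("example-transfer-bucket"), indent=2))
```

The SFTP user itself never receives AWS credentials; the service assumes this role on the user's behalf after the SSH key check succeeds.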
There are several ways to deploy the SFTP server, including a publicly accessible endpoint and a VPC-hosted endpoint. With the VPC-hosted option, clients in the internal network (including the on-premises data center) reach the SFTP server via endpoint ENIs in the respective VPC subnets. Communication is therefore not routed over the public internet but through an AWS VPC endpoint.
AWS Transfer Family supports VPC-hosted server endpoints in centrally managed as well as shared Amazon VPC environments, which becomes important in the client-specific implementation described next.
Our client is an enterprise in the telecommunications sector that is currently undergoing a large-scale cloud transformation. The client maintains its VPCs centrally on AWS and uses the AWS VPC sharing feature to make those VPCs accessible to project accounts. AWS Service Catalog is also used as part of the client's landing zone.
The idea was therefore to offer a fully integrated data transfer area as a product in the Service Catalog. The product consists of an AWS Transfer Family SFTP server within the shared corporate VPC, a secured S3 bucket, the corresponding IAM roles, and a security group.
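To make the composition of such a product more concrete, here is a rough sketch that assembles a CloudFormation template with these four building blocks as a Python dictionary and prints it as JSON. It is not the client's actual template: all logical IDs and parameter names (`VpcId`, `SubnetIds`, `OnPremCidr`, `SftpServer`, and so on) are hypothetical, and the bucket hardening and IAM actions shown are assumptions about a reasonable baseline.

```python
import json


def build_transfer_template() -> dict:
    """Assemble an illustrative template: bucket, role, security group, server."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative SFTP data transfer area (sketch only)",
        "Parameters": {
            # Hypothetical parameters pointing at the shared corporate VPC.
            "VpcId": {"Type": "AWS::EC2::VPC::Id"},
            "SubnetIds": {"Type": "List<AWS::EC2::Subnet::Id>"},
            "OnPremCidr": {"Type": "String"},
        },
        "Resources": {
            "TransferBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "VersioningConfiguration": {"Status": "Enabled"},
                    "PublicAccessBlockConfiguration": {
                        "BlockPublicAcls": True,
                        "BlockPublicPolicy": True,
                        "IgnorePublicAcls": True,
                        "RestrictPublicBuckets": True,
                    },
                },
            },
            "TransferRole": {
                "Type": "AWS::IAM::Role",
                "Properties": {
                    "AssumeRolePolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [{
                            "Effect": "Allow",
                            "Principal": {"Service": "transfer.amazonaws.com"},
                            "Action": "sts:AssumeRole",
                        }],
                    },
                    "Policies": [{
                        "PolicyName": "BucketAccess",
                        "PolicyDocument": {
                            "Version": "2012-10-17",
                            "Statement": [
                                {"Effect": "Allow",
                                 "Action": ["s3:ListBucket"],
                                 "Resource": {"Fn::GetAtt": ["TransferBucket", "Arn"]}},
                                {"Effect": "Allow",
                                 "Action": ["s3:GetObject", "s3:PutObject"],
                                 "Resource": {"Fn::Sub": "${TransferBucket.Arn}/*"}},
                            ],
                        },
                    }],
                },
            },
            "SftpSecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "Allow SFTP from the on-premises network",
                    "VpcId": {"Ref": "VpcId"},
                    "SecurityGroupIngress": [{
                        "IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                        "CidrIp": {"Ref": "OnPremCidr"},
                    }],
                },
            },
            "SftpServer": {
                "Type": "AWS::Transfer::Server",
                "Properties": {
                    "Protocols": ["SFTP"],
                    "IdentityProviderType": "SERVICE_MANAGED",
                    "EndpointType": "VPC",
                    "EndpointDetails": {
                        "VpcId": {"Ref": "VpcId"},
                        "SubnetIds": {"Ref": "SubnetIds"},
                        "SecurityGroupIds": [
                            {"Fn::GetAtt": ["SftpSecurityGroup", "GroupId"]}
                        ],
                    },
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_transfer_template(), indent=2))
```

Users are left out of the sketch on purpose; in a real product they would be added per consumer, each with its own public key and home directory mapping.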
The SFTP server receives IP addresses from the subnets through the corresponding VPC endpoint ENIs, which are reachable from the on-premises data center via Direct Connect. The user connects to the SFTP server via one of these IP addresses and starts the data migration. The data is stored securely in an S3 bucket, which offers features such as versioning, lifecycle management, replication, and event-based notifications. The data can subsequently be processed automatically and loaded into AWS-native services such as RDS or Glacier.
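The event-based processing step could look roughly like the following sketch of a Lambda-style handler that reads an S3 event notification and extracts the uploaded objects. The function names and the sample event are hypothetical, and the actual downstream processing (loading into RDS, archiving, and so on) is deliberately left out. Object keys arrive URL-encoded in S3 notifications, which is why the key is decoded before use.

```python
from typing import List, Tuple
from urllib.parse import unquote_plus


def extract_uploaded_objects(event: dict) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for every S3 record in the notification."""
    uploaded = []
    for record in event.get("Records", []):
        s3_info = record.get("s3")
        if s3_info is None:
            # Not an S3 record; skip it rather than fail.
            continue
        bucket = s3_info["bucket"]["name"]
        # Keys are URL-encoded in the event, e.g. spaces arrive as '+'.
        key = unquote_plus(s3_info["object"]["key"])
        uploaded.append((bucket, key))
    return uploaded


def handler(event: dict, context: object) -> dict:
    """Hypothetical entry point: log each upload and report the count."""
    uploaded = extract_uploaded_objects(event)
    for bucket, key in uploaded:
        print(f"New upload: s3://{bucket}/{key}")
    return {"processed": len(uploaded)}


if __name__ == "__main__":
    # Hand-written sample event with a hypothetical bucket and key.
    sample_event = {
        "Records": [
            {
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": "example-transfer-bucket"},
                    "object": {"key": "incoming/my+file%281%29.csv"},
                },
            }
        ]
    }
    print(handler(sample_event, None))
```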
The figure below shows the solution architecture. The whole solution is implemented as a CloudFormation template (Infrastructure as Code), which is the most straightforward way to integrate a product into the Service Catalog. Maintenance of the solution is supported by a CI/CD pipeline.
Because data can be exchanged over well-known protocols such as SFTP, many users can start migrations to AWS more easily. This simplifies migration scenarios and gives on-premises systems access to virtually unlimited cloud storage.
To make it even more user-friendly, the SFTP server can also be addressed using a custom DNS name. For this purpose, all IP addresses of the SFTP server can be linked to a DNS A record in a private hosted zone.
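A minimal sketch of that record, assuming the endpoint IP addresses are already known: the helper below builds a change batch for a single A record that lists all addresses. The DNS name, the IP addresses, and the helper name are hypothetical placeholders.

```python
import ipaddress
import json
from typing import Dict, List


def build_a_record_change(dns_name: str, endpoint_ips: List[str], ttl: int = 300) -> Dict:
    """Build a change batch with one A record covering all endpoint IPs."""
    if not endpoint_ips:
        raise ValueError("at least one endpoint IP address is required")
    # Validate and normalize each address; raises ValueError on bad input.
    values = [str(ipaddress.IPv4Address(ip)) for ip in endpoint_ips]
    return {
        "Comment": "A record for the SFTP server endpoint",
        "Changes": [
            {
                # UPSERT creates the record or replaces an existing one.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": dns_name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": value} for value in values],
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical name and addresses, for illustration only.
    batch = build_a_record_change("sftp.example.internal", ["10.0.1.25", "10.0.2.25"])
    print(json.dumps(batch, indent=2))
```

The resulting dictionary follows the shape of a Route 53 change batch, so it could be passed as `ChangeBatch` to the boto3 Route 53 client's `change_resource_record_sets` call together with the ID of the private hosted zone.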
For more information on how to set it up you can follow the AWS documentation.