S3 Uploads

Document uploads use presigned URLs for direct browser-to-S3 transfers, bypassing API Gateway and Lambda payload limits.

Upload Flow

  1. The frontend calls POST /intake/presigned-url with the filename.
  2. The getPresignedUrl Lambda generates a presigned PUT URL for S3 and returns it along with the S3 key and submission ID.
  3. The browser uploads the .docx file directly to S3 using the presigned URL.
  4. After upload completes, the frontend submits the intake form with the S3 key.
  5. During async processing, the documentAnalyzer Lambda downloads the file from S3 using the same S3 key.
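
Seen from the frontend, steps 1 to 3 might look roughly like the following TypeScript sketch. It is illustrative only, not the project's client code: the request field fileName, the response fields uploadUrl, s3Key, and submissionId, and the injected http function are all assumptions made for the example.

```typescript
// Minimal fetch-like signature so the sketch does not depend on DOM typings.
type HttpResponse = { ok: boolean; status: number; json(): Promise<unknown> };
type HttpClient = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: unknown },
) => Promise<HttpResponse>;

// Assumed response shape; the real field names may differ.
interface PresignedUrlResponse {
  uploadUrl: string;
  s3Key: string;
  submissionId: string;
}

const DOCX_MIME =
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

async function uploadDocument(
  http: HttpClient,
  apiBase: string,
  fileName: string,
  fileBody: unknown,
): Promise<PresignedUrlResponse> {
  // Steps 1-2: ask the API for a presigned PUT URL.
  const res = await http(`${apiBase}/intake/presigned-url`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ fileName }), // hypothetical request field
  });
  if (!res.ok) throw new Error(`presigned-url request failed: ${res.status}`);
  const presigned = (await res.json()) as PresignedUrlResponse;

  // Step 3: send the file straight to S3 with the presigned URL.
  const put = await http(presigned.uploadUrl, {
    method: "PUT",
    headers: { "Content-Type": DOCX_MIME },
    body: fileBody,
  });
  if (!put.ok) throw new Error(`S3 upload failed: ${put.status}`);

  // Step 4 (submitting the intake form with s3Key) is left to the caller.
  return presigned;
}
```

In a browser, http would typically be a thin wrapper around fetch and fileBody the File object chosen by the user. Injecting the client keeps the sketch testable without a network.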

Why Presigned URLs?

API Gateway has a 10 MB request payload limit. Lambda synchronous invocations have a 6 MB payload limit. Dissertation documents can be up to 50 MB. Presigned URLs bypass both limits by having the browser upload directly to S3.

Bucket Configuration

Setting                 Value
Bucket name             dissertation-editor-uploads-dev
Region                  us-east-1
Accepted file types     .docx only
Maximum file size       50 MB
Presigned URL expiry    5 minutes

Object Key Structure

Files are stored with the following key pattern:

uploads/{submissionId}/{originalFileName}.docx

Example: uploads/abc123/my-dissertation.docx
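
A key of this shape could be built as in the sketch below. It assumes {originalFileName} means the file name without its extension, as the example suggests; buildObjectKey is a hypothetical helper name, not a function from the codebase.

```typescript
// Builds uploads/{submissionId}/{originalFileName}.docx
function buildObjectKey(submissionId: string, originalFileName: string): string {
  // Drop any directory part a client might include in the name.
  const baseName = originalFileName.split(/[\\/]/).pop() ?? "";
  // Strip a trailing .docx (any case) so the extension is not doubled.
  const stem = baseName.replace(/\.docx$/i, "");
  if (submissionId === "" || stem === "") {
    throw new Error("submissionId and file name must be non-empty");
  }
  return `uploads/${submissionId}/${stem}.docx`;
}
```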

CORS Configuration

The S3 bucket has a CORS policy that allows the frontend to upload files directly from the browser:

[
  {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["PUT"],
    "AllowedOrigins": [
      "http://localhost:3000",
      "https://*.amplifyapp.com",
      "https://dissertation-editor.com"
    ],
    "ExposeHeaders": ["ETag"],
    "MaxAgeSeconds": 3600
  }
]

If you add a new frontend domain, add it to AllowedOrigins in the CORS configuration. CORS errors are the most common issue during development; see Troubleshooting.
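
When debugging a CORS failure, it can help to check whether the page's origin matches any configured entry. The sketch below approximates the single-wildcard matching used in AllowedOrigins. It is a local debugging aid under that assumption, not S3's actual implementation, and the helper names are hypothetical.

```typescript
// Origins copied from the CORS configuration above.
const ALLOWED_ORIGINS = [
  "http://localhost:3000",
  "https://*.amplifyapp.com",
  "https://dissertation-editor.com",
];

// Matches an origin against a pattern containing at most one "*".
function originMatches(pattern: string, origin: string): boolean {
  const star = pattern.indexOf("*");
  if (star === -1) return pattern === origin;
  const prefix = pattern.slice(0, star);
  const suffix = pattern.slice(star + 1);
  return (
    origin.length >= prefix.length + suffix.length &&
    origin.startsWith(prefix) &&
    origin.endsWith(suffix)
  );
}

function isOriginAllowed(
  origin: string,
  allowed: string[] = ALLOWED_ORIGINS,
): boolean {
  return allowed.some((pattern) => originMatches(pattern, origin));
}
```

Note that the scheme and port are part of the origin, so http://localhost:5173 does not match http://localhost:3000.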

Lifecycle Policy

The bucket has a lifecycle policy to manage storage costs:

Rule                     Timing                   Action
Transition to Glacier    90 days after upload     Moves objects to Glacier Flexible Retrieval for cheaper storage
Expiration               365 days after upload    Permanently deletes objects

This means uploaded documents are available for immediate access for 90 days, archived in Glacier for the next 275 days, and automatically deleted after 1 year.
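
The timeline can be expressed as a small calculation. The sketch below estimates which state an object should be in from its upload date; storageStateAt is a hypothetical helper. S3 applies lifecycle rules asynchronously, so the real transition or deletion can lag behind these thresholds.

```typescript
type StorageState = "standard" | "glacier" | "expired";

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Estimates the expected storage state from the lifecycle rules above.
function storageStateAt(uploadedAt: Date, now: Date): StorageState {
  const ageDays = Math.floor(
    (now.getTime() - uploadedAt.getTime()) / MS_PER_DAY,
  );
  if (ageDays >= 365) return "expired"; // deleted after 1 year
  if (ageDays >= 90) return "glacier"; // archived for days 90 to 364
  return "standard"; // immediately accessible for the first 90 days
}
```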

Retrieving Archived Files

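Objects that have transitioned to Glacier (after 90 days) cannot be downloaded directly. Initiate a restore first; the restored copy stays available for the number of days set in Days (7 in this example):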

# Initiate a restore request (takes 3-5 hours for Standard retrieval)
aws s3api restore-object \
  --bucket dissertation-editor-uploads-dev \
  --key uploads/abc123/my-dissertation.docx \
  --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}' \
  --profile dissertation-editor
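
While a restore is pending or complete, S3 reports its progress in the object's Restore metadata (the x-amz-restore header, shown as Restore in aws s3api head-object output). The sketch below parses that value. parseRestoreHeader is a hypothetical helper, and how the value is fetched is left out.

```typescript
interface RestoreStatus {
  inProgress: boolean;
  availableUntil: Date | null; // set once the restored copy is ready
}

// Parses values such as:
//   ongoing-request="true"
//   ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"
function parseRestoreHeader(value: string | undefined): RestoreStatus | null {
  if (!value) return null; // no restore has been requested
  const ongoing = /ongoing-request="(true|false)"/.exec(value);
  if (!ongoing) return null; // unrecognized format
  const expiryText = /expiry-date="([^"]+)"/.exec(value)?.[1];
  return {
    inProgress: ongoing[1] === "true",
    availableUntil: expiryText ? new Date(expiryText) : null,
  };
}
```

A result with inProgress set to false and a non-null availableUntil means the file can be downloaded normally until that date.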

File Validation

The getPresignedUrl function validates that the filename ends in .docx before generating the presigned URL. No other file types are accepted.

The presigned URL also requires the upload to send the Content-Type header application/vnd.openxmlformats-officedocument.wordprocessingml.document, which provides a second layer of file type validation at the S3 level. This checks only the declared type; S3 does not inspect the file contents.

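File size is capped at 50 MB through a Content-Length restriction on the presigned URL. S3's content-length-range condition exists only for presigned POST policies, and the upload flow issues a presigned PUT URL, so confirm how getPresignedUrl applies this cap before relying on it.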
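
The same rules can be checked in the browser before a presigned URL is requested, which gives users faster feedback. The sketch below is a hypothetical client-side pre-check, not the Lambda's code. It assumes 50 MB means 50 × 1024 × 1024 bytes and that the extension check ignores case; neither is confirmed by the service.

```typescript
// Assumption: "50 MB" is interpreted as binary megabytes.
const MAX_UPLOAD_BYTES = 50 * 1024 * 1024;

// Returns a list of problems; an empty list means the file looks acceptable.
function validateUpload(fileName: string, sizeBytes: number): string[] {
  const errors: string[] = [];
  if (!fileName.toLowerCase().endsWith(".docx")) {
    errors.push("Only .docx files are accepted.");
  }
  if (!Number.isFinite(sizeBytes) || sizeBytes <= 0) {
    errors.push("File is empty or its size is unknown.");
  } else if (sizeBytes > MAX_UPLOAD_BYTES) {
    errors.push("File exceeds the 50 MB limit.");
  }
  return errors;
}
```

A client-side check like this is a convenience only; the server-side validation remains the authority.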

Security

  • Presigned URLs are time-limited (5 minutes) and scoped to a specific S3 key.
  • The S3 bucket is not publicly accessible. Only presigned URLs and IAM-authenticated SDK calls can access objects.
  • Objects are encrypted at rest using S3-managed encryption (SSE-S3).
  • The bucket policy blocks non-HTTPS access.
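
For reference, blocking non-HTTPS access is commonly done with a Deny statement conditioned on aws:SecureTransport. The sketch below builds such a statement as a plain object. It shows the general pattern and is not a copy of this bucket's actual policy; the function name and Sid are hypothetical.

```typescript
// Builds a bucket policy statement that denies any request not made over TLS.
function denyInsecureTransportStatement(bucketName: string) {
  return {
    Sid: "DenyInsecureTransport", // illustrative statement ID
    Effect: "Deny",
    Principal: "*",
    Action: "s3:*",
    Resource: [
      `arn:aws:s3:::${bucketName}`, // the bucket itself
      `arn:aws:s3:::${bucketName}/*`, // every object in it
    ],
    Condition: { Bool: { "aws:SecureTransport": "false" } },
  };
}

const examplePolicy = {
  Version: "2012-10-17",
  Statement: [denyInsecureTransportStatement("dissertation-editor-uploads-dev")],
};
```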