S3 Uploads
Document uploads use presigned URLs for direct browser-to-S3 transfers, bypassing API Gateway and Lambda payload limits.
Upload Flow
- The frontend calls POST /intake/presigned-url with the filename.
- The getPresignedUrl Lambda generates a presigned PUT URL for S3 and returns it along with the S3 key and submission ID.
- The browser uploads the .docx file directly to S3 using the presigned URL.
- After the upload completes, the frontend submits the intake form with the S3 key.
- During async processing, the documentAnalyzer Lambda downloads the file from S3 using the same S3 key.
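The second step of the flow above can be sketched roughly as follows. This is not the actual getPresignedUrl implementation, which is not shown here. It assumes a boto3-style S3 client is passed in, that the filename already ends in .docx, and that the response uses the hypothetical field names uploadUrl, s3Key, and submissionId.

```python
import uuid

BUCKET = "dissertation-editor-uploads-dev"
DOCX_CONTENT_TYPE = (
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
)
URL_EXPIRY_SECONDS = 5 * 60  # matches the 5 minute expiry in Bucket Configuration


def create_upload_ticket(s3_client, filename, submission_id=None):
    """Return a presigned PUT URL plus the S3 key and submission ID.

    `s3_client` is any object exposing boto3's generate_presigned_url method.
    The response field names are assumptions made for this sketch.
    """
    # Assumption: submission IDs are random hex strings when not supplied.
    submission_id = submission_id or uuid.uuid4().hex
    key = f"uploads/{submission_id}/{filename}"
    url = s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": BUCKET, "Key": key, "ContentType": DOCX_CONTENT_TYPE},
        ExpiresIn=URL_EXPIRY_SECONDS,
    )
    return {"uploadUrl": url, "s3Key": key, "submissionId": submission_id}
```

In a Lambda written in Python, the client would typically be created once with boto3.client("s3") and passed in, which also makes the function easy to exercise with a stub client in tests.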
Why Presigned URLs?
API Gateway has a 10 MB request payload limit. Lambda synchronous invocations have a 6 MB payload limit. Dissertation documents can be up to 50 MB. Presigned URLs bypass both limits by having the browser upload directly to S3.
Bucket Configuration
| Setting | Value |
|---|---|
| Bucket name | dissertation-editor-uploads-dev |
| Region | us-east-1 |
| Accepted file types | .docx only |
| Maximum file size | 50 MB |
| Presigned URL expiry | 5 minutes |
Object Key Structure
Files are stored with the following key pattern:
uploads/{submissionId}/{originalFileName}.docx
Example: uploads/abc123/my-dissertation.docx
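As an illustration of the key pattern, the helpers below build and parse keys of this shape. The function names are hypothetical, and the sketch assumes {originalFileName} means the uploaded file's name without its .docx extension, which is what the example key suggests.

```python
import re
from pathlib import PurePosixPath

# Mirrors uploads/{submissionId}/{originalFileName}.docx
KEY_PATTERN = re.compile(
    r"^uploads/(?P<submission_id>[^/]+)/(?P<file_name>[^/]+)\.docx$"
)


def build_object_key(submission_id, original_filename):
    """Build the S3 key for an upload, avoiding a doubled .docx suffix."""
    # Keep only the final path component in case a path was supplied.
    name = PurePosixPath(original_filename).name
    if name.lower().endswith(".docx"):
        name = name[: -len(".docx")]
    return f"uploads/{submission_id}/{name}.docx"


def parse_object_key(key):
    """Split a key into (submission_id, file_name_without_extension)."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not an upload key: {key!r}")
    return match.group("submission_id"), match.group("file_name")
```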
CORS Configuration
The S3 bucket has a CORS policy that allows the frontend to upload files directly from the browser:
```json
[
  {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["PUT"],
    "AllowedOrigins": [
      "http://localhost:3000",
      "https://*.amplifyapp.com",
      "https://dissertation-editor.com"
    ],
    "ExposeHeaders": ["ETag"],
    "MaxAgeSeconds": 3600
  }
]
```
If you add a new frontend domain, add it to AllowedOrigins in the CORS configuration. CORS errors are the most common issue during development; see Troubleshooting.
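When debugging a CORS failure, it can help to check whether a given origin is covered by the configured list. The sketch below approximates the matching for the three origins above, treating a single * as a wildcard. The function name is hypothetical, and S3's own evaluation remains authoritative.

```python
ALLOWED_ORIGINS = [
    "http://localhost:3000",
    "https://*.amplifyapp.com",
    "https://dissertation-editor.com",
]


def origin_allowed(origin, allowed=ALLOWED_ORIGINS):
    """Return True if `origin` matches an entry, honoring one '*' wildcard."""
    for pattern in allowed:
        if "*" not in pattern:
            if origin == pattern:
                return True
            continue
        # Split around the wildcard and compare the fixed prefix and suffix.
        prefix, suffix = pattern.split("*", 1)
        long_enough = len(origin) >= len(prefix) + len(suffix)
        if long_enough and origin.startswith(prefix) and origin.endswith(suffix):
            return True
    return False
```

For example, a local dev server on a different port (such as http://localhost:5173) is not covered by the list above and would need its own entry.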
Lifecycle Policy
The bucket has a lifecycle policy to manage storage costs:
| Rule | Timing | Action |
|---|---|---|
| Transition to Glacier | 90 days after upload | Moves objects to Glacier Flexible Retrieval for cheaper storage |
| Expiration | 365 days after upload | Permanently deletes objects |
This means uploaded documents are available for immediate access for 90 days, archived in Glacier for the next 275 days, and automatically deleted after 1 year.
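The timeline can be expressed as a small calculation. This is a sketch with a hypothetical function name. It works from calendar age only, whereas S3 applies lifecycle rules asynchronously, so real transitions can lag the nominal dates slightly.

```python
from datetime import date, timedelta

GLACIER_AFTER_DAYS = 90   # transition rule from the table above
EXPIRE_AFTER_DAYS = 365   # expiration rule from the table above


def storage_phase(uploaded_on, today):
    """Classify an object as 'standard', 'glacier', or 'expired' by age."""
    age_days = (today - uploaded_on).days
    if age_days < GLACIER_AFTER_DAYS:
        return "standard"
    if age_days < EXPIRE_AFTER_DAYS:
        return "glacier"
    return "expired"


# The archived window is the gap between the two rules: 365 - 90 = 275 days.
ARCHIVED_WINDOW_DAYS = EXPIRE_AFTER_DAYS - GLACIER_AFTER_DAYS
```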
Retrieving Archived Files
If you need to access a file that has been transitioned to Glacier (after 90 days):
```bash
# Initiate a restore request (takes 3-5 hours for Standard retrieval).
# "Days": 7 keeps the restored copy available for 7 days.
aws s3api restore-object \
  --bucket dissertation-editor-uploads-dev \
  --key uploads/abc123/my-dissertation.docx \
  --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}' \
  --profile dissertation-editor
```
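Because a restore is asynchronous, it is useful to check its progress before trying to download. S3 reports this in the object's restore header (the Restore field of a HeadObject response), using values such as ongoing-request="true". The sketch below only parses that string. The function name and the returned labels are hypothetical.

```python
import re
from typing import Optional

_ONGOING = re.compile(r'ongoing-request="(true|false)"')


def restore_state(restore_header: Optional[str]) -> str:
    """Map the restore header value to a simple status label."""
    if not restore_header:
        # No header: no restore has been requested for this object.
        return "not-requested"
    match = _ONGOING.search(restore_header)
    if match is None:
        return "unknown"
    return "in-progress" if match.group(1) == "true" else "restored"
```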
File Validation
The getPresignedUrl function validates that the filename ends in .docx before generating the presigned URL. No other file types are accepted.
The presigned URL also includes a Content-Type condition requiring application/vnd.openxmlformats-officedocument.wordprocessingml.document, which provides a second layer of file type validation at the S3 level.
File size is enforced via a Content-Length condition on the presigned URL, capping uploads at 50 MB.
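The checks described above might look like the following before a URL is signed. This is a sketch, not the actual getPresignedUrl code. It assumes the client declares the file size in its request, that the extension check is case-insensitive, and that 50 MB means 50 * 1024 * 1024 bytes. How the size cap is attached to the URL itself is not shown in this document, so that part is left out.

```python
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumption: binary megabytes
DOCX_CONTENT_TYPE = (
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
)


def validate_upload_request(filename, content_type, size_bytes):
    """Return a list of validation errors; an empty list means acceptable."""
    errors = []
    if not filename.lower().endswith(".docx"):
        errors.append("filename must end in .docx")
    if content_type != DOCX_CONTENT_TYPE:
        errors.append("unexpected Content-Type")
    if size_bytes <= 0:
        errors.append("file is empty")
    elif size_bytes > MAX_UPLOAD_BYTES:
        errors.append("file exceeds 50 MB")
    return errors
```

Note that both the extension and the Content-Type are values declared by the client, so these checks constrain what is requested rather than inspecting the file's contents.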
Security
- Presigned URLs are time-limited (5 minutes) and scoped to a specific S3 key.
- The S3 bucket is not publicly accessible. Only presigned URLs and IAM-authenticated SDK calls can access objects.
- Objects are encrypted at rest using S3-managed encryption (SSE-S3).
- The bucket policy blocks non-HTTPS access.
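The bucket's actual policy is not reproduced here. As a reference point, the commonly used pattern for blocking non-HTTPS access is a Deny statement conditioned on aws:SecureTransport, which the sketch below builds as a Python dictionary. The function name and the Sid are hypothetical.

```python
import json


def https_only_policy(bucket):
    """Build a bucket policy that denies any request not made over HTTPS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",  # illustrative label
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # Matches requests that did not use TLS.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy_json = json.dumps(
    https_only_policy("dissertation-editor-uploads-dev"), indent=2
)
```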