🚀 Feature: Store Data in S3 #238

Open
opened 2025-10-07 00:06:57 +03:00 by OVERLORD · 9 comments

Originally created by @wrenix on GitHub.

Feature description

I would like to run this container stateless, with an external database (PostgreSQL) and file storage somewhere else (S3).

Pitch

Store all data except the SQLite database (uploads, avatars, and so on) in S3 storage, and store the keys either in the database or in S3 as well.

OVERLORD added the open to pull requests label 2025-10-07 00:06:57 +03:00

@kmendell commented on GitHub:

> > generate a master key and encrypt the keys before uploading them to the DB or S3
>
> You would still have the problem of safely storing the "master key"

The biggest concern here is that if we implement this, we would have to support Key Vault, HashiCorp Vault, and more, because anyone who doesn't use one of those will want theirs supported too, and that adds a lot of complexity. I'm assuming they all use their own APIs for storing secrets.

Either way, when I talked to Elias, we weren't sure about this one yet, as it seems overkill for the project currently: the only benefit (for S3) is running the container stateless.

We are leaving it open to see whether people want this, but it will have to be very popular for us to consider implementing it.


@wrenix commented on GitHub:

I understand your problem. We need a solution for the keys, that is correct (generate a master key and encrypt the keys before uploading them to the DB or S3, or use an external key management system like HashiCorp Vault or Kubernetes Secrets ...). My wish is to be scalable without the hassle of a disk that has to be writable from multiple instances/pods.

Maybe we should focus on the first step of uploads.
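
The master-key idea mentioned above could look roughly like the following. This is only a minimal sketch using AES-GCM from the Go standard library, not how Pocket ID handles keys. The function names `wrapKey` and `unwrapKey` are made up for illustration, and the sketch simply assumes the master key is supplied from outside the container, which is the part that still needs a real answer.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD from a 16-, 24-, or 32-byte master key.
func newGCM(masterKey []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(masterKey)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// wrapKey (hypothetical helper) encrypts keyMaterial under masterKey.
// The random nonce is prepended to the ciphertext so unwrapKey can find it.
func wrapKey(masterKey, keyMaterial []byte) ([]byte, error) {
	gcm, err := newGCM(masterKey)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, keyMaterial, nil), nil
}

// unwrapKey (hypothetical helper) reverses wrapKey. It fails if the data
// was tampered with or if the wrong master key is used.
func unwrapKey(masterKey, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(masterKey)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("sealed key is too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	// Assumption: a real deployment would load the master key from outside
	// the container instead of generating a fresh one on every start.
	masterKey := make([]byte, 32)
	if _, err := rand.Read(masterKey); err != nil {
		panic(err)
	}
	sealed, err := wrapKey(masterKey, []byte("example private key bytes"))
	if err != nil {
		panic(err)
	}
	plain, err := unwrapKey(masterKey, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("sealed %d bytes, recovered %q\n", len(sealed), plain)
}
```

Only the sealed bytes would go to the DB or S3; anyone who can read both the sealed bytes and the master key can recover the private key.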


@ItalyPaleAle commented on GitHub:

I would not recommend storing private keys in S3. Even a database is a bit iffy. If anything, I'd suggest adding support for things like AWS KMS (and Azure Key Vault, HashiCorp Vault, etc.).


@ItalyPaleAle commented on GitHub:

> generate a master key and encrypt the keys before uploading them to the DB or S3

You would still have the problem of safely storing the "master key"

> Maybe we should focus on the first step of uploads.

👍
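
For the uploads-first step, one possible shape is a small storage interface that the rest of the application codes against. The sketch below is an assumption for illustration, not Pocket ID's actual code: `FileStorage`, `LocalStorage`, `MemoryStorage`, and `saveAvatar` are all hypothetical names. An S3-backed type would be a third implementation of the same interface and is not shown here.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// FileStorage is a hypothetical abstraction over where uploads live.
type FileStorage interface {
	Save(name string, data []byte) error
	Load(name string) ([]byte, error)
	Delete(name string) error
}

// LocalStorage keeps files under Root on the local disk.
type LocalStorage struct{ Root string }

// path resolves name below Root; cleaning a rooted copy of the name
// drops any leading ".." elements so the result cannot escape Root.
func (s LocalStorage) path(name string) string {
	return filepath.Join(s.Root, filepath.Clean("/"+name))
}

func (s LocalStorage) Save(name string, data []byte) error {
	p := s.path(name)
	if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
		return err
	}
	return os.WriteFile(p, data, 0o644)
}

func (s LocalStorage) Load(name string) ([]byte, error) { return os.ReadFile(s.path(name)) }

func (s LocalStorage) Delete(name string) error { return os.Remove(s.path(name)) }

// MemoryStorage keeps files in a map; it stands in for any other backend.
type MemoryStorage struct{ files map[string][]byte }

func NewMemoryStorage() *MemoryStorage {
	return &MemoryStorage{files: map[string][]byte{}}
}

func (m *MemoryStorage) Save(name string, data []byte) error {
	m.files[name] = append([]byte(nil), data...) // store a copy
	return nil
}

func (m *MemoryStorage) Load(name string) ([]byte, error) {
	data, ok := m.files[name]
	if !ok {
		return nil, errors.New("file not found: " + name)
	}
	return data, nil
}

func (m *MemoryStorage) Delete(name string) error {
	delete(m.files, name)
	return nil
}

// saveAvatar depends only on the interface, not on a concrete backend.
func saveAvatar(store FileStorage, userID string, image []byte) error {
	return store.Save("avatars/"+userID+".png", image)
}

func main() {
	dir, err := os.MkdirTemp("", "storage-example")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	for _, store := range []FileStorage{LocalStorage{Root: dir}, NewMemoryStorage()} {
		if err := saveAvatar(store, "user-1", []byte("fake image bytes")); err != nil {
			panic(err)
		}
		data, err := store.Load("avatars/user-1.png")
		if err != nil {
			panic(err)
		}
		fmt.Printf("%T loaded %d bytes\n", store, len(data))
	}
}
```

With this split, the key-storage question stays separate: uploads can move to object storage without deciding anything about private keys.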


@nicolerenee commented on GitHub:

> The biggest concern here is that if we implement this, we would have to support Key Vault, HashiCorp Vault, and more, because anyone who doesn't use one of those will want theirs supported too, and that adds a lot of complexity. I'm assuming they all use their own APIs for storing secrets.

If you do decide to pursue this, I'd recommend taking a look at https://gocloud.dev/howto/secrets/. It provides a clean abstraction for managing secrets across multiple different secret stores.


@ItalyPaleAle commented on GitHub:

Running on serverless is really interesting. But I would be careful with using the GCS FUSE adapter for SQLite: it seems that there [can be a bunch of problems](https://www.reddit.com/r/googlecloud/comments/1afm74h/sqlite_running_in_cloud_run_with_gcs_volume_mount/).

I think the FUSE adapter would probably be OK (maybe with sub-par performance) for the rest of the data Pocket ID stores on the FS (images etc.), but I really wouldn't store SQLite there. You can use Postgres for the main DB.

@kmendell commented on GitHub:

> Running on serverless is really interesting. But I would be careful with using the GCS FUSE adapter for SQLite: it seems that there [can be a bunch of problems](https://www.reddit.com/r/googlecloud/comments/1afm74h/sqlite_running_in_cloud_run_with_gcs_volume_mount/).
>
> I think the FUSE adapter would probably be OK (maybe with sub-par performance) for the rest of the data Pocket ID stores on the FS (images etc.), but I really wouldn't store SQLite there. You can use Postgres for the main DB.

Test in Production :D -- Just kidding :P

@crcastle commented on GitHub:

Regarding using Pocket ID in a serverless environment, I've deployed it to Google Cloud Run. To persist data, it's using a "volume" backed by Google Cloud Storage (like AWS S3) via [Google's FUSE filesystem adapter](https://github.com/GoogleCloudPlatform/gcsfuse). So the uploads folder, keys folder, and SQLite DB file are all stored on that.

It's been running for about a week with no issues. Cold starts take no more than 3 sec (95th percentile). I haven't had any file corruption problems yet (fingers crossed). I have limited Cloud Run to only ever scale up to one instance so that there are not multiple Pocket ID processes trying to access the same SQLite DB.

I'm unsure how file locking works (or if it even works at all) with this FUSE adapter. I also don't know how or if it guarantees the order of reads and writes. I'm testing this on some non-critical systems and backing up the Google Cloud Storage bucket frequently in case the DB file gets corrupted.

Blog post tutorial is forthcoming, but I'm waiting a week or two to make sure I'm really not getting any DB file corruption.

@crcastle commented on GitHub:

> But I would be careful with using the GCS FUSE adapter for SQLite: it seems that there [can be a bunch of problems](https://www.reddit.com/r/googlecloud/comments/1afm74h/sqlite_running_in_cloud_run_with_gcs_volume_mount/).

Saw that Reddit thread too, and I generally share the same trepidation. I haven't experienced any problems yet though, so I'm going to keep testing. I also wonder if a lot of the posts in that Reddit thread were written before [v2 of the GCS FUSE adapter](https://github.com/GoogleCloudPlatform/gcsfuse?tab=readme-ov-file#new-cloud-storage-fuse-v2x-features) was released. It seems that v2 has improved it quite a bit -- not just with new features, but also with stability. Also, the current size of my Pocket ID's SQLite file is 300 KB, so that can move pretty quickly back and forth between the FUSE filesystem and the bucket.

> You can use Postgres for the main DB.

I considered that, but I'm cheap `¯\_(ツ)_/¯` (at least for my current use of Pocket ID). Didn't want to pay for another server or instance or a managed Postgres DB.

I'm definitely not advocating others try doing what I'm doing, but it's been an interesting test.
Reference: starred/pocket-id#238