🐛 Bug Report: PocketID gets indexed by web crawlers #119

Closed
opened 2025-10-09 16:27:26 +03:00 by OVERLORD · 9 comments

Originally created by @Etienne-bdt on GitHub.

Reproduction steps

Not really related to the app itself, but: run a Pocket ID instance for a while and have some bad luck.
Google your domain.
Tada!

Expected behavior

It should not be indexed; a robots.txt should prevent that (IIRC).

Actual Behavior

It gets crawled and indexed by Google.

Version and Environment

Caddy reverse proxy with the latest Pocket ID release.

Log Output

No response


@ElioDiNino commented on GitHub:

Ah yeah, the noindex meta tag should be enough to prevent indexing (although it doesn't stop Google and others from crawling). Not sure how @Etienne-bdt had his instance shown on Google 🧐


@bluewalk commented on GitHub:

Don't really think this is a bug though.

I've added a route block to Caddyfile for robots.txt, just to be sure it wouldn't get indexed.

route /robots.txt {
   header Content-Type "text/plain; charset=utf-8"
   respond <<EOT
User-agent: *
Disallow: /
EOT
}

@Etienne-bdt commented on GitHub:

Yeah, it might not really qualify as a bug; I didn't know what else to file it under.
The thing is, I never served a robots.txt for my other services and they never got indexed. Thanks, I'll give your solution a try!


@ElioDiNino commented on GitHub:

It would be nice if there was an environment variable for whether to serve robots.txt or, alternatively, serve a default with the option to override. Most other self-hosted services block indexing by default.


@stonith404 commented on GitHub:

Pocket ID shouldn't be indexed by crawlers. Are you sure your instance gets indexed?

https://github.com/pocket-id/pocket-id/blob/main/frontend/src/app.html#L7


@Etienne-bdt commented on GitHub:

I added my domain to Google Search Console and got "some" insight. According to the console, indexing and crawling are both allowed.
If it's alright, I'll open a PR to add a robots.txt with both Disallow and noindex directives.


@kmendell commented on GitHub:

Added in https://github.com/pocket-id/pocket-id/pull/806


@Etienne-bdt commented on GitHub:

Right, I just saw the noindex tag.
I'm positive it got indexed, however.

Image

Google doesn't provide much information beyond the fact that they indexed the website...


@ItalyPaleAle commented on GitHub:

I guess we could still create a static robots.txt to include in our app... Wouldn't hurt, right?

Reference: starred/pocket-id-pocket-id-2#119