mirror of
https://github.com/pocket-id/pocket-id.git
synced 2025-12-09 22:52:58 +03:00
🐛 Bug Report: PocketID gets indexed by web crawlers #119
Originally created by @Etienne-bdt on GitHub.
Reproduction steps
Not really related to the app itself, but: run a Pocket ID instance for some time and have some bad luck.
Google your domain
Tada!
Expected behavior
It should not get indexed, e.g. by serving a robots.txt (iirc)
Actual Behavior
It gets crawled and indexed on Google
Version and Environment
Caddy reverse proxy with the latest Pocket ID
Log Output
No response
@ElioDiNino commented on GitHub:
Ah yeah, the noindex meta tag should be enough to prevent indexing (although it doesn't stop Google and others from crawling). Not sure how @Etienne-bdt had his instance shown on Google 🧐
@bluewalk commented on GitHub:
Don't really think this is a bug though.
I've added a route block to Caddyfile for robots.txt, just to be sure it wouldn't get indexed.
@Etienne-bdt commented on GitHub:
Yeah, it might not really qualify as a bug, I didn't know what to put in there.
Thing is, I never served a robots.txt and never got indexed on my other services, but thanks, I'll give your solution a try!
@ElioDiNino commented on GitHub:
It would be nice if there was an environment variable for whether to serve robots.txt or, alternatively, serve a default with the option to override. Most other self-hosted services block indexing by default.
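A minimal sketch of what such a toggle could look like in a Go HTTP handler. This is an illustration only, not Pocket ID's actual code: the environment variable names SERVE_ROBOTS_TXT and ROBOTS_TXT_OVERRIDE and the helper robotsBody are hypothetical and do not appear anywhere in the project or in this thread.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
)

// defaultRobots asks every crawler that honors robots.txt to stay out of the whole site.
const defaultRobots = "User-agent: *\nDisallow: /\n"

// robotsBody returns the robots.txt content to serve and whether to serve it at all.
// A non-empty override replaces the blocking default.
func robotsBody(enabled bool, override string) (string, bool) {
	if !enabled {
		return "", false
	}
	if override != "" {
		return override, true
	}
	return defaultRobots, true
}

// robotsHandler reads two hypothetical environment variables (assumed names,
// not real Pocket ID settings) and answers requests for /robots.txt.
func robotsHandler(w http.ResponseWriter, r *http.Request) {
	enabled := os.Getenv("SERVE_ROBOTS_TXT") != "false" // served unless explicitly disabled
	body, ok := robotsBody(enabled, os.Getenv("ROBOTS_TXT_OVERRIDE"))
	if !ok {
		http.NotFound(w, r)
		return
	}
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	fmt.Fprint(w, body)
}

func main() {
	// Exercise the handler in memory instead of binding a port.
	rec := httptest.NewRecorder()
	robotsHandler(rec, httptest.NewRequest(http.MethodGet, "/robots.txt", nil))
	fmt.Printf("status %d\n%s", rec.Code, rec.Body.String())
}
```

Keeping the decision in a small pure function makes the "serve a default, allow an override, allow opting out" behavior easy to check without a running server.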
@stonith404 commented on GitHub:
Pocket ID shouldn't be indexed by crawlers. Are you sure your instance gets indexed?
https://github.com/pocket-id/pocket-id/blob/main/frontend/src/app.html#L7
@Etienne-bdt commented on GitHub:
I added my domain to the Google Search Console and got "some" insight. According to the console, indexing and crawling are both allowed.
If it's alright, I'll open a PR to add a robots.txt with both disallow and noindex directives.
@kmendell commented on GitHub:
Added in https://github.com/pocket-id/pocket-id/pull/806
@Etienne-bdt commented on GitHub:
Right, I just saw the noindex tag.
I'm positive it got indexed, however.
Google doesn't provide a lot of info aside from the fact that they indexed the website...
@ItalyPaleAle commented on GitHub:
I guess we could still create a static robots.txt to include in our app... Wouldn't hurt, right?