247 Commits

Author SHA1 Message Date
Paul Pfeister 4e2a4f6b66 Merge pull request #2919 from quan-nguyen-2110/fix-cracked-forum-false-positive
Fix Cracked Forum false positives
2026-05-04 23:28:52 -04:00
Paul Pfeister 2b985b57ad Merge pull request #2921 from quan-nguyen-2110/fix-akniga-false-negative
Fix akniga false negatives
2026-05-04 23:28:14 -04:00
Paul Pfeister ed0865363f Merge pull request #2929 from mohamedsolaiman/fix/false-positives
fix: resolve false positives for ArtStation, GeeksforGeeks, and LushStories
2026-05-04 23:23:43 -04:00
Paul Pfeister 43a354b235 Merge pull request #2853 from salmanrajz/fix/unicode-decode-error-special-chars
fix: handle UnicodeDecodeError on usernames with special characters
2026-05-04 23:12:52 -04:00
Paul Pfeister aa5c3b0010 Merge pull request #2930 from mohamedsolaiman/feature/new-sites
feat: add Carrd, SpaceHey, and Substack as supported sites
2026-05-04 23:07:07 -04:00
Siddharth Dushantha 2df7c61be8 Merge pull request #2939 from sherlock-project/fix-vuln
Fix command injection vuln
2026-05-02 09:46:59 +02:00
Siddharth Dushantha 61aae782ee version bump 2026-05-02 09:42:36 +02:00
Siddharth Dushantha 6eaec5cccd Fix command injection vuln 2026-05-02 09:27:28 +02:00
Mohamed Solaiman dca64e35d3 feat: add Carrd, SpaceHey, and Substack as supported sites
- Carrd: Simple website builder with profiles at {username}.carrd.co.
  Uses status_code detection (404 for non-existing profiles).

- SpaceHey: Retro social network inspired by MySpace.
  Uses message detection ("Not Found (Error 404) | SpaceHey" title
  for non-existing profiles).

- Substack: Newsletter/publishing platform with profiles at
  {username}.substack.com. Uses status_code detection (404 for
  non-existing publications).
2026-04-28 17:03:23 +00:00
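For illustration, a minimal sketch of how the two subdomain-style entries and status_code detection described in this commit could be modelled. The key names, the "{}" placeholder, the https scheme, and both helper functions are assumptions made for this sketch; the actual data.json entries are not shown in this log.

```python
# Hypothetical site entries modelled on the commit description above.
# Key names, the "{}" placeholder and the https scheme are assumptions.
SITES = {
    "Carrd": {"url": "https://{}.carrd.co", "errorType": "status_code"},
    "Substack": {"url": "https://{}.substack.com", "errorType": "status_code"},
}


def profile_url(site_name: str, username: str) -> str:
    """Fill the username into the site's URL template."""
    return SITES[site_name]["url"].format(username)


def is_claimed_by_status(status_code: int) -> bool:
    """status_code detection: a 2xx answer counts as claimed, a 404 does not."""
    return 200 <= status_code < 300
```

With these definitions, profile_url("Carrd", "alice") yields the subdomain form of the profile address, and a 404 response is classified as an unclaimed username.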
Mohamed Solaiman 2e2248a8a6 fix: resolve false positives for ArtStation, GeeksforGeeks, and LushStories
- ArtStation: Add urlProbe using the JSON API endpoint
  (https://www.artstation.com/users/{}.json) which returns proper
  404 for non-existing users, instead of the main page which
  returns 200 for both existing and non-existing profiles.
  Closes #2714

- GeeksforGeeks: Switch from status_code to message detection.
  Both existing and non-existing profiles return HTTP 200, but
  non-existing profiles have "false" in the page title.
  Closes #2782

- LushStories: Switch from status_code to response_url detection.
  Non-existing profiles redirect (302) to /login while existing
  profiles return 200. Closes #2371
2026-04-28 17:01:37 +00:00
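A rough sketch of two of the mechanisms named in this commit: probing a separate urlProbe endpoint, and response_url detection based on a redirect to /login. The urlProbe value is taken from the commit message; the public "url" value and both function names are hypothetical, and the redirect check is a simplification rather than Sherlock's actual logic.

```python
from typing import Optional

# The urlProbe value comes from the commit message; the "url" value is assumed.
ARTSTATION = {
    "url": "https://www.artstation.com/{}",
    "urlProbe": "https://www.artstation.com/users/{}.json",
}


def probe_url(entry: dict, username: str) -> str:
    """Prefer the entry's urlProbe (if present) over its public profile URL."""
    template = entry.get("urlProbe") or entry["url"]
    return template.format(username)


def claimed_by_response_url(status_code: int, location: Optional[str]) -> bool:
    """Treat a 3xx redirect whose target ends in /login as 'no such profile'."""
    is_redirect = 300 <= status_code < 400
    if is_redirect and location and location.rstrip("/").endswith("/login"):
        return False
    return 200 <= status_code < 300
```

The design point is that the probe URL is only used for the request, so the URL reported to the user can remain the human-readable profile page.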
QuanNguyen a9960ff9a4 Fix akniga false negatives
Made-with: Cursor
2026-04-26 16:00:27 +02:00
QuanNguyen d731f715bf Fix Cracked Forum false positives
Made-with: Cursor
2026-04-26 15:44:27 +02:00
Siddharth Dushantha 271608fb22 Merge pull request #2898 from sherlock-project/improvements
Make Minor Improvements
2026-04-12 17:54:11 +02:00
Siddharth Dushantha eb79980c33 Remove unused line of code 2026-04-12 17:48:42 +02:00
Siddharth Dushantha e2a225697f Fix missing punctuation 2026-04-12 17:38:01 +02:00
Siddharth Dushantha 173ae5b824 Update usage 2026-04-12 17:35:14 +02:00
Siddharth Dushantha dcb935337c Remove --no-txt
It was removed a long time ago but the argument still exists.
2026-04-12 17:32:35 +02:00
Siddharth Dushantha ed883ad7c8 fix copy paste error 2026-04-12 16:55:54 +02:00
Siddharth Dushantha a68ea46fb4 Removed unnecessary returns 2026-04-12 16:54:42 +02:00
Siddharth Dushantha ed73b175d7 Use data.sherlockproject.xyz
I've created data.sherlockproject.xyz so that it will be easier for
people to use Sherlock's data in other projects if needed.
2026-04-12 16:49:37 +02:00
Siddharth Dushantha a192cb4bfe Merge pull request #2897 from sherlock-project/clean-up
Minor clean up
2026-04-12 13:54:08 +02:00
Siddharth Dushantha b5e891550c Cleaned up footer text 2026-04-12 13:52:18 +02:00
Siddharth Dushantha 190c2af514 Mention uv as pip alternative 2026-04-12 13:50:55 +02:00
Siddharth Dushantha 8175af39ae Removed Apify Actor Usage 2026-04-12 13:47:20 +02:00
salmanrajz 32fde9bfc6 fix: update NSFW tests to use sites not in exclusions list
Pornhub was added to the remote false_positive_exclusions.txt, causing
test_remove_nsfw and test_nsfw_explicit_selection to fail since the
site gets filtered out before the test runs. Replaced with Xvideos and
Erome which are NSFW-flagged but not excluded.
2026-03-31 20:11:55 +04:00
salmanrajz 4656d95702 fix: handle UnicodeDecodeError on usernames with special characters
Fixes #2730. Usernames containing non-ASCII characters (e.g. 'Émile')
can trigger a UnicodeDecodeError inside the requests library during
redirect handling. This exception is not a subclass of
requests.exceptions.RequestException, so it escaped all existing
except blocks in get_response() and crashed the program.

Added a catch for UnicodeError (parent of both UnicodeDecodeError and
UnicodeEncodeError) so these sites are gracefully skipped instead of
crashing the entire scan.

Added regression tests in tests/test_unicode.py.
2026-03-31 19:57:54 +04:00
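A minimal sketch of the pattern this commit describes: catching UnicodeError, the parent of both UnicodeDecodeError and UnicodeEncodeError, so that one failing site is skipped instead of aborting the scan. The names safe_fetch and broken_fetch are hypothetical; the real change lives in get_response(), whose code is not shown in this log.

```python
def safe_fetch(fetch, url):
    """Call fetch(url); return (response, error_text) instead of raising UnicodeError."""
    try:
        return fetch(url), None
    except UnicodeError as exc:
        # UnicodeError is the parent class of UnicodeDecodeError and
        # UnicodeEncodeError, so a single handler covers both.
        return None, f"Encoding error: {exc}"


def broken_fetch(url):
    """Simulate the failure mode by decoding a truncated UTF-8 sequence."""
    return b"\xc3".decode("utf-8")


response, error = safe_fetch(broken_fetch, "https://example.com/Émile")
```

Here response is None and error carries the decoder's message, so a caller can record the site as skipped and continue with the next one.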
Paul Pfeister 574aeb4ac5 Merge pull request #2824 from vatsalgargg/fix-linkedin-waf
fix: bypass LinkedIn WAF blocking requests (HTTP 999)
2026-03-17 02:26:18 -04:00
vatsalgargg 382bc3210a fix: bypass LinkedIn WAF with standard browser headers 2026-03-12 23:14:40 +05:30
Paul Pfeister 17c443af19 Merge pull request #2812 from cclauss/patch-5
Test on Python 3.14 and free-threaded Python 3.14t
2026-03-08 17:34:47 -07:00
Paul Pfeister 9d6c47fdb4 Merge pull request #2811 from danielalanbates/fix/chess-com-case-sensitivity
fix: allow uppercase letters in Chess.com username regex
2026-02-28 22:18:26 -05:00
Christian Clauss 10bed20e70 Test on Python 3.14 and free-threaded Python 3.14t 2026-02-20 12:14:34 +01:00
Your Name fd3833b744 fix: allow uppercase letters in Chess.com username regex
Fixes #2799

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 11:24:27 -08:00
Paul Pfeister 8f8ebf3c15 Merge pull request #2794 from ellieplayswow/feature/add-wow-sites
data: add Wowhead, Wago addons, CurseForge
2026-02-04 17:39:01 -08:00
ellieplayswow 4253014085 data: add Wowhead, Wago addons, CurseForge 2026-02-03 21:06:11 +00:00
Paul Pfeister 725c68907a Merge pull request #2791 from amydosomething/add-sites
Add Shelf support
2026-01-24 00:49:10 -08:00
amydosomething c66d10bfed Remove Fiverr and Substack from PR 2026-01-20 19:53:27 +05:30
amydosomething e0002779b4 Add sites Fiverr, Substack and Shelf to data.json 2026-01-20 18:43:49 +05:30
Paul Pfeister 8f1308b90d Merge pull request #2758 from Aaditya-Chunekar/patch-2
Add Credly data to JSON resource
2025-12-29 19:54:44 -08:00
Paul Pfeister e856b05c2c Merge pull request #2636 from simplyNour/Bug/fix-gradle-false-pos-test-failure
Bug: Fix local variable scoping issue affecting false-pos test output
2025-12-29 18:56:30 -08:00
Aaditya fe9e750dab Add Credly data to JSON resource 2025-11-14 09:27:07 +05:30
Paul Pfeister 842ae1f754 Merge pull request #2733 from Aaditya-Chunekar/patch-1
Add Nothing Community data to data.json
2025-10-29 16:34:10 -07:00
Paul Pfeister 339634f7bc Merge pull request #2737 from Nolanp123/fix-minecraft-regex
Fix Minecraft False Positives for Long Usernames
2025-10-28 20:47:32 -07:00
Nolan Parker c1632693bb Add regexCheck to Minecraft to prevent false positives for long usernames 2025-10-28 20:39:53 -05:00
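A small sketch of how a regexCheck can keep impossible usernames from being queried at all. The pattern below is an assumed example (3 to 16 letters, digits, or underscores), not the value committed to data.json, and should_query is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed pattern for illustration only; not copied from data.json.
MINECRAFT_REGEX = r"^[A-Za-z0-9_]{3,16}$"


def should_query(username: str, regex_check: Optional[str]) -> bool:
    """Skip the site when the username cannot match its regexCheck pattern."""
    if regex_check is None:
        return True
    return re.search(regex_check, username) is not None
```

Usernames longer than the site permits are rejected locally, so an ambiguous response from the site can never be read as a match for them.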
Aaditya e19cb32009 Add Nothing Community data to data.json 2025-10-27 11:20:30 +05:30
Paul Pfeister b69c8ef940 Merge pull request #2710 from Aaditya-Chunekar/add-sites
hacktoberfest: Added sites support
2025-10-26 00:16:29 -07:00
Aaditya-Chunekar 2724711060 feat: add tmdb 2025-10-26 09:49:31 +05:30
Paul Pfeister 0a68ab7f4c Merge pull request #2709 from Aaditya-Chunekar/add-topmate
hacktoberfest: Add topmate.io support
2025-10-24 20:15:02 -07:00
Paul Pfeister 8675178be1 Merge pull request #2705 from Aaditya-Chunekar/add-site-seoforum
hacktoberfest: Add SEO Forum Support
2025-10-24 20:12:50 -07:00
Aaditya-Chunekar 9bafb8a280 feat: add n8n, HackerSploit, Arduino Forum 2025-10-24 09:37:40 +05:30
Aaditya-Chunekar 8e5549862a feat: add topmate.io 2025-10-24 09:14:42 +05:30
Aaditya-Chunekar 8797fcd517 feat: add SEOForum 2025-10-24 08:46:23 +05:30
Paul Pfeister 0995d4d669 chore: reformat 2025-10-23 19:39:05 -04:00
Paul Pfeister 6c0c273a0b Merge pull request #2695 from simplyNour/Bug/urls-are-not-clickable-in-excel-file
Make urls clickable when saved to excel
2025-10-23 16:25:17 -07:00
Paul Pfeister 3eeba790fd Merge pull request #2722 from VivekGaddam/Twitch_Added
Added Twitch Platform Support to Sherlock
2025-10-23 15:28:01 -07:00
Paul Pfeister 61a29ec373 Merge pull request #2723 from imhiteshgarg/adding_lemmy
adding lemmy
2025-10-23 15:26:57 -07:00
Paul Pfeister 9fbbbf7c73 Merge pull request #2724 from obiwan04kanobi/feat/add-codolio
feat: add Codolio to supported sites
2025-10-23 15:26:16 -07:00
obiwan04kanobi 331b68d909 feat: add Codolio to supported sites
Add Codolio (coding portfolio tracker) as a new site target for username detection.

Detection method: Message-based using title tag differences
- Existing profiles: '<title>Username | Codolio</title>'
- Non-existing profiles: '<title>Page Not Found | Codolio</title>'

Tested with multiple usernames to confirm accurate detection.
2025-10-23 22:42:06 +05:30
Hitesh Garg 8c3e093561 adding lemmy
adding lemmy
2025-10-23 21:38:18 +05:30
vivekgaddam e35e5e3af1 corrected Twitch 2025-10-23 19:41:00 +05:30
vivekgaddam 906287b305 added twitch 2025-10-23 19:18:31 +05:30
Matheus Felipe 0dbb6abcc5 Fix Minor Capitalization Issue in README.md (#2716) 2025-10-23 09:08:29 -03:00
Matheus Felipe 03e097cc82 Reorder Terraria Forums to correct alphabetical position (#2700) 2025-10-23 08:53:50 -03:00
Matheus Felipe 91c1964918 Add GameFaqs support (#2721)
Co-authored-by: Maquinero123456 <jimenanavarrodavid@uma.es>
2025-10-23 08:04:41 -03:00
Matheus Felipe 373f3d389a Added support for Trovo (#2720) 2025-10-23 06:17:28 -03:00
SirAzako 828c47109d Added support for Trovo 2025-10-23 06:10:20 -03:00
Matheus Felipe 94245b25df Add OpenGameArt support (#2719)
Co-authored-by: Horațiu Mlendea <Horatiu.Mlendea@ProtonMail.com>
2025-10-23 05:03:35 -03:00
Matheus Felipe 734542f0af Add mstdn.social (#2718) 2025-10-23 04:19:10 -03:00
Matheus Felipe 1f8166ba9f Remove unclaimed username entry for mstdn.social 2025-10-23 03:41:21 -03:00
MagicLike 6f1ddaa615 Added mstdn.social
Added another Mastodon instance: mstdn.social
2025-10-23 03:32:54 -03:00
Nolan Parker 7ee2891517 Fix Minor Capitalization Issue in README.md 2025-10-22 22:16:13 -05:00
Paul Pfeister b893e4aa20 Merge pull request #2711 from imhiteshgarg/add_observablehq
Adding ObservableHQ site
2025-10-21 23:04:24 -07:00
Hitesh Garg eff869906a Adding ObservableHQ site
Adding ObservableHQ site
2025-10-22 10:58:31 +05:30
Paul Pfeister 2a0107e189 Merge pull request #2702 from ABSCP4/patch-1
Update README.md
2025-10-20 15:33:36 -07:00
ABSCP4 5d8c4de212 Update README.md
fixed typo
2025-10-20 11:01:32 -07:00
Nolan Parker 1f9d7e8373 Reorder Terraria Forums to correct alphabetical position 2025-10-19 15:53:09 -05:00
Paul Pfeister 184470f871 Merge pull request #2699 from Nolanp123/fix-codesandbox-name
Fix site name formatting for CodeSandbox
2025-10-19 13:14:14 -07:00
Nolan Parker 342dbc85cc Fix site name formatting for CodeSandbox 2025-10-19 14:44:47 -05:00
Paul Pfeister 457e16e84f Merge pull request #2670 from simplyNour/Bug/fix-false-positive-for-topcoder
fix: false positive for Topcoder
2025-10-18 23:47:34 -07:00
Paul Pfeister 43b3736b75 Merge pull request #2697 from raman1236/add-odysee-support
Add Odysee support
2025-10-18 23:06:15 -07:00
Paul Pfeister 64a49ffe17 Merge pull request #2698 from KaiAllAlone/KaiAllAlone-warframe-market
Add Warframe Market support
2025-10-18 22:48:00 -07:00
rvasikarla 0afd2006c6 Add Odysee support
- Add Odysee platform to sherlock database
- Uses canonical link detection for non-existent users
- URL pattern: https://odysee.com/@{username}
- Detects error via canonical redirect to main site
2025-10-18 16:47:27 -05:00
rvasikarla 3c270173a7 Add Odysee support
- Add Odysee platform to sherlock database
- Uses canonical link detection for non-existent users
- URL pattern: https://odysee.com/@{username}
- Detects error via canonical redirect to main site
2025-10-18 16:44:10 -05:00
rvasikarla 8d73f9ef4c Add Odysee support
- Add Odysee platform to sherlock database
- Uses canonical link detection for non-existent users
- URL pattern: https://odysee.com/@{username}
- Detects error via canonical redirect to main site
2025-10-18 16:37:31 -05:00
Debanuj Roy 472c086805 Update data.json
fixed syntax error
2025-10-19 03:06:25 +05:30
Debanuj Roy 400c277f24 more robust 2025-10-19 03:00:43 +05:30
Debanuj Roy e759564550 Update data.json
update matching logic
2025-10-19 02:55:33 +05:30
Debanuj Roy deebe7137c Added Warframe Market 2025-10-19 02:45:07 +05:30
nour cb14ccbaaf Make urls clickable when saved to excel 2025-10-18 15:21:36 +03:00
Paul Pfeister eb892795e9 Merge pull request #2683 from 403Code/patch-1
Add: Cfx.re Forum
2025-10-15 10:52:32 -07:00
Rizey (Nantaaaaaaaaaa) 09de90066b Update data.json 2025-10-15 13:39:44 +07:00
Rizey (Nantaaaaaaaaaa) cd1f27c12b Update data.json 2025-10-15 13:29:42 +07:00
Rizey (Nantaaaaaaaaaa) b837de8358 Add Cfx.re Forum 2025-10-15 13:22:09 +07:00
Paul Pfeister 7a70f35883 Merge pull request #2680 from bjornmorten/add/norwegian-forums
Add Norwegian forum sites (diskusjon.no & forum.kvinneguiden.no)
2025-10-14 11:25:31 -07:00
bjornmorten 4b17dae385 fix: regex max length for kvinneguiden 2025-10-14 19:48:02 +02:00
Paul Pfeister efefe3f54a Merge pull request #2682 from bjornmorten/add/cryptohack
Add: CryptoHack
2025-10-14 10:41:41 -07:00
Paul Pfeister 4b70a1fc25 Merge pull request #2681 from bjornmorten/add/hackmd
Add: HackMD
2025-10-14 10:41:31 -07:00
bjornmorten a7893f399e add: CryptoHack 2025-10-14 19:28:53 +02:00
bjornmorten 1cb6c12851 add: HackMD 2025-10-14 19:21:36 +02:00
bjornmorten c4f7485ecf fix: alphabetical ordering 2025-10-14 19:10:57 +02:00
bjornmorten 228f50413e add: diskusjon.no and forum.kvinneguiden.no 2025-10-14 19:08:35 +02:00
Paul Pfeister d1867b1b51 Merge pull request #2679 from aryanj10/fix-fasle-positive-for-lesswrong
Fix LessWrong detection Issue #2634
2025-10-14 09:58:56 -07:00
Aryan Jain 6d2497582e Fix LessWrong detection Issue #2634 2025-10-14 11:04:15 -04:00
Paul Pfeister 885c43b8af Merge pull request #2677 from spmedia/patch-9
Add: BreachSta.rs Forum
2025-10-13 16:12:36 -07:00
Edmond Major III 8ad47b0b23 Update data.json 2025-10-13 17:23:10 -05:00
Edmond Major III e93af99424 Update data.json
remix based off title instead of text in body
2025-10-13 17:20:50 -05:00
Edmond Major III 5862ab4f92 Update data.json
Add in BreachSta.rs forum - a popular cybercrime forum

https://breachsta.rs/profile/Sleepybubble - returns valid profile

https://breachsta.rs/profile/asdfasdfasdf - returns "Not found
This page doesn't exist"
2025-10-13 17:15:26 -05:00
Paul Pfeister 4110cac45c Merge pull request #2661 from KaiAllAlone/terraria-forums
Site Added: Terraria forums
2025-10-13 15:07:31 -07:00
Paul Pfeister d66b18e8ae Merge pull request #2676 from spmedia/patch-8
Add: Patched.sh
2025-10-13 14:53:19 -07:00
Edmond Major III b532fc6a38 Add: Patched.sh
Add Patched, a popular cybercrime forum.

https://patched.sh/User/blue = valid user

https://patched.sh/User/khjasjkdhfa38a = not a valid user and displays "The member you specified is either invalid or doesn't exist."
2025-10-13 13:20:03 -05:00
Paul Pfeister 99cf073835 Merge pull request #2674 from spmedia/patch-6
Add: Cracked.sh
2025-10-13 10:41:46 -07:00
Edmond Major III ec7e1b8b81 Update data.json
Trailing / was the issue so removed it
2025-10-13 12:30:50 -05:00
Edmond Major III a4aab38901 Update data.json
Remove www
2025-10-13 12:24:02 -05:00
Edmond Major III 5202900618 Update data.json
Updated error msg on no user
2025-10-13 12:16:09 -05:00
Edmond Major III 26444a98ad Update data.json
Add Cracked.sh - a popular skid hacker website

Examples of profiles:

Claimed: https://cracked.sh/Blue - gives status code of 200

Unclaimed: https://cracked.sh/noonewouldeverusethis7 - gives status code of 404
2025-10-13 12:12:43 -05:00
Paul Pfeister bced3242f3 Merge pull request #2668 from simplyNour/Bug/fix-false-positive-for-hackerearth
fix: false positive for hackerearth
2025-10-13 10:03:00 -07:00
Paul Pfeister 08aabdad76 Merge pull request #2673 from simplyNour/Deprecate/pepper-site-is-no-longer-operating
Deprecate: Pepper.it closed its doors in August 2025
2025-10-13 10:00:45 -07:00
Paul Pfeister 170ee0b928 Merge branch 'master' into Deprecate/pepper-site-is-no-longer-operating 2025-10-13 09:58:47 -07:00
Paul Pfeister 2c9a54438a Merge pull request #2672 from simplyNour/Feature/add-pepper-global-sites
Feat: Add pepper stores worldwide websites
2025-10-13 09:57:36 -07:00
nour 84f4886809 Feat: Add pepper stores worldwide websites 2025-10-13 17:46:38 +03:00
nour e26fd6b643 Fix: false positive for topcoder due to invalid regex 2025-10-13 16:27:02 +03:00
Paul Pfeister ce5de20f80 Merge pull request #2659 from faizan842/re-enable-opencollective-powershell-realmeye
Re-enable OpenCollective and Realmeye
2025-10-12 19:01:46 -07:00
Paul Pfeister 3ff2d135b5 Merge branch 'master' into re-enable-opencollective-powershell-realmeye 2025-10-12 18:58:04 -07:00
Paul Pfeister 1e65b4a209 Merge pull request #2657 from KaiAllAlone/patch-1
Add Pokemon Forums
2025-10-12 18:55:13 -07:00
Debanuj Roy db3545b7b0 Added more robust message 2025-10-12 16:31:27 +05:30
Debanuj Roy 1898a0c4a9 Add Terraria Forums 2025-10-12 16:27:30 +05:30
Faizan Habib 0d32357b10 Re-enable OpenCollective and Realmeye
- Updated OpenCollective to use status_code detection (previously used message detection)
- Added Realmeye with message detection

Both sites were previously removed due to false positives but have been verified to work correctly now:
- OpenCollective: Returns 200 for existing profiles, 404 for non-existent
- Realmeye: Shows 'Sorry, but we either:' error message for non-existent players

Tested with known usernames:
- OpenCollective: sindresorhus
- Realmeye: rotmg

Note: PowerShell Gallery was initially included but removed after discovering their /profiles/ endpoint no longer works.
2025-10-12 13:57:22 +05:30
Debanuj Roy 1be2abb056 Resolved wrong urlMain 2025-10-12 13:39:55 +05:30
Debanuj Roy fb392534ef Add Pokemon Forums 2025-10-12 08:03:23 +05:30
Paul Pfeister bd49aac9d1 Merge pull request #2606 from Fandroid745/fix/babyru-false-positive
fix: Add error messages to BabyRu to prevent false positives
2025-10-11 18:10:54 -04:00
Matheus Felipe 94838863fd Cleanup site-list.py (#2307) 2025-10-11 15:30:08 -03:00
Matheus Felipe 79973a58ea Update file handling to include encoding and correct comments 2025-10-11 15:21:36 -03:00
Fandroid745 b9a72b55ca fix: use Unicode escape sequences for BabyRu error messages 2025-10-11 23:14:43 +05:30
Paul Pfeister ef55f7ddd3 chore: reformat json 2025-10-11 13:34:45 -04:00
Paul Pfeister 28b78e7ddd Merge pull request #2633 from VivekGaddam/add-tiktok-support
Add TikTok (tiktok.com) to supported sites
2025-10-11 13:33:39 -04:00
Paul Pfeister d2072e2cac chore: rem tiktok for improved rev 2025-10-11 13:32:51 -04:00
Paul Pfeister 3edb73cb23 Merge pull request #2650 from Nirzak/patch-1
Added classifiers for supported python version
2025-10-11 13:30:20 -04:00
Paul Pfeister 6d1280ee9d Merge pull request #2651 from aryanj10/add-tiktok-pinterest
Added support for TikTok & Pinterest
2025-10-11 13:12:13 -04:00
Dhanush Sugganahalli 0c457e590a Merge branch 'master' into fix/babyru-false-positive 2025-10-11 21:24:18 +05:30
Aryan Jain dc307fc0fd feat: add TikTok and Pinterest site detection support 2025-10-11 10:34:48 -04:00
Nirjas Jakilim d6256e9fc6 classifiers for supported python version 2025-10-11 20:27:27 +06:00
Aryan Jain 1645828527 Add TikTok site support 2025-10-11 09:25:00 -04:00
Matheus Felipe e774b08dc5 Add imood.com support (#2647) 2025-10-11 09:28:06 -03:00
Matheus Felipe 99067b2e59 Add imood.com support
resolve #2646
2025-10-11 09:23:52 -03:00
nour f039b50c4e Deprecate: Pepper closed its doors on August 14th 2025. 2025-10-11 08:29:32 +03:00
nour 7d5bd97142 fix: false positive for hackerearth 2025-10-11 07:17:01 +03:00
vivekgaddam 70b5055631 corrected india F+ prevent 2025-10-11 08:54:40 +05:30
Paul Pfeister 1be25e70df Merge pull request #2621 from MaxwellOldshein/feat/validate-remote-manifest-with-local-schema-before-validate-target-test-suite
feat: GitHub Actions - Validate Remote Manifest Against Local Schema Before Running Validate Modified Targets Test Suite
2025-10-10 20:41:58 -04:00
Paul Pfeister 9000575f7c Merge pull request #2631 from simplyNour/Add-Vjudge-Support-to-Sherlock
Add Vjudge to the sites source
2025-10-10 20:38:16 -04:00
Paul Pfeister 220ebf935c Merge pull request #2640 from sctech-tr/patch-1
add status cafe (status.cafe)
2025-10-10 20:22:44 -04:00
sctech 959c4a2b26 change method for status.cafe 2025-10-10 20:38:08 +03:00
sctech 443d43df21 add status cafe 2025-10-10 20:09:45 +03:00
Paul Pfeister 80080cd57c Merge pull request #2638 from simplyNour/Bug/fix-false-positive-for-kaskus 2025-10-10 12:51:15 -04:00
nour 80922a93fa fix: false positive for kaskus 2025-10-10 18:53:28 +03:00
nour 45494fc74b bug: fix local variable scoping issue in test validate targets 2025-10-10 06:29:55 +03:00
nour d92e2339a1 feat: add vjudge 2025-10-10 05:28:28 +03:00
vivekgaddam 659bf92d99 corrected the errorMsg 2025-10-09 19:50:43 +05:30
vivekgaddam 3e4d9bcd85 Add TikTok support to Sherlock 2025-10-09 17:57:15 +05:30
Matheus Felipe d3076cdfe0 Add Ifunny (#2632) 2025-10-09 09:16:41 -03:00
Derick Kunz 51436cefe8 Add Ifunny 2025-10-09 08:51:13 -03:00
Paul Pfeister 08a8177286 Merge pull request #2610 from eslteacher902010/add-musescore-clean 2025-10-09 06:19:35 -04:00
Paul Pfeister e6d5fd64e0 Merge pull request #2622 from akh7177/Add-support-for-Discord.bio
Add support for Discord.bio
2025-10-08 13:03:57 -04:00
Abhyuday K Hegde ac9f3a7fd5 Add support for Discord.bio 2025-10-08 11:21:53 +05:30
Paul Pfeister 289ab28b98 Merge pull request #2576 from obiwan04kanobi/add-aws-skills-profile-site
Add AWS Skills Profile site to Sherlock
2025-10-07 19:46:54 -04:00
Maxwell Oldshein 46ad6c9a5e Fix whitespace. 2025-10-07 14:53:47 -04:00
Maxwell Oldshein d20dcbe8db Retain original whitespace 2025-10-07 14:52:53 -04:00
Maxwell Oldshein 70c3c84196 Update validation logic placement in workflow 2025-10-07 14:50:54 -04:00
Dhanush Sugganahalli 53840c6a98 Merge branch 'master' into fix/babyru-false-positive 2025-10-07 14:41:12 +05:30
Fandroid745 068fff8711 fix: Remove regexCheck field and change encoding to UTF-8 2025-10-07 14:33:32 +05:30
Maxwell Oldshein 5735d01804 Validate remote manifest against local schema 2025-10-06 23:52:14 -04:00
Paul Pfeister f60de0d8f8 Merge pull request #2616 from akh7177/Add-new-sites-to-data.json 2025-10-06 13:39:04 -04:00
Paul Pfeister cb3ab91492 Merge pull request #2485 from manjushsh/code-sandbox 2025-10-06 13:30:10 -04:00
paul_kniaz 4eea79ed6a MuseScore: use GET for status_code via request_method to avoid 403 on HEAD 2025-10-06 13:07:45 -04:00
Abhyuday K Hegde 03c051a525 Add new sites to Sherlock 2025-10-06 18:47:38 +05:30
Aniket eccdf80b95 Add Pronouns.page (#2419)
* Add support for Pronouns.page (#2418)

* Update the url
2025-10-06 09:52:56 -03:00
Manjush Shetty eb51bf9b1a misc: remove isnsfw from hive 2025-10-06 17:15:44 +05:30
Manjush Shetty 5d7b438fd6 add urlProbe 2025-10-06 17:11:50 +05:30
Manjush Shetty ef0b97fb57 chore: try with api instead 2025-10-06 16:54:07 +05:30
Manjush Shetty c6c3522159 chore: add custom regex for codesandbox usernames 2025-10-06 16:45:53 +05:30
Manjush Shetty 2908c8eaa8 chore: try with different message 2025-10-06 16:40:59 +05:30
Manjush S f05b8e0ed6 Merge branch 'sherlock-project:master' into code-sandbox 2025-10-06 16:21:40 +05:30
Fandroid745 01bca6b39f fix: corrected the regexCheck field value to an empty string 2025-10-06 08:57:11 +05:30
Paul Pfeister d2835e56a4 Merge pull request #2568 from shreyasNaik0101/fix/remediate-blitztactics
fix(sites): Remediate false positive for Blitz Tactics
2025-10-05 14:17:43 -04:00
shreyasNaik0101 0cf110e69e Merge branch 'master' into fix/remediate-blitztactics 2025-10-05 22:56:59 +05:30
Paul Pfeister a88adb0488 Merge pull request #2559 from frogtheastronaut/master
Removed duplicate Bluesky entry in data.json
2025-10-05 13:23:53 -04:00
Fandroid745 4010a58dde fix: changed the username_claimed to example placeholder 2025-10-05 22:23:17 +05:30
Paul Pfeister b9e28b9b23 Merge pull request #2588 from shreyasNaik0101/fix/correct-ci-diff
fix(ci): Use merge-base for correct target validation
2025-10-05 12:49:58 -04:00
Paul Pfeister d0e005da23 Merge pull request #2609 from akh7177/Add-support-for-WakaTime
Add support for WakaTime
2025-10-05 12:30:24 -04:00
paul_kniaz 7a4f19e6b3 Fix MuseScore URL endpoint 2025-10-05 12:27:30 -04:00
paul_kniaz f958e7b96f update MuseScore username_claimed to arrangeme (valid profile) 2025-10-05 12:13:37 -04:00
paul_kniaz 4c99bf3b75 Add MuseScore site (clean version) 2025-10-05 10:44:55 -04:00
Fandroid745 e3066a1d7a fix: added the username_claimed field 2025-10-05 18:59:04 +05:30
Abhyuday K Hegde f0510a169a Add support for WakaTime 2025-10-05 15:52:56 +05:30
manjushsh 738df6c362 chore: add error message to the codesandbox 2025-10-05 15:22:37 +05:30
Paul Pfeister 83a38db110 Merge pull request #2582 from dollaransh17/fix/boardgamegeek-false-positive
fix(sites): Update BoardGameGeek URL structure and detection method
2025-10-05 02:39:29 -04:00
dollaransh17 9e3448d992 fix(sites): Implement BoardGameGeek using username validation API
- Added BoardGameGeek back using the new API endpoint suggested by @ppfeister
- Uses https://api.geekdo.com/api/accounts/validate/username?username={} for detection
- errorMsg checks for '"isValid":true' to detect valid usernames
- This approach avoids the previous issues with:
  * HTML parsing returning false positives
  * User API returning JSON with '[]' substrings that caused detection problems
- Successfully tested with both valid (blue) and invalid usernames

Thanks @ppfeister for the API suggestion and @akh7177 for the initial guidance
2025-10-05 11:59:41 +05:30
shreyasNaik0101 70e3c0ddd8 fix(ci): Address review feedback for correctness and efficiency 2025-10-05 11:00:14 +05:30
Fandroid745 017c08a45d fix: Add error messages to BabyRu to prevent false positives 2025-10-05 10:53:59 +05:30
Paul Pfeister f32f4ffaee Merge pull request #2595 from obiwan04kanobi/feature/issue-2196-ci-docker-build-test
Add Docker build test to CI workflow (#2196)
2025-10-04 21:09:04 -04:00
Paul Pfeister 7379ba7b19 Merge branch 'remove-tor' 2025-10-04 20:52:40 -04:00
Paul Pfeister 3aeb6d6356 Merge pull request #2602 from sherlock-project/feat/no-txt
chore: make default --no-txt
2025-10-04 20:36:33 -04:00
Paul Pfeister 4246a7b16f chore: make default --no-txt
Workflows where a txt file is still required should use --txt
2025-10-04 20:32:16 -04:00
Paul Pfeister e44fe49c8f Merge pull request #2601 from sherlock-project/feat/graceful-skip
feat: gracefully skip sites with invalid errorType
2025-10-04 20:23:07 -04:00
Paul Pfeister 52cd5fdfc1 feat: gracefully skip sites with invalid errorType 2025-10-04 20:22:34 -04:00
Paul Pfeister 947f1ad2b6 Merge pull request #2574 from dollaransh17/fix/http-request-timeouts
Security Fix: Add timeout parameters to HTTP requests
2025-10-04 18:42:13 -04:00
shreyasNaik0101 4d00884d8c fix(ci): Implement secure diff logic per feedback 2025-10-05 03:00:21 +05:30
Paul Pfeister cfcc82aaca Merge pull request #2597 from sherlock-project/feat/multiple-types
Support multiple errorType checks
2025-10-04 17:21:26 -04:00
Paul Pfeister 0794e02b52 feat: support multiple errorTypes 2025-10-04 16:53:30 -04:00
Paul Pfeister 975965abed Merge pull request #2589 from dollaransh17/fix/threads-false-positive
fix(sites): Fix Threads false positive detection
2025-10-04 15:44:04 -04:00
Paul Pfeister a678bed154 Merge pull request #2587 from akh7177/remediate-cyberdefenders-fp
fix(sites): Remediate False Positives for CyberDefenders
2025-10-04 15:43:48 -04:00
Paul Pfeister 4ec6f1eec0 Merge pull request #2585 from akh7177/remediate-slideshare-fp
fix(sites): Remediate False Positive for SlideShare
2025-10-04 15:43:36 -04:00
Paul Pfeister d1527376e7 Merge pull request #2584 from akh7177/remediate-roblox-fp
fix(sites): Remediate False Positive for Roblox
2025-10-04 15:43:29 -04:00
obiwan04kanobi b99719ce60 Add Docker build test to CI workflow
- Adds docker-build-test job to regression.yml
- Runs on push/merge to master and release branches
- Extracts VERSION_TAG from pyproject.toml for build
- Tests that Docker image builds and runs successfully
- Resolves dockerfile syntax warnings
- Resolves #2196
2025-10-05 00:22:12 +05:30
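One possible way to pull a version string out of pyproject.toml for use as VERSION_TAG, sketched in Python. The commit does not show how the workflow performs this step (it may well use shell tools), so the function name, the regular expression, and the sample file contents are all assumptions.

```python
import re


def extract_version(pyproject_text: str) -> str:
    """Return the first top-of-line `version = "..."` value found in the text."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no version field found")
    return match.group(1)


# Hypothetical file contents used only to exercise the function.
SAMPLE_PYPROJECT = """[tool.poetry]
name = "example"
version = "1.2.3"
"""

VERSION_TAG = extract_version(SAMPLE_PYPROJECT)
```

A regular expression keeps the sketch independent of the Python version; a TOML parser would be the more robust choice where one is available.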
dollaransh17 dc869852bc fix(sites): Fix Threads false positive detection
Threads was showing false positives for non-existent users because
the error message detection was incorrect.

Updated errorMsg:
- Old: "<title>Threads</title>" (generic, matches valid pages too)
- New: "<title>Threads • Log in</title>" (specific to non-existent users)

When a user doesn't exist, Threads redirects to a login page with the
title "Threads • Log in". Valid user profiles have titles like
"Username (@username) • Threads, Say more".

Tested with:
- Invalid user (impossibleuser12345): Correctly not found
- Valid user (zuck): Correctly found

This fixes the false positive issue where non-existent Threads profiles
were being reported as found.
2025-10-04 17:22:50 +05:30
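A short sketch of message detection using the new errorMsg quoted in this commit. The two sample pages are reduced to the titles the commit message mentions; real responses contain far more markup, and is_claimed is a hypothetical helper rather than Sherlock's own function.

```python
# New errorMsg from the commit message above.
NOT_FOUND_MARKER = "<title>Threads • Log in</title>"


def is_claimed(html: str) -> bool:
    """Message detection: the profile exists unless the error marker appears."""
    return NOT_FOUND_MARKER not in html


# Simplified sample pages built from the titles quoted in the commit message.
missing_page = "<html><head><title>Threads • Log in</title></head></html>"
valid_page = (
    "<html><head><title>Username (@username) • Threads, Say more</title></head></html>"
)
```

Because the marker is specific to the login redirect, a valid profile title does not contain it and the profile is reported as found.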
shreyasNaik0101 3079e7a218 fix(ci): Use merge-base for correct target validation 2025-10-04 15:25:30 +05:30
Abhyuday K Hegde 5cd769c2f4 Remediate False Positives for CyberDefenders 2025-10-04 15:12:20 +05:30
Abhyuday K Hegde 977ad5c1a4 Remediate False Positive for SlideShare 2025-10-04 14:48:37 +05:30
Abhyuday K Hegde 57a0ccef38 Remediate False Positive for Roblox 2025-10-04 14:30:40 +05:30
dollaransh17 94c013886a fix(sites): Remove BoardGameGeek due to incompatible detection
BoardGameGeek cannot be reliably detected with Sherlock's current capabilities:

- Original HTML detection: Returns false positives
- API endpoint approach: The API returns status 200 for both valid and invalid users
  - Invalid user: Returns exactly '[]'
  - Valid user: Returns JSON containing '[]' substrings (e.g., "adminBadges":[])

Since Sherlock's 'message' errorType uses substring matching, it incorrectly
identifies valid users as "not found" when checking for '[]' in the response.

The site's API response format is fundamentally incompatible with Sherlock's
detection methods (message/status_code/response_url), so removal is the only
viable solution to prevent false positives and false negatives.

Addresses false positive issue originally reported in testing.
2025-10-04 11:33:27 +05:30
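The substring problem described in this commit can be reproduced in a few lines. The valid-user body below is a hypothetical, shortened response that only mimics the '"adminBadges":[]' shape quoted above. The exact-match function is shown purely to clarify the distinction; the commit does not say Sherlock supports such a check.

```python
import json

ERROR_MARKER = "[]"


def not_found_by_substring(body: str) -> bool:
    """Substring matching: flags any body that contains the marker anywhere."""
    return ERROR_MARKER in body


def not_found_by_exact_match(body: str) -> bool:
    """Exact matching: only an empty JSON array counts as 'not found'."""
    return json.loads(body) == []


invalid_user_body = "[]"
# Hypothetical, shortened response for an existing user.
valid_user_body = '{"username":"blue","adminBadges":[]}'
```

With substring matching, both bodies are flagged as not found, which is the false negative the commit describes; with exact matching, only the bare '[]' response is flagged.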
dollaransh17 c5e209d78e fix(sites): Implement BoardGameGeek API detection as suggested
Using the API endpoint suggested by akh7177:
https://api.geekdo.com/api/users?username={}

However, there's an edge case where valid users contain empty arrays
in their JSON response (adminBadges[], userMicrobadges[], supportYears[])
which causes Sherlock's substring matching to incorrectly flag them
as 'not found' when looking for the '[]' error pattern.

The API correctly returns:
- Valid user: JSON object with user data (but contains [] substrings)
- Invalid user: Exactly '[]' (2 characters total)

This needs further refinement to distinguish between the exact '[]'
response vs JSON containing '[]' substrings.
2025-10-04 11:23:55 +05:30
dollaransh17 3e653c46b0 fix(sites): Remove BoardGameGeek - unreliable detection
BoardGameGeek returns identical pages for both existing and non-existing
users, making reliable username detection impossible with HTTP-based
methods. The site likely uses JavaScript to load user-specific content
dynamically.
2025-10-04 03:12:47 +05:30
dollaransh17 91f3b16993 fix(sites): Update BoardGameGeek URL structure and detection method
BoardGameGeek changed from /user/{} to /profile/{} URL structure.
Also updated from message to status_code detection as the site
no longer returns clear error messages for non-existent users.
2025-10-04 02:55:57 +05:30
obiwan04kanobi 0f3df0f4da **PR description:**
This PR adds AWS Skills Profile to Sherlock’s supported sites in data.json. The configuration uses a unique substring (`shareProfileAccepted":false`) for reliable detection of non-existent usernames, addressing the challenge of JavaScript-rendered error messages.
- Site details and detection logic follow Sherlock’s contributing guidelines and Code of Conduct.
- No changes to core logic; only a new site entry.
- Reviewed for schema compliance and duplicate key cleanup as noted.
2025-10-03 13:46:53 +05:30
dollaransh17 0e7219b191 Security Fix: Add timeout parameters to HTTP requests
This fix addresses a critical security vulnerability where HTTP requests
could hang indefinitely, potentially causing denial of service.

Changes:
- Added 10-second timeout to version check API call
- Added 10-second timeout to GitHub pull request API call
- Added 30-second timeout to data file downloads (larger timeout for data)
- Added 10-second timeout to exclusions list download

Impact:
- Prevents infinite hangs that could freeze the application
- Improves user experience with predictable response times
- Fixes security issue flagged by Bandit static analysis (B113)
- Makes the application more robust in poor network conditions

The timeouts are conservative enough to work with slow connections
while preventing indefinite blocking that could be exploited.
2025-10-03 13:41:43 +05:30
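As a rough illustration of the timeout change in this commit, the sketch below passes an explicit timeout on every request. It uses the standard library's `urlopen` instead of the HTTP client Sherlock actually uses, and the `TIMEOUTS` keys and the `fetch` helper are hypothetical names. Only the 10 and 30 second values come from the commit message.

```python
from urllib.request import urlopen

# Timeout values (seconds) taken from the commit message; the keys are made up.
TIMEOUTS = {
    "version_check": 10,
    "pull_request": 10,
    "data_file": 30,
    "exclusions": 10,
}


def fetch(url: str, kind: str, opener=urlopen) -> bytes:
    """Fetch url using the timeout configured for this kind of request.

    opener defaults to urllib's urlopen and can be replaced by a stub in tests.
    """
    with opener(url, timeout=TIMEOUTS[kind]) as response:
        return response.read()
```

Keeping the values in one table makes it harder to add a new request without choosing a timeout for it.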
Paul Pfeister 1d2c4b134f Merge pull request #2570 from shreyasNaik0101/fix/remediate-applediscussions
fix(sites): Remediate false positive for Apple Discussions
2025-10-02 20:30:57 -04:00
shreyasNaik0101 b245c462c9 fix(sites): Remediate false positive for Apple Discussions 2025-10-03 05:56:52 +05:30
shreyasNaik0101 876e58b159 fix(sites): Remediate false positive for Blitz Tactics 2025-10-03 05:45:43 +05:30
Paul Pfeister 66d9733da7 Merge pull request #2565 from shreyasNaik0101/fix/remediate-mydramalist
fix(sites): Remediate false positive for Mydramalist
2025-10-02 19:40:47 -04:00
Paul Pfeister c55deab3a2 Merge pull request #2561 from shreyasNaik0101/fix/remediate-deviantart
fix(sites): Remediate false positive for DeviantArt
2025-10-02 19:37:00 -04:00
Paul Pfeister edcb697793 Merge pull request #2564 from shreyasNaik0101/fix/remediate-allmylinks
fix(sites): Remediate false positive for AllMyLinks
2025-10-02 19:36:43 -04:00
shreyasNaik0101 d314d75db1 fix(sites): Remediate false positive for Mydramalist 2025-10-03 04:43:05 +05:30
shreyasNaik0101 c89a52caf7 fix(sites): Remediate false positive for AllMyLinks 2025-10-03 04:25:46 +05:30
Paul Pfeister 9c18cfe273 Merge pull request #2563 from sherlock-project/chore/update-co
chore: update code owners
2025-10-02 18:25:59 -04:00
shreyasNaik0101 779d4c33f4 fix: Remove username_unclaimed as requested 2025-10-03 03:55:03 +05:30
Paul Pfeister 072c24687b Merge pull request #2558 from hanjm-github/master
feat: Add some popular website in Korea
2025-10-02 18:22:42 -04:00
shreyasNaik0101 355bfbd328 fix(sites): Remediate false positive for DeviantArt 2025-10-03 00:42:07 +05:30
JongMyeong HAN 7b3632bdad Add comment to site 'namuwiki'
Co-authored-by: Paul Pfeister <code@pfeister.dev>
2025-10-03 04:00:41 +09:00
Ethan Zhang 4fe41f09ff Removed duplicate Bluesky entry in data.json 2025-10-02 12:42:47 +10:00
JongMyeong HAN cd7c52e4fa Feat: Add tistory 2025-10-01 00:44:55 +09:00
JongMyeong HAN 86140af50e feat: Add SOOP 2025-10-01 00:44:02 +09:00
JongMyeong HAN e5cd5e5bfe feat: Add namuwiki 2025-10-01 00:43:21 +09:00
JongMyeong HAN dc89f1cd27 feat: Add dcinside 2025-10-01 00:41:23 +09:00
manjushsh 4706323976 data: add hive blog 2025-06-27 20:05:01 +05:30
manjushsh 4721c7f553 data: Add code sandboxio 2025-06-27 19:42:23 +05:30
Pallavi Kathait 193de54b6d Update site-list.py
These changes improve readability and maintain the functionality of the original code.
2024-09-29 21:31:19 +05:30
Paul Pfeister 2016892e64 Remove torrequest dep
Not sure why it's not in my patch file, but I was removing via sed in my spec instead.
2024-06-28 23:39:38 -04:00
Paul Pfeister 44ad8f506a Lint 2024-06-28 23:38:44 -04:00
Siddharth Dushantha cfa4097df9 removed support for tor 2024-06-26 21:57:11 +02:00
15 changed files with 909 additions and 379 deletions
+1 -1
@@ -65,7 +65,7 @@ The Actor provides three types of outputs:
| Field | Type | Required | Description | | Field | Type | Required | Description |
|-------|------|----------|-------------| |-------|------|----------|-------------|
| `username` | string | Yes | Username the search was conducted for | | `username` | string | Yes | Username the search was conducted for |
| `links` | arrray | Yes | Array with found links to the social media | | `links` | array | Yes | Array with found links to the social media |
| `links[]`| string | No | URL to the account | `links[]`| string | No | URL to the account
### Example Dataset Item (JSON) ### Example Dataset Item (JSON)
+31 -6
@@ -11,6 +11,7 @@ on:
- '**/*.py' - '**/*.py'
- '**/*.ini' - '**/*.ini'
- '**/*.toml' - '**/*.toml'
- 'Dockerfile'
push: push:
branches: branches:
- master - master
@@ -21,15 +22,17 @@ on:
- '**/*.py' - '**/*.py'
- '**/*.ini' - '**/*.ini'
- '**/*.toml' - '**/*.toml'
- 'Dockerfile'
jobs: jobs:
tox-lint: tox-lint:
# Linting is ran through tox to ensure that the same linter is used by local runners
runs-on: ubuntu-latest runs-on: ubuntu-latest
# Linting is run through tox to ensure that the same linter
# is used by local runners
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
- name: Set up linting environment - name: Set up linting environment
uses: actions/setup-python@v5 uses: actions/setup-python@v6
with: with:
python-version: '3.x' python-version: '3.x'
- name: Install tox and related dependencies - name: Install tox and related dependencies
@@ -41,7 +44,8 @@ jobs:
tox-matrix: tox-matrix:
runs-on: ${{ matrix.os }} runs-on: ${{ matrix.os }}
strategy: strategy:
fail-fast: false # We want to know what specicic versions it fails on # We want to know what specific versions it fails on
fail-fast: false
matrix: matrix:
os: [ os: [
ubuntu-latest, ubuntu-latest,
@@ -53,11 +57,13 @@ jobs:
'3.11', '3.11',
'3.12', '3.12',
'3.13', '3.13',
'3.14',
'3.14t',
] ]
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
- name: Set up environment ${{ matrix.python-version }} - name: Set up environment ${{ matrix.python-version }}
uses: actions/setup-python@v5 uses: actions/setup-python@v6
with: with:
python-version: ${{ matrix.python-version }} python-version: ${{ matrix.python-version }}
- name: Install tox and related dependencies - name: Install tox and related dependencies
@@ -67,3 +73,22 @@ jobs:
pip install tox-gh-actions pip install tox-gh-actions
- name: Run tox - name: Run tox
run: tox run: tox
docker-build-test:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v6
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Get version from pyproject.toml
id: get-version
run: |
VERSION=$(grep -m1 'version = ' pyproject.toml | cut -d'"' -f2)
echo "version=$VERSION" >> $GITHUB_OUTPUT
- name: Build Docker image
run: |
docker build \
--build-arg VERSION_TAG=${{ steps.get-version.outputs.version }} \
-t sherlock-test:latest .
- name: Test Docker image runs
run: docker run --rm sherlock-test:latest --version
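The `grep -m1 'version = ' pyproject.toml | cut -d'"' -f2` step in the `docker-build-test` job can be mirrored in Python to see what it extracts. This is only a sketch: `first_version` is a hypothetical helper, and the sample text is a shortened stand-in for the real `pyproject.toml`.

```python
from typing import Optional


def first_version(pyproject_text: str) -> Optional[str]:
    """Return the quoted value on the first line containing 'version = '."""
    for line in pyproject_text.splitlines():
        if "version = " in line:
            # Like `cut -d'"' -f2`: take the text between the first two quotes.
            # Unlike cut, return None when the line has no quote at all.
            parts = line.split('"')
            return parts[1] if len(parts) > 1 else None
    return None


SAMPLE = '[tool.poetry]\nname = "sherlock-project"\nversion = "0.16.1"\n'
print(first_version(SAMPLE))
```

Like the shell pipeline, this matches the first line containing `version = ` wherever it appears, so it depends on the package version being the first such line in the file.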
+40 -13
@@ -17,29 +17,41 @@ jobs:
- name: Checkout repository - name: Checkout repository
uses: actions/checkout@v5 uses: actions/checkout@v5
with: with:
# Checkout the base branch but fetch all history to avoid a second fetch call
ref: ${{ github.base_ref }} ref: ${{ github.base_ref }}
fetch-depth: 1 fetch-depth: 0
persist-credentials: false
- name: Set up Python - name: Set up Python
uses: actions/setup-python@v6 uses: actions/setup-python@v6
with: with:
python-version: '3.13' python-version: "3.13"
- name: Install Poetry - name: Install Poetry
uses: abatilo/actions-poetry@v4 uses: abatilo/actions-poetry@v4
with: with:
poetry-version: 'latest' poetry-version: "latest"
- name: Install dependencies - name: Install dependencies
run: | run: |
poetry install --no-interaction --with dev poetry install --no-interaction --with dev
- name: Drop in place updated manifest from base - name: Prepare JSON versions for comparison
run: | run: |
cp sherlock_project/resources/data.json data.json.base # Fetch only the PR's branch head (single network call in this step)
git fetch origin pull/${{ github.event.pull_request.number }}/head:pr --depth=1 git fetch origin pull/${{ github.event.pull_request.number }}/head:pr
git show pr:sherlock_project/resources/data.json > sherlock_project/resources/data.json
cp sherlock_project/resources/data.json data.json.head # Find the merge-base commit between the target branch and the PR branch
MERGE_BASE=$(git merge-base origin/${{ github.base_ref }} pr)
echo "Comparing PR head against merge-base commit: $MERGE_BASE"
# Safely extract the file from the PR's head and the merge-base commit
git show pr:sherlock_project/resources/data.json > data.json.head
git show $MERGE_BASE:sherlock_project/resources/data.json > data.json.base
# CRITICAL FIX: Overwrite the checked-out data.json with the one from the PR
# This ensures that pytest runs against the new, updated file.
cp data.json.head sherlock_project/resources/data.json
- name: Discover modified targets - name: Discover modified targets
id: discover-modified id: discover-modified
@@ -47,8 +59,16 @@ jobs:
CHANGED=$( CHANGED=$(
python - <<'EOF' python - <<'EOF'
import json import json
with open("data.json.base") as f: base = json.load(f) import sys
with open("data.json.head") as f: head = json.load(f) try:
with open("data.json.base") as f: base = json.load(f)
with open("data.json.head") as f: head = json.load(f)
except FileNotFoundError as e:
print(f"Error: Could not find {e.filename}", file=sys.stderr)
sys.exit(1)
except json.JSONDecodeError as e:
print(f"Error: Could not decode JSON from a file - {e}", file=sys.stderr)
sys.exit(1)
changed = [] changed = []
for k, v in head.items(): for k, v in head.items():
@@ -63,12 +83,19 @@ jobs:
echo -e ">>> Changed targets: \n$(echo $CHANGED | tr ',' '\n')" echo -e ">>> Changed targets: \n$(echo $CHANGED | tr ',' '\n')"
echo "changed_targets=$CHANGED" >> "$GITHUB_OUTPUT" echo "changed_targets=$CHANGED" >> "$GITHUB_OUTPUT"
- name: Validate modified targets - name: Validate remote manifest against local schema
if: steps.discover-modified.outputs.changed_targets != '' if: steps.discover-modified.outputs.changed_targets != ''
continue-on-error: true run: |
poetry run pytest tests/test_manifest.py::test_validate_manifest_against_local_schema
# --- The rest of the steps below are unchanged ---
- name: Validate modified targets
env:
CHANGED_TARGETS: ${{ steps.discover-modified.outputs.changed_targets }}
run: | run: |
poetry run pytest -q --tb no -rA -m validate_targets -n 20 \ poetry run pytest -q --tb no -rA -m validate_targets -n 20 \
--chunked-sites "${{ steps.discover-modified.outputs.changed_targets }}" \ --chunked-sites "$CHANGED_TARGETS" \
--junitxml=validation_results.xml --junitxml=validation_results.xml
- name: Prepare validation summary - name: Prepare validation summary
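The "Discover modified targets" step compares `data.json.base` with `data.json.head`, but the body of its loop is not visible in this diff. The sketch below shows one plausible comparison rule (a target counts as changed if it is new or its entry differs). Both the rule and the `changed_targets` helper are assumptions, not the workflow's actual code.

```python
import json


def changed_targets(base: dict, head: dict) -> list[str]:
    """List targets that are new in head or whose entry differs from base."""
    changed = []
    for name, entry in head.items():
        if name == "$schema":
            continue  # schema pointer, not a target
        if name not in base or base[name] != entry:
            changed.append(name)
    return sorted(changed)


# Tiny made-up manifests standing in for data.json.base and data.json.head.
base = json.loads('{"SiteA": {"errorType": "status_code"}, "SiteB": {"errorType": "message"}}')
head = json.loads('{"SiteA": {"errorType": "message"}, "SiteB": {"errorType": "message"}, "SiteC": {}}')
print(",".join(changed_targets(base, head)))
```

Joining the names with commas matches how the workflow later splits `$CHANGED` with `tr ',' '\n'`.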
+1 -1
@@ -4,7 +4,7 @@
# 3. Build image with BOTH latest and version tags # 3. Build image with BOTH latest and version tags
# i.e. `docker build -t sherlock/sherlock:0.16.0 -t sherlock/sherlock:latest .` # i.e. `docker build -t sherlock/sherlock:0.16.0 -t sherlock/sherlock:latest .`
FROM python:3.12-slim-bullseye as build FROM python:3.12-slim-bullseye AS build
WORKDIR /sherlock WORKDIR /sherlock
RUN pip3 install --no-cache-dir --upgrade pip RUN pip3 install --no-cache-dir --upgrade pip
+17 -11
@@ -1,39 +1,45 @@
#!/usr/bin/env python #!/usr/bin/env python
# This module generates the listing of supported sites which can be found in # This module generates the listing of supported sites which can be found in
# sites.md. It also organizes all the sites in alphanumeric order # sites.mdx. It also organizes all the sites in alphanumeric order
import json import json
import os import os
DATA_REL_URI: str = "sherlock_project/resources/data.json" DATA_REL_URI: str = "sherlock_project/resources/data.json"
DEFAULT_ENCODING = "utf-8"
# Read the data.json file # Read the data.json file
with open(DATA_REL_URI, "r", encoding="utf-8") as data_file: with open(DATA_REL_URI, "r", encoding=DEFAULT_ENCODING) as data_file:
data: dict = json.load(data_file) data: dict = json.load(data_file)
# Removes schema-specific keywords for proper processing # Removes schema-specific keywords for proper processing
social_networks: dict = dict(data) social_networks = data.copy()
social_networks.pop('$schema', None) social_networks.pop('$schema', None)
# Sort the social networks in alphanumeric order # Sort the social networks in alphanumeric order
social_networks: list = sorted(social_networks.items()) social_networks = sorted(social_networks.items())
# Make output dir where the site list will be written # Make output dir where the site list will be written
os.mkdir("output") os.mkdir("output")
# Write the list of supported sites to sites.md # Write the list of supported sites to sites.mdx
with open("output/sites.mdx", "w") as site_file: with open("output/sites.mdx", "w", encoding=DEFAULT_ENCODING) as site_file:
site_file.write("---\ntitle: 'List of supported sites'\nsidebarTitle: 'Supported sites'\nicon: 'globe'\ndescription: 'Sherlock currently supports **400+** sites'\n---\n\n") site_file.write("---\n")
site_file.write("title: 'List of supported sites'\n")
site_file.write("sidebarTitle: 'Supported sites'\n")
site_file.write("icon: 'globe'\n")
site_file.write("description: 'Sherlock currently supports **400+** sites'\n")
site_file.write("---\n\n")
for social_network, info in social_networks: for social_network, info in social_networks:
url_main = info["urlMain"] url_main = info["urlMain"]
is_nsfw = "**(NSFW)**" if info.get("isNSFW") else "" is_nsfw = "**(NSFW)**" if info.get("isNSFW") else ""
site_file.write(f"1. [{social_network}]({url_main}) {is_nsfw}\n") site_file.write(f"1. [{social_network}]({url_main}) {is_nsfw}\n")
# Overwrite the data.json file with sorted data # Overwrite the data.json file with sorted data
with open(DATA_REL_URI, "w") as data_file: with open(DATA_REL_URI, "w", encoding=DEFAULT_ENCODING) as data_file:
sorted_data = json.dumps(data, indent=2, sort_keys=True) sorted_data = json.dumps(data, indent=2, sort_keys=True)
data_file.write(sorted_data) data_file.write(sorted_data)
data_file.write("\n") data_file.write("\n") # Keep the newline after writing data
print("Finished updating supported site listing!") print("Finished updating supported site listing!")
+19 -47
@@ -23,17 +23,17 @@
> [!WARNING] > [!WARNING]
> Packages for ParrotOS and Ubuntu 24.04, maintained by a third party, appear to be __broken__. > Packages for ParrotOS and Ubuntu 24.04, maintained by a third party, appear to be __broken__.
> Users of these systems should defer to pipx/pip or Docker. > Users of these systems should defer to [`uv`](https://docs.astral.sh/uv/)/`pipx`/`pip` or Docker.
| Method | Notes | | Method | Notes |
| - | - | | - | - |
| `pipx install sherlock-project` | `pip` may be used in place of `pipx` | | `pipx install sherlock-project` | `pip` or [`uv`](https://docs.astral.sh/uv/) may be used in place of `pipx` |
| `docker run -it --rm sherlock/sherlock` | | `docker run -it --rm sherlock/sherlock` |
| `dnf install sherlock-project` | | | `dnf install sherlock-project` | |
Community-maintained packages are available for Debian (>= 13), Ubuntu (>= 22.10), Homebrew, Kali, and BlackArch. These packages are not directly supported or maintained by the Sherlock Project. Community-maintained packages are available for Debian (>= 13), Ubuntu (>= 22.10), Homebrew, Kali, and BlackArch. These packages are not directly supported or maintained by the Sherlock Project.
See all alternative installation methods [here](https://sherlockproject.xyz/installation) See all alternative installation methods [here](https://sherlockproject.xyz/installation).
## General usage ## General usage
@@ -51,70 +51,42 @@ Accounts found will be stored in an individual text file with the corresponding
```console ```console
$ sherlock --help $ sherlock --help
usage: sherlock [-h] [--version] [--verbose] [--folderoutput FOLDEROUTPUT] usage: sherlock [-h] [--version] [--verbose] [--folderoutput FOLDEROUTPUT] [--output OUTPUT] [--csv] [--xlsx] [--site SITE_NAME] [--proxy PROXY_URL] [--dump-response]
[--output OUTPUT] [--tor] [--unique-tor] [--csv] [--xlsx] [--json JSON_FILE] [--timeout TIMEOUT] [--print-all] [--print-found] [--no-color] [--browse] [--local] [--nsfw] [--txt] [--ignore-exclusions]
[--site SITE_NAME] [--proxy PROXY_URL] [--json JSON_FILE]
[--timeout TIMEOUT] [--print-all] [--print-found] [--no-color]
[--browse] [--local] [--nsfw]
USERNAMES [USERNAMES ...] USERNAMES [USERNAMES ...]
Sherlock: Find Usernames Across Social Networks (Version 0.14.3) Sherlock: Find Usernames Across Social Networks (Version 0.16.0)
positional arguments: positional arguments:
USERNAMES One or more usernames to check with social networks. USERNAMES One or more usernames to check with social networks. Check similar usernames using {?} (replace to '_', '-', '.').
Check similar usernames using {?} (replace to '_', '-', '.').
optional arguments: options:
-h, --help show this help message and exit -h, --help show this help message and exit
--version Display version information and dependencies. --version Display version information and dependencies.
--verbose, -v, -d, --debug --verbose, -v, -d, --debug
Display extra debugging information and metrics. Display extra debugging information and metrics.
--folderoutput FOLDEROUTPUT, -fo FOLDEROUTPUT --folderoutput FOLDEROUTPUT, -fo FOLDEROUTPUT
If using multiple usernames, the output of the results will be If using multiple usernames, the output of the results will be saved to this folder.
saved to this folder.
--output OUTPUT, -o OUTPUT --output OUTPUT, -o OUTPUT
If using single username, the output of the result will be saved If using single username, the output of the result will be saved to this file.
to this file.
--tor, -t Make requests over Tor; increases runtime; requires Tor to be
installed and in system path.
--unique-tor, -u Make requests over Tor with new Tor circuit after each request;
increases runtime; requires Tor to be installed and in system
path.
--csv Create Comma-Separated Values (CSV) File. --csv Create Comma-Separated Values (CSV) File.
--xlsx Create the standard file for the modern Microsoft Excel --xlsx Create the standard file for the modern Microsoft Excel spreadsheet (xlsx).
spreadsheet (xlsx). --site SITE_NAME Limit analysis to just the listed sites. Add multiple options to specify more than one site.
--site SITE_NAME Limit analysis to just the listed sites. Add multiple options to
specify more than one site.
--proxy PROXY_URL, -p PROXY_URL --proxy PROXY_URL, -p PROXY_URL
Make requests over a proxy. e.g. socks5://127.0.0.1:1080 Make requests over a proxy. e.g. socks5://127.0.0.1:1080
--dump-response Dump the HTTP response to stdout for targeted debugging.
--json JSON_FILE, -j JSON_FILE --json JSON_FILE, -j JSON_FILE
Load data from a JSON file or an online, valid, JSON file. Load data from a JSON file or an online, valid, JSON file. Upstream PR numbers also accepted.
--timeout TIMEOUT Time (in seconds) to wait for response to requests (Default: 60) --timeout TIMEOUT Time (in seconds) to wait for response to requests (Default: 60)
--print-all Output sites where the username was not found. --print-all Output sites where the username was not found.
--print-found Output sites where the username was found. --print-found Output sites where the username was found (also if exported as file).
--no-color Don't color terminal output --no-color Don't color terminal output
--browse, -b Browse to all results on default browser. --browse, -b Browse to all results on default browser.
--local, -l Force the use of the local data.json file. --local, -l Force the use of the local data.json file.
--nsfw Include checking of NSFW sites from default list. --nsfw Include checking of NSFW sites from default list.
--txt Enable creation of a txt file
--ignore-exclusions Ignore upstream exclusions (may return more false positives)
``` ```
## Apify Actor Usage [![Sherlock Actor](https://apify.com/actor-badge?actor=netmilk/sherlock)](https://apify.com/netmilk/sherlock?fpr=sherlock)
<a href="https://apify.com/netmilk/sherlock?fpr=sherlock"><img src="https://apify.com/ext/run-on-apify.png" alt="Run Sherlock Actor on Apify" width="176" height="39" /></a>
You can run Sherlock in the cloud without installation using the [Sherlock Actor](https://apify.com/netmilk/sherlock?fpr=sherlock) on [Apify](https://apify.com?fpr=sherlock) free of charge.
``` bash
$ echo '{"usernames":["user123"]}' | apify call -so netmilk/sherlock
[{
"username": "user123",
"links": [
"https://www.1337x.to/user/user123/",
...
]
}]
```
Read more about the [Sherlock Actor](../.actor/README.md), including how to use it programmatically via the Apify [API](https://apify.com/netmilk/sherlock/api?fpr=sherlock), [CLI](https://docs.apify.com/cli/?fpr=sherlock) and [JS/TS and Python SDKs](https://docs.apify.com/sdk?fpr=sherlock).
## Credits ## Credits
@@ -124,7 +96,7 @@ Thank you to everyone who has contributed to Sherlock! ❤️
<img src="https://contrib.rocks/image?&columns=25&max=10000&&repo=sherlock-project/sherlock" alt="contributors"/> <img src="https://contrib.rocks/image?&columns=25&max=10000&&repo=sherlock-project/sherlock" alt="contributors"/>
</a> </a>
## Star history ## Star History
<picture> <picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=sherlock-project/sherlock&type=Date&theme=dark" /> <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=sherlock-project/sherlock&type=Date&theme=dark" />
@@ -135,7 +107,7 @@ Thank you to everyone who has contributed to Sherlock! ❤️
## License ## License
MIT © Sherlock Project<br/> MIT © Sherlock Project<br/>
Original Creator - [Siddharth Dushantha](https://github.com/sdushantha) Creator - [Siddharth Dushantha](https://github.com/sdushantha)
<!-- Reference Links --> <!-- Reference Links -->
+5 -5
@@ -8,7 +8,7 @@ source = "init"
[tool.poetry] [tool.poetry]
name = "sherlock-project" name = "sherlock-project"
version = "0.16.0" version = "0.16.1"
description = "Hunt down social media accounts by username across social networks" description = "Hunt down social media accounts by username across social networks"
license = "MIT" license = "MIT"
authors = [ authors = [
@@ -29,6 +29,10 @@ classifiers = [
"Natural Language :: English", "Natural Language :: English",
"Operating System :: OS Independent", "Operating System :: OS Independent",
"Programming Language :: Python :: 3", "Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
"Topic :: Security" "Topic :: Security"
] ]
homepage = "https://sherlockproject.xyz/" homepage = "https://sherlockproject.xyz/"
@@ -46,14 +50,10 @@ PySocks = "^1.7.0"
requests = "^2.22.0" requests = "^2.22.0"
requests-futures = "^1.0.0" requests-futures = "^1.0.0"
stem = "^1.8.0" stem = "^1.8.0"
torrequest = "^0.1.0"
pandas = "^2.2.1" pandas = "^2.2.1"
openpyxl = "^3.0.10" openpyxl = "^3.0.10"
tomli = "^2.2.1" tomli = "^2.2.1"
[tool.poetry.extras]
tor = ["torrequest"]
[tool.poetry.group.dev.dependencies] [tool.poetry.group.dev.dependencies]
jsonschema = "^4.0.0" jsonschema = "^4.0.0"
rstr = "^3.2.2" rstr = "^3.2.2"
+2 -9
@@ -37,7 +37,6 @@ class QueryNotify:
self.result = result self.result = result
# return
def start(self, message=None): def start(self, message=None):
"""Notify Start. """Notify Start.
@@ -56,7 +55,6 @@ class QueryNotify:
Nothing. Nothing.
""" """
# return
def update(self, result): def update(self, result):
"""Notify Update. """Notify Update.
@@ -75,7 +73,6 @@ class QueryNotify:
self.result = result self.result = result
# return
def finish(self, message=None): def finish(self, message=None):
"""Notify Finish. """Notify Finish.
@@ -94,7 +91,6 @@ class QueryNotify:
Nothing. Nothing.
""" """
# return
def __str__(self): def __str__(self):
"""Convert Object To String. """Convert Object To String.
@@ -137,7 +133,6 @@ class QueryNotifyPrint(QueryNotify):
self.print_all = print_all self.print_all = print_all
self.browse = browse self.browse = browse
return
def start(self, message): def start(self, message):
"""Notify Start. """Notify Start.
@@ -163,7 +158,6 @@ class QueryNotifyPrint(QueryNotify):
# An empty line between first line and the result(more clear output) # An empty line between first line and the result(more clear output)
print('\r') print('\r')
return
def countResults(self): def countResults(self):
"""This function counts the number of results. Every time the function is called, """This function counts the number of results. Every time the function is called,
@@ -238,7 +232,7 @@ class QueryNotifyPrint(QueryNotify):
Fore.WHITE + "]" + Fore.WHITE + "]" +
Fore.GREEN + f" {self.result.site_name}:" + Fore.GREEN + f" {self.result.site_name}:" +
Fore.YELLOW + f" {msg}") Fore.YELLOW + f" {msg}")
elif result.status == QueryStatus.WAF: elif result.status == QueryStatus.WAF:
if self.print_all: if self.print_all:
print(Style.BRIGHT + Fore.WHITE + "[" + print(Style.BRIGHT + Fore.WHITE + "[" +
@@ -254,10 +248,9 @@ class QueryNotifyPrint(QueryNotify):
f"Unknown Query Status '{result.status}' for site '{self.result.site_name}'" f"Unknown Query Status '{result.status}' for site '{self.result.site_name}'"
) )
return
def finish(self, message="The processing has been finished."): def finish(self, message="The processing has been finished."):
"""Notify Start. """Notify Finish.
Will print the last line to the standard output. Will print the last line to the standard output.
Keyword Arguments: Keyword Arguments:
self -- This object. self -- This object.
File diff suppressed because it is too large
+143 -74
@@ -1,80 +1,149 @@
{ {
"$schema": "https://json-schema.org/draft/2020-12/schema", "$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Sherlock Target Manifest", "title": "Sherlock Target Manifest",
"description": "Social media targets to probe for the existence of known usernames", "description": "Social media targets to probe for the existence of known usernames",
"type": "object", "type": "object",
"properties": { "properties": {
"$schema": { "type": "string" } "$schema": { "type": "string" }
}, },
"patternProperties": { "patternProperties": {
"^(?!\\$).*?$": { "^(?!\\$).*?$": {
"type": "object", "type": "object",
"description": "Target name and associated information (key should be human readable name)", "description": "Target name and associated information (key should be human readable name)",
"required": [ "url", "urlMain", "errorType", "username_claimed" ], "required": ["url", "urlMain", "errorType", "username_claimed"],
"properties": { "properties": {
"url": { "type": "string" }, "url": { "type": "string" },
"urlMain": { "type": "string" }, "urlMain": { "type": "string" },
"urlProbe": { "type": "string" }, "urlProbe": { "type": "string" },
"username_claimed": { "type": "string" }, "username_claimed": { "type": "string" },
"regexCheck": { "type": "string" }, "regexCheck": { "type": "string" },
"isNSFW": { "type": "boolean" }, "isNSFW": { "type": "boolean" },
"headers": { "type": "object" }, "headers": { "type": "object" },
"request_payload": { "type": "object" }, "request_payload": { "type": "object" },
"__comment__": { "__comment__": {
"type": "string", "type": "string",
"description": "Used to clarify important target information if (and only if) a commit message would not suffice.\nThis key should not be parsed anywhere within Sherlock." "description": "Used to clarify important target information if (and only if) a commit message would not suffice.\nThis key should not be parsed anywhere within Sherlock."
}, },
"tags": { "tags": {
"oneOf": [ "oneOf": [
{ "$ref": "#/$defs/tag" }, { "$ref": "#/$defs/tag" },
{ "type": "array", "items": { "$ref": "#/$defs/tag" } } { "type": "array", "items": { "$ref": "#/$defs/tag" } }
] ]
}, },
"request_method": { "request_method": {
"type": "string", "type": "string",
"enum": [ "GET", "POST", "HEAD", "PUT" ] "enum": ["GET", "POST", "HEAD", "PUT"]
}, },
"errorType": {
"oneOf": [
{
"type": "string",
"enum": ["message", "response_url", "status_code"]
},
{
"type": "array",
"items": {
"type": "string",
"enum": ["message", "response_url", "status_code"]
}
}
]
},
"errorMsg": {
"oneOf": [
{ "type": "string" },
{ "type": "array", "items": { "type": "string" } }
]
},
"errorCode": {
"oneOf": [
{ "type": "integer" },
{ "type": "array", "items": { "type": "integer" } }
]
},
"errorUrl": { "type": "string" },
"response_url": { "type": "string" }
},
"dependencies": {
"errorMsg": {
"oneOf": [
{ "properties": { "errorType": { "const": "message" } } },
{
"properties": {
"errorType": { "errorType": {
"type": "string", "type": "array",
"enum": [ "message", "response_url", "status_code" ] "contains": { "const": "message" }
},
"errorMsg": {
"oneOf": [
{ "type": "string" },
{ "type": "array", "items": { "type": "string" } }
]
},
"errorCode": {
"oneOf": [
{ "type": "integer" },
{ "type": "array", "items": { "type": "integer" } }
]
},
"errorUrl": { "type": "string" },
"response_url": { "type": "string" }
},
"dependencies": {
"errorMsg": {
"properties" : { "errorType": { "const": "message" } }
},
"errorUrl": {
"properties": { "errorType": { "const": "response_url" } }
},
"errorCode": {
"properties": { "errorType": { "const": "status_code" } }
} }
}, }
"if": { "properties": { "errorType": { "const": "message" } } }, }
"then": { "required": [ "errorMsg" ] }, ]
"else": { },
"if": { "properties": { "errorType": { "const": "response_url" } } }, "errorUrl": {
"then": { "required": [ "errorUrl" ] } "oneOf": [
}, { "properties": { "errorType": { "const": "response_url" } } },
"additionalProperties": false {
"properties": {
"errorType": {
"type": "array",
"contains": { "const": "response_url" }
}
}
}
]
},
"errorCode": {
"oneOf": [
{ "properties": { "errorType": { "const": "status_code" } } },
{
"properties": {
"errorType": {
"type": "array",
"contains": { "const": "status_code" }
}
}
}
]
} }
}, },
"additionalProperties": false, "allOf": [
"$defs": { {
"tag": { "type": "string", "enum": [ "adult", "gaming" ] } "if": {
"anyOf": [
{ "properties": { "errorType": { "const": "message" } } },
{
"properties": {
"errorType": {
"type": "array",
"contains": { "const": "message" }
}
}
}
]
},
"then": { "required": ["errorMsg"] }
},
{
"if": {
"anyOf": [
{ "properties": { "errorType": { "const": "response_url" } } },
{
"properties": {
"errorType": {
"type": "array",
"contains": { "const": "response_url" }
}
}
}
]
},
"then": { "required": ["errorUrl"] }
}
],
"additionalProperties": false
} }
},
"additionalProperties": false,
"$defs": {
"tag": { "type": "string", "enum": ["adult", "gaming"] }
}
} }
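The updated schema lets `errorType` be either a single string or an array of strings, and its `allOf` block requires `errorMsg` when `message` is present and `errorUrl` when `response_url` is present. The sketch below restates only that conditional rule in plain Python for entries that have an `errorType`. `missing_fields` is a hypothetical helper and does not replace validating against the schema itself.

```python
# Which extra key each errorType requires, per the schema's allOf block.
REQUIRED_BY_ERROR_TYPE = {"message": "errorMsg", "response_url": "errorUrl"}


def error_types(entry: dict) -> list:
    """Normalise errorType to a list, whether it was a string or an array."""
    value = entry.get("errorType")
    return value if isinstance(value, list) else [value]


def missing_fields(entry: dict) -> list:
    """Return the conditionally required keys that the entry lacks."""
    types = error_types(entry)
    return [
        field
        for error_type, field in REQUIRED_BY_ERROR_TYPE.items()
        if error_type in types and field not in entry
    ]


print(missing_fields({"errorType": ["status_code", "message"]}))
```

Note that `status_code` has no entry in the table: the schema does not make `errorCode` mandatory for it.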
+68 -123
@@ -136,6 +136,9 @@ def get_response(request_future, error_type, social_network):
except requests.exceptions.RequestException as err: except requests.exceptions.RequestException as err:
error_context = "Unknown Error" error_context = "Unknown Error"
exception_text = str(err) exception_text = str(err)
except UnicodeError as err:
error_context = "Encoding Error"
exception_text = str(err)
return response, error_context, exception_text return response, error_context, exception_text
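The new `except UnicodeError` branch above follows the same pattern as the other handlers: record a short context string and the exception text instead of letting the error propagate. A standalone sketch of that pattern follows. `describe_failure` is a hypothetical helper, and the failing input (a lone surrogate, which raises `UnicodeEncodeError`, a subclass of `UnicodeError`) is chosen only because it fails reliably. It is not necessarily the case the original bug report hit.

```python
def describe_failure(action):
    """Run action() and return (result, error_context, exception_text)."""
    result, error_context, exception_text = None, None, None
    try:
        result = action()
    except UnicodeError as err:
        # Covers UnicodeEncodeError and UnicodeDecodeError alike.
        error_context = "Encoding Error"
        exception_text = str(err)
    return result, error_context, exception_text


# A lone surrogate cannot be encoded as UTF-8, so the first call fails.
print(describe_failure(lambda: "user\udcff".encode("utf-8")))
print(describe_failure(lambda: "user".encode("utf-8")))
```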
@@ -171,8 +174,6 @@ def sherlock(
     username: str,
     site_data: dict[str, dict[str, str]],
     query_notify: QueryNotify,
-    tor: bool = False,
-    unique_tor: bool = False,
     dump_response: bool = False,
     proxy: Optional[str] = None,
     timeout: int = 60,
@@ -188,8 +189,6 @@ def sherlock(
     query_notify           -- Object with base type of QueryNotify().
                               This will be used to notify the caller about
                               query results.
-    tor                    -- Boolean indicating whether to use a tor circuit for the requests.
-    unique_tor             -- Boolean indicating whether to use a new tor circuit for each request.
     proxy                  -- String indicating the proxy URL
     timeout                -- Time in seconds to wait before timing out request.
                               Default is 60 seconds.
@@ -210,32 +209,9 @@ def sherlock(
     # Notify caller that we are starting the query.
     query_notify.start(username)
-    # Create session based on request methodology
-    if tor or unique_tor:
-        try:
-            from torrequest import TorRequest  # noqa: E402
-        except ImportError:
-            print("Important!")
-            print("> --tor and --unique-tor are now DEPRECATED, and may be removed in a future release of Sherlock.")
-            print("> If you've installed Sherlock via pip, you can include the optional dependency via `pip install 'sherlock-project[tor]'`.")
-            print("> Other packages should refer to their documentation, or install it separately with `pip install torrequest`.\n")
-            sys.exit(query_notify.finish())
-        print("Important!")
-        print("> --tor and --unique-tor are now DEPRECATED, and may be removed in a future release of Sherlock.")
-        # Requests using Tor obfuscation
-        try:
-            underlying_request = TorRequest()
-        except OSError:
-            print("Tor not found in system path. Unable to continue.\n")
-            sys.exit(query_notify.finish())
-        underlying_session = underlying_request.session
-    else:
-        # Normal requests
-        underlying_session = requests.session()
-        underlying_request = requests.Request()
+    # Normal requests
+    underlying_session = requests.session()
     # Limit number of workers to 20.
     # This is probably vastly overkill.
@@ -359,15 +335,10 @@ def sherlock(
             # Store future in data for access later
             net_info["request_future"] = future
-            # Reset identify for tor (if needed)
-            if unique_tor:
-                underlying_request.reset_identity()
         # Add this site's results into final dictionary with all the other results.
         results_total[social_network] = results_site
     # Open the file containing account links
-    # Core logic: If tor requests, make them here. If multi-threaded requests, wait for responses
     for social_network, net_info in site_data.items():
         # Retrieve results again
         results_site = results_total.get(social_network)
@@ -381,6 +352,8 @@ def sherlock(
         # Get the expected error type
         error_type = net_info["errorType"]
+        if isinstance(error_type, str):
+            error_type: list[str] = [error_type]
         # Retrieve future and ensure it has finished
         future = net_info["request_future"]
@@ -425,58 +398,60 @@ def sherlock(
         elif any(hitMsg in r.text for hitMsg in WAFHitMsgs):
             query_status = QueryStatus.WAF
-        elif error_type == "message":
-            # error_flag True denotes no error found in the HTML
-            # error_flag False denotes error found in the HTML
-            error_flag = True
-            errors = net_info.get("errorMsg")
-            # errors will hold the error message
-            # it can be string or list
-            # by isinstance method we can detect that
-            # and handle the case for strings as normal procedure
-            # and if its list we can iterate the errors
-            if isinstance(errors, str):
-                # Checks if the error message is in the HTML
-                # if error is present we will set flag to False
-                if errors in r.text:
-                    error_flag = False
-            else:
-                # If it's list, it will iterate all the error message
-                for error in errors:
-                    if error in r.text:
-                        error_flag = False
-                        break
-            if error_flag:
-                query_status = QueryStatus.CLAIMED
-            else:
-                query_status = QueryStatus.AVAILABLE
-        elif error_type == "status_code":
-            error_codes = net_info.get("errorCode")
-            query_status = QueryStatus.CLAIMED
-            # Type consistency, allowing for both singlets and lists in manifest
-            if isinstance(error_codes, int):
-                error_codes = [error_codes]
-            if error_codes is not None and r.status_code in error_codes:
-                query_status = QueryStatus.AVAILABLE
-            elif r.status_code >= 300 or r.status_code < 200:
-                query_status = QueryStatus.AVAILABLE
-        elif error_type == "response_url":
-            # For this detection method, we have turned off the redirect.
-            # So, there is no need to check the response URL: it will always
-            # match the request. Instead, we will ensure that the response
-            # code indicates that the request was successful (i.e. no 404, or
-            # forward to some odd redirect).
-            if 200 <= r.status_code < 300:
-                query_status = QueryStatus.CLAIMED
-            else:
-                query_status = QueryStatus.AVAILABLE
         else:
-            # It should be impossible to ever get here...
-            raise ValueError(
-                f"Unknown Error Type '{error_type}' for " f"site '{social_network}'"
-            )
+            if any(errtype not in ["message", "status_code", "response_url"] for errtype in error_type):
+                error_context = f"Unknown error type '{error_type}' for {social_network}"
+                query_status = QueryStatus.UNKNOWN
+            else:
+                if "message" in error_type:
+                    # error_flag True denotes no error found in the HTML
+                    # error_flag False denotes error found in the HTML
+                    error_flag = True
+                    errors = net_info.get("errorMsg")
+                    # errors will hold the error message
+                    # it can be string or list
+                    # by isinstance method we can detect that
+                    # and handle the case for strings as normal procedure
+                    # and if its list we can iterate the errors
+                    if isinstance(errors, str):
+                        # Checks if the error message is in the HTML
+                        # if error is present we will set flag to False
+                        if errors in r.text:
+                            error_flag = False
+                    else:
+                        # If it's list, it will iterate all the error message
+                        for error in errors:
+                            if error in r.text:
+                                error_flag = False
+                                break
+                    if error_flag:
+                        query_status = QueryStatus.CLAIMED
+                    else:
+                        query_status = QueryStatus.AVAILABLE
+                if "status_code" in error_type and query_status is not QueryStatus.AVAILABLE:
+                    error_codes = net_info.get("errorCode")
+                    query_status = QueryStatus.CLAIMED
+                    # Type consistency, allowing for both singlets and lists in manifest
+                    if isinstance(error_codes, int):
+                        error_codes = [error_codes]
+                    if error_codes is not None and r.status_code in error_codes:
+                        query_status = QueryStatus.AVAILABLE
+                    elif r.status_code >= 300 or r.status_code < 200:
+                        query_status = QueryStatus.AVAILABLE
+                if "response_url" in error_type and query_status is not QueryStatus.AVAILABLE:
+                    # For this detection method, we have turned off the redirect.
+                    # So, there is no need to check the response URL: it will always
+                    # match the request. Instead, we will ensure that the response
+                    # code indicates that the request was successful (i.e. no 404, or
+                    # forward to some odd redirect).
+                    if 200 <= r.status_code < 300:
+                        query_status = QueryStatus.CLAIMED
+                    else:
+                        query_status = QueryStatus.AVAILABLE
         if dump_response:
             print("+++++++++++++++++++++")
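Review note: the hunk above replaces the single-type elif chain with a sequence of checks, so a site can list several detection types and is reported as claimed only if none of them flags the username as available. The standalone sketch below restates that flow outside of Sherlock. The names Status and classify are hypothetical, and treating a missing errorMsg as "nothing to match" is an assumption of the sketch, not behaviour confirmed by the diff.

```python
from enum import Enum


class Status(Enum):
    CLAIMED = "claimed"
    AVAILABLE = "available"
    UNKNOWN = "unknown"


KNOWN_TYPES = ("message", "status_code", "response_url")


def classify(error_type, status_code, text, error_msg=None, error_codes=None):
    """Hypothetical restatement of the combined detection flow."""
    # Accept a single string or a list, as the diff does for errorType.
    if isinstance(error_type, str):
        error_type = [error_type]
    if any(kind not in KNOWN_TYPES for kind in error_type):
        return Status.UNKNOWN

    status = Status.UNKNOWN
    if "message" in error_type:
        # Assumption: a missing errorMsg means there is nothing to match.
        messages = [error_msg] if isinstance(error_msg, str) else (error_msg or [])
        found = any(message in text for message in messages)
        status = Status.AVAILABLE if found else Status.CLAIMED
    if "status_code" in error_type and status is not Status.AVAILABLE:
        codes = [error_codes] if isinstance(error_codes, int) else error_codes
        status = Status.CLAIMED
        if codes is not None and status_code in codes:
            status = Status.AVAILABLE
        elif status_code >= 300 or status_code < 200:
            status = Status.AVAILABLE
    if "response_url" in error_type and status is not Status.AVAILABLE:
        # Redirects are disabled for this method, so only the status code matters.
        status = Status.CLAIMED if 200 <= status_code < 300 else Status.AVAILABLE
    return status


if __name__ == "__main__":
    print(classify(["message", "status_code"], 200, "user not found", error_msg="not found"))
```

Each later check runs only while the status is not yet AVAILABLE, so one "not found" signal is enough to settle the result.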
@@ -596,22 +571,6 @@ def main():
         dest="output",
         help="If using single username, the output of the result will be saved to this file.",
     )
-    parser.add_argument(
-        "--tor",
-        "-t",
-        action="store_true",
-        dest="tor",
-        default=False,
-        help="Make requests over Tor; increases runtime; requires Tor to be installed and in system path.",
-    )
-    parser.add_argument(
-        "--unique-tor",
-        "-u",
-        action="store_true",
-        dest="unique_tor",
-        default=False,
-        help="Make requests over Tor with new Tor circuit after each request; increases runtime; requires Tor to be installed and in system path.",
-    )
     parser.add_argument(
         "--csv",
         action="store_true",
@@ -720,11 +679,11 @@ def main():
     )
     parser.add_argument(
-        "--no-txt",
+        "--txt",
         action="store_true",
-        dest="no_txt",
+        dest="output_txt",
         default=False,
-        help="Disable creation of a txt file",
+        help="Enable creation of a txt file",
     )
     parser.add_argument(
@@ -742,7 +701,7 @@ def main():
     # Check for newer version of Sherlock. If it exists, let the user know about it
     try:
-        latest_release_raw = requests.get(forge_api_latest_release).text
+        latest_release_raw = requests.get(forge_api_latest_release, timeout=10).text
         latest_release_json = json_loads(latest_release_raw)
         latest_remote_tag = latest_release_json["tag_name"]
@@ -755,22 +714,10 @@ def main():
     except Exception as error:
         print(f"A problem occurred while checking for an update: {error}")
-    # Argument check
-    # TODO regex check on args.proxy
-    if args.tor and (args.proxy is not None):
-        raise Exception("Tor and Proxy cannot be set at the same time.")
     # Make prompts
     if args.proxy is not None:
         print("Using the proxy: " + args.proxy)
-    if args.tor or args.unique_tor:
-        print("Using Tor to make requests")
-        print(
-            "Warning: some websites might refuse connecting over Tor, so note that using this option might increase connection errors."
-        )
     if args.no_color:
         # Disable color output.
         init(strip=True, convert=False)
@@ -802,7 +749,7 @@ def main():
         if args.json_file.isnumeric():
             pull_number = args.json_file
             pull_url = f"https://api.github.com/repos/sherlock-project/sherlock/pulls/{pull_number}"
-            pull_request_raw = requests.get(pull_url).text
+            pull_request_raw = requests.get(pull_url, timeout=10).text
             pull_request_json = json_loads(pull_request_raw)
             # Check if it's a valid pull request
@@ -871,8 +818,6 @@ def main():
             username,
             site_data,
             query_notify,
-            tor=args.tor,
-            unique_tor=args.unique_tor,
             dump_response=args.dump_response,
             proxy=args.proxy,
             timeout=args.timeout,
@@ -888,7 +833,7 @@ def main():
         else:
             result_file = f"{username}.txt"
-        if not args.no_txt:
+        if args.output_txt:
             with open(result_file, "w", encoding="utf-8") as file:
                 exists_counter = 0
                 for website_name in results:
@@ -973,8 +918,8 @@ def main():
                 {
                     "username": usernames,
                     "name": names,
-                    "url_main": url_main,
-                    "url_user": url_user,
+                    "url_main": [f'=HYPERLINK(\"{u}\")' for u in url_main],
+                    "url_user": [f'=HYPERLINK(\"{u}\")' for u in url_user],
                     "exists": exists,
                     "http_status": http_status,
                     "response_time_s": response_time_s,
+3 -8
@@ -8,7 +8,7 @@ import requests
 import secrets
-MANIFEST_URL = "https://raw.githubusercontent.com/sherlock-project/sherlock/master/sherlock_project/resources/data.json"
+MANIFEST_URL = "https://data.sherlockproject.xyz"
 EXCLUSIONS_URL = "https://raw.githubusercontent.com/sherlock-project/sherlock/refs/heads/exclusions/false_positive_exclusions.txt"
 class SiteInformation:
@@ -121,15 +121,10 @@ class SitesInformation:
             # users from creating issue about false positives which has already been fixed or having outdated data
             data_file_path = MANIFEST_URL
-        # Ensure that specified data file has correct extension.
-        if not data_file_path.lower().endswith(".json"):
-            raise FileNotFoundError(f"Incorrect JSON file extension for data file '{data_file_path}'.")
-        # if "http://" == data_file_path[:7].lower() or "https://" == data_file_path[:8].lower():
         if data_file_path.lower().startswith("http"):
             # Reference is to a URL.
             try:
-                response = requests.get(url=data_file_path)
+                response = requests.get(url=data_file_path, timeout=30)
             except Exception as error:
                 raise FileNotFoundError(
                     f"Problem while attempting to access data file URL '{data_file_path}': {error}"
@@ -166,7 +161,7 @@ class SitesInformation:
         if honor_exclusions:
             try:
-                response = requests.get(url=EXCLUSIONS_URL)
+                response = requests.get(url=EXCLUSIONS_URL, timeout=10)
                 if response.status_code == 200:
                     exclusions = response.text.splitlines()
                     exclusions = [exclusion.strip() for exclusion in exclusions]
+47
@@ -0,0 +1,47 @@
"""Tests for handling usernames with special/unicode characters."""
+from concurrent.futures import Future
+from sherlock_project.sherlock import get_response
+def _make_future_with_exception(exc):
+    """Create a Future that raises the given exception."""
+    future = Future()
+    future.set_exception(exc)
+    return future
+def test_get_response_handles_unicode_decode_error():
+    """Regression test for issue #2730.
+    Usernames with special characters (e.g. 'Émile') can trigger a
+    UnicodeDecodeError inside the requests library during redirect
+    handling. This must not crash the program.
+    """
+    future = _make_future_with_exception(
+        UnicodeDecodeError("utf-8", b"\xe9", 0, 1, "invalid continuation byte")
+    )
+    response, error_context, exception_text = get_response(
+        request_future=future,
+        error_type=["status_code"],
+        social_network="TestSite",
+    )
+    assert response is None
+    assert error_context == "Encoding Error"
+    assert "utf-8" in exception_text
+def test_get_response_handles_unicode_encode_error():
+    """UnicodeEncodeError should also be caught (subclass of UnicodeError)."""
+    future = _make_future_with_exception(
+        UnicodeEncodeError("ascii", "É", 0, 1, "ordinal not in range(128)")
+    )
+    response, error_context, exception_text = get_response(
+        request_future=future,
+        error_type=["status_code"],
+        social_network="TestSite",
+    )
+    assert response is None
+    assert error_context == "Encoding Error"
+    assert "ascii" in exception_text
+3 -3
@@ -4,7 +4,7 @@ from sherlock_interactives import Interactives
 from sherlock_interactives import InteractivesSubprocessError
 def test_remove_nsfw(sites_obj):
-    nsfw_target: str = 'Pornhub'
+    nsfw_target: str = 'Xvideos'
     assert nsfw_target in {site.name: site.information for site in sites_obj}
     sites_obj.remove_nsfw_sites()
     assert nsfw_target not in {site.name: site.information for site in sites_obj}
@@ -12,8 +12,8 @@ def test_remove_nsfw(sites_obj):
 # Parametrized sites should *not* include Motherless, which is acting as the control
 @pytest.mark.parametrize('nsfwsites', [
-    ['Pornhub'],
-    ['Pornhub', 'Xvideos'],
+    ['Xvideos'],
+    ['Xvideos', 'Erome'],
 ])
 def test_nsfw_explicit_selection(sites_obj, nsfwsites):
     for site in nsfwsites:
+1
@@ -16,6 +16,7 @@ def set_pattern_upper_bound(pattern: str, upper_bound: int = FALSE_POSITIVE_QUAN
"""Set upper bound for regex patterns that use quantifiers such as `+` `*` or `{n,}`.""" """Set upper bound for regex patterns that use quantifiers such as `+` `*` or `{n,}`."""
     def replace_upper_bound(match: re.Match) -> str: # type: ignore
         lower_bound: int = int(match.group(1)) if match.group(1) else 0 # type: ignore
+        nonlocal upper_bound
         upper_bound = upper_bound if lower_bound < upper_bound else lower_bound # type: ignore # noqa: F823
         return f'{{{lower_bound},{upper_bound}}}'
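Review note: the one-line change above matters because assigning to upper_bound inside the nested function makes Python treat the name as local to that function, so reading it on the right-hand side raises UnboundLocalError unless it is declared nonlocal. A reduced sketch of the pattern follows. The function cap_open_quantifiers and its regex are hypothetical and handle only the `{n,}` form, not `+` or `*`.

```python
import re


def cap_open_quantifiers(pattern: str, upper_bound: int = 100) -> str:
    """Rewrite open-ended `{n,}` quantifiers as bounded `{n,m}`.

    Hypothetical, reduced version of the helper in the diff.
    """

    def replace(match: re.Match) -> str:
        # Without this declaration the assignment below would make
        # upper_bound a local name and the read would fail.
        nonlocal upper_bound
        lower_bound = int(match.group(1)) if match.group(1) else 0
        upper_bound = upper_bound if lower_bound < upper_bound else lower_bound
        return f"{{{lower_bound},{upper_bound}}}"

    return re.sub(r"\{(\d*),\}", replace, pattern)


if __name__ == "__main__":
    print(cap_open_quantifiers("a{2,}", 5))
```

Because the name is shared with the enclosing call, a raised bound carries over to later matches within the same call: in this sketch, "a{9,}b{2,}" with a bound of 5 becomes "a{9,9}b{2,9}".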