Passwords, API tokens, secret access keys, license keys, and similar secrets are crucial for development, and sometimes we end up committing them to GitHub. It may happen accidentally, or we may do it intentionally for testing and plan to remove them later. But here is the catch: even if you remove that sensitive data from your files, it will still be visible in your commit history.
Here you can see that I added my AWS credentials to the repo, then removed them from the credentials file and committed again. Still, all my credentials are exposed in the commit history.
There are several situations like this where you need to scrub your repo: find the commits holding sensitive data and rewrite or remove them. Here are two tools that can help you do that.
✅ Prerequisites
To follow along, create a new repo just for testing these tools before trying them on an actual repo. In the new repo, do the following to create a commit history that contains sensitive data.
Add a file named aws-credentials containing some throwaway credentials like the ones below, and commit it to the repo.
"account": "012345698710",
"aws_access_key_id": "AKIAUZEKTKQIBVANVHDN",
"aws_secret_access_key": "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"
Empty the values of the credential keys and commit the change to the repo.
So, now you have a commit history that has credentials in it.
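To see why the second commit does not undo the leak, here is a minimal sketch of a commit history as a list of snapshots. It is a toy model for illustration only, not Git's real storage format; `ToyRepo` and its methods are made-up names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy stand-in for a repo: every commit keeps its own snapshot of the files."""
    commits: list = field(default_factory=list)

    def commit(self, message: str, files: dict) -> None:
        # Copy the dict so later edits cannot change an earlier snapshot.
        self.commits.append((message, dict(files)))

    def head(self) -> dict:
        # The latest snapshot, i.e. what you see when browsing the repo.
        return self.commits[-1][1]

    def history_contains(self, needle: str) -> bool:
        # Search every snapshot in the history, not just the latest one.
        return any(
            needle in content
            for _, snapshot in self.commits
            for content in snapshot.values()
        )


repo = ToyRepo()
repo.commit("add credentials", {"aws-credentials": '"aws_access_key_id": "AKIAUZEKTKQIBVANVHDN"'})
repo.commit("remove credentials", {"aws-credentials": '"aws_access_key_id": ""'})

leaked_in_head = "AKIAUZEKT" in repo.head()["aws-credentials"]
leaked_in_history = repo.history_contains("AKIAUZEKT")
print(leaked_in_head, leaked_in_history)
```

The latest snapshot is clean, but the first commit still holds the key, which is the situation the tools below are meant to find and fix.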
➡️ Getting Started with TruffleHog
TruffleHog goes through the entire commit history of each branch and checks every commit for secrets and credentials. It has over 700 credential detectors that identify which data may be a credential and which service it belongs to.
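As a rough idea of what a single detector does, here is a minimal sketch that looks for strings shaped like an AWS access key ID. The pattern is a simplified assumption (the `AKIA` prefix followed by 16 uppercase letters or digits), and `find_access_key_ids` is a hypothetical helper, not part of TruffleHog.

```python
import re

# Assumed, simplified pattern: "AKIA" followed by 16 uppercase letters or digits.
# Real detectors cover many more key formats and providers.
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_access_key_ids(text: str) -> list:
    """Return every substring of text that looks like an AWS access key ID."""
    return AWS_ACCESS_KEY_ID.findall(text)


committed = (
    '"account": "012345698710", '
    '"aws_access_key_id": "AKIAUZEKTKQIBVANVHDN", '
    '"aws_secret_access_key": "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"'
)
emptied = '"account": "", "aws_access_key_id": "", "aws_secret_access_key": ""'

found_before = find_access_key_ids(committed)
found_after = find_access_key_ids(emptied)
print(found_before, found_after)
```

A pattern match only says that something looks like a key. Whether it is a live credential is a separate question.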
Install TruffleHog from the command line.
brew install trufflesecurity/trufflehog/trufflehog
For other installation methods, refer to the TruffleHog documentation.
Scan the repo for potential secrets and credentials.
trufflehog git <GITHUB REPO URL> --only-verified
This returns a list of the dirty commits and the secrets they hold.
📌 NOTE: The --only-verified flag limits the output to secrets that TruffleHog has verified as still active. For example, the AWS credentials that we added are still active on my AWS account, so they appear under verified results. If I delete those credentials from my AWS account, they will fall under unverified results because they are no longer active, and this flag will hide them. Run the scan without the flag to see unverified results as well.
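The verified/unverified split can be pictured with a minimal sketch like the one below. The `is_active` callback is a hypothetical stand-in for a live check against the provider, and the second key is a made-up value. None of this is TruffleHog's own code.

```python
def split_by_verification(findings, is_active):
    """Partition candidate secrets into (verified, unverified) lists."""
    verified, unverified = [], []
    for secret in findings:
        # A secret counts as verified only if the provider still accepts it.
        (verified if is_active(secret) else unverified).append(secret)
    return verified, unverified


# Hypothetical stand-in for asking the provider whether a key still works.
still_active = {"AKIAUZEKTKQIBVANVHDN"}

findings = ["AKIAUZEKTKQIBVANVHDN", "AKIAOLDKEYDELETED000"]  # second key is made up
verified, unverified = split_by_verification(findings, lambda s: s in still_active)
print(verified, unverified)
```

In this picture, --only-verified corresponds to printing only the first list.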
➡️ Getting Started with BFG Repo-Cleaner
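BFG Repo-Cleaner is an alternative to git-filter-branch. It cleans your repo by removing large unnecessary files or by removing passwords, credentials, and other private data.
Install BFG Repo-Cleaner. It is a Java application, so you need a Java runtime installed. With Homebrew:
brew install bfg
Alternatively, download the BFG jar from the project site and run it with java -jar <path to the BFG jar> wherever the commands below use bfg.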
Clone the repo locally and create a new file named passwords.txt just outside the repo. Whatever you write in this file, one entry per line, is what BFG will search for in the repo's commits and replace in the following steps. I am going to put parts of my AWS credentials (the account number and access keys) in this file. A fragment of each value is enough to find it in the repo, and I only want to remove part of each value anyway.
0123456
AKIAUZEKT
7392+PCu1MuEkL
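To get a feel for what replacing only a fragment does, here is a minimal sketch of literal search-and-replace over one line of the credentials file. It only mimics the effect on text. BFG itself rewrites the Git objects in history, and `redact` is a hypothetical helper.

```python
def redact(text: str, needles: list, replacement: str = "***REMOVED***") -> str:
    """Replace every literal occurrence of each needle in text."""
    for needle in needles:
        text = text.replace(needle, replacement)
    return text


# The same fragments as in passwords.txt, one entry per line in the file.
needles = ["0123456", "AKIAUZEKT", "7392+PCu1MuEkL"]

line = '"account": "012345698710", "aws_access_key_id": "AKIAUZEKTKQIBVANVHDN"'
redacted = redact(line, needles)
print(redacted)
```

Only the matched fragment is replaced, so the tail of each value stays in the text. List the full value if you want nothing of it left behind.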
cd into your repo in the terminal, then run this command to scan the repo and replace the entries from passwords.txt with ***REMOVED*** in all the dirty commits.
bfg --replace-text ../passwords.txt
You may receive output like this 👇. It tells you, among other things, in which commits the credentials were found and replaced, and which commits were protected. By default, BFG protects your latest commit and leaves its contents untouched.
Now we need to push these changes to the GitHub repo so that they are reflected in its commit history.
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push -f
Now you can see the changes in our commits like this 👇
Thank you for reading this blog! Do like 👍 and share it on socials if you found it informative!
🚀 Feel free to contact me on my socials if you get stuck anywhere: LinkedIn, GitHub, Twitter
🧑💻Happy Learning !!👩💻