Scrubbing GitHub History of Sensitive Data


Passwords, API tokens, secret access keys, license keys, and the like are essential for development, and sometimes we end up committing them to GitHub. It may happen accidentally, or we may do it intentionally for testing and plan to remove them later. But here is the catch: even if you remove that sensitive data in a later commit, it is still visible in your commit history.

For example, I added my AWS credentials to a repo, then later removed them from the credentials file and committed again. Even so, all my credentials remain exposed in the commit history.

There are several situations like this where you need to scrub your repo: find the commits holding sensitive data, or even remove that data from them. Here are two tools that can help you do that.

  1. TruffleHog

  2. BFG Repo-Cleaner


Prerequisites

To follow along, create a new repo just for testing these tools before trying them on a real repo. In the new repo, do the following to create a commit history that contains sensitive data.

  1. Add a file named aws-credentials containing some credentials that you have generated, like the ones below, and commit it to the repo.

     {
       "account": "012345698710",
       "aws_access_key_id": "AKIAUZEKTKQIBVANVHDN",
       "aws_secret_access_key": "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"
     }
    
  2. Clear the values of the credential keys and commit the change to the repo.

  3. You now have a commit history that contains credentials.
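If you prefer to script these steps, here is a minimal sketch that builds such a repo with plain git. The temp directory, the commit messages, and the test identity passed with -c are my own placeholders, not something the tools require. The last command lists the commits that added or removed the access key, which shows the key is still in history even though the file is now empty.

```shell
# Build a throwaway repo whose history contains the sample credentials.
set -eu

REPO_DIR="$(mktemp -d)/scrub-test"   # hypothetical location; any empty directory works
mkdir -p "$REPO_DIR"
cd "$REPO_DIR"
git init -q

# Pass the identity per command so this also works without a global git config.
commit() {
  git -c user.name="Test User" -c user.email="test@example.com" commit -q -m "$1"
}

# Commit 1: credentials present in the file.
cat > aws-credentials <<'EOF'
{
  "account": "012345698710",
  "aws_access_key_id": "AKIAUZEKTKQIBVANVHDN",
  "aws_secret_access_key": "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"
}
EOF
git add aws-credentials
commit "Add AWS credentials"

# Commit 2: same keys, values cleared.
cat > aws-credentials <<'EOF'
{
  "account": "",
  "aws_access_key_id": "",
  "aws_secret_access_key": ""
}
EOF
git add aws-credentials
commit "Remove AWS credentials"

# -S selects commits whose diff adds or removes the given literal string.
git log --oneline -S 'AKIAUZEKTKQIBVANVHDN'
```

To try the tools against GitHub, you would still need to push this repo to a remote of your own.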


➡️ Getting Started with TruffleHog

TruffleHog goes through the entire commit history of each branch and checks each commit for secrets or credentials. It has over 700 credential detectors that identify potential credentials and the service they belong to.
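To get a feel for what "going through the commit history" means, here is a rough stand-in that uses only git and grep. It is not how TruffleHog works internally and it covers a single pattern: the commonly documented shape of an AWS access key ID (AKIA followed by 16 uppercase letters or digits). The function name and the demo repo are hypothetical, made up for this sketch.

```shell
# Hypothetical helper (not part of TruffleHog): print added lines in any commit
# that look like an AWS access key ID.
scan_history_for_aws_keys() {
  repo="$1"
  for commit in $(git -C "$repo" rev-list --all); do
    # --format= suppresses the commit header so only the diff is searched.
    hits=$(git -C "$repo" show --format= "$commit" | grep -E '^\+.*AKIA[0-9A-Z]{16}' || true)
    if [ -n "$hits" ]; then
      printf 'commit %s\n%s\n' "$commit" "$hits"
    fi
  done
}

# Demo on a throwaway repo: add a sample key, then delete it in a second commit.
DEMO_REPO="$(mktemp -d)"
git init -q "$DEMO_REPO"
demo_commit() {
  git -C "$DEMO_REPO" add -A
  git -C "$DEMO_REPO" -c user.name="Test User" -c user.email="test@example.com" commit -q -m "$1"
}
echo 'aws_access_key_id = AKIAUZEKTKQIBVANVHDN' > "$DEMO_REPO/aws-credentials"
demo_commit "Add key"
echo 'aws_access_key_id =' > "$DEMO_REPO/aws-credentials"
demo_commit "Remove key"

# Reports the commit that added the key, even though the current file no longer has it.
scan_history_for_aws_keys "$DEMO_REPO"
```

A real scanner adds many more patterns plus verification against the provider, which is why a dedicated tool is the better choice for anything beyond a quick check.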

  1. Install TruffleHog from the command line.

     brew install trufflesecurity/trufflehog/trufflehog
    

    For other installation methods, refer to the TruffleHog documentation.

  2. Scan the repo for potential secrets and credentials.

     trufflehog git <GITHUB REPO URL> --only-verified
    

    This returns a list of the commits that contain secrets, along with the secret each one holds.

  3. 📌 NOTE: The --only-verified flag limits the results to secrets that TruffleHog has verified as still active. For example, the AWS credentials that we added are still active on my AWS account, so they show up as verified results. If I deleted those credentials from my AWS account, they would count as unverified and would no longer appear when this flag is set.
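You can also scan a local clone instead of a GitHub URL, which is handy for checking a repo before you push it. The sketch below assumes the file:// form of the git scanner and the --json flag from TruffleHog v3; REPO_PATH, SCAN_STATUS, and the output file name are placeholders of mine. As far as I know, findings go to stdout and log messages go to stderr, so the redirect should capture only the findings.

```shell
# Assumption: REPO_PATH points at a local clone; defaults to the current directory.
REPO_PATH="${REPO_PATH:-$PWD}"

if command -v trufflehog >/dev/null 2>&1; then
  # file:// tells the git scanner to read a local repo instead of cloning a URL.
  trufflehog git "file://$REPO_PATH" --only-verified --json > trufflehog-findings.jsonl
  SCAN_STATUS="scanned"
  echo "Findings written to trufflehog-findings.jsonl (one JSON object per line)"
else
  SCAN_STATUS="skipped"
  echo "trufflehog not found on PATH; install it first" >&2
fi
```

Drop --only-verified if you also want to see secrets that are no longer active.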


➡️ Getting Started with BFG Repo-Cleaner

BFG Repo-Cleaner is an alternative to git-filter-branch. It cleans your repo by removing large unnecessary files or by removing passwords, credentials, and other private data from its history.

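  1. Install BFG Repo-Cleaner. It is distributed as a Java jar, so you need Java installed.

     # using Homebrew
     brew install bfg
    
     # or download the BFG jar from the BFG Repo-Cleaner site and
     # use "java -jar bfg.jar" wherever this post uses "bfg"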
    
  2. Clone the repo locally and create a new file named passwords.txt just outside the repo. Each line you write in this file is a string that BFG will search for in the repo's commits and replace in the next step.

    I am going to put parts of my AWS credentials (the account number and the access keys) in this file. A partial string is enough to find them in the repo, and only the matching part gets replaced.

     0123456
     AKIAUZEKT
     7392+PCu1MuEkL
    
  3. cd into your repo in the terminal, then run this command to scan the repo and replace every string listed in passwords.txt with ***REMOVED*** in all the dirty commits.

     bfg --replace-text ../passwords.txt
    

    The output tells you, among other things, which commits contained the credentials, where they were replaced, and which commits were protected. By default BFG protects your latest commit (HEAD) and leaves its contents untouched, so make sure the secrets are already removed there.

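  4. Now we need to push these changes to the GitHub repo so that the rewritten commit history shows up there. First expire the reflog and garbage-collect so the old commits are dropped locally, then force-push.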

     git reflog expire --expire=now --all && git gc --prune=now --aggressive
     git push -f
    

    Now the changes are visible in our commits: the credentials are replaced with ***REMOVED***.
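Before trusting the rewrite, it is worth checking that none of the strings from passwords.txt are left in history. Here is a small sketch using plain git; the function name leftover_secrets is hypothetical, and it relies on git log -S, which only reports commits whose diff adds or removes the string, so treat a clean result as a good sign and not as proof.

```shell
# Hypothetical helper: print every string from the list file that still shows up
# in a commit reachable from any ref. Returns 0 if none are found, 1 otherwise.
leftover_secrets() {
  repo="$1"
  list="$2"
  found=0
  while IFS= read -r needle || [ -n "$needle" ]; do
    [ -z "$needle" ] && continue
    # -S treats the needle as a literal string, so characters like + are safe.
    if [ -n "$(git -C "$repo" log --all --oneline -S"$needle")" ]; then
      echo "still in history: $needle"
      found=1
    fi
  done < "$list"
  return "$found"
}

# Usage from inside the cleaned repo, with passwords.txt one level up as in the steps above.
if [ -f ../passwords.txt ] && git rev-parse --git-dir >/dev/null 2>&1; then
  leftover_secrets . ../passwords.txt && echo "history is clean"
fi
```

Run it after the force-push, because --all also walks the remote-tracking branches, and those may still point at the old commits until the push updates them.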


Thank you for reading this blog, do like👍, and share it on socials if you found it informative!!

🚀 Feel free to contact me on my socials if you get stuck anywhere: LinkedIn, GitHub, Twitter

🧑‍💻Happy Learning !!👩‍💻