bash getopts: manage command line options in your scripts

In multi-stage shell scripts, it is often useful to break the script into distinct functions, each handling one specific task. Production tends to be a messy place, and scripts can, and sometimes do, break midway.

A system operator might need to only re-run a part of the script, or run it again from a specific point, without starting over from the beginning. This is the situation where structured CLI options become very useful.

Wouldn’t it be nice to just be able to add a flag (e.g. -i = insert, -d = download etc.) to your script for each function you want to run? That’s where getopts comes to the rescue.

But before diving in…

The “traditional” way

A lot of script authors use a while loop that runs as long as $# is non-zero, and call shift to remove each argument from the list once it has been looked at.

Here’s what that would look like for a hypothetical example script:

while [ $# -gt 0 ]; do
    OPT=$1
    shift
    case $OPT in
        -d)
            provider="$1"
            download_src_files "$provider" ;;
        -i)
            target="$1"
            insert_from_csv "$target" ;;
        *)
            # Anything else is either an option argument or an unknown option
            if [ "${OPT:0:1}" = "-" ]; then
                echo "Unknown option: $OPT"
            fi
            ;;
    esac
done

Of course this works, but it’s clunky and hard to read, especially as the number of options grows.

There are a few evident issues here:

-> Nothing handles the situation where the system operator forgets to specify any options at all. In fact, checking $? would report that the script ran successfully!

-> This construct doesn’t check whether an argument was provided with each option. That has to be done manually, either here or within the relevant function(s), adding further to that clunky feel.

-> Handling the shifting of option arguments manually is difficult to follow mentally, especially if returning to refactor the script at a later date.

-> There’s no innate way to tell whether something is an option or an option argument, so if we want to check whether the system operator has tried to use an invalid option, we must build the if statement to check for this ourselves.

Let’s break down what actually happens here:

  1. The while loop keeps going as long as there are CLI arguments remaining ($# is greater than zero).
  2. On each pass of the loop, we save the first argument as an option ($OPT), and remove it from the list of arguments using shift (this is destructive, i.e. the argument is permanently removed from the list of CLI arguments).
  3. Make a decision using the case statement: if $OPT matches one of the known cases (like -d or -i), call the relevant function with the next argument ($1) as its option argument.
  4. If it doesn’t match, check whether it starts with - (anything else is treated as an option argument and skipped), and if so print an error stating that the option wasn’t recognized.
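To make the missing-argument problem concrete, here is a rough sketch of the extra checking the hand-rolled loop would need. The names parse_manually and require_arg are hypothetical, and the echo lines stand in for download_src_files and insert_from_csv. Unlike the loop above, this variant also shifts the option argument away and treats any unexpected word as an error.

```bash
#!/usr/bin/env bash
# Sketch only: parse_manually and require_arg are hypothetical names.

# Fail unless the option named in $1 is followed by a usable argument ($2).
require_arg() {
    if [ -z "${2:-}" ] || [ "${2:0:1}" = "-" ]; then
        echo "Option $1 requires an argument" >&2
        return 1
    fi
}

parse_manually() {
    while [ $# -gt 0 ]; do
        local opt=$1
        shift
        case $opt in
            -d) require_arg "$opt" "${1:-}" || return 1
                echo "download: $1"   # stand-in for download_src_files
                shift ;;              # consume the option argument as well
            -i) require_arg "$opt" "${1:-}" || return 1
                echo "insert: $1"     # stand-in for insert_from_csv
                shift ;;
            *)  echo "Unknown option: $opt" >&2
                return 1 ;;
        esac
    done
}

parse_manually -d provider_a -i table_b
```

Even in this small form, every option that takes an argument needs its own check and its own extra shift.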

Using getopts

Now let’s reimagine the above situation using getopts:

while getopts "d:i:" OPT; do
    case $OPT in
        d) provider="$OPTARG"
           download_src_files "$provider" ;;
        i) target="$OPTARG"
           insert_from_csv "$target" ;;
    esac
done

The following facts are immediately evident:

-> This now looks much cleaner, and we can even drop the *) case, because getopts itself prints an error message when the system operator calls the script with an invalid option. Note that getopts only reports the problem and sets $OPT to ?; the loop then carries on, so a \?) case is still needed if the script should stop at that point.

-> getopts also checks whether an option that requires an argument was actually given one, and reports an error in the same way if not.

It’s easy to see the benefits: cleaner code and a lot less mental overhead.
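One caveat is worth sketching: after getopts reports an invalid option or a missing argument, the name variable holds ? and the loop simply continues. If the script should stop instead, a \?) case can handle that. The wrapper function parse_opts, the usage text, and the echo stand-ins below are hypothetical; a real script would call download_src_files and insert_from_csv.

```bash
#!/usr/bin/env bash
# Sketch only: parse_opts is a hypothetical wrapper, and the echo lines stand
# in for download_src_files and insert_from_csv.
parse_opts() {
    local OPT OPTIND=1   # keep the getopts position local to this function
    while getopts "d:i:" OPT; do
        case $OPT in
            d) echo "download: $OPTARG" ;;
            i) echo "insert: $OPTARG" ;;
            \?) # getopts has already printed its own error message
                echo "Usage: script.sh [-d provider] [-i target]" >&2
                return 1 ;;
        esac
    done
}

parse_opts -d provider_a -i table_b
parse_opts -x || echo "parsing failed with status $?"
```

Wrapping the loop in a function is only a convenience for trying it out; at the top level of a script, exit 1 would take the place of return 1.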

Let’s break down what actually happens here:

First of all, you’re probably wondering what this cryptic "d:i:" is all about.

This string is called the optstring, and it contains the option characters to be recognized by getopts. If a character is followed by a : it means that getopts should expect that option to have an argument, which should be supplied following the option. I.e. "d:" means that when using the option -d, we must also supply an argument, e.g. -d something.

If we want to add a new option that does not require an argument, we simply add its character to the optstring without a : after it. E.g. if we add a cleanup function to our hypothetical script:

while getopts "d:i:c" OPT; do
    case $OPT in
        d) provider="$OPTARG"
           download_src_files "$provider" ;;
        i) target="$OPTARG"
           insert_from_csv "$target" ;;
        c) run_cleanup ;;
    esac
done

The OPT following the optstring is referred to as the name, and it’s simply the name of the bash variable in which getopts stores the current option on each pass of the loop. If a : follows an option in the optstring, then the value of the option argument will be stored in $OPTARG.
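As a quick way to see these two variables in action, the following sketch prints them on every pass of the loop. The helper show_opts and the sample arguments are hypothetical, and "none" is just a placeholder printed when $OPTARG is unset or empty.

```bash
#!/usr/bin/env bash
# Sketch only: show_opts is a hypothetical helper that reports what getopts
# stores in the name variable (OPT) and in OPTARG on each pass of the loop.
show_opts() {
    local OPT OPTIND=1
    while getopts "d:i:c" OPT; do
        # Options without a ':' in the optstring carry no argument,
        # so OPTARG has nothing to show for them.
        echo "OPT=$OPT OPTARG=${OPTARG:-none}"
    done
}

show_opts -d provider_a -c -i table_b
```

Note that $OPT holds the bare option character (d, not -d), which is why the case patterns have no leading dash.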

There’s also a variable we’ve not yet touched on, which is $OPTIND. This variable holds the index of the next argument to be processed. This is a variable we can use to fix a problem we’ve almost forgotten about: the system operator calling the script without any options.

Much like before, calling the script with no options would pass, and $? would report 0. The difference is that we can now easily catch this with $OPTIND. getopts starts $OPTIND at 1 (the index of the first argument; index 0 is the script itself) and advances it as arguments are consumed, so its value should be more than 1 if any options were present:

if [[ $OPTIND -eq 1 ]]; then
    echo "$0: This script should be called with at least one option" >&2
    exit 1
fi

This simple check is all we need. If $OPTIND is still 1 after the while loop, getopts did not consume a single argument, which means that no options were given at all. (An invalid option still advances $OPTIND, but getopts reports that case on its own.)
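Because $OPTIND ends up pointing at the first argument getopts did not consume, it is also commonly used to shift the parsed options away and get at whatever follows them. The sketch below combines that with the check above; parse_then_rest, its messages, and the sample arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: parse_then_rest is a hypothetical wrapper used to show OPTIND.
parse_then_rest() {
    local OPT OPTIND=1
    while getopts "d:c" OPT; do
        case $OPT in
            d) echo "provider: $OPTARG" ;;
            c) echo "cleanup requested" ;;
            \?) return 1 ;;
        esac
    done
    # OPTIND never moved, so no options were given.
    if [[ $OPTIND -eq 1 ]]; then
        echo "at least one option is required" >&2
        return 1
    fi
    # Drop everything getopts consumed; what is left are plain arguments.
    shift $((OPTIND - 1))
    echo "remaining arguments: $*"
}

parse_then_rest -d provider_a -c file1 file2
parse_then_rest file1 || echo "rejected: no options given"
```

getopts stops at the first word that is not an option, which is why file1 and file2 survive the shift in the first call.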

Multiple option arguments

If you need to pass multiple arguments to one of your options, it is possible, though the approach is far from intuitive, given that getopts expects one option argument per option.

We can circumvent this by redefining $OPTARG as an array.

Once we have $OPTARG as an array, we just need to check the contents of the input, and keep adding new elements until we hit something that starts with - (since that would be the next option).

We can then call a small helper function from whichever case needs the extra arguments.

Here’s how that looks:

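function getopts_extra_args() {
    # Call as: getopts_extra_args "$@" (it needs the script's own arguments)
    declare -i i=1
    # Keep only the regular argument, so extras from an earlier option cannot linger
    OPTARG=("$OPTARG")
    # Collect arguments until the end of the list or the next option
    while [[ $OPTIND -le $# && ${!OPTIND:0:1} != "-" ]]; do
        OPTARG[i]=${!OPTIND}
        (( i++, OPTIND++ ))
    done
}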

Note that the first element of OPTARG (index 0) holds what $OPTARG would normally be, and any additional values are stored from index 1 onwards. Therefore, when we loop over this array, we need to start from index 1 if we want only the additional arguments.
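To see the layout of the array for yourself, here is a small experiment. collect_demo is a hypothetical helper, and getopts_extra_args is repeated so the snippet runs on its own, with two assumed tweaks: a [[ ]] test, and a reset of OPTARG to a single element before collecting.

```bash
#!/usr/bin/env bash
# getopts_extra_args is repeated here so this snippet runs on its own.
getopts_extra_args() {
    declare -i i=1
    OPTARG=("$OPTARG")   # assumed tweak: start from the regular argument only
    while [[ $OPTIND -le $# && ${!OPTIND:0:1} != "-" ]]; do
        OPTARG[i]=${!OPTIND}
        (( i++, OPTIND++ ))
    done
}

# Hypothetical helper: prints each element of OPTARG with its index.
collect_demo() {
    local OPT idx
    OPTIND=1   # restart parsing so the function can be called more than once
    while getopts "d:c" OPT; do
        case $OPT in
            d) getopts_extra_args "$@"
               for idx in "${!OPTARG[@]}"; do
                   echo "OPTARG[$idx]=${OPTARG[idx]}"
               done ;;
            c) echo "cleanup" ;;
        esac
    done
}

collect_demo -d example setting1 setting2 -c
```

Since the helper stops at the first word starting with -, an extra argument can never begin with a dash under this scheme.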

With that kept in mind, in the affected case we can call this function:

while getopts "d:i:c" OPT; do
    case $OPT in
        d) getopts_extra_args "$@"
           extra_settings=("${OPTARG[@]}")
           provider="$OPTARG"
           download_src_files "$provider" ;;
        i) target="$OPTARG"
           insert_from_csv "$target" ;;
        c) run_cleanup ;;
    esac
done

This now gives download_src_files access to multiple arguments when the -d option is used:

function download_src_files() {
   extra_settings_len=${#extra_settings[@]}
   # Start at 1: index 0 holds the regular option argument
   for (( x=1; x<extra_settings_len; x++ )); do
       echo "${extra_settings[x]}"
   done
}

Let’s try it out, assuming we have an example function like this:

function download_src_files() {
   echo "${FUNCNAME[0]}"
   echo "$1"
   extra_settings_len=${#extra_settings[@]}
   # Start at 1: index 0 holds the regular option argument
   for (( x=1; x<extra_settings_len; x++ )); do
       echo "${extra_settings[x]}"
   done
}

Running bash script.sh -d example setting1 setting2 should output:

download_src_files
example
setting1
setting2

If you need multiple arguments for the -i flag as well, simply do the same thing, but store them in a new variable other than extra_settings:

while getopts "d:i:c" OPT; do
    case $OPT in
        d) getopts_extra_args "$@"
           extra_settings=("${OPTARG[@]}")
           provider="$OPTARG"
           download_src_files "$provider" ;;
        i) getopts_extra_args "$@"
           extra_settings2=("${OPTARG[@]}")
           target="$OPTARG"
           insert_from_csv "$target" ;;
        c) run_cleanup ;;
    esac
done

Adding an example function:

function insert_from_csv() {
   echo "${FUNCNAME[0]}"
   echo "$1"
   extra_settings2_len=${#extra_settings2[@]}
   # Start at 1: index 0 holds the regular option argument
   for (( x=1; x<extra_settings2_len; x++ )); do
       echo "${extra_settings2[x]}"
   done
}

Running bash script.sh -d example setting1 setting2 -i example2 setting3 setting4 should output:

download_src_files
example
setting1
setting2
insert_from_csv
example2
setting3
setting4

This can be done for as many options as needed.

I hope this post has convinced you of the utility of getopts, and that it will be a valuable addition to your future scripts.