Use AI and Go to program a command-line predictor

Programming Snapshot – Smart Predictions with Go

© Lead Image © alphaspirit, 123RF.com

Article from Issue 263/2022
Author(s):

Because shell command sequences tend to recur, smart predictions can save you time typing. First, the shell keeps notes on what gets typed; then a Go program guesses the next command and runs it for you.

When I'm developing new Snapshot articles, I regularly catch myself typing the same commands in the terminal window time and time again. Text or code files modified by vi are sent to a staging area by git add foo.go, git commit feeds them to the local repository clone, and git push origin v1:v1 backs them up on the server. New builds of the Go source code in programming examples are triggered by the go build foo.go bar.go command, before tests are run by go test, and so on. Excessive typing like this needs to be automated. Because software development dinosaurs like myself keep fighting IDEs, I need a homegrown approach.

Although the shell history will find old commands, locating the one you need in this massive list and running it again requires some manual work. This is rarely worthwhile, because retyping is often quicker than browsing 10 entries up the list or using a search string. The key is that you normally type shell commands in a defined order. For example, vi edits a Go file, then git saves the results, and go build compiles them. Having learned this context, a smart tool would be quite capable of determining what comes next. Also, the command sequences I use seem to depend on the directory in which I run them. In a Go project, I use the commands I listed earlier. For a text project, I would possibly use others, such as make publish to generate HTML or PDF files.

If a tool had access to the sequence of commands I issued in the past, and to the directories in which I ran them, it could offer a good preselection of the commands likely to follow. In 90 percent of the cases, users would be able to find the next command there and run it again. A dash of artificial intelligence accelerates and improves the whole thing, too. Figure 1 shows an example of a flowchart for a shell session. The edges in the graph mark the transitions between the commands, and the percentages next to them give the probability (derived from the history file) of a certain transition taking place. All paths originating from a state therefore add up to 100 percent.

Figure 1: A typical workflow in the terminal during development work.
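To make the percentages in Figure 1 concrete, here is a minimal sketch of how raw counts could be turned into transition probabilities. The probabilities() helper and the sample counts are assumptions made for illustration; they do not appear in the listings of this article.

```go
package main

import (
	"fmt"
	"sort"
)

// probabilities converts the raw follow-up counts for one command into
// percentages. For a non-empty map, the values add up to 100
// (barring floating-point rounding).
func probabilities(counts map[string]int) map[string]float64 {
	total := 0
	for _, n := range counts {
		total += n
	}
	probs := make(map[string]float64, len(counts))
	if total == 0 {
		return probs
	}
	for cmd, n := range counts {
		probs[cmd] = 100 * float64(n) / float64(total)
	}
	return probs
}

func main() {
	// Hypothetical counts: what followed "vi foo.go" in the log.
	counts := map[string]int{
		"go build foo.go": 6,
		"git add foo.go":  3,
		"go test":         1,
	}
	probs := probabilities(counts)

	// Map iteration order is random in Go, so sort the keys first.
	cmds := make([]string, 0, len(probs))
	for cmd := range probs {
		cmds = append(cmds, cmd)
	}
	sort.Strings(cmds)
	for _, cmd := range cmds {
		fmt.Printf("%-16s %5.1f%%\n", cmd, probs[cmd])
	}
}
```

With these sample counts, the three edges leaving the vi state would carry 60, 30, and 10 percent.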

Logger and Predictor

To analyze which command sequences the user has typed in the shell so far, I first need a process to continuously log every single manually typed command. The Bash or Z shell (Zsh) history mechanisms are not suitable for this, because they at best record the commands themselves along with a timestamp [1]. For the predictor, however, I at least want the tool to include the directory in which the command was run for useful suggestions to be generated later.

The newer Zsh offers a preexec() hook for general interception of a typed command. I assigned a function body to the hook in line 4 of Listing 1. The shell always triggers it just before executing a command line and passes the contents of the command line to it as a string in the first parameter. My preexec() hook in turn calls the cmdhook() function defined directly before it. This function strings together the current time, the current directory, and the command line, separates the three components with spaces, and appends the result as a new line at the end of the .myhist.log file in my home directory. Listing 2 shows some entries that accumulated there after I spent some time writing this article.

Listing 1

zshrc.sh

01 cmdhook() {
02   echo "$(date +%s) $(pwd) $1" >>~/.myhist.log;
03 }
04 preexec() { cmdhook "$1"; }
05 function g() {
06   cmd=$(pick 3>&1 1>&2 2>&3);
07   cmdhook "$cmd";
08   eval "$cmd";
09 }

Listing 2

myhist.log

1653801083 /home/mschilli vi .zshrc
1653801106 /home/mschilli/git/articles/predict vi t.pnd
1653801863 /home/mschilli/git/articles/predict ls eg
1653801870 /home/mschilli/git/articles/predict vi ~/.myhist.log

Line 5 in Listing 1 defines the shell function g(), which I'll call later to receive suggestions from the shell for the next command to execute. I wanted the command to be just one letter in length in order to avoid typing, and "g" makes sense if you're programming in Go.

After setting g() in motion with the g command followed by the Enter key, the shell function calls the pick command (line 6). This is a Go program (which you can see starting in Listing 4) that scans the myhist.log file, using an algorithm to decide on a list of the most likely commands to follow the last one.

Listing 3

bashrc.sh

[[ -f ~/.bash-preexec.sh ]] && source ~/.bash-preexec.sh

Listing 4

history.go

01 package main
02 import (
03   "bufio"
04   "os"
05   "regexp"
06   "strings"
07 )
08 type HistEntry struct {
09   Cwd string
10   Cmd string
11 }
12 func history(histFile string) []HistEntry {
13   f, err := os.Open(histFile)
14   if err != nil {
15     panic(err)
16   }
17   defer f.Close()
18   hist := []HistEntry{}
19   scanner := bufio.NewScanner(f)
20   cmdSane := regexp.MustCompile(`^\S`)
21   for scanner.Scan() {
22     // epoch cwd cmd
23     flds := strings.SplitN(scanner.Text(), " ", 3)
24     if len(flds) != 3 ||
25       !cmdSane.MatchString(flds[2]) ||
26       flds[2] == "g" {
27       continue
28     }
29     hist = append(hist, HistEntry{
30       Cwd: flds[1], Cmd: flds[2],
31     })
32   }
33   if err := scanner.Err(); err != nil {
34     panic(err)
35   }
36   return hist
37 }

From the list of likely commands, the user selects the desired command using the arrow keys (or the vi mappings J and K) and then presses the Enter key (Figure 2). The shell then executes the selected command directly – it could hardly be more finger-friendly. To do so, the g() shell function captures the command string returned by pick and executes it with the built-in eval function.

Figure 2: The shell function g lists commands that will probably follow.

Teaching a New Dog Old Tricks

Using an old trick from a Snapshot column three years ago [2], the compiled Go program (pick) outputs the user menu to Stdout (file descriptor number 1, because the promptui Go library I used can't do it any other way) and lets the user pick an item. It finally outputs the choice to Stderr (file descriptor number 2), which the g() shell function in Listing 1 then receives. The wacky 3>&1 1>&2 2>&3 construction in line 6 swaps the two channels with the help of a temporary file descriptor number 3: The menu on Stdout goes to the terminal, while the choice on Stderr lands on Stdout, where the $(...) command substitution captures it in the shell's cmd variable. Last but not least, eval then takes the variable and executes the string it contains (line 8).

Figure 2 shows the predictive shell tool in action. For historical reasons, I still write articles in the plain new documentation (PND) format, which borrows slightly from Perl's plain old documentation (POD) format. After editing the article text in t.pnd, I call g, which offers the most likely subsequent commands for selection based on the shell history gleaned from myhist.log. These commands include git add for the text file, make (an alias named m for me) to generate an article from it, a re-edit of the file with vi, and finally the command

git add -p .

that I often use to interactively promote modified file contents to the staging area.

However, instead of Zsh, Linux distributions traditionally tend to use Bash, which does not offer the preexec() hook used by my new logging component. Lucky for me that someone on GitHub has gone to the trouble of porting this eminently useful function to Bash [3]. As the first step, I installed the shell script stored on GitHub as .bash-preexec.sh in my home directory. To run it on new logins to the shell, I then inserted the line from Listing 3 into the .bash_profile file. This line first checks that the script is there and, if so, loads and runs it with source.

The algorithm that predicts what is likely to be the next user command learns from the sequence of previously entered shell commands that the preexec() hook has written to myhist.log. Listing 4 iterates through the logfile in the history() function, creating a HistEntry type entry from each line. This structure, defined in line 8, contains the Cwd and Cmd fields, which hold the directory where the shell was located and the command entered by the user there, respectively.

In the for loop that starts in line 21, the scanner from the bufio package loads the logfile lines, ignores the timestamp in the first column, and checks whether the command in the third column looks okay. The loop also ignores all commands that only consist of the g shortcut; although preexec logs this too, the predictor runs aren't going to help the oracle with its predictions.

If an empty command makes its way into the logfile (e.g., because the user quit prediction mode by pressing Ctrl+C), continue skips the line in question. The history() function appends valid entries as HistEntry type values to the end of the hist array slice, which line 36 finally returns to the caller.
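The per-line logic can be tried out in isolation with a minimal sketch like the following. The parseLine() helper is an assumption for illustration; it roughly mirrors the checks in Listing 4 but uses strings.TrimLeft instead of a regular expression. The first sample line comes from Listing 2, while the other two are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLine splits an "epoch cwd cmd" log line into directory and command.
// ok is false if the line has fewer than three fields, if the command is
// empty or starts with whitespace, or if it is the g shortcut itself.
func parseLine(line string) (cwd, cmd string, ok bool) {
	// Split at the first two spaces only; the command keeps its own spaces.
	flds := strings.SplitN(line, " ", 3)
	if len(flds) != 3 {
		return "", "", false
	}
	cmd = flds[2]
	if cmd == "" || strings.TrimLeft(cmd, " \t") != cmd || cmd == "g" {
		return "", "", false
	}
	return flds[1], cmd, true
}

func main() {
	lines := []string{
		"1653801863 /home/mschilli/git/articles/predict ls eg",
		"1653801870 /home/mschilli/git/articles/predict g", // hypothetical
		"1653801871 /home/mschilli/git/articles/predict ",  // hypothetical
	}
	for _, line := range lines {
		cwd, cmd, ok := parseLine(line)
		fmt.Printf("ok=%-5v cwd=%q cmd=%q\n", ok, cwd, cmd)
	}
}
```

Note that splitting on spaces assumes that directory names do not contain any. A path such as /home/me/my docs would be cut at the space, and the rest of the path would end up in the command field.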

Memory Aid

Based on historic data, the predictor in Listing 5 now runs the predict() function for the current directory (cwd) to guesstimate the next command the user will probably want to run. It receives the array slice with the HistEntry structures and iterates through them in the for loop starting in line 8, skipping all entries logged in other directories.

Listing 5

predict.go

01 package main
02 import (
03   "sort"
04 )
05 func predict(hist []HistEntry, cwd string) []string {
06   lastCmd := ""
07   followMap := map[string]map[string]int{}
08   for _, h := range hist {
09     if h.Cwd != cwd {
10       continue
11     }
12     if lastCmd == "" {
13       lastCmd = h.Cmd
14       continue
15     }
16     cmdMap, ok := followMap[lastCmd]
17     if !ok {
18       cmdMap = map[string]int{}
19       followMap[lastCmd] = cmdMap
20     }
21     cmdMap[h.Cmd] += 1
22     lastCmd = h.Cmd
23   }
24   if lastCmd == "" {
25     // first time in this dir
26     return []string{"ls"}
27   }
28   items := []string{}
29   follows, ok := followMap[lastCmd]
30   if !ok {
31     // no follow defined, just
32     // return all cmds known
33     for from := range followMap {
34       items = append(items, from)
35     }
36     return items
37   }
38   // Return best-scoring follows
39   type score struct {
40     to     string
41     weight int
42   }
43   scores := []score{}
44   for to, v := range follows {
45     scores = append(scores, score{to: to, weight: v})
46   }
47   sort.Slice(scores, func(i, j int) bool {
48     return scores[i].weight > scores[j].weight
49   })
50   for _, score := range scores {
51     items = append(items, score.to)
52   }
53   return items
54 }

In each round, the predictor saves the shell command currently processed (h.Cmd) in the lastCmd variable, so that the next round of the loop can access the previous value. Starting in the second round, the code records which command followed which previous one in a two-level hash map named followMap (starting in line 16) and increments the associated integer value. In other words, at the end of the for loop, the program knows how often command B followed command A. These counts serve as the algorithm's estimate of the probability of command B following command A.
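A small worked example may help to illustrate the counting. The following sketch applies the same two-level map idea to a hard-coded sequence; the countFollows() function and the sample commands are assumptions for illustration, and the directory filter from Listing 5 is left out.

```go
package main

import "fmt"

// countFollows records how often each command directly followed another.
// The outer key is the preceding command, the inner key its successor.
func countFollows(cmds []string) map[string]map[string]int {
	follows := map[string]map[string]int{}
	for i := 1; i < len(cmds); i++ {
		prev, cur := cmds[i-1], cmds[i]
		if follows[prev] == nil {
			follows[prev] = map[string]int{}
		}
		follows[prev][cur]++
	}
	return follows
}

func main() {
	// Hypothetical command sequence logged in a single directory.
	cmds := []string{
		"vi foo.go", "go build foo.go",
		"vi foo.go", "go build foo.go",
		"vi foo.go", "git add foo.go",
	}
	follows := countFollows(cmds)
	fmt.Println(follows["vi foo.go"]["go build foo.go"]) // 2
	fmt.Println(follows["vi foo.go"]["git add foo.go"])  // 1
	fmt.Println(follows["go build foo.go"]["vi foo.go"]) // 2
}
```

In this sequence, go build followed vi twice and git add followed it once, so a predictor consulted after vi would offer go build first.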

If the logged history does not contain any commands for the current directory, the algorithm cannot do much and takes the diplomatic approach of suggesting ls in line 26. If commands exist, but none has ever followed the most recent one stored in lastCmd, the code falls back on returning all preceding commands known to followMap (lines 30 to 36). However, if followMap lists some commands that usually follow lastCmd, the algorithm dumps each of those subsequent commands into a structure with a counter that reflects their frequency. It then uses sort.Slice() to sort an array slice of these structures in descending order by the counter, starting in line 47. Sorting a hash map like this by its numeric values would be a snap in a scripting language such as Python, but Go requires significantly more overhead because of its strict type checking.
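Because Go iterates over maps in random order and sort.Slice() is not guaranteed to be stable, commands with identical counts can show up in a different order on each run. The following sketch shows one possible way to rank the commands deterministically. The rank() function and the alphabetical tie-break are assumptions added here for illustration; Listing 5 does without them.

```go
package main

import (
	"fmt"
	"sort"
)

// rank returns the commands in follows ordered by descending count.
// Commands with equal counts are ordered alphabetically, which keeps
// the result identical from run to run.
func rank(follows map[string]int) []string {
	cmds := make([]string, 0, len(follows))
	for cmd := range follows {
		cmds = append(cmds, cmd)
	}
	sort.Slice(cmds, func(i, j int) bool {
		ci, cj := follows[cmds[i]], follows[cmds[j]]
		if ci != cj {
			return ci > cj
		}
		return cmds[i] < cmds[j]
	})
	return cmds
}

func main() {
	// Hypothetical follow counts for the command typed last.
	follows := map[string]int{
		"make":          2,
		"git add t.pnd": 5,
		"vi t.pnd":      2,
	}
	for _, cmd := range rank(follows) {
		fmt.Println(cmd)
	}
}
```

Here, git add t.pnd comes first with the highest count, followed by make and vi t.pnd, which tie at two occurrences each and are therefore sorted by name.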

The output, at the end of the predict() function, is the items variable – an array slice containing the commands that, based on their order, are most likely to follow the current shell command. Finally, the pick program in Listing 6 offers them up to the user.

Listing 6

pick.go

01 package main
02 import (
03   "fmt"
04   "github.com/manifoldco/promptui"
05   "os"
06   "os/user"
07   "path"
08 )
09 func main() {
10   cwd, err := os.Getwd()
11   if err != nil {
12     panic(err)
13   }
14   usr, err := user.Current()
15   if err != nil {
16     panic(err)
17   }
18   logFile := path.Join(usr.HomeDir, ".myhist.log")
19   hist := history(logFile)
20   items := predict(hist, cwd)
21   prompt := promptui.Select{
22     Label: "Pick next command",
23     Items: items,
24     Size:  10,
25   }
26   _, result, err := prompt.Run()
27   if err == nil {
28     fmt.Fprintf(os.Stderr, "%s\n", result)
29   }
30 }
