27. October 2015

Automation and pty(4)

Terminal Programming with Python series 1: Automation and pty(4)

Introduction

Any command-line UNIX interface may be automated.

This article will demonstrate the use of pseudo-terminals, which cause programs to believe they are attached to a terminal, even when they are not!

At first, fooling programs into believing they are attached to a terminal may not seem useful, but the technique is used in a wide variety of software. It is indispensable in the automation and testing fields.

The case of color ls(1)

The command ls -G displays files with colors on OSX and FreeBSD only when standard output is attached to a terminal. When we run it through the subprocess module, we will not see any colors:

import subprocess
print(subprocess.check_output(['ls', '-G', '/dev']))

With an explicit -G parameter, the output of this program is still colorless. This quick example shows that some programs behave differently when attached to a terminal.
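The same effect can be observed without relying on ls by asking a child process whether its own standard output is a terminal. This is only a sketch for illustration: the helper name child_stdout_is_tty is made up here and is not part of the original example.

```python
import subprocess
import sys

# Hypothetical helper for illustration: start a child Python interpreter the
# same way check_output starts ls, and let it report whether its stdout is a tty.
def child_stdout_is_tty():
    probe = "import sys; print(sys.stdout.isatty())"
    output = subprocess.check_output([sys.executable, "-c", probe])
    return output.strip() == b"True"

if __name__ == "__main__":
    # check_output connects the child's stdout to a pipe, so this prints False.
    print(child_stdout_is_tty())
```

Because check_output captures output through a pipe, any program that makes its behavior conditional on a terminal will take the non-terminal path.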

Interactive

Furthermore, some programs are only interactive when attached to a terminal. The python executable is an example of this. When we run python directly from a terminal, we receive an interactive REPL:

$ python
Python 3.5.0 (default, Oct 28 2015, 21:00:27)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print(4+4)
8
>>> exit()

If we pipe the same commands to standard input instead, python displays neither the banner nor the >>> prompts, as demonstrated here using the standard shell:

$ printf 'print(2+2)\nexit()' | python
4

Strangely enough, executing Python from Python using the subprocess module produces the same output:

import subprocess, sys
python = subprocess.Popen(
    sys.executable, stdin=subprocess.PIPE,
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print(python.communicate(input=b"print(2+2)\nexit()"))

(b'4\n', b'')

With a keyboard attached, a terminal may provide input at any indeterminate future time. Programs such as python test whether the standard file descriptors (stdin, stdout, stderr) are attached to a terminal and offer interactive behaviour only when they are.

We can reproduce this conditional check of isatty(3) easily from shell:

$ python -c 'import sys,os;print(os.isatty(sys.stdin.fileno()))'
True

$ echo | python -c 'import sys,os;print(os.isatty(sys.stdin.fileno()))'
False

Because stdin is a pipe, the isatty(3) test fails.
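The same check can be made from within a single Python process. The sketch below, assuming a Unix-like system, compares a plain pipe with the slave side of a pseudo-terminal created by pty.openpty from the standard library. The function name isatty_report is our own.

```python
import os
import pty

def isatty_report():
    """Return os.isatty() results for a pipe and for a pseudo-terminal."""
    pipe_read, pipe_write = os.pipe()
    master_fd, slave_fd = pty.openpty()
    try:
        return {"pipe": os.isatty(pipe_read), "pty": os.isatty(slave_fd)}
    finally:
        # Release every descriptor opened above.
        for fd in (pipe_read, pipe_write, master_fd, slave_fd):
            os.close(fd)

if __name__ == "__main__":
    print(isatty_report())  # {'pipe': False, 'pty': True}
```

Neither descriptor is connected to a real keyboard or screen, yet only the pipe fails the test.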

Cheating isatty(3)

The remainder of this article focuses on tricking isatty(3) into returning True even when the standard descriptors are not actually attached to a terminal. The trick begins with a call to the pty.fork function from the Python standard library. It behaves like os.fork, except that a pseudo-terminal (pty(4)) is wedged between the child and parent process: the child's standard descriptors are attached to one end, and the parent receives a file descriptor for the other.
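A minimal sketch of such a call, assuming a Unix-like system: the child replaces itself with a Python one-liner that reports isatty for its stdin, and the parent reads the answer from its side of the pty. The helper name run_in_pty is our own, and the read loop handles only the two end-of-output conditions we are aware of.

```python
import os
import pty
import sys

def run_in_pty(argv):
    """Run argv in a child attached to a pty and return everything it wrote."""
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: stdin, stdout and stderr are already attached to the pty.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    chunks = []
    while True:
        try:
            data = os.read(master_fd, 1024)
        except OSError:
            break  # Linux reports an I/O error once the child has exited
        if not data:
            break  # BSD-derived systems return an empty read instead
        chunks.append(data)
    os.close(master_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)

if __name__ == "__main__":
    check = "import os, sys; print(os.isatty(sys.stdin.fileno()))"
    print(run_in_pty([sys.executable, "-c", check]))
```

The child now reports True, even though the parent is an ordinary program reading from a file descriptor. The output is returned as bytes, and the terminal layer will typically have translated the trailing newline into a carriage return and newline.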

Why is this useful? A few programs that make use of pty(4) and fork(2) can speak for themselves:

tmux(1) and screen(1) make use of pty(4) to perform their magic: the real terminal may leave (detach), while the child continues to believe it is connected with a terminal.
script(1) records interactive sessions, ensuring all terminal sequences are written to file typescript for analysis.
ttyrec(1) records sessions like script(1), but with timing information. This is the driving technology behind https://asciinema.org/ for example.
IPython notebook executes programs through a pty(4) for color output.
Travis CI uses a pty(4) so test runners produce colorized output.

Finally, the traditional Unix expect(1) by Don Libes uses a pty(4) to allow "programmed dialogue with interactive programs". The examples that follow use pexpect, a Python variant of expect(1) authored by Noah Spurrier.

The rainmaker

The telnet host rainmaker.wunderground.com offers weather reports and various other data, looked up by major U.S. airport code. We can use telnet(1) and summarize our session as follows:

send return
send sjc (airport code) and return
send return
send X and return

Using pipes, we can script this only with timed input, allowing enough time for each prompt to appear before we send the next line:

(sleep 2
 echo
 sleep 1
 echo sjc
 sleep 1
 echo
 sleep 1
 echo X
) | telnet rainmaker.wunderground.com

By using pexpect to wait for each prompt before sending our input, we see a marked improvement in efficiency and fault tolerance. Our script would then read as follows:

import pexpect

def main(airport_code):
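    output = ''
    # Run telnet on a pty; give up if a prompt fails to appear within 4 seconds.
    telnet = pexpect.spawn('telnet rainmaker.wunderground.com',
                           encoding='latin1', timeout=4)
    telnet.expect('Press Return to continue:')
    telnet.sendline('')
    telnet.expect('enter 3 letter forecast city code')
    telnet.sendline(airport_code)
    # expect() returns the index of the pattern that matched: keep sending
    # Return to page through the report until the menu's 'Selection:' appears.
    while telnet.expect(['X to exit:', 'Press Return for menu:',
                         'Selection:']) != 2: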
        output += telnet.before
        telnet.sendline('')
    output += telnet.before
    telnet.sendline('X')
    telnet.expect(pexpect.EOF)
    telnet.close()
    print(output.strip())

if __name__ == '__main__':
    import sys
    main(airport_code=sys.argv[1])

Closing thoughts

A REPL is a particularly interesting target. The SageMath project uses pexpect to bundle a great variety of mathematics software by driving each program's REPL interface, bypassing the need to link with software written in other programming languages.

Software and language suites providing a shell or REPL may be functionally tested using pexpect, and this is where the library serves its purpose best. We can now write automated tests for the python interactive shell, for example.
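As a rough sketch of such a test, the function below starts the Python REPL under pexpect, types one print statement, and waits for the expected text. The name repl_evaluates is hypothetical, and setting PYTHON_BASIC_REPL is our own precaution to request the classic prompt on Python 3.13 and later (older versions ignore it).

```python
import os
import sys

import pexpect  # third-party: pip install pexpect

def repl_evaluates(expression, expected):
    """Type print(<expression>) at a Python REPL and wait for `expected`.

    Choose an `expected` string that does not appear in the typed text,
    because the terminal echoes our input back to us.
    """
    env = dict(os.environ, PYTHON_BASIC_REPL="1")
    child = pexpect.spawn(sys.executable, ["-q"], env=env,
                          encoding="utf-8", timeout=10)
    try:
        child.expect_exact(">>> ")       # wait for the first prompt
        child.sendline("print({})".format(expression))
        child.expect_exact(expected)     # raises pexpect.TIMEOUT if never seen
        child.expect_exact(">>> ")       # the REPL is ready for more input
        child.sendline("exit()")
        child.expect(pexpect.EOF)
        return True
    except (pexpect.TIMEOUT, pexpect.EOF):
        return False
    finally:
        child.close()

if __name__ == "__main__":
    print(repl_evaluates("6*7", "42"))
```

Waiting for the prompt before and after each command is what separates this from the timed-input approach: the test proceeds as soon as the REPL is ready and fails with a timeout if it never is.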

In many industries where technology systems migrate slowly, it can be very useful to automate commercial or black-box software systems that provide only a shell, such as mainframes or embedded control devices. With the technique of terminal automation, we may now provide a sensible REST API to such legacy systems!


Jeff Quast
