Patterns for Cleaner API design

Paul Ganssle



This talk on GitHub: pganssle-talks/pybay-2019-clean-apis

General principles

  • Namespaced by module: math.sqrt, requests.get
  • Namespaced by class: 'string\n'.strip(), datetime.fromtimestamp

Interfaces should be discoverable

  • Documentation
  • Tab completion
  • dir(my_module)
  • Static typing
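
The discoverability tools listed above can be tried on the standard library. This is a minimal sketch: `public_names` and `scale` are hypothetical helpers introduced here for illustration, not part of the talk.

```python
import inspect
import math


def public_names(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith('_')]


# Namespacing by module keeps related functions together and easy to list.
names = public_names(math)
print('sqrt' in names)


# Signatures (and annotations, where present) are available at runtime.
def scale(value: float, factor: float = 2.0) -> float:
    return value * factor


print(inspect.signature(scale))
```

Anything that shows up here is also what tab completion and documentation tools have to work with.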

General principles

Think about function signatures

relativedelta signature:

def relativedelta(dt1: datetime=None, dt2: datetime=None,
                  years: int=0, months: int=0, ...,
                  year: int=0, month: int=0, ...) -> relativedelta:
    ...

Intended use:

relativedelta(dt1: datetime=None, dt2: datetime=None) -> relativedelta

or

relativedelta(years: int=0, months: int=0, ...,
              year: int=0, month: int=0, ...) -> relativedelta
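
One way to spell out two intended call patterns for type checkers is `typing.overload`. The sketch below assumes a hypothetical `make_delta` function and `Delta` class with only `years` and `months`; it is not dateutil's implementation and it ignores days entirely.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, overload


@dataclass
class Delta:
    """Hypothetical stand-in for a relativedelta-like object."""
    years: int = 0
    months: int = 0


@overload
def make_delta(dt1: datetime, dt2: datetime) -> Delta: ...


@overload
def make_delta(*, years: int = 0, months: int = 0) -> Delta: ...


def make_delta(dt1: Optional[datetime] = None, dt2: Optional[datetime] = None,
               *, years: int = 0, months: int = 0) -> Delta:
    # The two datetimes only make sense as a pair.
    if (dt1 is None) != (dt2 is None):
        raise TypeError('dt1 and dt2 must be given together')
    if dt1 is not None:
        if years or months:
            raise TypeError('pass two datetimes or explicit offsets, not both')
        # Difference in whole calendar months (days are ignored in this sketch).
        total = (dt1.year - dt2.year) * 12 + (dt1.month - dt2.month)
        years, months = divmod(total, 12)
    return Delta(years=years, months=months)
```

The overloads only help static analysis; the runtime checks are still needed to reject mixed calls.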

Return type determined by function logic

def send_request(req: RequestType,
                 return_context: bool=False) -> Union[ResponseType, Tuple[ResponseType, Context]]:
    ...
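
One alternative to a `Union` return type is to give each return shape its own function. This is a sketch under stated assumptions: `Response`, `Context`, and `send_request_with_context` are hypothetical names, and the request is simply echoed rather than sent anywhere.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Response:
    """Hypothetical response type."""
    body: str


@dataclass
class Context:
    """Hypothetical metadata about how a request was handled."""
    attempts: int


def send_request_with_context(req: str) -> Tuple[Response, Context]:
    """Always returns both the response and its context."""
    # A real implementation would perform I/O here; this just echoes the request.
    return Response(body=f'echo: {req}'), Context(attempts=1)


def send_request(req: str) -> Response:
    """Always returns just the response."""
    response, _ = send_request_with_context(req)
    return response
```

Each function now has a single return type that does not depend on a flag.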

General principles

Provide explicit versions of "magical" interfaces

In [3]:
from dateutil.parser import parse, isoparse
from datetime import datetime
In [4]:
for dt in [parse('March 19, 1951'),
           parse('2019-01-04T12:30Z'),
           parse('2018/03/01')]:
    print(dt)
1951-03-19 00:00:00
2019-01-04 12:30:00+00:00
2018-03-01 00:00:00
In [5]:
isoparse('2019-01-04T12:30Z')
Out[5]:
datetime.datetime(2019, 1, 4, 12, 30, tzinfo=tzutc())
In [6]:
try:
    isoparse('2018/03/01')
except Exception as e:
    print(e)
invalid literal for int() with base 10: b'/03'
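
The same pairing can be sketched with only the standard library: a lenient function that guesses among formats, next to a strict one that accepts a single format. `parse_lenient` and `parse_iso_date` are hypothetical names, and the list of formats is an arbitrary assumption.

```python
from datetime import datetime

# Formats the lenient parser is willing to try, in order (an arbitrary choice).
_KNOWN_FORMATS = ('%Y-%m-%d', '%Y/%m/%d', '%B %d, %Y')


def parse_iso_date(text: str) -> datetime:
    """Strict: accept only hyphen-separated year-month-day, fail on anything else."""
    return datetime.strptime(text, '%Y-%m-%d')


def parse_lenient(text: str) -> datetime:
    """Lenient: try each known format until one matches."""
    for fmt in _KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f'Unrecognized date: {text!r}')
```

Callers who know their input format can use the strict version and get an error instead of a guess.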

Example Class: Coordinate

In [7]:
class Coordinate:
    """Representation of coordinates on a globe in latitude and longitude"""
    def __init__(self, lat, long):
        self.lat = float(lat)
        self.long = float(long)
    
    def __repr__(self):
        return (f'{self.__class__.__name__}(' +
                ','.join(str(v) for v in (self.lat, self.long)) + ')')
    
    def __str__(self):
        latstr = '%f%s' % (abs(self.lat), 'N' if self.lat >= 0.0 else 'S')
        longstr = '%f%s' % (abs(self.long), 'W' if self.long >= 0.0 else 'E')
        
        return f'{latstr},{longstr}'

_BaseCoordinate = Coordinate
In [8]:
cities = {
    'Tokyo': Coordinate(35.6895, -139.6917),
    'Santiago': Coordinate(-33.4489, 70.6693),
    'Toronto': Coordinate(43.7001, 79.4163),
}

city_strs = {city: str(coords) for city, coords in cities.items()}
for city, coords in cities.items():
    print(f'{city}: {coords}')
Tokyo: 35.689500N,139.691700E
Santiago: 33.448900S,70.669300W
Toronto: 43.700100N,79.416300W

Multiple ways to construct a Coordinate

In [9]:
import re

class Coordinate(_BaseCoordinate):
    def __init__(self, coord_str_or_lat, long=None):
        if isinstance(coord_str_or_lat, str) and long is None:
            coord_str_or_lat, long = self._parse_coord_str(coord_str_or_lat)
        
        self.lat = float(coord_str_or_lat)
        self.long = float(long)

    @staticmethod
    def _parse_coord_str(coord_str):
        m = re.match(r'(?P<lat>\d+\.?\d*)(?P<latdir>N|S)' + ',' +
                     r'(?P<long>\d+\.?\d*)(?P<longdir>W|E)', coord_str.strip())
        if m is None:
            raise ValueError(f'Invalid coordinates: {coord_str}')
        
        lat = float(m.group('lat')) * (-1 if m.group('latdir') == 'S' else 1)
        long = float(m.group('long')) * (-1 if m.group('longdir') == 'E' else 1)
        
        return lat, long
In [10]:
Coordinate(-27.4698, -153.0251)
Out[10]:
Coordinate(-27.4698,-153.0251)
In [11]:
Coordinate('35.689500N,139.691700E')
Out[11]:
Coordinate(35.6895,-139.6917)

Desired API:

  • Coordinate(coord_str: str) -> Coordinate
  • Coordinate(lat: float, long: float) -> Coordinate

Actual API:

  • Coordinate(coord_str_or_lat: Union[float, str], long: float=None) -> Coordinate

Errors from naive use

In [12]:
try:
    # long is a valid argument, why is it complaining about converting the coords to floats?!
    Coordinate('35.689500N', -139.00)
except Exception as e:
    print(repr(e))
ValueError("could not convert string to float: '35.689500N'")
In [13]:
try:
    # Why does long have a default value if it's a required argument?
    Coordinate(-18.45)
except Exception as e:
    print(repr(e))
TypeError("float() argument must be a string or a number, not 'NoneType'")

Using an alternate constructor

In [14]:
class Coordinate(_BaseCoordinate):
    def __init__(self, lat: float, long: float):
        self.lat = float(lat)
        self.long = float(long)
    
    @classmethod
    def from_str(cls, coord_str: str) -> 'Coordinate':
        m = re.match(r'(?P<lat>\d+\.?\d*)(?P<latdir>N|S)' + ',' +
                     r'(?P<long>\d+\.?\d*)(?P<longdir>W|E)', coord_str.strip())
        if m is None:
            raise ValueError(f'Invalid coordinates: {coord_str}')
        
        lat = float(m.group('lat')) * (-1 if m.group('latdir') == 'S' else 1)
        long = float(m.group('long')) * (-1 if m.group('longdir') == 'E' else 1)
        
        return cls(lat, long)
In [15]:
Coordinate(40.7128, 74.006)
Out[15]:
Coordinate(40.7128,74.006)
In [16]:
Coordinate.from_str('40.712800N,74.006000W')
Out[16]:
Coordinate(40.7128,74.006)

API

  • Coordinate(lat: float, long: float) -> Coordinate
  • Coordinate.from_str(coord_str: str) -> Coordinate
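
The standard library uses the same pattern: the main constructor takes the canonical arguments, and named classmethods handle other kinds of input. A short illustration using only built-in and `datetime` alternate constructors:

```python
from datetime import date, datetime

# Each classmethod names the kind of input it expects.
n = int.from_bytes(b'\x01\x00', byteorder='big')
d = date.fromordinal(1)
dt = datetime.fromisoformat('2019-01-04T12:30:00')
counts = dict.fromkeys(['lat', 'long'], 0.0)

print(n, d, dt, counts)
```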

Inheriting class behavior

In [21]:
from dataclasses import dataclass
import math

@dataclass
class Point:
    x: float
    y: float
    
    @classmethod
    def from_polar(cls, *args, **kwargs):
        constructor_args = cls.polar_to_cartesian(*args, **kwargs)
        return cls(*constructor_args)
    
    @staticmethod
    def polar_to_cartesian(r, theta):
        x = r * math.cos(theta)
        y = r * math.sin(theta)
        
        # Tweak for floating point errors (round_float_errs is a helper not shown here)
        x, y = map(round_float_errs, (x, y))
        return x, y
In [22]:
Point(1.0, 1.0)
Out[22]:
Point(x=1.0, y=1.0)
In [23]:
Point.from_polar(math.sqrt(2), math.pi/4)
Out[23]:
Point(x=1.0, y=1.0)
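
Why the alternate constructor is a classmethod that calls `cls(...)` can be seen in a stripped-down sketch. `Base`, `Child`, and both method names are hypothetical, chosen only to contrast the two approaches.

```python
from dataclasses import dataclass


@dataclass
class Base:
    value: int

    @classmethod
    def doubled(cls, value: int) -> 'Base':
        # cls is whichever class the method was called on.
        return cls(value * 2)

    @staticmethod
    def doubled_hardcoded(value: int) -> 'Base':
        # Naming the class directly always builds a Base, even from a subclass.
        return Base(value * 2)


class Child(Base):
    pass


print(type(Child.doubled(2)).__name__)
print(type(Child.doubled_hardcoded(2)).__name__)
```

Only the `cls`-based version gives subclasses a working alternate constructor for free.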

Customizing behavior by overriding subclass methods

In [24]:
@dataclass
class NamedPoint(Point):
    x: float
    y: float
    name: str = None
In [25]:
NamedPoint(0, 0, name="origin")
Out[25]:
NamedPoint(x=0, y=0, name='origin')
In [26]:
NamedPoint.from_polar(0, 0)
Out[26]:
NamedPoint(x=0.0, y=0.0, name=None)
In [27]:
@dataclass
class NamedPoint(Point):
    x: float
    y: float
    name: str = None
    
    @classmethod
    def from_polar(cls, *args, name=None, **kwargs):
        rv = super().from_polar(*args, **kwargs) # Build the instance with the base class logic
        
        if name is not None:         # Then set the subclass-specific field
            rv.name = name
        return rv
In [28]:
NamedPoint.from_polar(0, 0, name='origin')
Out[28]:
NamedPoint(x=0.0, y=0.0, name='origin')

Customizing behavior by overriding subclass methods

In [29]:
@dataclass
class Point3D(Point):
    x: float
    y: float
    z: float
        
    @staticmethod
    def polar_to_cartesian(r, theta, phi):
        x = r * math.sin(theta) * math.cos(phi)
        y = r * math.sin(theta) * math.sin(phi)
        z = r * math.cos(theta)
        
        x, y, z = map(round_float_errs, (x, y, z))
        return x, y, z
In [30]:
Point3D(1, 1, 1)
Out[30]:
Point3D(x=1, y=1, z=1)
In [31]:
Point3D.from_polar(math.sqrt(3), math.acos(1/math.sqrt(3)), math.atan(1))
Out[31]:
Point3D(x=1.0, y=1.0, z=1.0)

Functions

What about function APIs like this?

In [41]:
def print_text(path_or_buffer: Union[str, IO[str]], *args, **kwargs):
    if isinstance(path_or_buffer, str):
        with open(path_or_buffer, 'r') as f:
            print_text(f, *args, **kwargs)
    else:
        print(path_or_buffer.read())

Or like this?

# dateutil.parser.parse's API
def parse(timestr: str, **kwargs) -> Union[datetime.datetime,
                                           Tuple[datetime.datetime, Tuple[str, ...]]]:
    ...
In [42]:
dateutil.parser.parse('John Ford was born May 13, 1951 in Remlap, Alabama', fuzzy=True)
Out[42]:
datetime.datetime(1951, 5, 13, 0, 0)
In [43]:
dateutil.parser.parse('John Ford was born May 13, 1951 in Remlap, Alabama', fuzzy_with_tokens=True)
Out[43]:
(datetime.datetime(1951, 5, 13, 0, 0),
 ('John Ford was born ', ' ', ' ', 'in Remlap, Alabama'))

variants

Desired API

  • print_text(txt: str) - Print text passed to this function
  • print_text.from_stream(sobj: IO[str]) - Print text from a stream
  • print_text.from_path(path_components: str) - Open a file on disk and print its contents
In [51]:
import variants

@variants.primary
def print_text(txt: str):
    """Prints any text passed to this function"""
    print(txt)
In [52]:
@print_text.variant('from_stream')
def print_text(sobj: IO[str]):
    """Read text from a stream and print it"""
    print_text(sobj.read())
In [53]:
import pathlib
@print_text.variant('from_path')
def print_text(path_components: Union[str, pathlib.Path]):
    """Open the file specified by `path_components` and print the contents"""
    fpath = pathlib.Path(path_components)
    with open(fpath, 'r') as f:
        print_text.from_stream(f)

Example use

In [54]:
print_text('Hello, world')
Hello, world
In [55]:
print_text.from_stream(StringIO('Hello, world! This is from a stream!'))
Hello, world! This is from a stream!
In [56]:
print_text.from_path('extras/hello_world.txt')
Hello, world! This is from a file!
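
To give a rough idea of how a decorator like this could work, here is a minimal sketch. `VariantFunction` is a hypothetical class written for this illustration and is not the `variants` library's actual implementation; the `shout` functions return strings rather than printing so the behavior is easy to check.

```python
import functools


class VariantFunction:
    """Wrap a primary function and expose alternate forms as attributes."""

    def __init__(self, func):
        self._primary = func
        functools.update_wrapper(self, func)  # copy __name__, __doc__, etc.

    def __call__(self, *args, **kwargs):
        return self._primary(*args, **kwargs)

    def variant(self, name):
        """Decorator factory that attaches a function as attribute `name`."""
        def decorator(func):
            setattr(self, name, func)
            # Return the wrapper so that reusing the primary's name in the
            # `def` statement leaves that name bound to the wrapper.
            return self
        return decorator


@VariantFunction
def shout(txt: str) -> str:
    return txt.upper() + '!'


@shout.variant('from_words')
def shout(words) -> str:
    # The global name `shout` still refers to the wrapper, so this calls the primary.
    return shout(' '.join(words))
```

The key design point is that the variant lives on the primary function object, so importing the primary brings every variant with it.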

Explicit dispatch

In [58]:
@print_text.variant('from_url')
def print_text(url):
    r = requests.get(url)
    print_text(r.text)
In [60]:
print_text('Hello, world!')
print_text.from_path('extras/hello_world.txt')
print_text.from_url('https://ganssle.io/files/hello_world.txt')
Hello, world!
Hello, world! This is from a file!

Hello, world! (from url)

Implicit dispatch: singledispatch

In [64]:
@functools.singledispatch
def print_text_sd(txt: str):
    print(txt)
In [65]:
from pathlib import Path

@print_text_sd.register(Path)
def _(path: Path):
    print_text.from_path(path)
In [66]:
@dataclass
class Url:
    url: str

@print_text_sd.register(Url)
def _(url: Url):
    print_text.from_url(url.url)
In [67]:
print_text_sd('Hello, world!')
print_text_sd(Path('extras/hello_world.txt'))
print_text_sd(Url('https://ganssle.io/files/hello_world.txt'))
Hello, world!
Hello, world! This is from a file!

Hello, world! (from url)
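
`functools.singledispatch` chooses an implementation from the type of the first argument. A self-contained sketch using only the standard library follows; `describe` is a hypothetical function, and registering via type annotations assumes Python 3.7 or later.

```python
import functools
from pathlib import PurePath


@functools.singledispatch
def describe(obj) -> str:
    # Fallback for any type without a registered implementation.
    return f'object: {obj!r}'


@describe.register
def _(obj: str) -> str:
    return f'text: {obj}'


@describe.register
def _(obj: PurePath) -> str:
    return f'path: {obj.as_posix()}'


print(describe('a.txt'))
print(describe(PurePath('a.txt')))
```

Note that a path passed as a plain `str` is treated as text: dispatch sees only the type, not the caller's intent.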

Why not both?

In [68]:
@variants.primary
@functools.singledispatch
def print_text(txt: str):
    print(txt)
    
@print_text.variant('from_path')           # Register the path variant explicitly
@print_text.register(Path)                 # And with singledispatch!
def print_text(pth: Union[Path, str]):
    with open(pth, 'rt') as f:
        print_text(f.read())
In [69]:
print_text("Hello, world!")
Hello, world!
In [70]:
print_text(Path("extras/hello_world.txt"))
print_text.from_path("extras/hello_world.txt")
Hello, world! This is from a file!

Hello, world! This is from a file!

In [71]:
try:
    print_text.from_path("Hello, world")
except Exception as e:
    print(e)
[Errno 2] No such file or directory: 'Hello, world'

Variation in return type

In [72]:
dtstr = 'It was 3AM on Sept 21, 1986'
In [73]:
dateutil.parser.parse(dtstr, fuzzy=True)
Out[73]:
datetime.datetime(1986, 9, 21, 3, 0)
In [74]:
dateutil.parser.parse(dtstr, fuzzy_with_tokens=True)
Out[74]:
(datetime.datetime(1986, 9, 21, 3, 0), ('It was ', ' on ', ' ', ' '))
In [75]:
@variants.primary
def fuzzy_parse(dtstr: str, *args, **kwargs) -> datetime:
    kwargs['fuzzy'] = True
    return dateutil.parser.parse(dtstr, *args, **kwargs)

@fuzzy_parse.variant('with_tokens')
def fuzzy_parse(dtstr: str, *args, **kwargs) -> Tuple[datetime, Tuple[str, ...]]:
    kwargs['fuzzy_with_tokens'] = True
    return dateutil.parser.parse(dtstr, *args, **kwargs)
In [76]:
fuzzy_parse(dtstr)
Out[76]:
datetime.datetime(1986, 9, 21, 3, 0)
In [77]:
fuzzy_parse.with_tokens(dtstr)
Out[77]:
(datetime.datetime(1986, 9, 21, 3, 0), ('It was ', ' on ', ' ', ' '))

Variation in caching behavior

In [78]:
from functools import lru_cache

@variants.primary
@lru_cache()
def get_config():
    return get_config.nocache()

@get_config.variant('nocache')
def get_config():
    print("Retrieving configuration!")
    return {
        'a1': 'value',
        'b': 12348
    }
In [79]:
a = get_config()
Retrieving configuration!
In [80]:
b = get_config()
In [81]:
c = get_config.nocache()
Retrieving configuration!
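
For comparison, `functools.lru_cache` already exposes its uncached function through the `__wrapped__` attribute, along with `cache_clear()`. A small sketch, where `load_settings` and the `calls` list are hypothetical names used to count real retrievals:

```python
from functools import lru_cache

calls = []


@lru_cache(maxsize=None)
def load_settings():
    calls.append('load')  # record each real retrieval
    return {'a1': 'value', 'b': 12348}


first = load_settings()              # performs the retrieval
second = load_settings()             # served from the cache
fresh = load_settings.__wrapped__()  # bypasses the cache entirely
load_settings.cache_clear()          # forget cached results
again = load_settings()              # performs the retrieval again

print(len(calls))
```

A named `nocache` variant makes the same capability visible in tab completion and documentation instead of hiding it behind a dunder attribute.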

Variation in async behavior

In [82]:
import asyncio
import time

@variants.primary
async def myfunc(n):
    for i in range(n):
        yield i
        await asyncio.sleep(0.5)
        
@myfunc.variant('sync')
def myfunc(n):
    for i in range(n):
        yield i
        time.sleep(0.5)
In [83]:
myfunc(4)
Out[83]:
<async_generator object myfunc at 0x7f0dcaf8fd40>
In [84]:
myfunc.sync(4)
Out[84]:
<generator object myfunc at 0x7f0dc9303050>
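
The two objects above are consumed differently. The sketch below shows both styles without the `variants` library; `count_async`, `count_sync`, and `collect` are hypothetical names, and the sleeps are set to zero so it runs instantly.

```python
import asyncio
import time


async def count_async(n):
    for i in range(n):
        yield i
        await asyncio.sleep(0)  # hand control back to the event loop


def count_sync(n):
    for i in range(n):
        yield i
        time.sleep(0)


async def collect(agen):
    # Async generators must be consumed with `async for` inside a coroutine.
    return [item async for item in agen]


async_items = asyncio.run(collect(count_async(3)))
sync_items = list(count_sync(3))

print(async_items, sync_items)
```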

Syntactic marking of relatedness

Compare to naming convention (e.g. underscores)

In [97]:
def coordinate_from_string(coord_str: str) -> Coordinate:
    return Coordinate.from_str(coord_str)
In [98]:
def print_text_from_path(fpath: Union[str, pathlib.Path]):
    print_text.from_path(fpath)
    
print_text_from_path('extras/hello_world.txt')
Hello, world! This is from a file!

Clean top-level APIs: Documentation

Using the sphinx_autodoc_variants sphinx extension:

.. automodule:: text_print
    :members:

[Screenshot: text_print module documentation]

Clean top-level APIs: Completion

Flat namespace:

[Screenshot: autocomplete - no variants]

Function variants:

[Screenshots: autocomplete with function variants]

Talk.conclude()

Talk.conclude.with_plugs()