A Cute Little Test for Compiler Hackers: Ensuring Codebase Consistency

Building compilers, transpilers, and interpreters involves writing numerous production and transformation rules. As the language grows more complex, the number of rules can reach into the hundreds, and keeping the different parts of the toolchain consistent becomes a challenge. However, there is a cute little trick that helps maintain consistency through testing: checking that the parser rules and the transpiler rules match.

This article explains a Python code snippet that serves this exact purpose. The snippet is simple yet invaluable in keeping your code base clean and consistent as you hack.

The Testing Strategy Trick

    def test_transpiler_parser_rules_match(self: "TestTranspiler") -> None:
        """Test number and names of transpiler and parser rules match."""
        # Assumes `inspect` and `JacTranspiler` are imported at module level.
        from jaseci.jac.parser import JacParser

        parser_func_names = [
            attr
            for attr in dir(JacParser)
            if inspect.isfunction(getattr(JacParser, attr))
        ]

        transpiler_func_names = [
            attr
            for attr in dir(JacTranspiler)
            if inspect.isfunction(getattr(JacTranspiler, attr))
            and attr.startswith("transpile_")
        ]

        parser_only_funcs = [
            x
            for x in parser_func_names
            if "transpile_" + x not in transpiler_func_names
        ]

        self.assertEqual(len(parser_func_names) - 6, len(transpiler_func_names))
        self.assertEqual(
            parser_only_funcs,
            ["errok", "error", "index_position", "line_position", "parse", "restart"],
        )

The code above checks that the parser and the transpiler define the same rules (functions), by name and by count, apart from six known parser-only helpers. The goal is a one-to-one mapping between each parse rule and its corresponding transpiler rule.

Delving into the Code

The code first imports the JacParser class from the jaseci.jac.parser module. It then iterates over the attributes of the JacParser and JacTranspiler classes, gathering the names of their functions. This is done with Python’s built-in dir() function, which returns a list of attribute names for any object (including classes), together with inspect.isfunction(), which checks whether the attribute behind each name is a plain function.
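To see what this step collects, here is a minimal sketch on two hypothetical classes (BaseParser and ToyParser are made up for illustration and are not part of Jaseci). Note that dir() also reports inherited names, so functions defined on a base class show up in the result.

```python
import inspect


class BaseParser:
    """Hypothetical base class standing in for a parser framework."""

    def parse(self):
        pass

    def error(self):
        pass


class ToyParser(BaseParser):
    """Hypothetical parser with two grammar rules."""

    version = "0.1"  # plain data attribute, not a function

    def expr(self):
        pass

    def term(self):
        pass


# dir() yields attribute names; getattr() fetches the value behind each name.
toy_parser_funcs = [
    attr for attr in dir(ToyParser) if inspect.isfunction(getattr(ToyParser, attr))
]

print(toy_parser_funcs)  # ['error', 'expr', 'parse', 'term']
```

The inherited parse and error appear next to the rules defined on ToyParser itself, while the version string is skipped.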

The transpiler function names are filtered further to only include those starting with “transpile_”. This helps differentiate the actual transpiler rules from other auxiliary functions.

The code then collects the functions that exist in the parser but have no “transpile_” counterpart in the transpiler, and stores them in parser_only_funcs.

Finally, it calls assertEqual() twice to verify that:

The number of parser functions (excluding six specific ones) equals the number of transpiler functions.
The parser-only functions are exactly the ones specified in the list [“errok”, “error”, “index_position”, “line_position”, “parse”, “restart”].
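The whole flow can be tried out on a toy pair of classes. The sketch below is an assumption-laden stand-in, not the Jaseci code: ToyParser, ToyTranspiler, and the function_names helper are hypothetical, and there is only one parser-only helper instead of six.

```python
import inspect
import unittest


class ToyParser:
    """Hypothetical parser: two grammar rules plus one helper."""

    def expr(self):
        pass

    def term(self):
        pass

    def parse(self):  # driver helper, not a grammar rule
        pass


class ToyTranspiler:
    """Hypothetical transpiler: one transpile_* method per grammar rule."""

    def transpile_expr(self):
        pass

    def transpile_term(self):
        pass

    def emit(self):  # auxiliary helper, ignored by the prefix filter
        pass


def function_names(cls, prefix=""):
    """Return the names of plain functions on cls that start with prefix."""
    return [
        attr
        for attr in dir(cls)
        if attr.startswith(prefix) and inspect.isfunction(getattr(cls, attr))
    ]


class TestToyRulesMatch(unittest.TestCase):
    def test_rules_match(self):
        parser_funcs = function_names(ToyParser)
        transpiler_funcs = function_names(ToyTranspiler, "transpile_")
        parser_only = [
            name for name in parser_funcs if "transpile_" + name not in transpiler_funcs
        ]
        # One known parser-only helper, so the counts differ by exactly one.
        self.assertEqual(len(parser_funcs) - 1, len(transpiler_funcs))
        self.assertEqual(parser_only, ["parse"])


# Run the test case programmatically so the sketch works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestToyRulesMatch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Adding a new rule to ToyParser without a matching transpile_ method makes both assertions fail, which is the signal the test is meant to give.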

Improving on the Test

A second, improved, code snippet is:

    def test_transpiler_parser_rules_match(self: "TestTranspiler") -> None:
        """Test number and names of transpiler and parser rules match."""
        # Assumes `inspect` and `JacTranspiler` are imported at module level.
        from jaseci.jac.parser import JacParser

        parser_func_names = []
        for name, value in inspect.getmembers(JacParser):
            if (
                inspect.isfunction(value)
                and value.__qualname__.split(".")[0] == JacParser.__name__
            ):
                parser_func_names.append(name)

        transpiler_func_names = []
        for name, value in inspect.getmembers(JacTranspiler):
            if (
                name.startswith("transpile_")
                and inspect.isfunction(value)
                and value.__qualname__.split(".")[0] == JacTranspiler.__name__
            ):
                transpiler_func_names.append(name.replace("transpile_", ""))

        self.assertEqual(len(parser_func_names), len(transpiler_func_names))

The improvements between the first and the second code snippets are as follows:

Use of inspect.getmembers(): The second snippet uses inspect.getmembers() instead of dir(). dir() returns only the attribute names of an object, so each value has to be fetched separately with getattr(). inspect.getmembers() returns (name, value) pairs, sorted by name, and the following if statements use both parts directly.
Additional Check for Belonging to Class: The second snippet uses value.__qualname__.split(".")[0] == JacParser.__name__ (or JacTranspiler.__name__) to verify that the function was defined in the body of JacParser (or JacTranspiler) itself. A function inherited from a base class carries the base class’s name in its __qualname__, so it is filtered out. This avoids counting inherited methods as rules.
Removal of Hard-Coded Values: The first code snippet hard-codes 6 (the number subtracted from the parser function count) and the list of parser-only function names. These values make the test brittle, since both must be updated by hand whenever a helper is added or renamed. The improved snippet drops them and relies on the class-ownership check to exclude helpers that are not defined on the class itself.
Simplification of Transpiler Function Naming Convention: In the second snippet, name.replace("transpile_", "") strips the “transpile_” prefix before the name is appended to the list. Each transpiler entry then has the same form as the corresponding parser rule name, so the two lists are directly comparable.
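Two of these points are easy to check in isolation. The sketch below uses hypothetical classes (BaseParser, ToyParser) and a made-up rule name to show what __qualname__ looks like for an inherited function, and to point out one caveat: str.replace() removes every occurrence of the substring, not only the leading one, so slicing off the prefix is a safer alternative if a rule name could contain “transpile_” elsewhere.

```python
import inspect


class BaseParser:
    """Hypothetical base class providing a helper."""

    def parse(self):
        pass


class ToyParser(BaseParser):
    """Hypothetical parser with a single grammar rule."""

    def expr(self):
        pass


def own_function_names(cls):
    """Names of functions whose __qualname__ starts with cls's own name."""
    return [
        name
        for name, value in inspect.getmembers(cls, inspect.isfunction)
        if value.__qualname__.split(".")[0] == cls.__name__
    ]


print(ToyParser.parse.__qualname__)   # BaseParser.parse
print(ToyParser.expr.__qualname__)    # ToyParser.expr
print(own_function_names(ToyParser))  # ['expr']

# Hypothetical rule name containing the prefix text twice.
rule_name = "transpile_pre_transpile_hook"
replaced = rule_name.replace("transpile_", "")  # removes both occurrences
sliced = rule_name[len("transpile_"):]          # removes only the prefix
print(replaced)  # pre_hook
print(sliced)    # pre_transpile_hook
```

The __qualname__ comparison assumes the classes are defined at module level; a class defined inside a function has a longer qualified name, and the first component would then be the enclosing function.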

The second code snippet improves upon the first one by using inspect.getmembers(), adding a check for class ownership, removing hard-coded values, and simplifying the transpiler function naming convention. These changes lead to more robust and maintainable code. Note that, as written, the second test asserts only that the two counts are equal; it no longer compares the names themselves.
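Since the stripped transpiler names have the same form as the parser rule names, one possible way to check names as well as counts is a set difference in each direction. This is a minimal sketch with hypothetical stand-in classes and a hypothetical own_functions helper, not the Jaseci test itself.

```python
import inspect


class ToyParser:
    """Hypothetical parser with three grammar rules."""

    def expr(self):
        pass

    def term(self):
        pass

    def factor(self):
        pass


class ToyTranspiler:
    """Hypothetical transpiler that is missing a rule for factor."""

    def transpile_expr(self):
        pass

    def transpile_term(self):
        pass


def own_functions(cls, prefix=""):
    """Collect functions defined on cls itself, with prefix sliced off the name."""
    return {
        name[len(prefix):]
        for name, value in inspect.getmembers(cls, inspect.isfunction)
        if name.startswith(prefix)
        and value.__qualname__.split(".")[0] == cls.__name__
    }


parser_rules = own_functions(ToyParser)
transpiler_rules = own_functions(ToyTranspiler, "transpile_")

# Each difference names exactly the rules that lack a counterpart.
missing_in_transpiler = sorted(parser_rules - transpiler_rules)
missing_in_parser = sorted(transpiler_rules - parser_rules)

print(missing_in_transpiler)  # ['factor']
print(missing_in_parser)      # []
```

Inside a unittest method, asserting that both lists equal [] would make the failure message name the missing rules, rather than only reporting that two counts differ.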

Why is this Helpful?

The brilliance of this approach lies in its simplicity and utility. When your code base starts growing, and the rules or tree node types start spiraling into the hundreds, it can quickly become a challenge to ensure that every parse rule has a corresponding transpiler rule and vice versa. By regularly running this test, you can catch inconsistencies early and correct them, helping to maintain the quality and consistency of your code base.

This approach can be very beneficial to those writing programming languages, compilers, transpilers, and similar tools. By using this strategy, you can improve the consistency of your code base, making it easier to manage and debug.

It’s a small trick, but as we all know, the devil is in the detail. And this detail might save you hours of troubleshooting and debugging in the long run.