The Grammar of Code: A Framework Inspired by Linguistics
Designing a Framework to Unify Parsing and Interpretation: Part 1 of a Two-Part Journey
What if the language we use to program computers could speak the language of human thought?
Imagine coding not just as a technical skill, but as an art form—one where the ability to convey a single, precise meaning through a few lines of code mirrors the elegance of a well-crafted sentence.
The Rise of LLMs
With the rise of Large Language Models (LLMs) like GPT, the ability to communicate with machines in natural language, particularly English, has become a crucial skill. This new frontier of `prompt engineering`—crafting precise instructions in natural language to guide these models—is rapidly evolving. Just as coding in traditional programming languages requires logic and structure, so does instructing an LLM. The difference is that now, the syntax is English, and the challenge lies in harnessing the nuances of language to achieve the desired outcomes.
Here's the Catch...
But what about programming in the traditional sense? Can we extend this paradigm beyond natural language processing to create a deterministic framework where programming isn't just about writing code, but about crafting precise, meaningful interactions with the machine? Imagine a world where coding is as much about linguistic precision as it is about technical knowledge.
You might argue that this is already what code does, but why hasn't code evolved beyond just syntax? It's always bugged me that while instructing machines is well-established, the way we communicate with them often remains cryptic when it could—and should—be meaningful. I understand that the kind of machine, its capabilities, the toolchain we use, and the paradigms we follow add layers of complexity, but this is no different from the nuances we navigate in natural language.
Hmmm, what I’m trying to ask is—what exactly are those layers of complexity that are stopping us from programming machines in a way that feels like a natural extension of human language and thought, just like English, for example?
To be more precise, think of a contract drawn up by a lawyer: the language used is hardly open to interpretation because of the way it’s written.
The Proposal
I think of programming as a form of rhetoric—a functional or practical one, and the simplest form of rhetoric or persuasion is to instruct.
I'm proposing a framework that offers a unified and streamlined approach for programming language designers to define each parsing and interpretation rule for a language's instructions. This includes rules such as:
Variable or block scope resolution
Conditional execution
Loop interpretation
Function evaluation, whether eager or lazy
Type casting
Exception handling interpretation
Etc.
In this framework, syntax and semantics are handled using the same set of primitives. This approach enhances the expressiveness and effectiveness of programming by aligning it more closely with the way we naturally think and communicate.
To achieve this, I’ll lay out the linguistic foundation of this proposal in the following section, but before that, let me touch on why this matters to me.
Motivation
I’ve always been fond of languages. I remember when I first got my floppy disk (you know, those square things with stickers on them that millennials will remember!). It was from Sony if I recall correctly, and I spent hours trying to decipher the mysterious Japanese characters on its label, convinced they held some secret code.
Even though I haven’t had the chance to learn Japanese yet (maybe one day!), I do speak a few languages and have been active in the software engineering industry for over eight years now. Programming has actually made me quite articulate in life. However, as Dijkstra once said:
"The tools we use have a profound and devious influence on our thinking habits, and therefore on our thinking abilities."
Linguistic Foundation
At its core, language is a system of symbols and rules used to convey meaning. In linguistics, there’s a common view that grammatical particles (such as prepositions, conjunctions, articles, and other function words) are fundamental units in constructing meaning. These particles, whether free morphemes like "and" or "but," or bound morphemes like the suffix "-s" or the prefix "un-," serve as the building blocks that structure our expressions and clarify our intentions.
I’ve come to believe that, just as letters combine to form words, grammatical particles are the letters of semantics. They give structure and meaning to our thoughts, enabling us to convey complex ideas with precision and nuance.
On the other hand, content words are abstractions of meaning, where each word encapsulates a meaning that might be either primitive or complex. They carry the main semantic load of a sentence, representing the "what" (nouns, verbs, etc.) in language. For example, when I say "Go to school," "Go" is a primitive content word that directly conveys an action, while "School" is a more complex content word that can be fully explained only by using a combination of particles and other content words in a sentence.
But let’s be honest—we come from different linguistic backgrounds. A grammatical particle in English might be an abstraction or content word in Korean. Therefore, we need to establish a set of rules and heuristics to define what constitutes a particle versus a content word, which will be part of the framework’s goal in the context of programming.
The Approach
Before we get technical, I want to outline the objectives we aim to achieve within the context of this proposed framework. Our objectives are to:
Define a set of rules and heuristics to distinguish between grammatical particles and content words linguistically.
Establish a core set of grammatical particles.
Define a set of primitive content words that act as abstractions for instructions, entities, and data types, which are necessary for an arbitrary programming language to function.
Build a toy programming language on top of this framework as a proof of concept.
Incorporate querying features into the toy programming language as declarative operations.
Before we delve deeper, I would like to expand on what I call the relativity of paradigms.
I believe that instructions are inherently imperative in nature. Let's be clear: imperativeness represents the simplest form of instruction. Whether a programming paradigm is seen as imperative or declarative depends on the perspective of the agents involved. For example, in low-level programming, such as assembly language, an instruction like `MOV a, b` is likely viewed as imperative from the developer's perspective. However, it can be seen as declarative from the machine's perspective because it specifies what should happen, namely the movement of data from one location to another. The 'how' of this operation (the underlying physical-level operation) is abstracted away from the command itself.
Another example can be found in high-level programming languages, where a SQL query like `SELECT * FROM users WHERE age > 30` is often considered declarative because it specifies what data to retrieve without describing how to retrieve it. However, from the perspective of the database engine, this query can be broken down into a series of imperative steps that the engine must execute to fetch the desired data. Thus, the query is declarative from the developer's perspective but imperative from the database engine's perspective.
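To make the two perspectives concrete, here is a minimal Python sketch of the kind of imperative steps an engine might take for the query above. The in-memory `users` list and the `select_users_older_than` helper are hypothetical stand-ins; a real database engine would rely on indexes and a query planner rather than a plain scan.

```python
# Hypothetical in-memory stand-in for the `users` table.
users = [
    {"name": "Ada", "age": 36},
    {"name": "Linus", "age": 28},
    {"name": "Grace", "age": 45},
]


def select_users_older_than(rows, min_age):
    """Imperative counterpart of: SELECT * FROM users WHERE age > min_age."""
    result = []
    for row in rows:              # step 1: scan every row
        if row["age"] > min_age:  # step 2: apply the WHERE predicate
            result.append(row)    # step 3: collect the matching row
    return result


older = select_users_older_than(users, 30)
print([row["name"] for row in older])
```

The developer only ever writes the one-line query; the loop, the comparison, and the collection step are what the engine's side of the conversation looks like.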
In the proposed framework, the toy programming language can utilize declarative commands (content words) that the framework then breaks down into primitive content words and particles within a clause, allowing for precise execution while maintaining a higher level of abstraction for the developer.
That being said, in the following section, I will briefly touch on how the proposed framework is intended to work, providing some initial examples. A more concrete direction will be discussed in the follow-up post (Part 2).
I would like to note that the success of this approach hinges on the accuracy, universality, and comprehensive coverage of the core particles and primitive content words we select. With the right foundation in place, I believe this framework has the potential to fundamentally change how we think about and engage with programming languages.
How It Works
Imagine a program file being transpiled into a list of clauses, where each clause represents a fundamental unit of logic or instruction, akin to sentences in natural language. Each clause is managed by a particle that defines its body’s structure (signature) and meaning. For illustration purposes, we can represent this structure using YAML, which includes explanatory comments. Please review the file here.
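As a rough illustration of this idea, a clause can be modeled as a particle plus a body whose shape the particle dictates. The sketch below uses Python rather than YAML, and every name in it (`Particle`, `Clause`, the `define` and `call` particles and their slots) is a placeholder assumed for illustration, not the framework's actual vocabulary.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Particle:
    """A particle names a clause kind and the slots its body must fill."""
    name: str
    signature: tuple  # required slot names (hypothetical)


@dataclass
class Clause:
    """A unit of instruction: one particle governing one body."""
    particle: Particle
    body: dict = field(default_factory=dict)

    def validate(self):
        # The particle's signature decides whether the body is well-formed.
        missing = [slot for slot in self.particle.signature if slot not in self.body]
        if missing:
            raise ValueError(
                f"clause '{self.particle.name}' is missing slots: {missing}"
            )
        return True


# Hypothetical particles; the real core set is still to be defined.
DEFINE = Particle("define", ("name", "body"))
CALL = Particle("call", ("target", "arguments"))

# A program is then a list of clauses, with clauses nested inside bodies.
program = [
    Clause(DEFINE, {
        "name": "main",
        "body": [Clause(CALL, {"target": "print", "arguments": ["Hello world!"]})],
    }),
]

for clause in program:
    clause.validate()
```

Here the signature check plays the role of syntax, while whatever an interpreter does with a validated clause would supply the meaning; both hang off the same `Particle` object.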
Now, let’s take two samples to set the tone, and illustrate how this framework could be applied in practice.
Python Function
def main():
    print("Hello world!")
Note: You can view the complete YAML breakdown of this Python function here.
SQL Query
SELECT user_id, username
FROM users
WHERE last_active_at >= CURDATE();
Note: The full YAML representation of this SQL query can be explored here.
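For readers who prefer code to YAML, here is one hedged guess at how such a query could be held as nested clauses and interpreted. The particle names (`select`, `gte`, `curdate`), the dictionary layout, and the sample rows are all assumptions made for this sketch and may differ from the linked representation. The current date is passed in explicitly so the example stays deterministic.

```python
from datetime import date

# Hypothetical clause tree for the SQL query; names are illustrative only.
query = {
    "particle": "select",
    "columns": ["user_id", "username"],
    "from": "users",
    "where": {
        "particle": "gte",
        "left": "last_active_at",
        "right": {"particle": "curdate"},
    },
}


def evaluate(node, row, today):
    """Evaluate an expression clause against a single row."""
    if isinstance(node, str):
        return row[node]  # a bare string names a column
    if node["particle"] == "curdate":
        return today
    if node["particle"] == "gte":
        left = evaluate(node["left"], row, today)
        right = evaluate(node["right"], row, today)
        return left >= right
    raise ValueError(f"unknown particle: {node['particle']}")


def run_select(clause, tables, today):
    """Interpret a `select` clause over in-memory tables."""
    rows = tables[clause["from"]]
    return [
        {column: row[column] for column in clause["columns"]}
        for row in rows
        if evaluate(clause["where"], row, today)
    ]


tables = {
    "users": [
        {"user_id": 1, "username": "ada", "last_active_at": date(2024, 5, 2)},
        {"user_id": 2, "username": "linus", "last_active_at": date(2024, 4, 30)},
    ]
}

result = run_select(query, tables, today=date(2024, 5, 1))
print(result)
```

With the sample data, only the row whose `last_active_at` is on or after the supplied date survives the `where` clause.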
Please note that the grammatical particles and content words in the examples are preliminary and subject to change as the framework evolves :)
The Big Picture
Clauses are processed by particles that machines or compilers handle and optimize differently. These particles, akin to grammatical elements in natural language, guide the structure and flow of the program, determining how each clause interacts with others.
This framework is not just about adding another layer of complexity; it’s about merging toolchains in a way that transforms programming into a form of communication as intuitive and expressive as natural language. For example, take `SELECT` in an SQL query—it functions like a transitive verb, taking multiple objects as inputs and invoking other particle clauses. This approach allows us to rethink and redesign the way we code, making the process more natural and aligned with human thought. Additionally, we could design programming languages that adhere to different syntax but share the same underlying interpretation.
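The last point, different syntax over a shared interpretation, can be sketched in a few lines of Python. Both surface forms below (`print("...")` and `say "..."`) and the `output` particle are invented for this example; the only claim is that two parsers can target one clause shape and one interpreter.

```python
import re


def parse_call_style(source):
    """Parse the surface form print("text") into a clause."""
    match = re.fullmatch(r'\s*print\("(.*)"\)\s*', source)
    if match is None:
        raise SyntaxError(f"not a call-style statement: {source!r}")
    return {"particle": "output", "value": match.group(1)}


def parse_sentence_style(source):
    """Parse the surface form say "text" into the same kind of clause."""
    match = re.fullmatch(r'\s*say\s+"(.*)"\s*', source)
    if match is None:
        raise SyntaxError(f"not a sentence-style statement: {source!r}")
    return {"particle": "output", "value": match.group(1)}


def interpret(clause):
    """One interpreter serves every syntax that produces this clause."""
    if clause["particle"] == "output":
        return clause["value"]
    raise ValueError(f"unknown particle: {clause['particle']}")


clause_a = parse_call_style('print("Hello world!")')
clause_b = parse_sentence_style('say "Hello world!"')
print(clause_a == clause_b)
print(interpret(clause_a))
```

Once both parsers agree on the clause, everything downstream of parsing can be shared between the two languages.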
In the next part of this series, I will dive deeper into the technical aspects of this framework, exploring more examples and discussing how it can be implemented in real-world scenarios.
This journey is just beginning. Whether you’re a language enthusiast, a software developer, or both, I would love to hear your thoughts on this. How do you see this framework evolving? What challenges and opportunities do you foresee? Reach out!