What I Learned As An Intern For the Free Software Foundation

I’m FT, and I was one of the summer interns with the FSF’s tech team, working on LibreJS. I’ve long been very invested in free-software, and as I came to a point where school was wrapping up and I was beginning to enter the professional world, I saw this internship as a great opportunity to learn more about the discussions and work that directly goes on in the space as a whole.

The great thing about free-software is that anyone can get involved. You don’t need certifications, you don’t need super-expensive equipment, and you don’t necessarily need specialized knowledge or connections. In theory, if you have access to a device and the internet, that alone opens up the door of possibility. There’s a lot of freedom to be had in both what you contribute and how you learn; if you have an idea, you have the freedom to make it happen. In this space, emphasis is put on results, as opposed to rank or prestige.

But the flipside is that said freedom can often be daunting. For the person who’s looking to contribute there’s often not a clear path or set of steps to follow, meaning you have to figure out a lot on your own.

In school, we had programming projects, but there was always a clear, narrow, and guaranteed path, with a clearly defined technology stack and a required skillset suited to our knowledge. We had professors and TAs well versed in the projects who could offer solutions, and a bunch of students all working on the same problem.

It’s completely different in free-software. If you wish to contribute to a project, you might have to break down the codebase even when it is insufficiently documented. If you have an idea, it might be up to you to get together the team to work on it. If there’s something you need to work on, you won’t be tackling it during your 9-to-5 but in your spare time. There’s no paycheck or boss over your shoulder to keep you motivated. The world of commercial software production happens in the office, but here, you cannot take the same for granted.

However, the FSF’s tech team is a rare example of one of those office environments in the free-software space. Because of this, I wanted to make sure I made the most of my internship, learning the necessary skills and problem-solving mindset to contribute not just for the summer season, but in my own time throughout the course of my career.

The project I was tasked with over the summer was extending LibreJS to support SPDX standard identifiers. Up until now, a large hurdle to adoption of LibreJS was the confusion surrounding how website developers were supposed to tag their licenses. The existing methods (WebLabels, full license-text, and magnet-links) proved unwieldy each in their own ways, so an acceptable standard recommendation could not really be formed. But in the years since LibreJS’ conception, SPDX has come to be a commonly accepted standard for tagging software licenses. LibreJS will now support license tags which make use of these SPDX identifiers.

The LibreJS codebase has been the product of multiple authors, each with their own style of both coding and documenting. With this post, I hope to share the more general lessons I have learned over the course of adding this functionality.

If there is a free-software project you would like to make a contribution to, ask yourself the following questions one at a time, as opposed to jumping in all at once.

1. What are you working on?

Before you can get to writing any code, you need to first get a feel for the environment you’re working in. On a general, user-end level, how does the code function? For me, I had used LibreJS in the past, so I had some idea coming in, but I had never built the extension from source. My first day would be getting this test environment set up.

Jumping into a project of any complexity, just looking at the Git repo right away will be a lot to process since you’re not just dealing with the original LibreJS code and structure, but also code from the various libraries and other components of the stack. Luckily, building the environment does a lot to give you an understanding of the tools in your toolbox.

Right away, the act of building and loading an XPI gives some idea of the relationship between LibreJS’ source code and how it actually gets compiled and loaded into a usable extension.

In the dependencies, I see Acorn. I look up what Acorn is and see that it is a JavaScript parser. This immediately gives me a rough idea of how LibreJS may work. It seems to be taking raw JavaScript files and parsing them in order to both analyze and operate upon them. I keep that in mind, with the idea that if I ever need to work on JavaScript parsing logic, I’ll have the Acorn docs to consult.

I also create a basic website with JavaScript, with the intent to test. This helps me understand the various ways in which JavaScript can be inserted, but also the difference between trivial and non-trivial code.

2. What is your problem?

Before I can get to studying LibreJS, I first need to understand the problem itself. What are the current issues with LibreJS? What is SPDX and how does it alleviate these issues? How does SPDX’s spec definition impact the assumptions I can make when coding? How do SPDX tags get used in the real world?

Defining the parameters of the problem will give you a clear idea of cases and assumptions.

Cases are the different use scenarios you need to account for. In this context, it would be all the different acceptable, real-world ways website developers could be expected to tag and package their JavaScript. These are clearly defined, and should be easily testable on whether or not my code meets their respective criteria.

Assumptions refer to both the leeway and restrictions you need to keep in mind when writing your code. The SPDX spec tells me that all valid SPDX short-form identifiers must be on one line. This gives me leeway in that I know I as a programmer need not waste time on accounting for situations in which the identifier is split across multiple lines. But the SPDX spec also does not have an identifier for unlicensed code. So I will have to keep in mind that whatever solution I do write does not mistakenly mark unlicensed sections of code as licensed.
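
To make these two assumptions concrete, here is a minimal sketch in JavaScript. It is not LibreJS’ actual implementation: the function name findSpdxIdentifier and the regular expression are hypothetical, and the sketch assumes the tag takes the common "SPDX-License-Identifier: <id>" form with a single identifier rather than a compound expression.

```javascript
// Hypothetical helper, not part of LibreJS: returns the SPDX short-form
// identifier found in a script's source, or null when there is none.
function findSpdxIdentifier(source) {
  // Assumption from the text: the tag and its identifier sit on one line,
  // so each line can be examined on its own.
  const tag = /SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)/;
  for (const line of source.split(/\r?\n/)) {
    const match = tag.exec(line);
    if (match) {
      return match[1];
    }
  }
  // No tag found: report "no license" rather than guessing one.
  return null;
}

const tagged = "// SPDX-License-Identifier: GPL-3.0-or-later\nalert('hi');";
const untagged = "alert('hi');";
const splitTag = "// SPDX-License-Identifier:\n// GPL-3.0-or-later\nalert('hi');";

console.log(findSpdxIdentifier(tagged));   // GPL-3.0-or-later
console.log(findSpdxIdentifier(untagged)); // null
console.log(findSpdxIdentifier(splitTag)); // null
```

The three sample strings double as cases: a tagged script, an untagged one that must stay unlicensed, and a tag split across lines that the one-line assumption lets the code ignore.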

3. How does the code work?

This is arguably the hardest part of the process, especially if you do not have access to thorough documentation or the ability to consult the developers. I was very lucky in that I did have this sort of access: if you’re working online, try to see if you can find contact via IRC/Matrix/email/etc. It can go a long way.

Of course, you cannot take this access for granted, and where that is the case, you’ll have to break down the code yourself. In order to do this, start by identifying the code’s entry point (the main function). For LibreJS, this would be the aptly named “main_background.js”.

From here, figure out how it executes. This was tricky for me, as JavaScript does not always operate by the traditional synchronous programming I am accustomed to. LibreJS has various asynchronous functions which are structured around the JavaScript event loop. So I had to learn how the event loop works (this talk at JSConf was incredibly helpful). In what order do functions execute, and under what conditions will a certain section of code run?
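
A small experiment helped me answer the ordering question. The sketch below is a generic illustration and not LibreJS code; the function name demonstrateEventLoop is made up for the example. It records when synchronous code, a promise callback, and a timer callback each run.

```javascript
// Records the order in which each piece of code actually runs.
function demonstrateEventLoop() {
  const order = [];
  order.push("sync start");

  // Timer callbacks are queued as tasks and run on a later turn of the loop.
  const timerDone = new Promise((resolve) => {
    setTimeout(() => {
      order.push("timeout callback");
      resolve();
    }, 0);
  });

  // Promise reactions are microtasks: they run once the current
  // synchronous code has finished, before the timer callback.
  Promise.resolve().then(() => order.push("promise callback"));

  order.push("sync end");
  return { order, finished: timerDone.then(() => order) };
}

const run = demonstrateEventLoop();
console.log(run.order.join(", ")); // sync start, sync end
run.finished.then((order) => console.log(order.join(", ")));
// sync start, sync end, promise callback, timeout callback
```

Even with a delay of zero, the timer callback runs last: the synchronous code finishes first, then the promise callback, then the timer.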

After you have a general grasp on that, figure out what it is generally doing. Look at the overall logic, understanding the role of each function in the larger picture, and chart out the dependency chain in order to make sense of how each function and file is related. Take a notepad and draw out a diagram to help; it goes a long way.

4. What do I need to change?

From there, identify the functions that seem relevant to your task. When I look under checks.js (a script called within the aforementioned main_background.js) and see functions like “checkScriptSource” or “checkLicenseText”, that indicates to me that those likely have to do with the actual license identification. In order to validate this, I began changing minor things in these functions and using the test environment to see what was impacted. If the behavior of the relevant component (in this case the license checker) is altered, then I know I’m on the right track.

Once you’ve identified the function in question, the next thing you need to consider is how it interacts with the rest of the system. Here, it helps to think of your function as a black box. There’s some input being passed in, and some expected output. Figure out what that input and output means, where it comes from, how it is formatted, and what operations are eventually performed on it.
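
One way to observe a black box without changing its behavior is to wrap it so every call records what went in and what came out. The sketch below is an assumption-laden illustration: trace and looksLicensed are hypothetical names, and looksLicensed is only a toy stand-in, since the real LibreJS functions have their own signatures and logic.

```javascript
// Wraps any function so that each call records its arguments and result.
function trace(fn, log) {
  return function traced(...args) {
    const result = fn.apply(this, args);
    log.push({ name: fn.name, args, result });
    return result;
  };
}

// Toy stand-in for a function under study (hypothetical, not LibreJS code).
function looksLicensed(scriptText) {
  return scriptText.includes("@license");
}

const calls = [];
const tracedCheck = trace(looksLicensed, calls);
tracedCheck("/* @license */ run();");
tracedCheck("run();");

console.log(calls.map((c) => c.result).join(", ")); // true, false
```

Reading the recorded calls back shows the format of the input and the shape of the output for each real invocation, which is usually quicker than inferring them from the source alone.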

Once you’ve established your function and everything surrounding it, you should be ready to actually implement your own logic.

5. How far down should I go?

Another key piece of advice I got from the tech team was that, when selecting a function to add your changes to, you should try to go as low as possible. In a program like LibreJS there are a lot of functions calling other functions. Typically, the lower you go down this call chain, the more likely you are to find generic logic.

In LibreJS, you could have the main script with a function intended to scan the full HTML of the page for embedded scripts and then look for licenses, but what you’ll see is that within that function is a call to another function dedicated specifically to checking for licenses in any script, embedded or external.

That function is called in multiple situations, for all types of scripts. By modifying that lower-level function, you apply those changes to all cases the code checks for, as opposed to having to duplicate the same logic in multiple places.
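
The shape of that call chain can be sketched as follows. Every name here (detectLicense, checkInlineScripts, checkExternalScript) is hypothetical and chosen for illustration; this is not how LibreJS is actually organized, only the general pattern of two callers sharing one low-level check.

```javascript
// All names are hypothetical; this is not LibreJS's real structure.

// Low-level check shared by every kind of script.
function detectLicense(scriptText) {
  // New logic is added once, here, at the lowest level.
  const spdx = /SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)/.exec(scriptText);
  return spdx ? spdx[1] : null;
}

// Higher-level caller for scripts embedded in the page.
function checkInlineScripts(inlineScripts) {
  return inlineScripts.map((text) => detectLicense(text));
}

// Higher-level caller for a script fetched from a URL.
function checkExternalScript(url, fetchedText) {
  return { url, license: detectLicense(fetchedText) };
}

const inline = checkInlineScripts([
  "// SPDX-License-Identifier: MIT\nrun();",
  "run();",
]);
const external = checkExternalScript(
  "https://example.org/app.js",
  "/* SPDX-License-Identifier: Apache-2.0 */ start();"
);

console.log(JSON.stringify(inline)); // ["MIT",null]
console.log(external.license);       // Apache-2.0
```

Neither caller had to change for the new tag to be recognized in both embedded and external scripts.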

6. What else should I know?

Some additional tips I got from the tech team throughout the internship related to how you should style your code in real-world scenarios.

When formatting your code, the priority is readability. You want to make the codebase as accessible as possible for contributors after you. After all, open projects are going to have multiple eyes on them. The following are general heuristics, not ironclad rules, so keep them in mind and weigh them appropriately with respect to your situation.

If a single function can’t fit on your screen, it’s better to break it up into multiple functions. This helps with readability, as each block of logic is clearly defined as its own function.
Typically, one would assume that more comments are better, as it means more documentation. But it’s better to write your code in a way which is self-explanatory, avoiding the need for documentation. Anything which isn’t obvious should get a comment, but if a section of code can be rewritten in a more intuitive way, it’s usually better to consider that.
Code should be written with clarity in mind, as opposed to cleverness. Just because there’s a more compact or smart way to do something doesn’t mean it should necessarily be done, especially if it serves to reduce readability.
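
As a small illustration of the last heuristic, here are two versions of the same made-up function (both names are hypothetical). The first is compact; the second does the same thing but names each decision.

```javascript
// Compact but hard to scan: nested ternaries and an inline regex.
const classifyClever = (s) =>
  !s ? "empty" : /^\s*\/\//.test(s) ? "comment" : "code";

// Same behavior, written so each decision is named and visible.
function classifyClear(line) {
  if (!line) {
    return "empty";
  }
  const startsWithLineComment = line.trimStart().startsWith("//");
  if (startsWithLineComment) {
    return "comment";
  }
  return "code";
}

console.log(classifyClever("  // note")); // comment
console.log(classifyClear("  // note"));  // comment
```

The second version is longer, but a contributor reading it for the first time does not have to unpack a regular expression or track nested conditions to see what it does.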

One last tip I got from the tech team pertained to working in the free-software space in general. If you’re a professional in the tech world, as many contributors are, it can often be easy to find yourself content with your day job as your fill of coding for the week. But it’s in your spare time where you’re going to have the best opportunities to contribute to the free-software space. So consider setting aside a dedicated block of a couple of hours once every week, where you do nothing except focus on contributing to free-software. Over time, this should steadily build both your skills and your impact on the space as a whole.

On the whole, the experience I had at the FSF was incredibly valuable, and I hope that my recounting of it provided some value to you too. These new improvements to LibreJS should make tagging significantly easier, so if you’re a website developer who hasn’t yet freed your JavaScript, there’s no better time than now!