Torvalds' quote about good programmers

238

216

I accidentally stumbled upon the following quote by Linus Torvalds:

"Bad programmers worry about the code. Good programmers worry about data structures and their relationships."

I've thought about it for the last few days and I'm still confused (which is probably not a good sign), hence I wanted to discuss the following:

  • What interpretation of this is possible/makes sense?
  • What can be applied/learned from it?

beyeran

Posted 2012-08-31T17:06:36.273

Reputation: 748

Question was closed 2014-04-21T15:14:19.773

18I think this question probably has multiple answers that are equally valid. But it's a good question anyway. I love that quote. It expresses why I don't understand programmers who worry about switching languages. It's rarely the language that matters in a program, it's the data structures and how they relate.Ryan Kinal 2012-08-31T18:18:23.370

5Maybe if you take the time making the data structures "elegant" then the code doesn't have to be convoluted to deal with these data structures? I'm probably too dumb to really know the meaning of Torvalds' quote. :}programmer 2012-08-31T18:33:06.503

2@JasonHolland That's pretty much it. Once you understand the data structures, the code is almost irrelevant. It becomes a matter of memory and/or reference. The complicated and interesting part is conceptually figuring everything out. I often solve problems and design solutions away from the keyboard.Ryan Kinal 2012-08-31T18:59:31.787

3@RyanKinal But of course the language does matter, because it makes it considerably easier to deal with and think about certain data structures. Think about all the languages that specialize in LISt Processing, for example, or languages that have native support for data structures that have to be hacked into other languages (sets and sparse arrays come to mind).kojiro 2012-08-31T21:22:33.207

83Torvalds is not alone in this, by the way: "Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious." – Fred Brooks, The Mythical Man-Month. "Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won't usually need your code; it'll be obvious." and "Smart data structures and dumb code works a lot better than the other way around." – Eric S. Raymond, The Cathedral and The Bazaar.Jörg W Mittag 2012-09-01T02:10:48.193

1IMHO it is referring to the functional aspect of programming.Sid 2012-09-01T18:23:57.010

1@kojiro Language will, of course, matter for implementation, but there are very few times when you can't express your solution in whichever language you want. Some might be more difficult, or you may have to modify your solution slightly, but it usually doesn't matter much at all.Ryan Kinal 2012-09-04T13:13:30.280

Very profound quote, and true in many dimensions. Is it smart to write the CSS before the HTML?MathAttack 2012-09-23T13:57:44.233

Does anyone have an example of a problem solved in each of the two contrasted styles, maybe a kata, which would make the idea concrete?Jonathan Hartley 2012-09-23T18:45:04.153

4This explains why the Linux kernel is a mess :)l1x 2012-09-24T05:34:36.110

1I have a friend who used to use another quote that I like even more: "Most programmers think about things in terms of how they work. Great programmers think about them in terms of how they break."Eric Burcham 2013-03-11T17:52:03.643

I would also add Dijkstra's remark that "...our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed."Dave 2013-08-22T21:22:09.360

Answers

326

It might help to consider what Torvalds said right before that:

git actually has a simple design, with stable and reasonably well-documented data structures. In fact, I'm a huge proponent of designing your code around the data, rather than the other way around, and I think it's one of the reasons git has been fairly successful […] I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important.

What he is saying is that good data structures make the code very easy to design and maintain, whereas the best code can't make up for poor data structures.

If you're wondering about the git example, a lot of version control systems change their data format relatively regularly in order to support new features. When you upgrade to get the new feature, you often have to run some sort of tool to convert the database as well.

For example, when DVCS first became popular, a lot of people couldn't figure out what about the distributed model made merges so much cleaner than centralized version control. The answer is absolutely nothing, except distributed data structures had to be much better in order to have a hope of working at all. I believe centralized merge algorithms have since caught up, but it took quite a long time because their old data structures limited the kinds of algorithms they could use, and the new data structures broke a lot of existing code.

In contrast, despite an explosion of features in git, its underlying data structures have barely changed at all. Worry about the data structures first, and your code will naturally be cleaner.
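
As a rough illustration of how a stable data structure keeps the surrounding code simple, here is a minimal sketch of a content-addressed object store in the spirit of git's blob objects. The `ObjectStore` class and its method names are hypothetical and are not git's API; the only part taken from git is the blob hashing scheme (a `blob <size>` header, a NUL byte, then the content, hashed with SHA-1).

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Hash content the way git hashes a blob: header + NUL + content, SHA-1."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Hypothetical in-memory store: a dict keyed by content hash."""

    def __init__(self) -> None:
        self._objects = {}  # maps object id (hex string) -> content bytes

    def put(self, content: bytes) -> str:
        oid = blob_id(content)
        self._objects[oid] = content  # identical content maps to the same key
        return oid

    def get(self, oid: str) -> bytes:
        content = self._objects[oid]
        # Integrity checking needs no extra bookkeeping: just rehash.
        if blob_id(content) != oid:
            raise ValueError(f"object {oid} is corrupt")
        return content

    def __len__(self) -> int:
        return len(self._objects)


store = ObjectStore()
first = store.put(b"hello\n")
second = store.put(b"hello\n")  # stored once; deduplication falls out of the keying
```

Deduplication and corruption detection are not separate features here; they follow from the choice of key, which is arguably the kind of effect the answer describes.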

Karl Bielefeldt

Posted 2012-08-31T17:06:36.273

Reputation: 93 833

25"the best code can't make up for poor data structures": good gravy, is that true.Conrad Frix 2012-08-31T18:38:48.697

I shouldn't need to care or know about what data structures Git uses underneath really. How is he measuring success here; by how many people use Git and find it easy to use, or by how many contribute code to it?James 2012-08-31T18:58:36.827

5He's talking from the point of view of programmers making changes to git itself. The end user point of view is completely orthogonal to this discussion, other than easily maintainable code making for fewer bugs and faster feature additions.Karl Bielefeldt 2012-08-31T19:07:09.730

2@James: He's saying that the software is better (hence easier to use, and used by more people) because the data structures are better. Of course you don't need to know about the data structures of software you use, but you do care about them, indirectly, even if you don't realize it, because the data structures are what drive the things that you do realize you care about.ruakh 2012-08-31T19:09:41.810

1+1. This answer puts context on a statement that could otherwise be construed to mean something very different. Anyone who has read a 5000 line monstrosity of a file knows exactly what I mean.riwalk 2012-08-31T20:16:13.750

20

"Worry about the data structures first, and your code will naturally be cleaner.": The Roman statesman Cato (http://en.wikipedia.org/wiki/Cato_the_Elder) used to say "Rem tene, verba sequentur" = "Have the argument clear in your mind, the words will follow naturally". Same thing with programming: understand the data structures and design first, the actual code will follow by itself.

Giorgio 2012-09-23T08:37:47.653

If I am not wrong, the first versions of git were doing the sha of the content, while the newer ones (+2 years probably) do it with the content and headers. This is a data structure change that broke the first versions of git.Guillermo 2012-09-23T17:08:53.990

1Which makes this kind of a single-developer version of Fred Brooks: "Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious."azernik 2012-09-23T18:47:51.327

I'd also like to add that bad data structures usually cause bad code, because over time, all that remains is the question "why was it designed this way?" and tacked on top of it are several half hearted attempts to fix the design. These attempts are also usually aborted out of fear because the "fix" inevitably comes down to "breaking" the way it currently works.Carl 2012-09-23T23:13:22.430

I have to say, though, that the line between data structure and code isn't all that clear. To code up a good data structure, you often must have good code in the first place! After all, a data structure is really just more code (albeit more modular, and perhaps reusable).Chii 2012-09-24T14:00:23.850

@ruakh: A comment like Torvalds' could have referred only to the internal maintenance of git, but in this case, Linus actually sees the data structures as a kind of public interface. That goes against the religion of "data hiding"; but behind it lies a deeper truth: you should expose the simplest, most stable interfaces possible. Linus claims that for git, that is the on-disk representation, rather than any function-call API. (The git commands themselves are an API for scripts).Adrian Ratnapala 2012-09-25T15:20:02.660

@AdrianRatnapala: Data structures as the public interface, plus data hiding (physical data independence), roughly equals the relational model of data. Its principles apply equally to GIT design and to database design.Mike Sherrill 'Cat Recall' 2012-09-27T14:43:29.273

Most media lockers (e.g. iTunes) exploit this fact to create artificial lock-in. They allow data in, but modify it (e.g. change file names) so it can't be easily extracted. In addition, most meta-data is stored in a platform-specific database format so it's only portable to the same/similar platform.Evan Plaice 2012-12-14T01:40:57.573

60

Algorithms + Data Structures = Programs

Code is just the way to express the algorithms and the data structures.

zxcdw

Posted 2012-08-31T17:06:36.273

Reputation: 4 350

Latest edition http://www.ethoberon.ethz.ch/WirthPubl/AD.pdf

dchest 2012-09-23T11:13:09.363

This is true for procedural programming; in OOP it is a little bit different.m3th0dman 2012-09-24T17:21:36.040

3It's not fundamentally any different. You have data and do a set of operations on it. Member variables and methods. Exactly the same thing. The whole essence of computing ever since the 50's has been built upon that very simple rule that programs consist of algorithms modifying data structures, and it keeps holding true 60 years later. You could also consider programs as functions. They take input on which they operate to produce output. Exactly as mathematical functions do.zxcdw 2012-09-24T18:10:29.760

31

This quote is very similar to one of the rules in "The Art of Unix Programming", a subject that is Torvalds' forte, given that he is the creator of Linux. The book is available online.

From the book is the following quote that expounds on what Torvalds is saying.

Rule of Representation: Fold knowledge into data so program logic can be stupid and robust.

Even the simplest procedural logic is hard for humans to verify, but quite complex data structures are fairly easy to model and reason about. To see this, compare the expressiveness and explanatory power of a diagram of (say) a fifty-node pointer tree with a flowchart of a fifty-line program. Or, compare an array initializer expressing a conversion table with an equivalent switch statement. The difference in transparency and clarity is dramatic. See Rob Pike's Rule 5.

Data is more tractable than program logic. It follows that where you see a choice between complexity in data structures and complexity in code, choose the former. More: in evolving a design, you should actively seek ways to shift complexity from code to data.

The Unix community did not originate this insight, but a lot of Unix code displays its influence. The C language's facility at manipulating pointers, in particular, has encouraged the use of dynamically-modified reference structures at all levels of coding from the kernel upward. Simple pointer chases in such structures frequently do duties that implementations in other languages would instead have to embody in more elaborate procedures.
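
The conversion-table-versus-switch comparison in the quoted passage can be sketched as follows. This is only an illustration in Python; the status-code table and both function names are made up for the example and do not come from the book.

```python
# Knowledge expressed as control flow: every new case means new logic.
def reason_with_branches(status: int) -> str:
    if status == 200:
        return "OK"
    elif status == 404:
        return "Not Found"
    elif status == 500:
        return "Internal Server Error"
    else:
        return "Unknown"


# Knowledge folded into data: the logic is one lookup and stays "stupid".
REASONS = {
    200: "OK",
    404: "Not Found",
    500: "Internal Server Error",
}


def reason_with_table(status: int) -> str:
    return REASONS.get(status, "Unknown")
```

Adding a new status to the second version means adding one entry to `REASONS`; the function itself never changes, and the table can be inspected, tested, or loaded from a file without reading any control flow.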

Jay Atkinson

Posted 2012-08-31T17:06:36.273

Reputation: 331

I remembered this too!aitchnyu 2012-09-24T11:04:19.597

1OTOH, look at any StackOverflow question about int**. That should convince you that data is in fact NOT obvious; it only becomes so by attaching meaning to the data. And that meaning is in code.MSalters 2012-09-27T14:36:58.737

29

Code is easy, it's the logic behind the code that is complex.

If you are worrying about code, that means you don't yet get the basics and are likely lost on the complex parts (i.e. data structures and their relationships).

Morons

Posted 2012-08-31T17:06:36.273

Reputation: 13 927

17Heh, I wonder if the next generation of programmers will be asking: "Morons once said Code is easy, it's the logic behind the code that is complex, what did he mean?"Yannis 2012-08-31T18:19:32.827

36@YannisRizos That will be especially confusing when people aren't sure whether it was said by people who were morons, or a single person by the name of Morons.KChaloux 2012-08-31T19:06:28.293

14

To expand on Morons' answer a bit, the idea is that understanding the particulars of the code (syntax, and to a lesser extent, structure/layout) is easy enough that we build tools that can do it. Compilers can understand all that needs to be known about code in order to turn it into a functioning program/library. But a compiler can't actually solve the problems that programmers do.

You could take the argument one step further and say "but we do have programs that generate code", but the code they generate is based on some sort of input that is almost always hand-constructed.

So, whatever route you take to get to code, be it via some sort of configuration or other input that then produces code via a tool, or writing it from scratch, it's not the code that matters. It's the critical thinking about all the pieces required to get to that code that matters. In Linus' world that's largely data structures and relationships, though in other domains it may be other pieces. But in this context, Linus is just saying "I don't care if you can write code, I care that you can understand the things that will solve the problems I'm dealing with".

Daniel DiPaolo

Posted 2012-08-31T17:06:36.273

Reputation: 271

Every programmer uses programs that generate code. They are often called "compilers", sometimes in combination with "linkers". They take a (relatively) human-readable and human-writeable input, which is usually (but not always) provided in some sort of text format, and turn it into data that the computer can understand as instructions and execute.Michael Kjörling 2012-09-27T07:54:58.533

13

Linus means this:

Show me your flowcharts [code], and conceal your tables [schema], and I shall continue to be mystified; show me your tables [schema] and I won't usually need your flowcharts [code]: they'll be obvious.

-- Fred Brooks, "The Mythical Man Month", ch 9.

Daniel Shawcross Wilkerson

Posted 2012-08-31T17:06:36.273

Reputation: 1

12

I think he's saying that the overall high-level design (data-structures and their relationships) is much more important than the implementation details (code). I think he values programmers who can design a system over those who can only focus on details of a system.

Both are important, but I would agree that it's generally much better to get the big picture and have issues with the details than the other way around. This is closely related to what I was trying to express about breaking up big functions into little ones.

GlenPeterson

Posted 2012-08-31T17:06:36.273

Reputation: 10 367

+1: I agree with you. Another aspect is that often programmers are more worried about what cool language feature they are going to use, instead of focusing on their data structures and algorithms and on how to write them down in a simple, clear way.Giorgio 2012-09-01T19:34:35.947

I also agree. The fact is that it's easy to change isolated pieces of code, but harder to change data structures or interfaces between pieces of code (as these types of changes may affect many things rather than just one thing).Brendan 2012-09-23T14:05:05.580

5

Well, I can't entirely agree, because you have to worry about all of it. And for that matter, one of the things I love about programming is the switches through different levels of abstraction and size that jump quickly from thinking about nanoseconds to thinking about months, and back again.

However, the higher things are more important.

If I've a flaw in a couple of lines of code that causes incorrect behaviour, it probably isn't too hard to fix. If it's causing it to under-perform, it probably doesn't even matter.

If I've a flaw in the choice of data structure in a sub-system, that causes incorrect behaviour, it's a much bigger problem and harder to fix. If it's causing it to under-perform, it could be quite serious or if bearable, still appreciably less good than a rival approach.

If I've a flaw in the relationship between the most important data structures in an application, that causes incorrect behaviour, I've a massive re-design in front of me. If it's causing it to under-perform, it might be so bad that it would almost be better if it was behaving wrong.

And it'll be what makes finding those lower-level problems difficult (fixing low-level bugs is normally easy, it's finding them that can be hard).

The low-level stuff is important, and its remaining importance is often seriously understated, but it does pale compared to the big stuff.

Jon Hanna

Posted 2012-08-31T17:06:36.273

Reputation: 1 639

2

Someone who knows code sees the "trees." But someone who understands data structures sees the "forest." Therefore a good programmer will focus more on data structures than on code.

Tom Au

Posted 2012-08-31T17:06:36.273

Reputation: 793

2But focusing on either the forest or the trees to the exclusion of the other can be detrimental, so I don't think this analogy fits.kojiro 2012-09-01T00:17:12.523

1

@kojiro: In the expression "can't see the forest for the trees", it is assumed that someone who can see the forest will also see the trees (see http://en.wiktionary.org/wiki/see_the_forest_for_the_trees). Therefore I think it is a good analogy here.

Treb 2012-09-01T04:07:33.167

2

Knowing how the data will flow is all important. Knowing flow requires that you design good data structures.

If you go back twenty years, this was one of the big selling points for the object-oriented approach using either Smalltalk, C++, or Java. The big pitch -- at least with C++ because that's what I learned first -- was design the class and the methods, and then everything else would fall into place.

Linus undoubtedly was talking in broader terms, but poorly designed data structures often require extra rework of code, which can also lead to other problems.

octopusgrabbus

Posted 2012-08-31T17:06:36.273

Reputation: 409

2

What can be applied/learned from it?

If I may, my experience in the last few weeks. The preceding discussions clarified the answer to my question: "what did I learn?"

I rewrote some code and, reflecting upon the results, I kept seeing & saying "structure, structure..." is why there was such a dramatic difference. Now I see that it was the data structure that made all the difference. And I do mean all.

  • Upon testing my original delivery, the business analyst told me it was not working. We said "add 30 days" but what we meant was "add a month" (the day in the resulting date doesn't change). Add discrete years, months, days; not 540 days for 18 months for example.

  • The fix: in the data structure, replace a single integer with a class containing multiple integers; the change to its construction was limited to one method. Change the actual date arithmetic statements - all 2 of them.

The Payoff

  • The new implementation had more functionality but the algorithm code was shorter and clearly simpler.

In Fixing the code behavior/results:

  • I changed data structure, not algorithm.
  • NO control logic was touched anywhere in code.
  • No API was changed.
  • The data structure factory class did not change at all.
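
A minimal sketch of the kind of change described above, assuming Python rather than whatever language the original code used. The `Period` class, its field names, and the rule of clamping to the last day of a shorter month are all assumptions made for illustration; the answer does not show its actual code.

```python
import calendar
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Period:
    """Hypothetical replacement for a single 'number of days' integer."""
    years: int = 0
    months: int = 0
    days: int = 0

    def add_to(self, start: date) -> date:
        # Move by whole calendar months first, keeping the day of the month.
        month_index = start.year * 12 + (start.month - 1) + self.years * 12 + self.months
        year, month = divmod(month_index, 12)
        month += 1
        # Assumption: clamp to the last day of a shorter month (e.g. Jan 31 + 1 month).
        day = min(start.day, calendar.monthrange(year, month)[1])
        return date(year, month, day) + timedelta(days=self.days)


start = date(2012, 8, 15)
by_days = start + timedelta(days=540)        # "18 months" approximated as 540 days
by_period = Period(months=18).add_to(start)  # 18 calendar months, same day of month
```

The calling code still performs one "add" per date; only the shape of the value being added changed, which matches the claim that no control logic had to be touched.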

radarbob

Posted 2012-08-31T17:06:36.273

Reputation: 4 715

1

Can't agree more with Linus. Focusing on the data greatly helps distill a simple and flexible solution to a given problem. Git itself is a case in point -- given how many features have been added over the years of development, the core data structures remain largely unchanged. That's magic! --2c

mc2

Posted 2012-08-31T17:06:36.273

Reputation: 29

1

I like to imagine a very clever team of librarians in a beautifully made library with a million random and brilliant books; it would be quite a folly.

Tudor Watson

Posted 2012-08-31T17:06:36.273

Reputation: 21

0

I've seen this in numerous areas.

Think about business analysis... Let's say you're analyzing the best way to support Marketing at a consumer products company like Colgate. If you start with fancy windows, or the latest technology, you won't help the business nearly as much as if you think through the data needs of the business first, and then worry about presentation later. The data model outlasts the presentation software.

Consider doing a webpage. It's much better to think about what you want to show (the HTML) first, and worry about style (CSS) and scripting (pick your tool) after.

This isn't to say coding isn't important too. You need programming skills to get what you need in the end. It's that data is the foundation. A poor data model reflects either an overly complex or a poorly thought-out business model.

MathAttack

Posted 2012-08-31T17:06:36.273

Reputation: 2 661

0

I find myself writing new functions and updating existing ones a lot more often than having to add new columns or tables to my database schema. This is probably true for all well-designed systems. If you need to change your schema every time you need to change your code, it's a clear sign you are a very bad developer.

quality of code indicator = [code changes] / [database schema changes]

"Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious." (Fred Brooks)

Joel Jacobson

Posted 2012-08-31T17:06:36.273

Reputation: 1

-2

It seems like this idea has various interpretations in the various types of programming. It holds true for systems development and also holds true for enterprise development. For example, one could argue that the sharp shift in focus toward the domain in domain-driven design is much like the focus on data structures and relationships.

eulerfx

Posted 2012-08-31T17:06:36.273

Reputation: 924

-4

Here's my interpretation of it: You use code to create data structures, so the focus should be on the latter. It's like building a bridge - you should set out to design a solid structure rather than one that looks appealing. It just so happens that well written data structures and bridges alike look good as a result of their efficient designs.

Bob

Posted 2012-08-31T17:06:36.273

Reputation: 1