Talk to Anders about C#
1 Talk to Anders about C#
1.1 Language and Design
1.2 Growing a Language
1.3 C#
1.4 The Future of Computer Science
Chapter 13 of the book Masterminds of Programming
When Microsoft settled a lawsuit from Sun Microsystems over changes to the Java programming language, they turned to veteran language designer Anders Hejlsberg to design a new object-oriented language backed by a powerful virtual machine. The result was C#—and a replacement for both Visual C++ and Visual Basic within the Microsoft ecosystem. Although comparisons to Java are still inevitable in syntax, implementation, and semantics, the language itself has evolved past its roots, absorbing features from functional languages such as Haskell and ML.
Language and Design
You've created and maintained several languages. You started as an implementer of Turbo Pascal; is there a natural progression from implementor to designer?
Anders Hejlsberg: I think it's a very natural progression. The first compiler I wrote was for a subset of Pascal, and then Turbo Pascal was the first almost full implementation of Pascal. But Pascal was always meant as a teaching language and lacked a bunch of pretty common features that are necessary to write real-world apps. In order to be commercially viable, we immediately had to dabble in extending it in a variety of ways.
It's surprising that a teaching language would be so successful in bridging the gap between teaching and commercial success.
Anders: There are many different teaching languages. If you look at Niklaus Wirth's history—Niklaus Wirth designed Pascal, later Modula and Oberon—he always valued simplicity. Teaching languages can be teaching languages because they're good at teaching a particular concept, but they're not really real other than that; or they can be full-fledged languages that truly teach you the basics of programming. That was always Pascal's intent.
There seem to be two schools of thought. Some schools—MIT, for example—start with Scheme. Other schools seem to take a "practical" focus. For a while, they taught C++. Now it's Java, and some use C#. What would you do?
Anders: I've certainly always been in the more practical camp. I'm an engineer more than I'm a scientist, if you will. It's my belief that if you teach people something, teach them something they can use later for something practical.
Like always, the answer is not at the extreme. It's somewhere in between. Continually in the programming language practice, in the implementation of programming languages for the industry, we borrow from academia. Right now, we're seeing a great harvesting of ideas from functional programming which has been going on in academia for God knows how long. I think the magic here is you've got to do both.
Is your language-design philosophy to take ideas from where you can and make them practical?
Anders: Well, in a sense. I think you probably have to start with some guiding principles. Simplicity is always a good guiding principle. Also, I'm a great fan of evolving as opposed to just starting out new.
You might fall in love with one particular idea, and then in order to implement it, you go create a brand-new language that's great at this new thing. Then, by the way, the 90% that every language must have, it kind of sucks at. There's just so much of that, whereas if you can evolve an existing language—for example, with C# most recently we've really evolved it a lot toward functional programming—it's all gravy at that point, I feel. You have a huge user base that gets to just pick up on this stuff. There's a bit of a complexity tax, but it is certainly much less than having to learn a whole new language and a whole new execution environment in order to pick up a particular style of programming.
It's hard to draw the line between a language per se and its ecosystem.
Anders: Well, yeah, and certainly these days more and more. The language used to dominate your learning curve, if you go back say 20, 30 years. Learning a programming environment was all about learning the language. Then the language had a little runtime library. The OS had maybe a few things, if you could even get to the OS. Now you look at these gigantic frameworks that we have like .NET or Java, and these programming environments are so dominated by the sheer size of the framework APIs that the language itself is almost an afterthought. It's not entirely true, but it's certainly much more about the environment than it is about the language and its syntax.
Does that make the job of the library designer more important?
Anders: The platform designer's job becomes very important because where you really get maximum leverage here is if you can ensure longevity of the platform and the ability to implement multiple different languages on top of the platform, which is something that we've always put a lot of value in. .NET is engineered from the beginning as a multilanguage platform, and you see it now hosting all sorts of different languages on it—static languages, dynamic languages, functional languages, declarative languages like XAML, and what have you. Yet, underneath it all is the same framework, the same APIs, and the leverage there is just tremendous. If these were all autonomous silos, you'd just die a slow death in interop and resource consumption.
Do you favor a polyglot virtual machine in general?
Anders: I think it has to be that way. The way I look at it is, you go back to the good old 8-bit days, where you had 64K of memory. It was all about filling those 64K, and that happened pretty quickly. It's not like you were going to build systems for years there.
You could implement for a month or two and then that was that; 640K, maybe six months and you'd filled it up. Now it's basically a bottomless pit. Users demand more and more, and there's no way we can rewrite it all. It's about leveraging and making things that exist interoperate. Otherwise, you're just forever in this treadmill, just trying to do the basics.
If you can put a common substrate below it all and get much higher degree of interoperability and efficiencies out of shared system services, then it's the way to go. Take interoperability between managed code and unmanaged code, for example. There are all sorts of challenges there. But better we solve it than every distinct programming environment trying to solve it. The most challenging kinds of apps to build are these hybrid apps where half of the app is managed and the other half is unmanaged, and you have garbage collection on one side of the fence and none on the other.
There seems to be a design goal in the JVM never to break backward compatibility with earlier versions of the bytecode. That limits certain design decisions they can make. They can make a design decision at the language level, but in the actual implementation of generics, for example, they have to do type erasure.
Anders: You know what? I think their design goal wasn't just to be backward compatible. You could add new bytecodes and still be backward compatible. Their design goal was to not do anything to the bytecode, to the VM at all. That is very different. Effectively, the design goal was no evolution. That totally limits you. In .NET, we had the backward compatibility design goal, so we added new capabilities, new metadata information. A few new instructions, new libraries, and so forth, but every .NET 1.0 API continued to run on .NET 2.0.
It's always puzzled me that they chose that path. I can understand how that gets you there right now on what's there, but if you look at the history of this industry, it's all about evolving. The minute you stop evolving, you've signed your own death sentence. It's just a matter of time.
Our choice to do reified generics versus erasure is one that I am supremely comfortable with, and it is paying off in spades. All of the work we did with LINQ would simply not be possible, I would argue, without reified generics. All of the dynamic stuff that we do in ASP.NET, all of the dynamic code generation we do in practically every product that we ship so deeply benefits from the fact that generics are truly represented at runtime and that there is symmetry between the compile time and runtime environment. That is just so important.
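To make the phrase "generics are truly represented at runtime" concrete, here is a minimal C# sketch. It is an editorial illustration, not part of the interview, and the names `ReifiedGenericsDemo`, `Describe`, and `IsListOf` are invented for it. It shows that `List<int>` and `List<string>` are distinct run-time types and that generic code can inspect its own type argument, which an erasure-based design such as Java's does not allow directly.

```csharp
using System;
using System.Collections.Generic;

public static class ReifiedGenericsDemo
{
    // T survives compilation, so generic code can inspect it at run time.
    public static string Describe<T>()
    {
        return typeof(T).Name;
    }

    // A run-time type test against a constructed generic type.
    public static bool IsListOf<T>(object candidate)
    {
        return candidate is List<T>;
    }

    public static void Main()
    {
        object ints = new List<int>();
        object strings = new List<string>();

        // The two lists have different run-time types.
        Console.WriteLine(ints.GetType() == strings.GetType());      // False
        Console.WriteLine(IsListOf<int>(ints));                      // True
        Console.WriteLine(IsListOf<string>(ints));                   // False
        Console.WriteLine(Describe<int>());                          // Int32

        // Reflection can recover the type argument from the object itself.
        Console.WriteLine(ints.GetType().GetGenericArguments()[0]);  // System.Int32
    }
}
```

Because the type argument is available at run time, libraries that generate code or build queries dynamically can work from the same type information the compiler saw.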
One of the criticisms of Delphi was that there was a strong reluctance to break code, which informed some language decisions.
Anders: Let's step back then. When you say break code, that must first of all mean that you're talking about an evolution of something. You're talking about a version N+1 of something. You could argue that sometimes it's good to break code, but by and large, when you sum it up, I've never been able to justify breakage. The only argument I hear for breakage, because they're not really good arguments, is "It's cleaner that way" or "It's architecturally more sound" or "It'll prepare us better for the future" or whatever. I go, "Well, you know, platforms live maybe 10, 15 years and then they cave in under their own weight, one way or the other."
They become more or less legacy, maybe 20 years. At that point, there's enough new around them and enough new without any overhead. If you're going to break it, then break it good. Break everything. Get to the very front of the line. Don't like move up a couple of slots. That's pointless.
That sounds like a game of leapfrog where the turns take 5 or 10 years.
Anders: You either play leapfrog or you be super cognizant of backward compatibility, and you bring your entire community with you every time.
Managed code does that to some degree. You can use your existing components in process.
Anders: Certainly from the inception of .NET we have remained backward compatible at every release. We fix some bugs that caused some code to break, but I mean there has to be some definition by which it is okay to break people's code.
In the name of security or in the name of correct program behavior or whatever, yes, we will sometimes break, but it is rare, and generally it reveals a design error in the user's program or something that they're actually glad to have fixed because they weren't aware that that was a problem. It's good at that point, but gratuitous breakage in the name of more beautiful code or whatever, I think it is a mistake. I've done that enough in my early years to know that that just gets you nowhere with your customers.
It's hard to make the argument from just good taste.
Anders: Yeah. Well, sorry. My good taste is not your good taste.
If you look back on the languages you were involved in, from Turbo Pascal through Delphi, J++, Cool, and C#, are there themes in your work? I can listen to early Mozart and then to his Requiem, and say, "Those are both distinctly Mozart."
Anders: Everything is a picture of the time that you're in. I've grown up with object orientation and whatever. Certainly ever since the middle of Turbo Pascal up until now everything I've worked on has at the core been an object-oriented language. A lot of evolution happened there that has carried forward. In Delphi, we did a bunch of work on a more component-oriented programming model, with properties and events and so forth.
That carried forward into the work that I've done with C#, and certainly that's recognizable. I try to always keep a finger on the pulse of the community and try to be there with the relevant new. Well, Turbo Pascal was the innovative development environment, and Delphi was the visual programming—RAD. C# and .NET has all been about managed execution environments, type safety, and so forth. You learn from all of the stuff that's around you, be it in your ecosystem or competitive ecosystems. You really try to distill what is good about those, and what didn't work for them. In this business, we all stand on the shoulders of giants. It's fascinating actually how slowly programming languages evolve when you compare to the evolution that we've seen in hardware. It is astounding.
Since Smalltalk-80, we've had between 15 and 20 generations of hardware!
Anders: One every 18 months practically, and yet, there's not really a massive difference between the programming language we use today and those that were conceived, say, 30 years ago.
They're still arguing over old concepts such as higher-order functions in Java. That's probably going to be a 10-year debate.
Anders: Which is unfortunate, because I think they could move a bit faster on that one. I don't think there's really a question of whether it's valuable. It's more a question of whether there's too much process and overhead in the Java community to get it done.
If going to a continuation passing style and exposing "call with current continuation" at the language level gives you a huge advantage, would you do that, even if only 10% of programmers might ever understand it?
Anders: If, yes—but that's a big if. I don't think that that's the case, but look at what we did with LINQ. I truly believe that that will benefit the vast majority of our C# programmers. The ability to write more declarative styles of queries and have a uniformly applicable query language across different domains of data, it's super valuable. It's like the Holy Grail of language and database integration in some ways. We may have not solved the entire problem there, but I think we made sufficient progress that it justifies the extra learning, and there are ways you can expose that to people without having them figure out the lambda calculus from first principles.
I think it's a great example of a practical application of functional programming. You can happily use it and never even know that you're doing functional programming, or that there are functional programming principles powering it underneath. I'm very happy with where we ended up on that one.
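As an editorial illustration of "a uniformly applicable query language across different domains of data," the following C# sketch runs the same query shape over in-memory objects and over XML. It is not from the interview; the class name `QueryDomainsDemo`, the method names, and the sample data are all invented. Both queries here run in process. A database provider would translate a similar query into SQL, which this sketch does not attempt to show.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class QueryDomainsDemo
{
    // Domain 1: plain in-memory objects (sample data invented for this sketch).
    public static string[] CheapFromObjects()
    {
        var items = new[]
        {
            new { Name = "pen", Price = 2 },
            new { Name = "lamp", Price = 30 },
            new { Name = "book", Price = 12 },
        };

        var query = from item in items
                    where item.Price < 20
                    orderby item.Price
                    select item.Name;
        return query.ToArray();
    }

    // Domain 2: the same data as XML, queried with the same clauses.
    public static string[] CheapFromXml()
    {
        var root = XElement.Parse(
            "<items>" +
            "<item name='pen' price='2'/>" +
            "<item name='lamp' price='30'/>" +
            "<item name='book' price='12'/>" +
            "</items>");

        var query = from item in root.Elements("item")
                    where (int)item.Attribute("price") < 20
                    orderby (int)item.Attribute("price")
                    select (string)item.Attribute("name");
        return query.ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", CheapFromObjects())); // pen, book
        Console.WriteLine(string.Join(", ", CheapFromXml()));     // pen, book
    }
}
```

The reader of either query sees `where`, `orderby`, and `select` and does not need to know that each clause compiles down to a method call taking a lambda.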
You used the word "practical." How do you decide which features to add and which features to exclude? What are your criteria for deciding what to add and what to keep out?
Anders: I don't know. Over time, you get a knack for telling whether this is going to benefit enough of your users to merit the conceptual baggage that it creates, right? Trust me, we see lots of interesting proposals from our user base of, "Oh, if we could only do this," or "I'd love to do that," but often it's too narrowly focused on solving one particular problem and adds little value as an abstract concept.
Certainly the best languages are designed by small groups of people, or single individuals.
Is there a difference between language design and library design?
Anders: Very much so. The APIs are obviously much more domain-specific than languages, and languages really are a level of abstraction above APIs if you will. Languages put in place the framework, the quarks and the atoms and the molecules, if you will, of API design. They dictate how you put together the APIs but not what the APIs do.
In that sense, I think there's a big difference. This actually gets me back to what I wanted to talk about before. Whenever we look at adding a new feature to the language, I always try to make it applicable in more than one domain. The hallmark of a good language feature is that you can use it in more than just one way.
Again, I'll use LINQ as an example here. If you break down the work we did with LINQ, it's actually about six or seven language features like extension methods and lambdas and type inference and so forth. You can then put them together and create a new kind of API. In particular, you can create these query engines implemented as APIs if you will, but the language features themselves are quite useful for all sorts of other things. People are using extension methods for all sorts of other interesting stuff. Local variable type inference is a very nice feature to have, and so forth.
We could've probably shipped something like LINQ much quicker if we said, "Let's just jam SQL in there or something that is totally SQL Server-specific, and we'll just talk to a database and then we'll have it," but it's not general enough to merit existence in a general-purpose programming language. You very quickly then become a domain-specific programming language, and you live and die by that domain.
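The decomposition described above can be sketched in a few lines of C#. This is an editorial illustration rather than anything from the interview, and the names `TextExtensions`, `IsLongerThan`, and `LinqFeaturesDemo` are hypothetical. It uses three of the features Anders lists (extension methods, lambdas, and local variable type inference) and shows the extension method being useful on its own, outside any query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TextExtensions
{
    // An extension method that has nothing to do with querying:
    // the feature is general-purpose, not LINQ-specific.
    public static bool IsLongerThan(this string text, int length)
    {
        return text.Length > length;
    }
}

public static class LinqFeaturesDemo
{
    public static List<string> LongNamesUpperCased(IEnumerable<string> names)
    {
        // 'var' is local variable type inference; each argument is a lambda.
        // Where, OrderBy, and Select are themselves extension methods.
        var result = names
            .Where(name => name.IsLongerThan(2))
            .OrderBy(name => name, StringComparer.Ordinal)
            .Select(name => name.ToUpperInvariant());
        return result.ToList();
    }

    public static void Main()
    {
        var names = new List<string> { "Pascal", "Delphi", "C#", "Oberon" };
        Console.WriteLine(string.Join(", ", LongNamesUpperCased(names)));
        // DELPHI, OBERON, PASCAL

        // The same extension method used with no query in sight.
        Console.WriteLine("C#".IsLongerThan(1)); // True
    }
}
```

Nothing in the language here knows about databases or SQL; the query operators are ordinary library methods built from general features.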
You turn your nice 3GL into a 4GL, which is a general-purpose death.
Anders: Yeah. I'm very cognizant of that. Now one of the big things we're looking at is concurrency. Everybody's looking at concurrency because they have to. It's not a question of want to; it's a question of have to. Again, in the concurrency domain we could have the language dictate a particular model for concurrency—but it would be the wrong thing to do. We have to step above it and find what are the capabilities that are lacking in the language that would enable people to implement great libraries for concurrency and great programming models for concurrency. We somehow need treatment in the language to give us better state isolation. We need function purity. We need immutability as core concepts. If you can add those as core concepts, then we can leave it to the OS and framework designers to experiment with different models of concurrency because lo and behold, they all need these things. Then we don't have to guess at who will be the winner. Rather we can coast by when one blows up and it turns out that the other one was more successful.
We're still relevant.
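A small C# sketch may help show why immutability and function purity matter for concurrency. It is an editorial illustration with invented names (`ImmutablePoint`, `PurityDemo`), and it relies on convention: C# does not verify that `SquaredNorm` is pure. PLINQ's `AsParallel` is used only as one example of a library-level concurrency model, not as the model Anders had in mind.

```csharp
using System;
using System.Linq;

// Immutable by construction: fields are readonly and there are no setters.
public sealed class ImmutablePoint
{
    public readonly int X;
    public readonly int Y;

    public ImmutablePoint(int x, int y)
    {
        X = x;
        Y = y;
    }

    // "Changing" a point returns a new value instead of touching shared state.
    public ImmutablePoint Translate(int dx, int dy)
    {
        return new ImmutablePoint(X + dx, Y + dy);
    }
}

public static class PurityDemo
{
    // Pure by convention: the result depends only on the argument
    // and nothing outside the function is read or written.
    public static int SquaredNorm(ImmutablePoint p)
    {
        return p.X * p.X + p.Y * p.Y;
    }

    public static int SumSequential(ImmutablePoint[] points)
    {
        return points.Sum(p => SquaredNorm(p));
    }

    // Safe to run on several threads without locks, because the data
    // cannot change and the function has no side effects.
    public static int SumParallel(ImmutablePoint[] points)
    {
        return points.AsParallel().Sum(p => SquaredNorm(p));
    }

    public static void Main()
    {
        var points = Enumerable.Range(1, 1000)
            .Select(i => new ImmutablePoint(i, i))
            .ToArray();

        Console.WriteLine(SumSequential(points) == SumParallel(points)); // True
    }
}
```

Swapping `AsParallel` for a different scheduling library would not require changing `ImmutablePoint` or `SquaredNorm`, which is the kind of independence from any single concurrency model that the answer argues for.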
It sounds like you want to give people tools to build great things, rather than dictating the kinds of things they're going to build.
Anders: I want to. You get much better leverage of community innovation that way.
Where do you see that in the C# community? Do people bring code to you? Do you go visit customers? Do you have your MVPs trolling newsgroups and user groups?
Anders: It's a mixture of all of the above plus some more. We have code-sharing things like Codeplex. There are all sorts of communities. There's commercial communities. There's open source. There's lots of open source .NET code. It's from all over. I don't think there is a single point of influx, so to speak. It's a varied and complex ecosystem out there.
You always run across stuff where you go, "Wow, how did they come up with this?" or "That's amazing." You can appreciate how much work this was for someone to do. It might not be commercially viable, but boy, it's a beautiful piece of work.
I certainly try to follow lots of blogs that are relevant to C# and LINQ.
Those are some of my favorite keywords when I go blog trolling, just to see what's happening out there. It gives you good insight in whether people are picking up on the work that you've done in the right way or not. It teaches you something for the future.
Growing a Language
How do you recognize simplicity?
Anders: There's true simplicity and then there's this thing that I call simplexity, which I see a lot of. It is when you first build something super complex and then you go, "Wow, people will never get this. This is way too complicated but we have to have all this power in here. Let's try to build a simple system on top of it. Let's just try to like wrap it all up in a simple interface."
Then, the minute you have to do something that isn't quite what the system was designed to do, boom! You fall into this big morass of complexity underneath because all you were looking at was just a thin veneer on top of something that's very complicated as opposed to something that is truly simple all the way down. I don't know if this makes a lot of sense to you, but I tend to think of it like that. Simplicity often just means that you're doing more with less. There's just less there, but it does the same as something else or it even does more than something else. It's all about do more with less. It's not about doing more with more with a simple layer on top.
Would you follow this principle if you were to create a new programming language today?
Anders: Oh, certainly. I've created lots of programming languages by now or certainly lots of implementations. I think it's very important before you embark on creating a new language you have to be very, very clear about why you're doing it and what is the problem that you want to solve.
Often the mistake people make with new programming languages is that they get enamored with a particular problem they want to solve. Maybe the programming language is the right place to solve it, and so they set about solving that part of the problem and maybe they do a wonderful job of that. Then every programming language—and I mean every programming language—consists of 10% new and 90% stuff that is just bread and butter for programming and that just has to be there. A lot of these new innovative solutions that we see in new programming languages are great at the 10% new, but then they are terrible at the 90% that every language must do in order for you to really be able to write programs, and therefore they fail.
It's very, very important to understand that there's a bunch of boring standard stuff that has to be in every programming language. If you don't get that right, you will fail. Conversely it means that if instead of creating a new programming language, you can evolve an existing programming language, then the equation looks very different because then you already have the 90% covered. In fact you have 100% covered. You're just trying to add the new thing.
Like C++.
Anders: Like C++, which was a great example of an evolution of C, or of the different versions of C# that we've done and so forth. I'm very much a believer in evolving. Then, of course, there comes a time when you just can't stuff more in there—there's so much tension between the new things you add and the old way of doing it in the language that you just can't move it anymore. Creating a new language is really more of an exception to the rule than it is the rule.
Would you create a general-purpose language or a domain-specific language?
Anders: I think the real answer there is "neither." How I would address that problem is I would create a general-purpose programming language that is great at creating domain-specific languages. Again, the devil that we face with all of these domain-specific languages is that they may get the domain right but then they get the general-purposeness wrong. There are certain general-purpose features that literally every domain-specific language ends up needing. Unless the domain-specific language is purely just a data definition language where you're just stating data, and at that point in my opinion you might as well use XML then.
If you're really a programming language where there's logic or predicates or rules or whatever, then you have to have expressions and expressions have operators and maybe you have to have standard functions and your customers are going to want to do things that you never even thought of. There's just a bunch of standard stuff that you need. If you can instead create your domain-specific language out of a base that is a general-purpose programming language, then I think you're much better off than starting out fresh every time.
One of the things that is problematic with general-purpose programming languages today is they're getting better at creating internal DSLs, and you could view LINQ as an example of that. But what they're not good at currently is capturing the correct usage patterns of those internal DSLs. In some ways, when you create internal DSLs you actually want to limit the things that you can do with the general-purpose programming language. You want to be able to shut off the general-purposeness of the language, and you want to only reveal it in certain spots in your DSL. That's one thing that general-purpose programming languages are not very good at right now. That might be something that would be useful to look at.
Brian Kernighan said that if you want to create a general-purpose language, you should start from the beginning with that goal in mind. Otherwise, if you create a little language, as soon as people start using it, they are going to ask to add features to it. Growing a DSL generally doesn't work very well.
Anders: Oh yeah. I think Gosling said that every configuration file ends up being its own programming language. It's very true, and you want to be real careful about that.
You said that in some ways the platform is more important than the language. Are we going to produce reusable components?
Anders: Well, the reason I said that is if you look at the evolution over the last 25, 30 years of languages, tools, and frameworks, it's quite remarkable how little programming languages have changed. It's equally remarkable how much larger our frameworks and run times have gotten. They're probably three orders of magnitude larger today than they were, say, 25, 30 years ago. When I started with Turbo Pascal, there were like maybe 100, 150 standard functions in the runtime library and that was that. Now we have the .NET Framework with 10,000 types with a combined 100,000 members. Obviously leveraging all of that work is increasingly important. The language is important because it shapes the way we think about problems, but the framework is getting increasingly important because it is the stuff that we leverage in our programs.
Leverage is everything today. Your computer is, from a programming perspective, basically a bottomless pit. You could write code from now until the day you die and you would never fill it up. There's so much capacity, and end user expectations keep going up and up and up. The only way you really succeed is by finding smart ways to leverage work that has already been done. That wasn't the case if you go back 25, 30 years ago. You had 64k of memory, well, gee, that would fill up in a month or two.
How much does the language influence the programmer's productivity, and how much is it the ability of the programmer that makes the difference?
Anders: I think the two go hand in hand. I think the language tends to influence the way we think. The programmer's job is to do the thinking, if you will. That's the raw material, the raw power that goes into the process. The language is the thing that shapes your thinking—its function is really to help you think in productive ways. That's how, for example, languages with object-oriented support cause you to think about a problem in a certain way. Functional languages cause you to think about the problem in another way. Dynamic languages might cause you to think about it in a third way. They're different hats you put on that cause you to think differently. Sometimes it's useful to try and put both hats on and approach it from various viewpoints.
Would you prefer adding a language feature that makes everyone a bit more productive or one that makes just a few developers much more productive?
Anders: For a general-purpose programming language, it's not a good idea to add features that only help a few because you end up being a grab bag of strange things. The hallmark of any good language feature is that it has many good uses, not just one good use. If you look at all of the stuff we added to the language in C# 3.0, all of the stuff that collectively forms this concept called language-integrated query or LINQ, that actually breaks down to about six or seven discrete language features that in and of themselves have many good uses. They don't benefit just one particular programmer. They're at a more abstract level than that. For every good language feature, you have to be able to show how it's going to benefit you in several scenarios or else it may not be right for the language. It might be better to just have that be an API feature.
Do you consider which features to add or remove to make debugging easier? Do you consider the debugging experience during the design process of the language?
Anders: Oh, absolutely. If you look at the whole underpinning of C#, the language is a type-safe language, which means there is no such thing as an array overrun or a stray pointer. Everything has well-defined behavior. There is no such thing as undefined behavior in C#. Error handling is done with exceptions as opposed to return codes that you could ignore and so forth. So each of those underpinnings like type safety, memory safety, and exception handling all help tremendously in eliminating whole classes of bugs or making whole classes of bugs much easier to find. That's something we think about all the time.
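The two safety properties mentioned in this answer, bounds-checked arrays and errors reported as exceptions, can be seen in a short C# sketch. It is an editorial illustration covering safe (non-`unsafe`) code only, and the names `SafetyDemo`, `TryWrite`, and `ParseOrReport` are made up for the example.

```csharp
using System;

public static class SafetyDemo
{
    // Returns the name of the exception raised by an out-of-range write,
    // or "none" if the write succeeds. No memory outside the array is touched.
    public static string TryWrite(int[] buffer, int index)
    {
        try
        {
            buffer[index] = 42;
            return "none";
        }
        catch (IndexOutOfRangeException ex)
        {
            return ex.GetType().Name;
        }
    }

    // int.Parse signals failure by throwing, so the caller cannot
    // silently ignore it the way an unchecked return code can be ignored.
    public static string ParseOrReport(string text)
    {
        try
        {
            return int.Parse(text).ToString();
        }
        catch (FormatException)
        {
            return "FormatException";
        }
    }

    public static void Main()
    {
        var buffer = new int[3];
        Console.WriteLine(TryWrite(buffer, 1));   // none
        Console.WriteLine(TryWrite(buffer, 3));   // IndexOutOfRangeException
        Console.WriteLine(ParseOrReport("123"));  // 123
        Console.WriteLine(ParseOrReport("abc"));  // FormatException
    }
}
```

In both failure cases the program stops at the faulty statement with a named error, which is what makes this class of bug easier to find in a debugger.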
How do you try to prevent these recurrent problems without limiting the developers? How do you choose between safety and freedom for the developer?
Anders: I think each language puts itself somewhere on the spectrum of power versus productivity, if you will. C# is definitely a much safer and more protected environment than C++, which in turn is safer and more protective than if you're writing assembly code. The general trend for programming languages throughout their entire history really has been for us to keep moving the level of abstraction up and to make the program environment safer, if you will, or put more and more of the housekeeping that programmers have to do in the hands of the machines and allow programmers to focus on the creative part of the process, which really is where they add value. Programmers are terrible at doing memory management as a rule. They're terrible at doing type-safety analysis; therefore, we have bugs.
To the extent that we can put that in the hands of the machine instead and have the programmers do the creative thinking, that, I think, is a good tradeoff. It costs just a little bit of performance but boy, it's not all that much. Today in a typical .NET application, if you profile a program execution and look at where the program spent its time, garbage collection rarely even shows up. Yet your program is safe and will have no memory leaks. That's a wonderful tradeoff. That's just fantastic compared to the kinds of stuff we had to deal with in manually memory-managed systems like C or C++.
Could we use a scientific approach in the way we design and grow a language? I can see improvements given by research results in the implementation, but language design sounds like a matter of the designer's personal preferences.
Anders: I think programming language design is an interesting combination of art and science. There's clearly a lot of science in it, mathematical formalism in notation for parsing and semantics and type systems, and what have you, code generation, blah, blah, blah. There's lots of science. There's lots of engineering.
Then there's the art of it. What does the language feel like? What is this process that happens in your head when you program in this language versus the other language and how is it different? What's going to be easier for people to understand and what's going to be hard for people to understand? I don't think we'll ever be able to measure those.
It'll never be scientific. It will always be an angle of language design that is purely an art. Just like there are good paintings and bad paintings and you can sort of scientifically talk about, "Well, the composition wasn't done right. Maybe he didn't use the right kind of paint." But ultimately it's in the eye of the beholder. There's just something about it that you cannot formalize.
Do you think that the fact that you speak at least two languages in some ways might help?
Sometimes in Italian I can describe with one word a concept that in English requires a sentence, and obviously sometimes the reverse happens.
Anders: I don't know. That's a good question. I never thought of that. Possibly. I certainly think that to be a good language designer you have to understand multiple programming languages, no doubt about it. Whether it helps to understand multiple spoken languages, I don't know. Could very well be the two are connected. On the design team we definitely have people that speak many languages or there are people that are good at music. They do seem to be connected somehow, but I'm not quite sure how.
C#
How long is the future of C#? You've been on it for almost 10 years.
Anders: C# the project started in late December of '98, so we're coming up on our 10-year anniversary. That's not 10 years of existing in the industry, but it's 10 years since inception internally. I'd say we've got another 10 years at least, but it all depends. I've said I've long given up predicting the far-off future of this industry because no one ever gets it right anyway. But I certainly see a strong healthy future for C#. We're not done innovating, and there's plenty of work still to do.
When I look at the evolution of C# from an application domain standpoint, I see the desire to replace C++ as a systems programming language.
Anders: It can be used for that, but there are a lot of uses for which a managed execution environment like .NET or Java is more appropriate.
When I compare C# to Java, C# seems to have a stronger push toward evolution. The Java people seem to want a baseline where everyone's code looks more or less the same. Whether you've programmed Java for a decade, never programmed before, or just graduated from a six-month course on Java, all of your code will look the same. C# seems to pull in new ideas from Haskell or from F#. Is there a desire to add new features that people who've just finished the six-month C# course haven't seen and won't immediately understand?
Anders: I am not in this to engineer the next COBOL; let's just put it that way.
What is it that powers the Internet revolution and the electronic revolution that we've seen? It's the fact that we're constantly evolving. I bring it back to that. The minute you stop evolving, I don't know that you're adding any value. This is, again, taking it to the extreme. Of course, there is value in stability of the platform, but I think you provide that value by ensuring backward compatibility. You are free to get off the bus at C# 1.0 and just not move any further. For those people that really want to be more productive and want to build newer kinds of apps like SOA or whatever and get into more dynamic styles of programming—adaptable programs and more declarative styles of programming like we're doing with LINQ—then you've got to evolve or get out of the way, or something else will replace you.
Do you get feedback regarding the C# language, not just the implementation?
Anders: We get feedback every day on the language in many different ways. It could be people mail me. I read people's blogs, I read forums where people ask technical questions, go to conferences—all sorts of ways that we get feedback daily on what works and what doesn't in the language. We take that feedback back to the design team and we maintain a long laundry list of all of the crazy ideas. Some of them will never make it into the language, but we keep them on the list because there's something there that maybe someday we'll get a good idea around this area. We know that we don't have it right yet, but there's a desire to do something.
Then gradually we find solutions to problems. Some of them are just simple things that people ask and that we just go do. Others are things that are bigger problems that people never really said anything about like LINQ. It's not like someone ever asked us, "We'd love to have queries built into the language," because you don't really think about the notion that you could.
I wouldn't say that there's one particular way we get the feedback. It's a very organic process and we get it from many different places. Certainly there's no way we could design the language without all this feedback, so it's all based on listening to what people do with the product.
How do you manage the design team? How do you make decisions?
Anders: First of all, when you get feedback from customers, very often customers will tell you, "We would really like for you if you could add this particular feature." As you dig, it turns out that, oh, they're trying to do such and such, and typically people will tell you what they think is the solution to the problem. Now it is your job to discover what their real problem is, and then try to fit that into the bigger framework of the language. In a sense the first part of getting feedback is to do a little bit of detective work and understand what's really behind this solution that the customer is asking for. What is their true problem?
Then I think in terms of deciding what to do about it. As you evolve a language, you always have to be careful about just willy-nilly adding a bunch of features to a language, because the more features you add the more you age the language. Eventually the language just caves in under its own weight. There's too much stuff—too many conflicting things.
You have to be very, very judicious about what you add because you really don't want to end up with three different ways of doing the same thing that are all there only for historical reasons.
So there are many times where we go, "Yeah, if we could start over we would definitely include this feature that people are asking for right now." Since we can't start over, we're not going to do it because it's fundamental or foundational enough that we can't fundamentally change the nature of the beast by peppering on. We can only make it a dual-headed beast and we don't want that.
In terms of the design process itself, we have a very strong C# design team consisting typically of between six and eight people who meet regularly. We have met regularly for the past 10 years from 1:00 to 3:00 every Monday, Wednesday, Friday afternoon. Some of the meetings get cancelled, but that is a slot that we have all had on our calendars for 10 years and it continues to be there. The people who are in the process have changed. I've been there throughout. Scott Wiltamuth has as well pretty much. Other people have come and gone, but the process has existed for that long.
We use this as our design function. That is where we do our ongoing design work. In order to have continuity in a product it's very important to have design as a continuously going process. Very often people will do stuff in spurts, "Oh, it's time to do the next version. Let's have some meetings and decide what it's going to be." Then you have a bunch of meetings and then people go away and you don't do any design work for a year. By the time a year's gone by and it's time for the next version, you can't even get the same people together anymore. You end up with this sort of schizophrenic product where every release feels different. If you keep the design ongoing, there's almost like a personality of the product that you keep alive.
Also good ideas don't happen on a schedule. They just happen. If you don't have a process to capture the good idea, if you're not designing right now, well then maybe that idea is lost. We're always doing continuous discussion of the next version that we're about to ship and the one after that, in an ongoing fashion. I think that works really well.
C# has an ECMA standardization process, which is rare for languages. What was the motivation for that?
Anders: Standardization for many people is a requirement for adoption. There are certainly places—not so much businesses—but if you look at government, standardization is actually a requirement. Academic as well. It actually has interesting benefits, for Microsoft to standardize. Whenever we build a technology like .NET, there will invariably be implementations of that technology built by third parties for other platforms, and you can then choose to have them randomly try to replicate what you've created and get it wrong, and then have a poor experience there. That means also a poor experience for those customers that, by and large, rely on yours but need this other implementation for legacy hardware they have or whatever.
When you sum it all up, it actually makes sense to do it, even from a business standpoint. It also works as a great forcing function in being very precise about what it is you're building which has lots of advantages internally. The fact that we standardized C# meant that we had to write a very concise specification of the language. That very precise specification of the language—that investment—has come back to us manyfold already just from an internal standpoint.
That shows in terms of having better test frameworks from our QA department and having better vehicles in research for implementing new language features, because with prototype compilers it's entirely clear what they're supposed to do. For teachability of the language, the fact that there is a very concise specification means that people can always consult that as a reference as opposed to just guessing.
It helps us in ensuring that code remains backward compatible. So anyway, lots of benefits there that you might think immediately are not there, but in reality they are. By going through a standardization process, you get the eyes of a very savvy community on your product. We've gotten lots of feedback from the other companies and individuals involved in the standardization process and that made C# a better language. That's valuable, too. I'm not sure that these organizations and individuals would've taken an interest if it wasn't because we standardized.
Standardization lags behind language evolution, though.
Anders: Right. Standardization does to some extent slow you down. It sort of depends on how you word it. Some standards are worded as, "You must implement this and nothing else, and it is a violation of the standard to have extensions to what we specify here." I have never much believed in that. Standards are supposed to establish a common baseline, and arguably, also a way to ensure that you are adhering to the baseline and not overstepping it. But standards should definitely leave freedom for innovation in them because that is how you're going to produce v2 of the standard—by picking up some of those innovations. You can't outlaw it.
For C#, there's a standard, but that standard has not kept us from evolving. Rather a process of evolution happens outside the standards process, because you're not going to get innovation out of a standards community. That's not its purpose. Whatever framework you're operating in must allow for that innovation to occur elsewhere.
What is your opinion on the formal aspect of the design of the language? Some people suggest that you should start with a formal specification on a piece of paper and then write the code. Some people just ignore the formal specification totally.
Anders: The answer is rarely at the extreme. I think languages with no formal specification at all generally tend to be a mess. Languages where you first formally specify everything and then implement the compiler as an afterthought also tend to be not very pleasant to use. The way we developed C# is we would in parallel write the compiler and the language specification, and the two deeply influenced each other. We would run into issues in writing the compiler that we would have to go back and address in the specification. Or in writing the specification and trying to rigorously analyze all the possibilities we would find stuff that, "Whoa. Maybe we should just try to do this differently in the compiler because there's these other cases that we didn't think of."
I think both are important. I'm happy that we did the standardization work because it forced us to be very explicit about what the language is and how it works. Then it forced us to have a formal specification, which, like you said, some languages don't; that is just not a good thing. When the source code is the specification that means that in order for you to understand what's going to happen in this particular program, you have to go look at a source code for the compiler. Not a lot of people are capable of doing that. Your only other alternative is to guess or to write tests and see what happens and hopefully you caught all the corner cases. I don't think that's the right way to do it.
By the way, how do you debug your C# code?
Anders: My primary debugging tool is Console.WriteLine. To be honest I think that's true of a lot of programmers. For the more complicated cases, I'll use a debugger because I need to look at a stack trace or what happened to my locals here or whatever. But quite often you can quickly get to the bottom of it just with some simple little probes.
Do you follow any principles to design an API?
Anders: Well, first of all I would say keep them simple, but what does that mean? I mean that sounds stupid, right? I am a great believer in APIs that have as few methods as possible and as few classes as possible. Some people believe more is better. I'm not one of those. I think it's important to look at what is it that you think people will typically be doing with your API. Find the typical five scenarios that they're going to be doing and then make sure that the API makes that as easy as possible. Ideally that it's just one call to the API. It shouldn't be that in order to do the typical scenario you have to write many lines of code against the API. At that point it is not at the right level of abstraction.
However, I also think it's important in APIs to always leave an out. You want to flow from the very simple use of the API and seamlessly move into the more advanced uses if you need to. A lot of APIs have sort of this step function. Yes, there's some simple methods you can call, but then the minute you want to do something that's a little more advanced then, boom, then you fall off a cliff. Now you have to learn about all these other things that you didn't care about in order to do the little more advanced stuff. I'm very much a believer in sort of more of a gradual easing into it.
What about documentation?
Anders: The state of documentation in software in general is terrible. I always urge programmers and try to advocate internally as well, and I'm not always successful, but I tell programmers half the value you deliver to customers is good documentation for your APIs. A great API is worthless without documentation that explains what it does and how it's supposed to be used. It's a tough one. Lots of companies like to have the programmers write the code and the documentation people write the documentation, and the two never talk. You end up with documentation that just says "MoveWidget moves the widget," or states the obvious in as many words as possible. That's a shame. I think programmers should write more documentation than they do.
Do you like the idea of comments inside the code or you were thinking of some external document?
Anders: I've always been an advocate of having XML documentation comments in the code. If you put it in code, then chances are the programmer who's working on it will notice that whatever it says up there in that documentation comment isn't right. Maybe he'll go fix it. If you put it in a different file somewhere, then the programmer will never look at it, and so it'll never be correct.
It's all about trying to bring the two as close together as possible. It's not perfect by any means, but we try.
What do you suggest to become a better C# programmer?
Anders: It's hard. There are many good books out there on C# programming and I would encourage people to pick up one of the better books. I'm not going to start naming names here, but there are many good books out there that will help you become a better C# programmer and help you better understand the .NET Framework. There are many things available online that also help. There are things like Codeplex. There's a bunch of open source projects that you can grab and look at and learn from and so forth.
To become a better programmer in general, one of the things that have helped me is to look at different styles of programming and different kinds of programming languages. I have learned in the last 5, 10 years a lot from looking at functional programming, which is a very different way of programming, but it teaches you a bunch of things. It's obviously about programming, but it comes at it from a different angle, and that gives you a different viewpoint on problems that I find to be very, very useful.
The Future of Computer Science
What do you consider the outstanding problems in computer science?
Anders: If you look even at a metalevel above that, the ongoing trend in language evolution has always been this constant increase of abstraction level. If you go all the way back to plugboards and machine code and then symbolic assemblers and then C and then C++ and now managed execution environments and so forth, each step we moved a level of abstraction up. The key challenge always is to look for the next level of abstraction.
There are several contenders right now that appear to be big challenges. One we already talked about is concurrency: producing meaningful programming models for concurrency that are understandable by the broad masses of programmers as opposed to a few high priests of parallelism. That's kind of the world we're in right now. Even the high priests at times get surprised by their own code today. We have a big challenge there.
Also there's a lot of talk about domain-specific languages and metaprogramming in the community these days. In my opinion, there's more talk than reality there. I don't think we know what the answers are. You see things like aspect-oriented programming and intentional programming, but we have yet to really nail it.
Depending on who you ask, people go either, "Oh, there are no domain-specific languages," or "Oh, domain-specific languages are everywhere." We can't even agree on what a domain-specific language is, in a sense, but there's clearly a there there when it comes to devising more declarative ways of expressing yourself. In some ways, we've run out the line all the way on imperative styles of programming. We're not going to get much further with our imperative programming languages. It's not like there are new statements that we could add that would make you 10 times more productive all of a sudden.
I think one of the things that are true of most programming languages today is that they force you to overspecify the solution to your problem. You're down there writing nested for loops and if statements and whatever, and really all you wanted to do was a join between two pieces of data. But there's nothing that allows you to say that. You have to get down and dirty and do hash tables and dictionaries, blah, blah, blah.
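The contrast between spelling out a join by hand and simply stating it can be sketched as follows. This is a minimal illustration in Python rather than C# or LINQ; the sample data and both function names are hypothetical and do not come from the interview.

```python
# Hypothetical sample data: (customer_id, name) and (order_id, customer_id, item).
customers = [(1, "Ada"), (2, "Grace"), (3, "Linus")]
orders = [(101, 1, "keyboard"), (102, 2, "monitor"), (103, 1, "mouse")]


def join_imperative(customers, orders):
    """Overspecified version: build a hash table, then loop and test by hand."""
    name_by_id = {}
    for customer_id, name in customers:
        name_by_id[customer_id] = name
    result = []
    for order_id, customer_id, item in orders:
        if customer_id in name_by_id:
            result.append((name_by_id[customer_id], item))
    return result


def join_declarative(customers, orders):
    """Query-style version: state which rows match and what to return."""
    return [
        (name, item)
        for order_id, order_customer_id, item in orders
        for customer_id, name in customers
        if customer_id == order_customer_id
    ]
```

Both functions return the same pairs. The second only says which rows belong together, but Python still evaluates the comprehension as nested loops. A query facility built into the language is free to choose a better strategy, such as the hash lookup written out by hand in the first version.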
The question is how do we move to that more declarative level of programming. Of course, the more you move it up, the more concepts you end up having because you get more domain-specific. There's a lot of truism to this dream of domain-specific languages, yet we somehow have not found the right vehicle to implement them, I feel like. Yet. So that does remain a challenge.
Right now, we're seeing this interesting resurgence of dynamic programming languages. I actually feel it is really in spite of the languages being dynamic and more because they have great metaprogramming capabilities. Like if you look at Ruby on Rails, it's all powered by Ruby's metaprogramming capabilities, not so much the fact that it's dynamic. It just so happens that eval and metaprogramming are a lot easier in a dynamic language than in a static language.
On the other hand, it's a high price to pay to give up your statement completion and your compile-time error checking and so forth.
The argument I've seen from lots of people who've been around dynamic languages for a while is Smalltalk's browser.
Anders: I'm not sure I buy that. That works when your problem is small enough, and it used to be that problems were small enough back when Smalltalk first appeared. Now with the size of the frameworks, it is unfathomable to think that people actually know all of the APIs that exist on a particular object or even care to know. Tools like statement completion and Intellisense and refactoring driven by compile-time metadata or static typing are just invaluable. They're going to continue to be more valuable because the world is going to keep getting more complex. Right now we're seeing a surge of dynamic programming languages, but I think in large part it is powered 1) by the metaprogramming angle, and 2) it's in many ways just a reaction to the complexity of the J2EE environment.
I've certainly seen lots of Java programmers defect to Ruby just because they're dying in frameworks and Struts and Spring and Hibernate and what have you. Unless you are a grand wizard of technology, you're not going to be able to put all of these things together yourself.
Should programming be more accessible to people who aren't and have no aspiration ever to be grand wizards?
Anders: I think so. It all depends on what you mean by programming also. Because in a sense, is using a spreadsheet programming? If you can make people program without them even realizing they're programming, then, oh my God, that's wonderful. I harbor no aspirations that we've got to teach the world users how to write programs in the kinds of programming environments that we use as developers today. Certainly, programming, yes, but at a much higher level.
What's facing us now and in five years?
Anders: Concurrency is the big one right now. That thing is right in our face, and we've got to find solutions to that problem. One of my biggest challenges in the foreseeable future is having our team work that issue.
Again, we'd like to do it in an evolutionary fashion, but how do you deal with the shared state problem and side effects without breaking all the existing code? We don't know yet, but it very well may be that concurrency is a big enough paradigm change that whole new languages are needed or whole new frameworks are needed. Although I don't think we're at that point yet.
I think there's a lot of ground that we can gain from making it possible for people to write APIs that internally are massively parallel and written by people that really understand a particular domain, be it transformations or numeric processing or signal processing or bitmaps or image manipulation.
And yet, put APIs on it that look largely synchronous from the outside and in a sense, wall off the concurrency to inside the APIs.
There are things that are required in our programming languages today in order for us to be able to do that properly. One of them we already have, which is the ability to pass code as parameters. As APIs get more and more complex in their capabilities, you can't just pass in flat values or data structures to the API. You've got to pass in pieces of code that the API then orchestrates and executes.
You need higher-order functions and abstractions such as map, fold, and reduce.
Anders: Higher-order functions. Exactly. In order to be able to do that, you need stuff like lambdas and closures and so on. In order for you to be able to do that in a concurrent environment, you also need guarantees on whether these lambdas are pure, or do they have side effects. Could I just automatically run this in parallel, or are there side effects that would cause that not to be possible. How can I know that? Those are things that we don't have in our languages today, but we can certainly speculate about adding these. Of course, the trick is adding them in a way that doesn't constrain you too much and that doesn't break too much of your existing code. That's a big challenge.
That is something our team thinks about daily.
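A small sketch may make the purity question concrete. It assumes Python and its standard `concurrent.futures` module; `apply_to_all`, `square`, and `square_and_log` are hypothetical names invented for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_to_all(operation, items, workers=4):
    """Hypothetical API: the caller passes code, the API decides how to run it.

    Results are returned in input order, whatever order the calls ran in.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(operation, items))


def square(x):
    # Pure: the result depends only on the argument, so any schedule is safe.
    return x * x


seen = []  # Shared mutable state, touched by every call below.


def square_and_log(x):
    # Impure: the order of entries in `seen` depends on thread scheduling.
    seen.append(x)
    return x * x


pure_results = apply_to_all(square, range(6))
impure_results = apply_to_all(square_and_log, range(6))
```

Nothing in the language tells `apply_to_all` which of the two functions is safe to run in parallel. Both are accepted, and only the programmer knows that the contents of `seen` may arrive in any order. That missing guarantee is the gap described above.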
Does the need for concurrency change the implementation or also the design of the language?
Anders: Oh, it certainly changes the design. A lot of people have harbored hope that one could just have a slash parallel switch on the compiler and you would just say, "Compile it for parallel" and then it would run faster and automatically be parallel. That's just never going to happen. People have tried and it really doesn't work with the kind of imperative programming styles that we do in mainstream languages like C++ and C# and Java. Those languages are very hard to parallelize automatically because people rely heavily on side effects in their programs.
You need to do several things. You need to first of all construct modern APIs for concurrency that are at a higher level than threads and locks and monitors and where we're at now.
Then there are certain things you need from the language to make that style of programming easier and safer, like guaranteed immutability of objects, pure functions where you know there are no side effects, analysis around isolation of object graphs so you know whether a particular reference to an object graph has ever been shared with anybody else, and if it hasn't, then you can safely mutate it, but if it has then there might be side effects. Things like that; things of that nature where the compiler can do some analysis and help provide safeties, like we have type safety today and memory safety and so forth.
Those are some of the things that I think need to happen in the next 5 or 10 years in order for us to better be able to program in these concurrent systems.
Essentially you are telling the computer what to do.
Anders: That's one of the big problems with the very imperative style of programming that we do now is that it is indeed very overspecified. And that's why it's hard to automatically parallelize.
In the future, might we let the framework do the work to deal with concurrency?
Anders: Oh, I think so. There are many different kinds of concurrency, but if you're talking about data-parallel kinds of concurrency where you're going to do operations on large datasets like image manipulation or voice recognition or numerical processing, then I think it's very likely or very appropriate for us to have a model where you just view it as an API. You have a higher-level API where you can say to the API, "Here's the data and here are the operations I want to have applied. You go away and do it and do it as quick as you can given the number of CPUs that are available."
It's interesting there because today it's pretty easy for you to just say, "Here's the data." You can just give it a reference to some big array or some object or whatever. Specifying what the operations are would typically involve you giving references to pieces of code, if you will, delegates or lambdas, and it sure would be nice if the compiler could analyze and guarantee that these lambdas have no side effects and warn you if they do. That's part of what I'm talking about, but that's just one kind of concurrency. There are other kinds of concurrency for more asynchronous distributed systems, which is a different kind of concurrency where we could also benefit from support in the programming languages. If you look at a language like Erlang, which is used in very highly scalable distributed systems, they have a very, very different model of programming that's much more functional and that's based on asynchronous agents and message passing and so forth. There are some interesting things that I think we could all learn from also in our languages.
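The "here's the data, here are the operations" model can be sketched as an API that looks synchronous from the outside and keeps the concurrency inside. The example below assumes Python; `parallel_map_reduce` and its parameters are hypothetical names. It also assumes that `combine` is associative and that `initial` is its identity value, because the partial results are merged chunk by chunk.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_map_reduce(data, transform, combine, initial):
    """Hypothetical API: an ordinary call for the caller, parallel inside.

    Splits `data` into roughly one chunk per available CPU, folds each chunk
    on a worker thread, then merges the partial results in chunk order.
    """
    workers = os.cpu_count() or 1
    chunk_size = max(1, -(-len(data) // workers))  # ceiling division
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    def fold_chunk(chunk):
        partial = initial
        for item in chunk:
            partial = combine(partial, transform(item))
        return partial

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(fold_chunk, chunks))

    total = initial
    for partial in partials:
        total = combine(total, partial)
    return total


# Usage: double every value and add the results together.
pixels = list(range(1, 101))
brightness_total = parallel_map_reduce(
    pixels, transform=lambda p: p * 2, combine=lambda a, b: a + b, initial=0
)
```

The caller never sees a thread or a lock. In CPython, threads do not speed up CPU-bound work of this kind, so the sketch shows only the shape of such an API, not a performance gain. It also depends on the lambdas being free of side effects, which nothing here checks.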
Does the object-oriented paradigm create problems?
Anders: You know, it depends on what you group under the object-oriented paradigm. Polymorphism and encapsulation and inheritance are as such not a problem, although functional languages typically have a different view of how you do polymorphism with their algebraic data types. Aside from that, I think the biggest problem typically with object-oriented programming is that people do their object-oriented programming in a very imperative manner where objects encapsulate mutable state and you call methods or send messages to objects that cause them to modify themselves unbeknownst to other people that are referencing these objects. Now you end up with side effects that surprise you that you can't analyze.
In that sense object-oriented programming is a problem, but you could do object-oriented programming with immutable objects. Then you wouldn't have these same problems. That's kind of what functional programming languages are doing, for example.
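Object-oriented code with immutable objects might look like the sketch below. It uses Python's standard `dataclasses` module; the `Account` class is a hypothetical example and not something discussed in the interview.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Account:
    """Hypothetical immutable object: its fields cannot be reassigned."""
    owner: str
    balance: int

    def deposit(self, amount):
        # Return a new Account instead of modifying this one in place.
        return replace(self, balance=self.balance + amount)


original = Account("Ada", 100)
updated = original.deposit(50)

try:
    original.balance = 0  # Attempted in-place mutation is rejected.
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

Anyone holding a reference to `original` still sees a balance of 100 after the deposit, so no other part of the program can be surprised by a change made elsewhere.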
Regarding your interest in functional programming, should computer science students study more math and experiment more with functional programming?
Anders: Well, I certainly think that it is important to include functional programming in any computer science curriculum. Whether you start with it, that depends. I'm not sure that your very first introduction to programming should be functional programming, but I definitely think that it ought to be part of a curriculum.
What lessons should people learn from your experience?
Anders: Well, if you look at the first product I worked on, Turbo Pascal, it was very much about not believing the traditional way of doing things. Don't be afraid. Just because people tell you it can't be done, that doesn't necessarily mean that it can't be done. It just means that they can't do it. I think it's always fun to think outside of the box and try to find new solutions to existing problems.
I think simplicity is always a winner. If you can find a simpler solution to something—that has certainly for me been a guiding principle. Always try to make it simpler.
I think to be really good at something, you have to be passionate about it too. That's something you can't learn. That's just something that you have, I think. I got into programming not because I wanted to make lots of money or because someone told me to. I got into it because I just got totally absorbed by it. You just could not stop me. I had to write programs. It was the only thing I wanted to do. I was very, very passionate about it.
You have to have that passion to get really good at something, because that makes you put in the hours, and the hours are the real key. You need to put in a lot of work.