piątek, 20 grudnia 2013

Code Kata: A simple text-based calendar

In this Kata we want to properly display a simple calendar view of a given month. Let's start with an example. A proper output of our program for the input data of "December, 2013" would be:

Mo Tu We Th Fr Sa Su
                   1
 9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31

Points to consider:
  • how many times we need to make a call to time-date functions?
  • try to minimize the number of these calls
  • try to separate program logic from presentation as much as you can

wtorek, 3 grudnia 2013

Job interview: software developer

We now have two or three generations of software developers who got their first job (or any next job) by correctly answering the question: "what will be the output of this code?". As we possibly did not have a better idea what to ask a candidate about, this might have been a satisfactory question one generation ago. Is it OK, from the point of view of the recruiting entity and the candidate, to ask the same question today? I would gladly replace the conventional list of questions:
  • what will be the output of this few lines of code?
  • what will be the size of this structure in memory?
  • what is wrong with this code snippet?
with questions more relevant to daily developer's tasks, like:
  • try to refactor this code snippet
  • look at this class; how would you add a unit test to test this function?
  • after running this program, explain why it prints this output
While the former questions test something, they are not even close to problems that a developer faces daily. On the contrary, seeing how someone attempts to test a function in a class that was not developed with TDD may give us much more insight to their programming abilities.

Ideally, these questions would be answered by using a simple compiler and editor during the interview. Developers don't program on paper in their jobs (at least no longer). If they want to learn something about the traits of the code, they lean on the compiler (for example, print size of a structure, instead of calculate it in your mind).

As you could have infer from this short opinion, I am also not in favor of automated, on-line coding tests. They are not done on paper, which is good, but they focus too much on exact answer, oftern numeric, instead of other important abilities, such as writing concise code.

czwartek, 17 października 2013

The return of native applications - but what for?

Before the web applications era we had the following (and not an easy one) situation with software:
  • some applications had only one target in terms of hardware or operating system
  • some applications had different versions so that they could be used on more than one software / hardware combination
Around the time when I started my first job (2000) it was apparent that the ultimate interface for everyone and everything would be the browser. Since then we have had web based applications serving as the primary user interface in all possible domains, from email, through project management to modeling and simulation.

But recently, we see more and more applications that are marketed as native. Although often a web version of an application exists, its platform-tied version (iOS, Android or Windows) is marketed as if the nativity alone were such a big advantage.

What can we make out of this? I don't know. But it just seems strange.

środa, 9 października 2013

Scripting paranoia

Jim is a developer in a team of nine and is better acquainted with Git than any of his colleagues. They rely on him when there is something particularly difficult to do with Git or if they have any kind of questions about Git. They feel comfortable asking him even the most basic questions, because Jim is so friendly and helpful.

Jim wants to make his teams' life easier so much that he creates a whole bunch of scripts. Over the course of time the team has a number of scripts that come in very handy when they want to perform regular, day-to-day versioning tasks:
  • branching: instead of regular Git commands -> branch.sh
  • merging: instead of regular Git commands -> merge.sh
  • tagging: instead of regular Git commands -> mktag.sh
  • pushing to remote server: instead of regular Git commands -> pushall.sh
  • viewing commit history: instead of regular Git commands -> showcommits.sh
  • and so on...
Scripts are well suited to what the team does daily. If there's the need to tweak any script a little, Jim is very willing to do so. Over the course of time, Jim's colleagues become so addicted to these scripts they cannot live without them. Unfortunately, they can no longer write proper Git commands or understand standard Git output on the console. How come they got into the mire, if they did all those things in order to become more efficient?

Obviously, it could be anything else, not neccesarily Git. My point is that any kind of computer system designed to interface with human beings, such as a version control system, presents to us its own mini-language, in the form of console commands or other, with which we can communicate with it. Its creators wanted us to communicate with it that way, it itself wants us to communicate with it that way. We should beware of encapsulating the sentences of this mini-language in scripts, because we will end up not being able to communicate with the system in any way that does not match any of our dozen of shortcuts. Not only will we be unable to issue any non-standard requests to the system, but we will also be unable to understand what it says to us.

Beware of scripting paranoia.

wtorek, 24 września 2013

Design Patterns

What is a design pattern?

Design pattern is a workaround that we use to compensate for a deficiency of a programming language.

niedziela, 22 września 2013

Projector effect

When you are in a Scrum planning session or any other kind of Scrum event or at a meeting, look closely how people behave when there is a computer screen projected on the wall.

What happens most of time is this:
  • nobody looks at others,
  • everybody look at the projected screen (or a monitor, if there is no projector),
  • people stare at the slides even though they have already read them 5 times,
  • and worst of all: people tend to turn off their regular intellectual abilities and day-dream to the subtle noise of projector's fan
If you do use computer at the meetings, specifically during Scrum events, try out this advice and stop using it. Even for a couple of meetings. I am sure you will see the difference.

The mere fact that product development is going to be done on computers and that the product will run on a computer doesn't mean that every aspect of our work must be done with a computer.

czwartek, 5 września 2013

Real-time shared editing of code in Eclipse

I came across this great video by Mustafa K. Isik.

http://vimeo.com/1195398

This was recorded a couple of years ago and I'm hoping the whole idea had developed since then.

Anyway, I searched for an Eclipse plugin that supports shared editing and I quickly learned about ECF:

http://www.eclipse.org/ecf/

However, despite well documented installation, I found no help there whatsoever to start actually using the plugin. Spending some more time trying to find this information resulted in this excellent how-to article:

http://realsearchgroup.org/SEMaterials/tutorials/docshare/docshare.htm

I really recommend seeing the video in the first link, if you haven't done so yet. And then, actually giving it a try!

poniedziałek, 5 sierpnia 2013

Code Kata: A rover

Another game-like exercise. A rover moves over a board with X and Y dimensions. Some places on the board are filled with stones and this is where the rover cannot move onto. Your task is to implement the rover and the board with the following requirements and restrictions:
  • the rover understands your commands (e.g. from command line) of the form:
    • forward - moves one step forward
    • left, right - makes 90 degrees turn left or right
  • if the rover gets the forward command and there is a stone in front of it, it just ignores it; you can optionally print a message to signal that
  • the rover does not have the knowledge of the terrain - it keeps a reference to a board [object] which actually knows where the stones are and tells the rover whether a given move is possible or not
  • the rover should not be allowed to move outside of the board
Although it will probably be fun to test this with command line, you are encouraged to use TDD as you go.

czwartek, 27 czerwca 2013

A reflection on the future of programming languages

I started my first job as a programmer in 2000, a year before I got Master's degree in Computer Science. At that time, a recurring question that was asked by students and graduates was:

Which language should I learn in order to be successful on the market?

It seems like this question is universal and has always been asked, in some cases inquiring about a programming language, another time on a library or a framework.

So, will learning Java pay off better than C++? Does it even make sense to learn C these days? Or will mastering the Java Servlet Pages position one better on the market than learning PHP?

The simple and provocative answer is: it does not matter.

The longer answer is really the rest of this post, so if you are interested, read on.

An interesting phenomenon in the history of software is that many great libraries and products have been created by individuals or, at most, tiny groups of people. Consider these examples - if any of the products seems unfamiliar, read about it on Wikipedia.

Chipmunk                 ---------          Scott Lembcke
jQuery Form Plugin       ---------          Mike Alsup
Clojure                  ---------          Rich Hickey
FileZilla                ---------          Tim Kosse
Python                   ---------          Guido van Rossum
web2py                   ---------          Massimo Di Pierro
PHP                      ---------          Rasmus Lerdorf
Emacs                    ---------          Richard Stallman and Guy Steele
Gtk                      ---------          Spencer Kimball and Peter Mattis
JUnit                    ---------          Erich Gamma and Kent Beck

We all use these daily to test, develop and deliver software. Sometimes they become part of our products. They have not initially been written by an large team or open source community and that fact is not without meaning in terms of programming language that the authors used. This statement is going to be a bit speculative on my part, but I imagine the authors have not had terrible doubts in terms of which language to choose and why. Working alone or with just two colleagues they could decide on the language based on their own preference. It was only later that developers from open source community joined these projects, also based on their preference of language and domain.

Now, speaking between us, the humble programmers, we have to face the fact that very few of us (< 90%) will be next Rossums and next Becks. We will use the excellent works of others and integrate them into a working products using languages that happen to have bindings to these libraries. Therefore, the choice of the language, in a particular situation, will depend very strongly on the availability of library bindings and then secondly on current fashion, our legacy code base and other factors. This means we must be able to write clean code in at least a handful of languages for which there is really good library choice. We have to acccept this variety and work towards mastering our coding skills, rather than towards mastering one given language.

If there is a very good, stable library already available, written, say, in C, there will likely be at least a couple of bindings created for languages other than C. Re-writing this 10-year old library from scratch in "pure X language" would most often be impractical and too costly to undertake. The library itself may well outlive some languages and the new ones that will pop up in the future will create their own bindings for it. It may even outlive the language it is itself written in, in the sense that the library is widely used, although the language disappeared from the mainstream long ago.

Of course it is not completely true that the widely used libraries are never created from scratch in more than one language. Take SNMP for example - we have at least two different implementations (not bindings), namely net-snmp written in C and snmp4j - in Java. For that matter, Java has demonstrated something that I strongly doubt we will ever witness again - the proliferation of pure Java libraries for almost everything we already had had C/C++ libraries in place. But today, these Java libraries are also being used from within newer languages like Scala or Clojure. These new languages also create their bindings, taking advantage of the JVM, instead of re-implementing all Java heritage from scratch.


With all that said, I think we should not treat the discussions like "Is Scala better than Java" or "Why C++ is not truly object oriented language" too seriously. Some languages are definitely more expressive than others and some support given programming paradigms better than others, but from the point of view of delivering what the customer needs quickly, these features are less important than the availability of libraries and frameworks.

We can hear from time to time the statement that our software industry is young, compared to other industries like automotive, for example. That's true and this is a very important observation, in my opinion. I believe in the next few tens of years, as the industry matures, some of the well-established libraries will settle for good, just like tools and techniques used to produce cars had settled in automotive industry. If we are tasked with developing some new functionality, we will know there are, let's say, two libraries and four languages that we can use exactly for that purpose. And we will use them. For a majority of customer requirements, the library choices will be as obvious, as today is mounting an ABS module in a car in order to provide the required safety level during braking.


poniedziałek, 27 maja 2013

Bugs on a board - modern concurrency in Clojure

[ Inspired by Rick Hickey's Ant Colony Simulation ]
 
Having said before that Clojure is great, I feel obliged to provide a concrete example of a concurrent Clojure program. We will implement a simulation of N bugs walking randomly on an X x Y board. If two bugs walk onto the same cell, they both vanish. Let's start!

The board has dimensions specified as follows:

(def dimx 50)
(def dimy 50)

A bug performs one step from the current cell by randomly altering its (x,y) coordinates. Each coordinate is incremented or decremented by 1, or it stays the same.

(defn random-step [n]
  (let [step (rand-int 3)]
    (cond (= step 2) (dec n)
      (= step 1) (inc n)
      (= step 0) n)))

Next location for a bug will be randomly generated, unless a bug is at a border. In that case, it immediately jumps in the opposite direction.

(defn next-loc [loc]
(let [x (loc 0) y (loc 1)
      newx (cond (= x (dec dimx)) (dec x)
                 (= x 0) (inc x)
                 :else (random-step x))
      newy (cond (= y (dec dimy)) (dec y)
                 (= y 0) (inc y)
                 :else (random-step y))]
   [newx newy]))

Unique initial locations for bugs are generated at the start using this function:

(defn initial-locs [locs numof]
(if (= numof 0) locs
  (let [x (rand-int dimx) y (rand-int dimy)]
    (if (not-any? (fn [loc] (and (= x (loc 0)) (= y (loc 1)))) locs)
      (initial-locs (conj locs [x y]) (dec numof))
      (initial-locs locs numof)))))

We will initially have 20 bugs at 20 different locations:

(def init-locs (initial-locs [] 20))

Our simulated world will be a hashtable of agents (later we will see that the bugs are agents).

(defn init-world [w agent-list]
  (if (empty? agent-list) w
    (init-world (assoc w @(first agent-list) (first agent-list)) (rest agent-list))))

We create a list of bugs (agents) and  the world:

(def bugs (map #(agent %) init-locs))
(def world (atom (init-world {} bugs)))

We need a function that updates the world, given the world, old bug location, new bug location and bug alias. The function will also detect if the moving bug ends up at a cell occupied by another bug - in this case both need to be removed from the world.

(defn upd-world [world-map old-loc new-loc agent-alias]
(cond
  (= nil (world-map old-loc)) world-map
  (or (identical? agent-alias (world-map new-loc))
      (= nil (world-map new-loc)))
    (let [foo (dissoc world-map old-loc)]
      (assoc foo new-loc agent-alias))
  :else
    (let [foo (dissoc world-map old-loc)]
      (dissoc foo new-loc))))

We also need a function that directs the crawl of a bug. It will update the world upon each step.

(defn crawl [loc]
(if (= nil (@world loc)) loc
  (do
    (Thread/sleep 50)
    (send *agent* crawl)
    (let [newloc (next-loc loc)]
      (swap! world upd-world loc newloc *agent*)
      newloc))))



Now we need some GUI stuff. Bugs will be represented as squares. Let's define bug size in pixels.

(def bugsize 5)


Import some useful GUI components:






(import [javax.swing JPanel JFrame])

(import [java.awt Color Dimension])



'Now the function for creating the frame:

(defn create-frame [panel]
(let [frame (JFrame.)]
(doto frame
  (.add panel)
  (.setVisible true)
  (.pack)
  (.show))))

And the main function for rendering the world:

(defn render [g]
(do
  (.setColor g Color/white)
  (.fillRect g 0 0 (* bugsize dimx) (* bugsize dimy))
  (.setColor g Color/black)
  (doseq [k (vec (keys @world))]
    (if (@world k) (.fillRect g (* bugsize (k 0)) (* bugsize (k 1))
      bugsize bugsize)))))

One more GUI function:

(defn create-panel []
(let [panel (proxy [JPanel] []
  (paint [g]
    (render g)))]
(doto panel
  (.setPreferredSize (Dimension. (* bugsize dimx) (* bugsize dimy))))))

And we are really close - just initialize the GUI:

(def panel (create-panel))
(def frame (create-frame panel))



Define the animation agent that will repaint the frame periodically:

(def anim (agent nil))
(defn animate [x]
(when (> (count @world) 1)
    (Thread/sleep 10)
  (send-off *agent* animate)
  (.repaint panel)
  (.validate panel)))
(send-off anim animate)




And send the crawl function to all the bugs:

(dorun (map #(send-off % crawl) bugs))

That's it! Do Copy&Paste, remove my comments, download the latest Clojure jar from clojure.org and start it:

java -jar clojure.jar

And from the Clojure REPL load the file containing the program:

(load-file 'bugs.clj')

This program is 96 lines long, including some empty newlines. This is an example of the expressive power of Clojure. It doesn't contain any locking, mutexes or semaphore handling code. All thread coordination is implemented using STM. So, comparing it to what it would take to implement the same thing in one of the more popular languages, what do you say?

piątek, 24 maja 2013

Code Kata: shopping list search

We have a list of, let's say, goods we bought at a shop (or you can imagine any other list that you like):

|        Name        |      Quantity      |       Price      |
|        Beer          |           10          |        30            |
|       Pickles        |           1           |        12            |
|     Red beans    |            2           |        22            |
...

The task is to write a piece of code that will search through the list and will display each row that matches search criteria. Too simple? Maybe. But consider these additional hints:
  • search criteria may not be exact. The string "bean" might match "Red beans"
  • we may even be searching for any of a few items: rows one and three will both match if our search criteria are "any of 'beer', 'bean' ".
I recommend that you start this kata with an exact search for one item and then gradually move towards the "like/contains" criteria and to "any of" criteria and that you observe the impact these changes will have on your unit tests (needless to say, this should all be done using TDD).

poniedziałek, 6 maja 2013

Do I really have to call super(...).__init__ in Python?

Yes. But, if you prefer a longer answer:

Class B inherits from class A. When implementing B's __init__ function, we may call A's init by writing:

super(B, self).__init__(arguments of A's __init__)

Is it really necessary? After all, in C++ or Java the constructor of base class is called automatically, so shouldn't it be called automatically in Python as well?

The point is that __init__ is not a constructor. Python objects are constructed with __new__. __init__ is used to initialize an object with concrete values, either defaults or passed as __init__'s parameters.

Need more details? Read Python docs at http://docs.python.org/2/reference/datamodel.html#object.__new__


wtorek, 30 kwietnia 2013

The mythical independency principle

We're having a hypothetical conversation with Joe, a developer in a Scrum team.

- Joe, how is your team doing in terms of unit tests?
- Very good! We have a developer in our team, Jack, who is proficient in writing good unit tests. He works with other developers to understand what they are planning to implement and then writes dozens of unit tests that they can use to validate their work.
- Wait a minute, so he's the only one in the team who writes unit tests?
- Yes. That's his specialization. Because of this, not only is he able to write really good, maintainable tests, but is also so productive that an average of five tests per day is for him a piece of cake.
- But the unit tests aren't a separate development activity, they are integral part of developing the right code!
- This is what I used read on the Internet, but the reality is what we are doing here is in accordance with the independency principle.
- Tell me more about it.
- We are avoiding the Initial Error. If one developer writes the tests and another writes the implementation we know that they compared one thing against the other. If the same person would write the test and the implementation we would risk misinterpreting the requirements and doing wrong test for wrong implementation, with the test still passing.
- You must be kidding me (...)

OK, I could go on like this for hours. Remember: do not let anybody talk you into the Initial Error or Independent Unit Testing story. Consider the following short example, as an argument against views like those Joe presents.

You've bought one of these fancy new TV sets that can be hanged on the wall. In order to mount it, you need to mount a handle onto the wall. When the handle is mounted, the rest is simple - the TV set is compatible with the handle and you just put it on the handle. The trouble is, it takes more than one screw to mount the handle to the wall. How are you going to mount the handle so that it is in perfectly horizontal position? Use tape measure, perhaps? Or even a laser device to set the screws precisely? I might be a good idea, but our old friend Joe comes along and denies you the use of a tape measure, not to mention the laser. He says:

I will bring here a friend of mine with tape measure and the laser and once you are finished with the mounting he will measure it precisely to independently see whether your understanding of the word "horizontal" is same as yours.


czwartek, 18 kwietnia 2013

Code Kata: text game from 1982

* * * W A R N I N G * * *
This Code Kata can seem very strange for people born in the nineties.

The goal is to implement a text-based game in which the player walks through dungeons in order to accomplish a task. The basic characteristics of an implementation are as follows:
  • the game generates random dungeons as a group of ROOMS and PATHS that join the ROOMS
  • the generation does not leave unreachable ROOMS
  • the user is presented with the information about the ROOM  they are in and the ROOMS reachable directly from the current ROOM
  • the user can move from the current ROOM to the directly reachable ROOMS by entering a command
This may seem very boring at first, but let's add some delighters:
  1. We do all of the implementation using "TDD, as if you meant it", of course
  2. Initially, the goal may be simply to get to a given ROOM
  3. Then, we may start adding attributes to ROOMS; a possible attribute is something that the user can collect, such as diamonds; N diamonds are placed in the dungeon and the game is finished when the user collects all of them; additional attributes can be used to describe the look a ROOM
  4. When that's done, we can refactor our implementation so that we do not use IF statements
  5. When that's done, we can refactor our implementation so that we do not use for/while loops
  6. We can work on the implementation a little bit more to get rid of excessive tabs (let's allow only up to two tabs in the body of a function, per Robert Martin's suggestion)
  7. We can get better by limiting the number of lines of a function (how about < 5 lines per function?)
  8. I'm guessing you did all of this with classes and objects; how would you do it with pure functions and no objects?

niedziela, 14 kwietnia 2013

Enjoyment of failed tests

We usually feel bad about failed tests. No surprise, as it simply means that something got wrong or that some tests have been invalidated by the recent change to the code. But can we feel good about a failed test? I think we can and here is why.

If I add a new tests for functionality that does not exist yet, it will either not compile or will fail. Watching it is not entirely upsetting experience, because the bright side of it is that we have evidence that the newly added test get at all executed. And it is good that it got executed, this is what we expected.

Another occasion when we could look at the failing tests with a smile is just after making a "very small change that should not break anything". And after this change we see a dozen of failed tests. Imagine doing this change without unit tests at all and keeping the belief that "it couldn't have really broken anything" for the next few months.

czwartek, 11 kwietnia 2013

Estimation, estimation... ehh

Steve McConnell gives an excellent overview of estimation methods in his great book "Software Estimation - Demystifying the Black Art". Probably, the most prominent part of the book is the Cone of Uncertainty.

On the other hand, we may spot from time to time the sentence "Estimation is waste". So, where's the Truth? Is it waste or not? And is estimation a fine skill to have? Here is my personal opinion:

Estimation is a Good Thing when we want to answer questions like:
  • how many months will pass until the project will have been finished? (if the unit is weeks, don't bother estimating)
  • can we do the work that is ahead of us with the staff the we have, or do we need more people / more teams in order to do it? How many do we need?
  • in order to satisfy customer needs, we can implement solution A, B or C; the solutions have different costs and different attributes in terms of scope, risk, maintenance, etc. Estimation of cost helps us with making the right choice
Estimation is unnecessary and is really pure waste when:
  • we have some work ahead of us that we need to undertake now, regardless of cost
  • we have ahead of us a chunk of a bigger effort, such as a small feature or a group of user stories and we feel this work will take us just several iterations, or less, to accomplish

The last bullet may be surprising at first, but let's honest about it: a team of N people is working on something. The team is asked for an effort estimate for several next user stories. Usually, the person asking the question imagines that the effort associated with this work is X. The development team is telling them this is twice as much or half of X. What serious decision can be made in this situation based on comparing X with 2*X? Providing any number in this case opens possibility for endless discussion. The best thing the team can do is to work on these user stories and do not waste their time arguing what the "real" effort estimate should be. Velocity is much important and tangible measure here, than effort.

wtorek, 9 kwietnia 2013

Python profiling for beginners

def bar(i):
    return i*i

def foo():
    for i in range(0,1000000):
        bar(i)


Let's try to use these two silly functions to do a profiling exercise.

First of all, we need to import cProfile. And then call: cProfile.run('foo()')

Output (from my laptop):
         1000004 function calls in 3.564 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.926    1.926    3.564    3.564 <pyshell#11>:1(foo)
  1000000    1.638    0.000    1.638    0.000 <pyshell#9>:1(bar)
        1    0.000    0.000    3.564    3.564 <string>:1(<module>)
        1    0.000    0.000    3.564    3.564 {built-in method exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


At the first glance, it may look criptic, but: the total time is the time spent in a function excluding the time spent in callees. The cummulative time is the time spent in a function including the time spent in callees. Therefore, these two times are equal for bar function. ncalls and percall are self-explanatory.

But we can go further. I highly recommend trying out a GUI application that can read the cProfile output, such as RunSnakeRun from http://www.vrplumber.com/programming/runsnakerun For this simple case it'd be useless, but for much more complex call tree, it is of great help.

In order to use it, we need to run cProfile giving it the dump file name as the second argument:


cProfile.run('foo()', 'dump.txt')

poniedziałek, 8 kwietnia 2013

Velocity calculations

Let's starts with some facts and numbers. There is five of us in the team. In the last three iterations, we completed the scope worth of 9, 13 and 12 story points. We feel comfortable commiting 10-11 story points in an iteration and we never commit more than 12.

Case #1
If Harry joins our team next iteration, will you comfortably commit 12, on the average?
No. We may be delivering 12 from then on, but the reality is we don't know. It may be 12, but odds are it will be 14 or 8.

Case #2
If Jack leaves your team next iteration, will we comfortably commit about 8?
Not necessarily. We may still be able to deliver functionality worth of 10 story points. But that's not all. We are not sure how you will react to that, but without Jack we may be able to deliver 13 or 14 pretty easily.

Case #3
So you are telling me that arithmetic does not work in Scrum?
No, it's better to put it this way: arithmetic does work in general, but it does not properly describe a complex system, such as a Scrum team working on a project.
So what is the real message behind all of this?
The fact is that our team of five great engineers delivers about 11 story points each iteration. You cannot scale it, you should not theoretize about what the velocity would be, if we switched from 3 weeks to 4 weeks' iterations. You can't know what this number would be, if two of us left our side activities and focused only on the project. You can't know what this number would be, if all of us worked overtime We can commit to delivering 10-11 points each iteration and that's the only fact.

wtorek, 26 marca 2013

SuperFib - an example of using lexical closure in Python

This story begins with simple implementation of a function that calculates Fibonacci numbers.

def fib(n):
    return 1 if n == 1 or n == 2 else fib(n-1) + fib(n-2)


Assuming that we are using 1-based indices, we can run a couple of tests and see that it works. Of course, fib(35) takes some time to calculate - about 15 seconds on my laptop.

Now, if you don't know what lexical closure is, I recommend reading about it on Wikipedia) before going further. Let's try to cache the results, but also be better than using conventional memoization - not only will we cache the final result of each call, but also intermediate results.

def fib():
    _dict = dict()
    def inner(n):

        nonlocal _dict
        if n == 1 or n == 2:
            return 1
        else:
            try:
                return _dict[n]
            except KeyError:
                _dict[n] = inner(n-1) + inner(n-2)
                return _dict[n]
    return inner


f = fib()
f(1000) returns instantly, with:
43466557686937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875

However, if I call f(1500) without calling f(1000) first, the calculation will not finish and my interpreter will crash or at least restart because of stack evaluation overflow.

But if I ran f(1000) and then f(1500) and then f(2000) and so on and so forth, advancing by 500 each time, I can very quickly get to quite far elements of Fibonacci sequence. Calculating 10 000th element is quick and easy, compared to conventional approach. We can quickly get to the point where printing the numbers to the console takes more time than actually calculating them.


piątek, 22 marca 2013

Unit Testing code with IO file operations (in Python)

We may often come across a piece of code that was written without Unit Tests at all. In addition, the piece of code may be dealing with IO like file writing and reading, which makes it more difficult to Unit Test it when we are trying to refactor and modify. Let's suppose the code in question looks like this:

def writeInitialsToFile(filename, name, surname):
    initials = name[0] + '.' + surname[0] + '.'
    with open(filename, 'w') as file:
        file.write(initials)


def readInitials(filename):
    initials = None
    with open(filename, 'r') as file:
        initials = file.readline()
    return initials


A straightforward and bad idea would be to write a couple of Unit Tests that make use of a real file and simply test the reading and writing. Is therea a better way to test this code?

First of all, we need a way to replace the real file with something else. For both reading and writing we will now have a couple of functions, one that expects a stream for reading or writing and the other that creates the stream and calls the corresponding reading or writing function:

def readInitialsFromFileStream(fileStream):
    return fileStream.readline()


def readInitialsFromFile(filename):
    initials = None
    with open(filename, 'r') as fileStream:
        initials = readInitialsFromFileStream(fileStream)
    return initials


def writeInitialsToFileStream(fileStream, name, surname):
    initials = name[0] + '.' + surname[0] + '.'
    fileStream.write(initials)


def writeInitialsToFile(filename, name, surname):
    with open(filename, 'w') as fileStream:
        writeInitialsToFileStream(fileStream, name, surname)






Now, we can test at least the pair: readInitialsFromFileStream and writeInitialsToFileStream. In order to test these functions, we don't even need to create a file in the file system - we can just pass something that has similar characteristic, but is not a real file: io.StringIO.

A quasi - Unit Test piece (without the full unittest.TestCase class, to shorten things a bit) would look like this:

testReadingOfInitialsFromFileStream:
testStream = io.StringIO()
testStream.write('T.M.')
testStream.seek(0)
assert('T.M.', readInitialsFromFileStream(testStream))

testWritingOfInitialsToFileStream:
testStream = io.StringIO()
writeInitialsToFileStream(testStream, 'Thomas', 'Mann')
testStream.seek(0)
assert('T.M.', testStream.readline())



Why is it better than using a real file? Some of the reasons (probably not all) are:
  • unit tests should work flawlessly regardless of the environment; when using real file, we may have different permissions depending on the computer and operating system where the tests are run
  • some tests (although this was not necessarily clear from the example) may require that the file has certain name or even certain content - otherwise they'll fail; by using a real file with pre-set name and content we are creating an unnecessary dependency that may impact the maintainability and, in the long run, usefullness of the tests
  • in order to avoid using a real file we had to separate the creation of a stream from writing to / reading from the stream; this forced us to make a first step towards fulfilling the Single Responsibility Principle

niedziela, 17 marca 2013

Long if-else statement without if (in Python)

Let's suppose we find ourselves in a situation where we have no other option than to write a lengthy if-else statement (definitely not something that would make cleaner by polimorphism).

def inevitable_long_if (n):
    if n == 0:
        print('Zero')
    elif n == 1:
        print('One')
    elif n == 2:
        print('Two')
    else:
        print('Manu-manu')


Can we re-write it so that we do not use IFs?

import collections
import functools

d = collections.defaultdict(lambda : functools.partial(print, 'Manu-manu'))
d[0] = functools.partial(print, 'Zero')
d[1] = functools.partial(print, 'One')
d[2] = functools.partial(print, 'Two')



Now, let's call some to see how it works:


d[0]()  ==>  Zero
d[1]()  ==>  One
d[2]()  ==>  Two
d[7]()  ==> Manu-manu

The first version is 9 lines long, the second one (without imports) is 4 lines long. To me, the second version is also much more readable, but you decide.

czwartek, 14 marca 2013

Scrum? Yes, I'll have one, please!

There is one of those soft skills training like "Conflict management" or "Team building". The trainer is a woman with psychological background. After the usual part that is delivered during every soft skills training (and, surprisingly, this part is almost always the same on every training) there is a break. The trainer asks people what they do in their job and it may well be that she is really interested in what people say, or maybe she's just pretending. Regardless, she goes like this:

- Could you tell me more about this Scrum thing you mentioned? How do you do it exactly?

One of the trainees feels he's the one who must speak up:

- Well, we meet every day at 9:00 and we talk briefly what we did yesterday and what we have to do today and whether anyone has any obstacles. This is called "stand-up". Also, the whole project is divided into a number of short periods called iterations and there is the planning at the beginning of each and (...)

The trainer is impressed:

- Wow, that sounds like a great idea! I just wonder... could that Scrum thing be used in another sort of business, not necessarily software?

And then there is the key turn of action: the trainee ponders for a while... some more... OK, the neurons have worked out the answer:

- Hmmm.... in general yes, you can. I think you could well adapt it to another domain.

- Oh my, that is so great! I have to sell this idea to the marketing team I have a training with this week!

Let's stop the story here. Scrum is a framework for producing software. Agile Manifesto has been written with software in mind. There is NO evidence that these things would effectively work in, or could be applied to, any other kind of business. Sorry, no. As professionals working in the software business we should not be deceiving people from other businesses that this is something they can use as well.

wtorek, 12 marca 2013

Pure functions in Object Oriented World

Let's suppose we have a class called Time with method convertToGMT(...). The method has one parameter called time, it converts the parameter to the GMT and returns the result.

For some reason (and please let's don't try to question this reason for the sake of this short exercise) the actual time zone of the caller is an instance variable of the Time object. It may be set when the object is constructed or re-set afterwards.

convertToGMT simply checks the time zone by accessing the instance variable timezone, performs calculation and based on that returns the GMT result.

When writing a couple of unit tests to test how convertToGMT works for different time zones, we would have to repeatedly:
  • set the time zone by calling the presumed Time.setTimeZone(...) method
  • call convertToGMT and check the result
We would do that for each time zone - time-to-be-converted pair we want to test. Clearly, this function is not a pure function, since its result depends on the state of the object. Programmers with no functional background would probably not notice the impurity. However, we could legitemately ask the question: can we make our tests look cleaner in this case?

We could add a helper function _convertToGMT (with underscore) that would take two parameters instead of one: actual time and timezone. The function would be a pure function. We could write all our test cases without meddling at all with the state of the Time object, just checking the result of calling _convertToGMT(time, timezone) each time for different pair of parameters. Then, after covering all desired test cases, we would add just one or a couple to test that convertToGMT correctly makes use of _convertToGMT by passing the instance variable timezone as the second parameter.

Of course, this is a very simple example with just one parameter and one instance variable, but in your real code you can have more interesting cases where this technique could be useful.

How is it called? I have no idea. If you have seen this already elsewhere, please let me know.

środa, 6 marca 2013

Clojure is great!

I've been playing with Clojure for a couple of months now and I want to share with you some thoughts why I think it is a great language.

Clojure is a Lisp

Many wise people have written great things on Lisp, so I'm not even trying to present arguments for it; just try out these:
  • http://www.paulgraham.com
  • http://www.gigamonkeys.com/book
Clojure integrates easily with Java libs

Everybody knows it. Nothing interesting to add here.

Clojure implements modern concurrency techniques

Clojure implements STM(http://en.wikipedia.org/wiki/Software_transactional_memory). It features Atoms (abstractions for managing state synchronously), Agents (abstractions for managing state asynchronously, by applying function to agent's state), and lock-free handling of scenarios that would have to be handled with mutexes in other programming languages.

Don't be deceived that Clojure is "nice, because it revitalised Lisp or JVM". The key point is that it features modern approach to concurrency. Read this excellent explanation by Rich Hickey:
http://clojure.org/state


poniedziałek, 4 marca 2013

Code Kata: Eating habits and Markov chains

The Wikipedia entry:
http://en.wikipedia.org/wiki/Markov_chain#Introduction
contains nice and clear explanation of the idea of Markov chains. Based on the paragraph about eating habits, we can form an interesting coding exercise:
  • A person eats lunch every day
  • We form a set of rules similar to the one presented in Wikipedia - one part of the exercise is how to represent these rules in a program. What would be the convenient way to represent this internally and how would you translate a possible textual representation (such as one entered from standard input) into the internal format? Would the two format be quite similar or quite different from each other?
  • We run a simulation for a number of days - say, 365 days - and we get statistical distribution of the number of each type of meals that would be eaten
Does it seem like a long program to write? My initial solution in Python was 23 lines long and then refined solution took just 14 lines.

czwartek, 28 lutego 2013

Recommended reading: Clean Code

A week ago I started reading Uncle Bob's book "Clean Code". Although I'm still not yet half way through the book, I want to recommend this book to anyone who has not read it yet. One of the many things that impressed me during the reading was this:

In in the book, Robert Martin makes several times the point that the bad code, such as one with long functions, many indentation levels, cryptic names and complicated conditional expressions inside those functions is simply hard to read. This may be an astonishing this to hear from a professional, at a first glance. Indeed, through our education and early professional work we become used to the notion that "real professional" is able to decypher really complicated code without a hitch. We are also often impressed by our colleagues who through years of dedicated work have become fluent in understanding of extremely bad written and uncomprehensible modules.

Martin tells us that this is not the right way to look at things. Although it is not said strictly in the book, it almost reads like: hey, look, even I, with all my years of experience and proficiency in computer languages, am not feeling comfortable when I look at a bad-written module and it takes me a lot of time to even vaguely decypher what it does and how it does it.

We should not base our professional value system on how comlicated and badly-written code one is able to comprehend.

środa, 27 lutego 2013

Code Kata: Recently opened files

Kata #2: Recently opened files
 
Implement a container that has the well known functionality of "Recently opened files" that you know from various applications you use:
  • the container must be able to list file paths that it stores
  • if file A is opened and right after that we ask the container to list the file paths it stores, the path to file A will be listed as the first one
  • the number of stored file paths is limited; when the limit is exceeded the container forgets least recently used file path
  • in the list of recently open files, each file path can appear once and only once

wtorek, 26 lutego 2013

Code Kata: Vending Machine

Kata #1: Vending Machine

Your task is to implement a piece of software for a vending machine that sells sweets. The machine accepts coins and must be capable of returning change. It is up to you to decide on how inserted coins and change that is given out are represented. Consider the following variations of this exercise:
  • Simplification: you have an unlimited number of coins of every kind
  • Realistic: the machine is loaded with coins at the beginning and you have a limit on the number of coins of every kind
  • More realistic: the machine adds the inserted coins to respective coin slots and uses them to serve further transactions
  • Even more realistic: the machine tries to optimize the way of giving out coins - if there is danger of giving out too many coins of kind A, and there are is surplus of coins of kind B, then more of B coins are used to produce change
  • Futuristic: the machine asks you to provide it with coins that would help it return the proper change
How would your implementation of vending machine react if the amount of money inserted were less than the price of the chosen chocolate snack?

poniedziałek, 25 lutego 2013

Piotr's Less Obvious Advice on Google Mock: State maintenance

Google Mock provides several ways to maintain state inside mock objects. One way of implementing state maintenance is with SaveArg. Consider the following example.

We have a class Configurator, which allows a caller to set and get values of a parameter:

class Configurator
{
    public:

    virtual ~Configurator() {}

    virtual void setParamX(int n) = 0;
    virtual int getParamX() = 0;
};


And we have a class Client that calls Configurator's methods and it also has a method incParamXBy, that can be used to increase the current value of paramX by a certain value.

class Client
{
    public:

    Client(Configurator & cfg);
    virtual ~Client() {}

    void setParamX(int n);
    void incParamXBy(int n);
    int getParamX();

    private:

    Configurator & _cfg;
};

incParamXBy internally calls setParamX and getParamX on Configurator:

void Client::incParamXBy(int n)
{
    _cfg.setParamX(_cfg.getParamX() + n);
}

Let's assume that the initial value of paramX is A and that we want to increase paramX by B each time we call incParamXBy. Our expectation is that if incParamXBy is called for the first time, it will result in calling cfg.setParamX(A+B).
  • second call of incParamXBy(B) will result in calling cfg.setPAramX(A + 2*B)
  • third call: cfg.setPAramX(A + 3*B), and so on...

Interaction between Client and Configurator in this case is such that Client relies on Configurator (issues getParamX()) in order to correctly calculate the parameter for next setParamX() call. If we want to test Client object, we need a MockConfigurator that is able to remember its current value of the parameter we are increasing:

class MockConfigurator : public Configurator
{
    public:

    int paramX;
    int * paramX_ptr;

    MockConfigurator()
    {
        paramX = 0;
        paramX_ptr = &paramX;
    }
   
    MOCK_METHOD1(setParamX, void(int n));
    MOCK_METHOD0(getParamX, int());
};

MockConfigurator has a public member paramX. We will use paramX to store current value of the parameter. We could have put paramX outside of MockConfigurator, in the test code, but I think it makes more sense to bind the parameter with the object.

And now the test:

MockConfigurator cfg;
Client client(cfg);

int inc_value = 10;

//getParamX will be called a number of times.
//If it is called, we will return the value pointed to by paramX_ptr.
//Returning with ReturnPointee is necessary, since we need to have
//the actual (updated) value each time the method is called.

EXPECT_CALL(cfg, getParamX())
    .Times(AnyNumber())
    .WillRepeatedly(ReturnPointee(cfg.paramX_ptr));

//SaveArg stores the 0th parameter of the call in the value pointed to by paramX_ptr (paramX)
//expectation 3
EXPECT_CALL(cfg, setParamX(cfg.paramX + 3*inc_value))
    .Times(1)
    .WillOnce(DoAll(SaveArg<0>(cfg.paramX_ptr), Return()));
//expectation 2
EXPECT_CALL(cfg, setParamX(cfg.paramX + 2*inc_value))
    .Times(1)
    .WillOnce(DoAll(SaveArg<0>(cfg.paramX_ptr), Return()));
//expectation 1
EXPECT_CALL(cfg, setParamX(cfg.paramX + inc_value))
    .Times(1)
    .WillOnce(DoAll(SaveArg<0>(cfg.paramX_ptr), Return()));

client.incParamXBy(inc_value); //this will match expectation 1
client.incParamXBy(inc_value);
//this will match expectation 2
client.incParamXBy(inc_value); //this will match expectation 3

Is it the only way to maintain state inside a mock object? Of course not. First of all, we could have an equivalent test without maintaining state inside out mock object at all. We could have written all method calls and all expectations using pre-calculated values that we know validate Client's behaviour. The point is that state maintenance gives us more flexibility that may be advantageous in case we want to write a number of similar tests (e.g. test the same behaviour for a range of parameter values or test different behaviours that rely accessing value previously stored in a mock).

Secondly, we can always implement state maintenance by using Invoke. Calling another function or method allows us not only to an equivalent to SaveArg, but also more complex state maintenance cases.

Piotr's Less Obvious Advice on Google Mock: Returning new objects from a mock

Google Mock provides a way to return newly created objects from a mock method. Suppose we have a 
Generator class that is supposed to generate new objects when createNewRecord method is called:

class Generator
{
    public:
    virtual ~Generator() {}
    virtual Record * createNewRecord() = 0;
};

...and suppose we want to mock this class:

class MockGenerator : public Generator
{
    public:
    MOCK_METHOD0(createNewRecord, Record * ());
};

Suppose the caller class Client has run method defined as follows:

void Client::run()
{
    for(int i = 0; i < 3; i++)
    {
        rec_tab[i] = gen.createNewRecord();
    }
}

We want the mock to return a pointer to a new object each time createNewRecord is called. This is how we can express this in the test code:

TEST(ClientTest, CanRun)
{
    MockGenerator gen;
    Client c(gen);

    EXPECT_CALL(gen, createNewRecord())
        .Times(3)
                 //this is equivalent of returning new Record(1,2,3)
        .WillOnce(ReturnNew<Record>(1,2,3))
        .WillOnce(ReturnNew<Record>(2,3,4))
        .WillOnce(ReturnNew<Record>(3,4,5));

    c.run();
}


Creating and passing new objects through a method parameter
 
But what if we want to return a new object through a method parameter? Sometimes caller passes a pointer-to-pointer to a method that is supposed to create a new object and assign its address to the method parameter. Let's modify our example above so that:
  • the caller (class Client) has a private member:
Record * rec;
  • the caller (class Client) has the following method:
void Client::getNewRec()
{
        // we are passing Record ** (pointer to pointer) to createNewRecord
        // and Generator is supposed to allocate memory
    gen.createNewRecord(&rec);
}
  • our Generator.h has the following declaration:
virtual void createNewRecord(Record ** rec) = 0;
  • which results in the following declaration in MockGenerator.h:
MOCK_METHOD1(createNewRecord, void (Record ** rec));

How can we now emulate the desired behavior in a test code? A straightforward way of doing this would be:

TEST(ClientTest, GetsNewRec)
{
    MockGenerator gen;
    Client c(gen);

    EXPECT_CALL(gen, createNewRecord(_))
        .Times(1)
        .WillOnce(Invoke(my_create));

    c.getNewRec();
}

Where my_create is a small function that does the trick:

void my_create(Record ** rec)
{
         *rec = new Record(1,2,3);
}
But isn't it awkward? The need to define this small function, the hard-coded constructor's parameters... It would  be much more convenient if we had a Google Mock action that could actually create the new object and assign its address to the pointer. Then, in our test, we could simply write:

        .WillOnce(CreateAndPass<Record>(1,2,3));

The CreateAndPass action must be smart enough to create an object of a certain type (Record), create it with the parameters that we specify (1,2,3) and do the proper assignment to the pointer. For a constructor that takes three parameters, as in the example above, our new action would look like this:

ACTION_TEMPLATE(CreateAndPass,
                HAS_1_TEMPLATE_PARAMS(typename, T), // the type of object to be created
                AND_3_VALUE_PARAMS(p0, p1, p2))         // constructor parameters
{
    // we are using TR1; assign the pointer of the newly created object to dereferenced call parameter
    // this is the only parameter of the call in this case, so it has the index of 0

  *(::std::tr1::get<0>(args)) = new T(p0, p1, p2);
}

That's it. This example can be easily extended for different constructors that take less or more parameters. Since the action is templatized, we can use it to construct objects of any class.

Piotr's Less Obvious Advice on Google Mock: Mutual method calls

Imagine that object A calls method M of object B and B is a mock object. Let's assume further that if B were real, calling the M method would result in B calling a method on A (M would call an A's method from its body). Suppose we want to simulate the same behavior with a mock object. We have a "master" and "slave" objects called Starter and Startee, respectively.

class Starter
{
 public:
  Starter(Startee & startee, int vol);
  virtual ~Starter() {}

  void start();
  void configure();

 private:
  Startee & startee_;
  int vol_;

  Starter();
};

// implementation of the above Starter methods
Starter::Starter(Startee & startee, int vol) : startee_(startee), vol_(vol)
{
}

// Starter will call run method on startee
void Starter::start()
{
  startee_.run();
}

// this method is supposed to be called by Startee as a result of calling startee_.run()
void Starter::configure()
{
  startee_.setVolume(vol_);
}

And now, Startee:

class Startee
{
 public:

  virtual ~Startee() {}

  virtual void run() = 0;
  virtual void setVolume(int volume) = 0;
};

And MockStartee:

class MockStartee : public Startee
{
 public:

  MOCK_METHOD0(run, void());
  MOCK_METHOD1(setVolume, void(int volume));
};

In the test, we will need to tell the mock object to call configure method once the Starter object calls run method on MockStartee:
TEST(StarterTest, MutualCall)
{
  MockStartee startee;
  Starter starter(startee, 10);
 
  EXPECT_CALL(startee, setVolume(10))
    .Times(1)
    .WillOnce(Return());
 
  EXPECT_CALL(startee, run())
    .Times(1)
    .WillOnce(WithoutArgs(Invoke(&starter, &Starter::configure)));

  starter.start();
}