Thursday, December 31. 2009
I was doing some programming today (no, really?) and had need of a data structure that would return a value based on the key falling within a given range. Kind of like a dict, but each key in the dict would be two values, between which the querying key would fall. Thus was born BetweenDict. It's short and sweet, and to the point. And works for what I need.
class BetweenDict(dict):
    def __init__(self, d=None):
        # Avoid a mutable default argument; route every item through
        # __setitem__ so each key is validated.
        for k, v in (d or {}).items():
            self[k] = v

    def __getitem__(self, key):
        # Linear scan: return the value whose half-open range
        # [low, high) contains the key.
        for k, v in self.items():
            if k[0] <= key < k[1]:
                return v
        raise KeyError("Key '%s' is not between any values in the BetweenDict" % key)

    def __setitem__(self, key, value):
        try:
            if len(key) == 2:
                if key[0] < key[1]:
                    dict.__setitem__(self, (key[0], key[1]), value)
                else:
                    raise RuntimeError('First element of a BetweenDict key '
                                       'must be strictly less than the '
                                       'second element')
            else:
                raise ValueError('Key of a BetweenDict must be an iterable '
                                 'with length two')
        except TypeError:
            # len() or indexing failed, so the key is not a two-item sequence.
            raise TypeError('Key of a BetweenDict must be an iterable '
                            'with length two')

    def __contains__(self, key):
        # A key is "in" the dict if it falls inside any stored range.
        try:
            self[key]
            return True
        except KeyError:
            return False
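BetweenDict scans every key on each lookup, which is fine for a handful of ranges. If the ranges are known up front and never overlap, a binary search is one possible alternative. The sketch below is not part of BetweenDict: the class name SortedRangeLookup and the grade boundaries are made up for illustration, and it assumes non-overlapping ranges with the same half-open [low, high) semantics.

```python
import bisect


class SortedRangeLookup:
    """Hypothetical read-only lookup over sorted, non-overlapping ranges."""

    def __init__(self, ranges):
        # ranges: iterable of ((low, high), value) pairs, assumed non-overlapping.
        items = sorted(ranges, key=lambda item: item[0][0])
        self._lows = [key[0] for key, _value in items]
        self._highs = [key[1] for key, _value in items]
        self._values = [value for _key, value in items]

    def __getitem__(self, key):
        # Find the last range whose lower bound is <= key, then check
        # that key is still below that range's upper bound.
        i = bisect.bisect_right(self._lows, key) - 1
        if i >= 0 and key < self._highs[i]:
            return self._values[i]
        raise KeyError(key)


grades = SortedRangeLookup([
    ((0, 60), "F"),
    ((60, 70), "D"),
    ((70, 80), "C"),
    ((80, 90), "B"),
    ((90, 101), "A"),
])
print(grades[85])  # B
print(grades[60])  # D (the lower bound is inclusive)
```

The trade-off is that this version cannot be updated in place the way a dict can; it suits lookup tables that are built once and queried many times.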
Wednesday, December 23. 2009
Started a comment on this post but it got a little long.
So, I follow Planet Python and have seen Greg Wilson's posts on the Basie project.
Basie is a web-based software project forge that integrates revision control, issue tracking, mailing lists, wikis, status dashboards, and other tools that developers need to work effectively in teams.
Basie uses Django and jQuery among other technologies to make a leaner, meaner, multi-project "forge."
I've read up a bit on Basie. Modern framework! jQuery! Multiple projects! Python! Only Subversion! What? Nearly every other* project tracking system out there (even Trac's shoe-horned support) has support for alternate VCS's: Git, Darcs, Bzr, Mercurial, etc. Are you serious? With countless hours invested, and probably 30+ people having worked on the code base, couldn't "multi VCS" have been a requirement from the get-go? Granted, I've never tried it, but as Trac hacks, and Redmine, have proven, it can be done.
I think it's great that we have a project tracking system that is based on modern web framework technologies, but I really think Basie is going to be at a disadvantage out of the gate because it does not support other (granted, minority, but growing) VCS's. I hope they are able to rectify this soon: I'd love to see Basie grow into a viable competitor to Redmine. Side note: I love Redmine, but a system in Python allows us to use our intellectual investment to contribute to our VCS; there is very little Ruby knowledge in our shop, so it's harder to give back.
Something else that caught my eye on the Basie site:
Why Build Another Forge?
Because none of the others meet our needs.... Redmine provides many of the features we want, but is still immature;
Can it be clarified when this was written? We currently run Redmine 0.8.6 (with 0.9 around the corner). It has been VERY stable, very able, and everything we need in a PTS. Other than using Python, is there any reason you did not simply choose to invest the time of 30+ people in an existing PTS instead of starting your own from scratch?
Again, this isn't a rant, or to put down the project: I think it's great, and I hope it gets traction in the PTS world. It's just a design decision and your rationale of an "immature Redmine" that got me curious.
*"Every other" meaning "Every other that I am aware of." I'm sure there are plenty of PTS's that I'm not aware of.
Wednesday, September 23. 2009
I am using Python to do some data file processing, converting data from a horrendously verbose, repetitive format to a nice, clean, CSV format. The date and time are in two different fields, and the date is in MM/DD/YYYY format, plus, the MM and DD might be one or two characters. That is, January is 1, not 01.
I am converting the timestamp to ISO format, so I was using time.strptime to extract the date/time and time.strftime to generate the properly ISO-formatted date, like so:
return time.strftime("%Y-%m-%d %H:%M:%S", time.strptime(ts, "%m/%d/%Y %H:%M:%S"))
On the smallest of my data files, the processing was taking 13 to 15 seconds. I profiled it, and found that in a 13 second run, strptime was taking 8.755 seconds of that, and it was calling _getlang(), _parse_localename(), and the like every time.
So, thought I, regexes are pretty efficient, I wonder if that would reduce the run time any.
ts_re = re.compile(r'^(\d{1,2})/(\d{1,2})/(\d{4}) (\d{2}:\d{2}:\d{2})')

m = ts_re.match(ts).groups()
return "%s-%02d-%02d %s" % (m[2], int(m[0]), int(m[1]), m[3])
(The re.compile() call is at the module level, outside the function, so it is only run once.)
My overall run time dropped to about 5 seconds, a little over 1/3 of the time it took previously. My convert_timestamp() function, which previously had consumed nearly 10 seconds, was only taking about 1.3 seconds now.
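For anyone who wants to try the comparison, here is a self-contained sketch of the two approaches side by side. The function names convert_timestamp_strptime and convert_timestamp_regex and the sample timestamp are made up for illustration, and the timings will vary by machine and Python version, so none are quoted here.

```python
import re
import time
import timeit

# Compiled once at module level, as described above.
TS_RE = re.compile(r'^(\d{1,2})/(\d{1,2})/(\d{4}) (\d{2}:\d{2}:\d{2})')


def convert_timestamp_strptime(ts):
    # Parse with strptime, then re-format as an ISO-style timestamp.
    return time.strftime("%Y-%m-%d %H:%M:%S",
                         time.strptime(ts, "%m/%d/%Y %H:%M:%S"))


def convert_timestamp_regex(ts):
    # Pull the pieces out with a regex and zero-pad month and day by hand.
    month, day, year, clock = TS_RE.match(ts).groups()
    return "%s-%02d-%02d %s" % (year, int(month), int(day), clock)


sample = "1/5/2009 13:45:07"
print(convert_timestamp_strptime(sample))  # 2009-01-05 13:45:07
print(convert_timestamp_regex(sample))     # 2009-01-05 13:45:07

# Rough timing of each approach over the same input.
for func in (convert_timestamp_strptime, convert_timestamp_regex):
    elapsed = timeit.timeit(lambda: func(sample), number=10000)
    print("%s: %.3f s" % (func.__name__, elapsed))
```

Note that the regex version does no validation: it will happily pass through a month of 13 or an hour of 99, which strptime would reject. That was acceptable for my data, but it is a trade-off.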
Sometimes regexes are the answer.
Wednesday, May 6. 2009
So today I discovered that my first ever contribution to an Open Source project was accepted. Before I was let go from WordStream, my boss was having me work on a feature addition to Buildbot.
I was allowed to contribute that addition, under my own name, and today that change was merged in. It comprised these commits: 1, 2, 3
Needless to say, I'm pretty jazzed!
Friday, November 14. 2008
I recently had the task of building Python Eggs on Windows that had C extensions. I did the usual googling, and found a few HOWTOs, but nothing I could find was very concise and straightforward. So, I present to you a very concise, very straightforward guide.
Setting up the Build Environment
We are assuming all installers are allowed to install to their default locations.
Download and install Python 2.5 and/or Python 2.4 from python.org. You may have to use Python 2.4.4 since that was the latest 2.4 series to have an installer at time of writing. Whichever you want associated with .py files, install last.
Download and install SetupTools for Python (both 2.4 and 2.5) here.
MinGW files are here. Download and run the latest version of "Automated MinGW Installer." You only need to install g++, and maybe not even that. When it prompts you for old/current/preview version of the MinGW system, select current.
Download and run the current version of "MSYS Base System."
Open up the MinGW shell, and execute these commands:
cd /
mkdir mingw
mkdir code  # Convenience, if you want to mount your code's dir at an "easy" spot
echo "c:/MinGW /mingw
c:/path/to/code /code" > /etc/fstab
Create the file c:\PythonNN\Lib\distutils\distutils.cfg and put the following in it, where NN is replaced with 24 or 25, depending on version:
[build]
compiler = mingw32
Do this for each version installed.
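If you have more than one Python installed, this step can be scripted. The following is only a sketch: write_distutils_cfg is a made-up helper name, and the c:\Python24 and c:\Python25 roots assume the default install locations mentioned above.

```python
import os

# The same two lines shown above.
CFG_TEXT = "[build]\ncompiler = mingw32\n"


def write_distutils_cfg(python_root):
    """Write distutils.cfg under python_root (hypothetical helper)."""
    cfg_dir = os.path.join(python_root, "Lib", "distutils")
    if not os.path.isdir(cfg_dir):
        raise IOError("No distutils directory at %s" % cfg_dir)
    cfg_path = os.path.join(cfg_dir, "distutils.cfg")
    with open(cfg_path, "w") as f:
        f.write(CFG_TEXT)
    return cfg_path


if __name__ == "__main__":
    # Assumed default install locations; adjust if you installed elsewhere.
    for root in (r"c:\Python24", r"c:\Python25"):
        if os.path.isdir(root):
            print(write_distutils_cfg(root))
```

Installs that are not found are simply skipped, so the script is safe to run with only one of the two versions present.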
Building the Eggs
MinGW mounts your drives on the root directory, so your C: drive will be at /c, D: drive at /d, and so on. Open up a MinGW shell, and:
cd /code  # or /c/path/to/code, if you didn't mount /code

# Depending on which package you're building, you'll either use setup.py
# or extended_setup.py. I'm using setup.py here as a placeholder. Replace as appropriate.

# Python 2.4:
/c/python24/python setup.py build_static  # if needed
/c/python24/python setup.py bdist_egg

# Python 2.5:
/c/python25/python setup.py build_static  # if needed
/c/python25/python setup.py bdist_egg
You'll now have eggs for your architecture and OS. Enjoy!
Wednesday, November 5. 2008
Cilk Arts is very close to releasing their multi-core programming library called Cilk++, and with it, a new license, the Cilk Arts Public License. Using this license, they attempt to close the "IDO [Internal Development Organizations] loophole" they believe exists with the current GPL. Namely: an organization can take a piece of GPL code, create a derivative work, and, if they do not distribute it outside their organization, they do not have to give it back to the community.
Cilk Arts' new license seeks to prevent this, basically by redefining "distribution" to mean within the company as well:
If you are an IDO building applications for use by others but not "distributed" under the existing open source definitions (e.g., GPL) and you want to keep your Cilk++-based derivative work proprietary, then there is an impact. The CAPL requires you to make a fair exchange in order to use Cilk++. If you share your software with everyone, we share ours with you. If you do not wish to share, you can give back to the project by purchasing a commercial license.
I believe that, while this is a laudable goal, it will at best be ineffective, and at worst stymie, or at least slow, the adoption of Cilk++; and yes, I have reasons.
There are, at least, three types of Open Source consumers:
- Those who have full intention to give back
- Those who have no intention to give back, and don't, despite the legal obligation to do so. See the archives of GPL Violations for plenty of examples.
- Those who don't initially have an intention to give back, but after using the library, decide that they'd like to, or can, give back.
The first two types of consumers will not be affected by this license. The givers will give, and the keepers will keep. It is the third type of consumer where Cilk++ will lose out: the "potential givers" (PG) category.
When a PG is evaluating various libraries to use for their new multi-core program, their license and distribution model (even internally) may be in doubt. They may even have a (wrong-headed) policy about giving back to open source programs. Thus, when they see "you can't distribute derivative works (even internally)" it may completely turn them off, and they'll go on to the next library. Which would be sad; I've not personally reviewed Cilk++, but as it is coming out of an MIT project, I would assume it to be created by some rather bright people. So, the PG will completely pass over the chance to use this library without any further evaluation. Cilk++ will lose in this case. Had the PG decided not to give back, Cilk++ would not have lost or gained either way. However, had the PG decided to give back down the line (say, after enough internal rumbling by developers, or management changing policy), then Cilk++ would have gained a contributor. But, because they say up front that you have to give back, they will have lost everybody in the "potential giver" category. In addition, they may also lose out on a possible license sale to those who want to try the library, and then end up buying a commercial license because they wish to distribute their application as closed-source.
So, my feedback to Cilk Arts is: go with the standard GPL (or even BSD license). Those who want to give back will, those who don't, won't, and those who might will give you a try and possibly become a contributor (with code or money) when they might not have otherwise.
Your thoughts?
Monday, October 27. 2008
Yet another reason I will avoid PHP at all costs. The head developers of PHP have decided, after much discussion, to make the back-slash the name-space separator when referencing classes. That's right, this\is\a\tree\valid\variable. I'm sorry, a back-slash? This will create no end of confusion and frustration when it comes to parsers and syntax highlighters. What was wrong with the period? How about '::' as in: a::valid::namespace?
One of the (small) reasons I like Linux is that the only reason I have to use a back-slash is for special control characters. Seeing back-slashes all over my code would be even more jarring than the supposed "white space" problem some people have with Python.
For more on the decision, read the RFC. For discussion, Reddit has the usual.
It boggles the mind.
Wednesday, August 13. 2008
Scott Westfall has a post about defects vs. features. He notes:
I like the generic term, “change request” for all changes in a system. But it’s very important to know whether it is a defect or a feature request. In my lexicon, a “defect” is something that doesn’t work as spec’ed; a feature request is a request to alter the intended behavior.
He goes on to point out that customers generally don't care which it is, they just want things changed. Programmers often care about the definition because if it's a defect, it might mean their code is flawed. Yes, programmers can have egos, often huge ones.
But what he doesn't seem to answer is: who, ultimately, decides whether a submitted "change request" is a defect or a feature request? And does unexpected behavior, even if it is specified as such, or simply undefined, qualify as a bug?
I've run into this exact issue on a "change request" for a product I use: the Adept package updater in Kubuntu. On May 10, 2006, I reported that Adept doesn't behave properly when the network is down, but simply says the transaction failed. I finally discovered that it didn't properly determine if the network was up before it attempted to start the upgrade process.
After submitting this change request as a bug, it was changed to a "wishlist" priority. After it was changed, I wrote:
I don't agree with this bug being a "wishlist" bug. If a user does not know their network is not up, and they try to update, they are immediately going to write for help saying "Updates aren't working," or they are simply going to give up on Ubuntu because it "doesn't work right." I know I would have given up if I hadn't known to go to the prompt and type "apt-get upgrade," which is how I found out my network was down.
The change back to wishlist was accompanied by this comment:
You don't have to agree. It is wishlist though. You are asking for a totally new feature. And missing features != bugs in my world. Please leave it on wishlist, there is absolutely no point in bloating the severity. Thanks.
Now, I'm not sure if, or where, the specification is for Adept, but if a network-aware program I was using didn't tell me the network was down, but simply failed with no explanation, I would qualify that as a bug. I think that view is justified, considering that 1) another user agreed with me, and 2) my original report now has three other reports attached to it as "duplicates."
So, I agree fully with Scott that not all change requests are defects. However, when a program violates expected (and/or reasonable) behavior, the programmers/project managers/whoever really need to either take a hard look at the specification (if the program is in fact following the specification), or define the behavior for the given case, and possibly bring it into alignment with users' expectations[1].
[1] Yes, I know user expectations can vary wildly. Maybe a usability study is in order, but that's a topic for another post entirely.
Update: Good additional discussion of the problem, with a good example of something in Visual Studio 2008 that should be classified as a bug but still has not been fixed, and some discussion of how the bug/feature request dichotomy is bad for software projects as a whole, since it can create friction between users and developers.
Sunday, August 10. 2008
Original, as far as I know. Correct me if you know otherwise.
Q: How do you program a computer to make beef stew?
A: You use bullion logic.
As bad, but not as good:
Q: How do you solve a math equation involving beef stew?
A: You use bullion algebra.
Edit: For those of you who got here by searching for "bullion logic" what you're actually searching for is boolean logic.