Tuesday November 24, 2009
Dart ideas, Game ideas, Ideas
Comments Off
I am gradually building stronger Scheme support into my various bioinformatics stochastic-modeling tools. The main classes are being exposed via Guile, and there’s a simple recursive-descent syntax checker on the C++ side.
Scheme has been a good fit because I was already using S-expressions as a compact structured file format. I now have de facto S-expression representations of Newick and Stockholm file formats.
The big issue in today’s cloudy world is concurrency. So how does concurrency enter in with Scheme?
Termite Scheme (Google Code) is a variant of Scheme intended for distributed computing. It offers a simple and powerful concurrency model, inspired by the Erlang programming language…
Dynamite is a framework for developing dynamic AJAX-like web user-interfaces. We used Termite processes to implement the web-server side logic, and we can manipulate user-interface components directly from the server-side (for example through the repl). Schack is an interactive multiplayer game using Dynamite for its GUI. Players and monsters move around in a virtual world, they can pick up objects, use them, etc. The rooms of the world, the players and monsters are all implemented using Termite processes which interact.
Concurrency Oriented Programming in Termite Scheme, Germain Feeley & Monnier (PDF).
Schack sounds like a lot of fun but I unfortunately cannot find it on Google.
Then there’s Clojure – Scheme on the JVM. But are there any games available?
Friday November 20, 2009
BioE131/231
Comments Off
Today I announced the final project for my intro-to-compbio class. Each student will set up JBrowse for a different viral genome. The details are here: biowiki.org/view/Fall09/FinalProject
I’m excited about this – it could be a really cool demo of JBrowse and even have some immediate value to RNA virus researchers too.
Friday November 20, 2009
Dart ideas
4 Comments
Next xrate-guile project: add a new smob for ECFG_inside_outside_matrix. Will need a container structure (c.f. Stockholm_smob) to carry around associated objects like ECFG_scores, ECFG_envelope, Stockholm, Aligned_score_profile, etc.
Offer two constructor primitives: one populates an ECFG_counts object and returns as an S-expression (and therefore calls fill_down), the other doesn’t.
Offer two new primitives, wrapping post_transition_ll and post_state_ll, returning probabilities rather than log-likelihoods (more simple/intuitive). NB the guile wrappers for these methods will have to do state name lookup.
So that’s four new primitives: xrate-dpm, xrate-dpm-with-counts, xrate-dpm-post-trans-prob, xrate-dpm-post-state-prob (or something like that).
Then, write a Scheme function that uses these to generate a WIG track from a named state.
See comments for efficiency considerations motivating iterators with callbacks (could be implemented using ECFG_envelope).
Wednesday November 11, 2009
Dart ideas
Comments Off
Objects to expose to Scheme as smobs:
- Stockholm alignment
- Newick tree
- Fasta sequence
- GFF record
Note that some read/construct functions may return lists of these smobs (read-stockholm-database, read-fasta-database etc.)
Objects with existing S-expression forms:
- ECFG_scores+Alphabet (xrate)
- Gramset (stemloc)
- Transducer_SExpr_file (phylocomposer)
No smobs are necessary for these: we can just quote them. We do need to ensure there is a top-level tag for each one. Stemloc has “model-set” (rename “sankoff-model-set”?); xrate and phylocomposer need new top-level tags e.g. “phylogrammar” and “composition” respectively.
Still need to figure out a minimal list of functions to offer as Scheme primitives.
Tuesday November 10, 2009
Dart ideas, Planning
1 Comment
xrate’s phylogenetic emissions are probabilistic all right. but the annotations are currently either wildcards, or single-character only. That is, each state deterministically emits a particular character to every annotation row during a column emission (or it may not emit anything to a given row, in which case it acts like a wildcard).
this worked fine when those rows were just outputs, and it worked OK for early basic use cases where the rows were inputs too. but inevitably people want to put probability distributions over these things and treat them as first-class random variables in the model.
Read the rest…
Friday October 23, 2009
BioE241, Ideas, Planning
Comments Off
Somehow I’m two years late to the party on this. But it’s very cool and I’m going to try & include it in my BioE241 class that covers some probabilistic modeling applications in compbio:
Clustering by Passing Messages Between Data Points. Brendan J. Frey and Delbert Dueck, Science, 2007.
More here.
Friday October 16, 2009
BioE131/231
Comments Off
When I first started teaching Needleman-Wunsch algorithm, I would derive the recursions and attempt to fill out a DP matrix on the chalkboard — usually making at least one mistake that would cover me in shame.
Last year I came with a cheat sheet. I made no mistakes and it was boring as hell.
Today, I gave the cheat sheet away and challenged the class to spot my mistakes while I attempted to beat the clock… and while Jordi Silvestre-Ryan’s Mathematica animation ran in the background.
Making things harder for yourself is often more fun.
Tuesday October 13, 2009
Dart ideas, Ideas
Comments Off
Here are the current plans for the xrate extension language:
- expose Stockholm alignments as smobs w/methods
read-stockholm, write-stockholm, ancestor-list, leaf-list, branch-list, count-columns
- expose scoring, training, annotation & tree-estimation as guile functions
em-best-grammar-params, nj-best-tree, cyk-best-annotation
Other plans for xrate:
- beagle-lib: Jeff Yunes currently evaluating this for PFold
- wiggle tracks: should these be considered as probabilistic outputs too, c.f. annotate blocks? another, easier design is simply to use the wig track to output the posterior probability of being in a particular state
- overhaul of numeric types: replace
Score, Loge, Log2 & Prob with Gerton Lunter’s algebra classes (BFloat etc.) first staging in a temporary class, defining protected constructor & operator-overloading methods, to catch at compile-time cases like Score sc=0
Friday October 9, 2009
BioE131/231, Ideas, Planning
Comments Off
I have been a bit unhappy with the way I presented the complexity analysis of Ruth Nussinov’s RNA folding algorithms in BioE131. In particular the time complexity. I think in future I would represent every (i,j,k) tuple as a point on the 3D lattice. It is then obvious there are O(L^3) such points.
Another (related) point is that the boundary cases for Nussinov complicate the presentation of both the recursion and its concrete implementation. I would like to simplify this, perhaps with reference to a context-free grammar.
Sunday October 4, 2009
BioE131/231
Comments Off
Did some “fold this sequence” exercises on Friday, mostly using viral regulatory seqs. Another case if “teach an algorithm by getting students to do it manually”. Tried & trusted technique