<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Peter's Blog - Nodes for crm114</title>
    <link>http://www.petersblog.org/</link>
    <description>Nodes containing the tag crm114</description>
    <item>
      <title>Python crm114 recipe</title>
      <link>http://www.petersblog.org/node/view/976</link>
      <description>&lt;p&gt;
I'm interested in using &lt;a href="http://crm114.sourceforge.net/"&gt;CRM114&lt;/a&gt; in a project written in &lt;a href="/tag/python"&gt;python&lt;/a&gt;. CRM114 is hard to describe, but I want to use it as an intelligent categoriser to decide whether an item of text should go into group a or group b (something like a Bayesian spam filter). I could use the Bayesian filters from &lt;a href="/node/355"&gt;SpamBayes&lt;/a&gt;, but CRM114 is likely to be faster, more flexible and less fixated on email spam filtering.
&lt;/p&gt;
&lt;p&gt;
CRM114 has its own weird programming language to learn, so the problem is really to create something minimal that works and to wrap it in a language I know. Hence this recipe uses CRM114 simply to decide whether a lump of text is good or bad and returns the result to python. When the classification is in doubt it seems to err towards 'good'.
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;
Install crm114 on a &lt;a href="/tag/debian"&gt;Debian&lt;/a&gt; or &lt;a href="/tag/ubuntu"&gt;Ubuntu&lt;/a&gt; system:
&lt;pre class="lazy"&gt;sudo apt-get install crm114
&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
Create a script called 'learngood.crm' to create a 'good' database:
&lt;div class="verbatim-block"&gt;&lt;pre&gt;#!/usr/bin/crm

{
    learn &amp;lt;osb unique microgroom&amp;gt; (good.css)
}
&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
Make the script executable (chmod +x learngood.crm), then teach it about the good things in 'good.txt' like this:
&lt;pre class="lazy"&gt;./learngood.crm &lt;span class="Keyword"&gt;&amp;lt;&lt;/span&gt; good.txt
&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
Create a script called 'learnbad.crm' to create a 'bad' database:
&lt;div class="verbatim-block"&gt;&lt;pre&gt;#!/usr/bin/crm

{
    learn &amp;lt;osb unique microgroom&amp;gt; (bad.css)
}
&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
Make that script executable too, then teach it about the bad things in 'bad.txt' like this:
&lt;pre class="lazy"&gt;./learnbad.crm &lt;span class="Keyword"&gt;&amp;lt;&lt;/span&gt; bad.txt
&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
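Create a script called 'pick.crm' to make the decision. If 'classify' finds that the text matches 'bad.css' best, it carries on to 'exit /1/'; otherwise it fails out of the inner block and the script reaches 'exit /0/':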
&lt;div class="verbatim-block"&gt;&lt;pre&gt;#!/usr/bin/crm

{
    {
         classify &amp;lt;osb unique microgroom&amp;gt; ( bad.css | good.css )
         # bad
         exit /1/
    }
    # good
    exit /0/
}
&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
Create a python script to run it:
&lt;pre class="lazy"&gt;&lt;span class="line-numbers"&gt;   1 &lt;/span&gt; &lt;span class="Comment"&gt;&lt;span class="Comment"&gt;#&lt;/span&gt;&lt;/span&gt;
&lt;span class="line-numbers"&gt;   2 &lt;/span&gt; &lt;span class="Comment"&gt;&lt;span class="Comment"&gt;#&lt;/span&gt; Pick one or the other.&lt;/span&gt;
&lt;span class="line-numbers"&gt;   3 &lt;/span&gt; &lt;span class="Comment"&gt;&lt;span class="Comment"&gt;#&lt;/span&gt;&lt;/span&gt;
&lt;span class="line-numbers"&gt;   4 &lt;/span&gt; 
&lt;span class="line-numbers"&gt;   5 &lt;/span&gt; &lt;span class="Keyword"&gt;import&lt;/span&gt; sys
&lt;span class="line-numbers"&gt;   6 &lt;/span&gt; &lt;span class="Keyword"&gt;import&lt;/span&gt; popen2
&lt;span class="line-numbers"&gt;   7 &lt;/span&gt; 
&lt;span class="line-numbers"&gt;   8 &lt;/span&gt; strText &lt;span class="Keyword"&gt;=&lt;/span&gt; &lt;span class="MetaFunctionCallPy"&gt;sys.stdin.read&lt;span class="MetaFunctionCallPy"&gt;(&lt;/span&gt;&lt;span class="MetaFunctionCallPy"&gt;&lt;/span&gt;&lt;span class="MetaFunctionCallPy"&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class="line-numbers"&gt;   9 &lt;/span&gt; 
&lt;span class="line-numbers"&gt;  10 &lt;/span&gt; oCrm &lt;span class="Keyword"&gt;=&lt;/span&gt; &lt;span class="MetaFunctionCallPy"&gt;popen2.Popen3&lt;span class="MetaFunctionCallPy"&gt;(&lt;/span&gt;&lt;span class="MetaFunctionCallPy"&gt; &lt;span class="String"&gt;&lt;span class="String"&gt;'&lt;/span&gt;./pick.crm&lt;span class="String"&gt;'&lt;/span&gt;&lt;/span&gt;, &lt;span class="String"&gt;&lt;span class="String"&gt;'&lt;/span&gt;w&lt;span class="String"&gt;'&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="MetaFunctionCallPy"&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class="line-numbers"&gt;  11 &lt;/span&gt; 
&lt;span class="line-numbers"&gt;  12 &lt;/span&gt; &lt;span class="MetaFunctionCallPy"&gt;oCrm.tochild.write&lt;span class="MetaFunctionCallPy"&gt;(&lt;/span&gt;&lt;span class="MetaFunctionCallPy"&gt; strText&lt;/span&gt;&lt;span class="MetaFunctionCallPy"&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class="line-numbers"&gt;  13 &lt;/span&gt; &lt;span class="MetaFunctionCallPy"&gt;oCrm.tochild.close&lt;span class="MetaFunctionCallPy"&gt;(&lt;/span&gt;&lt;span class="MetaFunctionCallPy"&gt;&lt;/span&gt;&lt;span class="MetaFunctionCallPy"&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class="line-numbers"&gt;  14 &lt;/span&gt; 
&lt;span class="line-numbers"&gt;  15 &lt;/span&gt; nRet &lt;span class="Keyword"&gt;=&lt;/span&gt; &lt;span class="MetaFunctionCallPy"&gt;oCrm.wait&lt;span class="MetaFunctionCallPy"&gt;(&lt;/span&gt;&lt;span class="MetaFunctionCallPy"&gt;&lt;/span&gt;&lt;span class="MetaFunctionCallPy"&gt;)&lt;/span&gt;&lt;/span&gt;
&lt;span class="line-numbers"&gt;  16 &lt;/span&gt; 
&lt;span class="line-numbers"&gt;  17 &lt;/span&gt; &lt;span class="Keyword"&gt;if&lt;/span&gt; nRet &lt;span class="Keyword"&gt;==&lt;/span&gt; &lt;span class="Constant"&gt;0&lt;/span&gt;:
&lt;span class="line-numbers"&gt;  18 &lt;/span&gt;     &lt;span class="Keyword"&gt;print&lt;/span&gt; &lt;span class="String"&gt;&lt;span class="String"&gt;'&lt;/span&gt;It was Good&lt;span class="String"&gt;'&lt;/span&gt;&lt;/span&gt;
&lt;span class="line-numbers"&gt;  19 &lt;/span&gt; &lt;span class="Keyword"&gt;else&lt;/span&gt;:
&lt;span class="line-numbers"&gt;  20 &lt;/span&gt;     &lt;span class="Keyword"&gt;print&lt;/span&gt; &lt;span class="String"&gt;&lt;span class="String"&gt;'&lt;/span&gt;It was Bad&lt;span class="String"&gt;'&lt;/span&gt;&lt;/span&gt;
&lt;/pre&gt;
This script reads text from standard input and pipes it to 'pick.crm'. It then prints whether the text is good or bad, based on the exit status of 'pick.crm': 0 means good, anything else means bad.
&lt;/li&gt;&lt;/ul&gt;
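&lt;p&gt;
The 'popen2' module used above only exists in Python 2. As a minimal sketch, assuming Python 3.5 or later, the same idea can be written with the standard 'subprocess' module. The names 'run_with_stdin', 'classify' and 'main' are made up for this example, and the command is a parameter so that something other than './pick.crm' can be substituted.
&lt;/p&gt;

```python
import subprocess
import sys


def run_with_stdin(command, text):
    """Run command, feed it text on standard input, return its exit status."""
    completed = subprocess.run(list(command), input=text.encode("utf-8"))
    return completed.returncode


def classify(text, command=("./pick.crm",)):
    """Return 'good' when the classifier exits with 0, otherwise 'bad'.

    Assumption: the command follows the pick.crm convention from the
    recipe above (exit status 0 for good, non-zero for bad).
    """
    return "good" if run_with_stdin(command, text) == 0 else "bad"


def main():
    # Mirror the original script: read stdin, print the verdict.
    print("It was " + classify(sys.stdin.read()).capitalize())
```

&lt;p&gt;
Calling 'main()' at the end of the file should give the same behaviour as the script above: it reads standard input and prints 'It was Good' or 'It was Bad'. The same 'run_with_stdin' helper could also feed text to './learngood.crm' or './learnbad.crm' for training.
&lt;/p&gt;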

&lt;p&gt;
This recipe uses the OSB (Orthogonal Sparse Bigram) classifier. CRM114 has several other classifiers to choose from if you have some objection to Orthogonal Sparse Bigrams.
&lt;/p&gt;
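&lt;p&gt;
To give a rough feel for what an orthogonal sparse bigram is, here is a simplified python illustration. It is not CRM114's implementation (which tokenises the text itself and hashes the features). The function name 'osb_features', the '_' skip marker and the window of 5 tokens are assumptions made for this sketch. Each token is paired with each of the following few tokens, and the gap between them is kept as part of the feature.
&lt;/p&gt;

```python
def osb_features(tokens, window=5):
    """Pair each token with each of the next window-1 tokens.

    The distance between the two tokens is recorded with '_' skip
    markers, so 'a b' and 'a _ b' count as different features.
    """
    features = []
    for i, first in enumerate(tokens):
        following = tokens[i + 1:i + window]
        for gap, second in enumerate(following, start=1):
            skips = ["_"] * (gap - 1)
            features.append(" ".join([first] + skips + [second]))
    return features


print(osb_features("the quick brown fox".split()))
```

&lt;p&gt;
For 'the quick brown fox' this prints the six features 'the quick', 'the _ brown', 'the _ _ fox', 'quick brown', 'quick _ fox' and 'brown fox'.
&lt;/p&gt;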
&lt;p&gt;
The use of 'Popen3' to pipe the text to crm114 means it won't work under &lt;a href="/tag/windows"&gt;Windows&lt;/a&gt;. You have my deepest sympathy. 
&lt;/p&gt;</description>
      <guid>http://www.petersblog.org/node/view/976</guid>
      <category domain="http://www.technorati.com/tag">crm114</category>
      <category domain="http://www.technorati.com/tag">python</category>
    </item>
  </channel>
</rss>
