Twitter: raymondcamden


Address: Lafayette, LA, USA

Friday Puzzle - Welcome to Santa's IT Department

12-01-2006 5,195 views ColdFusion 17 Comments

For today's puzzler you find yourself newly employed in Santa's Workshop. While Santa is quite magical, full of joy, blah blah blah, his IT skills leave a bit to be desired. In fact, his current IT department currently consists of one elf who wants to be a dentist and some large furry creature.

Your first task is simple. Santa maintains a list of kids who are naughty and nice. He used to keep this list on a scroll of parchment but after recently upgrading to Windows 95 (hey, he takes his time with changes!) Santa began using Notepad.

Unfortunately - Notepad isn't a very nice database and this has led to some problems. Duplicate names and kids on two lists.

Write a simple function to read in two text files (good.txt and naughty.txt). Compare the files and remove any child on both lists and add them to a new file, tobedecided.txt. Also remove any child who appears more than once in a list.

Good luck - Santa is counting on you!

17 Comments

  • Commented on 12-01-2006 at 9:17 AM
    just names on the lists?
  • Commented on 12-01-2006 at 9:18 AM
    Nod. Just names. Santa is smart enough to know one John Smith from another. (Or maybe he isn't. Maybe next Friday... ;)
  • Commented on 12-01-2006 at 9:35 AM
    Originally my lists contained a name on each line, but was having trouble converting that to a list, what would be the delimiter? \r? Once I made my good and bad lists comma delimited it worked.

    <cffile action="read" file="D:\path\good.txt" variable="goodkids" />
    <cffile action="read" file="D:\path\bad.txt" variable="badkids" />
    <cfloop list="#goodkids#" index="i">
       <cfif listFindNoCase(badkids, i) neq 0>
          <cffile action="write" file="D:\path\tobedeterminded.txt" output="#i#" addnewline="yes">
       </cfif>
    </cfloop>
    <cffile action="read" file="D:\path\tobedeterminded.txt" variable="tbd" />
    <cfoutput><pre>#tbd#</pre></cfoutput>
  • Commented on 12-01-2006 at 9:40 AM
    Good start Tim - but don't forget you also need to remove (from both lists) the names of kids on both. The second task of removing dupes in general isn't quite as important.
  • Commented on 12-01-2006 at 9:44 AM
    Since I got distracted I'll concede victory to Tim.

    Tim, to convert to a list, just steal - er borrow - some of ray's blog cfc code -

    <cfset theList = replace(str, chr(10), ",", "all")>
    <cfset theList = replace(str, chr(13), ",", "all")>
  • Commented on 12-01-2006 at 9:45 AM
    Also Tim, I'd maybe save the output in a variable and write to file once rather than for each iteration of the loop. &lt;/myTwoCents&gt;
  • Commented on 12-01-2006 at 10:20 AM
    Here is my method. It tags the good and naughty FILE PATHS and creates the tobedecided.txt in the same directory.


    <cffunction name="FixSantasList" access="public" returntype="void" output="false">
       
       <!--- Define arguments. --->
       <cfargument name="GoodFile" type="string" required="true" />
       <cfargument name="NaughtyFile" type="string" required="true" />
       
       <!--- Define the local scope. --->
       <cfset var LOCAL = StructNew() />
       
       <!--- Read in good text. --->
       <cffile
          action="READ"
          file="#ARGUMENTS.GoodFile#"
          variable="LOCAL.GoodData"
          />
          
       <!--- Read in the naughty text. --->
       <cffile
          action="READ"
          file="#ARGUMENTS.NaughtyFile#"
          variable="LOCAL.NaughtyData"
          />
          
       <!---
          Create a file path for the undecided data. We will be putting
          this in the same directory as the other files.
       --->
       <cfset LOCAL.UndecidedFile = (
          GetDirectoryFromPath( ARGUMENTS.GoodFile ) &
          "tobedecided.txt"
          ) />
          
       
       <!--- Create indexes for the good, naugty, and undecided. --->
       <cfset LOCAL.GoodIndex = StructNew() />
       <cfset LOCAL.NaughtyIndex = StructNew() />
       <cfset LOCAL.UndecidedIndex = StructNew() />
       
       <!--- Loop over the good names and add to index. --->
       <cfloop index="LOCAL.Name" list="#LOCAL.GoodData#" delimiters="#Chr( 13 )##Chr( 10 )#">
       
          <cfset LOCAL.GoodIndex[ LOCAL.Name ] = LOCAL.Name />
       
       </cfloop>
       
       <!--- Loop over the naughty names and add to index. --->
       <cfloop index="LOCAL.Name" list="#LOCAL.NaughtyData#" delimiters="#Chr( 13 )##Chr( 10 )#">
       
          <cfset LOCAL.NaughtyIndex[ LOCAL.Name ] = LOCAL.Name />
       
       </cfloop>
       
       
       <!---
          ASSERT: At this point, the good and naughty indexes should
          ONLY have unique name entries. However, there may be names
          that exist in both.
       --->
       
       <!---
          Loop over the keys in the good index to see if they exist in
          the naughty index. If they do, they have to be removed from
          both and put into the undecided index.
       --->
       <cfloop item="LOCAL.Name" collection="#LOCAL.GoodIndex#">
          
          <!--- Check to see if it exists in naughty as well. --->
          <cfif StructKeyExists( LOCAL.NaughtyIndex, LOCAL.Name )>
          
             <!--- Add the name to the undecided index. --->
             <cfset LOCAL.UndecidedIndex[ LOCAL.Name ] = LOCAL.Name />
             
             <!--- Remove from the good and naughty index. --->
             <cfset StructDelete( LOCAL.GoodIndex, LOCAL.Name ) />
             <cfset StructDelete( LOCAL.NaughtyIndex, LOCAL.Name ) />
          
          </cfif>
       
       </cfloop>
       
       
       <!---
          ASSERT: At this point, all names should be in the appropriate
          indexes and should only appear one. Now, they need to be
          commited to file.
       --->
       
       
       <!--- Write the good names to the file. --->
       <cffile
          action="WRITE"
          file="#ARGUMENTS.GoodFile#"
          output="#ArrayToList( StructKeyArray( LOCAL.GoodIndex ), '#Chr( 13 )##Chr( 10 )#' ).Trim()#"
          />
          
       
       <!--- Write the naughty names to the file. --->
       <cffile
          action="WRITE"
          file="#ARGUMENTS.NaughtyFile#"
          output="#ArrayToList( StructKeyArray( LOCAL.NaughtyIndex ), '#Chr( 13 )##Chr( 10 )#' ).Trim()#"
          />
          
       <!--- Write the undecided names to the file. --->
       <cffile
          action="WRITE"
          file="#LOCAL.UndecidedFile#"
          output="#ArrayToList( StructKeyArray( LOCAL.UndecidedIndex ), '#Chr( 13 )##Chr( 10 )#' ).Trim()#"
          />
       
       <!---
          ***NOTE: I made this because I do NOT trust the case sensitivity of the
          key list returned by the structure. However, for short hand, I will use
          the above which does rely on sturct key last maintaining case sensitivity.
       --->
       <!---
       <!--- Create a string buffer for the good names. --->
       <cfset LOCAL.NameBuffer = CreateObject( "java", "java.lang.StringBuffer" ).Init() />
       
       <!--- Loop over good names and add to string buffer. --->
       <cfloop item="LOCAL.Name" collection="#LOCAL.GoodIndex#">
       
          <cfset LOCAL.NameBuffer.Append(
             LOCAL.GoodIndex[ LOCAL.Name ] &
             Chr( 13) & Chr( 10 )
             ) />
       
       </cfloop>
       
       <!--- Write the buffer to the file. --->
       <cffile
          action="WRITE"
          file="#ARGUMENTS.GoodFile#"
          output="#LOCAL.NameBuffer.ToString().Trim()#"
          />
          
          
       <!--- Create a string buffer for the good names. --->
       <cfset LOCAL.NameBuffer = CreateObject( "java", "java.lang.StringBuffer" ).Init() />
       
       <!--- Loop over naughty names and add to string buffer. --->
       <cfloop item="LOCAL.Name" collection="#LOCAL.NaughtyIndex#">
       
          <cfset LOCAL.NameBuffer.Append(
             LOCAL.NaughtyIndex[ LOCAL.Name ] &
             Chr( 13) & Chr( 10 )
             ) />
       
       </cfloop>
       
       <!--- Write the buffer to the file. --->
       <cffile
          action="WRITE"
          file="#ARGUMENTS.NaughtyFile#"
          output="#LOCAL.NameBuffer.ToString().Trim()#"
          />
       
       
       <!--- Create a string buffer for the good names. --->
       <cfset LOCAL.NameBuffer = CreateObject( "java", "java.lang.StringBuffer" ).Init() />
       
       <!--- Loop over undecided names and add to string buffer. --->
       <cfloop item="LOCAL.Name" collection="#LOCAL.UndecidedIndex#">
       
          <cfset LOCAL.NameBuffer.Append(
             LOCAL.UndecidedIndex[ LOCAL.Name ] &
             Chr( 13) & Chr( 10 )
             ) />
       
       </cfloop>
       
       <!--- Write the buffer to the file. --->
       <cffile
          action="WRITE"
          file="#LOCAL.UndecidedFile#"
          output="#LOCAL.NameBuffer.ToString().Trim()#"
          />
          
       --->
          
       <!--- Return out. --->
       <cfreturn />
    </cffunction>
  • Commented on 12-01-2006 at 10:29 AM
    Nice Ben. Also - people - note how his use of structs to suck in the names will also take care of dupes in the same list.
  • Commented on 12-01-2006 at 10:36 AM
    Thanks. I learned the Struct() as a look-up from Rick Osborne (one smart dude). Also, afterwards, I realize, I didn't need to convert to StructKeyArray() then to list. I could have gone right to StructKeyList() and passed in the same delimiters. I forgot that that function takes delimiters :D
  • Commented on 12-01-2006 at 10:41 AM
    OK, I changed my mind - here's my entry.

    Since Santa loves reusability we decided to build an entire component for him. I hope someone catches my obscure RUN-DMC reference.

    <cfcomponent displayname="santa">
       <cffunction name="toLines" returntype="string" hint="i make a list into lines" output="false">
          <cfargument name="str" hint="the list to turn into lines" required="true">
          <cfreturn replace(arguments.str, ",", chr(10), "all") />
       </cffunction>
       
       <cffunction name="toList" returntype="string" hint="i get the txt file and turn it into a list" output="false">
          <cfargument name="str" hint="the string to make a list out of" type="string" required="true">
          <cfset var returnStr = "">
          <cfset returnStr = replace(arguments.str, chr(10), ",", "all")>
          <cfset returnStr = replace(arguments.str, chr(13), ",", "all")>
          <cfreturn returnStr />
       </cffunction>
       
       <cffunction name="readFile" returntype="string" hint="i read the files" output="false">
          <cfargument name="fileToRead" hint="the full path to the file to read" required="true">
          <cfset var returnStr = "">
          <cffile action="read" file="#arguments.fileToRead#" variable="returnStr">
          <cfreturn returnStr />   
       </cffunction>
       
       <cffunction name="listNoDups" hint="i return a list with no duplicate items" output="false" returntype="string">
          <cfargument name="theList" hint="the list to clean up" required="true">
          <cfset var returnStr = "">
          <cfset var i = "">
          
          <cfloop from="1" to="#listLen(arguments.theList)#" index="i">
             <cfif NOT listContainsNoCase(returnStr, listGetAt(arguments.theList, i))>
                <cfset returnStr = listAppend(returnStr, listGetAt(arguments.theList, i))>
             </cfif>
          </cfloop>
          <cfreturn returnStr />
       </cffunction>
       
       <cffunction name="listCompare" hint="i compare two lists" output="false" returntype="string">
          <cfargument name="listOne" type="string" hint="the first list" required="true">
          <cfargument name="listTwo" type="string" hint="the second list" required="true">
          <cfset var tbdList = "">
          <cfset var i = "">
          <cfset var cleanListOne = listNoDups(arguments.listOne)>
          <cfset var cleanListTwo = listNoDups(arguments.listTwo)>
          <cfdump var="#listOne#"><cfdump var="#listTwo#">
          <cfloop from="1" to="#listLen(arguments.listOne)#" index="i">
             <cfif listContainsNoCase(cleanListTwo, listGetAt(cleanListOne, i))>
                <cfset tbdList = listAppend(tbdList, listGetAt(cleanListOne, i))>
             </cfif>
          </cfloop>
          <cfreturn tbdList />
       </cffunction>
       
    </cfcomponent>

    christmasInHollis.cfm
    ---------------------
    <cfset variables.goodTxt = expandPath("good.txt")>
    <cfset variables.naughtyTxt = expandPath("naughty.txt")>
    <cfset variables.goodList = readFile(variables.goodTxt)>
    <cfset variables.naughtyList = readFile(variables.naughtyTxt)>
    <cfset variables.tbd = listCompare(toList(variables.goodList), toList(variables.naughtyList))>
    <cffile action="write" file="#expandPath("tbd.txt")#" output="#toLines(variables.tbd)#">
  • Tuyen #
    Commented on 12-01-2006 at 11:55 AM
    Happy Friday everyone! :-) I don't think that Santa's machine has enough memory to read 2 HUGES text files (trillian records) then combine them. Beside, Coldfusion will take a lot of time to read them LINE-BY-LINE.... I'll show Santa to use free MySQL to do it... :-)

    Please read this post from Ben Forta:
    http://www.forta.com/blog/index.cfm/2006/12/1/Cold...
  • Commented on 12-01-2006 at 12:01 PM
    But you have to remember that Santa's computer is only limited by the amount of Christmas Cheer in the world, not by RAM and Processing speed ;)
  • Commented on 12-01-2006 at 4:25 PM
    Wow, this beats the heck out of real work.

    <!-- Read --->
    <cffile action="read" file="#expandPath("./good.txt")#" variable="goodFileContent" />
    <cffile action="read" file="#expandPath("./bad.txt")#" variable="badFileContent" />

    <!--- Maps elim dups --->
    <cfset tbd = structNew() />
    <cfset good = structNew() />
    <cfset bad = structNew() />

    <!--- Bad -> map --->
    <cfloop list="#badFileContent#" index="i" delimiters="#chr(10)#">
       <cfset bad[i] = "" />
    </cfloop>

    <!--- Good -> filtered into approp. maps --->
    <cfloop list="#goodFileContent#" index="i" delimiters="#chr(10)#">
       <cfset dupPosition = listFindNoCase(badFileContent, i, chr(10)) />
       <cfif dupPosition>
          <cfset tbd[i] = "" />
          <cfset structDelete(bad, i) />
       <cfelse>
          <cfset good[i] = "" />
       </cfif>
    </cfloop>

    <!--- Write --->
    <cffile action="write" file="#expandPath("./newGood.txt")#" output="#listChangeDelims(structKeyList(good), chr(10))#" />
    <cffile action="write" file="#expandPath("./newBad.txt")#" output="#listChangeDelims(structKeyList(bad), chr(10))#" />
    <cffile action="write" file="#expandPath("./newTbd.txt")#" output="#listChangeDelims(structKeyList(tbd), chr(10))#" />
  • Commented on 12-01-2006 at 5:24 PM
    I agree with Tuyen and the Forta reference. Two large lists (hundreds of millions of rows apiece?), perfect fodder for a DBMS.

    In my case, do two BULK INSERT statements, which would insert the data from the two lists into a SQL Server table in no time. Then, write one SQL query to generate the results you want.

    Awesome ideas on the use of structures in this case to help handle duplicates!

    Tony

    PS - This page http://www.un.org/esa/socdev/ageing/agewpop1.htm seems to indicate that as of the year 2000, 30% of the population was 15 years old or younger. It doesn't say how many were "good" and "bad". :(
  • Commented on 12-01-2006 at 6:01 PM
    If Santa doesn't have a database handy, CF provides a workaround. CFHTTP to read in the files as queries, QoQ to extract distincts as well as any that appear in both lists, then write the files back out. Like this:

    <cfset thisFileName = listLast(CGI.SCRIPT_NAME, "/") />
    <cfset baseFolder = replace(CGI.SCRIPT_NAME, thisFileName, "") />
    <cfset baseURL = "http://"; & CGI.SERVER_NAME & ":" & CGI.SERVER_PORT & baseFolder />
    <cfset baseFilePath = replace(CGI.CF_TEMPLATE_PATH, thisFileName, "") />

    <cfset goodURL = baseURL & "good.txt" />
    <cfset naughtyURL = baseURL & "naughty.txt" />

    <cfset goodFile = baseFilePath & "good.txt" />
    <cfset naughtyFile = baseFilePath & "naughty.txt" />
    <cfset undecidedFile = baseFilePath & "tobedecided.txt" />

    <cfhttp method="get" columns="name" firstrowasheaders="false" url="#goodURL#" name="goodKids" />
    <cfhttp method="get" columns="name" firstrowasheaders="false" url="#naughtyURL#" name="naughtyKids" />

    <cfquery dbtype="query" name="uniqueGoodKids">
       SELECT DISTINCT name
       FROM goodKids
       WHERE name NOT IN (#quotedValueList(naughtyKids.name)#)
    </cfquery>

    <cfquery dbtype="query" name="uniqueNaughtyKids">
       SELECT DISTINCT name
       FROM naughtyKids
       WHERE name NOT IN (#quotedValueList(goodKids.name)#)
    </cfquery>

    <cfquery dbtype="query" name="undecidedKids">
       SELECT DISTINCT name
       FROM goodKids
       WHERE name IN (#quotedValueList(naughtyKids.name)#)
    </cfquery>

    <cfset newLine = "
    " />

    <cffile action="write" file="#goodFile#" addnewline="false" output="#valueList(uniqueGoodKids.name, newLine)#" fixnewline="true" />
    <cffile action="write" file="#naughtyFile#" addnewline="false" output="#valueList(uniqueNaughtyKids.name, newLine)#" fixnewline="true" />
    <cffile action="write" file="#undecidedFile#" addnewline="false" output="#valueList(undecidedKids.name, newLine)#" fixnewline="true" />
  • MikerRoo #
    Commented on 12-02-2006 at 12:23 AM
    My approach:
    --------------

    <!--- Configuration settings --->
    <CFSCRIPT>
    sGoodFileName = "good";
    sBadFileName = "naughty";
    sFileExtension = "txt";
    sTBD_FileName = "tobedecided." & sFileExtension;

    sDSN = "MySQL_Test";
    sDataBaseName = "SantySnoop";

    /*--- Actually use different filenames for testing ease. ---
    /
    sGoodFileIn = sGoodFileName & "_src." & sFileExtension;
    sBadFileIn = sBadFileName & "_src." & sFileExtension;
    sGoodFileOut = sGoodFileName & "_out." & sFileExtension;
    sBadFileOut = sBadFileName & "_out." & sFileExtension;

    sBasePath = GetDirectoryFromPath (GetCurrentTemplatePath ());
    </CFSCRIPT>

    <CFOUTPUT>
    <title>Fixes Santa's naughty/nice lists</title>
    <h1>Fixes Santa's naughty/nice lists</h1>
    <p>
    This application cleans up Santa's "Do not cry" and "Terrible Tot watch" lists to comply with NSA requirements
    and USC BR-549 (13)(b)(ii) and TSA "whim of the week" TSA-88961473-5657-223.
    </p>

    <h3>Operation:</h3>
    <ol>
    <li> Read two source files, <i>#sGoodFileIn#</i> and <i>#sBadFileIn#</i>.
    <li> Remove duplicate names (case insensitive) from each list.
    <li> If any name appears on both lists, remove it to a third list, &quot;TBD&quot;.
    <li> Write out the cleaned up &quot;good&quot; list to <i>#sGoodFileOut#</i>.
    <li> Write out the cleaned up &quot;bad&quot; list to <i>#sBadFileOut#</i>.
    <li> Write out any &quot;TBD&quot; names to <i>#sTBD_FileName#</i>.
    <li> Done!
    </ol>

    <h3>Important!</h3>
    <ul>
    <li> Almost all of the work here is done in SQL (as it should be).
    <li> In this case, we use MySQL only but ANY RDBMS will do this better than a pure Coldfusion approach.
    <li> This code requires a valid MySQL datasource.
    <li> It Creates 5 tables in a database named &quot;#sDataBaseName#&quot;.
    <li> The text files are plain text with one name per line (EG &quot;Jim Johnson&quot;).
    </ul>
    </CFOUTPUT>


    <!--- Create the database and tables if they do not already exist.
    --->
    <CFSCRIPT>
    oDB_Func = CreateObject ("component", "cfcSantaListDB").oInit (sDSN, sDataBaseName);
    oDB_Func.SetupDataBase ();
    oDB_Func.CreateStandardNameTable ('tGoodListRaw', 'no');
    oDB_Func.CreateStandardNameTable ('tBadListRaw', 'no');
    oDB_Func.CreateStandardNameTable ('tGoodListFinal', 'yes');
    oDB_Func.CreateStandardNameTable ('tBadListFinal', 'yes');
    oDB_Func.CreateStandardNameTable ('tTBD_ListFinal', 'yes');
    </CFSCRIPT>


    <!--- Get the initial stats. --->
    <CFSET iGoodRowsInitial= oDB_Func.iGetRowCount ('tGoodListFinal')>
    <CFSET iBadRowsInitial = oDB_Func.iGetRowCount ('tBadListFinal')>
    <CFSET iTBDRowsInitial = oDB_Func.iGetRowCount ('tTBD_ListFinal')>


    <!--- Import the raw lists. --->
    <CFSET iGoodRowsRaw = oDB_Func.iImportTextData ('tGoodListRaw', '#sBasePath##sGoodFileIn#')>
    <CFSET iBadRowsRaw = oDB_Func.iImportTextData ('tBadListRaw', '#sBasePath##sBadFileIn#')>


    <!--- Clean up duplicates in each list. --->
    <CFSET iGoodRowsNew = oDB_Func.iAddNewNamesToList ('tGoodListRaw', 'tGoodListFinal')>
    <CFSET iBadRowsNew = oDB_Func.iAddNewNamesToList ('tBadListRaw', 'tBadListFinal')>


    <!--- Find the "TBD" names --->
    <CFSET iNumTBD_Names = oDB_Func.iStripOutNiceNaughtyDoers ('tGoodListFinal', 'tBadListFinal', 'tTBD_ListFinal')>


    <!--- Write the final files --->
    <CFSET iGoodRowsFinal = oDB_Func.iExportTextData ('tGoodListFinal', '#sBasePath##sGoodFileOut#')>
    <CFSET iBadRowsFinal = oDB_Func.iExportTextData ('tBadListFinal', '#sBasePath##sBadFileOut#')>
    <CFSET iTBDRowsFinal = oDB_Func.iExportTextData ('tTBD_ListFinal', '#sBasePath##sTBD_FileName#')>


    <!--- Report --->
    <CFOUTPUT>
    <h3>Results:</h3>
    <table class="SantasJudgement">
    <tr>
    <th>&nbsp;</th>
    <th scope="col">Nice</th>
    <th scope="col">Naughty</th>
    <th scope="col">Need more snooping</th>
    </tr>
    <tr>
    <th scope="row">Initial unique names:</th>
    <td>#iGoodRowsInitial#</td>
    <td>#iBadRowsInitial#</td>
    <td>#iTBDRowsInitial#</td>
    </tr>
    <tr>
    <th scope="row">Raw lines read:</th>
    <td>#iGoodRowsRaw#</td>
    <td>#iBadRowsRaw#</td>
    <td>na</td>
    </tr>
    <tr>
    <th scope="row">Duplicate, blank, or preexisting names:</th>
    <td>#iGoodRowsRaw - iGoodRowsNew#</td>
    <td>#iBadRowsRaw - iBadRowsNew#</td>
    <td>na</td>
    </tr>
    <tr>
    <th scope="row">New names added:</th>
    <td>#iGoodRowsNew#</td>
    <td>#iBadRowsNew#</td>
    <td>na</td>
    </tr>
    <tr>
    <th scope="row">Names appearing on both lists:</th>
    <td>#iGoodRowsInitial + iGoodRowsNew - iGoodRowsFinal#</td>
    <td>#iBadRowsInitial + iBadRowsNew - iBadRowsFinal#</td>
    <td>#iNumTBD_Names# new</td>
    </tr>
    <tr>
    <th scope="row">Final number of unique names:</th>
    <td>#iGoodRowsFinal#</td>
    <td>#iBadRowsFinal#</td>
    <td>#iTBDRowsFinal#</td>
    </tr>
    </table>
    </CFOUTPUT>

    <!---
    <CFDUMP var="#variables#">
    --->

    *****************************************************************************
    ****
    Source for cfcSantaListDB.cfc starts here. *****
    *****************************************************************************
    <CFCOMPONENT
    displayname = "SantaList_DB_Object"
    output = "no"
    hint = "Encapsulates the DB functions for our Santa list application."
    >
    <!--- For this Q&D example code, we avoid the data abstraction layer and just
    use MySQL syntax (Version 5.0).
    --->

    <!--- Private object globals. --->
    <CFSCRIPT>
    sDSN = "";
    sDB_Name = "";
    sLineTerm = "\r\n";

    oRunTime = CreateObject ("java", "java.lang.Thread");
    </CFSCRIPT>


    <CFFUNCTION
    name = "oInit"
    displayname = "Initialize function/constructor"
    output = "no"
    returntype = "cfcSantaListDB"
    hint = "Creates the object with the specified DSN and DB name."
    >
    <CFARGUMENT name="sDataSourceName" type="variablename" required="yes" hint="MySQL only for now.">
    <CFARGUMENT name="sDatabaseName" type="variablename" default="SantySnoop">


    <CFSET Variables.sDSN = Arguments.sDataSourceName>
    <CFSET Variables.sDB_Name = Arguments.sDatabaseName>

    <CFRETURN this>
    </CFFUNCTION>


    <CFFUNCTION name="SetupDataBase" output="no" returntype="void">

    <CFQUERY name="qDB_Setup" datasource="#Variables.sDSN#">
    CREATE DATABASE IF NOT EXISTS #Variables.sDB_Name#;
    </CFQUERY>

    <!--- Avoid race problems. Wait until DB shows up.
    --->
    <CFLOOP condition="1 EQ 1">
    <CFSET Variables.oRunTime.sleep (200)> <!--- We often need to allow some time for MySQL to REALLY finish DB Alter ops. --->

    <CFQUERY name="qDB_SetupChk" timeout="5" datasource="#Variables.sDSN#">
    SHOW DATABASES LIKE '#LCase (Variables.sDB_Name)#';
    </CFQUERY>

    <CFIF qDB_SetupChk.RecordCount GTE 1>
    <CFBREAK>
    </CFIF>
    </CFLOOP>
    </CFFUNCTION>


    <CFFUNCTION name="CreateStandardNameTable" output="no" returntype="void">
    <CFARGUMENT name="sTableName" type="variablename" required="yes">
    <CFARGUMENT name="bAddUniqueConstraint" type="boolean" default="no">

    <CFQUERY name="qTableSetup" datasource="#Variables.sDSN#">
    CREATE TABLE IF NOT EXISTS
    #Variables.sDB_Name#.#Arguments.sTableName#
    (
    iId mediumint(9) NOT NULL auto_increment,
    sName char(65) NOT NULL,
    PRIMARY KEY (iId),
    #IIf (Arguments.bAddUniqueConstraint, DE ("UNIQUE KEY sNameIdx "), DE ("INDEX "))# (sName)
    );
    </CFQUERY>

    <!--- We don't seem to need to allow any extra time or checks for table creation.
    SHOW TABLES FROM SantySnoop LIKE 'tGoodListRaw'
    --->
    </CFFUNCTION>


    <CFFUNCTION name="iImportTextData" output="no" returntype="numeric">
    <CFARGUMENT name="sTableName" type="variablename" required="yes">
    <CFARGUMENT name="sFilePath" type="string" required="yes">
    <CFARGUMENT name="bResetTable" type="boolean" default="yes">

    <!--- Escape path slashes. --->
    <CFSET var sEscFilePath = Replace (Arguments.sFilePath, "\", "\\", "ALL")>

    <CFTRANSACTION>
    <!--- CF BUG! The trailing select in the first query is not handled by CF!!
    Thus we need to split the queries -- which we can do because we stay in
    the same MySQL session.
    --->
    <CFQUERY name="qTextImport" datasource="#Variables.sDSN#">
    <!--- Note that row_count() does not work with INFILE.
    Also, spaces before () cause weird errors.
    --->
    <CFIF Arguments.bResetTable>
    TRUNCATE TABLE #Variables.sDB_Name#.#Arguments.sTableName#;
    </CFIF>

    <!--- If we don't truncate the table, get the initial row count. --->
    SET @iBeforeRaw = (SELECT Count(iID) FROM #Variables.sDB_Name#.#Arguments.sTableName#);

    LOAD DATA INFILE '#sEscFilePath#' INTO TABLE #Variables.sDB_Name#.#Arguments.sTableName#
    LINES TERMINATED BY '#Variables.sLineTerm#'
    (sName);

    SET @iAfterRaw = (SELECT Count(iID) FROM #Variables.sDB_Name#.#Arguments.sTableName#);

    SELECT @iAfterRaw - @iBeforeRaw AS iRowsInserted; <!--- CF ignores!!! --->
    </CFQUERY>

    <CFQUERY name="qTextImportStat" datasource="#Variables.sDSN#">
    SELECT @iAfterRaw - @iBeforeRaw AS iRowsInserted;
    </CFQUERY>
    </CFTRANSACTION>

    <CFRETURN qTextImportStat['iRowsInserted'][1]>
    </CFFUNCTION>


    <CFFUNCTION name="iExportTextData" output="no" returntype="numeric">
    <CFARGUMENT name="sTableName" type="variablename" required="yes">
    <CFARGUMENT name="sFilePath" type="string" required="yes">

    <!--- Escape path slashes. --->
    <CFSET var sEscFilePath = Replace (Arguments.sFilePath, "\", "\\", "ALL")>

    <CFIF FileExists (sEscFilePath)>
    <CFFILE action="delete" file="#Arguments.sFilePath#">
    </CFIF>

    <CFTRANSACTION>
    <!--- CF BUG! The trailing select in the first query is not handled by CF!!
    Thus we need to split the queries -- which we can do because we stay in
    the same MySQL session.
    --->
    <CFQUERY name="qTextExport" datasource="#Variables.sDSN#">
    <!--- Note that row_count() does not work well here.
    --->
    SELECT
    sName
    FROM
    #Variables.sDB_Name#.#Arguments.sTableName#
    INTO OUTFILE
    '#sEscFilePath#'
    LINES TERMINATED BY
    '#Variables.sLineTerm#'
    ;

    SET @iWriteCnt = (SELECT ROW_COUNT()); <!--- This is wrong for many mysql versions! --->
    </CFQUERY>

    <CFQUERY name="qTextExportStat" datasource="#Variables.sDSN#">
    SELECT
    Count(iID) AS iRowsInserted
    FROM
    #Variables.sDB_Name#.#Arguments.sTableName#;
    </CFQUERY>
    </CFTRANSACTION>

    <CFRETURN qTextExportStat['iRowsInserted'][1]>
    </CFFUNCTION>


    <CFFUNCTION name="iGetRowCount" output="no" returntype="numeric">
    <CFARGUMENT name="sTableName" type="variablename" required="yes">

    <CFQUERY name="qGetRowCount" datasource="#Variables.sDSN#">
    SELECT
    Count(iID) AS iNumRows
    FROM
    #Variables.sDB_Name#.#Arguments.sTableName#;
    </CFQUERY>

    <CFRETURN qGetRowCount['iNumRows'][1]>
    </CFFUNCTION>


    <CFFUNCTION name="iAddNewNamesToList" output="no" returntype="numeric">
    <CFARGUMENT name="sSrcTableName" type="variablename" required="yes">
    <CFARGUMENT name="sDestTableName" type="variablename" required="yes">
    <CFARGUMENT name="bResetTable" type="boolean" default="no">

    <CFTRANSACTION>
    <!--- CF BUG! The trailing select in the first query is not handled by CF!!
    Thus we need to split the queries -- which we can do because we stay in
    the same MySQL session.
    --->
    <CFQUERY name="qNameClean" datasource="#Variables.sDSN#">
    <CFIF Arguments.bResetTable>
    TRUNCATE TABLE #Variables.sDB_Name#.#Arguments.sDestTableName#;
    </CFIF>

    Use #Variables.sDB_Name#;

    INSERT INTO
    #Arguments.sDestTableName# (sName)
    SELECT DISTINCT
    tRaw.sName
    FROM
    #Arguments.sSrcTableName# tRaw
    LEFT JOIN
    #Arguments.sDestTableName# tFin ON tRaw.sName = tFin.sName
    WHERE
    tFin.sName IS NULL
    AND
    tRaw.sName IS NOT NULL
    AND
    Trim(tRaw.sName) <> ''
    ;

    SET @iNewRows = (SELECT ROW_COUNT());
    </CFQUERY>

    <CFQUERY name="qNameCleanStat" datasource="#Variables.sDSN#">
    SELECT IFNULL(@iNewRows, 0) AS iRowsInserted;
    </CFQUERY>
    </CFTRANSACTION>

    <CFRETURN qNameCleanStat['iRowsInserted'][1]>
    </CFFUNCTION>


    <CFFUNCTION name="iStripOutNiceNaughtyDoers" output="no" returntype="numeric">
    <CFARGUMENT name="sGoodTableName" type="variablename" required="yes">
    <CFARGUMENT name="sBadTableName" type="variablename" required="yes">
    <CFARGUMENT name="sTBD_TableName" type="variablename" required="yes">
    <CFARGUMENT name="bResetTable" type="boolean" default="no">

    <CFTRANSACTION>
    <!--- CF BUG! The trailing select in the first query is not handled by CF!!
    Thus we need to split the queries -- which we can do because we stay in
    the same MySQL session.
    --->
    <CFQUERY name="qTBD_Name" datasource="#Variables.sDSN#">
    <CFIF Arguments.bResetTable>
    TRUNCATE TABLE #Variables.sDB_Name#.#Arguments.sTBD_TableName#;
    </CFIF>

    Use #Variables.sDB_Name#;

    INSERT INTO
    #Arguments.sTBD_TableName# (sName)
    SELECT
    tBad.sName
    FROM
    #Arguments.sBadTableName# tBad
    INNER JOIN
    #Arguments.sGoodTableName# tGood ON tBad.sName = tGood.sName
    LEFT JOIN
    #Arguments.sTBD_TableName# tTBD ON tBad.sName = tTBD.sName
    WHERE
    tTBD.sName IS NULL
    ;

    SET @iNewRows = (SELECT ROW_COUNT());


    DELETE FROM
    tBad
    USING
    #Arguments.sBadTableName# tBad,
    #Arguments.sTBD_TableName# tTBD
    WHERE
    tBad.sName = tTBD.sName
    ;


    DELETE FROM
    tGood
    USING
    #Arguments.sGoodTableName# tGood,
    #Arguments.sTBD_TableName# tTBD
    WHERE
    tGood.sName = tTBD.sName
    ;
    </CFQUERY>

    <CFQUERY name="qTBD_NameStat" datasource="#Variables.sDSN#">
    SELECT IFNULL(@iNewRows, 0) AS iRowsInserted;
    </CFQUERY>
    </CFTRANSACTION>

    <CFRETURN qTBD_NameStat['iRowsInserted'][1]>
    </CFFUNCTION>

    </CFCOMPONENT>
  • Tuyen #
    Commented on 12-03-2006 at 4:58 AM
    Totally agree with you, Tony, about BULK INSERT. Wow, that was my first solution. Whenever dealing with MILLION records, BULK INSERT would be the best solution.... Or we can use PERL HASH TABLE (run on windows comnand) which is much faster than Coldfusion Struct (I guess)...

    Great Post!

Post Reply

Please refrain from posting large blocks of code as a comment. Use Pastebin or Gists instead. Text wrapped in asterisks (*) will be bold and text wrapped in underscores (_) will be italicized.

Leave this field empty