Weekly Project Progress Report - STEP Methodology Source format is CSV exported from MS-Excel Sheet : WeeklyReport.xls (prepared by Bernard H. July 2009) ********************************************************************* ** FOREWORD: This is a tutorial sample. ** ** ** ** IMPORTANT: understanding Regular Expressions is a MUST before ** ** looking any further; your guessing attempts will be defeated! ** ** A one hour tutorial is available at the web site. ** ********************************************************************* (every line not starting in column 1 is a comment or annotation ignored by the Parser) (best displayed with fixed-spacing font) Next lines support integrated testing facilities (within StylusStudio(TM)) #ONE=WeeklyReport.csv; #TWO=--N/A--; REVISIONS: - This version and associated samples have been developed to illustrate the parsing of CSV messages and are bound to the terms of use published at www.reversexsl.com DESIGN NOTES: * This sample definition parses a raw CSV-formatted export from MS-Excel(TM) (e.g. WeeklyReport.csv). * The target XML document is, here, defined for tutorial purposes. It is produced directly by the Parsing step of the ReverseXSL Transformer. * See WeeklyReport.xls for the MS-Excel form and CSV Export macro. * The present version produces an XML document in the built-in ReverseXSL namespace, as bound to the free transformation software. * The worksheet contains a simplified Export MACRO (a real one shall handle exceptions) Sub Export() '*** save the worksheet: ActiveWorkbook.Save '*** enforce co-located files: ChDir ActiveWorkbook.Path '*** erase any previous export: On Error Resume Next Kill "WeeklyReport.csv" '*** export as CSV: Dim w As Worksheet Set w = Worksheets(1) w.SaveAs Filename:="WeeklyReport.csv", FileFormat:=xlCSV, CreateBackup:=True '*** process exported data: Dim Taskid Taskid = Shell("ProcessWeeklyReport.bat", 1) '*** return to the original Excel sheet: Application.Workbooks.Open ("WeeklyReport.xls") w.Activate w.Application.ActiveWindow.Close (False) End Sub * The present Excel & Macro are such that your regional/language settings should not affect number formats in the exported CSV (by Microsoft theory...) * The Marco invokes via a DOS batch the Transformation command: java -cp ReverseXSL.jar;. com.reverseXSL.Transform AUTO SELECT WeeklyReport.csv NONE 0 10 1>WeeklyReport.xml 2>WeeklyReport.log * Please refer to the tutorial section at www.reverseXSL.com for a step by step explanation of the DEF file ---------- CONDITIONS --------------------- *** no conditions are verified *** ---------- MESSAGE DEFINITION ------------- MSG "^Weekly P.* P.* Rep" STEPReport M 1 1 ACC 1 T F "WeeklyReport root element" CUT-ON-NL |D "^(.*)$" SKIP M 2 2 ACC 2 T F "Skip first 2 rows" ASMATCHED |GRP "^Emp No" References M 1 1 ACC 1 T F "Rows 3 to 6 providing reporting references" ||D "^Emp No,(.*),,,,,$" EmployeeNbr M 1 1 ACC 1 T F "Row 3 - Employee Number" NUMERIC ||D "^Name,(.*),,,,,$" Name M 1 1 ACC 1 T F "Row 4 - Name" REPEATED-"[A-Za-z \-]" ||D "^Year,(.*),,,,,$" Year M 1 1 ACC 1 T F "Row 5 - Year" DIGIT ||D "^Week,(.*),\(start.*" WeekNo M 1 1 ACC 1 T F "Row 6 - Weeek number" DIGIT *** Skip rows containing only commas |D "^(,*)$" SKIP M 1 1 ACC 1 T F "Skip row 7" ASMATCHED |GRP "^Project,WBS,PLANNED" Planned M 1 1 ACC 1 T F "Table 1 - Planned Tasks Report" ||D "^(Project,WBS,PLANNED.*)$" SKIP M 1 1 ACC 1 T F "Skip Table 1 heading row" ASMATCHED *** Unfortunately, simple segmentation like CUT-ON-"," will fail whenever a , exists in a data value! *** Below is a first way to solve the issue (explained in the associated tutorial) ||SEG "^(?!,+$)" Task M 1 5 ACC 10 R W "Table 1 rows - Planned Task" CUT '^([^,]*),([^,]*),("[^"]*"|[^,]*),([^,]*),([^,]*),([^,]*)' |||D "(.*)" Project M 1 1 ACC 1 T F "Table 1 row - Col 1 - Project short name" REPEATED-"[A-Za-z _]" [1..] |||D "(.*)" WBS M 1 1 ACC 1 T F "Table 1 row - Col 2 - Work Breakdown Structure code" NUMERIC [1..] *** The next Data element regex strips off optional double quotes around the value |||D '^"?(.*?)"?$' Description O 0 1 ACC 1 T F "Table 1 row - Col 3 - Task Description" ASMATCHED [1..] |||GRP "" ManDays M 1 1 ACC 1 T F "Table 1 row - Col 4&5 - time report" ||||D "(.*)" Planned M 1 1 ACC 1 T F "Table 1 row - Col 4 - Planned man days" NUMERIC [1..] ||||D "(.*)" Effective M 1 1 ACC 1 T F "Table 1 row - Col 5 - Actual man days" NUMERIC [1..] |||D "(.*)%" OverallCompletionPercent O 0 1 ACC 1 T F "Table 1 row - Col 6 - overall task completion estimate" NUMERIC [1..] |D "^(,*)$" SKIP O 0 10 ACC 10 T F "Skip empty rows in/next to table 1" ASMATCHED *** this second grouping is about UNplanned task reports and is indeed *** very similar to the group above |GRP "^Project,WBS,UNPLANNED" Unplanned M 1 1 ACC 1 T F "Table 2 - UNPlanned Tasks Report" ||D "^(Project,WBS,UNPLANNED.*)$" SKIP M 1 1 ACC 1 T F "Skip Table 2 heading row" ASMATCHED *** Below is an alternative CUTting method based on a repeated pattern like 'start_of_line_or_comma followed by anything_up_to_next_double_quote or followed by anything_up_to_next_comma' *** (explained in the associated tutorial) ||SEG "^(?!,+$)" Task M 1 5 ACC 10 T F "Table 2 rows - Planned Task" CUT '(?:^|,)(?:"([^"]*)"|([^,]*))' |||D "(.*)" Project O 0 1 ACC 1 T F "Table 2 row - Col 1 - Project short name" REPEATED-"[A-Za-z _]" [1..] |||D "(.*)" WBS O 0 1 ACC 1 T F "Table 2 row - Col 2 - Work Breakdown Structure code" NUMERIC [1..] *** Thanks to the alternative segment-cutting method, we do not need to strip out double quotes from data element values *** (already removed in segment cuts) |||D "(.*)" Description M 1 1 ACC 1 T F "Table 2 row - Col 3 - Task Description" ASMATCHED [1..] |||GRP "" ManDays M 1 1 ACC 1 T F "Table 2 row - Col 4&5 - time report" ||||D "(.*)" SKIP O 0 1 ACC 1 T F "Table 2 row - Col 4 - Planned man days = Not Applicable!" NUMERIC [1..] ||||D "(.*)" Effective M 1 1 ACC 1 T F "Table 2 row - Col 5 - Actual man days" NUMERIC [1..] |||D "(.*)%" OverallCompletionPercent O 0 1 ACC 1 T F "Table 2 row - Col 6 - overall task completion estimate" NUMERIC [1..] |||D "(.*)" SKIP O 0 5 ACC 5 R W "Table 2 row - trailing columns - ignored" ASMATCHED |D "^(,*)$|^(,,Total .*)$" SKIP O 0 10 ACC 10 T F "Skip empty rows in/next to table 2" ASMATCHED END