Regex, Active Patterns, and F#
At work, I am automating many of our business processes (on my own initiative). One challenge I ran into today was extracting dates from filenames. The files were generated by different business groups. Each group named its files consistently, but across groups I saw three different formats:
Foo_Bar_8-19-2016.csv
New Foo_Bar_20160819.csv or New Foo_bar_20160627_Q2-2016.csv (quarterly file)
Underscores_everywhere_2016_08_22.csv
Let's jump straight to the 10 lines of F# code needed to retrieve the dates from the files, and then I'll explain.
open System
open System.Text.RegularExpressions

let ExtractDate (fileName: string) =
    // Partial active pattern: on a match, return the captured groups as a list of strings.
    // Group 0 is the whole match, so List.tail drops it.
    let (|DateNumbers|_|) pattern input =
        let m = Regex.Match(input, pattern)
        if m.Success then [for g in m.Groups -> g.Value] |> List.tail |> Some else None

    match fileName with
    | DateNumbers @"(\d{4})_(\d{1,2})_(\d{1,2})" [year; month; day] -> DateTime(int year, int month, int day)
    | DateNumbers @"(\d{1,2})-(\d{1,2})-(\d{4})" [month; day; year] -> DateTime(int year, int month, int day)
    | DateNumbers @"(\d{4})(\d{2})(\d{2})" [year; month; day] -> DateTime(int year, int month, int day)
    | _ -> failwith "not a valid date"
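ExtractDate is a function that takes a string and returns a DateTime. The first thing I did was create a partial active pattern called DateNumbers that takes a regular expression string and an input string (here, fileName). When the regex matches, it returns the captured groups as a list of strings; List.tail drops the first element, because group 0 of a .NET Match is always the entire match. With the Groups property of the .NET Match (correct parenthesis placement is essential, since each pair of parentheses defines one capture group) and F#'s powerful pattern matching, I can easily pull out the year, month, and day and build a proper DateTime. If none of the three patterns matches, the function throws an exception.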
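To see the active pattern at work on the sample filenames, here is a minimal sketch. It lifts DateNumbers to the top level and uses a hypothetical variant, tryExtractDate (my name, not part of the code above), that returns an option instead of throwing. The filename "no_date_here.csv" is made up to exercise the fallback case.

```fsharp
open System
open System.Text.RegularExpressions

// Same partial active pattern as in ExtractDate, defined at the top level.
// Group 0 is the whole match, so List.tail keeps only the captures.
let (|DateNumbers|_|) (pattern: string) (input: string) =
    let m = Regex.Match(input, pattern)
    if m.Success then [ for g in m.Groups -> g.Value ] |> List.tail |> Some
    else None

// Hypothetical variant (an assumption, not the original function):
// returns None when no pattern matches instead of raising an exception.
let tryExtractDate (fileName: string) =
    match fileName with
    | DateNumbers @"(\d{4})_(\d{1,2})_(\d{1,2})" [ year; month; day ] ->
        Some(DateTime(int year, int month, int day))
    | DateNumbers @"(\d{1,2})-(\d{1,2})-(\d{4})" [ month; day; year ] ->
        Some(DateTime(int year, int month, int day))
    | DateNumbers @"(\d{4})(\d{2})(\d{2})" [ year; month; day ] ->
        Some(DateTime(int year, int month, int day))
    | _ -> None

// Run the sample filenames through it.
[ "Foo_Bar_8-19-2016.csv"
  "New Foo_Bar_20160819.csv"
  "New Foo_bar_20160627_Q2-2016.csv"
  "Underscores_everywhere_2016_08_22.csv"
  "no_date_here.csv" ] // made-up name with no date in it
|> List.iter (fun name ->
    match tryExtractDate name with
    | Some d -> printfn "%s -> %s" name (d.ToString("yyyy-MM-dd"))
    | None -> printfn "%s -> no date found" name)
```

Note that a string of digits that matches a pattern but is not a real date (a month of 13, say) would still make the DateTime constructor throw; this sketch does not guard against that.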
You can read and understand more about active patterns here