Regex, Active Patterns, and F#

At work, I am automating a lot of our business processes (on my own initiative). One challenge I ran into today was extracting dates from filenames. The files were generated by different business groups. Each group named its files consistently, but across groups I saw three different date formats.

Foo_Bar_8-19-2016.csv
New Foo_Bar_20160819.csv or New Foo_bar_20160627_Q2-2016.csv (quarterly file)
Underscores_everywhere_2016_08_22.csv

Let's jump straight to the 10 lines of F# code needed to retrieve the dates from the files, and then I'll explain.

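open System
open System.Text.RegularExpressions

let ExtractDate (fileName: string) =
  // Partial active pattern: yields the captured groups on a match, None otherwise.
  let (|DateNumbers|_|) pattern input =
    let m = Regex.Match(input, pattern)
    // Groups.[0] is the whole match, so List.tail keeps only the captures.
    if m.Success then [for g in m.Groups -> g.Value] |> List.tail |> Some
    else None

  match fileName with
  // Underscores_everywhere_2016_08_22.csv
  | DateNumbers @"(\d{4})_(\d{1,2})_(\d{1,2})" [year; month; day] -> DateTime(int year, int month, int day)
  // Foo_Bar_8-19-2016.csv
  | DateNumbers @"(\d{1,2})-(\d{1,2})-(\d{4})" [month; day; year] -> DateTime(int year, int month, int day)
  // New Foo_Bar_20160819.csv and New Foo_bar_20160627_Q2-2016.csv
  | DateNumbers @"(\d{4})(\d{2})(\d{2})" [year; month; day] -> DateTime(int year, int month, int day)
  | _ -> failwithf "no date found in file name: %s" fileName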

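ExtractDate is a function that takes a string and returns a DateTime. The first thing we did was create a partial active pattern called DateNumbers that takes a regular expression and an input string (here, fileName). If the regex matches, the pattern returns the values of its capture groups as a list; otherwise it returns None and the match falls through to the next case. Proper parenthesis placement was paramount here: each parenthesized group in the regex captures one component of the date, and the Groups property of the .NET Match exposes them in order, after a first entry that holds the whole match (which List.tail drops). Combined with F#'s powerful pattern matching, this lets me bind the year, month, and day by name and convert my string into a proper DateTime.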
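
To make the role of Groups and List.tail more concrete, here is a small sketch you could paste into F# Interactive. It lifts DateNumbers to the top level so it can be tried on its own. The names allGroups and describe are my own for illustration and are not part of the code above.

```fsharp
open System.Text.RegularExpressions

// Same active pattern as above, defined at the top level for experimentation.
let (|DateNumbers|_|) pattern input =
    let m = Regex.Match(input, pattern)
    if m.Success then [ for g in m.Groups -> g.Value ] |> List.tail |> Some
    else None

// What the regex hands back before List.tail is applied:
// the first entry is the entire match, the rest are the parenthesized captures.
let m = Regex.Match("Foo_Bar_8-19-2016.csv", @"(\d{1,2})-(\d{1,2})-(\d{4})")
let allGroups = [ for g in m.Groups -> g.Value ]
printfn "%A" allGroups   // ["8-19-2016"; "8"; "19"; "2016"]

// Hypothetical helper: report what the pattern binds instead of building a DateTime.
let describe fileName =
    match fileName with
    | DateNumbers @"(\d{1,2})-(\d{1,2})-(\d{4})" [ month; day; year ] ->
        sprintf "month=%s day=%s year=%s" month day year
    | _ -> "no match"

printfn "%s" (describe "Foo_Bar_8-19-2016.csv")   // month=8 day=19 year=2016
printfn "%s" (describe "no_date_here.csv")        // no match
```

The list pattern [ month; day; year ] only matches a list of exactly three elements, so a regex with a different number of capture groups would fall through to the next case rather than bind the wrong values.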


You can read and understand more about active patterns here