Mr. MIME (Multipurpose Internet Mail Extensions)

mrmime is a library to parse and generate mail according to several RFCs:

  • RFC822: Standard For The Format of ARPA Internet Text Messages
  • RFC2822: Internet Message Format
  • RFC5321: Simple Mail Transfer Protocol
  • RFC5322: Internet Message Format
  • RFC2045: MIME Part One: Format of Internet Message Bodies
  • RFC2046: MIME Part Two: Media Types
  • RFC2047: MIME Part Three: Message-Header Extensions for Non-ASCII Text
  • RFC2049: MIME Part Five: Conformance Criteria and Examples
  • RFC6532: Internationalized Email Headers

mrmime is built with angstrom and parses mails on a best-effort basis. It was run against a large corpus of mails (2 billion) and is able to parse all of them; however, results can diverge from what you expect.

On the other hand, mrmime is able to generate a valid mail from an OCaml description. Generation follows some rules:

  • the produced stream emits one line at a time
  • we make a best effort to limit lines to 78 characters
  • we follow RFC6532 and emit UTF-8 mail
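To make the 78-character rule concrete, here is a minimal sketch of best-effort folding. It is not mrmime's implementation: `fold` and `emit` are hypothetical helpers that use only the standard library, break on spaces only, and collapse runs of spaces.

```ocaml
(* Hypothetical helper, not part of mrmime: split [line] at spaces so that
   each produced line stays within [limit] characters when possible.
   Continuation lines start with a space (folding whitespace). A single
   word longer than [limit] is kept whole: this is best effort only. *)
let fold ?(limit = 78) (line : string) : string list =
  let words = List.filter (fun w -> w <> "") (String.split_on_char ' ' line) in
  let rec go current acc = function
    | [] -> List.rev (if current = "" then acc else current :: acc)
    | w :: rest ->
      if current = "" then go w acc rest
      else if String.length current + 1 + String.length w <= limit then
        go (current ^ " " ^ w) acc rest
      else go (" " ^ w) (current :: acc) rest
  in
  go "" [] words

(* Emit one line at a time, each terminated by CRLF. *)
let emit (oc : out_channel) (lines : string list) : unit =
  List.iter (fun l -> output_string oc l; output_string oc "\r\n") lines
```

For example, `emit stdout (fold long_value)` would print a long value over several lines, each at most 78 characters wide as long as no single word exceeds that width.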

How to parse a mail?

There are different ways to parse a mail, and the right one depends on what you want. In some cases, you are only interested in the header part. In others, you probably want the bodies. We decided to separate these tasks into 2 different APIs, each fitting its own constraints.

For example, if you only want to extract the header, you probably care about memory consumption. This is the case if you implement an SMTP server where only the header is interesting.

A stream API is provided for this case. With it, we were able to implement a DKIM checker which needs only one pass to verify your mail.

On the other hand, if you want to extract the bodies of your mail, the parser provided is not a stream parser, because it needs to extract bodies from a multipart mail. An explanation of how to use it is given in this document.

Parse only the header part

For many purposes, we are mostly interested in parsing only the header part of a mail. In this case, the Hd sub-module should be what you want.

A complex example of Hd is available in the ocaml-dkim project, which extracts the DKIM signature from the header.

let dkim_signature = Mrmime.Field_name.v "DKIM-Signature"

let extract_dkim () =
  let open Mrmime in
  (* Scratch buffer for reads from stdin. *)
  let tmp = Bytes.create 0x1000 in
  (* Internal buffer handed to the header decoder. *)
  let buffer = Bigstringaf.create 0x1000 in
  let decoder = Hd.decoder buffer in
  let rec decode () = match Hd.decode decoder with
    | `Field field ->
      ( match Location.prj field with
      | Field.Field (field_name, Unstructured, v)
          when Field_name.equal field_name dkim_signature ->
        (* Print the DKIM signature and stop decoding. *)
        Fmt.pr "%a: %a\n%!" Field_name.pp dkim_signature Unstructured.pp v
      | _ -> decode () )
    | `Malformed err -> failwith err
    (* End of the header: no DKIM signature was found. *)
    | `End _rest -> ()
    | `Await ->
      (* The decoder needs more input: read a chunk and feed it. *)
      let len = input stdin tmp 0 (Bytes.length tmp) in
      ( match Hd.src decoder (Bytes.unsafe_to_string tmp) 0 len with
        | Ok () -> decode ()
        | Error (`Msg err) -> failwith err ) in
  decode ()

This little snippet parses a mail read from stdin, which must use CRLF as its end-of-line (so you should convert your mail to this newline convention first). When it reaches a DKIM field, it prints the parsed value (in our case, an unstructured value). Any other field falls into the catch-all case; a DKIM signature can also land there when its value could not be parsed as an unstructured value.
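If your mail uses bare LF line endings, it has to be converted before it is given to the decoder. A minimal sketch of that conversion, assuming the whole mail fits in a string (`to_crlf` is a hypothetical helper, not part of mrmime):

```ocaml
(* Hypothetical helper, not part of mrmime: rewrite every bare LF as CRLF
   and leave existing CRLF sequences untouched. *)
let to_crlf (s : string) : string =
  let buf = Buffer.create (String.length s + 16) in
  String.iteri
    (fun i c ->
      if c = '\n' && (i = 0 || s.[i - 1] <> '\r')
      then Buffer.add_string buf "\r\n"
      else Buffer.add_char buf c)
    s;
  Buffer.contents buf
```

Note that this works on a complete string. When converting chunk by chunk, a CR at the very end of one chunk followed by an LF at the start of the next needs extra care, which this sketch does not handle.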

Parse entirely a mail

Of course, the initial goal of mrmime is to parse an entire mail. In this case, you should use the Mail sub-module, which provides an angstrom parser.

Bodies can be heavy, and if you want to store them yourself, we provide an API which expects consumers to consume the bodies (and store them, for example, in UNIX files).

A complex example is available in ptt, which extracts bodies and saves them into UNIX files. For this we use:

val stream : emitters:(Header.t -> (string option -> unit) * 'id) -> (Header.t * 'id t) Angstrom.t

This parser calls emitters for each part of your mail. It decodes each part properly (according to its Content-Transfer-Encoding) and passes the decoded input to your consumer.
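As an illustration, an emitters function that stores each part in its own file could look like the sketch below. It only relies on the shape of the signature above. The header argument is left polymorphic so the sketch compiles without mrmime, the file name becomes the 'id, and we assume that None marks the end of a part.

```ocaml
(* Hypothetical emitters function with the shape
   [header -> (string option -> unit) * 'id].
   Each part gets a fresh temporary file; its name is returned as the id. *)
let emitters _header =
  let filename = Filename.temp_file "mrmime-part-" ".bin" in
  let oc = open_out_bin filename in
  let consumer = function
    | Some chunk -> output_string oc chunk (* a decoded chunk of the part *)
    | None -> close_out oc (* assumed: end of the part *)
  in
  (consumer, filename)
```

A real consumer could inspect the header it receives (for example the Content-Type) to choose a file name or to skip parts it is not interested in.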

How to emit a mail?

mrmime is able to generate a mail from an OCaml description of it. You have several ways to craft information such as an address or the Content-Type field of a specific part.

Many sub-modules of mrmime provide a way to construct a piece of information, such as the subject of your mail or its recipients. For example, the sub-module Mailbox provides an easy way to construct an address:

let romain_calascibetta =
  let open Mrmime.Mailbox in
  Local.[ w "romain"; w "calascibetta" ] @ Domain.(domain, [ a "x25519"; a "net" ])

The documentation is written to help you construct many of these values. Of course, Header is the module to construct a header:

let header =
  let open Mrmime in
  Field.[ Field (Field_name.subject, Unstructured,
                 Unstructured.Craft.(compile [ v "Simple"; sp 1; v "Email" ]))
        ; Field (Field_name.v "To", Addresses, [ `Mailbox romain_calascibetta ])
        ; Field (Field_name.date, Date, (Date.of_ptime ~zone:GMT (Ptime_clock.now ()))) ]
  |> Header.of_list

Then, Header provides a to_stream function which emits your header line by line (with the CRLF newline convention), mostly so that it can be plugged into an SMTP pipe.

Finally, for a multipart mail, the Mt sub-module is the most interesting one: it makes a part from a stream (from a file or from standard input) associated with Content fields (like Content-Transfer-Encoding). mrmime takes care of how to encode your stream (base64 or quoted-printable).

A complex example of how to use the Mt module is available in the facteur project, which is able to send a multipart mail.

Encoding

A real effort was made to treat all inputs and outputs of mrmime as UTF-8 strings. This is achieved with some underlying packages:

  • rosetta as a universal unifier to Unicode
  • uuuu as a mapper from ISO-8859 to Unicode
  • coin as a mapper from KOI8-{U,R} to Unicode
  • yuscii as a mapper from UTF-7 to Unicode

The SMTP protocol constrains bodies to use only 7 bits per byte (a historical limitation). For this reason, encodings such as quoted-printable or base64 are used to encode bodies and respect this limitation. mrmime uses:

  • pecu as a quoted-printable stream encoder/decoder
  • base64 (base64.rfc2045 sub-package) as a stream encoder/decoder
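To illustrate the 7-bit constraint, here is a simplified sketch of quoted-printable style escaping. It is not how pecu works: `is_7bit` and `qp_escape` are hypothetical helpers that handle a single line of text and ignore soft line breaks and the trailing whitespace rules of RFC2045.

```ocaml
(* True when every byte of [s] fits in 7 bits. *)
let is_7bit (s : string) : bool =
  let rec go i =
    i >= String.length s || (Char.code s.[i] < 0x80 && go (i + 1))
  in
  go 0

(* Hypothetical, simplified escaping for one line of text: printable ASCII
   (except '='), space and tab are kept; any other byte becomes "=XX". *)
let qp_escape (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      let code = Char.code c in
      if (code >= 33 && code <= 126 && c <> '=') || c = ' ' || c = '\t'
      then Buffer.add_char buf c
      else Buffer.add_string buf (Printf.sprintf "=%02X" code))
    s;
  Buffer.contents buf
```

The UTF-8 string "café" contains two bytes above 0x7F, so it is not 7-bit clean; once escaped, only ASCII bytes remain and the result can travel through a 7-bit channel.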

Status of the project

mrmime is really experimental. Because it tries to cover many concerns (encoding, multipart), the API is likely to change often. We reached a first version because we are able to send a well-formed multipart mail with it; however, it is possible to hit weird cases where mrmime emits an invalid mail.

The same warning applies to the parser: the mail format is not really respected by implementations in many cases, and the parser may fail on some mails for an obscure reason.

Of course, feedback is welcome to improve it. So you can use it, but you should not expect industrial quality, at least not yet. So play with it, and enjoy your hacking!

mrmime has received funding from the Next Generation Internet Initiative (NGI) within the framework of the DAPSI Project.