-
I used Mojo's (v. 0.7.0) dictionary data structure to count word frequencies in a file with 230+ million words, and did the same in Python. Surprisingly, Python was 7x faster than Mojo. I imagine I did something wrong, as I was expecting "the usability of Python with the performance of C". Does anyone have an idea? I'll link to my Mojo and Python scripts below. My Mojo script: My Python script: A video in which I test the above two scripts:
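For readers without access to the linked scripts, here is a minimal sketch of the kind of dictionary-based word count being discussed. It is not the original poster's script: the function name `count_words` and the sample sentence are made up for illustration, and splitting on a single space is an assumption borrowed from the Mojo snippet later in the thread.

```python
def count_words(text: str) -> dict[str, int]:
    """Count how often each space-separated token occurs in text."""
    freqs: dict[str, int] = {}
    # Splitting on a single space is an assumption, mirroring the Mojo
    # snippet further down the thread.
    for word in text.split(" "):
        # Membership test first, then update or insert.
        if word in freqs:
            freqs[word] += 1
        else:
            freqs[word] = 1
    return freqs


sample = "the cat saw the dog and the bird"  # hypothetical input
freqs = count_words(sample)
print(freqs["the"])  # 3
```

On a real corpus the text would come from reading the file; that step is left out so the sketch stays focused on the dictionary update.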
-
Thank you for sharing your insights. This is precisely what I was looking for: a detailed comparison of identical implementations in both languages. It's important not to make unsubstantiated claims about Mojo being faster than Rust without a thorough examination. Now, imagine if we could contrast that with Rust; it would make Mojo look significantly slower, with a performance difference of something like ~100x.
-
In your case, my guess is that memory allocation takes the most time, because `String` doesn't have small string optimisation. Notice that a large part of Mojo's standard library is still WIP. To be honest, you could make Mojo look even worse, since your Python implementation is not optimal. It could be shorter and faster by using …
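The suggestion above is cut off in this copy of the thread, so the tool the commenter had in mind is not known. Purely as an assumption, one standard-library way to make a Python word count shorter is `collections.Counter`; the helper name `count_words_counter` and the sample text below are hypothetical.

```python
from collections import Counter


def count_words_counter(text: str) -> Counter:
    """Tally space-separated tokens with collections.Counter."""
    # Counter consumes the whole iterable of tokens in a single call,
    # replacing the explicit if/else update loop.
    return Counter(text.split(" "))


counts = count_words_counter("to be or not to be")  # hypothetical input
print(counts["to"], counts["or"])  # 2 1
```

Whether this is actually faster on a 230+ million word file would need to be measured; the sketch only shows the shorter form.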
-
Hello @ekbrown, nice! ➡️ Here is how you could use references:

    # mojo nightly 2024.6.1912 (f72bd5ea)
    fn get_freqs(in_path: String) raises -> String:
        var txt: String = get_text(in_path)
        var wds: List[String] = txt.split(" ")
        var t0 = now()
        var freqs = Dict[String, UInt64]()
        for i in wds:
            if i[] in freqs:
                # Increment the stored count in place through a reference
                freqs.__get_ref(i[]) += 1
            else:
                freqs[i[]] = 1
        var t1 = now()
        # Elapsed time of the counting loop, in seconds
        print((t1-t0)/1_000_000_000)

🥳 This should at least double the performance! (1.98x here)

Mojo now supports auto dereference, which reduces the amount of empty … It is just a question of time before they turn into a new … (a lot of thought is devoted to designing these APIs)
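The Mojo snippet above times only the counting loop, not the file read or the split. For a like-for-like comparison, the Python side can be measured the same way. The following is a rough sketch under that assumption; `timed_count` and the tiny word list are hypothetical, and `time.perf_counter_ns` stands in for Mojo's `now()`.

```python
import time


def timed_count(words: list[str]) -> tuple[dict[str, int], float]:
    """Count words and return (frequencies, elapsed seconds of the loop only)."""
    t0 = time.perf_counter_ns()
    freqs: dict[str, int] = {}
    for w in words:
        # One lookup with a default instead of a separate membership test.
        freqs[w] = freqs.get(w, 0) + 1
    t1 = time.perf_counter_ns()
    # Convert nanoseconds to seconds, as the Mojo snippet does.
    return freqs, (t1 - t0) / 1_000_000_000


words = "a b a c b a".split(" ")  # hypothetical input, already split
freqs, seconds = timed_count(words)
print(freqs, seconds)
```

Keeping I/O and tokenisation outside the timed region means the two numbers compare the dictionary work alone.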
In your case, my guess is that memory allocation takes the most time, because `String` doesn't have small string optimisation. Notice that a large part of Mojo's standard library is still WIP. `Dict`, for one, was only introduced in 0.7.0 (the latest version), and it's more of a placeholder to make sure things mesh together than a fully optimised implementation. Introducing such types early helps get the API right, and they can replace the many hand-rolled implementations by the community. There are both minimal hand-rolled hash maps and hash functions that perform better than the standard library ones, which proves my point. Notice Mojo is a systems programming language like C++ and Rust. It…