Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable multiple font use within each text #494

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

okomestudio
Copy link
Contributor

Sometimes the use of more than one font is desired, as in cases where multiple languages are mixed within a text and using a font with a fuller set of symbols as well as emojis is necessary. This PR implements this by allowing extra_font_paths to be defined, which uses regex to decide which font to apply for the matched (sub)text pattern.

Some comments on the code and other changes:

  • The LayoutItem class is introduced to make it easier to keep track of each item in WordCloud.layout_, so that unpacking becomes simpler. It also exposes methods for drawing a text and recoloring to reduce code redundancy.

  • Text rendering with font(s) is refactored into the FontInfo class. The basic idea is for a text that contains patterns that match regex objects defined in extra_font_paths, it keeps track of which font to apply at which text indices and switch fonts accordingly when actually drawing mask. If the text should be rendered with default (i.e., WordCloud.font_path) rendering behaves exactly the same way as before.

  • examples/emoji.py is renamed to examples/emoji_example.py to avoid a package import issue with an external package named emoji (which is only used in the new example script).

  • See examples/mix_fonts.py, as well as an example output examples/mix_fonts.png for usage.

Note that I have not fully taken care of Python 2.7/3.x compatibility (I'm on 3.7 personally), documentation, etc., as I would first like to know if the upstream benefits from this PR implementing this new feature, which involves modest refactoring. If a general agreement arises to eventually merge this PR, I would then work on cleaning things up to bring the PR to the standards.

@amueller
Copy link
Owner

amueller commented Sep 3, 2019

Honestly I'm not sure how useful this is. If your goal is to support multiple languages and emojis, you can just combine multiple fonts into a single font.
Your regex approach is clearly more flexible, but if your goal is supporting multiple languages, it might not be the easiest to use as you need to specify regex for different languages, which might be tricky in general?

@amueller
Copy link
Owner

amueller commented Sep 3, 2019

you can probably use https://github.com/fonttools/fonttools and pyftmerge to merge them on the fly even, right?

@amueller
Copy link
Owner

amueller commented Sep 3, 2019

There's also tools for merging multiple noto fonts: https://github.com/googlefonts/nototools/blob/master/nototools/merge_fonts.py

@changbowen
Copy link

changbowen commented Jan 22, 2020

I would vote for supporting multiple fonts... even only for displaying English words in different fonts
perhaps make font_path accept a folder with font files inside?

@amueller
Copy link
Owner

@changbowen and how would you assign the fonts to words? Randomly? That could work though it's a different feature than this.

@jnothman
Copy link

jnothman commented Jan 22, 2020 via email

@amueller
Copy link
Owner

@jnothman yeah though colors can be done afterwards while changing the font changes the geometry.

@okomestudio
Copy link
Contributor Author

I'd be happy to adjust the PR for whatever makes sense so that it can be merged eventually (though I'll be tied up for about a week and won't be able to start working again on this immediately).

A feature like @changbowen suggested would probably require a minor API adjustment, but if people want the feature, then this PR could be a good timing to discuss how it could be implemented without intrusive changes.

As for @amueller's initial suggestion on creating a new font (sorry I didn't respond earlier), I find it very tedious to merge fonts only for temporary use of rendering images. My initial motivation was to generate word clouds in multiple languages (like English and Japanese, as well as math/emoji symbols), and it's not very practical to repeat font merging every time I desire even a slight design change.

At the same time, I wanted to make sure that the effort would be warranted before pushing the multiple-font feature (as well as relatively modest refactoring involved). I'd rather contribute the code to the upstream library than keeping around a forked version for my personal use, but I do understand if there exists a desire to keep the library minimal and light for general use.

I needed to see the PR is aligned in the direction the library maintainers would wish to go.

@changbowen
Copy link

@amueller Yes a random font for each string would be enough for most people I think. Probably it's not something as complicated as this PR :)
@okomestudio Totally agree that merging fonts shouldn't be something required to generate a word cloud.

@okomestudio
Copy link
Contributor Author

The PR has been updated to propagate changes to the recently included SVG output rendering. Three new examples are added with their SVG outputs (which should probably removed at some point) to demonstrate how font_path accommodates multiple fonts as well as the substring match feature.

I can spend some more time to reflect feedback on the code and APIs, adding more tests and documentation, etc., if this PR were to be pushed at some point. I'm fairly neutral as to what and how to do with the feature itself; if the feature itself is desired but the PR is too much of change, I can adjust the code accordingly, but I need feedback.

Either way, I would like to know at some point if I should keep trying to improve the PR with the expectation of eventually merging to the upstream, or just keep the feature as a fork for my personal use. Thanks!

@amueller amueller mentioned this pull request Sep 4, 2020
@amueller
Copy link
Owner

amueller commented Sep 4, 2020

I think it might actually be nice to merge this, but the PR is quite hard to review right now because it moves around a lot of code. I'm not sure if it's possible to make the refactoring be a separate PR?
It would also be nice not to let the main file get too big.

@okomestudio
Copy link
Contributor Author

I can help with refactoring (in separate PRs) if you (or some other commiter(s) who are well-acquainted with the repo) give me some guideline as to how I should go about refactoring the code into a few components/submodules. Then I could comment and make suggestions based on what's included in this PR, if any.

The code is the way it is in the PR only because of the need to carefully keep track of text positioning when multiple fonts are mixed within a text, while sticking to DRY as much a possible. I certainly didn't want to move some many components around at once, and I'm obviously not the best person to make design decisions for the whole code base.

But I can help with coding, since I actually like cleaning up code and refactoring.

@okomestudio
Copy link
Contributor Author

If possible, I'd like to take care of this sooner than later now that I have some time to work on this. (With open-source efforts it's often difficult to expect sustained, consistent commitment due to job change, etc.)

So let me make refactoring suggestion here, so that I don't have to change so much in this single PR, which is potentially intrusive due to necessary internal change.

My suggestion would be to break up into two steps:

  1. Refactor image rendering code into the renderers submodule, so that the common pattern does not need to be duplicated for rendering bitmap images and SVG images from a WordCloud object. The submodule will basically consists of the Renderer classes, ImageRendere for bitmap and SVGRenderer for SVG, just as in this PR (but without the multi-font support to keep things simple). This should be almost pure refactoring of the current master, without any new features introduced.

  2. Implement the multi-font feature which is the main thing in this PR.

If there are smaller components that can be independently refactored more or less, the two steps above should be further subdivided to smaller PRs to make code review easier, but currently the major reason why this PR appears harder to review is that the two things above are done at once.

Once there is some sort of consensus on this, I'll make a PR for each item, starting from (1).

How does this sound?

@walkowl
Copy link

walkowl commented Apr 12, 2022

Such a shame it wasn't finished and merged...I vote for it anyway!

@AnkushChharri
Copy link

We needed multiple fonts support to resolve all glyphs or unicodes. Why you not do that? @amueller @okomestudio

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants