Visualizing Africa-China Ag. Trade

Though China’s largest overall agricultural trading partners are outside of Africa, a substantial volume of agricultural trade flows from the continent to China. The UN commodities trading database (COMTRADE) tracks these flows over time and, for the most part, down to the specific commodity. There’s plenty of data to crunch here; there’s also plenty of data to visualize.

But how?

As we’re dealing with trade flows, my first thought was to use a flow map to highlight which African countries were exporting more agricultural goods to China than others. I used ESRI ArcMap’s built-in flow map tool to do so.

4blog_flows

While it gets the job done, the flow map is easily cluttered (even more so with labels, which I left off above). Talent with Illustrator or similar software would probably result in much cleaner flow maps. It’s a work in progress.

4blog_trade

I find this second version easier to read: one glance and I understand that Zimbabwe exports the most agricultural goods to China. However, this simplified version loses the easy connection with place that you get from using country borders as the background.

4blog_partners

This third version eschews trade ($) volume altogether and instead focuses on major commodity type. Straightforward.

I used a combo of the second and third images for a recent conference poster. They’re effective, but I still think the flow map could be refined into a more visually striking representation of Africa-China ag. trade. None of the versions shown above, though, even begins to touch on changes in trade over time. That’s the next hurdle.

Too Many Topics

My colleagues, Dr. Achberger and Junle Ma from CAU, and I are working on a topic modeling project that compares major themes from collections of English and Mandarin academic texts on the same subject. While we’re consolidating the majority of the work and results into an article for submission, there is a portion of the project delegated to blog posts. Namely: the topic modeling runs that ‘didn’t work.’

We’re using the MALLET software to train our topic models. MALLET, like most implementations of LDA topic modeling, makes the major assumption that you already know the ‘correct’ number of topics present in your collection of text.

Oftentimes this assumption is not true. Thus begins the trial and error of figuring out what the ‘right’ number of topics is. What range to test (i.e. small, zero to a couple dozen, or large, into the hundreds or thousands) depends on the size of your corpus and what level of detail you want from the topics. For example, are you looking for every possible topic across the texts or just the major-level topics that themselves may contain several subtopics?

Given our small corpus and interest in broad themes, we ran experimental topic models between 5 and 20 topics. It very quickly became clear that 15+ topics was several topics too many.

Here, I share the results of our topic model runs with 15 and 20 topics, highlight what a cohesive topic might look like, and show how our results did not always achieve that cohesion.

Results

EnglishFullText_15-and-20 and ChineseFullText_15-and-20

Let’s talk about what does work first. Here are two examples, one from each language, from the 15 topic run. The top 20 key words for each topic are listed. We can see that the key words for Topic #3 from the Mandarin run mostly work together to contextualize the state of African (agricultural) economies. From the English example, we can see how words like “migrants”, “farms”, “embassy”, and “vegetables” together form the Chinese Farmers (in Africa) topic.

exampleC-good and exampleE-good

I picked what I thought were the most cohesive topics for the examples above. From there, the key words become more and more muddled until any interpretable topic definition seems lost. Now we get to what doesn’t work.

Topic #8 in Chinese and #10 in English both feature a hodgepodge of words that, while two or three might relate to each other, do not work together as a whole to describe a topic. For example, from the Mandarin group, we can see how “field survey” might relate to “sustainability” but not as readily to “processing plant”. The English key words are even harder to interpret, as several are generic verbs and adjectives (e.g. “explained”, “providing”, “greater”).

exampleC-bad and exampleE-bad

Now this is not to say the above results are ‘garbage’; they just don’t work for our research purposes. As we’re looking for concrete, major themes from our collections of text, it seems we’d be better served running topic models with a smaller number of topics. And that’s exactly what we did.

MALLET Code

For those interested, here are the lines of code we fed MALLET in order to run the topic models:

bin\mallet train-topics --input chinesefull.mallet --num-topics 15 --output-state Cfull15.gz --output-topic-keys Cfull15_keys.txt --output-doc-topics Cfull15_composition.txt

The command takes the base .mallet file containing the full text of every article as input, sets the number of topics to 15, and writes three outputs: a compressed model state file, a topic key word file, and a document composition file. To test a different number of topics, change the value after --num-topics; the output file names can be changed as well.
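For anyone who would rather not retype the command for every topic count, here is a minimal Python sketch of the same idea. It only builds the command shown above for several values of --num-topics and prints them; it does not run MALLET. The helper names (mallet_train_command, parse_topic_keys), the 5/10/15/20 step, and the sample key words are made up for illustration. The parser also assumes that each line of the topic keys file holds a topic number, a weight, and the key words, separated by tabs, so check that against your own output file before relying on it.

```python
def mallet_train_command(input_file, prefix, num_topics, mallet_bin="bin\\mallet"):
    """Build the train-topics command from the post for one topic count."""
    stem = f"{prefix}{num_topics}"
    return [
        mallet_bin, "train-topics",
        "--input", input_file,
        "--num-topics", str(num_topics),
        "--output-state", f"{stem}.gz",
        "--output-topic-keys", f"{stem}_keys.txt",
        "--output-doc-topics", f"{stem}_composition.txt",
    ]


def parse_topic_keys(text):
    """Parse topic keys text into {topic number: [key words]}.

    ASSUMPTION: each line is 'topic<TAB>weight<TAB>word word ...'.
    """
    topics = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        topic_id, _weight, words = line.split("\t", 2)
        topics[int(topic_id)] = words.split()
    return topics


# One command per topic count; the step of 5 is illustrative, not what we ran.
commands = [
    mallet_train_command("chinesefull.mallet", "Cfull", k) for k in (5, 10, 15, 20)
]
for cmd in commands:
    print(" ".join(cmd))
    # To actually run MALLET: import subprocess; subprocess.run(cmd, check=True)

# Made-up two-topic sample, only to show the parsed structure.
sample = "0\t0.33\tfarm trade export\n1\t0.33\taid policy loan\n"
print(parse_topic_keys(sample))
```

Reading the key word files back into a dictionary like this makes it easier to line up the same topic number across runs when judging which topic counts stay cohesive.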

References

For further reading on LDA topic models: Blei, David M. “Probabilistic topic models.” Communications of the ACM 55.4 (2012): 77-84.

MALLET: McCallum, Andrew Kachites. “MALLET: A Machine Learning for Language Toolkit.” http://mallet.cs.umass.edu. 2002.

China’s Scholarships for African Students & FOCAC

AfricaScholarships

From 2003 until 2008, the Chinese Ministry of Education (MOE) reports on international students in China included a by-region breakdown for Chinese government scholarship data.

Starting in 2006, the Chinese government included at each Forum on China-Africa Cooperation (FOCAC) summit scholarship targets for bringing African students to study in China. In order to evaluate how China has upheld these pledges, Dr. Moore and I used the 2003-2008 scholarship data to estimate the number of Chinese scholarships to African students from 2009 onwards. We created a range of possible values based on the assumption of limited growth (linear) and best-fitting growth (exponential). Based on these estimates, China is most likely upholding the FOCAC scholarship pledges. 
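To make the range idea concrete, here is a rough Python sketch of how a linear and an exponential projection can be produced from six yearly values. It is not our actual workflow: the series below is a synthetic placeholder (steady 20% yearly growth), not the MOE figures, and fitting the exponential by running least squares on the logarithm of the counts is an assumption about method. The function names are made up for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def projection_range(years, counts, target_year):
    """Return (linear, exponential) estimates for target_year."""
    t = [y - years[0] for y in years]  # years since the first observation
    a, b = fit_line(t, counts)  # linear: count = a + b*t
    # Exponential: fit a line to log(count), then exponentiate the prediction.
    log_a, log_b = fit_line(t, [math.log(c) for c in counts])
    dt = target_year - years[0]
    return a + b * dt, math.exp(log_a + log_b * dt)


# PLACEHOLDER series: synthetic 20%-per-year growth, NOT the MOE data.
years = list(range(2003, 2009))
counts = [1000 * 1.2 ** (y - 2003) for y in years]

for target in (2009, 2012, 2015, 2018):
    linear, exponential = projection_range(years, counts, target)
    print(f"{target}: linear {linear:,.0f}  exponential {exponential:,.0f}")
```

With a growing series like this one, the linear estimate serves as the lower bound and the exponential estimate as the upper bound, and the gap between them widens the farther out the target year is.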

FOCACpledges

* We’re aware this upper boundary figure for the 2018 estimate is not plausible. The farther out the prediction, the more likely it is that the exponential curve no longer fits reality. The exponential curve, though the best fit to the provided 2003-2008 scholarship data, is probably just capturing the early portion of a logistic (S-shaped) curve, which is what we would expect for something that is tied to population growth. Thus the choice to include a range of scholarships given, using both linear and exponential future growth.

Still, as shown in the figure below, even using only the linear estimates keeps China’s provided scholarships in pace with FOCAC pledges.

Figure2_scholarships_v2

The only known comparison we have is that at FOCAC in 2006, China declared they would “increase the number of Chinese government scholarships to African students from the current 2,000 per year to 4,000 per year by 2009.” According to the MOE, 1,861 African students received Chinese government scholarships in 2006, so the 2,000 estimate quoted in FOCAC was rounded up slightly.

Updates: 

After The Conversation article was published, Twitter feeds and friends pointed us to a few more reports.

  1. The continued strength of China’s educational aid to Africa from the Institute of International and Comparative Education
  2. Guangzhou, that which African students love and hate from the Southern Metro Daily