Using Natural Language Processing to Extract Data about CO2 Reduction Reaction from the Scientific Literature
Over the last 70 years, CO2 emissions in the Earth’s atmosphere have surged drastically, leading to a
1 °C increase in the Earth’s temperature. Climate models project this rise to reach 2.1 to 2.5 °C by 2100.
One way to mitigate this issue is to convert carbon dioxide into compounds that can be used as
chemical feedstocks and fuels, creating a closed CO2 cycle. However, this process has to be mediated
by a catalyst that must be stable, selective, active, and easily accessible to be economically viable. It
is therefore understandable and desirable that the CO2 reduction reaction (CO2RR) has been
addressed by many research groups, with more than 16,000 articles already published. However, the
sheer volume of this literature hampers a manual and comprehensive review of all the structures and methods used.
Therefore, we employed natural language processing (NLP) to analyze the data already published on
this topic in the scientific literature. We devised in-house code to process sentences and separate them
according to the sections from which they were extracted. With these samples, we created a model to classify
new or unidentified sentences into “abstract”, “introduction”, “methodology”, “results and discussion”, and
“conclusions”. Next, we used the cleaned text to generate word embedding models and assessed their
quality based on their ability to cluster common terms in the CO2RR literature. Finally, we used regular
expressions to extract information about material composition, electrolytes, and some important metrics
reported in our corpus, namely the faradaic efficiency (FE) and the applied potential. We found that
Ni is among the elements most frequently used in CO2RR catalysts, ranking in the top three along with Cu and
Ag, and we identified a Cu-based material with an astonishingly low FE for methane. We plan to expand
the collected data so that we can build an exhaustive database that may be used to review this field
and possibly steer future research by providing insights into materials and approaches that seem
promising.
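
The regular-expression step described above can be illustrated with a minimal Python sketch. The patterns, the short element list, and the function names `extract_metrics` and `count_elements` are assumptions made for illustration only; the expressions actually used in this work are not given in the text, and real articles would need far more tolerant patterns (for example, for values separated from the metric name by a formula such as "CH4").

```python
import re
from collections import Counter

# Hypothetical pattern: "faradaic efficiency" or "FE", a short gap without
# digits, then a percentage. Case-sensitive on "FE" so iron ("Fe") is skipped.
FE_PATTERN = re.compile(
    r"\b(?:[Ff]aradaic\s+efficiency|FE)\b[^\d%]{0,40}?(\d+(?:\.\d+)?)\s*%"
)

# Hypothetical pattern: a signed number in volts followed by a reference electrode.
POTENTIAL_PATTERN = re.compile(
    r"([-−–]?\s?\d+(?:\.\d+)?)\s*V\s+(?:vs\.?|versus)\s+(RHE|SHE|SCE|Ag/AgCl)"
)

# Illustrative subset of element symbols, matched as standalone tokens only.
ELEMENT_PATTERN = re.compile(r"\b(Cu|Ag|Ni|Au|Zn|Sn)\b")


def _to_float(raw: str) -> float:
    """Normalize typographic minus signs and stray spaces before parsing."""
    cleaned = "".join(raw.split()).replace("−", "-").replace("–", "-")
    return float(cleaned)


def extract_metrics(sentence: str) -> dict:
    """Return FE values (in %) and (potential, reference) pairs found in a sentence."""
    fe_values = [float(m.group(1)) for m in FE_PATTERN.finditer(sentence)]
    potentials = [
        (_to_float(m.group(1)), m.group(2))
        for m in POTENTIAL_PATTERN.finditer(sentence)
    ]
    return {"fe_percent": fe_values, "potentials": potentials}


def count_elements(sentences: list[str]) -> Counter:
    """Count standalone element-symbol mentions across sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ELEMENT_PATTERN.findall(sentence))
    return counts


if __name__ == "__main__":
    example = (
        "The Cu nanowires reached a faradaic efficiency of 55.2% for CH4 "
        "at -1.1 V vs. RHE in 0.1 M KHCO3."
    )
    print(extract_metrics(example))
    print(count_elements([example, "Ni single atoms on carbon", "Ag and Cu foams"]))
```

Ranking the resulting counter with `most_common(3)` would give a top-three element list of the kind reported above, although a count of symbol mentions is only a rough proxy for how often an element is actually used as a catalyst.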