Current Report – Some merger and acquisition information may be disclosed in Form 8-K, for example, the signing of a merger agreement or, if an acquisition does not require shareholder approval, information about the company to be acquired.

Finally, for each agreement, we use LexNLP ([10]) to extract the stems in each sentence and count the number of “solicit” stems per agreement. This count is stored with the document's basic metadata, then converted into a pandas DataFrame for further analysis. We can then view the data in solicit_df either inside the Jupyter notebook or by exporting it. Figure 3, for example, shows a histogram of the stem frequency per agreement for the “solicit” concept. In this limited example, 7 of the 20 agreements (35%) contain no mention of “solicit,” “solicitation,” or any other word with that stem. The other 13 agreements contain at least one occurrence, and 8 contain at least two. While more sophisticated clause extraction and classification protocols can be developed using [10] and other open- and closed-source tools, we offer this simple example as an illustrative starting point.

The Securities and Exchange Commission's (SEC) EDGAR database contains regulatory filings from publicly traded U.S. companies.
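The per-agreement stem count described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: `crude_stem` is a hypothetical suffix-stripping stand-in for LexNLP's stemmer (it handles only a few English suffixes), sentences are split with a naive regex, and `count_stem`, `build_records`, and `summarize` are names invented for this example.

```python
import re
from collections import Counter

# Hypothetical stand-in for LexNLP stemming (NOT its API).
# Suffixes are tried longest-first so "solicitations" -> "solicit".
SUFFIXES = ("ations", "ation", "ing", "ed", "s")


def crude_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 chars."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def count_stem(text: str, target: str = "solicit") -> int:
    """Count tokens whose crude stem equals `target`, sentence by sentence."""
    total = 0
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        stems = Counter(crude_stem(tok) for tok in tokens)
        total += stems[target]  # Counter returns 0 for missing stems
    return total


def build_records(agreements: dict[str, str], target: str = "solicit") -> list[dict]:
    """One record per agreement: an identifier plus its stem count."""
    return [
        {"agreement_id": agreement_id, "solicit_count": count_stem(text, target)}
        for agreement_id, text in agreements.items()
    ]


def summarize(records: list[dict]) -> dict[str, int]:
    """Tally agreements with zero, at least one, and at least two occurrences."""
    counts = [r["solicit_count"] for r in records]
    return {
        "total": len(counts),
        "zero": sum(c == 0 for c in counts),
        "at_least_one": sum(c >= 1 for c in counts),
        "at_least_two": sum(c >= 2 for c in counts),
    }
```

Each record is a flat dictionary, the shape that `pandas.DataFrame(records)` accepts, so the list could be loaded into a frame like solicit_df and plotted as a histogram. `summarize` reports the same kinds of figures quoted above (agreements with zero, at least one, and at least two occurrences).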

Our first example builds on prior research by two of the authors of this paper [15], in which we analyzed the 10-K annual reports filed by more than 34,000 companies over more than 23 years. We searched these 10-K reports for references to U.S. laws and federal agencies and built a database of more than 4.5 million records covering hundreds of regulations and regulators. To replicate this analysis or conduct similar research, researchers need access to 10-K filings and their contents. The Securities and Exchange Commission (SEC) requires public companies, certain company insiders, and broker-dealers to file periodic financial statements and other information. Investors and financial professionals rely on SEC filings to make informed decisions about whether to invest in a company. SEC filings are available free of charge through EDGAR, the Commission's online database.

OpenEDGAR is an open-source Python framework designed to rapidly build research databases based on the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system of the U.S. Securities and Exchange Commission (SEC). OpenEDGAR is built on the Django application framework, supports distributed computation across one or more servers, and includes functionality to (i) retrieve and parse EDGAR index and filing data, (ii) build tables for key metadata such as form type and filer, (iii) retrieve, parse, and update CIK mappings and industry classifications, (iv) extract content and metadata from filing documents, and (v) search filing document contents. OpenEDGAR is designed for use in both academic research and industrial applications and is distributed under the MIT License at github.com/LexPredict/openedgar.
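As a small illustration of the index-parsing and metadata steps, the sketch below reads the pipe-delimited body of an EDGAR `master.idx` file (columns CIK, Company Name, Form Type, Date Filed, Filename) and keeps only 10-K filings. It is not OpenEDGAR's API: `IndexEntry`, `parse_master_index`, `filter_forms`, and `archive_url` are hypothetical names, the parser works on index text that has already been downloaded, and it assumes the column header is followed by a line of dashes, as in the index files published under https://www.sec.gov/Archives/edgar/full-index/.

```python
from dataclasses import dataclass

# Base URL for paths in the Filename column (e.g. "edgar/data/<CIK>/<accession>.txt").
ARCHIVES_BASE = "https://www.sec.gov/Archives/"


@dataclass(frozen=True)
class IndexEntry:
    """One filing row from a master.idx file (hypothetical container type)."""
    cik: str
    company_name: str
    form_type: str
    date_filed: str
    filename: str


def parse_master_index(text: str) -> list[IndexEntry]:
    """Parse filing rows from the text of an EDGAR master.idx file.

    Assumes a free-text header, then the column line
    'CIK|Company Name|Form Type|Date Filed|Filename', then a line of dashes,
    then one pipe-delimited filing per line.
    """
    entries: list[IndexEntry] = []
    in_body = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_body:
            # Rows begin after the separator line made only of dashes.
            if stripped and set(stripped) == {"-"}:
                in_body = True
            continue
        parts = stripped.split("|")
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        entries.append(IndexEntry(*(part.strip() for part in parts)))
    return entries


def filter_forms(entries: list[IndexEntry], form_type: str = "10-K") -> list[IndexEntry]:
    """Keep only rows whose Form Type matches exactly (so '10-K/A' is excluded)."""
    return [e for e in entries if e.form_type == form_type]


def archive_url(entry: IndexEntry) -> str:
    """Full URL of the filing's complete submission text file."""
    return ARCHIVES_BASE + entry.filename
```

Keeping the parser free of network code makes it easy to test against a saved index file; downloading, and deciding whether amendments such as 10-K/A belong in the research set, are left to the caller.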