
• [Python Crawler Course Project] Scraping notable Chinese Super League player data from Dongqiudi, plus data visualization

一、Topic background:

As China's top-flight football competition, the Chinese Super League (CSL) draws wide attention, and its player data carries rich information about technique, performance, and match strategy. As data-science techniques mature, it becomes increasingly important for clubs and coaches to analyse and mine this data in order to build more effective tactics and management strategies. The main points behind the topic:

1. Data-driven football management: CSL clubs and coaches need in-depth analysis of player data to understand performance, evaluate tactics, and predict match results, and so manage and compete more effectively.
2. Decision support and intelligent analysis: big-data analytics, machine learning, and statistical modelling can give decision-makers intelligent tools for more accurate tactical and squad-management decisions.
3. Cross-disciplinary skills combining data science with football: through this project, students learn to apply data-science techniques to concrete problems in the sport.
4. Technological innovation in football: analysing player data can reveal opportunities for innovation that give a team a competitive edge.

In short, the topic centres on analysing CSL player data, emphasising the role of data science in improving football management, tactical decision-making, and innovation.

二、Implementation design:

1. Data acquisition:
   - Source: CSL player data from official websites, APIs, or other data providers.
   - Types: basic player information, match statistics (goals, assists, pass-completion rate, etc.), position data.
   - Format: JSON, CSV, or another common data format.

2. Data processing and cleaning:
   - Cleaning: handle missing values, duplicates, and outliers.
   - Integration: merge multiple sources into a consistent format.
   - Feature engineering: build new features and convert data types to support the later analysis.

3. Data analysis:
   - Descriptive statistics: goals, assists, pass-completion rate, and other basic indicators.
   - Comparative analysis: across teams, positions, or seasons.
   - Predictive analysis: machine-learning or statistical models to forecast player performance or match results.

4. Presentation and reporting:
   - Interpretation and conclusions: explain the analysis results and draw insights.
   - Analysis report: students submit a written report or give a presentation of their findings and recommendations. (A minimal cleaning-and-statistics sketch follows this list.)
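To make the cleaning and statistics steps concrete, here is a minimal pandas sketch. It is an assumption of this writeup, not part of the original project: it presumes the 足球运动员.csv file produced by the crawler in section 四, with the column order written there and a single header row (which it replaces with English names).

import pandas as pd

# Column order matches the writerow() call in the crawler (section 四).
cols = ["name", "club", "country", "height", "position", "age", "weight",
        "number", "birth", "foot", "years", "appearances", "goals", "assists",
        "yellow_cards", "red_cards", "rating", "speed", "power", "defence",
        "dribbling", "passing", "shooting"]
df = pd.read_csv("足球运动员.csv", names=cols, header=0)

# Cleaning: drop duplicate players and rows where age/height are missing (stored as 0).
df = df.drop_duplicates(subset="name")
df = df[(df["age"] > 0) & (df["height"] > 0)]

# Descriptive statistics and a simple per-club comparison.
print(df[["age", "height", "goals", "assists"]].describe())
print(df.groupby("club")[["goals", "assists"]].sum())

三、Structural analysis of the target pages:

(Screenshots of the target page structure omitted.)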

四、Web crawler design:

Import the required libraries, and verify that a candidate page actually contains a player profile:

import urllib.request
import urllib.error
import csv
from bs4 import BeautifulSoup
from lxml import etree

# Check whether a player page exists for this numeric ID
def checkHtml(num):
    url = "https://www.dongqiudi.com/player/%s.html" % num
    html = askURL(url)
    soup = BeautifulSoup(html, "html.parser")
    name = soup.find('p', attrs={'class': 'china-name'})
    if name is None:
        print('Invalid page')
        return None
    else:
        return soup

# Extract the data and store it
def getData(soup):

Extract the player's detailed information:

    # Player name
    name = soup.find('p', attrs={'class': 'china-name'})
    con = etree.HTML(str(name))
    namestr = con.xpath("//p/text()")
    name = namestr[0]
    print(name)

    # Collect the <li> items of the basic-details block
    detail_list = []
    detail_info_div = soup.find('div', attrs={'class': 'detail-info'})
    detail_info_ul = detail_info_div.find_all('li')
    for each in detail_info_ul:
        detail_list.append(each.text.strip())

    # Club
    club = str(detail_list[0]).replace('俱乐部:', '')
    # Nationality
    country = str(detail_list[1]).replace('国 籍:', '')
    # Height (cm)
    height = 0
    heightstr = str(detail_list[2]).replace('CM', '').replace('身 高:', '')
    if heightstr != '':
        height = int(heightstr)
    # Position
    location = str(detail_list[3]).replace('位 置:', '')
    # Age
    age = 0
    agestr = str(detail_list[4]).replace('年 龄:', '').replace('岁', '')
    if agestr != '':
        age = int(agestr)
    # Weight (kg)
    weight = 0
    weightstr = str(detail_list[5]).replace('体 重:', '').replace('KG', '')
    if weightstr != '':
        weight = int(weightstr)
    # Shirt number
    number = 0
    numberstr = str(detail_list[6]).replace('号 码:', '').replace('号', '')
    if numberstr != '':
        number = int(numberstr)
    # Birthday
    birth = str(detail_list[7]).replace('生 日:', '')
    # Preferred foot
    foot = str(detail_list[8]).replace('惯用脚:', '')

    # Club career table: each season row contributes 9 <span> cells,
    # so each statistic sits at a fixed offset with stride 9
    total_con_wrap_div = soup.find('div', attrs={'class': 'total-con-wrap'})
    total_con_wrap_td = str(total_con_wrap_div.find_all('p', attrs={'class': 'td'}))
    con3 = etree.HTML(total_con_wrap_td)
    detail_info_list = con3.xpath("//p//span/text()")
    detail_info_list_years = con3.xpath("//p")

    # Seasons in the first team
    years = len(detail_info_list_years) - 1

    # Total appearances (offset 2; '~' marks missing data)
    total_session = 0
    for i in range(2, len(detail_info_list), 9):
        if detail_info_list[i] == '~':
            detail_info_list[i] = 0
        total_session = total_session + int(detail_info_list[i])

    # Total goals (offset 4)
    total_goals = 0
    for i in range(4, len(detail_info_list), 9):
        if detail_info_list[i] == '~':
            detail_info_list[i] = 0
        total_goals = total_goals + int(detail_info_list[i])

    # Total assists (offset 5)
    total_assist = 0
    for i in range(5, len(detail_info_list), 9):
        if detail_info_list[i] == '~':
            detail_info_list[i] = 0
        total_assist = total_assist + int(detail_info_list[i])

    # Total yellow cards (offset 6)
    total_yellow_card = 0
    for i in range(6, len(detail_info_list), 9):
        if detail_info_list[i] == '~':
            detail_info_list[i] = 0
        total_yellow_card = total_yellow_card + int(detail_info_list[i])

    # Total red cards (offset 7)
    total_red_card = 0
    for i in range(7, len(detail_info_list), 9):
        if detail_info_list[i] == '~':
            detail_info_list[i] = 0
        total_red_card = total_red_card + int(detail_info_list[i])

    # Overall rating and attribute scores (missing for some players, so default to 0)
    average = 0
    speed = 0
    power = 0
    guard = 0
    dribbling = 0
    passing = 0
    shooting = 0
    grade_average = soup.find('p', attrs={'class': 'average'})
    if grade_average is not None:
        con4 = etree.HTML(str(grade_average))
        average = int(con4.xpath("//p//b/text()")[0])
    # Detailed attribute scores
    grade_detail_div = soup.find('div', attrs={'class': 'box_chart'})
    if grade_detail_div is not None:
        con5 = etree.HTML(str(grade_detail_div))
        grade_detail = con5.xpath("//div//span/text()")
        speed = int(grade_detail[0])      # speed
        power = int(grade_detail[1])      # power
        guard = int(grade_detail[2])      # defence
        dribbling = int(grade_detail[3])  # dribbling
        passing = int(grade_detail[4])    # passing
        shooting = int(grade_detail[5])   # shooting

Write the extracted information to the file:

    csv.writer(f).writerow([name, club, country, height, location, age, weight, number, birth, foot,
                            years, total_session, total_goals, total_assist, total_yellow_card,
                            total_red_card, average, speed, power, guard, dribbling, passing, shooting])
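A positional writerow with 23 fields is easy to get out of order. As an optional alternative (not used in this project), csv.DictWriter names every column explicitly, which makes the schema self-documenting:

import csv

fieldnames = ["name", "club", "country", "height", "position", "age", "weight",
              "number", "birth", "foot", "years", "appearances", "goals", "assists",
              "yellow_cards", "red_cards", "rating", "speed", "power", "defence",
              "dribbling", "passing", "shooting"]

with open("足球运动员.csv", "w", encoding="utf-8", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    # inside getData(): writer.writerow({"name": name, "club": club, "country": country, ...})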

Fetch the page content for a given URL:

def askURL(url):
    # Spoof a browser User-Agent so the dongqiudi server treats this as an
    # ordinary browser request (it also tells the server what content we accept)
    head = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36 Edg/96.0.1054.29"
    }
    request = urllib.request.Request(url, headers=head)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode("utf-8")
    except urllib.error.URLError as e:
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    return html
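For reference, the same fetch can be written with the third-party requests library, which handles timeouts and HTTP errors more directly. This is an optional sketch, not part of the original project:

import requests

def askURL_requests(url):
    # Hypothetical alternative to askURL() above, using requests
    head = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    try:
        response = requests.get(url, headers=head, timeout=10)
        response.raise_for_status()  # raise on 4xx/5xx status codes
        response.encoding = "utf-8"
        return response.text
    except requests.RequestException as e:
        print(e)
        return ""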

Write the data to a CSV file: open the file, then loop over a range of player IDs:

f = open("足球运动员.csv", mode="a", encoding='utf-8', newline='')
# Header row (commented out after the first run, since the file is opened in append mode):
# csv.writer(f).writerow(["姓名","俱乐部","国籍","身高(CM)","位置","年龄(岁)","体重(KG)","号码","生日","惯用脚","职业生涯(年)",
#                         "累计出场","累计进球","累计助攻","累计黄牌","累计红牌","综合能力","速度","力量","防守","盘带","传球","射门"])
for num in range(50184113, 50184150):
    print(num)
    soup = checkHtml(num)
    if soup is not None:
        getData(soup)
f.close()
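Two practical refinements to this loop, sketched here as suggestions rather than part of the original: pause between requests to be polite to the server, and catch parsing errors so one malformed page does not abort the whole run.

import time

for num in range(50184113, 50184150):
    soup = checkHtml(num)
    if soup is not None:
        try:
            getData(soup)
        except (AttributeError, IndexError, ValueError) as e:
            # a page missing the career table or rating block would otherwise crash the run
            print(num, 'parse failed:', e)
    time.sleep(1)  # roughly one request per second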

Results screenshot (image omitted).

With the data collected, I first draw a scatter plot of the CSL players' ages:

import csv
import matplotlib
import matplotlib.pyplot as plt

# Use a Chinese font so the labels render correctly
matplotlib.rcParams['font.sans-serif'] = ['SimHei']  # SimHei for Chinese text
matplotlib.rcParams['axes.unicode_minus'] = False    # render the minus sign properly

# Read the CSV file and extract the age column
ages = []
with open('足球运动员.csv', mode='r', encoding='utf-8') as csv_file:
    csv_reader = csv.reader(csv_file)
    next(csv_reader)  # skip the header row
    for row in csv_reader:
        age = int(row[5])  # age is the 6th column (index 5)
        ages.append(age)

# Scatter plot
plt.figure(figsize=(8, 6))
plt.scatter(range(1, len(ages) + 1), ages, color='blue', alpha=0.5)
plt.title('年龄散点图')
plt.xlabel('球员编号')
plt.ylabel('年龄')
plt.grid(True)
plt.show()
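The scatter plot shows the spread; a couple of summary numbers complement it. This small addition reuses the ages list built by the script above:

from statistics import mean, median

print('players:', len(ages))
print('mean age:', round(mean(ages), 1), ' median age:', median(ages))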

Chinese football famously leans on naturalised players, so a single squad often includes players of several nationalities. Having seen how the ages are distributed, I also want to see the distribution of nationalities:

import csv
import matplotlib
import matplotlib.pyplot as plt

# Use a Chinese font so the labels render correctly
matplotlib.rcParams['font.sans-serif'] = ['SimHei']
matplotlib.rcParams['axes.unicode_minus'] = False

# Read the CSV file and count the players of each nationality
nationalities = {}
with open('足球运动员.csv', mode='r', encoding='utf-8') as csv_file:
    csv_reader = csv.reader(csv_file)
    next(csv_reader)  # skip the header row
    for row in csv_reader:
        nationality = row[2]  # nationality is the 3rd column (index 2)
        if nationality in nationalities:
            nationalities[nationality] += 1
        else:
            nationalities[nationality] = 1

# Nationalities and their player counts
countries = list(nationalities.keys())
player_counts = list(nationalities.values())

# Scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(countries, player_counts, color='red', alpha=0.7)
plt.title('球员各国籍散点图')
plt.xlabel('国籍')
plt.ylabel('球员数量')
plt.xticks(rotation=45)  # rotate the x labels to avoid overlap
plt.grid(True)
plt.tight_layout()
plt.show()
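A scatter plot over a categorical axis works, but for counts per category a bar chart is usually easier to read. As an optional variation, only the plotting call changes (same countries and player_counts as above):

# Same data as above, drawn as bars instead of points
plt.figure(figsize=(10, 6))
plt.bar(countries, player_counts, color='red', alpha=0.7)
plt.title('球员国籍柱状图')
plt.xlabel('国籍')
plt.ylabel('球员数量')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()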

Next, a bar chart of the players' heights:

import csv
import matplotlib
import matplotlib.pyplot as plt

# Chinese font setup (needed here too, for the Chinese labels)
matplotlib.rcParams['font.sans-serif'] = ['SimHei']
matplotlib.rcParams['axes.unicode_minus'] = False

# Read the CSV file and count players in each height range
height_ranges = {'150-160': 0, '161-170': 0, '171-180': 0, '181-190': 0, '191-200': 0, '200以上': 0}
with open('足球运动员.csv', mode='r', encoding='utf-8') as csv_file:
    csv_reader = csv.reader(csv_file)
    next(csv_reader)  # skip the header row
    for row in csv_reader:
        height = int(row[3])  # height is the 4th column (index 3)
        if height < 150:
            continue  # skip players whose height is missing (stored as 0)
        if 150 <= height <= 160:
            height_ranges['150-160'] += 1
        elif 161 <= height <= 170:
            height_ranges['161-170'] += 1
        elif 171 <= height <= 180:
            height_ranges['171-180'] += 1
        elif 181 <= height <= 190:
            height_ranges['181-190'] += 1
        elif 191 <= height <= 200:
            height_ranges['191-200'] += 1
        else:
            height_ranges['200以上'] += 1

# Height ranges and their player counts
height_labels = list(height_ranges.keys())
player_counts = list(height_ranges.values())

# Bar chart
plt.figure(figsize=(10, 6))
plt.bar(height_labels, player_counts, color='blue')
plt.title('球员身高柱状图')
plt.xlabel('身高范围')
plt.ylabel('球员数量')
plt.xticks(rotation=45)  # rotate the x labels to avoid overlap
plt.tight_layout()
plt.show()

五、Complete code:

import urllib.request
import urllib.error
import csv
from bs4 import BeautifulSoup
from lxml import etree

# Check whether a player page exists for this numeric ID
def checkHtml(num):
    url = "https://www.dongqiudi.com/player/%s.html" % num
    html = askURL(url)
    soup = BeautifulSoup(html, "html.parser")
    name = soup.find('p', attrs={'class': 'china-name'})
    if name is None:
        print('Invalid page')
        return None
    else:
        return soup

# Extract the data and store it
def getData(soup):
    # Player name
    name = soup.find('p', attrs={'class': 'china-name'})
    con = etree.HTML(str(name))
    namestr = con.xpath("//p/text()")
    name = namestr[0]
    print(name)

    # Collect the <li> items of the basic-details block
    detail_list = []
    detail_info_div = soup.find('div', attrs={'class': 'detail-info'})
    detail_info_ul = detail_info_div.find_all('li')
    for each in detail_info_ul:
        detail_list.append(each.text.strip())

    # Club
    club = str(detail_list[0]).replace('俱乐部:', '')
    # Nationality
    country = str(detail_list[1]).replace('国 籍:', '')
    # Height (cm)
    height = 0
    heightstr = str(detail_list[2]).replace('CM', '').replace('身 高:', '')
    if heightstr != '':
        height = int(heightstr)
    # Position
    location = str(detail_list[3]).replace('位 置:', '')
    # Age
    age = 0
    agestr = str(detail_list[4]).replace('年 龄:', '').replace('岁', '')
    if agestr != '':
        age = int(agestr)
    # Weight (kg)
    weight = 0
    weightstr = str(detail_list[5]).replace('体 重:', '').replace('KG', '')
    if weightstr != '':
        weight = int(weightstr)
    # Shirt number
    number = 0
    numberstr = str(detail_list[6]).replace('号 码:', '').replace('号', '')
    if numberstr != '':
        number = int(numberstr)
    # Birthday
    birth = str(detail_list[7]).replace('生 日:', '')
    # Preferred foot
    foot = str(detail_list[8]).replace('惯用脚:', '')

    # Club career table: each season row contributes 9 <span> cells,
    # so each statistic sits at a fixed offset with stride 9
    total_con_wrap_div = soup.find('div', attrs={'class': 'total-con-wrap'})
    total_con_wrap_td = str(total_con_wrap_div.find_all('p', attrs={'class': 'td'}))
    con3 = etree.HTML(total_con_wrap_td)
    detail_info_list = con3.xpath("//p//span/text()")
    detail_info_list_years = con3.xpath("//p")

    # Seasons in the first team
    years = len(detail_info_list_years) - 1

    # Total appearances (offset 2; '~' marks missing data)
    total_session = 0
    for i in range(2, len(detail_info_list), 9):
        if detail_info_list[i] == '~':
            detail_info_list[i] = 0
        total_session = total_session + int(detail_info_list[i])

    # Total goals (offset 4)
    total_goals = 0
    for i in range(4, len(detail_info_list), 9):
        if detail_info_list[i] == '~':
            detail_info_list[i] = 0
        total_goals = total_goals + int(detail_info_list[i])

    # Total assists (offset 5)
    total_assist = 0
    for i in range(5, len(detail_info_list), 9):
        if detail_info_list[i] == '~':
            detail_info_list[i] = 0
        total_assist = total_assist + int(detail_info_list[i])

    # Total yellow cards (offset 6)
    total_yellow_card = 0
    for i in range(6, len(detail_info_list), 9):
        if detail_info_list[i] == '~':
            detail_info_list[i] = 0
        total_yellow_card = total_yellow_card + int(detail_info_list[i])

    # Total red cards (offset 7)
    total_red_card = 0
    for i in range(7, len(detail_info_list), 9):
        if detail_info_list[i] == '~':
            detail_info_list[i] = 0
        total_red_card = total_red_card + int(detail_info_list[i])

    # Overall rating and attribute scores (missing for some players, so default to 0)
    average = 0
    speed = 0
    power = 0
    guard = 0
    dribbling = 0
    passing = 0
    shooting = 0
    grade_average = soup.find('p', attrs={'class': 'average'})
    if grade_average is not None:
        con4 = etree.HTML(str(grade_average))
        average = int(con4.xpath("//p//b/text()")[0])
    # Detailed attribute scores
    grade_detail_div = soup.find('div', attrs={'class': 'box_chart'})
    if grade_detail_div is not None:
        con5 = etree.HTML(str(grade_detail_div))
        grade_detail = con5.xpath("//div//span/text()")
        speed = int(grade_detail[0])      # speed
        power = int(grade_detail[1])      # power
        guard = int(grade_detail[2])      # defence
        dribbling = int(grade_detail[3])  # dribbling
        passing = int(grade_detail[4])    # passing
        shooting = int(grade_detail[5])   # shooting

    # Write one row per player
    csv.writer(f).writerow([name, club, country, height, location, age, weight, number, birth, foot,
                            years, total_session, total_goals, total_assist, total_yellow_card,
                            total_red_card, average, speed, power, guard, dribbling, passing, shooting])

# Fetch the page content for a given URL
def askURL(url):
    # Spoof a browser User-Agent so the dongqiudi server treats this as an
    # ordinary browser request
    head = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36 Edg/96.0.1054.29"
    }
    request = urllib.request.Request(url, headers=head)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode("utf-8")
    except urllib.error.URLError as e:
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    return html

f = open("足球运动员.csv", mode="a", encoding='utf-8', newline='')
# Header row (commented out after the first run, since the file is opened in append mode):
# csv.writer(f).writerow(["姓名","俱乐部","国籍","身高(CM)","位置","年龄(岁)","体重(KG)","号码","生日","惯用脚","职业生涯(年)",
#                         "累计出场","累计进球","累计助攻","累计黄牌","累计红牌","综合能力","速度","力量","防守","盘带","传球","射门"])
for num in range(50184113, 50184150):
    print(num)
    soup = checkHtml(num)
    if soup is not None:
        getData(soup)
f.close()

# ---- Age scatter plot ----
import csv
import matplotlib
import matplotlib.pyplot as plt

# Use a Chinese font so the labels render correctly
matplotlib.rcParams['font.sans-serif'] = ['SimHei']  # SimHei for Chinese text
matplotlib.rcParams['axes.unicode_minus'] = False    # render the minus sign properly

# Read the CSV file and extract the age column
ages = []
with open('足球运动员.csv', mode='r', encoding='utf-8') as csv_file:
    csv_reader = csv.reader(csv_file)
    next(csv_reader)  # skip the header row
    for row in csv_reader:
        ages.append(int(row[5]))  # age is the 6th column (index 5)

plt.figure(figsize=(8, 6))
plt.scatter(range(1, len(ages) + 1), ages, color='blue', alpha=0.5)
plt.title('年龄散点图')
plt.xlabel('球员编号')
plt.ylabel('年龄')
plt.grid(True)
plt.show()

# ---- Nationality scatter plot ----
nationalities = {}
with open('足球运动员.csv', mode='r', encoding='utf-8') as csv_file:
    csv_reader = csv.reader(csv_file)
    next(csv_reader)  # skip the header row
    for row in csv_reader:
        nationality = row[2]  # nationality is the 3rd column (index 2)
        if nationality in nationalities:
            nationalities[nationality] += 1
        else:
            nationalities[nationality] = 1

countries = list(nationalities.keys())
player_counts = list(nationalities.values())

plt.figure(figsize=(10, 6))
plt.scatter(countries, player_counts, color='red', alpha=0.7)
plt.title('球员各国籍散点图')
plt.xlabel('国籍')
plt.ylabel('球员数量')
plt.xticks(rotation=45)  # rotate the x labels to avoid overlap
plt.grid(True)
plt.tight_layout()
plt.show()

# ---- Height bar chart ----
height_ranges = {'150-160': 0, '161-170': 0, '171-180': 0, '181-190': 0, '191-200': 0, '200以上': 0}
with open('足球运动员.csv', mode='r', encoding='utf-8') as csv_file:
    csv_reader = csv.reader(csv_file)
    next(csv_reader)  # skip the header row
    for row in csv_reader:
        height = int(row[3])  # height is the 4th column (index 3)
        if height < 150:
            continue  # skip players whose height is missing (stored as 0)
        if 150 <= height <= 160:
            height_ranges['150-160'] += 1
        elif 161 <= height <= 170:
            height_ranges['161-170'] += 1
        elif 171 <= height <= 180:
            height_ranges['171-180'] += 1
        elif 181 <= height <= 190:
            height_ranges['181-190'] += 1
        elif 191 <= height <= 200:
            height_ranges['191-200'] += 1
        else:
            height_ranges['200以上'] += 1

height_labels = list(height_ranges.keys())
player_counts = list(height_ranges.values())

plt.figure(figsize=(10, 6))
plt.bar(height_labels, player_counts, color='blue')
plt.title('球员身高柱状图')
plt.xlabel('身高范围')
plt.ylabel('球员数量')
plt.xticks(rotation=45)  # rotate the x labels to avoid overlap
plt.tight_layout()
plt.show()

六、Summary:

Since its founding, the Chinese Super League has faced many challenges that have left it behind on the international stage. The most significant factors include:

1. Insufficient financial investment: although CSL clubs have spent heavily on foreign signings, overall investment in infrastructure, the youth-training system, and the league's long-term development has been inadequate. This constrains the development of the game as a whole; compared with the big European clubs, Chinese clubs still lag far behind in long-term planning and overall strength.
2. An immature youth-training system: despite recent efforts, China's youth system is still in its infancy. The structure and quality of grassroots training and youth development trail the leading football nations considerably, which makes it difficult to produce home-grown stars and raise the overall level.
3. Management problems: club governance, competition organisation, and refereeing standards in the CSL all have weaknesses. These lower match quality and the league's image, and can also affect players' development and attitude.
4. Shifting foreign-player policy: repeated adjustments to the foreign-player rules have also affected the league's overall level. Over-reliance on foreign players constrains the development of domestic players, while frequent policy changes disrupt tactical systems and squad building.

Despite these problems, there is real hope for improvement. It will take a joint effort from all sides: raising the standard of youth training, improving the league's management system, investing more in football infrastructure, and sustaining development over the long term, so that the gap with the world's top leagues gradually narrows. Finally, completing this project made me realise how limited my skills still are: many features did not work out the way I imagined, and the result is far from a genuine data-analysis tool. I still have much to learn.